Event Driven Systems on AWS

Craig Tweedy, CTO
#aws#engineering#events

The world today is ever more connected, and product development along with it. With the rising popularity of approaches such as the JAMStack, the increasing complexity of our data sets, and platforms which span many concurrent devices, as in IoT, we're connecting far more systems to each other than ever before. It is becoming less and less feasible to have every task happen within the same request, so how do we handle this?

One of the more popular options, and a technique we're big on at Dines, is to move to an event-driven system - one which pushes as much processing away from the consumer as possible, allowing other systems to handle that processing at their leisure. This technique forms the foundation of many growing products and platforms, providing benefits such as:

  • Improved UX as the consumer is not waiting for all processes to complete

  • Potential cost reductions and scalability as processing is moved to more efficient services

  • The ability to use best in class services for the job at hand

We use event systems to drive many of our behind-the-scenes processes at Dines, such as email receipts, referral management, and automation of our discount schedules. The key thing here is that none of these processes need to be synchronous: they don't have to happen immediately for our requests to be successful.

The Environment

Thankfully, we no longer have to worry about scaling up a fleet of servers, installing all the relevant tools and libraries, configuring networks and worrying about managing the entire system - cloud services such as AWS or Google Cloud now help us immeasurably by abstracting all of that pain away, leaving us with simple services and connectors in order to fulfil these outcomes. We use AWS here at Dines, so we'll walk through how we accomplished this.

The Tools

We've got three main components in our events system - SNS, SQS, and Lambda.

SNS

SNS (Simple Notification Service) is AWS's publish/subscribe messaging service, which has a myriad of uses. Before moving to an event-driven system, we had two use cases for it - push notifications and email alerts for events from CloudWatch. Now we use it for a third: publishing events from our main platform, which then fan out to the rest of our events system. This is the entry point for the rest of our processes.

Events are sent to SNS topics, and consumers can subscribe to topics. A topic is essentially a grouping mechanism, allowing a single event to reach multiple subscribers at once.
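To make the idea of publishing to a topic concrete, here is a minimal sketch in Python of what a publish request might look like. It only builds the arguments; the topic ARN, the payload fields, and the attribute name event_type are placeholders made up for illustration, not what we actually run at Dines.

```python
import json


def build_publish_request(topic_arn, event_type, payload):
    """Build keyword arguments for an SNS Publish call (e.g. boto3's sns.publish)."""
    return {
        "TopicArn": topic_arn,
        # SNS message bodies are strings, so the payload is serialised to JSON.
        "Message": json.dumps(payload),
        # Hypothetical attribute name; subscribers can use attributes for filtering.
        "MessageAttributes": {
            "event_type": {"DataType": "String", "StringValue": event_type},
        },
    }


request = build_publish_request(
    "arn:aws:sns:eu-west-1:123456789012:platform-events",  # placeholder ARN
    "bill.paid",
    {"bill_id": "b_123", "total_pence": 4250},  # made-up payload
)
# With boto3 installed and credentials configured, this could be sent with:
#   boto3.client("sns").publish(**request)
```

Keeping the event name in a message attribute rather than only in the body is a design choice: it lets subscribers decide what they want without having to parse the payload.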

SQS

SQS (Simple Queue Service) is AWS's queuing system, which is quietly powerful and absolutely vital for handling our events. By buffering events, SQS helps ensure we don't overload handlers that run with concurrency limits. It can also handle retries with its redrive policy, and can push messages that can't be processed into a dead letter queue, allowing developers to understand why tasks couldn't be completed.

An SQS queue can subscribe directly to an SNS topic, allowing us to take those events and start to schedule them to be worked on via SQS.
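As a rough illustration of the redrive policy and dead letter queue described above, the sketch below builds the queue attributes you might pass when creating a queue. The ARN, the queue name, and the retry count of 3 are assumptions for the example; pick values that suit your own workload.

```python
import json


def build_queue_attributes(dead_letter_queue_arn, max_receive_count=5):
    """Build SQS queue attributes containing a redrive policy."""
    redrive_policy = {
        # Queue that receives messages which keep failing.
        "deadLetterTargetArn": dead_letter_queue_arn,
        # Number of receives allowed before a message is moved to the DLQ.
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the policy as a JSON string, not a nested object.
    return {"RedrivePolicy": json.dumps(redrive_policy)}


attributes = build_queue_attributes(
    "arn:aws:sqs:eu-west-1:123456789012:bill-paid-dlq",  # placeholder ARN
    max_receive_count=3,
)
# With boto3, these could be applied when creating the main queue:
#   boto3.client("sqs").create_queue(QueueName="bill-paid", Attributes=attributes)
```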

Lambda

Lambda is the popular function-as-a-service tool provided by AWS, and it is where the work actually gets done. Lambda can poll SQS and pull events out of it, allowing us to start working on all the events that are being pushed into our queues. Here we can do all of the processing we've been building up to - whether that's processing analytics for orders that come through, updating our search clusters with new data, or sending emails.
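A handler for this kind of queue might look something like the sketch below. It assumes the queue is subscribed to SNS without raw message delivery, so each SQS record body is a JSON envelope from SNS with the original message in its "Message" field. The send_receipt_email function and the payload shape are hypothetical stand-ins for real work.

```python
import json


def send_receipt_email(payload):
    """Hypothetical stand-in for real work such as sending an email receipt."""
    return f"receipt sent for bill {payload['bill_id']}"


def lambda_handler(event, context):
    """Entry point that Lambda invokes with a batch of SQS records."""
    results = []
    for record in event["Records"]:
        # Without raw message delivery, the SQS body is an SNS envelope (JSON).
        envelope = json.loads(record["body"])
        # The originally published message is the string in the "Message" field.
        payload = json.loads(envelope["Message"])
        results.append(send_receipt_email(payload))
    return results


# A hand-built event for local experimentation; real events carry more fields.
sample_event = {
    "Records": [
        {
            "body": json.dumps(
                {"Type": "Notification", "Message": json.dumps({"bill_id": "b_123"})}
            )
        }
    ]
}
print(lambda_handler(sample_event, None))
```

If the handler raises an exception, the messages in the batch are not deleted and become visible on the queue again after the visibility timeout, which is what lets the redrive policy do its job.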

At this point we're so far detached from the core system that first sent the event that the consumer has likely already moved on. We've already told the consumer that all is well, and they don't have to wait for us to process all this information - an added benefit for UX and performance.

The Setup

Let's look at how this connects together in the simplest sense.

Here, we have our main server publishing an event, so what happens next?

  • The event is sent to an SNS topic

  • An SQS queue, which has been configured to subscribe to the SNS topic, takes that event and pushes it into its queue.

  • A Lambda function configured to listen to the queue pulls the item from the queue

  • The Lambda function handles the event, doing whatever task it was defined to do with the event.
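The steps above could be wired up by hand roughly as follows. This is only a sketch: in practice a tool such as CloudFormation would do this declaratively. The function takes the three AWS clients as parameters (for example boto3 clients for "sns", "sqs", and "lambda"), and the resource names are whatever you pass in. It does not set the permissions that SNS needs in order to deliver to the queue.

```python
def wire_pipeline(sns, sqs, lambda_client, topic_name, queue_name, function_name):
    """Connect an SNS topic to an SQS queue, and the queue to a Lambda function."""
    # 1. The topic that the main server publishes events to.
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]

    # 2. The queue that will hold events until they are processed.
    queue_url = sqs.create_queue(QueueName=queue_name)["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]

    # 3. Subscribe the queue to the topic so published events land in the queue.
    subscription_arn = sns.subscribe(
        TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn
    )["SubscriptionArn"]

    # 4. Have Lambda poll the queue and invoke the function with batches.
    mapping_id = lambda_client.create_event_source_mapping(
        EventSourceArn=queue_arn, FunctionName=function_name, BatchSize=10
    )["UUID"]

    return {
        "topic_arn": topic_arn,
        "queue_arn": queue_arn,
        "subscription_arn": subscription_arn,
        "mapping_id": mapping_id,
    }


# Example call, assuming boto3 is installed and credentials are configured:
#   import boto3
#   wire_pipeline(boto3.client("sns"), boto3.client("sqs"), boto3.client("lambda"),
#                 "platform-events", "bill-paid", "send-receipt")
```

Passing the clients in, rather than creating them inside the function, keeps the sketch easy to try out against fakes before pointing it at a real account.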

By leveraging AWS, we can connect all of these services easily, especially with a tool like CloudFormation or Terraform, so we don't have to wire things up manually or worry too much about networks and access control (do make sure you've set up your IAM roles appropriately, however). We end up with a great foundation for further work.
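One piece of access control that is easy to miss is the queue's own access policy: the SNS topic needs permission to send messages to the SQS queue. A minimal sketch of such a policy, with placeholder ARNs, might look like this:

```python
import json


def build_queue_policy(queue_arn, topic_arn):
    """Build an SQS access policy letting one SNS topic send to one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Restrict delivery to messages coming from this topic only.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


policy_json = json.dumps(
    build_queue_policy(
        "arn:aws:sqs:eu-west-1:123456789012:bill-paid",  # placeholder ARN
        "arn:aws:sns:eu-west-1:123456789012:platform-events",  # placeholder ARN
    )
)
# With boto3, the policy could be attached to the queue with:
#   boto3.client("sqs").set_queue_attributes(
#       QueueUrl=queue_url, Attributes={"Policy": policy_json})
```

The condition on the source ARN is the important part: without it, any SNS topic could deliver into the queue.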

Going further

So what if you want to start pushing more events out? Well, you could start publishing to multiple topics for different events:

Here we will end up with an SNS topic per event, which allows us to handle all of our events completely independently from each other, with their own queues and Lambda tasks.

However, we can use some interesting features of SNS to potentially clean this up a bit.

SNS Filter Policies

Subscribing to an SNS topic means that every event published to it will come through to our handler. If we only care about certain events, however, we can set up a filter policy. A filter policy allows the subscriber to tell SNS that they only want to be notified when certain events happen, ignoring all other events that hit the topic.

For example, we may have a system set up which sends events such as:

  • restaurant.created

  • restaurant.updated

  • bill.paid

In the standard setup, if we send all of these events to the same SNS topic then we'll only be able to differentiate between the events once we get to the Lambda function, which will increase the complexity of this function as it attempts to handle an ever-increasing number of events. With filter policies, we can resolve this issue.

When we set up our SQS queue and its subscription, we can attach a filter policy to the subscription stating that we only care about a certain event, such as bill.paid. This allows us to send all events to the same topic, but have our queues only react to certain events:

Here, you can see that we only use a single SNS topic, but we can still fan out into multiple queues, allowing for simpler queues and Lambda tasks which only handle the events they care about. There are multiple types of filter policies available, so experiment to find which is most appropriate for your use case.
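To give a feel for what a filter policy looks like, the sketch below builds one for the bill.paid case and includes a tiny local matcher that mimics only the simplest form of matching: exact string values on message attributes. It assumes publishers set a message attribute called event_type, which is a name chosen for this example. The matcher is a simplification for illustration; SNS itself supports more operators than this.

```python
import json


def build_filter_attributes(event_types):
    """Build subscription attributes containing an exact-match filter policy."""
    policy = {"event_type": list(event_types)}  # hypothetical attribute name
    # SNS expects the filter policy as a JSON string.
    return {"FilterPolicy": json.dumps(policy)}


def matches_exact(policy, message_attributes):
    """Simplified local check: every policy key must match one allowed value."""
    return all(
        message_attributes.get(key) in allowed for key, allowed in policy.items()
    )


attributes = build_filter_attributes(["bill.paid"])
policy = json.loads(attributes["FilterPolicy"])

for event_type in ["restaurant.created", "restaurant.updated", "bill.paid"]:
    delivered = matches_exact(policy, {"event_type": event_type})
    print(event_type, "delivered" if delivered else "filtered out")

# With boto3, the attributes could be supplied when subscribing the queue:
#   boto3.client("sns").subscribe(TopicArn=topic_arn, Protocol="sqs",
#                                 Endpoint=queue_arn, Attributes=attributes)
```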

Finally

From this foundation, you end up with an incredibly powerful pattern which can drive your own events in your products. However, as with all things, be careful to consider your own requirements - for example, in this pattern events may not be delivered in chronological order, and at very high rates SNS can start to get expensive (although it offers a generous free tier).

Of course, SNS isn't the only way to handle events from your systems - for more real-time systems, you may want to check out AWS Kinesis Firehose or even tooling from AWS IoT.

If you're interested in learning more, I highly recommend checking out the AWS Serverless Application Model, and specifically the Serverless Application Repository, which have become invaluable tools for us. We also make heavy use of Serverless Framework in combination with CloudFormation to help us manage these projects, which lets us get from zero to a production project incredibly easily.