How To: S3 Multi-Region Replication (And Why You Should Care)

If you’re familiar with the idea of multi-region replication, feel free to skip to the Overview section. If you don’t know what multi-region replication is, why it’s important, or aren’t convinced that it is, I’d like you to imagine you’ve just sat down to breakfast in a small cafe. You’ve had a long night and your body is craving some refined sugar, so you decide to order a stack of toast (obvs).

After what seems like ages, your waiter finally returns with the promised mountain of carbs. You groggily reach for a slice, and in a moment of awe, realize that the toast has a depiction of a pug emblazoned on it. A PUG! You nearly collapse into tears of joy – if only you could bring this bliss to other toast lovers across the world! But wait! You remember that once upon a time you spent 4 years in an Ivy League university learning the complex inner workings of all things computer.

Time to put that CS degree to work!

Well, first things first, you’re gonna need a place to store all your crispy carb creations. You start to look around at different offerings to see who can host the most toast.

Instagram is a non-starter. It’s saturated with exorbitantly decorated, almost unrecognizable slices of bread, and you can’t have your purist toast creations mingling around with those millennial abominations. I mean, why even is avocado toast? So next, you consider an existing photo hosting company. But blackmail really isn’t your jam, so you keep searching.

Then finally, after much more glazing around, you discover Amazon’s S3 offering. Unlimited storage? Well defined APIs? And you can serve up your photos with a pluggable CDN? It doesn’t get much butter than that.

So finally, your photos have a home and you start spreading delicious, starchy joy to all corners of the earth. But then, the unthinkable happens! An engineer commits a typo! The seemingly innocuous mistake brings down your toast storage, and also some other, but much less important, crumbs of the internet. Luckily you’ve set up your CDN cache, so your users are still able to access some batches of your toast. However, they’re unable to see any non-cached toast, and you’re unable to upload any fresh carbs.

In order to prevent this lack of toasty goodness in the future, you decide to create a backup bucket for your photos in a different region. You read about Amazon’s cross-region replication functionality and implement a replicated bucket accordingly.

Everything seems to be going well with your new backup bucket. Your user base continues to grow, and soon you’re serving up millions of toast photos a day. However, your CDN can only cache so much data, and your users like variety in their grains. This means that many of your international users are getting cache misses, and are having to wheat for toast from your S3 bucket to travel across the world to them. You now realize that you knead to replicate your toast across multiple, international regions to ensure that any cache misses will only have to travel as far as the nearest international bucket. But as you rye to set up more replicas, you realize that you can’t daisy chain replications or specify more than a single bucket in the native replication configuration.

So what do you dough?

Overview

You roll your own replication! (I promise I’m done with the puns now). By combining Amazon’s S3, SNS, SQS and Lambda technologies, we can create our own replica set. An overview of the system is as follows:

[Architecture diagram: each region hosts a toasthost S3 bucket, an SNS topic, a replicator lambda, and an SQS dead-letter queue; write events fan out from each bucket’s topic to the lambdas in the other regions.]

  1. Toast is uploaded to the us-east-1 bucket
  2. A “write” event trigger sends the write event to an SNS topic in us-east-1
  3. Lambdas in eu-west-2 and ap-northeast-1 that are subscribed to that topic receive the write event, then copy the newly written object from us-east-1 to their respective buckets
  4. If a replication write fails, the acting lambda will write the failed event out to an SQS queue

Now, let’s look at each of these pieces in detail. We’re going to examine each of the architectural sections via Amazon’s CloudFormation template syntax. If you’re not familiar with this, go ahead and take a minute to read up on the basics. Most of the template snippets are pretty straightforward, but it never hurts to understand what’s going on in detail.

Additionally, the following information assumes basic knowledge regarding Amazon’s S3, Lambda, SQS, SNS, and IAM offerings.

Architecture

S3

ToastHost:
  Type: "AWS::S3::Bucket"
  Properties:
    BucketName: !Join [ "-", [ !Ref UniqueIdentifier, "toasthost", !Ref "AWS::Region" ] ]
    NotificationConfiguration:
      TopicConfigurations:
        - Event: "s3:ObjectCreated:*"
          Topic: !Ref ToastNotificationTopic
  DependsOn:
    - ToastNotificationTopicPolicy

The first thing we need to spin up are the actual S3 buckets that will be hosting our images. Some notes about what we’re doing in this snippet:

  • Bucket names must be globally unique across all of Amazon
    • we include a unique identifier parameter in the CloudFormation template for this purpose
  • Bucket names cannot contain uppercase letters, so don’t use them in your unique identifier
  • We’re triggering events only off of ‘ObjectCreated’ events
    • We could trigger events off of any subset of supported events, but not propagating deletes is a safe first step
  • We need to make sure the bucket isn’t created until our Topic is, so we force the bucket to wait on the Topic’s Policy (which is created post-topic as we’ll see later on)
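Once the stack is up, it’s worth double-checking that the bucket-to-topic wiring actually took. Here’s a minimal boto3 sketch (not part of the stack); the bucket name assumes a hypothetical UniqueIdentifier of ‘myuid’, so substitute your own:

import boto3

# Verify the bucket -> SNS topic notification wiring (bucket name is hypothetical)
s3 = boto3.client('s3', region_name='us-east-1')
config = s3.get_bucket_notification_configuration(Bucket='myuid-toasthost-us-east-1')
for topic_config in config.get('TopicConfigurations', []):
    print('%s %s' % (topic_config['TopicArn'], topic_config['Events']))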

SNS

ToastNotificationTopic:
  Type: "AWS::SNS::Topic"
  Properties:
    DisplayName: "TnT"
    TopicName: !Join [ "-", [ !Ref UniqueIdentifier, "ToastNotificationTopic", !Ref "AWS::Region" ] ]
ToastNotificationTopicPolicy:
  Type: "AWS::SNS::TopicPolicy"
  Properties:
    PolicyDocument: # allow s3 to write to this sns topic
      Id: ToastNotificationTopicPolicy
      Statement:
      - Sid: !Join [ "", [ !Ref UniqueIdentifier, "ToastNotificationTopicPolicy", !Ref "AWS::Region" ] ]
        Effect: Allow
        Action: SNS:Publish
        Resource: "*"
        Principal:
          AWS: "*"
        Condition:
          ArnLike:
            aws:SourceArn: !Join
              - ""
              - - "arn:aws:s3:*:*:"
                - !Join [ "-", [ !Ref UniqueIdentifier, "toasthost", !Ref "AWS::Region" ] ]
                - "*"
    Topics:
      - !Ref ToastNotificationTopic

We now need to create the topic that our write events are sent to. Some notes about what we’re doing in this snippet:

  • Topic display names can only be 10 characters or less, which is why we use the ‘TnT’ (DY-NO-MITE) shorthand
  • The Topic Policy allows the toasthost bucket in the same region as the topic to write events to said topic
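If you want to sanity-check the policy once it’s deployed, you can pull it back down and eyeball the ArnLike condition. A quick sketch (the account ID and identifier in the ARN are hypothetical):

import boto3
import json

# Fetch and pretty-print the topic's access policy (ARN values are hypothetical)
sns = boto3.client('sns', region_name='us-east-1')
attrs = sns.get_topic_attributes(
    TopicArn='arn:aws:sns:us-east-1:123456789012:myuid-ToastNotificationTopic-us-east-1')
print(json.dumps(json.loads(attrs['Attributes']['Policy']), indent=2))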

SQS

DeadToastQueue:
  Type: "AWS::SQS::Queue"
  Properties:
    QueueName: !Join [ "-", [ !Ref UniqueIdentifier, "DeadToast", !Ref "AWS::Region" ] ]

This one is pretty straightforward. All it does is create an SQS queue with default settings.
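Straightforward or not, this queue is where you’ll be digging when replication fails. Here’s a sketch for pulling any dead-lettered events out for inspection (the queue name again assumes a hypothetical ‘myuid’ identifier):

import boto3

# Drain up to 10 dead-lettered replication events for inspection
sqs = boto3.client('sqs', region_name='us-east-1')
queue_url = sqs.get_queue_url(QueueName='myuid-DeadToast-us-east-1')['QueueUrl']
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=5)
for msg in resp.get('Messages', []):
    print(msg['Body'])  # the original event that failed to replicate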

Lambda

AWS Lambdas are transient compute units. They are triggered by events (either an action or a scheduled time), and spin up compute resources only for the duration of running that event through their code. So, for our replication lambda, let’s look at the code snippet first.

Code

import boto3
import botocore
import json
import os
import urllib


def _get_key_exists(bucket, key):
    """Return True if `key` already exists in `bucket`."""
    try:
        boto3.resource('s3').Object(bucket, key).load()
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            return False
        raise
    return True


def lambda_handler(event, context):
    s3 = boto3.client('s3')
    # The S3 write event arrives wrapped in an SNS envelope; the inner message is JSON
    sns_message = json.loads(event['Records'][0]['Sns']['Message'])
    source_bucket = str(sns_message['Records'][0]['s3']['bucket']['name'])
    dest_bucket = os.environ.get('BUCKET_NAME')
    # S3 URL-encodes object keys in event notifications, so decode before copying
    key = str(urllib.unquote_plus(sns_message['Records'][0]['s3']['object']['key']).decode('utf8'))
    # Only copy the object if it isn't already in this region's bucket,
    # which prevents the replicas from re-triggering each other forever
    if not _get_key_exists(dest_bucket, key):
        copy_source = {'Bucket': source_bucket, 'Key': key}
        s3.copy_object(Bucket=dest_bucket, Key=key, CopySource=copy_source)

This Python code block reads in a trigger event (an SNS-wrapped bucket write event in our case), extracts the source bucket and object key from it, then either copies or ignores the image depending on whether it already exists in this lambda’s region (to prevent infinite duplication).
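If you want to poke at the handler before deploying it, you can hand-roll an event in the same shape as the SNS envelope the lambda actually receives. A sketch follows (the bucket names and key are made up, and this still makes real S3 calls, so it needs credentials and existing buckets):

import json
import os

# A hand-rolled event approximating SNS's envelope around an S3 write event
sample_event = {
    'Records': [{
        'Sns': {
            'Message': json.dumps({
                'Records': [{
                    's3': {
                        'bucket': {'name': 'myuid-toasthost-us-east-1'},
                        'object': {'key': 'pug-toast.jpg'}
                    }
                }]
            })
        }
    }]
}

os.environ['BUCKET_NAME'] = 'myuid-toasthost-eu-west-2'  # pretend we're the eu-west-2 replicator
lambda_handler(sample_event, None)  # should copy pug-toast.jpg into the eu-west-2 bucket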

Function

ToastReplicator:
  Type: "AWS::Lambda::Function"
  Properties:
    Code:
      ZipFile: |
        # (inline the replication Python code from the Code section above)
    DeadLetterConfig:
      TargetArn: !GetAtt DeadToastQueue.Arn
    Environment:
      Variables:
        BUCKET_NAME: !Join [ "-", [ !Ref UniqueIdentifier, "toasthost", !Ref "AWS::Region" ] ]
    FunctionName: !Join [ "-", [ !Ref UniqueIdentifier, "ToastNotifier", !Ref "AWS::Region" ] ]
    Handler: "index.lambda_handler"
    Role: !GetAtt ToastReplicatorRole.Arn
    Runtime: python2.7
    MemorySize: 256
    Timeout: 60

This CF section describes the actual lambda function. Some notes about what we’re doing in this snippet:

  • For a lambda function you can either provide in-line code, or provide an s3 location for the lambda to pull the code from. We’ve opted for the in-line code here as it reduces overall complexity
  • The ‘DeadLetterConfig’ directive sets up the event DLQ for this lambda
  • ‘Handler’ refers to the method inside the lambda code block to pass events to
  • ‘MemorySize’ is also directly correlated to compute power: Lambda allocates CPU proportionally to the memory you request, so the higher MemorySize is, the higher your compute power will be

Permissions

By default, lambdas have no permissions. This means that we need to explicitly define what our lambda is able to do.

ToastReplicatorRole:
  Type: "AWS::IAM::Role"
  Properties:
    RoleName: !Join [ "-", [ !Ref UniqueIdentifier, "ToastReplicatorRole", !Ref "AWS::Region" ] ]
    AssumeRolePolicyDocument:
      Statement:
        - Sid: !Join [ "", [ !Ref UniqueIdentifier, "ToastReplicatorRolePolicy" ] ]
          Effect: "Allow"
          Principal:
            Service:
              - "lambda.amazonaws.com"
          Action:
            - "sts:AssumeRole"
    Policies:

This creates the lambda role, and allows the lambda to assume said role. Now we’ll look at the policies under this role.

PolicyName: "ToastNotificationLoggingPolicy"
PolicyDocument: # allow lambda to write logs
Id: ToastNotificationLoggingPolicy
Statement:
  - Sid: !Join [ "", [ !Ref UniqueIdentifier, "ToastNotificationLoggingPolicy" ] ]
  Effect: Allow
  Action:
    - "logs:CreateLogGroup"
    - "logs:CreateLogStream"
    - "logs:PutLogEvents"
  Resource: "*"

This allows the lambda to write out logging events.

PolicyName: "ToastNotificationDLQPolicy"
PolicyDocument: # allow lambda to write to DLQ
Id: ToastNotificationDLQPolicy
Statement:
  - Sid: !Join [ "", [ !Ref UniqueIdentifier, "ToastNotificationDLQPolicy" ] ]
  Effect: Allow
  Action: "sqs:*"
  Resource: !GetAtt DeadToastQueue.Arn

This allows the lambda to write events out to its DLQ.

PolicyName: "ToastHostReplicationPolicy"
PolicyDocument:
Id: ToastHostReplicationPolicy
Statement:
  - Sid: !Join [ "", [ !Ref UniqueIdentifier, "ToastHostReplicationPolicy" ] ]
  Effect: Allow
  Action:
    - "s3:Get*"
    - "s3:List*"
    - "s3:Put*"
  Resource: !Join [ "", [ !GetAtt ToastHost.Arn, "*" ] ]

This allows the lambda to read/write from the S3 host bucket in its region.
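To confirm that all three inline policies actually landed on the role, you can list them via IAM. A sketch, with a hypothetical role name:

import boto3

# List the inline policies attached to the replicator role (role name is hypothetical)
iam = boto3.client('iam')
policies = iam.list_role_policies(RoleName='myuid-ToastReplicatorRole-us-east-1')
for name in policies['PolicyNames']:
    print(name)  # expect the logging, DLQ, and replication policies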

Subscriptions

All of the CF snippets above create a base stack for each region. Now we need to wire all the pieces together. Note that each of these snippets is meant to be run once for each region that this stack is replicating to. We represent the current region we’re wiring to via a ‘ToRegion’ variable in the CF parameters section.

SNS Lambda Invocation

ToastReplicationPermission:
  Type: "AWS::Lambda::Permission"
  Properties:
    Action: "lambda:InvokeFunction"
    FunctionName: !Join
      - ""
      - - "arn:aws:lambda:"
        - !Ref "AWS::Region"
        - ":"
        - !Ref "AWS::AccountId"
        - ":function:"
        - !Join [ "-", [ !Ref UniqueIdentifier, "ToastNotifier", !Ref "AWS::Region" ] ]
    Principal: "sns.amazonaws.com"
    SourceArn: !Join
      - ""
      - - "arn:aws:sns:"
        - !Ref ToRegion
        - ":"
        - !Ref "AWS::AccountId"
        - ":"
        - !Join [ "-", [ !Ref UniqueIdentifier, "ToastNotificationTopic", !Ref ToRegion ] ]

This gives the SNS topic permission to invoke the replication lambdas in the other replication regions.
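You can verify the permission took by pulling the function’s resource policy back down. A sketch (the function name and region are hypothetical):

import boto3
import json

# Show the lambda's resource policy; expect sns.amazonaws.com as a principal
lam = boto3.client('lambda', region_name='eu-west-2')
policy = lam.get_policy(FunctionName='myuid-ToastNotifier-eu-west-2')
print(json.dumps(json.loads(policy['Policy']), indent=2))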

Remote Toast Host Reads

RemoteToastHostReplicationReadPolicy:
  Type: "AWS::IAM::Policy"
  Properties:
    PolicyName: !Join [ "-", [ !Ref UniqueIdentifier, "RemoteToastHostReplicationReadPolicy", !Ref ToRegion ] ]
    PolicyDocument:
      Id: RemoteToastHostReplicationReadPolicy
      Statement:
        - Sid: "RemoteToastHostReplicationReadPolicy"
          Effect: Allow
          Action:
            - "s3:Get*"
            - "s3:List*"
          Resource: !Join
            - ""
            - - "arn:aws:s3:::"
              - !Join [ "-", [ !Ref UniqueIdentifier, "toasthost", !Ref ToRegion ] ]
              - "*"
    Roles:
      - !Join [ "-", [ !Ref UniqueIdentifier, "ToastReplicatorRole", !Ref "AWS::Region" ] ]

This gives the replication lambda permission to read data out of the S3 buckets in the other replication regions.

SNS Subscription

ToastReplicationNotificationSubscription:
  Type: "AWS::SNS::Subscription"
  Properties:
    Endpoint: !Join
      - ""
      - - "arn:aws:lambda:"
        - !Ref ToRegion
        - ":"
        - !Ref "AWS::AccountId"
        - ":function:"
        - !Join [ "-", [ !Ref UniqueIdentifier, "ToastNotifier", !Ref ToRegion ] ]
    Protocol: "lambda"
    TopicArn: !Join
      - ""
      - - "arn:aws:sns:"
        - !Ref "AWS::Region"
        - ":"
        - !Ref "AWS::AccountId"
        - ":"
        - !Join [ "-", [ !Ref UniqueIdentifier, "ToastNotificationTopic", !Ref "AWS::Region" ] ]
  DependsOn:
    - ToastReplicationPermission
    - RemoteToastHostReplicationReadPolicy

Lastly, this subscribes the replication lambda to the SNS Topics in the other replication regions. It requires that all necessary permissions are in place before it can subscribe, which is why we include the ‘DependsOn’ directive.
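After the subscription stacks deploy, each region’s topic should show one lambda endpoint per remote region. A quick way to confirm (the topic ARN is hypothetical):

import boto3

# List the subscriptions hanging off a region's topic (ARN is hypothetical)
sns = boto3.client('sns', region_name='us-east-1')
subs = sns.list_subscriptions_by_topic(
    TopicArn='arn:aws:sns:us-east-1:123456789012:myuid-ToastNotificationTopic-us-east-1')
for sub in subs['Subscriptions']:
    print('%s %s' % (sub['Protocol'], sub['Endpoint']))  # expect one 'lambda' entry per remote region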

TL;DR // Deploy

So now that we have all of the pieces, you mix that repo with that dough-oh, make a Texas Lambdaaa.


And by that I mean, follow these commands to deploy you some replication. These commands assume you have the AWS CLI set up and configured. If you haven’t, please follow AWS’s CLI configuration instructions before proceeding.

  1. Clone the template repo and pick a unique identifier (lowercase only):
    git clone https://github.com/jessicalucci/s3-multi-region.git \
      && cd s3-multi-region
    export UI=<unique-identifier>
  2. Deploy the replication base stacks (do this for each region you want to replicate into):
    aws --profile <profile> cloudformation create-stack \
      --stack-name ToastTest --template-body file://toast-base.yaml \
      --parameters ParameterKey=UniqueIdentifier,ParameterValue="$UI" \
      --capabilities CAPABILITY_NAMED_IAM --region <region>
  3. Deploy the replication subscription stacks (do this once per region pair: for each region you want to replicate into, for each other region in the replica set). The stack name includes the target region so the subscription stacks don’t collide with each other or with the base stack:
    aws --profile <profile> cloudformation create-stack \
      --stack-name ToastTestSub-<to-region> \
      --template-body file://toast-subscription.yaml \
      --parameters ParameterKey=UniqueIdentifier,ParameterValue="$UI" \
          ParameterKey=ToRegion,ParameterValue=<to-region> \
      --capabilities CAPABILITY_NAMED_IAM --region <region>
  4. Start replicating some toast! Upload your favorite toasty image to any bucket in your replica set. Wait a hot minute, then check the other buckets in your replica set to see your toast copies! (A scripted version of this check is sketched below.)
  5. Yes, it would be much easier to deploy all of these stacks via a script that manages all the naming/region mappings. I’m lazy.
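If you’d rather script step 4, here’s a rough end-to-end smoke test (the regions, identifier, and image are placeholders; give replication a minute to land):

import time
import boto3

regions = ['us-east-1', 'eu-west-2', 'ap-northeast-1']  # your replica set
buckets = ['myuid-toasthost-%s' % region for region in regions]

# Upload a test image to the first bucket...
boto3.client('s3').upload_file('pug-toast.jpg', buckets[0], 'pug-toast.jpg')

# ...then give the lambdas a hot minute and check the replicas
time.sleep(60)
for bucket in buckets[1:]:
    boto3.client('s3').head_object(Bucket=bucket, Key='pug-toast.jpg')  # raises if missing
    print('replicated to %s' % bucket)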

Feel free to leave any questions or comments below!

3 thoughts on “How To: S3 Multi-Region Replication (And Why You Should Care)”

  1. Very nice walkthrough, Jessica! ^_^
    I’m wondering if we could simplify the architecture a bit.
    For example, we could have only one bucket and one Lambda Function in each region. Each Lambda Function could be triggered by S3 directly, iterate over the target regions (stored as env variable), and copy the S3 object into all of them (excluding its own). DLQ would still be possible, and no SNS Topic is needed anymore.
    Although this approach requires a re-deploy of all the Lambda Functions every time you add a new region (I’d recommend using all of them by default, since it costs ~$0), it simplifies the architecture a bit and hopefully reduces the total execution time (with Node.js you could easily run all the s3-cp commands in parallel).
    You could simplify the architecture even more if you can assume that all the objects are uploaded into one main region, and the other S3 buckets are used only for delivery.
    Looking forward to hearing your thoughts 🙂


    1. Hi Alex! Thanks for reading, and I’m glad you enjoyed it! (:

      Here are my thoughts on your suggestions:

      Push-Fanout
      ================
      The main downside of implementing this approach is that you’d lose your built-in event re-play ability.
      Let’s say we have buckets A, B, and C in our replica set, replicating data with the described push-fanout approach. Assume a write to Bucket A successfully replicates to bucket C, but fails to replicate to bucket B. We dead letter the write, and are alerted that we need to replay this event. Since we’re using the write event itself to trigger the lambda, we’d need to delete then re-write the same data to bucket A (or C) to re-trigger the event – probably not the most efficient or safest approach.
Alternatively, we could write a separate tool to analyze the DLQ message and perform a copy of the failed file manually, but then we’d need to maintain more code, more permissions, etc.

I definitely agree that we’d reduce the complexity of the architecture by moving replication behavior into the lambda function, but doing this makes the lambda function itself more complex. It’s possible that the effort to maintain the more complex lambda function would be greater than that of maintaining the more complex architecture. (No idea which one is actually the harder problem, but something we would need to consider)

      And to elaborate on the new concerns in the replicating lambda –

      * requires tracking of each individual bucket to replicate to (which you already mentioned and provided a potential solution to)

      * need to implement error handling within lambda
      ** Do we fail immediately on a single region write fail?
      ** Do we allow all write attempts to complete, then fail based on the outcome?
      ** What happens if a single region write times out?

      * need to manage a new set of cross-region write permissions for the lambdas
      ** Wouldn’t be difficult to add to existing CF Template (the read path already exists), but increases attack surface

      We could probably solve some of those considerations by triggering a lambda per region on a write event, but we still wouldn’t get back the queue re-play functionality.

      Single Lambda
      ===============
      Replay would still be a concern here, but if we didn’t care about writing to remote regions (only creating read-replicas), having a single queue with a bunch of lambdas (or a single multi-threaded lambda!) would solve that concern, and simplify the approach.

      Also, I don’t think I did a great job of describing the use case in my post (it was hard to relate toast to data bytes hah), but there are definitely times when you’d want read-write replicas. (Imagine trying to quickly write a 500 GB file halfway across the world) ¯\_(ツ)_/¯

      Curious to hear what you think about my concerns with the push-fanout implementation! I agree that the queue-approach seems overly complex for the problem it’s solving (why can’t we just daisy chain replications Amazon? whyy? D:) , and would love to come up with a simpler solution if possible. 😀


  2. Hey Jessica, just wanted to say great write-up and nice Cloudcraft diagram! I have a similar use case where we want bi-directional (multi-directional?) replication, so I’m glad I stumbled across your solution. I’m still digesting how the DLQ will fit in and how to handle Lambda failures gracefully (like maybe a periodic `aws s3 sync` to ensure any missed files have been propagated).

    Thanks for writing this up.

