Learning Rust for Use in AWS - Part 2
Overview
I spent the past week and a half reading through the Rust Book, and now I’m going to try a little experiment: comparing the functionality of a Rust lambda to a Python lambda.
The Experiment
I’m going to implement a pretty straightforward lambda function use case with S3: a user uploads an image to a bucket, and a lambda function runs when the image is uploaded, makes a smaller, thumbnail version of the image, and saves it to a new location in the same bucket.
After enabling this functionality, I’m also going to maintain an index in DynamoDB, where we can store metadata about the image, including its thumbnail location.
By using two different languages for our Lambda functions, we can compare overall resource usage, which could translate into real cost savings once Lambda usage grows past the AWS Free Tier allocation limits.
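Lambda billing is essentially memory size × execution time (GB-seconds) plus a flat per-request charge, so a function that finishes faster in less memory is directly cheaper once you’re past the Free Tier. Here is a rough sketch of that arithmetic; the prices, durations, and invocation counts below are illustrative assumptions, not current AWS pricing:

// Rough Lambda cost math. The prices here are assumptions for illustration only,
// and the monthly Free Tier allowance is ignored for simplicity.
fn monthly_cost(memory_mb: f64, avg_duration_ms: f64, invocations: f64) -> f64 {
    let price_per_gb_second = 0.0000166667; // assumed price per GB-second
    let price_per_request = 0.0000002; // assumed price per request
    let gb_seconds = (memory_mb / 1024.0) * (avg_duration_ms / 1000.0) * invocations;
    gb_seconds * price_per_gb_second + invocations * price_per_request
}

fn main() {
    // Hypothetical: the same workload at 128 MB, one function averaging 900 ms
    // per invocation and the other 300 ms, over 10 million invocations in a month.
    println!("slower: ${:.2}", monthly_cost(128.0, 900.0, 10_000_000.0));
    println!("faster: ${:.2}", monthly_cost(128.0, 300.0, 10_000_000.0));
}

The absolute numbers don’t matter much; the point is that billed duration and memory size multiply, so any per-invocation win compounds at scale.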
Architecture
In order to build this workflow, we’ll need to set up a few things:
- An S3 bucket to store our images
- An S3 event trigger to notify our lambda that a new image has been uploaded
- 2 Lambda functions (one in Python and one in Rust) to do the size conversion and store the relevant metadata
- A DynamoDB table to store our metadata - We’ll add this functionality later in Part 3
Infrastructure
Let’s get started on configuring our AWS resources. We’re going to manage all of our resources in Terraform, which will make our infrastructure easy to create, update, and destroy. Installation instructions and the details of how or why to use Terraform are outside the scope of this post. For more information, please refer to the Terraform website.
Let’s start by creating a new project and a folder for our Terraform code:
$ mkdir -p lambda_comparison/terraform
$ cd lambda_comparison/terraform
Before we start, it’s always a good idea to configure Terraform and your providers. Create a new file called terraform.tf and add the following:
NOTE: Setting default tags on the provider makes it easy to tag all your resources without passing a local.tags variable around to every resource.
terraform.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.64.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Owner       = "banjackal"
      Provisioner = "Terraform"
      Project     = "Rust-Python Lambda Comparison"
    }
  }
}
Now, we need to define our resources. We’re going to split them across a few files to keep the logical pieces together and to keep individual files small. We’ll start with our S3 resources in a file called s3.tf.
NOTE: Replace <unique bucket name> below with a name of your choosing
$ touch s3.tf
resource "aws_s3_bucket" "images" {
bucket = "<unique bucket name>"
}
resource "aws_s3_bucket_public_access_block" "block_public_access" {
bucket = aws_s3_bucket.images.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
Next, we can add our lambda configurations. Note that we’re configuring the modules slightly differently for Python and Rust: because Rust does not have a dedicated managed Lambda runtime, we’re going to use the provided.al2 custom runtime on arm64 architecture. Building with cargo lambda produces a bootstrap.zip package containing the compiled binary; we point the function at that package and use bootstrap as the handler.
lambdas.tf
data "aws_iam_policy_document" "lambda_policy" {
statement {
effect = "Allow"
actions = [
"s3:GetObject",
"s3:ListBucket",
"s3:PutObject",
]
resources = [
aws_s3_bucket.images.arn,
"${aws_s3_bucket.images.arn}/*"
]
}
statement {
effect = "Allow"
actions = [
"dynamodb:GetItem",
"dynamodb:BatchGetItem",
"dynamodb:Query",
"dynamodb:PutItem",
"dynamodb:UpdateItem",
"dynamodb:DeleteItem",
"dynamodb:BatchWriteItem"
]
resources = [
module.dynamodb_table.dynamodb_table_arn
]
}
}
module "python_lambda_function" {
source = "terraform-aws-modules/lambda/aws"
publish = true
function_name = "python_image_processor"
description = "Creates thumbnails of uploaded S3 images"
handler = "lambda_function.lambda_handler"
runtime = "python3.10"
source_path = "../python/"
allowed_triggers = {
S3Events = {
principal = "s3.amazonaws.com"
source_arn = "arn:aws:s3:::*"
}
}
attach_policy_json = true
policy_json = data.aws_iam_policy_document.lambda_policy.json
memory_size = 128
timeout = 5
}
module "rust_lambda_function" {
source = "terraform-aws-modules/lambda/aws"
publish = true
function_name = "rust_image_processor"
description = "Creates thumbnails of uploaded S3 images"
handler = "bootstrap"
runtime = "provided.al2"
architectures = ["arm64"]
create_package = false
local_existing_package = "../rust-image-processor/target/lambda/rust-image-processor/bootstrap.zip"
allowed_triggers = {
S3Events = {
principal = "s3.amazonaws.com"
source_arn = "arn:aws:s3:::*"
}
}
attach_policy_json = true
policy_json = data.aws_iam_policy_document.lambda_policy.json
memory_size = 128
timeout = 5
}
THIS NEXT STEP IS VERY IMPORTANT, PLEASE READ CAREFULLY
We now need to set up our S3 triggers. We’re going to create a trigger for each of our lambda functions based on an S3 prefix. It is important to specify the prefix so our Read, Transform, Upload workflow does not keep triggering itself: Lambda scales automatically, so a recursive trigger loop here could get expensive. Additionally, when we later write the new files, we want to make sure we’re uploading them to different prefixes. (For an extra layer of protection, see the guard sketch after the trigger configuration below.)
s3_event_trigger.tf
resource "aws_s3_bucket_notification" "bucket_notifications" {
bucket = aws_s3_bucket.images.id
lambda_function {
lambda_function_arn = module.python_lambda_function.lambda_function_arn
events = ["s3:ObjectCreated:*"]
filter_prefix = "upload_for_python/"
}
lambda_function {
lambda_function_arn = module.rust_lambda_function.lambda_function_arn
events = ["s3:ObjectCreated:*"]
filter_prefix = "upload_for_rust/"
}
}
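If you want belt-and-suspenders protection in the application code as well, a small guard can skip any record whose key already lives under one of the resized output prefixes. This is a hypothetical helper (shown in Rust to match the function we build later; it is not wired into the code in this post), using the resized-rust/ and resized-python/ prefixes this project writes to:

// Hypothetical guard: refuse to process objects that are already resized output,
// so a misconfigured notification can never cause an infinite resize loop.
fn should_process(object_key: &str) -> bool {
    !object_key.starts_with("resized-rust/") && !object_key.starts_with("resized-python/")
}

Calling something like this at the top of the record-processing function and returning early when it is false costs almost nothing and makes the recursion failure mode much harder to hit.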
Lastly, we just need a simple DynamoDB table
dynamodb.tf
module "dynamodb_table" {
source = "terraform-aws-modules/dynamodb-table/aws"
name = "image_metadata"
hash_key = "id"
attributes = [
{
name = "id"
type = "S"
}
]
}
Python Code
For starters, we’re just going to worry about reading from S3, resizing the image, then writing it back to S3. Conveniently enough, AWS provides great documentation, including example code in a tutorial that we can borrow from. NOTE: The example from the tutorial has been modified to upload the resized image to a different prefix in the same S3 bucket instead of a different bucket
Before we start, let’s hop out of our terraform directory and make a new folder at the project root called python
$ cd ..
$ mkdir python/
$ cd python/
Now, add the lambda function and the requirements.txt
lambda_function.py
import boto3
import os
import sys
import uuid
from urllib.parse import unquote_plus
from PIL import Image
import PIL.Image

s3_client = boto3.client('s3')


def resize_image(image_path, resized_path):
    with Image.open(image_path) as image:
        image.thumbnail(tuple(x / 2 for x in image.size))
        image.save(resized_path)


def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
        print(f'resizing {key}')
        tmpkey = key.replace('/', '')
        print(f'tmpkey {tmpkey}')
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), tmpkey)
        upload_path = '/tmp/resized-{}'.format(tmpkey)
        s3_client.download_file(bucket, key, download_path)
        resize_image(download_path, upload_path)
        # upload under the "resized-python/" prefix, dropping the original upload prefix
        upload_key = 'resized-python/{}'.format('/'.join(key.split('/')[1:]))
        print(f'uploading to {upload_key}')
        s3_client.upload_file(upload_path, '{}'.format(bucket), upload_key)
requirements.txt
Pillow == 9.5.0
Rust Code
Now, assuming we already have Rust and Cargo installed, we need to add the AWS Rust Lambda Runtime and install cargo-lambda.
Again, let’s hop out of our last directory and use cargo lambda to create our new project folder
$ cd ..
$ cargo lambda new rust-image-processor
Running cargo lambda new will then give you some prompts:
- For the first prompt, select N
- For the second prompt, scroll down until you find S3Event
Now, let’s jump into our Rust project and update our Lambda function code
$ cd rust-image-processor
First, update the Cargo.toml to include all our necessary crates
Cargo.toml
[package]
name = "rust-image-processor"
version = "0.1.0"
edition = "2021"
# Starting in Rust 1.62 you can use `cargo add` to add dependencies
# to your project.
#
# If you're using an older Rust version,
# download cargo-edit(https://github.com/killercup/cargo-edit#installation)
# to install the `add` subcommand.
#
# Running `cargo add DEPENDENCY_NAME` will
# add the latest version of a dependency to the list,
# and it will keep the alphabetic ordering for you.
[dependencies]
aws-config = "0.55.1"
aws-sdk-dynamodb = "0.26.0"
aws-sdk-s3 = "0.26.0"
aws_lambda_events = { version = "0.7.3", default-features = false, features = ["s3"] }
image = "0.24.6"
lambda_runtime = "0.8.0"
openssl = { version = "0.10.51", features = ["vendored"] }
rust-s3 = "0.33.0"
tokio = { version = "1", features = ["macros"] }
tracing = { version = "0.1", features = ["log"] }
tracing-subscriber = { version = "0.3", default-features = false, features = ["fmt"] }
urlencoding = "2.1.2"
src/main.rs
use aws_lambda_events::event::s3::S3Event;
use aws_lambda_events::s3::S3EventRecord;
use lambda_runtime::{run, service_fn, Error, LambdaEvent};
use image::ImageError;
use s3::bucket::Bucket;
use s3::creds::Credentials;
use std::io::Cursor;

/// This is the main body for the function.
/// Write your code inside it.
/// There are some code examples in the following URLs:
/// - https://github.com/awslabs/aws-lambda-rust-runtime/tree/main/examples
/// - https://github.com/aws-samples/serverless-rust-demo/
async fn function_handler(event: LambdaEvent<S3Event>) -> Result<(), Error> {
    // Extract some useful information from the request
    for record in event.payload.records {
        let _ = process_record(record).await;
    }

    Ok(())
}

async fn process_record(record: S3EventRecord) -> Result<(), Error> {
    // extract fields from event record
    let bucket_name = record.s3.bucket.name.unwrap();
    // string is urlencoded, need to convert
    let object_key = record.s3.object.key.unwrap();
    let object_key = urlencoding::decode(&object_key).expect("UTF-8");
    let region = record.aws_region.unwrap().parse()?;

    // initialize bucket
    let credentials = Credentials::default()?;
    let bucket = Bucket::new(&bucket_name, region, credentials).expect("Unable to connect to bucket");

    // Get object
    let object = bucket.get_object(&object_key).await?;
    // let object = image::load_from_memory(object.bytes()).unwrap();
    let reader = image::io::Reader::new(Cursor::new(object.bytes())).with_guessed_format()?;
    // gets the source format so we can use it in our write
    let object_format = reader.format().unwrap();
    let object = reader.decode()?;

    // Scale image
    let scale_ratio = 0.5;
    let resized = resize_image(&object, &scale_ratio).unwrap();

    // Create new S3 key name from source without the prefix
    let removed_root_folder = get_route_without_root(&object_key);
    let target = format!("resized-rust{}", removed_root_folder);
    println!("Uploading resized image to {}", target);

    // write to bytes
    let mut bytes: Vec<u8> = Vec::new();
    resized.write_to(&mut Cursor::new(&mut bytes), image::ImageOutputFormat::from(object_format))?;

    // Upload new image to s3
    let _uploaded = bucket.put_object(&target, &bytes).await?;
    println!("Uploaded resized image");

    Ok(())
}

fn get_route_without_root(path: &str) -> &str {
    let bytes = path.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b'/' {
            return &path[i..];
        }
    }

    &path
}

fn resize_image(img: &image::DynamicImage, ratio: &f32) -> Result<image::DynamicImage, ImageError> {
    let old_w = img.width() as f32;
    let old_h = img.height() as f32;
    let new_w = (old_w * ratio).floor();
    let new_h = (old_h * ratio).floor();

    let scaled = img.resize(new_w as u32, new_h as u32, image::imageops::FilterType::Lanczos3);

    Ok(scaled)
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        // disable printing the name of the module in every log line.
        .with_target(false)
        // disabling time is handy because CloudWatch will add the ingestion time.
        .without_time()
        .init();

    run(service_fn(function_handler)).await
}
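Before packaging, it can be worth sanity-checking the two helper functions locally. A small test module like this (a sketch, not part of the generated project; append it to the bottom of src/main.rs and run cargo test) confirms that the root prefix is stripped the way the upload key expects and that the resize halves each dimension:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn strips_root_prefix_but_keeps_slash() {
        // "upload_for_rust/cat.png" becomes "/cat.png", which is then prepended
        // with "resized-rust" to form the target key "resized-rust/cat.png".
        assert_eq!(get_route_without_root("upload_for_rust/cat.png"), "/cat.png");
        // A key with no slash is returned unchanged.
        assert_eq!(get_route_without_root("cat.png"), "cat.png");
    }

    #[test]
    fn resize_halves_dimensions() {
        let img = image::DynamicImage::new_rgb8(100, 60);
        let resized = resize_image(&img, &0.5).unwrap();
        assert_eq!(resized.width(), 50);
        assert_eq!(resized.height(), 30);
    }
}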
Now, we build our lambda package with cargo lambda build
$ cargo lambda build --release --arm64 --output-format zip
You can see the output bootstrap.zip was created at target/lambda/rust-image-processor/
Time to Deploy
Now, let’s go back into our terraform directory and run terraform init and terraform plan
$ cd ../terraform
$ terraform init
Running terraform plan should show 25 resources to create:
$ terraform plan
Now, run a terraform apply and select ‘yes’ when prompted
$ terraform apply
Testing
Now, we can use the AWS CLI to upload images to our bucket and validate the output. For my testing, I’m just using a simple screenshot PNG image
$ aws s3 cp 'test-screenshot.png' s3://<bucket name>/upload_for_rust/
$ aws s3 cp 'test-screenshot.png' s3://<bucket name>/upload_for_python/
Then we can check the output
$ aws s3 ls s3://<bucket name>/resized-rust/
2023-04-25 10:59:27 61322 test-screenshot.png
$ aws s3 ls s3://<bucket name>/resized-python/
2023-04-25 10:59:32 45821 test-screenshot.png
Great, our lambda functions worked as expected.
Metrics
Using the AWS Console, I want to see how these lambdas performed for this single invocation, so I’ll check the Logs in the Monitoring section of the Lambda console:
Python Lambda
Rust Lambda
The Rust function’s cold start is slightly faster, and it uses significantly less memory. Let’s explore how much further the performance gap widens after we add the DynamoDB integration and send more requests to the functions.
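For anything beyond a couple of manual invocations, the REPORT line that Lambda writes to CloudWatch Logs for every request is a handy source of the two numbers we care about here: billed duration and max memory used. Below is a quick sketch of pulling those fields out of a log line; the parsing helper is hypothetical, not part of the project code, and the sample values are made up:

// Hypothetical helper: extract a numeric field (e.g. "Billed Duration" or
// "Max Memory Used") from a Lambda REPORT log line.
fn report_field(line: &str, field: &str) -> Option<f64> {
    // skip past "<field>:" and take the next whitespace-separated token
    let start = line.find(field)? + field.len() + 1;
    line[start..].split_whitespace().next()?.parse().ok()
}

fn main() {
    // Illustrative REPORT line shape; the values are made up.
    let line = "REPORT RequestId: abc Duration: 321.07 ms Billed Duration: 322 ms \
                Memory Size: 128 MB Max Memory Used: 35 MB";
    println!("billed ms: {:?}", report_field(line, "Billed Duration"));
    println!("max mem MB: {:?}", report_field(line, "Max Memory Used"));
}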
Cleanup
If you’ve been following along, feel free to empty the S3 bucket and tear down your Terraform stack
$ aws s3 rm s3://<bucket name>/ --recursive
$ terraform destroy