Reimagine the Image Pipeline

Thomas Berger
24. July 2023
Reading time: 4 min

Handling incoming images is a central task for many companies in the media industry. Managing these images is a key process day in and day out.

Many years ago, a company that runs one of the top 10 websites in Austria started to build its own image delivery application and hosted it on-premises. This application served as the central pipeline for the images used on their website. They added feature after feature, and everything worked pretty well. But over the years, with hardly any refactoring of the application and infrastructure, things started to deteriorate, and neither the increasing demand for performance nor new feature requests could be satisfied anymore. It was time for something completely new.

After getting to know their image processing requirements and image publishing pipelines, we designed an image delivery system without any dependencies on their legacy systems. The new image delivery system consists of an imgproxy Docker container as the core image processing component and a highly available AWS infrastructure around it. All of it was built with the AWS CDK (Cloud Development Kit) to get all the infrastructure as code (IaC) benefits.
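
To give an impression of what such a CDK setup can look like, here is a minimal TypeScript scaffolding sketch, not the production code; the stack split and all names are illustrative. One stack holds the upload/delete API, the other the delivery path, both described below.

```typescript
// Hypothetical CDK entry point (TypeScript, aws-cdk-lib v2); stack names are illustrative.
import { App, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';

// One stack per responsibility: the upload/delete API and the image delivery path.
class ImageApiStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    // API Gateway, Lambda functions and the S3 image store go here (see below).
  }
}

class ImageDeliveryStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    // CloudFront, WAF, ALB and the ECS/Fargate image service go here (see below).
  }
}

const app = new App();
new ImageApiStack(app, 'ImageApiStack');
new ImageDeliveryStack(app, 'ImageDeliveryStack');
app.synth();
```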

The following diagram shows our serverless architectural approach:

[Architecture diagram: API Gateway and Lambda for uploads and deletes; CloudFront, WAF, ALB and ECS Fargate (imgproxy) for delivery; S3 as the image store]

This entire serverless architecture provides two functionalities: 

  • An API for image upload requests and image deletion requests 
  • The image delivery system 

The image upload and delete API 

The image API we implemented serves two purposes: uploading images into the image store and triggering image deletions.

The image upload accepts either an image file as a parameter or an image URL to download. If a URL is given, a Lambda function downloads the image file and puts it into the image store afterward. The image delete method removes the given image from the image store and triggers a CDN purge request to get rid of the cached copy.
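
As a rough sketch of how such an upload Lambda might look (hypothetical, not the actual implementation; the request fields imageUrl and imageData, the IMAGE_BUCKET environment variable, and the key scheme are assumptions), using the Node.js 18 runtime with the AWS SDK v3:

```typescript
// Hypothetical upload handler (Node.js 18 Lambda, AWS SDK v3); field and bucket names are assumptions.
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
import { randomUUID } from 'crypto';

const s3 = new S3Client({});
const BUCKET = process.env.IMAGE_BUCKET!; // injected by the CDK stack

export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  const body = JSON.parse(event.body ?? '{}');

  // Either a base64-encoded file or a URL to download is accepted.
  let imageBytes: Buffer;
  if (body.imageUrl) {
    const response = await fetch(body.imageUrl);
    if (!response.ok) {
      return { statusCode: 400, body: `Download failed: ${response.status}` };
    }
    imageBytes = Buffer.from(await response.arrayBuffer());
  } else if (body.imageData) {
    imageBytes = Buffer.from(body.imageData, 'base64');
  } else {
    return { statusCode: 400, body: 'Provide imageUrl or imageData' };
  }

  // Store the original image; the image service processes it on demand later.
  const key = `images/${randomUUID()}`;
  await s3.send(new PutObjectCommand({ Bucket: BUCKET, Key: key, Body: imageBytes }));

  return { statusCode: 201, body: JSON.stringify({ key }) };
};
```

The delete Lambda would work analogously: remove the object from S3 and issue a CloudFront invalidation for the affected paths.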

We achieved this with the following AWS components:

The front-end interface is a REST API created with AWS API Gateway, which lets us provide an authenticated public URL. To that API we attached AWS Lambda functions. AWS Lambda is an event-driven service that runs your code without you having to think about servers. These Lambda functions handle the image uploads and image deletes while performing some additional checks. For storing the images, we chose an AWS S3 bucket, which is a serverless object store.
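
Expanding on the scaffolding above, a possible CDK wiring of these components could look like this; the identifiers, asset paths, and the IAM authorization choice are illustrative assumptions, not the production configuration:

```typescript
// Hypothetical API stack wiring (aws-cdk-lib v2); identifiers and paths are illustrative.
import { Stack, StackProps, Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';

export class ImageApiStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Serverless object store for the original images.
    const imageBucket = new s3.Bucket(this, 'ImageBucket');

    // Lambda handling uploads (file or URL); handler code assumed under lambda/upload.
    const uploadFn = new lambda.Function(this, 'UploadFn', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda/upload'),
      timeout: Duration.seconds(30),
      environment: { IMAGE_BUCKET: imageBucket.bucketName },
    });
    imageBucket.grantPut(uploadFn);

    // Lambda handling deletes and CDN purges; handler code assumed under lambda/delete.
    const deleteFn = new lambda.Function(this, 'DeleteFn', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda/delete'),
      environment: { IMAGE_BUCKET: imageBucket.bucketName },
    });
    imageBucket.grantDelete(deleteFn);

    // Authenticated public REST API in front of the two functions.
    const api = new apigateway.RestApi(this, 'ImageApi');
    const images = api.root.addResource('images');
    images.addMethod('POST', new apigateway.LambdaIntegration(uploadFn), {
      authorizationType: apigateway.AuthorizationType.IAM,
    });
    images.addResource('{key}').addMethod('DELETE', new apigateway.LambdaIntegration(deleteFn), {
      authorizationType: apigateway.AuthorizationType.IAM,
    });
  }
}
```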

The image delivery system

For delivering the images to the clients, we built the following system: 

Client requests are handled by the AWS CloudFront CDN, which has an Application Load Balancer (ALB) configured as its origin. Both are protected by the Web Application Firewall (WAF). The load balancer targets the image service, which is hosted in an ECS cluster utilizing AWS Fargate: ECS (Elastic Container Service) orchestrates the Docker containers, while Fargate provides a serverless compute layer. Based on the current request, the image service fetches the image from the S3 bucket, processes it (e.g. cropping or resizing), and delivers it via the CDN. The CDN acts as a caching layer for already processed images.
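
A sketch of how this delivery path might be expressed with CDK; the identifiers, sizing, bucket name, and imgproxy settings are illustrative assumptions, and the WAF web ACL attachment is only hinted at:

```typescript
// Hypothetical delivery stack (aws-cdk-lib v2); identifiers and sizing are illustrative.
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';
import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';
import * as s3 from 'aws-cdk-lib/aws-s3';

export class ImageDeliveryStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: 2 });
    const cluster = new ecs.Cluster(this, 'Cluster', { vpc });
    const imageBucket = s3.Bucket.fromBucketName(this, 'ImageBucket', 'my-image-store'); // assumed name

    // imgproxy running on Fargate behind an Application Load Balancer.
    const service = new ecsPatterns.ApplicationLoadBalancedFargateService(this, 'ImgproxyService', {
      cluster,
      cpu: 512,
      memoryLimitMiB: 1024,
      desiredCount: 2,
      taskImageOptions: {
        image: ecs.ContainerImage.fromRegistry('darthsim/imgproxy'),
        containerPort: 8080, // imgproxy's default port
        environment: { IMGPROXY_USE_S3: 'true' }, // let imgproxy read source images from S3
      },
    });
    imageBucket.grantRead(service.taskDefinition.taskRole);

    // CloudFront in front of the ALB caches the processed images.
    new cloudfront.Distribution(this, 'Cdn', {
      defaultBehavior: {
        origin: new origins.LoadBalancerV2Origin(service.loadBalancer, {
          protocolPolicy: cloudfront.OriginProtocolPolicy.HTTP_ONLY,
        }),
        cachePolicy: cloudfront.CachePolicy.CACHING_OPTIMIZED,
      },
      // A WAF web ACL can be attached here via the webAclId property.
    });
  }
}
```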

Conclusion and business benefits

With this entirely new approach to delivering images to their users, the company gains a number of improvements and benefits compared to its previous solution.

The previous application cached its images locally on the server it was running on and therefore constantly ran the risk of running out of storage space. The new system uses S3 for that, a cloud-based object store with virtually unlimited capacity.

Performance and cost-optimization improvements were realized through target tracking scaling for Fargate. This ensures that no CPU time or memory is wasted, as scale-in and scale-out operations are now performed automatically based on the current load. This can be observed very well every day during the high demand in the morning and the low traffic over night, but it also allows the system to handle extreme load during special events.
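
In CDK, such target tracking scaling can be attached to the Fargate service in a few lines; the capacity bounds and thresholds below are illustrative, not the company's actual values:

```typescript
// Hypothetical target-tracking scaling for the Fargate service from the delivery stack sketch above.
import { Duration } from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';

export function addTargetTrackingScaling(service: ecs.FargateService): void {
  const scaling = service.autoScaleTaskCount({
    minCapacity: 2,  // baseline for the overnight low
    maxCapacity: 20, // headroom for special events
  });

  // Scale out and in automatically so CPU stays around the target utilization.
  scaling.scaleOnCpuUtilization('CpuScaling', {
    targetUtilizationPercent: 60,
    scaleInCooldown: Duration.minutes(5),
    scaleOutCooldown: Duration.minutes(1),
  });

  // Memory utilization can be tracked the same way.
  scaling.scaleOnMemoryUtilization('MemoryScaling', {
    targetUtilizationPercent: 70,
  });
}
```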

Not to mention all the flexibility and agility the company gains for future requirements, all backed by real-time monitoring and logging insights in CloudWatch.

Overall, this project is a perfect example of how a company can profit from the serverless infrastructure capabilities of AWS.