For a company that hosts a popular jobs and real estate website, a healthy and resilient data import pipeline is essential.
Before migrating to AWS, the company self-hosted the software application and infrastructure responsible for data imports on-premises. Import data flowed via FTP uploads to the importer service, which loaded assets such as images into an on-premises data store and wrote the document data of each individual advertisement into a SQL database.
With their self-hosted solution, they had to face several challenges:
These requirements led us to create the following infrastructure:
As the interface for uploading import data, we created a REST API with AWS API Gateway, which provides an authenticated public URL.
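To make the entry point more concrete, here is a minimal sketch of a Lambda function behind an API Gateway proxy integration. It assumes the upload body is a JSON array of advertisements, that authentication has already been enforced by API Gateway (for example with an authorizer or API key), and that each advertisement is forwarded to the first SQS queue. `REQUIRED_FIELDS`, `handle_upload`, and the `IMPORT_QUEUE_URL` environment variable are hypothetical names chosen for illustration, not part of the original system.

```python
import base64
import json
import os
from typing import Any, Callable, Dict

# Hypothetical: fields every advertisement in an upload must contain.
REQUIRED_FIELDS = ("id", "title")


def _response(status: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build a response in the shape the Lambda proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handle_upload(event: Dict[str, Any], enqueue: Callable[[str], Any]) -> Dict[str, Any]:
    """Validate an uploaded batch of advertisements and enqueue each one.

    `enqueue` stands in for the call that sends a message to the first SQS
    queue; it is injected so the sketch runs without AWS credentials.
    """
    raw = event.get("body") or ""
    if event.get("isBase64Encoded"):
        raw = base64.b64decode(raw).decode("utf-8")
    try:
        ads = json.loads(raw)
    except json.JSONDecodeError:
        return _response(400, {"error": "body is not valid JSON"})
    if not isinstance(ads, list):
        return _response(400, {"error": "expected a JSON array of advertisements"})
    for ad in ads:
        if not isinstance(ad, dict) or any(f not in ad for f in REQUIRED_FIELDS):
            return _response(422, {"error": "advertisement missing required fields"})
    # Enqueue only after the whole batch has passed validation.
    for ad in ads:
        enqueue(json.dumps(ad))
    return _response(202, {"accepted": len(ads)})


def lambda_handler(event, context):
    # Hypothetical wiring: boto3 ships with the Lambda Python runtime, and
    # IMPORT_QUEUE_URL is an assumed environment variable.
    import boto3

    sqs = boto3.client("sqs")
    queue_url = os.environ["IMPORT_QUEUE_URL"]
    return handle_upload(
        event, lambda body: sqs.send_message(QueueUrl=queue_url, MessageBody=body)
    )
```

Returning `202 Accepted` signals that the data was queued rather than fully imported, which fits an asynchronous pipeline where the heavy lifting happens in later Lambda functions.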
The import pipeline behind the API Gateway consists of a chain of AWS Lambda functions and AWS SQS queues, which splits the Lambda code into single concerns as described below. Lambda functions are serverless, event-driven functions that run custom code and can process data in real time. SQS is a fully managed message queue service that lets you decouple and scale microservices, distributed systems, and serverless applications. With SQS, you can manage the flow of data between your Lambda functions, which keeps the data import pipeline reliable and scalable:
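As an illustration of a single link in such a chain, the following sketch shows an SQS-triggered Lambda handler that processes a batch of messages and reports only the failed ones. It relies on Lambda's partial batch response (`batchItemFailures`), which takes effect only if `ReportBatchItemFailures` is enabled on the event source mapping; whether the original pipeline uses this feature is an assumption. The `process` callback is a hypothetical stand-in for whatever single concern a given step handles, such as extracting assets or forwarding to the next queue.

```python
import json
from typing import Any, Callable, Dict, List


def handle_sqs_batch(
    event: Dict[str, Any], process: Callable[[Dict[str, Any]], None]
) -> Dict[str, List[Dict[str, str]]]:
    """Process one SQS batch and report only the messages that failed.

    With ReportBatchItemFailures enabled, SQS makes just the listed messages
    visible again for a retry; the successfully processed ones are deleted.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            # Each message body is assumed to hold one advertisement as JSON.
            process(json.loads(record["body"]))
        except Exception:
            # Any error (bad JSON, failed processing) marks this message for retry.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Reporting failures per message keeps one malformed advertisement from forcing the whole batch to be reprocessed. If a redrive policy is configured on the queue, messages that keep failing are eventually moved to a dead-letter queue instead of being retried forever.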
The Elasticsearch cluster, a highly scalable, distributed search engine running in AWS, serves as the searchable document store for the consuming job and real estate search applications.
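The last step of such a pipeline typically writes advertisements into Elasticsearch in batches. The sketch below builds a request for the Elasticsearch `_bulk` API, whose body is newline-delimited JSON: an action line followed by the document source, with a trailing newline. The index name `advertisements`, the example endpoint, and the use of the advertisement `id` as the document `_id` are assumptions for illustration; request signing, which the managed AWS service may require, is omitted.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable


def build_bulk_body(index: str, ads: Iterable[Dict[str, Any]]) -> str:
    """Serialize advertisements into an NDJSON body for the _bulk API."""
    lines = []
    for ad in ads:
        # Action line: using the advertisement ID as _id makes re-imports
        # overwrite the existing document instead of creating a duplicate.
        lines.append(json.dumps({"index": {"_index": index, "_id": ad["id"]}}))
        lines.append(json.dumps(ad))
    # The bulk body must end with a newline; an empty batch yields no body.
    return "\n".join(lines) + "\n" if lines else ""


def make_bulk_request(endpoint: str, body: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to <endpoint>/_bulk.

    Authentication or request signing is intentionally left out of this sketch.
    """
    return urllib.request.Request(
        endpoint.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
```

Batching documents into one bulk request reduces the number of round trips compared with indexing each advertisement individually, which matters when large imports arrive at once.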
Self-hosting crucial services on-premises, like the import data pipeline in this case, can present several challenges, such as limited scalability and availability. By moving the workload to the AWS cloud, the company took advantage of the scalability, resiliency, and reliability of cloud-based solutions while ensuring real-time data processing performance and the safety of a durable object store. Backed by Elasticsearch, the company could also provide a fast and flexible search application to its customers.