With limited in-house experience in serverless technologies, the customer approached Velotio to set up a serverless data lake that could scale to petabytes of data.
The customer is a B2B Customer Data Platform (CDP) that provides a unified view of each end customer across all platforms, and counts leading brands such as Staples, Walmart, and Cisco among its clients.
The client wanted a multi-tenant serverless data lake that supported both real-time and batch ingestion and processing. The ingestion system needed to handle multiple file formats (CSV, TSV, XLS) and multiple sources, including Amazon S3 buckets, FTP, and Dropbox.
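The case study does not describe how the multi-tenant, multi-format ingestion was implemented. As a minimal sketch, assuming incoming files land in a raw zone on S3 keyed by tenant, an ingestion step might first check the file extension against the supported formats and then build a Hive-style partitioned object key. The function names `detect_format` and `raw_zone_key`, and the `raw/tenant=<id>/source=<source>/dt=<date>/` layout, are hypothetical illustrations rather than the delivered design.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Hypothetical mapping from file extension to (reader kind, delimiter).
# CSV and TSV are delimited text; XLS is a binary workbook that needs a
# spreadsheet reader rather than a delimiter.
SUPPORTED_FORMATS = {
    ".csv": ("delimited", ","),
    ".tsv": ("delimited", "\t"),
    ".xls": ("spreadsheet", None),
}


def detect_format(filename: str) -> Tuple[str, Optional[str]]:
    """Return (reader kind, delimiter) for a file, based on its extension."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file format: {filename!r}")
    return SUPPORTED_FORMATS[suffix]


def raw_zone_key(tenant_id: str, source: str, filename: str, ingest_date: date) -> str:
    """Build a hypothetical S3 object key for the raw zone, isolated per tenant.

    Hive-style key=value path segments let Athena and Glue treat tenant,
    source, and date as partitions, so queries can skip unrelated data.
    """
    detect_format(filename)  # reject unsupported formats before landing the file
    name = PurePosixPath(filename).name
    return f"raw/tenant={tenant_id}/source={source}/dt={ingest_date.isoformat()}/{name}"


if __name__ == "__main__":
    key = raw_zone_key("acme", "ftp", "uploads/orders.TSV", date(2020, 1, 15))
    print(key)  # raw/tenant=acme/source=ftp/dt=2020-01-15/orders.TSV
    print(detect_format("orders.TSV"))  # ('delimited', '\t')
```

Putting the tenant first in the key is one possible choice: it keeps each tenant's raw data under its own prefix, which also makes per-tenant IAM policies easy to scope by S3 prefix.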
The existing CDP was built on traditional technologies such as Hadoop, Hive, HDFS, and YARN, which were difficult to manage, scale, and upgrade. The new solution needed to require minimal infrastructure maintenance and remove the undifferentiated heavy lifting of managing infrastructure as demand changes and technologies evolve.
As the client signed on larger enterprises, data storage was expected to grow 10x, from terabytes to petabytes, but the existing platform could not store unprocessed raw data cost-effectively. The data warehouse ingests data from a range of services, and any change to those services required manual updates to ETL jobs and tables. Because response times for these data sources are critical, the Velotio team took a data-driven approach to selecting a high-performance architecture.
This was our first time working with a remote team, but Velotio’s team didn't miss any deadlines despite having a tight schedule and won our trust early in the project. They excelled at reporting and addressing issues quickly. The communication with our on-site team was also extremely smooth. We're extremely happy with the progress we have made with them.
Velotio worked with the customer to understand the existing platform, data characteristics, and end goals.
Based on these requirements, Velotio decided to change the data warehouse both operationally and architecturally. Operationally, the team designed a new shared responsibility model for data ingestion. Architecturally, it chose a serverless model over a traditional relational database. These two decisions drove every subsequent design and implementation decision in the migration.
Velotio built the solution on AWS using serverless services such as AWS Step Functions, AWS Lambda, AWS Glue, Amazon Athena, and Amazon S3. The team delivered a proof of concept in one month that showed how the solution addressed each of these challenges, and completed the full solution in four months.
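The case study does not say how these services were wired together. As one plausible illustration only, a Step Functions state machine could chain a Lambda validation step, a Glue transform job, and an Athena query that registers newly written partitions. The sketch below builds such an Amazon States Language definition as a Python dict. The state names and the function, job, database, and table names (`validate-upload`, `raw-to-parquet`, `cdp_lake`, `curated_events`) are hypothetical; the `lambda:invoke`, `glue:startJobRun.sync`, and `athena:startQueryExecution.sync` resources are standard Step Functions service integrations.

```python
import json


def build_pipeline_definition(validate_fn: str, glue_job: str,
                              database: str, workgroup: str) -> dict:
    """Return an Amazon States Language definition for a hypothetical
    validate -> transform -> register-partitions batch pipeline.
    All resource names passed in are placeholders."""
    return {
        "Comment": "Hypothetical serverless batch ingestion pipeline",
        "StartAt": "ValidateUpload",
        "States": {
            # Lambda checks the landed file (e.g. format, schema) for the tenant.
            "ValidateUpload": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": validate_fn, "Payload.$": "$"},
                "ResultPath": "$.validation",
                "Next": "TransformWithGlue",
            },
            # Glue job rewrites raw files into a query-friendly layout;
            # the ".sync" suffix makes Step Functions wait for the job to finish.
            "TransformWithGlue": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {
                    "JobName": glue_job,
                    "Arguments": {"--tenant_id.$": "$.tenant_id"},
                },
                "ResultPath": "$.glue",
                "Next": "RegisterPartitions",
            },
            # Athena loads newly written Hive-style partitions into the catalog.
            "RegisterPartitions": {
                "Type": "Task",
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {
                    "QueryString": "MSCK REPAIR TABLE curated_events",
                    "QueryExecutionContext": {"Database": database},
                    "WorkGroup": workgroup,
                },
                "End": True,
            },
        },
    }


def state_order(definition: dict) -> list:
    """Follow Next links from StartAt to list the states in execution order."""
    order, name = [], definition["StartAt"]
    while name is not None:
        order.append(name)
        name = definition["States"][name].get("Next")
    return order


if __name__ == "__main__":
    definition = build_pipeline_definition(
        "validate-upload", "raw-to-parquet", "cdp_lake", "primary")
    print(state_order(definition))
    # ['ValidateUpload', 'TransformWithGlue', 'RegisterPartitions']
    print(json.dumps(definition, indent=2))
```

The JSON output could be supplied as the definition when creating a state machine. `MSCK REPAIR TABLE` is only one way to register partitions; Athena partition projection or a Glue crawler are common alternatives.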
The team developed the solution as follows:
The new serverless data analytics platform reduced data processing and storage costs by 10x.
Amazon S3 with Athena can scale to store and query tens of petabytes of data.
Using managed AWS services and the serverless model reduced ongoing operational costs by 50-60%.
The new platform can run TensorFlow-based machine learning models and analytics to understand customer behavior.