Navan Tech Blog
How Navan’s Engineers Solved Bulk Data Downloads

How Navan’s Engineering Team Cracked the Code for Bulk Data Downloads

The Navan Team

28 Oct 2024
By Alex Choy, Nikita Ivanov, and Shubhra Raj

See how Navan engineers architected a scalable solution for high-volume data processing to help customers with monthly reconciliation.

In the world of business travel and expense management, efficiency is paramount. After all, managing expenses means dealing with huge amounts of data — sometimes hundreds of thousands of receipts and invoices.

Admins who use Navan need a way to download thousands of receipts listed in a CSV file at once. Previously, they had to click through each link to download receipts one at a time, which was time-consuming and could lead to missed documents. For businesses, especially those with a high volume of expenses, downloading receipts in bulk is an important step in accurate record-keeping, financial reporting, and maintaining compliance with tax and audit requirements.

That’s why Navan’s engineering team developed a batch document processing feature. The goal was to create efficiency for accounting and finance teams by simplifying the process of downloading and processing a massive amount of documents — up to 500 GB of data. 

Navan’s bulk download feature streamlines the entire workflow by empowering finance and accounting teams to efficiently manage large datasets while avoiding the pitfalls of manual retrievals, such as missing critical documents that could lead to errors or regulatory issues during audits.

Our initial solution, zipping files locally before uploading them, worked for smaller accounts but ran into memory and storage issues when applied to larger datasets. As Navan scaled and brought on more customers, those storage limitations made it hard to support larger companies.

Our customers are always our top priority, so we set out to deliver a solution that allows admins to quickly download all necessary files for a given statement period with just a few clicks.

The team diligently conducted several experiments before settling on the final solution that we use today. Let’s dig in.

The Search for a Scalable Solution

Given the complexity of this feature, which involved several integrated applications and components, it was critical that no requests were lost or dropped. To meet this requirement, we used messaging systems (SQS and Kafka) that allow the different services to communicate with each other.

When a service completes its task, it triggers a message to the next service. If a message fails, an automated retry mechanism is in place, followed by alerts and notifications for any necessary manual intervention. This strategy keeps the entire process running smoothly without missing any key steps.
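The retry-then-alert behavior described above can be sketched in a few lines. This is only an in-process illustration, not Navan's implementation: the function name `deliver_with_retry`, the `handler` and `alert` callbacks, and the backoff values are all hypothetical, and in practice a queue such as SQS would usually handle redelivery itself.

```python
import time


def deliver_with_retry(handler, message, max_attempts=3, base_delay=0.5,
                       alert=print, sleep=time.sleep):
    """Call handler(message), retrying on failure with exponential backoff.

    All names and defaults here are hypothetical. After the final failed
    attempt, the alert callback is invoked so a person can step in, and
    the original error is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(message)
        except Exception as error:
            if attempt == max_attempts:
                # Retries exhausted: escalate for manual intervention.
                alert(f"giving up on {message!r} after {attempt} attempts: {error}")
                raise
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `alert` and `sleep` as parameters keeps the sketch easy to test and makes it clear where a paging or notification hook would plug in.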

With that framework in place, here are the three approaches we evaluated:

Approach 1: Local Zipping and Uploading

Our first attempt at solving this problem was straightforward: Zip the files locally on the service, then upload the zip file to Amazon S3. While this approach was simple and easy to implement, it wasn’t scalable. The service would hit memory limits with larger companies, causing performance bottlenecks and potential failures.

Pros:

  • Quick implementation
  • Easy to understand

Cons:

  • Not scalable for large datasets
  • High memory usage
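To see why this approach runs out of room, here is a minimal sketch of local zipping, assuming the archive is built in memory (the original service may have used local disk instead). The function name `zip_locally` and the `(name, bytes)` input shape are hypothetical.

```python
import io
import zipfile


def zip_locally(documents):
    """Build the whole zip archive in memory and return it as bytes.

    documents is an iterable of (file_name, file_bytes) pairs. The entire
    archive must exist locally before anything can be uploaded, so local
    usage grows with the total size of the documents.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in documents:
            archive.writestr(name, data)
    return buffer.getvalue()


# The finished archive would then be uploaded in a single request, e.g.
# with a boto3 S3 client: s3.put_object(Bucket=..., Key=..., Body=archive_bytes)
```

Because the archive is at least as large as the receipts it contains, a statement period with tens of gigabytes of documents needs that much local memory or disk before the upload can even start.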

Approach 2: Multi-Part Uploads

As we explored alternatives, our focus shifted to multi-part uploads, a feature offered by Amazon S3 for handling large files in chunks. This approach allowed us to break the upload process into manageable parts, solving the memory issue that plagued our first attempt.

This approach divided files into smaller chunks (a minimum of 5 MB each, except for the last one) and uploaded them in parts. Once all parts were uploaded and the upload was marked complete, Amazon S3 assembled them into a single file. This drastically reduced the strain on local memory and storage, enabling us to handle much larger files.
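The chunking step can be illustrated with a small generator that reads a file in fixed-size pieces and numbers them the way S3 expects (part numbers start at 1). This is a sketch under assumptions: `iter_parts` is a hypothetical helper, and the 5 MiB floor mirrors S3's documented minimum part size, which applies to every part except the last.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # S3's minimum size for every part but the last


def iter_parts(stream, part_size=MIN_PART_SIZE):
    """Yield (part_number, chunk) pairs read from a binary file object."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    number = 1
    while True:
        chunk = stream.read(part_size)
        if not chunk:
            break  # end of file
        yield number, chunk
        number += 1
```

Only one chunk is held in memory at a time, which is what removes the memory pressure of the first approach.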

Pros:

  • Memory usage reduced
  • AWS SDK provides built-in support
  • Automatic error handling for incomplete uploads

Cons:

  • More complex than the first approach
  • Unfamiliar territory — potential risks from using a new feature

Approach 3: Lambda’s Lightweight but Time-Limited Functions

Another possible solution was AWS Lambda, which provides a lightweight and cost-effective option for running multi-part uploads. However, AWS Lambda functions come with a time limit of 15 minutes per execution. While that might work for smaller companies, we risked hitting this time limit with larger datasets.

Pros:

  • Lighter than EC2 instances
  • Low cost and easy to deploy

Cons:

  • Time constraints of 15 minutes may not support large datasets
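A quick back-of-the-envelope calculation shows how the 15-minute limit caps the amount of data a single invocation can move. The throughput figure below is purely an assumption for illustration, not a measured Navan or AWS number.

```python
MB = 1000 ** 2
GB = 1000 ** 3
LAMBDA_TIMEOUT_SECONDS = 15 * 60  # AWS Lambda's maximum execution time

# Hypothetical sustained throughput for zipping and uploading.
ASSUMED_THROUGHPUT = 50 * MB  # bytes per second


def max_bytes_per_invocation(throughput_bytes_per_second,
                             timeout_seconds=LAMBDA_TIMEOUT_SECONDS):
    """Upper bound on the data one invocation can process before timing out."""
    return throughput_bytes_per_second * timeout_seconds


def fits_in_one_invocation(total_bytes, throughput_bytes_per_second):
    """True if the whole job could finish inside a single invocation."""
    return total_bytes <= max_bytes_per_invocation(throughput_bytes_per_second)
```

Under that assumed 50 MB/s, one invocation tops out at 45 GB, so a 50 MB archive fits comfortably while a 500 GB job does not.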

Finding Success: Multi-Part Upload at Scale

After evaluating the current infrastructure, different components, and alternative approaches, we decided to use multi-part uploads without AWS Lambda. 

The process is simple but highly effective:

  • Initiate a multi-part upload request to Amazon S3.
  • Break the files into chunks of 10 to 50 MB (depending on the file size), upload them to Amazon S3, and delete the local files to free up space.
  • Repeat until all files are uploaded.
  • Once all parts are uploaded, complete the multi-part upload so that Amazon S3 assembles the zip file, then notify the user via email that it is ready to download.
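The steps above can be sketched as a zip archive that is streamed straight into a multi-part upload. This is an illustrative sketch, not Navan's code: `MultipartWriter` and `zip_to_s3` are hypothetical names, `s3` is assumed to be a boto3 S3 client (only its `create_multipart_upload`, `upload_part`, `complete_multipart_upload`, and `abort_multipart_upload` calls are used), and local file cleanup and the email notification are left out.

```python
import zipfile


class MultipartWriter:
    """Write-only file object that ships buffered bytes to S3 in parts."""

    def __init__(self, s3, bucket, key, part_size=10 * 1000 * 1000):
        self.s3, self.bucket, self.key = s3, bucket, key
        self.part_size = part_size
        self.buffer = bytearray()
        self.parts = []  # ETag and number of every uploaded part
        response = s3.create_multipart_upload(Bucket=bucket, Key=key)
        self.upload_id = response["UploadId"]

    def write(self, data):
        self.buffer += data
        # Ship every full chunk as soon as it is available.
        while len(self.buffer) >= self.part_size:
            self._upload(self.buffer[:self.part_size])
            del self.buffer[:self.part_size]
        return len(data)

    def flush(self):
        pass  # zipfile calls flush(); parts are sent when full or at finish()

    def _upload(self, chunk):
        number = len(self.parts) + 1  # S3 part numbers start at 1
        response = self.s3.upload_part(
            Bucket=self.bucket, Key=self.key, UploadId=self.upload_id,
            PartNumber=number, Body=bytes(chunk))
        self.parts.append({"ETag": response["ETag"], "PartNumber": number})

    def finish(self):
        if self.buffer:  # the last part may be smaller than part_size
            self._upload(self.buffer)
            self.buffer = bytearray()
        self.s3.complete_multipart_upload(
            Bucket=self.bucket, Key=self.key, UploadId=self.upload_id,
            MultipartUpload={"Parts": self.parts})

    def abort(self):
        self.s3.abort_multipart_upload(
            Bucket=self.bucket, Key=self.key, UploadId=self.upload_id)


def zip_to_s3(s3, bucket, key, documents, part_size=10 * 1000 * 1000):
    """Zip (name, bytes) pairs into s3://bucket/key; return the part count."""
    writer = MultipartWriter(s3, bucket, key, part_size)
    try:
        with zipfile.ZipFile(writer, "w") as archive:
            for name, data in documents:
                archive.writestr(name, data)
        writer.finish()
    except Exception:
        writer.abort()  # do not leave a half-finished upload behind
        raise
    return len(writer.parts)
```

The writer only ever buffers about one part's worth of archive bytes beyond the document currently being added, so local usage stays roughly constant no matter how large the final zip becomes. A call might look like `zip_to_s3(boto3.client("s3"), "my-bucket", "statement.zip", documents)`, where the bucket and key are placeholders.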

The Impact

This solution significantly improved the scalability of the bulk download feature. By uploading files in manageable parts, we eliminated the memory issues of our first approach.

Now, even the largest companies can easily download thousands of receipts and invoices in bulk. Amazon S3 can also clean up incomplete uploads automatically (through a bucket lifecycle rule), further improving reliability.

With this solution, Navan users no longer need to worry about manually collecting and downloading their receipts. They simply request the bulk download and the system takes care of the rest — delivering a neatly packaged zip file ready for download.

To help with performance and scalability, the bulk download service processes files in chunks of 10 to 50 MB at a time, depending on the file size. Over the past two quarters, customers worldwide have downloaded 600 GB of invoices and receipts, with an average zip file size of 50 MB. Typically, it takes just one minute to generate the files and notify users that their downloads are ready.
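One plausible reason to vary the chunk size is S3's limit of 10,000 parts per multi-part upload. The sketch below picks the smallest chunk size in the 10 to 50 MB range that keeps an archive under that limit. The selection rule and the name `choose_part_size` are assumptions for illustration; the original post does not say how the size is chosen.

```python
MB = 1000 ** 2
GB = 1000 ** 3
MAX_PARTS = 10_000  # S3's limit on parts per multi-part upload


def choose_part_size(total_bytes, min_part=10 * MB, max_part=50 * MB):
    """Smallest part size in [min_part, max_part] that fits within MAX_PARTS."""
    needed = (total_bytes + MAX_PARTS - 1) // MAX_PARTS  # ceiling division
    part_size = max(min_part, needed)
    if part_size > max_part:
        raise ValueError("archive too large for the configured part sizes")
    return part_size
```

With these numbers, 10 MB parts cover archives up to 100 GB (10,000 × 10 MB), and 50 MB parts cover up to 500 GB (10,000 × 50 MB). So `choose_part_size(50 * MB)` returns 10 MB, while `choose_part_size(500 * GB)` returns 50 MB.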

Scaling the bulk download feature required us to consider the limitations of our local services and AWS infrastructure. Multi-part uploads provided the answer. This approach enables us to support large datasets and improve the overall user experience for our customers. 

By combining this architecture with Amazon S3’s multi-part uploads, Navan can offer fast, scalable bulk downloads that will keep pace with our customers’ growing volume of expenses.

Are you an engineer who thrives in a dynamic, experimental environment? Join the Navan engineering team and push the boundaries of what’s possible.
