In the world of business travel and expense management, efficiency is paramount. After all, managing expenses means dealing with huge amounts of data — sometimes hundreds of thousands of receipts and invoices.
Admins who use Navan need a way to download thousands of receipts from a CSV file at once. Manually clicking through each link to download individual receipts was time-consuming and could lead to missed documents. For businesses, especially those with a high volume of expenses, downloading receipts in bulk is an important step in accurate record-keeping, financial reporting, and maintaining compliance with tax and audit requirements.
That’s why Navan’s engineering team developed a batch document processing feature. The goal was to make accounting and finance teams more efficient by simplifying the process of downloading and processing a massive number of documents, up to 500 GB of data.
Navan’s bulk download feature streamlines the entire workflow by empowering finance and accounting teams to efficiently manage large datasets while avoiding the pitfalls of manual retrievals, such as missing critical documents that could lead to errors or regulatory issues during audits.
Our initial solution, zipping files locally before uploading them, worked for smaller accounts, but it ran into memory and storage issues when applied to larger datasets. As Navan scaled and brought on more customers, challenges arose around storage limitations and the need to support larger companies.
Our customers are always our top priority, so we set out to deliver a solution that allows admins to quickly download all necessary files for a given statement period with just a few clicks.
The team diligently conducted several experiments before settling on the final solution that we use today. Let’s dig in.
Given the complexity of this feature, which involved several integrated applications and components, it was critical that no requests were lost or dropped. To meet this requirement, we used messaging systems, SQS and Kafka, that allow the different services to communicate with each other.
When a service completes its task, it triggers a message to the next service. If a message fails, an automated retry mechanism is in place, followed by alerts and notifications for any necessary manual intervention. This strategy keeps the entire process running smoothly without missing any key steps.
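The retry-then-alert flow described above can be sketched in a few lines. This is a minimal illustration rather than Navan's actual consumer: `process_with_retry`, `handler`, `alert`, and the default of three attempts are all hypothetical names and values, and a real SQS or Kafka consumer would normally lean on the queue's own redelivery and dead-letter features.

```python
from typing import Callable, Optional


def process_with_retry(
    message: dict,
    handler: Callable[[dict], None],
    alert: Callable[[dict, Exception], None],
    max_attempts: int = 3,  # hypothetical default, not a value from the article
) -> bool:
    """Run handler on message, retrying on failure; alert if every attempt fails."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            handler(message)
            return True  # success: the caller can now message the next service
        except Exception as exc:  # a real consumer would catch narrower errors
            last_error = exc
    # Retries exhausted: surface the message for manual intervention
    # instead of silently dropping it.
    alert(message, last_error)
    return False
```

The function returns `True` as soon as one attempt succeeds, so the alert hook only fires for messages that failed every attempt.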
With that framework in place, here are the three approaches we tried:
Our first attempt at solving this problem was straightforward: Zip the files locally on the service, then upload the zip file to Amazon S3. While this approach was simple and easy to implement, it wasn’t scalable. The service would hit memory limits with larger companies, causing performance bottlenecks and potential failures.
As we explored alternatives, our focus shifted to multi-part uploads, a feature offered by Amazon S3 for handling large files in chunks. This approach allowed us to break the upload process into manageable parts, solving the memory issue that plagued our first attempt.
This approach divided files into smaller chunks (a minimum of 5 MB each, except for the last one) and uploaded them in parts. Once all parts were uploaded and the upload was marked complete, Amazon S3 assembled them into a single file. This drastically reduced the strain on local memory and storage, enabling us to handle much larger files.
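To make the chunking concrete, here is a small sketch of how a file of known size could be divided into parts. The helper name `plan_parts` is hypothetical, and the sketch treats the 5 MB minimum as 5 MiB; it also checks S3's limit of 10,000 parts per multi-part upload.

```python
from typing import List, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 limit on parts in one multi-part upload


def plan_parts(total_size: int, part_size: int = MIN_PART_SIZE) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples that cover total_size bytes."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum")
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < total_size:
        # Only the final part is allowed to be shorter than part_size.
        length = min(part_size, total_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts
```

For a 12 MiB file with the default part size, this yields two 5 MiB parts followed by a final 2 MiB part.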
Another possible solution was AWS Lambda, which provides a lightweight and cost-effective option for running multi-part uploads. However, AWS Lambda functions come with a time limit of 15 minutes per execution. While that might work for smaller companies, we risked hitting this time limit with larger datasets.
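A quick back-of-the-envelope check shows why the time limit matters. The throughput figure below is purely an assumption for illustration (the article does not state one), and `estimated_seconds` and `fits_in_one_invocation` are hypothetical helpers.

```python
LAMBDA_TIMEOUT_SECONDS = 15 * 60  # the 15-minute limit per execution

MB = 10**6
GB = 10**9


def estimated_seconds(total_bytes: int, bytes_per_second: float) -> float:
    """Estimate transfer time at a constant, assumed throughput."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return total_bytes / bytes_per_second


def fits_in_one_invocation(total_bytes: int, bytes_per_second: float) -> bool:
    """True if the whole job would finish inside a single Lambda execution."""
    return estimated_seconds(total_bytes, bytes_per_second) <= LAMBDA_TIMEOUT_SECONDS
```

At an assumed 50 MB/s, a 50 MB archive would take about a second, while a 500 GB job would need roughly 10,000 seconds, far past the 900-second limit.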
After evaluating the current infrastructure, different components, and alternative approaches, we decided to use multi-part uploads without AWS Lambda.
The process is simple but highly effective:
This solution significantly improved the scalability of the bulk download feature. By uploading files in manageable parts, we eliminated the memory issues of the earlier approaches.
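One way to combine zipping with part-by-part uploads is to write the archive into a small buffer that hands off each part as soon as it is full, so the whole zip never sits in memory. The sketch below is an assumption about how this could look, not Navan's implementation: `PartBufferingWriter`, `stream_zip_in_parts`, and the `upload_part` callback are hypothetical names. In practice the callback would wrap S3's UploadPart operation, followed by CompleteMultipartUpload once the last part is sent.

```python
import zipfile
from typing import Callable, Iterable, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 rejects smaller non-final parts


class PartBufferingWriter:
    """Write-only file object that emits a part whenever enough bytes accumulate."""

    def __init__(self, upload_part: Callable[[int, bytes], None],
                 part_size: int = MIN_PART_SIZE):
        self._upload_part = upload_part
        self._part_size = part_size  # smaller values are only useful for local tests
        self._buffer = bytearray()
        self._next_part_number = 1

    @property
    def parts_emitted(self) -> int:
        return self._next_part_number - 1

    def write(self, data: bytes) -> int:
        self._buffer += data
        while len(self._buffer) >= self._part_size:
            self._emit(self._part_size)
        return len(data)

    def flush(self) -> None:
        pass  # parts are emitted by size only, so flushing never creates a short part

    def finish(self) -> None:
        """Emit whatever is left as the final (possibly shorter) part."""
        if self._buffer:
            self._emit(len(self._buffer))

    def _emit(self, size: int) -> None:
        part = bytes(self._buffer[:size])
        del self._buffer[:size]
        self._upload_part(self._next_part_number, part)
        self._next_part_number += 1


def stream_zip_in_parts(files: Iterable[Tuple[str, bytes]],
                        upload_part: Callable[[int, bytes], None],
                        part_size: int = MIN_PART_SIZE) -> int:
    """Zip (name, content) pairs and hand the archive to upload_part in parts."""
    writer = PartBufferingWriter(upload_part, part_size)
    # The writer has no tell() or seek(), so zipfile treats it as a stream.
    with zipfile.ZipFile(writer, mode="w") as archive:
        for name, content in files:
            archive.writestr(name, content)
    # Called after the archive closes so the central directory is included.
    writer.finish()
    return writer.parts_emitted
```

Memory use stays close to one part plus the file currently being written, regardless of how large the finished archive is.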
Now, even the largest companies can easily download thousands of receipts and invoices in bulk. Amazon S3 can also be configured, through a bucket lifecycle rule, to clean up incomplete multi-part uploads automatically, further improving reliability.
With this solution, Navan users no longer need to worry about manually collecting and downloading their receipts. They simply request the bulk download and the system takes care of the rest — delivering a neatly packaged zip file ready for download.
To help with performance and scalability, the bulk download service processes files in chunks of 10 to 50 MB at a time, depending on the file size. Over the past two quarters, customers worldwide have downloaded 600 GB of invoices and receipts, with an average zip file size of 50 MB. Typically, it takes just one minute to generate the files and notify users that their downloads are ready.
Scaling the bulk download feature required us to consider the limitations of our local services and AWS infrastructure. Multi-part uploads provided the answer. This approach enables us to support large datasets and improve the overall user experience for our customers.
By leveraging this architecture and Amazon S3’s capabilities, Navan can offer fast, scalable bulk downloads that are built to accommodate the continued growth of our customers’ expense volume.
Are you an engineer who thrives in a dynamic, experimental environment? Join the Navan engineering team and push the boundaries of what’s possible.