In the vast ocean of expense transactions on Navan, the Elasticsearch search and analytics engine is a trusty lighthouse that casts light on waves of data for queries and analysis. But a lighthouse’s beam only shines so far. Elasticsearch’s default search method offers a glimpse of only 10,000 results — possibly just a single day’s transactions for our larger clients.
To ensure customers have complete visibility into every transaction on Navan, our software engineers set off on a quest to unlock the full potential of Elasticsearch as a navigation tool. Here’s a chronicle of how our journey unfolded.
What is Elasticsearch?
Elasticsearch is a powerful search engine that software engineers use to organize and quickly find specific information within large collections of data.
Previously, Navan used Elasticsearch’s search API with from and size parameters to page through search results. However, this method is not suitable for search results larger than 10,000.
From Elasticsearch’s documentation:
By default, you cannot use from and size to page through more than 10,000 hits. This limit is a safeguard set by the index.max_result_window index setting. If you need to page through more than 10,000 hits, use the search_after parameter instead.
To overcome this limitation, we turned to Elasticsearch’s recommended alternative: using search_after and sort parameters to paginate through the large set of results.
The search_after and sort parameters allow Elasticsearch to operate on the principle of sorting and retrieving results based on a specified sort order and a unique marker for each search result.
Suppose we have a query yielding 20,000 search hits, and we aim to paginate these results with a page size of 500. Our sort criteria include sorting by the dateCreated field in descending order.
Initially, to obtain the first page of results, we only need to specify the sort parameter
Upon the initial query, Elasticsearch returns the first 500 search hits, each of which are accompanied by a list sort, which contains the value of dateCreated. Think of sort as the unique identifier for the search hits.
To retrieve subsequent pages, we utilize the search_after parameter as we would use the from parameter. But search_after tells Elasticsearch the specific search result from which we’d like to retrieve the next 500 results.
Setting search_after to the sort list of the last document of the current page helps Elasticsearch locate the particular search result whose sort equals search_after and then retrieve the 500 results that follow.
Despite its efficacy, the search_after parameter presented its own set of challenges, primarily around data navigation and sorting.
While search_after facilitates forward pagination, navigating backward requires a workaround. We reversed the sort order and set search_after to the sort_fields of the first element on the current page.
The above search request gets us to the previous page of results sorted by dateCreated in ascending order.
To align with the desired descending order, the last step is to reverse the search hits you get from this search request.
Sorting by nullable fields poses a distinct challenge. To understand what goes on under the hood when sorting by a nullable field, it’s important to consider the missing parameter.
This parameter tells Elasticsearch to put the documents missing the sort field at the top or bottom of the result set. By default, missing = “_last”.
To understand how Elasticsearch assigns the identifier sort to the documents missing the sort values, here’s an example. Suppose we are sorting by dateModified, a nullable date field.
Given the sort order is descending and the requirement to put the documents missing the sort field at the end, Elasticsearch will assign the sort value of these documents to be the minimum value of the date, which is -9223372036854775808 (Sept 21 1677).
Here’s what the search response would look like:
Notice that all the documents missing dateModified now have the same sort = [-9223372036854775808].
If we want to fetch the next page and create a new search request with search_after = [-9223372036854775808], the request will fail because Elasticsearch cannot use a non-unique search_after to identify the particular search result to start retrieving results from.
The search mechanism relies on sort being the unique identifier for every search result. Therefore, to sort by a nullable field, you must ensure the uniqueness of sort by incorporating a second sort field that can act as a tiebreaker, such as the document ID.
By understanding the mechanics of Elasticsearch and overcoming the associated challenges, we could adopt searching using search_after parameter and empower efficient retrieval of large datasets while maintaining performance and scalability. This process enables us to meet the demands of our growing customer base and handle the millions of transactions being processed on our platform.