In Glean search, we’re always trying to return the most relevant results. When we merge results from a variety of sources—from Slack threads to Jira bugs to O365 docs—there are many dimensions in which to experiment with our ranking functions.
To help build these features, we wanted frequent, iterative releases for our ranking team. But rapid turnaround can be tricky in the context of a stateful service. In particular, our main Index Servers need to preload significant amounts of index data into memory when they restart. In many cases, preloading this data could take 15+ minutes, which required either scheduled service downtime or somewhat involved mitigation.
In practice, this constraint meant we weren’t deploying customer updates as often as we wanted, and long server restart times were needlessly slowing our engineers’ development cadence.
How did we fix this problem and move towards faster incremental deployment?
We soon found the advantages of this approach:
As with any implementation, some tuning was needed to make the mechanism work well. For instance:
We found that the first use of a given release tag adds about a second of latency because of the Cloud Storage retrieval and Jar decoding overhead, but subsequent uses are cached and indistinguishable from a regular implementation. Hence, for production releases, we typically send a warmup request before switching the configuration.
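The caching and warmup behavior described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Glean implementation: the class name `ReleaseCache`, its methods, and the `slowLoader` function are all hypothetical. The loader stands in for the expensive first-use path (fetching a Jar from Cloud Storage and decoding it), which a real system might implement with something like `java.net.URLClassLoader`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch: caches one loaded implementation per release tag. */
public class ReleaseCache<T> {
    // Assumption: stand-in for the slow path (fetch the Jar, decode it, instantiate).
    private final Function<String, T> slowLoader;
    private final Map<String, T> loaded = new ConcurrentHashMap<>();
    private volatile String activeTag;

    public ReleaseCache(Function<String, T> slowLoader, String initialTag) {
        this.slowLoader = slowLoader;
        warmUp(initialTag);
        this.activeTag = initialTag;
    }

    /** Returns the implementation for a tag, paying the load cost only on first use. */
    public T get(String tag) {
        return loaded.computeIfAbsent(tag, slowLoader);
    }

    /** Pays the first-use cost ahead of time so live requests do not see it. */
    public void warmUp(String tag) {
        get(tag);
    }

    /** Warms the new tag first, then flips the active configuration to it. */
    public void switchTo(String tag) {
        warmUp(tag);
        activeTag = tag;
    }

    /** Returns the implementation for the currently configured tag (a cache hit). */
    public T active() {
        return get(activeTag);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        // The lambda simulates the expensive load and counts how often it runs.
        ReleaseCache<String> cache = new ReleaseCache<>(
            tag -> { loads.incrementAndGet(); return "ranker@" + tag; }, "v1");
        cache.switchTo("v2");
        cache.active();
        cache.active();
        System.out.println(cache.active() + " loads=" + loads.get());
        // prints "ranker@v2 loads=2"
    }
}
```

In this sketch the slow loader runs exactly once per release tag, and because `switchTo` warms the tag before changing `activeTag`, requests served through `active()` only ever hit the cache.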
That said, the ability to dynamically load new releases and experiments without restarting a stateful service gives us great flexibility to iterate on and improve the Glean service.