At Glean, we have a lofty goal of delivering a best-in-class search product.
In a previous post, we talked about how we quickly deploy new ranking code. In this post, we focus the discussion on how we test and deploy the search engine cluster.
While this discussion is technical in nature, we strive to keep our focus on the top-level goal of improving the customer experience. This means all the engineering we do, even in the infrastructure layers, should result in faster feature updates, better performance, fewer bugs and interruptions, and greater trust earned by consistently holding ourselves to higher standards.
At Glean, we heavily test all layers of our search stack. Each release qualification goes through a bevy of end-to-end search tests on fully loaded deployments. In addition, we realized the need to enable all our search engineers to simulate the full search stack within seconds. Every day we aim to ship new improvements while closely guarding the quality and robustness of the search engine.
To that end, we’ve invested heavily in being able to quickly bootstrap an in-memory multi-node search engine within our unit tests. This framework lets us verify any change to our search stack within seconds. Such changes may include updates to our index schemas, retrieval and scoring logic, text tokenization, and more.
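To make the idea concrete, here is a minimal sketch of what an in-memory multi-node search cluster inside a unit test could look like. It is an illustration only, not Glean's actual framework: `InMemoryNode`, `InMemoryCluster`, the modulo-based document routing, and the naive whitespace tokenizer are all hypothetical names and simplifications chosen for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class InMemoryNode:
    """One hypothetical search node holding an inverted index for its shard."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids

    def index(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        terms = tokenize(query)
        if not terms:
            return set()
        # AND semantics: a document must contain every query term.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


class InMemoryCluster:
    """Routes documents to nodes by id and fans queries out to every node."""

    def __init__(self, num_nodes=3):
        self.nodes = [InMemoryNode() for _ in range(num_nodes)]

    def index(self, doc_id, text):
        # Assumption: integer doc ids, routed with a simple modulo.
        self.nodes[doc_id % len(self.nodes)].index(doc_id, text)

    def search(self, query):
        hits = set()
        for node in self.nodes:
            hits |= node.search(query)
        return sorted(hits)


def test_search_across_nodes():
    cluster = InMemoryCluster(num_nodes=3)
    cluster.index(1, "Quarterly planning doc")
    cluster.index(2, "Planning offsite agenda")
    cluster.index(3, "Onboarding guide")
    assert cluster.search("planning") == [1, 2]
    assert cluster.search("planning doc") == [1]
    assert cluster.search("missing") == []
```

Because everything lives in process memory, a test like `test_search_across_nodes` exercises indexing, routing, and query fan-out together without any external service, which is what keeps the feedback loop short.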
While for every release we go through a battery of end-to-end tests using fully loaded deployments, we prioritized this fast, lightweight path because:
We’ve been pleasantly surprised by how much can be caught through our end-to-end simulations. When working on optimizing our text tokenization path, we discovered that even simple search query tests run against the embedded cluster were able to find issues that weren’t caught by the tokenization unit tests themselves. We love it when bugs are caught before they are even merged into the codebase!
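As a hypothetical illustration of how this can happen (not the actual issue we hit), consider a query-time tokenizer that is "optimized" to split on whitespace only, while the index-time tokenizer still splits on punctuation. Both pass the same simple tokenizer-level check, but an end-to-end query exposes the mismatch. The names `tokenize_index`, `tokenize_query_fast`, `build_index`, and `search` are assumptions made up for this sketch.

```python
import re


def tokenize_index(text):
    """Index-time tokenizer: lowercase, keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tokenize_query_fast(text):
    """Hypothetical 'optimized' query tokenizer: whitespace split only."""
    return text.lower().split()


def build_index(docs):
    """Build a term -> doc ids inverted index using the index-time tokenizer."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize_index(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, tokenizer):
    """Return sorted ids of documents containing every query term."""
    terms = tokenizer(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


# Tokenizer-level checks: both tokenizers agree on this simple input.
assert tokenize_index("design review") == ["design", "review"]
assert tokenize_query_fast("design review") == ["design", "review"]

# End-to-end check: the same query behaves differently per tokenizer.
index = build_index({1: "Q3-roadmap design review"})
consistent = search(index, "q3-roadmap", tokenize_index)
regressed = search(index, "q3-roadmap", tokenize_query_fast)
```

Here `consistent` finds document 1, while `regressed` comes back empty because the single term `q3-roadmap` never appears in the index. A query-level test asserting on search results would fail right away, even though the tokenizer unit checks stay green.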
We aim to quickly and reliably deliver thousands of search engines. This is a very high-level statement, and in practice it means that every day we need to evaluate and improve how we operate and maintain all these search engines. This is a constant work in progress that keeps paying off as we continue to expand our customer base.
Recently we invested in an overhaul of all our deploy operations. We reviewed every operation and asked ourselves how we can make every step faster, safer, and more easily testable.
Through these efforts, we’re now spending much less time monitoring and babysitting previously arduous maintenance routines. We’re also catching bugs earlier, which is helping reduce on-call incidents.
While these investments have yielded serious improvements and given us some peace of mind, we’re constantly revisiting and iterating. In future posts, we’ll share more about the other investments we’re making to provide the best search possible. We’re learning every day and are excited about the opportunity to build an industry-defining product.