Top 7 Tools Waldo Uses for Effective Information Gathering
Waldo uses seven tools during the information-gathering phase: unified enterprise search, query decomposition, hybrid semantic-and-keyword retrieval, context-aware ranking, retrieval-augmented generation, agentic tool selection, and security and governance controls. Together, these capabilities let the model break down complex questions, search across connected systems, and assemble cited evidence before a frontier language model generates its final answer.
Agentic search models like Waldo represent a shift in how enterprise teams collect and synthesize information. Rather than relying on a single model to handle everything from retrieval to reasoning, they separate the search phase from the analysis phase, an approach that Glean reports lowers latency and token use without reducing answer quality.
The sections below walk through each of the seven tools, explain what it does, and show how the overall architecture turns a broad question into a well-sourced answer.
What is Waldo and how does it gather information?
Waldo is an agentic search model designed specifically for the information-gathering phase of enterprise AI workflows. When a user asks a question, Waldo does not immediately attempt to generate an answer. Instead, it decomposes the query into sub-tasks, decides which retrieval tools to call, searches for supporting evidence across internal and external sources, and assembles a complete context package.
Only then does it hand that package to a frontier language model for final synthesis. According to Glean's published benchmarks at launch, separating retrieval from reasoning cuts latency by roughly 50% and token consumption by about 25% compared to running the full workload on a single reasoning model — without any drop in answer quality.
During a typical information-gathering run, Waldo orchestrates multiple tools in sequence. It might start by querying an internal knowledge base for a product spec, then pull usage data from an analytics platform, then search company communications for the most recent decision on that topic. At each step, it evaluates the quality of what it finds — discarding low-relevance results and requesting additional context where gaps remain.
The result is a curated evidence set rather than a raw dump of search hits. For example, if a sales engineer asks "What changed in our enterprise pricing model last quarter?", Waldo would locate the pricing committee's decision document, cross-reference it with the updated rate card, and surface the Slack thread where the VP of Sales confirmed the rollout date — all before a single word of the final answer is written.
Finding the right internal document, the right data point, or the right person who knows the answer accounts for a significant share of knowledge workers' time. McKinsey estimates that employees spend about 20% of their workweek searching for internal information or tracking down colleagues who can help — and IDC data suggests the figure may be even higher, with knowledge workers spending roughly 2.5 hours per day just searching for information. By treating information gathering as a distinct, optimizable phase, Waldo turns that retrieval bottleneck into a structured, auditable process.
Tool 1: unified enterprise search across connected systems
The foundation of effective information gathering is deceptively simple: search every system where knowledge actually lives. In practice, enterprise knowledge is scattered across many application types:
- Document stores (Google Drive, SharePoint, Confluence)
- Messaging platforms (Slack, Microsoft Teams)
- Ticketing systems (Jira, Zendesk, ServiceNow)
- CRM records (Salesforce, HubSpot)
- Project management tools (Asana, Monday)
When Waldo needs to gather evidence for a question, its first move is to query a unified search layer that connects to more than 100 workplace applications, pulling results from all of them in a single pass. That single pass addresses the most common retrieval bottleneck, which is usually not that the answer is missing but that nobody knows where to look. As IBM notes, modern enterprise search platforms incorporate AI technologies to organize disparate data sources into searchable indexes that enable fast, precise query processing.
What makes this more than a federated search wrapper is permission-aware retrieval. Every result Waldo surfaces respects the access controls already configured in the source system. If a document in Google Drive is shared only with the legal team, a marketing manager's query won't return it — even if the content is highly relevant.
Permission-aware access is a hard requirement for regulated industries like financial services and healthcare, where exposing restricted data during an internal search could trigger a compliance violation. The permission model runs at query time, not as a batch sync, so access changes propagate immediately.
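As a rough illustration of query-time permission checks, the sketch below applies the access check inside the retrieval step, so restricted documents never become candidates. The `Document` type, the `permission_aware_search` function, the group-based access model, and the sample data are all assumptions made for this example; they do not describe Glean's actual API or index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    source: str                     # hypothetical source label, e.g. "google_drive"
    text: str
    allowed_groups: frozenset[str]  # groups that may read the document in its source system


def permission_aware_search(
    index: list[Document], query: str, user_groups: set[str]
) -> list[Document]:
    """Return documents that match the query AND that the user may read.

    The access check is part of retrieval itself, evaluated on every call,
    so a change to allowed_groups takes effect on the next query.
    """
    terms = query.lower().split()
    return [
        doc
        for doc in index
        if doc.allowed_groups & user_groups
        and any(term in doc.text.lower() for term in terms)
    ]


index = [
    Document("contract-review", "google_drive", "Vendor contract pricing review", frozenset({"legal"})),
    Document("launch-plan", "confluence", "Pricing launch plan for Q3", frozenset({"marketing", "product"})),
]

marketing_view = permission_aware_search(index, "pricing", {"marketing"})
print([doc.doc_id for doc in marketing_view])  # ['launch-plan']
```

Both documents mention pricing, but the marketing user only sees the one shared with their group, which mirrors the legal-team example above.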
Consider a product manager preparing for a quarterly business review. They need the latest customer churn analysis from the data team's Snowflake dashboard, a summary of recent support escalations from Zendesk, and the competitive analysis slide deck someone on strategy shared last month. Without unified search, that's three separate tools, three separate queries, and a manual synthesis step.
With Glean Search spanning all connected systems, Waldo handles that retrieval in a single orchestrated pass — returning results ranked by relevance, freshness, and the searcher's own work context. The product manager gets a curated evidence set instead of a scavenger hunt. For a deeper look at how AI transforms this process, see how enterprise AI search combines retrieval and generation to surface precise answers.
Tool 2: query decomposition and multi-step planning
Complex enterprise questions rarely have single-source answers. When someone asks "What's our competitive positioning for the Q3 deal with Acme Corp?", the answer sits across multiple systems and documents — none of which, on its own, tells the full story.
Waldo handles this by decomposing the original question into discrete sub-queries before executing any search. For the Acme Corp example, the planner might generate four parallel tracks: recent activity and communications involving Acme, internal win/loss analysis for similar deals, product comparison documents relevant to Acme's tech stack, and Slack conversations where the account team discussed deal strategy.
According to Glean's published benchmarks, the planning step runs on a lightweight model that operates roughly 10x faster per call than a full reasoning model. Planning doesn't require deep inference — it requires pattern recognition over query structure and a map of available data sources. By keeping the planner fast and cheap, Waldo can evaluate multiple decomposition strategies without burning time or compute.
The planner also decides when enough evidence has been gathered. Not every question requires the full orchestration loop; based on Glean's observed query patterns, roughly half of incoming queries resolve on a fast path, where the planner identifies that a single well-targeted search will return sufficient context and skips the multi-step sequence entirely.
The decomposition capability is central to the Agentic Engine's design. Waldo treats a query as a plan to be executed, where each step builds on what previous steps found — an approach known as agentic reasoning. If the first sub-query about Acme's recent activity reveals that the deal shifted from a renewal to an expansion, the planner adjusts downstream queries to focus on expansion-specific competitive positioning rather than renewal benchmarks.
That adaptive loop turns a broad question into a well-scoped evidence package — something a single-pass search cannot match.
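The plan-then-adapt behavior described in this section can be sketched in a few lines. This is a toy stand-in, not Waldo's planner: a real planner is a model, whereas the word-count threshold, the `SubQuery` type, the source labels, and the `plan` and `replan` functions here are hypothetical and exist only to show the fast path and the downstream adjustment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubQuery:
    text: str
    source: str  # hypothetical source label, e.g. "crm" or "messaging"


def plan(question: str, fast_path_max_words: int = 6) -> list[SubQuery]:
    """Toy planner: short lookups take the fast path, longer questions fan out."""
    if len(question.split()) <= fast_path_max_words:
        return [SubQuery(question, "unified_search")]
    return [
        SubQuery(f"recent account activity: {question}", "crm"),
        SubQuery(f"renewal win/loss analysis: {question}", "documents"),
        SubQuery(f"product comparison: {question}", "documents"),
        SubQuery(f"deal strategy discussion: {question}", "messaging"),
    ]


def replan(pending: list[SubQuery], finding: str) -> list[SubQuery]:
    """Refocus the remaining sub-queries when an earlier finding changes the scope."""
    if "expansion" in finding.lower():
        return [SubQuery(q.text.replace("renewal", "expansion"), q.source) for q in pending]
    return pending


simple = plan("Where's the brand guidelines PDF?")
steps = plan("What's our competitive positioning for the Q3 deal with Acme Corp?")
remaining = replan(steps[1:], "Deal shifted from a renewal to an expansion")
print(len(simple), len(steps))
print(remaining[0].text)
```

The short lookup yields a single sub-query, the Acme question yields four, and the finding about an expansion rewrites the pending win/loss query before it runs.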
Tool 3: hybrid search combining semantic and keyword retrieval
Retrieval quality depends on matching the right search technique to the right kind of query — and most enterprise questions benefit from both techniques at once. Waldo pairs keyword search with semantic search on every retrieval call.
Keyword search is built for precision: when someone searches for "SOC-2 audit report Q1 2026," you want exact term matching to surface that specific document. Semantic search is built for recall: when someone searches for "how do we handle customer data deletion requests," it should find the data privacy runbook even if it never uses the word "deletion." Running both in parallel and merging the results captures matches that either approach alone would miss. For a detailed comparison of these hybrid search techniques, including how they differ from pure vector search, see Glean's breakdown.
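One common way to merge a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). The source does not say which merge method Waldo or Glean Search uses, so treat the sketch below as an illustration of the general idea only; the document IDs are made up and `k = 60` is simply the constant commonly used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs.

    Each list contributes 1 / (k + rank) to a document's score, so documents
    that rank well in several lists rise to the top of the merged result.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["soc2-report-q1", "audit-checklist", "privacy-runbook"]
semantic_hits = ["privacy-runbook", "soc2-report-q1", "gdpr-faq"]

merged = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(merged)
# ['soc2-report-q1', 'privacy-runbook', 'audit-checklist', 'gdpr-faq']
```

Documents found by both techniques outrank those found by only one, and nothing found by either technique is dropped.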
The hybrid approach is especially valuable when queries cross departmental boundaries. An engineer searching "deployment rollback procedure" and a support agent searching "how to undo a release" are asking for the same runbook, but the vocabulary gap between the two queries would defeat a purely keyword-based system. Semantic search bridges that gap by matching on meaning rather than terms.
Keyword search keeps the system precise for identifiers that carry no semantic weight — model numbers, customer account IDs, internal project codenames, or API endpoint names. Without the keyword component, a semantic-only system might return conceptually related but factually wrong documents.
The practical impact is a better signal-to-noise ratio in Waldo's evidence collection. Fewer false negatives mean the system doesn't miss relevant documents buried under unfamiliar terminology, and fewer false positives mean the downstream reasoning model spends its context window on genuinely useful evidence rather than tangentially related noise.
Glean Search applies this hybrid retrieval across every connected source, so the quality gain compounds as more systems are integrated — each new connector adds both keyword-indexed and semantically embedded content to the search layer.
Tool 4: context-aware ranking powered by enterprise and personal graphs
Finding relevant documents is only half the retrieval problem — the other half is ranking them so the most useful results surface first. A keyword-and-semantic search might return 40 documents that match a query about a product launch timeline, but the one written by the project lead last week matters more than a draft from six months ago by someone who left the company.
Waldo's ranking layer factors in signals that go well beyond text relevance: organizational structure, document freshness, author expertise on the topic, and the searcher's own work patterns.
Two knowledge structures make this possible. The Enterprise Graph maps relationships between people, teams, projects, documents, and communication channels across the organization. It knows that the VP of Engineering owns the infrastructure roadmap, that the roadmap was last updated three days ago, and that the infrastructure team's Confluence space is the authoritative source for deployment standards.
The Personal Graph captures individual signals — which tools a person uses most, which colleagues they collaborate with frequently, and which topics they've been working on recently. Together, these graphs let the ranking model personalize results for each user without requiring any manual configuration. To understand how knowledge graphs provide the contextual foundation for enterprise AI — including multi-hop reasoning and process-pattern recognition — see Glean's deep dive on the topic.
The practical difference shows up in how results rank for different people. A finance analyst asking about "Q3 revenue forecast" sees the CFO's latest board deck at the top; a sales director asking the same question sees the regional pipeline report first. For two users with the same access, the underlying document set is identical, but the ranking reflects who is asking and why. Gartner research found that 47% of digital workers struggle to find the information they need to do their jobs effectively, and personalized, context-aware ranking is designed to narrow that gap.
Without a context layer like this, every query starts cold, and users spend time scrolling past irrelevant results to find the one document they actually need.
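To make the ranking signals in this section concrete, here is a minimal scoring sketch that blends text relevance with freshness, ownership, and a per-user signal. The `Candidate` fields, the weights, and the 30-day half-life are arbitrary assumptions chosen for illustration; the source does not describe how Waldo actually weights these signals.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    text_relevance: float   # 0..1, from the retrieval stage
    age_days: float         # days since the document was last updated
    author_is_owner: bool   # organization-level signal (hypothetical)
    team_overlap: float     # 0..1, how closely the author works with the searcher (hypothetical)


def context_score(c: Candidate, half_life_days: float = 30.0) -> float:
    """Weighted blend of relevance and context signals (illustrative weights)."""
    freshness = 0.5 ** (c.age_days / half_life_days)  # halves every half_life_days
    return (
        0.5 * c.text_relevance
        + 0.2 * freshness
        + 0.2 * (1.0 if c.author_is_owner else 0.0)
        + 0.1 * c.team_overlap
    )


candidates = [
    Candidate("old-draft", 0.9, age_days=180, author_is_owner=False, team_overlap=0.1),
    Candidate("lead-update", 0.8, age_days=7, author_is_owner=True, team_overlap=0.6),
]

ranked = sorted(candidates, key=context_score, reverse=True)
print([c.doc_id for c in ranked])  # ['lead-update', 'old-draft']
```

The six-month-old draft has the higher text relevance, yet the project lead's recent update ranks first once freshness and ownership are counted.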
Tool 5: retrieval-augmented generation for grounded, cited answers
Once Waldo has gathered and ranked its evidence, the next step is turning that evidence into a direct answer — not a list of links. Retrieval-augmented generation (RAG) is the technique that handles this step.
RAG works by feeding specific retrieved documents into a language model as input context, rather than relying solely on the model's training data. The output is a synthesized answer with inline citations pointing back to the source documents, so the reader can verify any claim by clicking through to the original.
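The core of the RAG step is prompt assembly: retrieved passages are numbered and placed in the model's input so the answer can cite them. The sketch below shows one plausible shape for that step. The `Evidence` type, the `build_grounded_prompt` function, the prompt wording, and the sample documents are assumptions for illustration, and the call to the language model itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    title: str
    url: str
    snippet: str


def build_grounded_prompt(question: str, evidence: list[Evidence]) -> str:
    """Number each retrieved source so the model can cite it inline as [n]."""
    sources = "\n".join(
        f"[{i}] {e.title} ({e.url})\n{e.snippet}"
        for i, e in enumerate(evidence, start=1)
    )
    return (
        "Answer using only the sources below. "
        "Cite each claim with its source number, e.g. [1].\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


# Made-up evidence for the example; a real run would use retrieved documents.
evidence = [
    Evidence("Data Retention Policy", "https://example.com/policy", "Customer records are kept for seven years."),
    Evidence("Legacy wiki page", "https://example.com/wiki", "Records are kept for five years."),
]

prompt = build_grounded_prompt("How long do we retain customer records?", evidence)
print(prompt)
```

Because each source carries a stable number and URL, a citation such as [1] in the generated answer can be rendered as a link back to the original document.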
The citation layer matters more than it might seem. In enterprise settings, an answer without provenance is an answer that can't be trusted for decision-making. If a compliance officer asks about the company's data retention policy, they need to know whether the answer came from the current policy document or an outdated wiki page.
RAG makes provenance transparent: every statement in the generated response traces back to a specific source, and the reader can evaluate both the answer and the quality of the evidence behind it. Source-grounded generation differs fundamentally from the output of a general-purpose language model working alone, which may produce fluent text with no connection to any verifiable document.
RAG also reduces hallucination risk by anchoring generation in retrieved enterprise data rather than the model's parametric memory. According to a 2026 analysis of RAG benchmarks, retrieval-augmented generation can reduce AI hallucinations by 40–71% compared to baseline LLMs — a critical improvement for enterprise use cases where accuracy is non-negotiable.
Glean Assistant applies this pattern across research and analysis workflows — teams get answers they can act on because the supporting evidence is visible, traceable, and current. For data-heavy questions like "What was our customer acquisition cost trend over the last four quarters?", the answer pulls from actual reports and dashboards rather than generating plausible-sounding numbers from training data.
Tool 6: agentic tool selection and orchestration
Not every query calls for the same set of tools. A question about organizational reporting structure needs people-and-team data from the Enterprise Graph, while a question about quarterly revenue trends needs structured data from a connected analytics platform. A question about a product decision needs documents and message threads.
Waldo's orchestration layer evaluates each incoming query and autonomously selects which tools to invoke, in what order, and with what parameters, routing to the right retrieval and analysis capabilities based on the nature of the question.
Waldo adapts its strategy per problem, which is what makes the system agentic. If a question requires both a document search and a structured data lookup, it runs them in parallel and merges the results before passing the combined evidence to the language model. If a question is straightforward enough to resolve with a single search, the orchestrator skips the multi-tool sequence entirely.
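Running selected tools in parallel and merging their output might look like the following sketch, which uses Python's standard `concurrent.futures` module. The tool functions and the `TOOLS` registry are stand-ins invented for this example; real connectors would make network calls, and the source does not describe Waldo's internal interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


def search_documents(query: str) -> list[str]:
    return [f"doc hit for '{query}'"]      # stand-in for a document search call


def query_analytics(query: str) -> list[str]:
    return [f"metric row for '{query}'"]   # stand-in for a structured data lookup


# Hypothetical registry mapping tool names to callables.
TOOLS = {"documents": search_documents, "analytics": query_analytics}


def gather(query: str, tool_names: list[str]) -> list[str]:
    """Run the selected tools concurrently and merge results in tool order."""
    if not tool_names:
        return []
    with ThreadPoolExecutor(max_workers=len(tool_names)) as pool:
        futures = [pool.submit(TOOLS[name], query) for name in tool_names]
        # Collect in submission order so the merged evidence is deterministic.
        return [hit for future in futures for hit in future.result()]


combined = gather("Q3 revenue", ["documents", "analytics"])
print(combined)
# ["doc hit for 'Q3 revenue'", "metric row for 'Q3 revenue'"]
```

A simple question would pass a single tool name, which is the fast path in miniature: same function, no extra orchestration.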
Based on Glean's observed data, adaptive routing is why roughly half of all queries resolve on a fast path — the system doesn't impose unnecessary complexity on simple questions.
The orchestration layer also handles error recovery. If a retrieval call returns thin or low-confidence results, the system reformulates the sub-query — broadening search terms, targeting a different source system, or adjusting the time window — and retries before escalating.
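The reformulate-and-retry behavior can be sketched as a small loop. Everything named here is hypothetical: the `search_with_recovery` function, the confidence threshold, the reformulation callables, and the fake search backend are assumptions used to show the control flow, not Waldo's implementation.

```python
from typing import Callable, Optional

Hit = tuple[str, float]  # (doc_id, confidence)


def search_with_recovery(
    query: str,
    search: Callable[[str], list[Hit]],
    reformulations: list[Callable[[str], str]],
    min_confidence: float = 0.6,
) -> list[Hit]:
    """Try the original query, then each reformulation, until results are good enough."""
    attempt = query
    steps: list[Optional[Callable[[str], str]]] = [None, *reformulations]
    for reformulate in steps:
        if reformulate is not None:
            attempt = reformulate(attempt)
        confident = [hit for hit in search(attempt) if hit[1] >= min_confidence]
        if confident:
            return confident
    return []  # nothing usable: the caller decides how to escalate


def fake_search(q: str) -> list[Hit]:
    # Stand-in backend: only queries mentioning "rollback" find the runbook.
    return [("rollback-runbook", 0.9)] if "rollback" in q else [("misc-page", 0.2)]


hits = search_with_recovery(
    "how to undo a release",
    fake_search,
    reformulations=[lambda q: q + " rollback"],  # broaden the search terms
)
print(hits)  # [('rollback-runbook', 0.9)]
```

The first attempt returns only a low-confidence hit, so the loop broadens the query once and succeeds on the retry.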
Glean Agents build on this same agent orchestration pattern for multi-step workflows that go beyond information gathering into action-taking. The underlying principle is the same: match the tool to the task, sequence intelligently, and recover gracefully when a step underperforms.
Tool 7: security, governance, and audit controls for compliant data collection
Every tool in Waldo's information-gathering stack operates inside a security perimeter that enforces access controls at the retrieval level — not as a post-processing filter. When Waldo queries a connected system, it inherits the requesting user's permissions in that system.
A search that spans Salesforce, Confluence, and Google Drive applies three separate permission checks, one per source, before any result enters the evidence set. Inheriting the user's permissions means Waldo never surfaces data that the user wouldn't be able to access by logging into the source application directly.
For teams in financial services, healthcare, and government, permission-aware retrieval is a baseline requirement — but it's not sufficient on its own. These organizations also need to know what happened during an information-gathering session: which sources were queried, what evidence was retrieved, and how the final answer was assembled. A proper permissions structure is essential to ensuring that generative AI delivers secure and relevant results in complex enterprise environments.
Waldo produces audit trails that log this entire chain, giving compliance teams a reviewable record of every retrieval and generation step. If a regulator asks how a particular analysis was produced, the organization can trace it back to specific documents and the sequence of tool calls that surfaced them.
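A reviewable record of this kind could be as simple as an append-only list of events per session. The field names, the `AuditTrail` class, and the JSON layout below are assumptions for illustration; the source does not specify the format of Waldo's audit logs.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AuditEvent:
    step: str              # e.g. "retrieve" or "generate"
    source: str            # system or model that handled the step
    query: str
    result_ids: list[str]  # documents returned or used at this step
    timestamp: str = field(default_factory=_utc_now)


@dataclass
class AuditTrail:
    session_id: str
    user: str
    events: list[AuditEvent] = field(default_factory=list)

    def record(self, step: str, source: str, query: str, result_ids: list[str]) -> None:
        self.events.append(AuditEvent(step, source, query, result_ids))

    def to_json(self) -> str:
        # asdict() recurses into the nested AuditEvent dataclasses.
        return json.dumps(asdict(self), indent=2)


trail = AuditTrail(session_id="session-001", user="analyst@example.com")
trail.record("retrieve", "confluence", "data retention policy", ["policy-2026"])
trail.record("generate", "language_model", "data retention policy", ["policy-2026"])
print(trail.to_json())
```

Recording the document IDs at both the retrieval and generation steps is what lets a reviewer connect a final answer to the sources that produced it.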
Glean's security and governance model adds a further constraint: zero-day data retention for enterprise content processed during search and generation. Customer data is never used to train external models, and content processed during a query is not persisted beyond the session.
These governance controls sit upstream of every retrieval and generation step. For organizations evaluating information-gathering tools, the architecture means that adopting agentic search doesn't require relaxing existing data governance policies — the system operates within the same boundaries that IT and security teams have already defined.
How to choose the right information-gathering tools for your workflow
Match tools to query complexity
Not every question needs a full agentic orchestration loop. Simple lookups — "Where's the brand guidelines PDF?" — resolve on a fast path with a single targeted search, while multi-source questions that require decomposition, cross-referencing, and synthesis need the full tool chain. Understanding where your team's typical queries fall on this spectrum helps set realistic expectations for response time and depth.
Evaluate tools on context depth, not just speed
Effective information-gathering tools understand who is asking, what they're working on, and which sources are authoritative for that person's role. Look for platforms that maintain a persistent context layer — Glean's Enterprise Graph and Personal Graph, for example, map relationships across people, teams, and documents so every query is ranked for the person asking, not just for text relevance. Speed matters, but a fast answer from the wrong document is worse than a slightly slower answer from the right one.
Prioritize governed, permission-aware platforms
Any tool that searches across enterprise systems must respect existing access controls without requiring manual configuration per query. Audit trails, data residency options, and zero-day retention for processed content should be table stakes, not premium add-ons. If a platform can't demonstrate permission-aware retrieval at the connector level, it's not ready for regulated environments. Some recent evaluations report hallucination rates above 33% on complex reasoning tasks, which makes governed, grounded retrieval systems, rather than unanchored generation, a prerequisite for trustworthy enterprise AI.
Frequently asked questions
What specific tools does Waldo use during information gathering?
Waldo uses seven core capabilities: unified enterprise search across 100+ connected applications, query decomposition and multi-step planning, hybrid semantic-and-keyword retrieval, context-aware ranking via enterprise and personal knowledge graphs, retrieval-augmented generation with inline citations, agentic tool selection and orchestration, and security and governance controls including permission-aware access and audit logging.
How does Waldo decide which tools to use for a given query?
A lightweight planning model evaluates each incoming query and selects the appropriate tools, sequencing, and parameters based on the question's complexity and data requirements. Simple queries resolve on a fast path with a single search call; complex multi-source questions trigger the full decomposition and orchestration loop.
Can Waldo integrate with existing enterprise tools for enhanced information gathering?
Yes. Waldo operates on top of a search infrastructure that connects to more than 100 workplace applications, including document stores, messaging platforms, CRM systems, ticketing tools, and analytics platforms. Each connector inherits the source system's permission model, so integration doesn't require reconfiguring access controls.
What are the advantages of using an agentic search model for information gathering?
Agentic search treats a query as a plan to be executed rather than a single retrieval event. That design means the system can decompose complex questions, search multiple sources in parallel, evaluate result quality mid-stream, and adapt its strategy when initial results are insufficient. The result is more complete, better-sourced evidence sets — with lower latency and fewer wasted tokens compared to passing the full reasoning burden to a single large model.
What are best practices for using Waldo in the information-gathering process?
Start with clear, specific questions rather than broad prompts — Waldo's decomposition works best when the intent is unambiguous. Review cited sources in generated answers to build confidence in the evidence chain, and confirm that relevant source systems are connected so Waldo can search your full knowledge base. For a broader perspective on how modern information retrieval techniques are evolving, see Glean's comprehensive guide.
The right information-gathering tools turn a scattered search across dozens of systems into a single, structured retrieval process, one that delivers sourced answers instead of ranked links. When your team can trust the evidence behind every response, decisions move faster and with less rework.










