
Pro Search: Teaching AI to Research

Building a search system that combines Google Custom Search with AI summarization and proper citations.

By Alexey Suvorov · 6 min read

May 2023. We had a feature in Dashboard v1 called Bulk Chat – a way to send the same prompt to multiple AI models simultaneously and compare their responses. Users started doing something we didn’t anticipate: they were using it as a research tool. Type a question, fire it at GPT-4 and Claude, compare which answer cited better sources, then manually verify by searching Google.

The workflow was clumsy. Three tabs minimum. Copy-pasting between a search engine, a text parser, and an AI chat. But users kept doing it, which meant the need was real. By December 2024, we’d built the pipeline they were assembling by hand.

We called it Pro Search.

The five-stage pipeline

Pro Search isn’t a single feature. It’s a pipeline with five stages, and every stage solves a different problem.

Stage 1: Query. The user types a question. Not a search query – a natural language question. “What are the current EU regulations on AI-generated content?” or “Compare the performance of GPT-4 Turbo vs Claude 3 Opus on code generation benchmarks.”

Stage 2: Search. The question goes to Google Custom Search API. We get back 10 results with titles, snippets, and URLs. This is the same Google search anyone can do, but programmatic access means we can process the results without scraping.
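
For the curious, here is a minimal sketch of what that search call can look like against the Custom Search JSON API. The environment variable names and the SearchResult shape are ours for illustration; only the endpoint and its key, cx, q, and num parameters come from Google's API.

```typescript
// Minimal sketch of the search stage (Node 18+, global fetch).
// GOOGLE_API_KEY / GOOGLE_CSE_ID are placeholder credential names.
interface SearchResult {
  title: string;
  snippet: string;
  url: string;
}

async function googleCustomSearch(question: string): Promise<SearchResult[]> {
  const params = new URLSearchParams({
    key: process.env.GOOGLE_API_KEY ?? "",
    cx: process.env.GOOGLE_CSE_ID ?? "",
    q: question,
    num: "10", // the API returns at most 10 results per request
  });

  const res = await fetch(`https://www.googleapis.com/customsearch/v1?${params}`);
  if (!res.ok) throw new Error(`Custom Search failed: ${res.status}`);

  const data = await res.json();
  // Each result item carries a title, snippet, and link.
  return (data.items ?? []).map((item: any) => ({
    title: item.title,
    snippet: item.snippet,
    url: item.link,
  }));
}
```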

Stage 3: Fetch and parse. This is where the real work happens. We fetch the top results as full web pages – raw HTML, with all the navigation bars, cookie banners, advertisements, and boilerplate that make web pages hard for machines to read. Then we run each page through @mozilla/readability.

Readability was originally built for Firefox’s Reader View. It strips a web page down to its actual content – the article text, the headings, the meaningful structure – and discards everything else. Combined with jsdom for DOM parsing, it transforms messy HTML into clean, structured text that an AI model can actually work with.
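
A rough sketch of that fetch-and-parse step, using the libraries named above (error handling and timeouts omitted):

```typescript
import { JSDOM } from "jsdom";
import { Readability } from "@mozilla/readability";

// Fetch one search result and strip it down to its article content.
async function extractArticle(
  url: string
): Promise<{ title: string; text: string } | null> {
  const html = await (await fetch(url)).text();

  // Passing the URL lets jsdom resolve relative links inside the page.
  const dom = new JSDOM(html, { url });
  const article = new Readability(dom.window.document).parse();
  if (!article) return null; // Readability bails on pages it can't score

  return { title: article.title ?? url, text: article.textContent ?? "" };
}
```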

Stage 4: Summarize. The parsed content from all sources gets combined into a context window, and we send it to the AI model with the user’s original question. The prompt is specific: summarize the findings, address the question directly, and cite your sources. Every claim must reference which source it came from.
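
A sketch of how that context might be assembled; the ParsedSource shape and the instruction wording are illustrative, not the production prompt:

```typescript
interface ParsedSource {
  title: string;
  url: string;
  text: string;
}

// Combine parsed sources into one numbered context block plus instructions.
function buildSummarizationPrompt(question: string, sources: ParsedSource[]): string {
  const sourceBlocks = sources
    .map((s, i) => `[${i + 1}] ${s.title} (${s.url})\n${s.text.slice(0, 4000)}`) // crude per-source budget
    .join("\n\n");

  return [
    `Question: ${question}`,
    "",
    "Sources:",
    sourceBlocks,
    "",
    "Summarize the findings and answer the question directly.",
    "Cite sources inline as [n]; every claim must reference the source it came from.",
  ].join("\n");
}
```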

Stage 5: Cite and suggest. The response comes back with inline citations – numbered references that link to the original URLs. Below the summary, we generate follow-up question suggestions based on what the user asked and what the sources contained. One question often leads to three more, and the follow-up suggestions keep the research session flowing.

The free tier gives users 12 Pro Search queries per day. Enough to explore a topic seriously. Not enough to abuse the Google Custom Search API quota.

Why Readability was the secret weapon

The parsing stage is the one most people underestimate. Raw HTML is hostile to AI processing. A typical news article page contains 40-60KB of HTML, but only 2-5KB of that is the actual article text. The rest is headers, footers, navigation, ads, related article widgets, cookie consent modals, JavaScript bundles, and metadata.

If you feed raw HTML to an AI model, you’re wasting 90% of the context window on noise. Worse, the model might try to summarize the navigation menu or the cookie policy instead of the article.

Mozilla’s Readability library solves this with a scoring algorithm. It analyzes the DOM structure, identifies the most likely “content” node based on text density, paragraph count, and structural cues, and extracts just that content. It’s not perfect – it occasionally grabs a sidebar or misses a multi-page article – but it’s right 85-90% of the time.

We didn’t just use Readability for web pages. The content processing pipeline handles multiple formats:

  • HTML pages via htmlparser2 and @mozilla/readability
  • PDF documents via pdf-parse
  • Word documents via mammoth (DOCX to HTML, then Readability for extraction)
  • RSS feeds via rss-parser
  • Markdown content via marked
  • YouTube videos via YouTube Transcriptor integration for transcript extraction
  • Images processed through Sharp for metadata and optimization

This multi-format pipeline meant Pro Search could handle sources that weren’t just web pages. A user researching a topic might get results that include PDFs from academic sites, RSS feed entries from technical blogs, and YouTube video transcripts – all parsed into clean text and fed to the summarizer.
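
A condensed sketch of what such a dispatcher might look like; the routing rules and helper names are ours for illustration, not lifted from the Dashboard codebase:

```typescript
import { JSDOM } from "jsdom";
import { Readability } from "@mozilla/readability";
import pdfParse from "pdf-parse";
import mammoth from "mammoth";
import Parser from "rss-parser";
import { marked } from "marked";

// Route a fetched document to the right parser based on its content type.
async function toPlainText(buffer: Buffer, contentType: string, url: string): Promise<string> {
  if (contentType.includes("application/pdf")) {
    return (await pdfParse(buffer)).text;
  }
  if (contentType.includes("officedocument.wordprocessingml")) {
    // DOCX -> HTML via mammoth, then the same Readability pass as web pages.
    const { value: html } = await mammoth.convertToHtml({ buffer });
    return readabilityText(html, url);
  }
  if (contentType.includes("xml") && /rss|feed/i.test(url)) {
    const feed = await new Parser().parseString(buffer.toString("utf8"));
    return feed.items.map((i) => `${i.title ?? ""}\n${i.contentSnippet ?? ""}`).join("\n\n");
  }
  if (contentType.includes("markdown") || url.endsWith(".md")) {
    // Render Markdown to HTML first so Readability sees normal document structure.
    return readabilityText(await marked.parse(buffer.toString("utf8")), url);
  }
  // Default: treat the payload as HTML.
  return readabilityText(buffer.toString("utf8"), url);
}

function readabilityText(html: string, url: string): string {
  const dom = new JSDOM(html, { url });
  return new Readability(dom.window.document).parse()?.textContent ?? "";
}
```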

The evolution from Bulk Chat to agent tools

Pro Search in Dashboard v1 was a standalone feature. A user clicked the Pro Search tab, typed a question, and got a cited summary. Clean interface, clear purpose.

But when we built Dashboard v2, the question changed. Instead of “how do users search the web,” we asked “how do agents search the web.”

The answer was two new tools: GoogleSearchWithScraperTool and PerplexitySearchTool. Both gave agents the same capabilities Pro Search gave users, but embedded in the agent’s tool belt. An agent working on a research task could search the web, parse the results, and incorporate the findings into its response – without the user manually triggering a search.

The Google tool used the same pipeline: Custom Search API, fetch, Readability, summarize. The Perplexity tool took a different approach entirely. Perplexity’s API handles the search-parse-summarize pipeline natively, returning AI-generated summaries with built-in citations. No need for our parsing stack. Faster, simpler, but less transparent – you don’t see the raw sources before summarization.

Both tools coexist because they serve different use cases. When an agent needs fast, reliable web research with minimal token overhead, Perplexity is better. When an agent needs to deeply analyze specific pages or handle non-standard formats like PDFs, the Google pipeline gives more control.
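
In rough terms, both tools can sit behind one interface in the agent's tool belt. The shape below is hypothetical, not the actual Dashboard v2 API; only the two tool names come from the post.

```typescript
interface WebResearchResult {
  summary: string;
  sources: { title: string; url: string }[];
}

interface AgentTool {
  name: string;
  description: string; // what the agent's planner reads when choosing a tool
  run(query: string): Promise<WebResearchResult>;
}

// Same pipeline as Pro Search: Custom Search -> fetch -> Readability -> summarize.
declare const googleSearchWithScraperTool: AgentTool;

// One call to Perplexity, which searches, parses, and cites on its own.
declare const perplexitySearchTool: AgentTool;
```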

The citation problem

Citations sound simple: just add source numbers after claims. In practice, citation quality depends on how well the summarization model attributes information to specific sources.

Early versions of Pro Search had a citation problem. The model would sometimes hallucinate a citation – attributing a claim to Source 3 when it actually came from Source 1, or worse, generating a claim that wasn’t in any source and attaching a citation anyway. The citation numbers were there, but they didn’t always point to the right content.

We improved this through prompt engineering, not model fine-tuning. The key changes were:

  • Numbered source blocks in the prompt, so each source had a clear identifier the model could reference
  • Explicit instructions to only cite information that appeared in the provided sources
  • Source-first summarization (sketched after this list), where the model was asked to process each source individually before synthesizing an answer
  • Follow-up question generation that was grounded in the actual source content, not the model’s general knowledge
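
The source-first pass changed the structure of the calls, not just the wording. A sketch of the idea, assuming a generic complete(prompt) helper that calls whichever model is configured; the prompts are illustrative:

```typescript
declare function complete(prompt: string): Promise<string>;

async function sourceFirstSummarize(
  question: string,
  sources: { title: string; url: string; text: string }[]
): Promise<string> {
  // Pass 1: restate each source on its own, so later citations can only
  // draw on what was actually extracted from that source.
  const notes = await Promise.all(
    sources.map((s, i) =>
      complete(
        `Source [${i + 1}]: ${s.title} (${s.url})\n${s.text}\n\n` +
          `List the facts in this source relevant to: "${question}". ` +
          `Do not add anything that is not in the text above.`
      )
    )
  );

  // Pass 2: synthesize an answer from the per-source notes only.
  return complete(
    `Question: ${question}\n\n` +
      notes.map((n, i) => `[${i + 1}]\n${n}`).join("\n\n") +
      `\n\nAnswer using only the numbered notes above. Cite as [n] after each claim; ` +
      `if the notes do not support a claim, leave it out.`
  );
}
```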

The citation accuracy improved significantly, though it’s still not perfect. Some models are better at attribution than others. Claude tends to be more conservative – it won’t cite what it can’t trace. GPT-4 is more liberal, which means more citations but occasionally less accurate ones.

What 12 queries per day taught us about user behavior

The free-tier limit of 12 Pro Search queries per day was an arbitrary number. We expected most users to use 3-5. The data told a different story.

Power users hit the limit regularly. Not because they were running 12 independent searches, but because Pro Search became the starting point for longer research sessions. One question led to a follow-up. The follow-up led to a tangent. The tangent led to a deeper dive. Twelve queries wasn’t twelve questions – it was one research session that went deep.

This usage pattern influenced how we designed the agent tools in Dashboard v2. Agents don’t have a fixed query limit per session. They search as many times as the task requires, with credit costs calculated per search. The cost model shifted from “how many searches” to “how much research does this task need.”

From feature to capability

Pro Search started as a feature in Dashboard v1’s navigation bar. It ended as a capability embedded in every agent in Dashboard v2. The journey between those two points taught us something about building AI products: features become capabilities.

A feature is something a user clicks on. A capability is something an agent uses. The difference matters because capabilities compose. An agent with web search can research a topic before writing about it. An agent with web search and code execution can research an API, read the documentation, and write an integration. An agent with web search, code execution, and file management can build an entire project from requirements it gathers itself.

Pro Search was one of the first features we promoted from “thing users do” to “thing agents do.” It won’t be the last. The lesson is straightforward: build features as pipelines with clear inputs and outputs. When the time comes to hand them to an agent, the interface is already there.

Alexey Suvorov

CTO, AIWAYZ

10+ years in software engineering. CTO at Bewize and Fulldive. Master's in IT Security from ITMO University. Builds AI systems that run 100+ microservices with small teams.
