How we discover and score organizations at scale
In Part 1 we walk through our current agentic research system: from discovery of organizations and products to LLM-as-judge scoring, URL validation, and asset enrichment. We share the pipeline, the agents, and where tokens go.
TL;DR
- What this workflow does: Uses multiple AI agents to discover, judge, and enrich organizations and products into a structured database.
- Why it matters: Produces cleaner, more reliable records than scraping alone by filtering out low-quality or broken results early.
- What problems it solves: Helps analysts and product teams see who is in a market, what they offer, and how they price, without hand-building every list.
- The outcome: A living, queryable dataset that supports market entry analysis, competitive benchmarking, gap identification, and pricing insight.
At Moncho we use multiple AI agents to discover, validate, and enrich market data. This post describes the workflow we run today: the pipeline that powers organization discovery, scoring, and logo enrichment, and how it fits into the bigger picture of product and pricing discovery.
Instead of scraping everything and cleaning it later, we combine targeted search, validation, and LLM judging so most junk is never stored in the first place. That keeps precision high while still capturing a broad view of the market, which matters when you are making decisions based on this data.
The high-level flow
Before we populate any data, we first define sectors, landscapes, and segments based on global market maps, value chains, and well-established segments and needs. This is often done manually, by reviewing industry reports, analyst blogs, marquee investor memos, and similar sources, so the taxonomy is stable and aligned with how markets are structured.
Our research pipeline then follows a clear sequence:
- Discovery of organizations: Search and LLM extraction find companies by sector, segment, and location.
- Score to select the top ones: A Judge Agent scores each org on six dimensions; we keep those above a confidence threshold.
- Verify URL: The validation tool (used by the Judge) runs a HEAD request and rule-based checks (defunct, duplicate, subdomain/country-path rules).
- Fetch logos: Approved orgs get logos via the Logo Agent and Logo.dev.
- Discovery of products & pricing: For selected orgs we discover products and pricing (Pricing Agent, search tools).
- Score to select top products: The Scoring Agent scores products on multiple dimensions using sector-specific rubrics.
- Verify URL / Fetch product images: Product URLs and images are handled by the Product Image Agent and related tools.
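The sequence above can be sketched as a single orchestration function. Everything here is illustrative: in production each step fans out to search APIs, LLM calls, and external services, while the stubs below only show the control flow and the early-filtering behavior.

```python
# Hedged sketch of the discover -> judge -> validate sequence.
# All function bodies are stand-ins for internal agents and tools.

def discover_organizations(sector: str) -> list[dict]:
    # In production: Exa/Tavily search plus LLM extraction by
    # sector, segment, and location. Stubbed here with sample rows.
    return [
        {"name": "Acme Robotics", "url": "https://acme.example", "score": 0.91},
        {"name": "Defunct Co", "url": "https://gone.example", "score": 0.42},
    ]

def validate_url(org: dict) -> bool:
    # Stand-in for the HEAD request and rule-based checks
    # (defunct, duplicate, subdomain/country-path rules).
    return not org["name"].startswith("Defunct")

def run_pipeline(sector: str, threshold: float = 0.7) -> list[dict]:
    orgs = discover_organizations(sector)
    # Keep only orgs the Judge scored above the confidence threshold
    # and whose URL passes validation; junk never reaches the store.
    return [o for o in orgs if o["score"] >= threshold and validate_url(o)]
```

The point of the structure is that filtering happens before storage: anything below threshold or failing validation is dropped at this stage rather than cleaned up later.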
What this enables
- Market entry analysis: Quickly see which players are active in a segment, what they offer, and where they operate.
- Competitive benchmarking: Compare organizations and products on consistent, multi-dimensional scores instead of ad hoc notes.
- Gap identification: Spot under-served needs, segments, or geographies where few strong offerings exist.
- Pricing intelligence: Track how products are positioned and priced across categories without manual scraping marathons.
End-to-end pipeline (organizations and products)
Below is the full flow from search query to approved organizations with logos, and then to products and product images. Judge Agent and Scoring Agent are our LLM-as-judge steps; the rest use tools and external APIs.
```mermaid
flowchart TB
  Q([Search query]) --> D[Discovery]
  D --> J[Judge and validate]
  J -->|Judge and validate top orgs| L[Logo enrichment]
  L --> P[Product discovery]
  P --> S[Score products]
  S -->|Score by sector rubrics| I[Product images]
  style J fill:#dbeafe,stroke:#0ea5e9
  style S fill:#dbeafe,stroke:#0ea5e9
```
Blue = LLM as Judge
Agents and their tools
Each agent is wired to specific tools. Judge and Scoring are the two that act as LLM judges (multi-dimensional scoring); the others orchestrate search, validation, or asset APIs.
```mermaid
flowchart LR
  subgraph A[Agents]
    TAX[Taxonomy]
    D[Discovery]
    V[Validation]
    J[Top Orgs Judge]
    L[Logo]
    P[Product discovery]
    S[Product Scoring]
    I[Product Image]
    VER[Verification]
  end
  subgraph T[Tools and APIs]
    T0[Taxonomy and sector definitions]
    T1[Search: Exa, Tavily]
    T2[Validation]
    T3[Logo.dev + CDN]
    T4[Product discovery: Exa]
    T5[Sector Specific Rubrics]
    T6[Physical Goods: Exa]
    T7[Digital Goods: Logo.dev]
    T8[Verify website, social]
  end
  TAX --> T0
  D --> T1
  V --> T2
  J --> T2
  L --> T3
  P --> T4
  S --> T5
  I --> T6
  I --> T7
  VER --> T8
  style J fill:#dbeafe,stroke:#0ea5e9
  style S fill:#dbeafe,stroke:#0ea5e9
```
- Taxonomy: Defines sectors, landscapes, and segments (taxonomy and sector definitions).
- Search: Exa or Tavily for web search; internal tools and agents to search the taxonomy.
- Validation: Rule-based checks (e.g. URL, defunct, duplicate).
- Top Orgs Judge: LLM scoring on six dimensions; uses the validation tool.
- Logo: Logo.dev and CDN for fetching and serving logos.
- Product Scoring: Sector-specific rubrics only.
- Product Image: For physical goods we search with Exa; for digital goods we fetch the company logo via Logo.dev and the CDN.
- Verification: Verify website and social activity for active/inactive checks.
Verification runs outside the main discovery pipeline; URL checks in the org flow are done via the validation tool.
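To make the validation step concrete, here is a hedged sketch of what URL checking can look like. The exact internal rules are not public; `normalize_url` and the country-path handling below are assumptions that illustrate the subdomain/country-path and duplicate rules mentioned above, and `is_reachable` shows the HEAD-request check.

```python
# Illustrative URL validation: normalization, dedup, and a HEAD check.
# Rule details are assumptions, not our exact production logic.

from urllib.parse import urlparse
from urllib.request import Request, urlopen

def normalize_url(url: str) -> str:
    """Collapse www. subdomains and country landing paths so variants dedupe."""
    parsed = urlparse(url)
    host = parsed.netloc.lower().removeprefix("www.")
    path = parsed.path.rstrip("/")
    # Treat short country paths like /us or /de as the same organization.
    if len(path) == 3 and path.startswith("/") and path[1:].isalpha():
        path = ""
    return f"{host}{path}"

def is_duplicate(url: str, seen: set[str]) -> bool:
    """True if a normalized form of this URL was already accepted."""
    key = normalize_url(url)
    if key in seen:
        return True
    seen.add(key)
    return False

def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """HEAD request; failures or error statuses mark the org as defunct."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status < 400
    except OSError:
        return False
```

Normalizing before comparison is what lets `https://www.acme.com/us` and `https://acme.com/` resolve to the same record instead of two rows.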
Other agents and tools
Additional agents support landscape structure, needs, market sizing, and service discovery. They use internal tools for landscape and segment reference, market reports and sizing, link validation, and economic research. We do not list every tool or dimension here; the pipeline above covers the core flow from taxonomy through to product images.
High-level architecture
At a systems level, this workflow is a set of AI agents talking to tools and a data store rather than a single black-box model. The diagram below shows the main pieces without going into implementation details.
```mermaid
flowchart LR
  U[Analysts and product teams] --> QP[Queries and prompts]
  QP --> ORCH[Agent orchestrator]
  ORCH --> AGENTS["Discovery, Judge, Scoring, Validation, Enrichment agents"]
  AGENTS --> TOOLS["Search, validation, logo and image APIs, pricing and market tools"]
  AGENTS --> DATA["Structured market data store"]
  DATA --> UI["Internal and customer-facing views"]
  style ORCH fill:#e0f2fe,stroke:#0ea5e9
  style AGENTS fill:#dbeafe,stroke:#0ea5e9
```
Where we use LLM judges
- Judge Agent: Scores discovered organizations on six dimensions, applies a weighted overall score, and uses the validation tool for URL and rule-based checks. We filter to orgs above a confidence threshold before logo enrichment. Think of it as a strict quality inspector on the factory line that rejects bad rows before they can reach your table.
- Scoring Agent: Scores entities (organizations or products) on multiple dimensions using sector rubrics. It uses search and LLMs to research and output scores plus short rationales. We use it to select top products (and can use it for orgs) after discovery. You can think of it as a structured reviewer that reads across sources and leaves a short, scored review for each entity.
Both return structured JSON so we can store scores, filter by threshold, and trace decisions.
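A judge result might look like the sketch below. The dimension names and weights here are placeholders, not our production rubric; the shape is what matters: per-dimension scores, a weighted overall score, and a short rationale, all storable and filterable.

```python
# Illustrative judge output shape. Dimensions and weights are placeholders.

WEIGHTS = {
    "relevance": 0.25, "credibility": 0.20, "activity": 0.15,
    "fit": 0.15, "completeness": 0.15, "url_health": 0.10,
}

def overall_score(dimension_scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each in [0, 1]."""
    return sum(WEIGHTS[d] * s for d, s in dimension_scores.items())

judge_output = {
    "org": "Acme Robotics",
    "scores": {"relevance": 0.9, "credibility": 0.8, "activity": 0.7,
               "fit": 0.85, "completeness": 0.6, "url_health": 1.0},
    "rationale": "Active vendor with clear segment fit; site reachable.",
}
judge_output["overall"] = overall_score(judge_output["scores"])
```

Because the output is plain structured data, thresholding is a one-line filter and every accepted record carries the rationale that justified it.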
Guardrails, quality, and human review
Although the workflow is agentic and automated, it is designed to be conservative about what it accepts. Validation tools check links and basic rules, Judge and Scoring agents work from multiple sources instead of a single page, and outputs are kept in structured form so we can audit decisions later.
Human analysts remain in the loop where it matters most. They design the taxonomy and rubrics, seed high-value segments, and spot-check entities and products that the system promotes. When something looks off, they can trace how it was scored, correct the record, and update the underlying rules.
In practice, this means we bias toward avoiding bad data over capturing every possible record, especially in new or sensitive markets.
Unit economics of the workflow
We track the end-to-end cost of generating each organization or product record in the database. From search and discovery through scoring, validation, and asset enrichment, we measure latency and token usage per run so we can keep unit economics predictable as we scale. That visibility lets us tune prompts, choose when to run parallel agents, and align cost with the value of each new data point.
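A minimal sketch of that per-run accounting is below. The token prices and step names are illustrative assumptions; the idea is simply to attribute latency and tokens to each pipeline step and divide total cost by records produced.

```python
# Hedged sketch of per-run cost tracking. Prices are illustrative, not real rates.

from dataclasses import dataclass, field

PRICE_PER_1K = {"input": 0.003, "output": 0.015}  # assumed USD per 1K tokens

@dataclass
class RunMetrics:
    latency_s: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0

    def cost_usd(self) -> float:
        return (self.input_tokens / 1000 * PRICE_PER_1K["input"]
                + self.output_tokens / 1000 * PRICE_PER_1K["output"])

@dataclass
class PipelineLedger:
    runs: dict[str, RunMetrics] = field(default_factory=dict)

    def record(self, step: str, metrics: RunMetrics) -> None:
        self.runs[step] = metrics

    def cost_per_record(self, records_produced: int) -> float:
        """Total cost across all steps divided by records created."""
        total = sum(m.cost_usd() for m in self.runs.values())
        return total / max(records_produced, 1)
```

With per-step numbers in hand, it becomes straightforward to see which agent dominates spend and whether a prompt change or parallelization actually moved the unit cost.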