The Setup That Looked Right
We run an AI agent that qualifies inbound sales leads. The architecture was straightforward and, on paper, solid:
Layer 1: Apollo enrichment. Every lead gets enriched with company data — employee count, industry, domain, headquarters. This is standard. Apollo is one of the most widely used enrichment APIs in sales automation.
Layer 2: Claude verification. The enriched data gets passed to Claude, which evaluates whether the company fits our ideal customer profile (ICP): tech or software companies, 5 to 500 employees, not in blocked categories like staffing agencies or financial institutions.
Two layers. Structured data feeding an LLM. The kind of pipeline that looks great in an architecture diagram.
It was quietly throwing away good leads for weeks.
The Hard Filter Problem
The first issue was obvious once we looked for it. Before Claude ever saw a lead, there was a hard filter on Apollo’s employee count: reject anything below 5 or above 500 employees.
Simple enough. Except Apollo’s employee counts are frequently wrong.
A 60-person SaaS company shows up as 2 employees in Apollo. Hard-rejected. A bootstrapped startup with 40 engineers and no Apollo profile shows null. Rejected. A company that recently acquired another and doubled in size still shows last year’s number. Rejected or approved based on stale data.
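To see how quietly this goes wrong, here is a minimal sketch of that kind of hard filter (field names are hypothetical, not our actual schema). A null or wrong value kills the lead before the LLM layer ever sees it:

```python
# Sketch of a hard filter on Apollo's employee count (illustrative field names).
MIN_EMPLOYEES, MAX_EMPLOYEES = 5, 500

def passes_hard_filter(apollo_record: dict) -> bool:
    count = apollo_record.get("employee_count")  # frequently None or stale for SMBs
    if count is None:
        return False  # no Apollo profile -> rejected, no log, no trace
    return MIN_EMPLOYEES <= count <= MAX_EMPLOYEES

passes_hard_filter({"employee_count": 2})    # False: the 60-person SaaS company, gone
passes_hard_filter({})                       # False: the bootstrapped 40-engineer startup, gone
passes_hard_filter({"employee_count": 120})  # True: but only as correct as last year's data
```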
These weren’t edge cases. Apollo’s data quality for small and mid-market companies — exactly our ICP — is inconsistent enough that the hard filter was systematically biased against the companies we most wanted to reach.
But the hard filter wasn’t the real problem. It was just the loud one.
The Anchoring Problem
The subtler issue was happening inside the LLM layer. When a lead passed the hard filter, its Apollo data got packaged into Claude’s prompt:
“Apollo says: 350 employees. Industry: Information Technology. Domain: example.com.”
Claude would then evaluate the company. And here’s where it broke down: Claude anchored on whatever number Apollo provided.
Anchoring is one of the most well-documented cognitive biases, and LLMs exhibit it too. When you put a number in the prompt — even a wrong number — the model’s reasoning gravitates toward it. “Apollo says 350 employees” becomes the starting point, and Claude adjusts from there rather than reasoning from scratch.
For well-known companies, this was fine. Claude knows Shopify has thousands of employees regardless of what Apollo says. But for the long tail — the 80% of companies Claude has partial or no knowledge about — Apollo’s number became the anchor.
A company Apollo listed at 10,000 employees (wrong — it was a 50-person subsidiary sharing a parent company’s profile) would get flagged as too large, even though Claude had no independent information to confirm that number. Claude treated Apollo’s data as ground truth because it was the only number in the prompt.
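To make the anchoring concrete, here is roughly how an enriched prompt like this gets assembled. This is a hedged sketch with illustrative field names, not our production code; the point is that whatever Apollo returns, right or wrong, is the only number Claude sees:

```python
# Sketch of the old prompt assembly. Apollo's value is injected verbatim, so a
# wrong count (2, 10,000, whatever) becomes the model's reference point.
def build_prompt_with_apollo(lead: dict, apollo: dict) -> str:
    return (
        f"Apollo says: {apollo['employee_count']} employees. "
        f"Industry: {apollo.get('industry', 'Unknown')}. "
        f"Domain: {lead['domain']}.\n"
        "Does this company fit our ICP: tech or software, 5-500 employees, "
        "not a staffing agency, recruiting firm, or financial institution?"
    )
```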
We weren’t using AI to verify enrichment data. We were using enrichment data to bias the AI.
The Counterintuitive Fix
We tried removing Apollo’s employee count from the pipeline entirely.
No hard filter on count. No employee number passed to Claude. The prompt became:
“Company: [name]. Domain: [domain]. Industry: [industry, if available]. Based on your knowledge, does this company fit our ICP?”
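In code, the change is mostly deletion. A sketch of the count-free prompt, again with illustrative field names, that includes industry only when Apollo actually has it:

```python
# Sketch of the revised prompt: no employee count, no hard numeric anchor.
def build_prompt_without_count(lead: dict, apollo: dict) -> str:
    parts = [f"Company: {lead['name']}.", f"Domain: {lead['domain']}."]
    if apollo.get("industry"):
        parts.append(f"Industry: {apollo['industry']}.")
    parts.append(
        "Based on your knowledge, does this company fit our ICP: tech or software, "
        "roughly 5-500 employees, not a staffing agency, recruiting firm, or "
        "financial institution? If you don't recognize the company, say so."
    )
    return " ".join(parts)
```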
Claude now operates on three tracks:
Recognizes the company: Estimates size from its own training data, evaluates accordingly. For well-known companies, this is more accurate than Apollo’s often-stale records.
Doesn’t recognize the company: Assumes it’s a small company (20-50 employees) and proceeds. This is the right default for our ICP — we’d rather evaluate an unknown small company than reject it based on missing data.
Blocked category: Rejects regardless of size. Staffing agencies, recruiting firms, banks, non-tech corporations. Claude is good at category classification even for companies it doesn’t recognize, based on the domain and name alone.
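Here is one way the three tracks can be wired together, reusing build_prompt_without_count from the sketch above. It assumes the Anthropic Python SDK; the model name and the JSON verdict schema are placeholders, and production code would need to tolerate malformed JSON:

```python
import json
import anthropic  # assumes the Anthropic Python SDK and ANTHROPIC_API_KEY are configured

client = anthropic.Anthropic()

def qualify(lead: dict, apollo: dict) -> dict:
    prompt = build_prompt_without_count(lead, apollo) + (
        ' Reply with JSON only: {"recognized": true/false, "blocked_category": true/false,'
        ' "estimated_employees": number or null, "fits_icp": true/false, "reason": "..."}'
    )
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder; use whichever model you run
        max_tokens=300,
        messages=[{"role": "user", "content": prompt}],
    )
    verdict = json.loads(response.content[0].text)  # real code should handle bad JSON

    if verdict["blocked_category"]:
        return {"decision": "reject", "reason": verdict["reason"]}
    if not verdict["recognized"]:
        # Unknown company: assume small (20-50 employees) and keep it in the funnel.
        return {"decision": "proceed", "reason": "unrecognized company, assumed small"}
    return {
        "decision": "proceed" if verdict["fits_icp"] else "reject",
        "reason": verdict["reason"],
    }
```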
The result: more legitimate small tech companies getting through the pipeline. Fewer false rejections. Better qualification decisions — with less data in the prompt.
“But Claude Could Be Wrong Too”
The obvious objection: if Apollo’s data is unreliable, why is Claude’s knowledge any better? Its training data has a cutoff. Companies change size. New startups don’t exist in the training set.
Fair. But this isn’t about which source is more accurate. It’s about which source fails better.
Apollo’s failure mode is silent and confident. It returns “2 employees” with no uncertainty signal. Your pipeline treats it as fact. The hard filter rejects the lead. The LLM never sees it. A good company disappears from your funnel and nobody knows it happened. There’s no log entry that says “rejected due to bad enrichment data.” It just looks like a lead that didn’t qualify.
Claude’s failure mode is explicit and cheap. When Claude doesn’t recognize a company, it says so. The system’s response to “I don’t know this company” is: assume small, proceed. The lead moves forward. The worst case is spending an extra cycle evaluating a company that turns out to be outside ICP. That’s a low-cost mistake — a few cents of API time and a few seconds of pipeline processing.
Compare the costs:
- Apollo wrong → good lead permanently rejected → lost revenue, invisible to metrics
- Claude wrong → marginal lead gets evaluated → small cost, visible in pipeline
You’re not trusting Claude to be more accurate than Apollo. You’re trusting it to fail in a direction that costs less. Silent false rejections are expensive. Cheap false positives that get caught downstream are not.
This is the principle: when choosing between data sources in an AI pipeline, optimize for failure mode, not accuracy. The source with the cheaper, more visible failure mode is the safer default — even if its accuracy is comparable or slightly worse.
Why Less Data Made Smarter Decisions
This felt wrong at first. Isn’t the whole point of enrichment to give the AI more context? Isn’t more data always better?
No. And this is the lesson that generalizes beyond our specific use case.
Enrichment data has error rates. Every third-party data provider has accuracy gaps. Apollo, Clearbit, ZoomInfo — they all have stale records, missing fields, and incorrect attributions. For enterprise companies, the data is usually good. For small and mid-market companies, it’s a coin flip.
LLMs amplify data errors through anchoring. When you put a number in a prompt, the model treats it as a reference point. If the number is wrong, the model’s reasoning starts from a wrong place. Unlike a human analyst who might say “that number doesn’t seem right,” an LLM has no independent way to flag that an enrichment value contradicts its own knowledge — because the enrichment value is in the prompt, and the prompt is treated as authoritative context.
The chain of confidence is invisible. In a multi-layer pipeline, each layer assumes the previous layer’s output is trustworthy. The hard filter trusts Apollo’s number. Claude trusts whatever survived the hard filter plus whatever’s in the prompt. Nobody in the chain is asking “how confident am I in this data?” — the architecture assumes confidence implicitly.
This is the same problem that shows up in RAG pipelines when retrieved documents are wrong but the LLM treats them as ground truth. It’s the same problem in agentic workflows when a tool returns bad data and the agent builds its next action on top of it. The pattern is universal: upstream data errors don’t attenuate in AI pipelines — they amplify.
A Framework for Data Hygiene in Agent Workflows
After this experience, we now evaluate every data source in our agent pipelines against three questions:
1. What’s the error rate for my specific segment?
Apollo might have 95% accuracy for Fortune 500 companies and 60% accuracy for companies with 10-100 employees. If your ICP is the second group, that 60% isn’t a minor limitation — it’s a systematic bias in your pipeline. Know the error rate for your use case, not the provider’s headline accuracy.
2. Will the LLM anchor on this data or reason around it?
If you’re passing a number, the model will anchor on it. If you’re passing a category or description, the model has more room to reason independently. Employee count is a hard anchor — the model uses it as a reference point. Industry classification is a soft anchor — the model can cross-reference it against other signals.
The question is: if this data point is wrong, will the model still reach the right conclusion? If not, consider omitting it.
3. What’s the default when data is missing?
Missing data is inevitable with enrichment APIs. The question is what your pipeline does when a field is null. If the default is “reject” or “skip,” you’re systematically excluding the exact companies that are most likely to be underserved by the data provider — often small, new, or niche companies.
We changed our default from “reject if unknown” to “assume small and proceed.” The worst case is evaluating a few extra companies. The best case is catching the ones that every competitor’s pipeline is filtering out.
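One way to make that default explicit, and visible in the logs instead of silent, is to route every missing-size case through a single function. A sketch using the standard library logger; the names here are illustrative, not a prescribed API:

```python
import logging

logger = logging.getLogger("lead_pipeline")

def default_when_size_unknown(lead: dict) -> str:
    # Old default: return "reject" here, and the lead vanished without a trace.
    # New default: assume a small company (20-50 employees), log the assumption,
    # and let the qualification step make the call.
    logger.info(
        "employee_count missing or untrusted for %s; assuming small and proceeding",
        lead.get("domain", "<no domain>"),
    )
    return "proceed"
```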
The Broader Point
The AI agent ecosystem is moving fast. Every week there’s a new tool, a new API, a new enrichment provider to plug into your workflow. The instinct is to add more data sources, more layers, more signals.
But every data source you add is a dependency. And every dependency has an error rate. When those errors feed into an LLM, they don’t just pass through — they shape the model’s reasoning in ways that are hard to detect and harder to debug.
Sometimes the highest-leverage improvement to your AI pipeline isn’t adding a new data source. It’s removing one that’s doing more harm than good.
More data isn’t always better data. Especially when an AI is making decisions based on it.