Build Log #002: We Were Showing Fake Data With 85% Confidence Scores

During a code audit of our own AI platform, we found a function that fabricated property owner names when real data wasn't available. It displayed them with confidence badges. Here's the story, and why the fix matters more than the bug.

Build Log is our engineering journal — the real stories behind building production AI systems. Read #001 here.

What We Were Building

We run a production SaaS platform called LandPlanner.ai — instant site feasibility analysis for real estate professionals. You enter an address, and it pulls data from 15+ federal and state APIs (USDA, FEMA, USGS, NOAA, EPA, Census, and more), runs ML predictions, and generates a comprehensive site analysis report.

One component of the report is property ownership data — who owns the parcel, tax assessment values, that kind of thing. This data comes from county assessor records via a third-party API called Regrid.

Except sometimes the Regrid API isn't available. Maybe the API key isn't configured. Maybe the county doesn't have data. Maybe the service is down.

What should happen when the data source is unavailable? The answer seems obvious: show nothing. Display a message like "ownership data unavailable" and move on.

That is not what was happening.

What We Found

During a code audit, we found a function called _generate_fallback() in the property ownership service. When the real data source wasn't available, this function didn't return empty results. It fabricated them.

It generated fake owner names. Fake tax assessment values. Fake acquisition dates. And it attached a confidence score of 0.85 to all of it.

Let that sink in. A user runs a feasibility analysis on a real property. They get a report that says "Owner: James R. Mitchell, Tax Assessment: $342,000, Confidence: 85%." They make business decisions based on this. They call James R. Mitchell.

James R. Mitchell doesn't exist. The system made him up.

How This Happens

Before you think "who would write something this irresponsible" — understand how this kind of bug is born. It almost certainly started as a development placeholder. Someone needed the ownership card to render while building the UI. They wrote a fallback function that generated realistic-looking data so the frontend would display properly. Standard dev practice.

Then it shipped. The placeholder became production code. Nobody went back to remove it because it "worked" — the card always rendered, the UI looked complete, QA didn't flag it because the data looked plausible.

The confidence score is the worst part. It wasn't calculated from anything. It was hardcoded. 0.85. Just high enough to seem reliable but not suspiciously perfect. Someone actually thought about what number would look convincing.

This is the kind of thing that keeps me up at night about AI systems in production.

The Decision

We had three options:

  1. Improve the fallback. Use public records, name databases, or other data sources to make educated guesses. Label them as estimates.
  2. Keep the fallback but lower the confidence. Show the generated data but with a 0.20 confidence score and a "synthetic estimate" label.
  3. Remove it entirely. When real data isn't available, return nothing. Show "ownership data unavailable."

We chose option 3. Here's why.

Option 1 sounds reasonable until you think about what it means. You're building a system that guesses who owns a property. No matter how good your guessing gets, it's still a guess — and it's a guess your user will treat as fact because it appears in an "analysis report" generated by a "platform."

Option 2 is the coward's compromise. A 0.20 confidence score on fabricated data is still fabricated data. And users don't read confidence scores. They read names and numbers. "James R. Mitchell" with a tiny disclaimer is still "James R. Mitchell" in the user's mind.

Option 3 is the only honest answer. If you don't have the data, say you don't have the data. A blank section with "ownership data unavailable — connect Regrid API for property records" is infinitely more trustworthy than a convincing lie.

The Fix

Four lines of code. The _generate_fallback() function that previously returned fabricated names, values, and dates now returns (None, None, None). The frontend already handled null values gracefully — it just showed an empty state. The fallback function had been preventing that empty state from ever appearing.

Four lines. That's all it took. The hard part wasn't the code — it was finding it, understanding the implications, and making the decision.

Why This Matters Beyond Our Platform

This pattern is everywhere in AI systems. Not always this blatant, but the same fundamental mistake: filling gaps in data with plausible-looking fabrications instead of admitting uncertainty.

Every RAG system we build has to grapple with this. What happens when the retrieval step doesn't find relevant chunks? What happens when the LLM isn't confident in its answer? The temptation is always to generate something — because an empty response feels like a failure.

It's not. An empty response when you don't have the answer is the system working correctly. A fabricated response when you don't have the answer is the system lying to your users.
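The same principle in a RAG pipeline might look like this — `retrieve` and `generate` are hypothetical stand-ins for your retriever and LLM call, and the threshold is an assumption you'd tune:

```python
# Sketch of "I don't know" as a designed behavior, not a failure mode.
# If nothing clears the similarity threshold, say so explicitly instead
# of letting the model improvise over empty or irrelevant context.
def answer(question, retrieve, generate, min_score=0.75):
    # retrieve() yields (chunk, similarity_score) pairs
    hits = [(chunk, score) for chunk, score in retrieve(question) if score >= min_score]
    if not hits:
        return "I don't have that information."
    context = "\n\n".join(chunk for chunk, _ in hits)
    return generate(question, context)
```

The design choice is that the refusal path is the first branch, not an afterthought bolted on later.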

This is one of the core principles we build into every system at Deep Conduit:

"I don't know" is a feature, not a bug. Every AI system we deploy is designed to say "I don't have that information" when it doesn't. We'd rather show a gap than fill it with a confident lie. Your users will trust the system more when it admits its limits — not less.

The Checklist

After this incident, we added a step to every code audit and every new deployment:

  • Trace every data display back to its source. If a UI component shows data, can you trace exactly where that data came from? If the trace dead-ends at a "generate" or "fallback" function, that's a red flag.
  • Test with data sources disabled. Turn off each API, database, and external service one at a time. What does the system show? If it still shows "data," where is that data coming from?
  • Audit every confidence score. Is it calculated from something real (model output, similarity score, data completeness metric)? Or is it hardcoded? Hardcoded confidence scores are lies wearing a lab coat.
  • Check for "realistic" test data in production code. Grep for common placeholder patterns — fake names, fake addresses, sequential numbers. Dev fixtures that escape into production are a classic source of fabricated data.

The Uncomfortable Truth

We caught this because we were auditing our own code. We were specifically looking for problems. Most companies don't do this. Their AI systems have fallback functions generating plausible-looking data right now, and nobody knows.

If you're running AI systems in production — or buying AI services from someone — ask this question: "What does the system do when it doesn't have the answer?"

If the answer is anything other than "it tells you it doesn't have the answer," you have a problem you don't know about yet.


This is Build Log #002. We publish these because building trustworthy AI means being honest about the ways it can go wrong. If you want help auditing your AI systems or building ones that are honest by design, let's talk.