This is an entry in our build log — an honest engineering journal of building AI-powered tools for real estate and land analysis. Not the polished version. The actual story.
The Problem: A Minute of Staring at a Spinner
LandPlanner analyzes any US parcel across 41 data sources — soils, zoning, flood risk, demographics, crop economics, environmental hazards, the works. It's a lot of data. And until last week, fetching all of it took 58 seconds.
Fifty-eight seconds. In 2026. For a web app.
We knew it was slow. We'd been telling ourselves "we'll optimize it later" for weeks. Then we watched a session recording of someone search for a parcel, wait about 20 seconds, and close the tab. That's the kind of user research that hits different.
Step 1: Find Out Where the Time Actually Goes
The analyze endpoint in site_analysis.py is... substantial. 5,684 lines. It calls everything from USDA soil surveys to Walk Score to FEMA flood maps. And it does it in two sequential waterfalls — one block of ~20 services, then another block of ~25.
Sequential. One after another. Waiting for soils to finish before starting zoning. Waiting for zoning before demographics. Like doing your grocery shopping by driving to each farm individually.
We built a PerfTimer utility and wrapped every service call with timing. First real numbers:
- crop_economics: 9.0s — the USDA NASS API, looping over crops one at a time
- CropScape: 52s — yes, fifty-two seconds, on a service that was already broken
- demographics: 3.9s (uncached)
- topography: 3.3s (uncached)
- brownfields: 2.1s (uncached)
- Everything else: 0.1-1.0s each, but there are twenty of them
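We won't paste the real PerfTimer here, but a minimal sketch of the idea (the timed() name matches ours; everything else is illustrative) looks like this:

```python
import time
from contextlib import contextmanager

class PerfTimer:
    """Collects wall-clock timings for named service calls."""

    def __init__(self):
        self.timings = {}

    @contextmanager
    def timed(self, name):
        # Record how long the wrapped block takes, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name] = time.perf_counter() - start

    def report(self):
        # Slowest services first -- those are the ones worth fixing.
        return sorted(self.timings.items(), key=lambda kv: kv[1], reverse=True)
```

Wrap each service call in `with timer.timed("soils"): ...` and the report hands you a ranked list of offenders. Nothing clever, but it's the difference between guessing and knowing.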
The CropScape number jumped off the screen. 52 seconds for a single service? Turns out the USDA CropScape CDL API has been intermittently broken since early 2026 — returning "Error: Failed to get value" for everything. Our code was hitting it, getting an error, retrying with backoff, hitting it again, retrying again... a full retry storm against a dead endpoint. For every single analysis.
Step 2: Kill the Retry Storm
First commit: circuit breaker for CropScape. If the API returns an error, mark it as down and skip it for subsequent requests. Don't retry a service that's been broken for a month.
52 seconds → 2.7 seconds. One line of logic.
This is the kind of optimization that makes you want to throw your keyboard. We'd been burning almost a minute per request on a service that wasn't even returning data. The retry configuration was set to 2 attempts with exponential backoff — reasonable in isolation, disastrous when the upstream is a smoking crater.
Lesson: Retry logic without circuit breakers is just automated self-harm. If a service is down, retrying it faster doesn't make it less down.
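The pattern itself is tiny. A minimal sketch (the cooldown value and helper names here are assumptions, not our exact code):

```python
import time

class CircuitBreaker:
    """Skip calls to a service that has recently failed."""

    def __init__(self, cooldown_seconds=3600):
        self.cooldown = cooldown_seconds
        self.tripped_at = None

    def is_open(self):
        # "Open" means: do not attempt the call.
        if self.tripped_at is None:
            return False
        if time.time() - self.tripped_at > self.cooldown:
            self.tripped_at = None  # cooldown elapsed; allow one probe
            return False
        return True

    def record_failure(self):
        self.tripped_at = time.time()

def fetch_with_breaker(breaker, call):
    """Guarded fetch: skip if the breaker is open, trip it on failure."""
    if breaker.is_open():
        return None
    try:
        return call()
    except Exception:
        breaker.record_failure()
        return None
```

The first failure trips the breaker; every request after that skips the dead service instantly instead of burning 52 seconds rediscovering that it's dead.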
Step 3: Tighten Every Timeout
With CropScape handled, we audited every connector timeout. Twenty-three services had timeouts between 8 and 30 seconds. For services that typically respond in under a second, a 30-second timeout means a single slow response holds up the entire pipeline.
We dropped them all to 3-5 seconds and reduced max retries from 2 to 1. If a soil survey doesn't respond in 5 seconds, we're not going to get a better answer by waiting 25 more.
Step 4: Parallelize Everything
This is the big one. We added a parallel prefetch system using Python's ThreadPoolExecutor. At the top of the analyze function, before any sequential logic, we fire off 22 service calls simultaneously:
Soils, zoning, demographics, wetlands, topography, flood zones, utilities, hazards, air quality, noise, habitat, market data, Walk Score, climate risk, brownfields, construction costs, schools, CropScape, growing season, irrigation, traffic, and crop economics — all at once.
The results get cached in a prefetch dictionary. When the sequential code later "calls" each service, our PerfTimer.timed() wrapper checks the cache first. If the result is already there (and it usually is, because the prefetch started 5 seconds ago), it returns instantly.
There was a fun bug here: Python lambda closures in a loop don't capture the loop variable by value. They capture the reference. So all 22 prefetch lambdas were calling the last service in the list. Classic Python footgun. Fixed with default parameter binding: lambda s=service: s.fetch().
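The footgun is easy to reproduce in isolation:

```python
services = ["soils", "zoning", "flood"]

# The bug: every lambda closes over the same loop variable `service`,
# so once the loop finishes they all see its final value.
broken = [lambda: service for service in services]
assert [f() for f in broken] == ["flood", "flood", "flood"]

# The fix: default parameter values are evaluated at definition time,
# binding each lambda to the value `service` held on that iteration.
fixed = [lambda s=service: s for service in services]
assert [f() for f in fixed] == ["soils", "zoning", "flood"]
```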
Step 5: The Crop Economics Deep Dive
After parallelization, crop_economics was still the bottleneck at 9 seconds. The NASS API returns data one crop at a time, and our code was looping sequentially over each crop found on the parcel — corn, then soybeans, then wheat, then alfalfa.
Same fix, smaller scale: ThreadPoolExecutor inside the crop economics service, fetching all crops in parallel. A parcel with 4 crops now takes as long as the slowest crop lookup instead of the sum of all of them.
9 seconds → 2 seconds.
Step 6: Progressive Loading (The Perception Hack)
While we were doing real optimization, we also shipped a perception improvement. The moment we have a geocode result — parcel ID, address, acreage, coordinates — we show it. Immediately. Before any analysis starts.
Users see useful information in under a second, with a spinner and "Analyzing site data..." for the rest. It's the same total wait time, but the psychological difference is enormous. Seeing something vs. seeing nothing completely changes whether 15 seconds feels long.
The Results
Eight commits over two days:
- Timeout reduction (23 connectors, 8-30s → 3-5s)
- PerfTimer utility + timing wrappers
- CropScape circuit breaker (52s → 2.7s)
- NASS timeout reduction
- Parallel prefetch for 22 services
- Fixed missing imports for prefetch
- Fixed lambda closure scoping + lazy import shadowing
- Parallelized per-crop NASS lookups (9s → 2s)
Final timing on a real parcel (Herriman, Utah):
- Prefetch block: 5.5s (22 services in parallel)
- Extended payload: 3.5s (solar, crime, demographics)
- Crop economics: 2.0s
- SHAP explanation: 1.1s
- Total: 14.8 seconds
58 seconds → 14.8 seconds. Nearly a 4x speedup.
The theoretical floor is about 9 seconds, limited by the slowest external API (NASS). We could get closer with response caching (repeat searches would be sub-second) and SSE streaming (render cards as data arrives). Those are next.
What We Learned
The biggest gain came from the dumbest bug. Fifty-two seconds of retry storms against a dead USDA endpoint. That's not a performance optimization — that's just removing self-inflicted damage.
The second biggest gain came from something we should have done from day one: not calling 22 APIs sequentially when none of them depend on each other. This isn't clever engineering. It's embarrassingly obvious in hindsight.
The actual clever bits — the prefetch cache pattern, the timing infrastructure, the nested parallelism for crop lookups — those are satisfying. But they account for maybe 30% of the improvement. The other 70% was just... stopping doing dumb things.
That's performance optimization in a nutshell, honestly. Most of the time, your code isn't slow because it needs a clever algorithm. It's slow because it's doing something pointless, and you haven't looked closely enough to notice.
Lesson: Before you parallelize, profile. Before you cache, profile. Before you rewrite, profile. Half of our speedup came from deleting bad behavior, not adding good behavior. The PerfTimer was the most important commit in the entire series — not because of what it did, but because of what it showed us.
This is Build Log #007. We publish these as we build — the real engineering stories behind production AI and data systems. If you're dealing with slow pipelines, external API spaghetti, or just want to talk about making things faster, we'd love to hear from you.