
Build Log #012: Why Every API Call Took 69 Seconds (It Was the Corn)

We enriched our site analysis with USDA crop economics data. Then every single API call started taking over a minute. The fix was four changes, one of which was embarrassingly obvious.

Build logs are our honest engineering journal. Not the polished case study — the actual "why is everything slow" reality of building data systems at scale.

The Feature That Ate Our Latency Budget

Our site analysis API hits over 40 data sources per call — FEMA flood zones, USGS seismic data, NREL solar irradiance, soil surveys, the works. A few months ago we added agricultural features: crop economics, suitability scores, growing season data, conservation eligibility. Real farming intelligence sourced from the USDA's NASS QuickStats API.

The ag data was good. Really good, actually. Yield per acre, price per bushel, planted acreage — commodity by commodity, county by county. It's the kind of data that makes a feasibility report actually useful to someone evaluating agricultural land.

There was just one problem: it was making every API call take 69 to 84 seconds.

The Sequential Sin

Here's how nass_quickstats.py worked when we first wrote it:

  1. Look up the state and county for the coordinates
  2. Query NASS for corn yield. Wait for response.
  3. Query NASS for corn price. Wait for response.
  4. Query NASS for corn acreage. Wait for response.
  5. Query NASS for soybeans yield. Wait.
  6. Query NASS for soybeans price. Wait.
  7. Query NASS for soybeans acreage. Wait.
  8. Repeat for wheat, hay, cotton, rice...

Each query to NASS took 2–4 seconds. For a county growing 6–8 commodities, that's 18–24 individual HTTP requests, executed one at a time. In series. Like it's 2003 and we've never heard of concurrency.

The NASS API doesn't support batch queries. You can't ask "give me everything for Polk County, Iowa" — you ask for one statistic at a time. So every commodity needed three separate round trips: yield, price, acreage. And we were making them sequentially because that's how the prototype was written and nobody went back to fix it.
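The original loop looked roughly like this (a reconstruction for illustration, not the exact code — `fetch_stat` stands in for one NASS HTTP round trip, with the 2–4 second latency scaled down):

```python
import time

def fetch_stat(state, county, commodity, stat_type):
    """Stand-in for one NASS QuickStats round trip (2-4s in production)."""
    time.sleep(0.01)  # simulated network latency, scaled down
    return {"commodity": commodity, "stat": stat_type, "value": 42.0}

def fetch_crop_economics_sequential(state, county, commodities):
    """One request at a time: total time is the SUM of every round trip."""
    results = {}
    for commodity in commodities:
        for stat_type in ["yield", "price", "acreage"]:
            results[(commodity, stat_type)] = fetch_stat(
                state, county, commodity, stat_type
            )
    return results
```

With 8 commodities and 3 stats each, that inner body runs 24 times, and every iteration blocks on the network before the next one starts.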

Here's the really painful part: this was happening on every single analysis call. Running a feasibility report on a suburban lot in Phoenix? We were still querying NASS for crop data in Maricopa County. A parking lot in Manhattan? Here, let us check what corn is doing in New York County. It always came back empty, but it still took 15–20 seconds to confirm that, yes, there is no agriculture happening in midtown.

Four Fixes, One Afternoon

Once we actually profiled the endpoint (something we should have done weeks ago), the fix was obvious. Well — four fixes.

Fix 1: Redis Cache (30-Day TTL)

Crop economics data doesn't change daily. Corn yields in Story County, Iowa, are published annually. There's zero reason to hit the NASS API every time someone analyzes a parcel in the same county.

cache_key = f"nass:{state}:{county}:{commodity}"
cached = redis.get(cache_key)
if cached:
    return json.loads(cached)
# ... actual API call ...
redis.setex(cache_key, 86400 * 30, json.dumps(result))

30-day TTL. The data is annual at best. If anything, 30 days is conservative. First call to a county still pays the API cost. Every subsequent call for any parcel in that county? Instant.

Fix 2: Parallel Queries

For uncached commodities, we stopped waiting for corn to finish before asking about soybeans.

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {
        pool.submit(fetch_stat, state, county, commodity, stat_type): (commodity, stat_type)
        for commodity in commodities
        for stat_type in ["yield", "price", "acreage"]
    }
    # collect results; .result() blocks only until that one query finishes
    results = {futures[f]: f.result() for f in futures}

All queries for all commodities fire simultaneously. Total wait time is now the duration of the slowest single query, not the sum of all queries. For 8 commodities × 3 stats = 24 queries, this alone turned 60+ seconds into 3–5 seconds.

Fix 3: Aggressive Timeouts, No Retry Sleeping

The original code had a 10-second timeout per request with a retry that slept for 2 seconds between attempts. Government APIs are flaky — that's just reality — but sleeping inside a synchronous loop was compounding the problem. We dropped the timeout to 3 seconds and removed the retry sleep entirely. If NASS doesn't respond in 3 seconds, we move on. The data is nice to have, not mission-critical. A feasibility report without soybean prices is still a feasibility report.
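The tightened request path looks roughly like this. It's a sketch using the stdlib's `urllib` as a stand-in for whatever HTTP client the real module uses; the injectable `opener` parameter is ours, added so the behavior is testable without the network. The URL is the public QuickStats base endpoint:

```python
import json
import urllib.error
import urllib.parse
import urllib.request

NASS_URL = "https://quickstats.nass.usda.gov/api/api_GET/"

def fetch_stat(params, opener=urllib.request.urlopen):
    """One NASS query: hard 3-second timeout, no retry, no sleep."""
    url = NASS_URL + "?" + urllib.parse.urlencode(params)
    try:
        with opener(url, timeout=3) as resp:
            return json.loads(resp.read())
    except (urllib.error.URLError, TimeoutError, ValueError):
        return None  # treat slow or flaky as missing; don't block the report
```

Returning `None` instead of retrying is the whole point: the caller already treats every stat as optional, so a miss degrades the report instead of delaying it.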

Fix 4: The Obvious One

Don't query crop data for non-agricultural parcels.

if parcel.zoning_class not in AGRICULTURAL_ZONES:
    return {}  # skip NASS entirely

This is the one that's embarrassing. We have zoning classification data. We know whether a parcel is zoned agricultural, residential, commercial, industrial. We were querying USDA crop economics for every single parcel regardless of zoning. Shopping malls. Apartment complexes. Office parks. All of them dutifully waiting 15+ seconds to learn that, shockingly, there is no corn growing in the parking lot of a Target.

One if statement. That's the fix for the majority of our users' experience, because most feasibility analyses aren't for farmland.
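Putting the four fixes together, the entry point now runs the cheapest check first. A condensed sketch — the zoning codes in `AGRICULTURAL_ZONES` are illustrative, the cache is a plain dict standing in for Redis (TTL elided), and `fetch_stat` is injected so the flow is testable:

```python
from concurrent.futures import ThreadPoolExecutor

AGRICULTURAL_ZONES = {"A-1", "A-2", "AG"}  # illustrative zoning codes

def crop_economics(zoning_class, state, county, cache, fetch_stat,
                   commodities=("corn", "soybeans", "wheat")):
    """Cheapest check first: zoning, then cache, then parallel fetch."""
    if zoning_class not in AGRICULTURAL_ZONES:      # Fix 4: skip NASS entirely
        return {}
    key = f"nass:{state}:{county}"
    if key in cache:                                # Fix 1: cache hit is free
        return cache[key]
    with ThreadPoolExecutor(max_workers=8) as pool:  # Fix 2: fire all at once
        futures = {
            pool.submit(fetch_stat, state, county, c, s): (c, s)
            for c in commodities
            for s in ["yield", "price", "acreage"]
        }
        result = {futures[f]: f.result() for f in futures}
    cache[key] = result
    return result
```

The ordering matters: the zoning check costs nothing and short-circuits the majority of traffic, the cache handles repeat counties, and only genuinely new agricultural counties pay the parallel-fetch cost.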

The Results

Scenario                         Before     After
Suburban residential             69–84s     ~0s (skipped)
Agricultural, cached county      69–84s     ~0s (cache hit)
Agricultural, uncached county    69–84s     3–5s

The majority of our API calls are for non-agricultural parcels. For those users, NASS went from a 60+ second tax on every request to literally zero. For the ag users — who actually care about crop data — the first call to a county takes 3–5 seconds (parallel queries) and every subsequent call in that county is instant (cache).

Total endpoint time dropped from 69–84 seconds back to the 11–15 second range we'd achieved after the spatial index fix. We gave back every second of that hard-won optimization by bolting on a feature without thinking about its performance characteristics.

Why This Keeps Happening

If you've been following these build logs, you might notice a pattern: we build a feature, it works, we ship it, and then weeks later we discover it's been silently murdering performance. The spatial index. The backup retention. Now NASS.

The root cause is the same every time: we wrote the naive version first — sequential queries, no caching, no short-circuiting — and then moved on to the next feature before optimizing the last one. In a two-person operation where the priority is always "build the next thing," performance work feels like a luxury. Until it isn't.

The lesson isn't "always optimize everything." We'd never ship anything. The lesson is: when you add a feature that calls an external API in a loop, put a timer on it and look at the number. If we'd added a single logger.info(f"NASS total: {elapsed:.1f}s") when we deployed the ag features, we'd have caught this immediately. Instead, it sat there for weeks, making every API call take over a minute, and we just... didn't notice. Because we weren't measuring.
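The guard we mean is genuinely one small helper. A sketch of a timing context manager (it could just as easily be a decorator) that makes the `logger.info` line above a habit instead of an afterthought:

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)

@contextmanager
def timed(label):
    """Log wall-clock time for a block so slow external calls can't hide."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info(f"{label} total: {time.perf_counter() - start:.1f}s")

# usage:
# with timed("NASS"):
#     results = fetch_crop_economics(state, county, commodities)
```

Wrap every external-API section in one of these and the "69 seconds of corn" class of bug shows up in the logs on day one.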

If you're not timing your external API calls, you don't know how fast your application is. You know how fast it was when you last checked.

This is Build Log #012. We publish these as we build — the real engineering stories behind production data systems. No, we're not embarrassed about the zoning check. Okay, a little. If you're building something that calls government APIs and want to commiserate about latency, get in touch.