Methodology
How we measure AI exposure
So, if you're looking at /live and wondering how I'm scoring AI exposure on real job listings, this is the page that explains it. It's not magic and it's not opinion. It's three datasets joined together. I'll show you the joins, the matching logic, and the bits I haven't fixed yet.
The dataset that does the heavy lifting
The Anthropic Economic Index is the bit that makes this possible. Anthropic publishes (yes, publishes) on Hugging Face a sample of what people are using Claude.ai for, tagged to occupations.
They take a sample of conversations, classify each one against O*NET's task statements (the granular tasks that make up a job), and aggregate up to the occupation level.
The number they publish, observed_exposure, is the share of conversations matched to that occupation's tasks.
So when AEI says Customer Service Representatives = 0.70, it means 70% of the work that occupation does shows up in Claude conversations. That's a measurement, not a prediction.
The current release is from 24 March 2026, with data sampled between 5 and 12 February 2026, covering 756 occupations. The median exposure across all of them is zero: most jobs don't have any measurable Claude usage. The top is Computer Programmers at 0.745, followed by Customer Service Representatives at 0.701.
I credit Anthropic generously here because what they've published is unusual. Most companies measuring AI usage keep that data to themselves. Anthropic has made it CC-BY and put it on Hugging Face with documentation. Pull the file yourself: job_exposure.csv is 37 KB and indexed by SOC code.
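Loading the file is genuinely this simple. A minimal sketch (the column names soc_code and observed_exposure are assumptions from my copy; check the Hugging Face dataset card for the real schema, and note a naive comma split breaks on quoted occupation names):

```javascript
// Inline sample standing in for job_exposure.csv.
const csv = `soc_code,occupation,observed_exposure
15-1251,Computer Programmers,0.745
43-4051,Customer Service Representatives,0.701`;

const [header, ...rows] = csv.trim().split("\n");
const cols = header.split(",");

// Build a SOC-code → exposure lookup.
const bySoc = new Map(
  rows.map((row) => {
    const rec = Object.fromEntries(row.split(",").map((v, i) => [cols[i], v]));
    return [rec.soc_code, Number(rec.observed_exposure)];
  })
);

bySoc.get("43-4051"); // 0.701
```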
What's a SOC code?
The Standard Occupational Classification is the US government's taxonomy for jobs. Every occupation gets a 7-character code like 15-1252 (Software Developers) or 43-4051 (Customer Service Representatives).
The full taxonomy has 867 entries, organised into 23 major groups. Boring, but it's the lingua franca of labour-market data.
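The format is rigid enough to validate with a one-line regex, and the major group is just the first two digits. A sketch (helper names are mine, not from the pipeline):

```javascript
// A SOC code is 7 characters: 2-digit major group, hyphen, 4-digit detail.
const SOC_RE = /^\d{2}-\d{4}$/;

function majorGroup(soc) {
  if (!SOC_RE.test(soc)) throw new Error(`not a SOC code: ${soc}`);
  return soc.slice(0, 2); // e.g. "15" is the Computer and Mathematical group
}

majorGroup("15-1252"); // "15"
SOC_RE.test("manager"); // false
```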
AEI is keyed on SOC codes. So is the BLS Occupational Employment and Wage Statistics survey, the OECD's AI exposure work, Felten/Raj/Seamans' AIOE, basically every serious labour-market dataset. If you want to join job-listings data to any of this, you need to map your job titles to SOC codes. That's the hard bit.
Matching messy job titles to SOC codes
O*NET publishes two title files we use as a crosswalk:
- Sample of Reported Titles, ~7,950 commonly reported titles → SOC. Curated, higher quality.
- Alternate Titles, ~57,500 lay alternative titles → SOC. Broader coverage, more noise.
So that's about 65,000 title→SOC mappings to start with. Both files are at onetcenter.org, current release O*NET 30.2 (April 2026).
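Merging the two files into one lookup is straightforward as long as the curated file wins on conflicts. A sketch with inline stand-in rows (the real files are tab-separated with more columns, and the conflicting SOC here is hypothetical):

```javascript
// Stand-ins for the two O*NET title files, as [title, soc] pairs.
const reported = [["software developers", "15-1252"]]; // curated, higher quality
const alternate = [
  ["software developers", "15-1299"], // hypothetical conflicting row
  ["full stack developer", "15-1252"],
];

// Insert the noisy file first, the curated file last, so curated rows win.
const lookup = new Map();
for (const [title, soc] of alternate) lookup.set(title, soc);
for (const [title, soc] of reported) lookup.set(title, soc);

lookup.get("software developers"); // "15-1252", the curated mapping
```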
But real-world job titles are far messier than O*NET's catalogue. We have things like "Senior FullStack Engineer: Offsite Discovery" and "Manager, Field Engineering" and "Member of Technical Staff - Multimodal - MAI Superintelligence Team." The exact-match strategy gets you maybe 15% coverage on a real dataset.
So I run a cascade of progressively-fuzzier candidate strings against the lookup map, in priority order:
- The full normalised title (lowercase, strip punctuation/parentheticals/level modifiers).
- Substring before the first comma, handles "Account Executive, Foo Bar".
- Substring before the first dash, pipe, or slash, handles "Engineer - AI Platform".
- Substring before the first parenthesis, handles "Engineer (Remote)".
- Tail-noun: last 2, 3 or 4 words, handles "Senior Backend Software Engineer" → "software engineer".
- Head-noun: first 2, 3, 4 or 5 words.
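The cascade above can be sketched in a few lines (helper names and the exact modifier list are mine; the real script's normalisation is fussier):

```javascript
// Lowercase, strip parentheticals, level modifiers, and punctuation.
function normalize(title) {
  return title
    .toLowerCase()
    .replace(/\(.*?\)/g, " ")
    .replace(/\b(senior|sr|junior|jr|lead|staff|principal)\b/g, " ")
    .replace(/[^a-z\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Produce candidate lookup keys in priority order, deduplicated.
function candidates(rawTitle) {
  const out = [];
  const push = (s) => {
    const t = normalize(s);
    if (t && !out.includes(t)) out.push(t);
  };
  push(rawTitle);                     // 1. full normalised title
  push(rawTitle.split(",")[0]);       // 2. before the first comma
  push(rawTitle.split(/[-|/]/)[0]);   // 3. before dash, pipe, or slash
  push(rawTitle.split("(")[0]);       // 4. before the first parenthesis
  const words = normalize(rawTitle).split(" ");
  for (const n of [2, 3, 4]) push(words.slice(-n).join(" "));      // 5. tail-noun
  for (const n of [2, 3, 4, 5]) push(words.slice(0, n).join(" ")); // 6. head-noun
  return out;
}

candidates("Senior Backend Software Engineer"); // includes "software engineer"
```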
On top of that, there's a small manual override table for modern tech titles that don't exist in O*NET yet: things like "backend engineer", "site reliability engineer", "ML engineer", "member of technical staff." O*NET's taxonomy has a structural lag of a few years.
Two important guardrails:
- Single-word lookup keys are blocked, otherwise "manager" maps to one of the SOC entries that happens to have just "manager" in its alias list, and suddenly half your dataset is "Property Managers." Found that one in production.
- Non-English titles (German, French, Portuguese, mostly automotive internships) are skipped cleanly rather than mis-matched.
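Both guardrails sit in front of the lookup, not inside the cascade. A sketch, assuming the candidate keys have already been generated (the accented-character regex is a crude stand-in for whatever non-English detection the real script uses):

```javascript
// Crude heuristic for the German/French/Portuguese titles we see.
const NON_ENGLISH = /[äöüßàâçéèêëîïôùúûœãõñ]/i;

function matchSoc(rawTitle, lookup, candidateKeys) {
  if (NON_ENGLISH.test(rawTitle)) return null; // skip cleanly, never mis-match
  for (const key of candidateKeys) {
    if (!key.includes(" ")) continue;          // block single-word keys ("manager" trap)
    const soc = lookup.get(key);
    if (soc) return soc;
  }
  return null;
}

const lookup = new Map([["software engineer", "15-1252"]]);
matchSoc("Manager", lookup, ["manager"]);                            // null: single-word key blocked
matchSoc("Senior Software Engineer", lookup, ["software engineer"]); // "15-1252"
```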
How well does it work?
Right now, of 15,290 enriched job listings:
- 9,126 have a clean SOC match and an AEI exposure score (~60%).
- The matched jobs span 291 distinct occupations.
- Roughly 6% of listings are non-English and skipped without trying.
- The remainder are specialty titles that don't have an obvious O*NET equivalent: videogame testers, motorsport engineers, "Cinematic Lighter," that kind of thing.
For those niche misses, an AI fallback (asking Claude to map title → SOC) would push coverage above 90%. That's a future workstream. For now I'd rather have 60% clean than 90% with garbage in it.
Known limitations
Three things I want to be upfront about.
Exposure is not substitution. A high score means Claude is being used for that occupation's tasks; it doesn't mean those workers are being replaced. Klarna learned this in 2024–25 with their customer service reversal (replaced 700 agents in early 2024, quietly walked it back in mid-2025 when CSAT collapsed). The AEI exposure score is a usage measurement, not a labour-market forecast.
Skill exposure ≠ occupation exposure. The skill "customer service" appears in 262 listings across our dataset, weighted exposure roughly 0.14, because most of those listings are sales reps and account managers, not actual CSR roles. The occupation "Customer Service Representatives" is at 0.70 across 79 dedicated jobs. Both numbers are real, both are useful, both are on the /live page. Just don't conflate them.
The AEI sample skews toward Claude users. Anthropic's data is from people who chose to use Claude.ai, not a representative sample of the entire workforce. So an occupation could have low exposure here and still be heavily exposed to ChatGPT, Gemini, or in-house enterprise AI. AEI is the best public dataset I have. It is not the whole story.
The code
The matching pipeline runs locally as a one-shot backfill script (Node, ~250 lines). It reads the O*NET title files, builds the candidate-cascade lookup, queries the live D1 database for jobs with no SOC code, runs every job through the cascade, and writes the matches back in batched UPDATEs. Then two Cloudflare Worker endpoints (/stats/skill-ai-exposure and /stats/occupation-ai-exposure) serve the aggregate stats to the /live page.
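The occupation aggregate the Worker serves is, conceptually, a GROUP BY over the joined rows. Sketched here as plain JS rather than D1 SQL (field names and the 0.26 figure are illustrative assumptions, not values from the dataset):

```javascript
// Stand-in for jobs already joined to their AEI exposure scores.
const jobs = [
  { soc: "43-4051", occupation: "Customer Service Representatives", exposure: 0.701 },
  { soc: "43-4051", occupation: "Customer Service Representatives", exposure: 0.701 },
  { soc: "15-1252", occupation: "Software Developers", exposure: 0.26 },
];

// Group by SOC code, count listings, sort by exposure descending.
function occupationStats(rows) {
  const agg = new Map();
  for (const { soc, occupation, exposure } of rows) {
    const cur = agg.get(soc) ?? { soc, occupation, exposure, jobCount: 0 };
    cur.jobCount += 1;
    agg.set(soc, cur);
  }
  return [...agg.values()].sort((a, b) => b.exposure - a.exposure);
}

occupationStats(jobs); // CSRs first (0.701, 2 jobs), then Software Developers
```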
The data files are vendored at v2/data/ in the repo so the analysis is reproducible. AEI is updated quarterly-ish; I'll re-run the backfill when the next release drops.
Sources, in full
- Anthropic Economic Index, release 2026-03-24, data window 5–12 February 2026. Hugging Face dataset · March 2026 report ("Learning curves") · January 2026 report ("Economic primitives").
- O*NET 30.2 (April 2026), Sample of Reported Titles + Alternate Titles. onetcenter.org.
- Felten, Raj, Seamans (2021), AI Occupational Exposure (AIOE), the original SOC-indexed exposure dataset. Used as a cross-validation reference. GitHub repo.
- WEF Future of Jobs Report 2025, PwC Global AI Jobs Barometer 2025, used as the framework for the wage-premium and skill-churn callouts on /live.
- YubHub job listings, 15,290 enriched listings across 814 companies, scraped and AI-enriched continuously. The dataset behind the joins.
Spotted something wrong? Found a better dataset? Tell me.