Frontier models have eaten the public internet. The next 10× in capability does not come from another trillion text tokens — it comes from data the internet never had: pixels from orbit, lidar point clouds from real roads, glucose curves from real bodies, vibration spectra from real bearings, neural spikes from real cortex, mass spectra from real proteins, AIS pings from real ships. Whoever owns the network that generates that data, or the instrument that resolves that data, owns a moat that scaling laws cannot dissolve.
The investable shape of this thesis splits into two categories with very different return profiles. Sensor monopolies (Sony image sensors, KLA semiconductor metrology, Lasertec EUV inspection) are mature, defensible, but mostly priced — KLA is already +182% YoY. Proprietary real-world dataset owners (Samsara fleet telematics, Mobileye REM maps, Tesla FSD video, Tempus oncology multimodal, Verisk insurance loss data) are less obvious, more contested, and where the asymmetric AGI optionality lives. The market has noticed Planet Labs (+1,478% YoY!) but has not yet noticed Samsara or Tempus.
Not all sensor data is equally scarce. Ranking by hard-to-replicate × AGI training value:
| Data type | Why scarce | AGI utility | Who owns the moat |
|---|---|---|---|
| Longitudinal real-world driving video + labels | Requires fleet of millions of cameras over years; can't be synthesized convincingly (edge cases) | World model training, robotics policy, sim2real grounding | Tesla (FSD fleet), Mobileye (REM), Waymo |
| Industrial fleet telematics (long-tail edge cases) | Requires installed base of millions of trucks/equipment over a decade | Embodied AI for logistics & operations, predictive maintenance models | Samsara (IOT), PCAR/CMI dealer networks |
| Multimodal clinical (genome + imaging + EMR + outcome) | HIPAA / regulatory; consent; reimbursement infra | Drug discovery foundation models, diagnostic copilots | Tempus (TEM), Flatiron (Roche), Foundation Medicine |
| Daily-revisit high-resolution Earth imagery | Capital + launch + downlink + dwell time; ITAR for <30cm | Geospatial foundation models, defense intel, climate, commodities | Planet (PL), BlackSky (BKSY), Maxar/Vantor (private), Capella (private) |
| Continuous physiological signals (CGM, HR, sleep) | Installed-base wearables; FDA-cleared accuracy | Health agent foundation models, longevity, drug efficacy | Dexcom (DXCM), Abbott (ABT), Apple, Garmin (GRMN), Oura (private), Whoop (private) |
| Atomic-scale process metrology (semis) | Decades of process know-how, $50M+ tools, customer co-development | Closing the loop on chip self-improvement (RSI requires this) | KLA (KLAC), Lasertec (6920.T), ASML's own metrology |
| High-channel-count neural spikes | FDA implants; surgical pipeline; multi-year safety data | Direct neural readout/control; the ultimate human-data primitive | Neuralink, Synchron, Paradromics (all private) |
| Real-time commodity flow (AIS+terminals+pipelines) | Decades of relationships with terminals, satellite AIS, vessel taxonomy | Trading agents, macro models | Kpler (private), Vortexa (private), S&P Commodity Insights |
| Mass spectra / single-cell omics | Specialized instruments + sample prep + chain of custody | Cell models, drug screening, protein design | Thermo (TMO), Danaher (DHR), Bruker (BRKR), 10x Genomics |
| Real-time financial tape | Exchange monopoly; consent of issuers | Execution agents, alpha-bearing data; AI license revenue | CME, ICE, MSCI, S&P (SPGI) |
Seven sub-markets, each with its own supply-demand shape:
| Sub-market | 2026 supply | 2028E demand if AGI hits | Gap | Pricing |
|---|---|---|---|---|
| Daily-revisit sub-meter EO | ~100 sats, 1-2 daily revisits at sub-1m globally | Continuous global coverage at 30cm with on-board AI — 10× today | Large | Partially priced (PL +1,478%) |
| L2+/L3 driving video (labeled) | Tesla & Mobileye combined ~10M cars producing | Every robotaxi fleet needs its own; data brokering emerges | Medium | Not priced for MBLY (-36% YoY) |
| Multimodal oncology foundation data | Tempus ~9M records; Flatiron similar | Every major pharma needs proprietary multimodal at scale | Large | Not priced (TEM -26% YoY) |
| Fleet telematics (commercial) | Samsara ~2M+ assets; legacy Trimble/Geotab fragmented | Embodied AI agents need operational ground truth at this scale | Medium | Not priced (IOT -26% YoY) |
| EUV mask + chip inspection | Lasertec sole vendor for EUV pellicle/mask inspection; KLA dominates wafer | Every new fab, every new node, every leading-edge customer; tools = bottleneck | Large | Fully priced (KLAC +182% YoY) |
| CGM continuous glucose data | ~10M users (Dexcom + Abbott) | 100M+ as health agents + GLP-1 era normalize CGM | Medium | Partially priced |
| Implanted neural recording | <100 patients globally across all programs | Thousands by 2028, millions long-term | Massive | Private only; pre-revenue |
| Ticker | Mkt Cap (5/26/26) | 1Y | Category | Moat (data or sensor) | Key risk | Priced-in? |
|---|---|---|---|---|---|---|
| IOT Samsara | $18.2B | -26% | Industrial telemetry | Largest connected operations dataset; 2M+ assets generating 12T+ data points/year; ARR ~$1.5B; net retention >115%. Hard to displace because data is multi-modal (video + CAN bus + GPS + driver behavior) per vehicle | Macro freight cycle; continued PLTR-like multiple compression | No |
| MBLY Mobileye | $8.4B | -36% | AV driving data | REM crowdsourced HD map from ~200M+ vehicles globally with EyeQ chips; only competitor to Tesla in real-world driving corpus; mature OEM relationships | Tesla wins outright; China supply (Hesai+local) eats ADAS; Intel parent overhang | No |
| TEM Tempus AI | $8.4B | -26% | Genomic + clinical | ~9M de-identified multimodal records (genomic + imaging + EMR); AstraZeneca/Pathos $200M+ deal to build largest oncology foundation model; recurring data licensing revenue | Burn rate; HIPAA/consent challenges; legacy lab competition | No |
| PL Planet Labs | $17.2B | +1,478% | EO satellites | ~200 sats; ~17yr global daily archive; Pelican 30cm; ran AI inference on-orbit April 2026; defense/intel multi-year deals | Already priced; capacity dilution; competition from Maxar/Vantor | Yes — richly |
| BKSY BlackSky | ~$1.2B (est) | large gain | EO satellites | Gen-3 sub-meter, fast revisit; sovereign defense contracts (recent eight-figure international contract) | Smaller fleet vs PL; ITAR limits scale | Partial |
| RKLB Rocket Lab | $82.9B | +776% | Launch + manufacture | Vertically integrated EO infra play; will be the launcher for new sensor constellations | Fully priced; Neutron execution; SpaceX dominance | Yes |
| KLAC KLA | $262.7B | +183% | Semi process metrology | ~50% wafer inspection share; required for every new fab/node; AGI needs leading-edge chips and KLA closes the loop | Fully priced; cyclical; China export controls | Yes |
| 6920.T Lasertec | ~$8-10B (est) | volatile | EUV mask inspection | De-facto monopoly on EUV photomask actinic inspection (ACTIS); every leading-edge mask shop needs one | Concentrated customer base (TSMC, Samsung, Intel); China export controls | Partial |
| 6758.T Sony | ~$135B (~21T JPY) | -13% | Image sensor | ~50% CMOS image sensor share; iPhone + auto + industrial; AI vision sensor (IMX500) | Samsung competition; price erosion; consumer slowdown | Partial |
| TMO Thermo Fisher | ~$200B (est) | flat | Scientific instruments | Broadest catalog of instruments + consumables; mass spec, EM, sequencing-adjacent; AI for science requires all of it | Mature; not a pure AI-data play | Mostly |
| DHR Danaher | ~$190B (est) | flat | Bio instruments | Beckman Coulter, Cytiva, Leica, SCIEX, Pall, IDT — entire stack of bio data generation | Mature; bioprocess inventory cycle | Mostly |
| VRSK Verisk | ~$45B (est) | flat | Insurance loss data | ~50yr U.S. property/casualty claims database; unreplicable; every insurer pays for it | Slow growth; not headline-AGI | Partial |
| TRMB Trimble | $14.7B | flat | Geospatial + construction | Connected jobsite data; SketchUp+Claude integration; ARR $2.4B +12% YoY; consumption-based AI pricing rolling out | Hardware cyclical; construction macro | Partial |
| HXGBY Hexagon | ~$30B (est) | flat | Geospatial + metrology | Reality capture (point clouds at industrial scale); metrology; airborne lidar; data + sensor | European complexity; M&A heavy | Partial |
| HSAI Hesai | $3.4B | +55% | Lidar | Global lidar volume leader; 471k unit Q1; 4M unit 2026 capacity; cost down 99.5% in 8yr | Commodity — competition crushed Luminar; geopolitics | Partial |
| DXCM Dexcom | ~$30B (est) | down | CGM data | Continuous glucose monitor leader; data feeds health agents; FDA-cleared accuracy | Abbott competition; GLP-1 CGM-cannibalization debate | Partial |
| SPGI/MSCI/ICE/CME | $80-150B each | flat-up | Financial data | Proprietary tape and indices; high-margin AI licensing tailwind | Regulatory; already priced as quality compounders | Yes |
| TSLA Tesla | ~$1.3T (est) | volatile | FSD video | Largest real-world driving video corpus on earth; Dojo training | Not pure-play; valuation; Musk key-man | Mostly |
| GOOGL Waymo (via Alphabet) | ~$2T parent | + | AV + maps + YouTube | Waymo's calibrated multi-sensor stack + Maps + Street View + YouTube video corpus = arguably the best real-world dataset | Not pure-play; antitrust | Yes |
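
Two figures in the table above are easier to judge once converted into rates. The sketch below is illustrative arithmetic only. It takes the Hesai row's "cost down 99.5% in 8yr" and the Samsara row's "2M+ assets generating 12T+ data points/year" as inputs, and it assumes a smooth compounding decline and an even spread of data across assets. Neither assumption is stated in the source, and the function names are hypothetical.

```python
# Sanity-check arithmetic on two figures quoted in the table above.
# Inputs come from the table; the smooth-decline and even-spread
# assumptions are illustrative simplifications, not reported facts.

SECONDS_PER_YEAR = 365 * 24 * 3600


def annualized_decline(total_decline: float, years: float) -> float:
    """Constant yearly decline rate that compounds to total_decline."""
    remaining = 1.0 - total_decline
    return 1.0 - remaining ** (1.0 / years)


def points_per_asset_per_year(total_points: float, assets: float) -> float:
    """Average data points per asset per year, assuming an even spread."""
    return total_points / assets


# Hesai: cost down 99.5% over 8 years (table figure).
hesai_rate = annualized_decline(total_decline=0.995, years=8)

# Samsara: 12T+ data points/year over 2M+ assets (table figures, both floors).
samsara_points = points_per_asset_per_year(total_points=12e12, assets=2e6)
seconds_between_points = SECONDS_PER_YEAR / samsara_points

print(f"Hesai implied cost decline per year: {hesai_rate:.1%}")
print(f"Samsara points per asset per year: {samsara_points:,.0f}")
print(f"Samsara average seconds between points: {seconds_between_points:.1f}")
```

Under these assumptions the lidar cost curve works out to roughly a 48% decline per year, and the Samsara figures imply about one data point per asset every five seconds. Both source figures carry a "+", so the per-asset rate is a rough average rather than a measured value.
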
| Company | Stage / last raise | Why it matters | Access path |
|---|---|---|---|
| Neuralink | $650M Series E, $9B post (June 2025) | Highest-bandwidth implantable BCI; the "Sony of the brain"; if even partially commercial, the dataset is unique on earth | Forge/EquityZen secondary; SpaceX-style hold-and-pray |
| Synchron | $200M Series D, late 2025; ~$345M cum. | Endovascular Stentrode = lower surgical burden than Neuralink; faster regulatory path | Difficult; check institutional secondaries |
| Paradromics | $100M+ VC + $18M grants | High-throughput recording for speech decode | Very limited; watch for crossover round |
| Esri | Founder-owned, private since 1969 | The geospatial data & software bedrock; will likely never IPO. The Bloomberg-of-maps. | None — structural lock |
| Kpler / Vortexa | Kpler raised at ~$1B+ in 2024; Vortexa ~$200M+ cum. | The proprietary commodity-flow tape that trading agents will need to consume in real time | Secondaries via Insight, Five Arrows |
| HERE Technologies | Owned by Audi/BMW/Daimler + Mitsubishi/NTT | OEM-owned counter to Google Maps; geospatial moat | None directly |
| Helsing | ~$5B valuation, defense AI | Drone + sensor + battle-management; massive sensor data ingestion for European defense | European VCs; secondaries |
| Pixxel / Iceye / Capella | Various Series C/D | Hyperspectral (Pixxel), SAR (Iceye, Capella) — sensor modalities Planet doesn't fully cover | VC secondaries |
| Oura / Whoop | Oura ~$5B 2024; Whoop ~$3.6B 2021 | Continuous-wearable health datasets at scale, FDA-engaged | Secondaries; potential IPOs 2026-27 |
| Maxar (now Vantor) | Advent took private 2023 | Highest-res commercial EO + 50yr archive; defense gold | Watch for re-IPO; track via Advent |
The picks-and-shovels under the sensors themselves:
Honest take: commodity sensor materials are a worse trade than the dataset owners. The materials are sold into a competitive market; the dataset compounds.
Talent signal for AGI moats: watch where ex-DeepMind/Anthropic researchers cross over into a sensor-data company. Those bets (e.g., the recent migration into bio and EO foundation-model startups) are early signals of which datasets the frontier labs themselves think are most underexploited.
Thesis. Samsara (IOT) is the only public pure-play on industrial real-world data: ~2M assets, a multi-modal stream (CAN, GPS, dashcam video, AI vision), ARR ~$1.5B (+30%), NRR >115%, gross margin ~75%. The stock is -26% YoY despite improving fundamentals, which points to multiple compression rather than business decay. The dataset is the wedge: once a fleet is on Samsara, the AI-vision dashcam corpus alone becomes irreplaceable training data for embodied logistics agents. Expect it to be acquisition bait for an AGI lab or a strategic buyer (Microsoft, Salesforce) in the 2-3 year window.
Risk: Freight macro stays bad; multiple compresses further before re-rating. Sizing: meaningful position; add on dips.
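The "multiple compression, not business decay" claim in the Samsara thesis can be checked with the figures already quoted: a -26% YoY stock move, ARR growth of +30%, and the $18.2B market cap and ~$1.5B ARR from the table. The sketch below is a rough illustration that assumes a constant share count (dilution and buybacks are ignored), so the stock move stands in for the market-cap move. The function name is hypothetical.

```python
# Rough decomposition of a stock move into growth vs. multiple change.
# Assumption (not from the source): share count is constant, so the
# YoY price change equals the YoY market-cap change.


def multiple_change(price_change: float, arr_growth: float) -> float:
    """Implied change in the market-cap-to-ARR multiple."""
    return (1.0 + price_change) / (1.0 + arr_growth) - 1.0


market_cap_bn = 18.2   # Samsara market cap from the table, $B
arr_bn = 1.5           # Samsara ARR from the table, $B (approximate)

current_multiple = market_cap_bn / arr_bn
change = multiple_change(price_change=-0.26, arr_growth=0.30)
year_ago_multiple = current_multiple / (1.0 + change)

print(f"Current market cap / ARR: {current_multiple:.1f}x")
print(f"Implied change in multiple: {change:.1%}")
print(f"Implied year-ago multiple: {year_ago_multiple:.1f}x")
```

On these inputs the market-cap-to-ARR multiple falls by about 43%, from roughly 21x to roughly 12x, while ARR itself grows. That matches the thesis reading that the drawdown is a re-rating rather than a deterioration in the business.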
Thesis. Tempus (TEM) is the most data-dense public bio play: 9M+ multimodal records (genomic + imaging + EMR + outcome), consented and at scale. The AstraZeneca/Pathos partnership to build the largest oncology foundation model is the proof point that pharma will pay for proprietary multimodal data, not just generic genomics. The stock is -26% YoY on burn concerns: classic Anthropic-style optionality being mispriced as biotech.
Risk: Cash runway; HIPAA evolution; Roche/Flatiron is the strongest competitor. Sizing: venture-style optionality.
Thesis. Mobileye (MBLY) is down 78% from its 2022 high and -36% YoY, with a market cap of $8.4B vs. $23B at IPO. The REM crowd-sourced HD map is the only credible non-Tesla driving-data corpus, and it is monetizable across >30 OEMs. If the world goes multi-fleet AV (not Tesla-only), MBLY is the data backbone. Cheap optionality on that outcome.
Risk: Tesla wins outright; China shuts MBLY out further; Intel parent pressure. Sizing: contrarian medium position.
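The drawdown figures in the Mobileye thesis are asymmetric: a loss of a given size needs a larger gain to reverse. The sketch below works through that arithmetic using only the numbers quoted above (78% off the 2022 high, $8.4B market cap vs. $23B at IPO). It is an illustration, not a price target, and the helper names are hypothetical.

```python
# Drawdown vs. required-recovery arithmetic for the Mobileye figures.
# All inputs are quoted in the thesis above; nothing here is a forecast.


def change_vs_reference(current: float, reference: float) -> float:
    """Fractional change of current relative to a reference value."""
    return current / reference - 1.0


def gain_to_recover(drawdown: float) -> float:
    """Gain needed to return to the prior level after a drawdown."""
    return 1.0 / (1.0 - drawdown) - 1.0


market_cap_bn = 8.4   # current market cap, $B
ipo_cap_bn = 23.0     # IPO-era market cap, $B

vs_ipo = change_vs_reference(market_cap_bn, ipo_cap_bn)
back_to_ipo = gain_to_recover(-vs_ipo)
back_to_high = gain_to_recover(0.78)

print(f"Market cap vs. IPO: {vs_ipo:.1%}")
print(f"Gain needed to regain IPO market cap: {back_to_ipo:.1%}")
print(f"Gain needed to regain 2022 high: {back_to_high:.1%}")
```

The market cap sits about 63% below the IPO figure, so regaining it would take a gain of roughly 174%, and regaining the 2022 high from a 78% drawdown would take roughly 355%. That is the scale of payoff the "cheap optionality" framing implies if the multi-fleet outcome materializes.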
Thesis. Lasertec (6920.T) is the only company in the world that sells an actinic EUV photomask inspection tool. Every leading-edge mask shop on earth (TSMC, Samsung, Intel) needs it. As AGI training pushes leading-edge fab capacity, Lasertec gets paid. Less crowded than KLA; volatile, but structurally un-bypassable.
Risk: Customer concentration; China-cycle whipsaw; founder-CEO transition. Sizing: moderate.
Thesis. If even one of Neuralink, Synchron, or Paradromics reaches 10k+ commercial implants by 2030, the neural-data corpus becomes the most valuable real-world dataset on the planet. Neuralink at $9B is the cleanest beta on that outcome, with the strongest team and capital backing. Access only via SpaceX/Forge-style secondaries.
Risk: Multi-year FDA path; ethics moratoria; competing modality wins. Sizing: small, illiquid, hold 10+ years.
Date of analysis: 2026-05-26
Live data confirmed via stockanalysis.com market-cap pages for IOT, MBLY, TEM, PL, RKLB, KLAC, HSAI, 6758.T (Sony), Trimble TRMB. Q1 2026 Hesai shipments via cnevpost.com (May 19, 2026). Hesai capacity expansion via TechCrunch (Jan 5, 2026). Trimble Q1 2026 via FinancialContent (May 7, 2026). Tempus AstraZeneca/Pathos deal via Tempus press release + FierceBiotech. BlackSky international contract via Business Wire (Feb 17, 2026). Planet on-orbit AI announcement via Business Wire (April 7, 2026). Neuralink valuation via TechCrunch ($9B post-money, May 2025) and Sacra. Synchron / Paradromics funding via NeuroFounders (Nov 2025). Sony CMOS image sensor share via PetaPixel / Electronics Weekly (2024 industry reports).
Estimates explicitly labeled (est) above: BKSY mkt cap, Lasertec mkt cap, Sony USD-equivalent mkt cap, TMO, DHR, VRSK, HXGBY, DXCM, Tesla, Alphabet mkt caps. These are author estimates based on recent trading ranges; verify before sizing.
Conflicts of interest: none disclosed by the analyst.