AGI Investment Track: Real-World Data & Sensors

2026-05-26   Track 6 of 8  |  Author: realworld-data-sensors analyst  |  For: Ravi
Premise. AGI arrives within ~2 years. RSI (recursive self-improvement) will hit diminishing cognitive returns first. Past that point, the binding constraint becomes data from the physical world: multimodal, high-fidelity, hard to fake. This memo ranks where the value pools sit.

1. Thesis

Frontier models have eaten the public internet. The next 10× in capability does not come from another trillion text tokens — it comes from data the internet never had: pixels from orbit, lidar point clouds from real roads, glucose curves from real bodies, vibration spectra from real bearings, neural spikes from real cortex, mass spectra from real proteins, AIS pings from real ships. Whoever owns the network that generates that data, or the instrument that resolves that data, owns a moat that scaling laws cannot dissolve.

The investable shape of this thesis splits into two categories with very different return profiles. Sensor monopolies (Sony image sensors, KLA semiconductor metrology, Lasertec EUV inspection) are mature and defensible but mostly priced in: KLA is already +182% YoY. Proprietary real-world dataset owners (Samsara fleet telematics, Mobileye REM maps, Tesla FSD video, Tempus oncology multimodal, Verisk insurance loss data) are less obvious and more contested, and they are where the asymmetric AGI optionality lives. The market has noticed Planet Labs (+1,478% YoY) but has not yet noticed Samsara or Tempus.

2. The Bottleneck Data Types

Not all sensor data is equally scarce. Ranking by hard-to-replicate × AGI training value:

Data type | Why scarce | AGI utility | Who owns the moat
Longitudinal real-world driving video + labels | Requires a fleet of millions of cameras over years; can't be synthesized convincingly (edge cases) | World-model training, robotics policy, sim2real grounding | Tesla (FSD fleet), Mobileye (REM), Waymo
Industrial fleet telematics (long-tail edge cases) | Requires an installed base of millions of trucks/equipment over a decade | Embodied AI for logistics & operations, predictive-maintenance models | Samsara (IOT), PCAR/CMI dealer networks
Multimodal clinical (genome + imaging + EMR + outcome) | HIPAA / regulatory; consent; reimbursement infrastructure | Drug-discovery foundation models, diagnostic copilots | Tempus (TEM), Flatiron (Roche), Foundation Medicine
Daily-revisit high-resolution Earth imagery | Capital + launch + downlink + dwell time; ITAR for <30cm | Geospatial foundation models, defense intel, climate, commodities | Planet (PL), BlackSky (BKSY), Maxar/Vantor (private), Capella (private)
Continuous physiological signals (CGM, HR, sleep) | Installed-base wearables; FDA-cleared accuracy | Health-agent foundation models, longevity, drug efficacy | Dexcom (DXCM), Abbott (ABT), Apple, Garmin (GRMN), Oura (private), Whoop (private)
Atomic-scale process metrology (semis) | Decades of process know-how, $50M+ tools, customer co-development | Closing the loop on chip self-improvement (RSI requires this) | KLA (KLAC), Lasertec (6920.T), ASML's own metrology
High-channel-count neural spikes | FDA implants; surgical pipeline; multi-year safety data | Direct neural readout/control; the ultimate human-data primitive | Neuralink, Synchron, Paradromics (all private)
Real-time commodity flow (AIS + terminals + pipelines) | Decades of relationships with terminals, satellite AIS, vessel taxonomy | Trading agents, macro models | Kpler (private), Vortexa (private), S&P Commodity Insights
Mass spectra / single-cell omics | Specialized instruments + sample prep + chain of custody | Cell models, drug screening, protein design | Thermo (TMO), Danaher (DHR), Bruker (BRKR), 10x Genomics
Real-time financial tape | Exchange monopoly; consent of issuers | Execution agents, alpha-bearing data; AI license revenue | CME, ICE, MSCI, S&P (SPGI)
What is NOT a real moat: commodity lidar (price-competed to near-zero by Hesai), commodity CMOS image sensors (Sony has scale but no exclusivity vs Samsung), generic AIS providers, single-frame static satellite imagery, generic IoT sensors. The moat is in (a) installed base + time, or (b) atomic-scale instrument know-how, not in "we make a sensor".
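The Hesai figures in section 4 make the commoditization point concrete: cost down 99.5% in 8 years, and 471k units in Q1 2026 against 4M units of 2026 capacity. This is a minimal sketch under two assumptions. First, the 99.5% decline compounds at a constant yearly rate. Second, one quarter multiplied by four is a fair run-rate, which ignores seasonality and ramp timing. The function names are illustrative.

```python
def implied_annual_decline(total_decline: float, years: float) -> float:
    """Constant yearly decline r such that (1 - r) ** years == 1 - total_decline."""
    remaining = 1.0 - total_decline
    return 1.0 - remaining ** (1.0 / years)


def annualized_utilization(quarterly_units: float, annual_capacity: float) -> float:
    """Naive run-rate: one quarter's shipments x 4, divided by annual capacity."""
    return quarterly_units * 4 / annual_capacity


# Inputs as stated in the Hesai row of section 4 (assumed to be unit cost and Q1 2026 units).
decline = implied_annual_decline(0.995, 8)
util = annualized_utilization(471_000, 4_000_000)
print(f"Implied cost decline per year: {decline:.0%}")
print(f"Q1 run-rate vs. 2026 capacity: {util:.0%}")
```

On those assumptions, unit cost has fallen by roughly half every year (~48%/yr). Q1 shipments annualize to a little under half of stated capacity (~47%).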

3. Supply / Demand Gap Model

Three sub-markets, three different shapes:

Sub-market | 2026 supply | 2028E demand if AGI hits | Gap | Pricing
Daily-revisit sub-meter EO | ~100 sats, 1-2 daily revisits at sub-1m globally | Continuous global coverage at 30cm with on-board AI (~10× today) | Large | Partially priced (PL +1,478%)
L2+/L3 driving video (labeled) | Tesla & Mobileye combined: ~10M cars producing | Every robotaxi fleet needs its own; data brokering emerges | Medium | Not priced (MBLY -36% YoY)
Multimodal oncology foundation data | Tempus ~9M records; Flatiron similar | Every major pharma needs proprietary multimodal data at scale | Large | Not priced (TEM -26% YoY)
Fleet telematics (commercial) | Samsara ~2M+ assets; legacy Trimble/Geotab fragmented | Embodied AI agents need operational ground truth at this scale | Medium | Not priced (IOT -26% YoY)
EUV mask + chip inspection | Lasertec sole vendor for EUV pellicle/mask inspection; KLA dominates wafer | Every new fab, every new node, every leading-edge customer; tools = bottleneck | Large | Fully priced (KLAC +182% YoY)
CGM continuous glucose data | ~10M users (Dexcom + Abbott) | 100M+ as health agents and the GLP-1 era normalize CGM | Medium | Partially priced
Implanted neural recording | <100 patients globally across all programs | Thousands by 2028, millions long-term | Massive | Private only; pre-revenue
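The demand column implies steep compounding. This is a minimal sketch of the required growth rate. It assumes the 2026 supply figure is the starting point and that 2028 is two years out. `required_cagr` is a hypothetical helper, not something from the memo.

```python
def required_cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate needed to go from `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years) - 1.0


# CGM row: ~10M users in 2026 -> 100M+ by 2028E (assumed two-year window).
cgm = required_cagr(10e6, 100e6, 2)
print(f"CGM users must grow ~{cgm:.0%} per year")
```

Going from ~10M to 100M CGM users in two years requires roughly 216% growth per year, or about 3.2× annually. The EO row's "10× today" target over the same window implies the same rate.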

4. Investable Public Companies

Ticker | Mkt cap (5/26/26) | 1Y | Category | Moat (data or sensor) | Key risk | Priced in?
IOT Samsara | $18.2B | -26% | Industrial telemetry | Largest connected-operations dataset; 2M+ assets generating 12T+ data points/year; ARR ~$1.5B; net retention >115%. Hard to displace because the data is multimodal per vehicle (video + CAN bus + GPS + driver behavior) | Freight macro cycle; continued PLTR-like multiple compression | No
MBLY Mobileye | $8.4B | -36% | AV driving data | REM crowdsourced HD map from ~200M+ vehicles globally with EyeQ chips; the only public pure-play rival to Tesla in real-world driving corpus; mature OEM relationships | Tesla wins outright; China supply (Hesai + local) eats ADAS; Intel parent overhang | No
TEM Tempus AI | $8.4B | -26% | Genomic + clinical | ~9M de-identified multimodal records (genomic + imaging + EMR); AstraZeneca/Pathos $200M+ deal to build the largest oncology foundation model; recurring data-licensing revenue | Burn rate; HIPAA/consent challenges; legacy lab competition | No
PL Planet Labs | $17.2B | +1,478% | EO satellites | ~200 sats; ~17-yr global daily archive; Pelican 30cm; ran AI inference on-orbit in April 2026; multi-year defense/intel deals | Already priced; capacity dilution; competition from Maxar/Vantor | Yes, richly
BKSY BlackSky | ~$1.2B (est) | Large gain | EO satellites | Gen-3 sub-meter, fast revisit; sovereign defense contracts (recent 8-figure international deal) | Smaller fleet than PL; ITAR limits scale | Partial
RKLB Rocket Lab | $82.9B | +776% | Launch + manufacturing | Vertically integrated EO infrastructure play; positioned to be the launcher for new sensor constellations | Fully priced; Neutron execution; SpaceX dominance | Yes
KLAC KLA | $262.7B | +183% | Semi process metrology | ~50% wafer inspection share; required for every new fab/node; AGI needs leading-edge chips, and KLA closes the loop | Fully priced; cyclical; China export controls | Yes
6920.T Lasertec | ~$8-10B (est) | Volatile | EUV mask inspection | De facto monopoly on EUV photomask actinic inspection (ACTIS); every leading-edge mask shop needs one | Concentrated customer base (TSMC, Samsung, Intel); China export controls | Partial
6758.T Sony | ~$135B (est; ~21T JPY) | -13% | Image sensors | ~50% CMOS image sensor share; iPhone + auto + industrial; AI vision sensor (IMX500) | Samsung competition; price erosion; consumer slowdown | Partial
TMO Thermo Fisher | ~$200B (est) | Flat | Scientific instruments | Broadest catalog of instruments + consumables; mass spec, EM, sequencing-adjacent; AI for science requires all of it | Mature; not a pure AI-data play | Mostly
DHR Danaher | ~$190B (est) | Flat | Bio instruments | Beckman Coulter, Cytiva, Leica, SCIEX, Pall, IDT: the entire stack of bio data generation | Mature; bioprocess inventory cycle | Mostly
VRSK Verisk | ~$45B (est) | Flat | Insurance loss data | ~50-yr U.S. property/casualty claims database; unreplicable; every insurer pays for it | Slow growth; not headline-AGI | Partial
TRMB Trimble | $14.7B | Flat | Geospatial + construction | Connected jobsite data; SketchUp + Claude integration; ARR $2.4B, +12% YoY; consumption-based AI pricing rolling out | Cyclical hardware; construction macro | Partial
HXGBY Hexagon | ~$30B (est) | Flat | Geospatial + metrology | Reality capture (point clouds at industrial scale); metrology; airborne lidar; data + sensor | European complexity; M&A-heavy | Partial
HSAI Hesai | $3.4B | +55% | Lidar | Global lidar volume leader; 471k units shipped in Q1 2026; 4M-unit 2026 capacity; cost down 99.5% in 8 years | Commodity (competition crushed Luminar); geopolitics | Partial
DXCM Dexcom | ~$30B (est) | Down | CGM data | Continuous glucose monitor leader; data feeds health agents; FDA-cleared accuracy | Abbott competition; GLP-1 CGM-cannibalization debate | Partial
SPGI/MSCI/ICE/CME | $80-150B each | Flat to up | Financial data | Proprietary tape and indices; high-margin AI licensing tailwind | Regulatory; already priced as quality compounders | Yes
TSLA Tesla | ~$1.3T (est) | Volatile | FSD video | Largest real-world driving video corpus on earth; Dojo training | Not a pure play; valuation; Musk key-man risk | Mostly
GOOGL Waymo (via Alphabet) | ~$2T parent (est) | Up | AV + maps + YouTube | Waymo's calibrated multi-sensor stack + Maps + Street View + YouTube video corpus: arguably the best real-world dataset | Not a pure play; antitrust | Yes
Market caps labeled (est) are author estimates based on recent trading; verify them against live quotes before relying on them. Live data confirmed for IOT, MBLY, TEM, PL, RKLB, KLAC, HSAI, TRMB, and 6758.T as of 2026-05-26.
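Backing out the year-ago figure shows how much of each move is already in the price. This is a minimal sketch that assumes the 1Y column applies to market cap. If it is a share-price change, buybacks and dilution make these numbers approximate. Only the live-confirmed rows with a numeric 1Y figure are included.

```python
# (market cap in USD billions as of 2026-05-26, 1Y change as a fraction), from the table above.
CONFIRMED = {
    "IOT": (18.2, -0.26),
    "MBLY": (8.4, -0.36),
    "TEM": (8.4, -0.26),
    "PL": (17.2, 14.78),    # +1,478%
    "RKLB": (82.9, 7.76),   # +776%
    "KLAC": (262.7, 1.83),  # +183%
    "HSAI": (3.4, 0.55),
}


def year_ago_cap(cap_now: float, one_year_change: float) -> float:
    """Invert cap_now = cap_then * (1 + change) to recover cap_then."""
    return cap_now / (1.0 + one_year_change)


for ticker, (cap, chg) in CONFIRMED.items():
    print(f"{ticker:<5} now ${cap:>6.1f}B  implied year-ago ~${year_ago_cap(cap, chg):>5.1f}B")
```

For example, Planet's +1,478% implies a year-ago value of roughly $1.1B. KLA's +183% implies roughly $92.8B.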

5. Pre-IPO / Private to Watch

Company | Stage / last raise | Why it matters | Access path
Neuralink | $650M Series E, $9B post-money (June 2025) | Highest-bandwidth implantable BCI; the "Sony of the brain"; if even partially commercial, its dataset is unique on earth | Forge/EquityZen secondaries; SpaceX-style hold-and-pray
Synchron | $200M Series D, late 2025; ~$345M cumulative | The endovascular Stentrode means a lower surgical burden than Neuralink and a faster regulatory path | Difficult; check institutional secondaries
Paradromics | $100M+ VC + $18M grants | High-throughput recording for speech decoding | Very limited; watch for a crossover round
Esri | Founder-owned, private since its 1969 founding | The geospatial data and software bedrock, the Bloomberg of maps; will likely never IPO | None (structural lock)
Kpler / Vortexa | Kpler raised at ~$1B+ in 2024; Vortexa ~$200M+ cumulative | The proprietary commodity-flow tape that trading agents will need to consume in real time | Secondaries via Insight, Five Arrows
HERE Technologies | Owned by Audi/BMW/Daimler + Mitsubishi/NTT | OEM-owned counter to Google Maps; geospatial moat | None directly
Helsing | ~$5B valuation; defense AI | Drone + sensor + battle management; massive sensor-data ingestion for European defense | European VCs; secondaries
Pixxel / Iceye / Capella | Various Series C/D | Hyperspectral (Pixxel) and SAR (Iceye, Capella): sensor modalities Planet doesn't fully cover | VC secondaries
Oura / Whoop | Oura ~$5B (2024); Whoop ~$3.6B (2021) | Continuous-wearable health datasets at scale; FDA-engaged | Secondaries; potential IPOs 2026-27
Maxar (now Vantor) | Taken private by Advent in 2023 | Highest-resolution commercial EO + 50-yr archive; defense gold | Watch for re-IPO; track via Advent

6. Commodity / Physical Plays

The picks-and-shovels under the sensors themselves:

Honest take: commodity sensor materials are a worse trade than the dataset owners. The materials are sold into a competitive market; the dataset compounds.

7. People / Talent Concentration

Talent signal for AGI moats: watch where ex-DeepMind/Anthropic researchers cross over into a sensor-data company. Those bets (e.g., the recent migration into bio and EO foundation-model startups) are early signals of which datasets the frontier labs themselves think are most underexploited.

8. Top Picks — Ranked

#1 — IOT Samsara  High conviction

Thesis. The only public pure-play on industrial real-world data: ~2M assets, a multimodal stream (CAN, GPS, dashcam video, AI vision), ARR ~$1.5B growing ~30%, NRR >115%, gross margin ~75%. The stock is -26% YoY despite improving fundamentals, which is multiple compression, not business decay. The dataset is the wedge. Once a fleet is on Samsara, the AI-vision dashcam corpus alone becomes irreplaceable training data for embodied logistics agents. Likely acquisition bait for an AGI lab or a strategic buyer (Microsoft, Salesforce) within the next 2-3 years.

Risk: Freight macro stays bad; multiple compresses further before re-rating. Sizing: meaningful position; add on dips.
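Two quick checks on the Samsara numbers quoted above and in section 4. This is a minimal sketch under two assumptions. First, this year's ARR equals NRR × last year's ARR plus new-customer ARR, so growth = (NRR - 1) + new-logo share. Second, the "2M+ assets" and "12T+ data points/year" lower bounds are treated as point values. Both helper names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def new_logo_share(arr_growth: float, nrr: float) -> float:
    """Growth not explained by expansion of existing customers.

    Assumes ARR_t = nrr * ARR_{t-1} + new_customer_ARR, with growth measured
    against ARR_{t-1}, so arr_growth = (nrr - 1) + new_customer_ARR / ARR_{t-1}.
    """
    return arr_growth - (nrr - 1.0)


def points_per_asset_second(points_per_year: float, assets: float) -> float:
    """Average data points emitted per connected asset per second."""
    return points_per_year / assets / SECONDS_PER_YEAR


new_share = new_logo_share(0.30, 1.15)
rate = points_per_asset_second(12e12, 2e6)
print(f"Growth from new customers: ~{new_share:.0%} of prior-year ARR")
print(f"~{rate:.2f} data points per asset per second (~{rate * 86_400:,.0f}/day)")
```

Because NRR is above 115%, new customers account for at most ~15 of the ~30 growth points. The per-asset figure (~0.19 points per second, about 16k per day) is a ratio of two lower bounds, so treat it only as an order-of-magnitude check.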

#2 — TEM Tempus AI  High conviction (volatile)

Thesis. The most data-dense public bio play: 9M+ multimodal records (genomic + imaging + EMR + outcome), collected with consent and at scale. The AstraZeneca/Pathos partnership to build the largest oncology foundation model is the proof point that pharma will pay for proprietary multimodal data, not just generic genomics. The stock is -26% YoY on burn concerns: classic Anthropic-style optionality being mispriced as biotech.

Risk: Cash runway; HIPAA evolution; Roche/Flatiron is the strongest competitor. Sizing: venture-style optionality.
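One crude reading of the AstraZeneca/Pathos proof point is an implied price per record. This minimal sketch attributes the whole $200M+ to the ~9M records. In reality the deal also pays for model building, and both inputs are lower bounds, so the result is an order-of-magnitude figure only. The helper name is hypothetical.

```python
def implied_price_per_record(deal_usd: float, records: float) -> float:
    """Deal value spread evenly across records (ignores non-data components)."""
    return deal_usd / records


per_record = implied_price_per_record(200e6, 9e6)
print(f"~${per_record:.0f} per multimodal record")
```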

#3 — MBLY Mobileye  Medium-high conviction

Thesis. Down 78% from its 2022 high and -36% YoY; market cap $8.4B vs. $23B at IPO. The REM crowdsourced HD map is the only credible non-Tesla driving-data corpus available as a public pure-play, and it is monetizable across >30 OEMs. If the world goes multi-fleet AV (not Tesla-only), MBLY is the data backbone. Cheap optionality on that outcome.

Risk: Tesla wins outright; China shuts MBLY out further; Intel parent pressure. Sizing: contrarian medium position.
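The drawdown figures can be cross-checked against each other. This is a minimal sketch that assumes the 78% drawdown applies to market cap as well as to share price, ignoring share-count changes. The helper names are illustrative.

```python
def pct_change(now: float, then: float) -> float:
    """Fractional change from `then` to `now`."""
    return now / then - 1.0


def implied_peak(now: float, drawdown: float) -> float:
    """Peak value implied by a current value and a fractional drawdown from that peak."""
    return now / (1.0 - drawdown)


vs_ipo = pct_change(8.4, 23.0)   # market cap now vs. at IPO, $B
peak = implied_peak(8.4, 0.78)   # $B
print(f"vs. IPO: {vs_ipo:.0%}; implied 2022 peak: ~${peak:.1f}B")
```

At $8.4B the company sits ~63% below its IPO value, and a 78% drawdown implies a 2022 peak near $38B.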

#4 — 6920.T Lasertec  Medium conviction

Thesis. The only company in the world that sells the EUV photomask actinic inspection tool. Every leading-edge mask shop on earth (TSMC, Samsung, Intel) needs it. As AGI training pushes leading-edge fab capacity, Lasertec gets paid. Less crowded than KLA; volatile, but structurally hard to bypass.

Risk: Customer concentration; China-cycle whipsaw; founder-CEO transition. Sizing: moderate.

#5 — Neuralink (private)  Venture-style

Thesis. If even one of (Neuralink, Synchron, Paradromics) makes it to 10k+ commercial implants by 2030, the neural-data corpus becomes the most valuable real-world dataset on the planet. Neuralink at $9B is the cleanest beta on that outcome with the strongest team and capital backing. Access only via SpaceX/Forge-style secondaries.

Risk: Multi-year FDA path; ethics moratoria; competing modality wins. Sizing: small, illiquid, hold 10+ years.
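The 10k-implant threshold implies a required growth rate. This is a minimal sketch that assumes a 2026 base of 100 patients and four years to 2030. Because the memo says "<100", that base is generous, so the true required rate is at least this high.

```python
def required_cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate needed to go from `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years) - 1.0


implants = required_cagr(100, 10_000, 4)
print(f"Required implant growth: at least {implants:.0%} per year")
```

A 100× scale-up in four years works out to roughly 216% per year, a tripling-plus every year through 2030.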

Explicitly NOT picks:

9. What Would Change My Mind

10. Sources & Notes

Date of analysis: 2026-05-26

Live data confirmed via stockanalysis.com market-cap pages for IOT, MBLY, TEM, PL, RKLB, KLAC, HSAI, 6758.T (Sony), Trimble TRMB. Q1 2026 Hesai shipments via cnevpost.com (May 19, 2026). Hesai capacity expansion via TechCrunch (Jan 5, 2026). Trimble Q1 2026 via FinancialContent (May 7, 2026). Tempus AstraZeneca/Pathos deal via Tempus press release + FierceBiotech. BlackSky international contract via Business Wire (Feb 17, 2026). Planet on-orbit AI announcement via Business Wire (April 7, 2026). Neuralink valuation via TechCrunch ($9B post-money, May 2025) and Sacra. Synchron / Paradromics funding via NeuroFounders (Nov 2025). Sony CMOS image sensor share via PetaPixel / Electronics Weekly (2024 industry reports).

Estimates explicitly labeled (est) above: BKSY mkt cap, Lasertec mkt cap, Sony USD-equivalent mkt cap, TMO, DHR, VRSK, HXGBY, DXCM, Tesla, Alphabet mkt caps. These are author estimates based on recent trading ranges; verify before sizing.

Conflicts of interest: none disclosed by the analyst.


Generated for Ravi, agi-investment-tracks team. Track 6 of 8 (realworld-data-sensors). File: /Users/ravf/projects/investments/agi-tracks/realworld-data-sensors.html