Insight · AI

Automotive AI is only as good as the data it learns from

Generative AI has largely run out of fresh human-written data to learn from. In the vehicle, the next advantage will not come from a bigger model but from real, contextual data produced by the car itself: data that the software-defined vehicle (SDV) and the EU Data Act are finally unlocking.

01

AI has run low on data — and it shows

The first wave of generative AI was trained on the open web: the accumulated written output of humanity. That well is running dry. As models increasingly train on data generated by other models, quality degrades, because the system ends up feeding on its own output. More compute does not fix that. What moves the needle now is access to fresh, real-world data that no one has scraped yet.

02

In the vehicle, the data already exists

A modern vehicle is one of the richest real-time data sources in the physical world. Across its electronic control units (ECUs) it produces a constant stream of signals: system states, diagnostic trouble codes (DTCs), network statistics, logs, location, and a full record of how the vehicle has actually been used and serviced. That is exactly the contextual, real-world data that AI needs and the open web cannot provide.

The catch is access. Historically that data has been hard to reach, inconsistently structured, and locked inside systems never designed to share it. An AI initiative is only as good as the data pipeline feeding it — and in most vehicles, the pipeline is the bottleneck.

03

The SDV and the Data Act open the pipeline

Two forces are changing that. The software-defined vehicle separates hardware from software and standardises how the car communicates, making vehicle data accessible and consistent rather than buried and bespoke. And the EU Data Act obliges manufacturers of connected products to make the data those products generate accessible to users and, at the user's request, shareable with third parties. Together they turn the vehicle from a closed box into a configurable, governed data source.

Once the data flows — vehicle and fleet service history, diagnostic knowledge, ECU signals, fault codes — generative models, large language models and retrieval-augmented approaches can be put to work on problems that matter to a manufacturer: faster troubleshooting, proactive maintenance, better service decisions. The model is the commodity; the data pipeline is the advantage.

04

So what do you do now?

Treat the data pipeline as the strategic asset, not the model. The practical work is to make the vehicle's data accessible, trustworthy and well-structured — which is a diagnostics and embedded-software problem before it is an AI problem. Know what software is in the field, capture clean signals from it, govern how that data is shared, and the AI layer on top finally has something real to learn from.

That is the groundwork Diadrom has done for more than two decades: the diagnostics, software-download and data discipline that turns a vehicle's raw output into something a manufacturer — and its AI — can actually use.

Key takeaways

  • Generative AI has largely exhausted fresh human-written training data; quality degrades when models feed on model-generated data.
  • The vehicle is a rich source of real, contextual data — ECU signals, fault codes, usage and service history — the open web cannot provide.
  • The software-defined vehicle and the EU Data Act together make that data accessible, consistent and shareable.
  • The model is the commodity; the data pipeline is the advantage — and that is a diagnostics problem before it is an AI problem.


Talk to Diadrom

Turning your vehicle data into something AI can actually use? Let's talk.