The Data Challenge in Industrial AI: A Critical Hurdle for Digital Transformation
Disclaimer: This article is published in partnership with Siemens. Siemens is paying for my engagement, not for promotional purposes. Opinions are my own.
This is the second part of a multi-part series of articles discussing key issues in Industrial AI and Siemens’ role and activities in this transformative technology.
The advent of Industrial Artificial Intelligence (AI) has ushered in tremendous opportunities for industrial companies, such as Siemens, to optimize operations, reduce costs, and innovate in product and process development. However, leveraging AI in industrial settings comes with its own set of challenges, most of which are rooted in the data that drives these AI systems. Whether it is the quantity or quality of data, its management and governance, or the intricacies of ownership and protection, data-related obstacles can significantly delay or even prevent the deployment of effective AI solutions in industrial contexts. In the era of Generative AI in particular, some data challenges have eased, while others persist and new ones have emerged.
Let’s have a look at the major data challenges in the context of Industrial AI. While the first sections discuss more general AI-related challenges, the last one focuses on specific issues that Siemens and its peers are currently dealing with.
Data Quality: The Foundation of Reliable Industrial AI
The success of AI applications heavily depends on the quality of the data fed into these systems. However, industrial data is notoriously messy. Companies like Siemens and its customers often deal with large volumes of data generated by complex production systems, which can include sensor data, machine logs, maintenance records, and product lifecycle management information. This data often contains errors or missing values, or is poorly labeled. Low-quality data not only leads to inaccurate predictions but can also exacerbate biases in AI models, leading to suboptimal or even faulty outcomes.
Recent studies on data quality reveal that companies lose about $12.9 to $15 million annually due to inefficiencies and lost productivity caused by poor data. In industries that rely on precision and safety, such as manufacturing or energy, poor data quality can have severe implications. For Siemens, ensuring that AI systems receive accurate, clean, and well-structured data is essential for minimizing operational risks. Case in point: In their AI solutions, such as Senseye Predictive Maintenance, Siemens integrates Generative AI and Machine Learning to analyze machine data effectively, highlighting the critical role of accurate data for operational success and risk mitigation.
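To make the idea of "errors and missing values" more tangible, here is a minimal sketch of a data quality check on sensor readings. It is an illustration only, not how Senseye or any Siemens product works. The function name quality_report, the sample temperatures and the plausible range of 0 to 150 °C are all assumptions chosen for the example.

```python
import math

def is_missing(value):
    """Treat None and NaN as missing readings."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def quality_report(readings, low, high):
    """Count missing and out-of-range values in a list of sensor readings.

    low/high describe the physically plausible range (an assumption that
    has to come from domain knowledge about the machine).
    """
    total = len(readings)
    missing = sum(1 for r in readings if is_missing(r))
    present = [r for r in readings if not is_missing(r)]
    out_of_range = sum(1 for r in present if not (low <= r <= high))
    usable = total - missing - out_of_range
    return {
        "total": total,
        "missing": missing,
        "out_of_range": out_of_range,
        "usable_ratio": usable / total if total else 0.0,
    }

# Hypothetical temperature log in degrees Celsius with two gaps and one spike.
temperatures_c = [71.2, 70.8, None, 72.1, 950.0, 71.5, float("nan"), 70.9]
report = quality_report(temperatures_c, low=0.0, high=150.0)
print(report)
```

A report like this does not repair anything; it only shows how much of a data set is usable before it is fed into a model.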
Data Quantity: The Imbalance Between Too Much and Too Little
While AI systems thrive on data, the volume of data in industrial contexts can pose a significant challenge. In some cases, companies may have too much data to process efficiently, resulting in storage and computational challenges. Conversely, some use cases may suffer from having too little data, especially when trying to train predictive models for rare events, such as equipment failure or product defects. This imbalance between data overabundance and scarcity often makes it difficult for industrial firms to effectively utilize Industrial AI.
A study by Boston Consulting Group (BCG) highlights that around 70% of AI transformations fall short of expectations, with insufficient data being a key challenge. Industrial companies face similar struggles, particularly when they need historical data to build predictive maintenance models or optimize manufacturing processes. In such cases, synthetic data or data augmentation methods may be required to bridge the gap, but these techniques add complexity and risk to AI model development.
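As a rough illustration of data augmentation for rare events, the following sketch pads a small set of failure examples with slightly jittered copies until it matches the number of normal examples. The function name augment_rare_events, the noise level and the sample values are assumptions made for this example. Real projects typically use more careful, domain-specific methods.

```python
import random

def augment_rare_events(rare_samples, target_count, noise_std=0.01, seed=0):
    """Return the real rare-event samples plus jittered copies.

    Copies are drawn from the real samples with small Gaussian noise added
    to each feature until target_count samples exist.
    """
    if not rare_samples:
        raise ValueError("need at least one real rare-event sample")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    augmented = [list(s) for s in rare_samples]
    while len(augmented) < target_count:
        base = rng.choice(rare_samples)
        augmented.append([x + rng.gauss(0.0, noise_std) for x in base])
    return augmented

# Hypothetical feature vectors: six normal runs, only two recorded failures.
normal_runs = [[0.50, 0.48], [0.51, 0.49], [0.49, 0.50],
               [0.52, 0.47], [0.50, 0.50], [0.48, 0.51]]
failure_runs = [[0.91, 0.12], [0.88, 0.15]]

balanced_failures = augment_rare_events(failure_runs, target_count=len(normal_runs))
print(len(normal_runs), len(balanced_failures))
```

The jittered copies contain no new information about how failures actually occur, which is one reason such techniques add risk: a model can look well trained while having seen only two real failures.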
Data Management: The Overarching Challenge
Managing the vast amounts of data in industrial settings poses a huge task. Industrial companies generate data from various sources, including sensors, machines, enterprise systems, and supply chains. The complexity arises from the need to harmonize and standardize data from diverse formats and systems, which often lack interoperability. This becomes especially challenging when legacy systems are involved.
Siemens, for example, often deals with decades-old infrastructure alongside modern, digitalized components, making it difficult to integrate data flows. Case in point: The Siemens Xcelerator business platform combines operational technology (OT) with information technology (IT) to create horizontal and vertical data flows. This approach helps bridge the gap between legacy systems and modern digital platforms, allowing data to move freely across product development, manufacturing, and supply chain operations. According to Gartner, 75% of the world's data will require some level of management by 2025, and industries are at the forefront of this challenge. Effective data management strategies, including data lakes, warehouses, and real-time streaming platforms, are necessary to ensure that AI models can access clean and structured data.
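What harmonizing data from legacy and modern systems can mean in practice is sketched below: two differently shaped records are mapped onto one common schema. Both input formats, the field names and the assumption that the legacy timestamps are in UTC are hypothetical. This is not the Siemens Xcelerator data model.

```python
from datetime import datetime, timezone

def from_legacy_csv(line):
    """Parse a hypothetical legacy export: 'machine;dd.mm.yyyy HH:MM:SS;temp_F'."""
    machine, stamp, temp_f = line.strip().split(";")
    # Assumption: the legacy system logs in UTC.
    ts = datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S").replace(tzinfo=timezone.utc)
    return {
        "machine_id": machine,
        "timestamp": ts.isoformat(),
        "temperature_c": round((float(temp_f) - 32.0) * 5.0 / 9.0, 2),
    }

def from_modern_json(record):
    """Map a hypothetical modern payload with epoch seconds and Celsius values."""
    ts = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
    return {
        "machine_id": record["deviceId"],
        "timestamp": ts.isoformat(),
        "temperature_c": round(float(record["tempC"]), 2),
    }

legacy = from_legacy_csv("press-07;01.03.2024 08:15:00;212.0")
modern = from_modern_json({"deviceId": "robot-12", "ts": 1709280900, "tempC": 71.456})

for row in (legacy, modern):
    print(row)
```

Once both sources share the same keys, units and timestamp format, downstream storage and AI models no longer need to know which system a record came from.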
Data Interoperability: The Need for Seamless Integration
Interoperability is another significant hurdle for Industrial AI. Many companies operate in environments with various vendors, protocols, and systems, all of which must work together to enable AI solutions. Case in point: Siemens’ Industrial Edge platform is designed to process data from various systems in real-time, integrating AI seamlessly into industrial processes. It enhances the ability to work with diverse systems by providing a unified framework for data collection, analysis, and decision-making, regardless of the underlying vendor technology. A lack of standardization across data systems can prevent Industrial AI applications from accessing and processing data effectively, thus delaying deployment or reducing efficacy.
Many companies suffer from a lack of data interoperability, which causes operational inefficiencies, wasted resources and significant costs. To unlock the potential of AI, industrial players must push for open standards and more collaborative ecosystems, allowing data to move seamlessly across platforms and enabling AI to extract value across the entire production chain. Siemens does exactly this when assembling the Industrial Metaverse.
Data Ownership and Accessibility: Balancing Innovation with Control
Data ownership and accessibility present additional challenges in Industrial AI. In many industrial ecosystems, data is collected and shared between multiple parties, such as equipment manufacturers, suppliers, and operators. Disputes over who owns the data - and who has the right to use it - can hinder collaboration and slow down AI initiatives.
A Deloitte report highlights that data ownership and control over who can access and manage data is among the major hurdles for businesses looking to deploy AI at scale. Industrial companies often collaborate with external partners on AI projects, making it critical to establish clear data-sharing agreements that balance innovation with data control. Case in point: In Siemens' collaboration with Microsoft on the Industrial Copilot, they ensure that customers retain full control over their data, while using AI to boost productivity and streamline processes across industries.
Data Governance and Privacy: Navigating Regulatory and Ethical Complexities
Data governance and privacy have emerged as critical issues in the Industrial AI landscape. Industrial companies are often subject to strict regulations regarding the handling and storage of data, especially in sectors like energy, healthcare, and transportation. The General Data Protection Regulation (GDPR) in Europe and similar regulations in other regions require companies to manage personal data responsibly, including ensuring data privacy and security.
For Industrial AI projects that involve customer or employee data, governance frameworks are essential to maintain compliance with legal requirements. Furthermore, as AI systems become more complex and autonomous, ensuring accountability and transparency in how data is used becomes paramount. Multinationals with a global presence must navigate varying regulations across jurisdictions, adding complexity to their AI projects. Case in point: Siemens continuously seeks to integrate AI technologies responsibly, balancing innovation with compliance to ensure that its AI systems meet diverse global standards.
Generative AI: Addressing and Introducing New Data Challenges
Generative AI, including models like OpenAI's ChatGPT, introduces new possibilities for Industrial AI but also comes with its own set of data challenges. On the positive side, Generative AI can help mitigate some data scarcity problems by generating synthetic data for AI model training. It can also enhance data augmentation techniques, potentially reducing the reliance on real-world data for certain applications.
However, Generative AI also introduces new challenges. One major concern is the reliability and authenticity of synthetic data. If the synthetic data used to train Industrial AI models is not accurate or reflective of real-world conditions, it can lead to biased or flawed models. Additionally, the increased complexity of generative models makes it harder to ensure transparency, interpretability, and accountability in AI-driven decisions.
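One simple way to catch synthetic data that is obviously unrepresentative is to compare its summary statistics with those of real measurements. The sketch below does this for mean and standard deviation. The function names, the vibration values and the thresholds are assumptions for illustration. Passing such a check is a necessary sanity test at best, not proof that the synthetic data reflects real-world conditions.

```python
import statistics

def fidelity_gaps(real, synthetic):
    """Return absolute gaps in mean and sample standard deviation."""
    return {
        "mean_gap": abs(statistics.mean(real) - statistics.mean(synthetic)),
        "std_gap": abs(statistics.stdev(real) - statistics.stdev(synthetic)),
    }

def looks_plausible(real, synthetic, max_mean_gap, max_std_gap):
    """True if both gaps stay within thresholds chosen by a domain expert."""
    gaps = fidelity_gaps(real, synthetic)
    return gaps["mean_gap"] <= max_mean_gap and gaps["std_gap"] <= max_std_gap

# Hypothetical vibration amplitudes from a real machine and two generators.
real_vibration = [0.42, 0.45, 0.40, 0.47, 0.44, 0.43]
good_synthetic = [0.43, 0.44, 0.41, 0.46, 0.45, 0.42]
bad_synthetic = [0.80, 0.82, 0.79, 0.85, 0.81, 0.83]

good_ok = looks_plausible(real_vibration, good_synthetic, 0.02, 0.01)
bad_ok = looks_plausible(real_vibration, bad_synthetic, 0.02, 0.01)
print(good_ok, bad_ok)
```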
Moreover, the integration of Generative AI requires even more sophisticated data management strategies, as the data needs to be curated, labeled, and verified for use. This adds another layer of complexity to existing data management challenges.
Specific Challenges in Industrial Environments: Data Representativeness, Protection and Communication
Besides the more general AI-related data challenges above, Siemens and other companies that work in industrial production environments have to deal with a very specific set of challenges.
An AI model must be trained with representative data to enable realistic predictions or classifications. Since industrial processes often exhibit specific and rare anomalies (e.g., machine failure or manufacturing defects), it can be difficult to collect enough representative anomaly data for model training despite large sets of available data. Synthetic data presents a potential solution here. For instance, by creating a digital twin of a new product, simulated, photo-realistic images of potential anomalies or defects can be generated and used to train AI models ahead of production start. Case in point: quality control of completely new series in the automotive industry.
Especially in industrial settings, data often contains sensitive and proprietary information (such as machine specifications, process parameters or operational details). Since data sharing is critical to the success of co-creating complex AI solutions in industrial applications, participating companies must decide which data to release for this purpose and which to withhold. This requires determining the current “state of the art” as a basis for decision-making, so that data that is already freely accessible is not unnecessarily withheld. Siemens has been doing this groundwork in its collaborations, e.g., with Schaeffler. In contrast to telecommunications, for instance, in which data sharing has long been common, the industrial sector is still in its infancy here. Boris Scharinger, Industrial AI Strategist at Siemens AG, gets to the point:
“The speed with which we do our homework here will play a decisive role in determining how successful Industrial AI and thus Europe's competitiveness in this area is going to unfold.”
Last but not least, the interface to data-generating machines and devices plays a crucial role for Industrial AI systems, as these need to access large amounts of production data (e.g., the semantics of a programmable logic controller, or PLC) to train algorithms, make decisions and perform predictive analytics. The OPC-UA interface (Open Platform Communications Unified Architecture) is an open, manufacturer-independent communication standard that is widely used in industrial automation and mechanical engineering. It is used for secure, reliable and cross-platform communication between different systems and devices in a production environment. Some machines, especially older ones, do not meet this standard. This makes the data communication needed for Industrial AI extremely difficult or even impossible.
Takeaway
The data challenges in Industrial AI include quality, quantity, management, interoperability, ownership and governance. A further challenge lies in dealing with Generative AI models, which can produce synthetic data but introduce new risks in terms of data authenticity and accountability. Siemens and other companies in the industry face specific hurdles, such as the representativeness of anomaly data, the protection of sensitive data and communication between machines and AI systems. One example is the use of synthetic data generated from digital twins to train models for quality control. Another challenge is the lack of compatibility of older machines with modern standards such as OPC-UA, which makes the necessary data communication more difficult.
Conclusion: To unlock the full potential of Industrial AI, it is imperative that companies address their data management soon enough!