Companies rely on data more than ever to create trustworthy insights that inform critical decisions of strategic importance. The need for trust permeates every aspect of society, from education and politics to technology, where the value and reliability of artificial intelligence (AI) are hotly debated.
Large enterprises face the massive challenge of handling the vast volume and variety of data created by myriad technologies and connected sites worldwide. Given these complicated data challenges, how do organizations create an environment that facilitates AI we can trust as a means to make intelligent business decisions?
It comes down to a three-part process. It starts with an AI-friendly data management platform so that data teams have common access to datasets at every stage. Then, we need transparency in how we assemble algorithms and data to create trust with nontechnical stakeholders. Lastly, we must understand the lineage of data throughout the end-to-end data supply chain so we can trust the data itself.
Using An AI-Friendly Data Lake
Whether consciously or unconsciously, we all have inherent biases that influence machine learning algorithms because we tend to include data that supports or confirms our beliefs. This data bias drives inaccurate prediction modeling and leads to poorly informed business decisions. Businesses require quality data to trust AI.
Access to quality data varies significantly depending on the type of data an organization is trying to use. From experience, a business can usually find quality operational data, records that do not change over time, such as an online transaction or the line items on a receipt. It is more difficult to maintain quality master data, which includes information about your customers, products, suppliers and so on.
The new data management paradigm at play here masters the “three V’s” of data: volume, variety and velocity. It also provides a central place for analysts to process data, yielding higher-quality predictions that can inform better business decisions. A cloud-native data lake is especially beneficial because it can ingest and process high-volume, multiformat data, eliminating the data silos that would otherwise skew the bias and quality of the resulting analytics.
Businesses typically collect data from four primary areas: the consumer (direct-to-consumer channels, device ID and social listening), the customer (point of sale, shelf price data and promotional data), the macro environment (market size and GDP demographics) and internal operations (daily sales, digital factory, etc.). With such an abundance of data, large multinational organizations can easily end up with a “data swamp” of unorganized, inaccurate and irrelevant data.
A data lake, something I’ve given a TEDx talk about, is critical to keeping that data organized so organizations can extract the right insights. When implemented and utilized correctly, a data lake promotes business agility and flexibility because it allows businesses to apply different types of analytics, such as big data analytics, SQL queries and more, to gain insights from data in real time.
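To make that concrete, here is a minimal sketch of running a SQL query directly over multiformat files in a data lake. It uses DuckDB purely as an illustration (the article prescribes no particular engine), and the file paths and column names are hypothetical.

```python
# Minimal sketch: querying multiformat files in a data lake with SQL.
# DuckDB is one of several engines that can do this; the paths and
# column names below are hypothetical.
import duckdb

con = duckdb.connect()

# Join point-of-sale CSV extracts with consumer-channel Parquet data
# in a single query, without first loading either into a warehouse.
result = con.execute("""
    SELECT p.product_id,
           SUM(p.units_sold) AS units,
           AVG(c.sentiment_score) AS avg_sentiment
    FROM read_csv_auto('lake/pos/daily_sales_*.csv') AS p
    JOIN read_parquet('lake/consumer/social_listening.parquet') AS c
      ON p.product_id = c.product_id
    GROUP BY p.product_id
""").fetchdf()

print(result.head())
```

The point is that CSV point-of-sale extracts and Parquet consumer data can be joined in place, which is what lets a well-organized lake serve every data team without creating new silos.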
Providing Transparency Through Explainable AI
The purpose of an algorithm must be clear and transparent so that even someone who is not an expert in data analytics or AI systems can understand it.
To build transparent algorithms, businesses need to make the data driving them transparent as well. An algorithm must be debuggable, showing the lineage of its decisions, and understandable to a nontechnical audience.
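As an illustration, here is a minimal sketch of one common explainability technique, permutation importance (the article endorses no specific method, so this choice is an assumption). It produces a ranked list of which inputs drive a model’s predictions, the kind of output a nontechnical stakeholder can read directly.

```python
# Minimal sketch of one explainability technique, permutation importance
# (an assumption; the article names no specific method). It measures how
# much each input feature drives a model's predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much accuracy drops:
# a large drop means the model leans heavily on that feature.
importances = permutation_importance(model, X_test, y_test,
                                     n_repeats=10, random_state=0)

for i, score in enumerate(importances.importances_mean):
    print(f"feature_{i}: {score:.3f}")
```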
Tagging Builds Trust
In addition to using internal data for AI, organizations rely on data purchased from third-party data providers. Although some well-known providers have spent decades building their customer base by supplying reliable data, it can be challenging to ensure the end-to-end supply chain of third-party data is trusted and accurate. In this case, it is critical to tag the data to identify its source and create a lineage.
As you can imagine, tagging every piece of data coming from a third-party supplier can be time-consuming. Forward-thinking businesses are already working to automate the tagging process, ideally recognizing that the future of tagging lies in ensuring machine learning models are unbiased.
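In practice, a tag can be as simple as a small metadata wrapper carried alongside each record. The sketch below shows one possible shape for such a tag in Python; the schema, field names and provider are hypothetical illustrations, not a standard.

```python
# Minimal sketch of tagging third-party records with lineage metadata.
# The schema and provider names are hypothetical illustrations.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TaggedRecord:
    payload: dict                # the data itself
    source: str                  # who supplied it
    ingested_at: str             # when it entered the data lake
    lineage: list = field(default_factory=list)  # steps it passed through

    def add_step(self, step: str) -> None:
        """Append a processing step so downstream users can audit the trail."""
        self.lineage.append(step)

record = TaggedRecord(
    payload={"product_id": "A-102", "shelf_price": 4.99},
    source="third_party_provider_x",
    ingested_at=datetime.now(timezone.utc).isoformat(),
)
record.add_step("validated against master product catalog")
record.add_step("joined with internal daily sales")
print(record.source, record.lineage)
```

However lightweight the tag, the source and the trail of transformations are what let an analyst trace any suspect prediction back to the data that produced it.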
As organizations rely more and more on AI, we will see an increase in trust as more regulations ask businesses to identify the source of any data and create transparent, explainable algorithms. Tagging data is one critical way to ensure the validity and accuracy of the information used in machine learning and, ultimately, to shape how AI behaves.
Indeed, businesses that ensure they can plug into and access data from various sources, share transparent algorithms and tag data based on its trustworthiness are still addressing only one part of the process of ensuring AI is unbiased.
The human element within AI can also be improved. Enhancing trust between humans and machines can start with the initial hiring process: businesses need to hire data scientists without “tunnel vision” to eliminate potential biases. Data scientists who are on the lookout for data that doesn’t conform, rather than data that does, will be more likely to spot biases that may be affecting the quality of the data and, therefore, the effectiveness of artificial intelligence.
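That habit of hunting for data that doesn’t conform can be partly automated as well. Here is a minimal sketch using an isolation forest, one of several anomaly detectors (the article prescribes none), run over toy shelf-price data.

```python
# Minimal sketch of flagging data that doesn't conform, using an
# isolation forest (one of several possible anomaly detectors; the
# article prescribes no specific method). Prices are toy data.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Mostly well-behaved shelf prices, plus a few entries that don't conform.
prices = np.concatenate([rng.normal(5.0, 0.5, 500), [0.01, 49.99, -3.0]])

detector = IsolationForest(contamination=0.01, random_state=0)
flags = detector.fit_predict(prices.reshape(-1, 1))  # -1 marks outliers

print("records flagged for review:", prices[flags == -1])
```

A flagged record is not automatically wrong; it is an invitation for a skeptical data scientist to ask why it looks the way it does before it shapes a model.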