A Data-Centric Approach to Data Privacy: Data-Centric vs. Model-Centric vs. Application-Centric
Artificial intelligence has largely evolved with a focus on the techniques used to build models and the solutions those models can provide. Data was assumed to be available for data scientists to use as needed. But AI models are only as good as the data used to train them. Simply put, when it comes to machine learning, it’s “garbage in, garbage out.”
Rather than focusing on the model itself, an emerging approach called “data-centric AI” emphasizes optimizing the data, making the technology more adaptable and scalable while still producing powerful results.
What is the Definition of a Data-Centric Approach to AI?
Championed by computer scientist Andrew Ng, data-centric AI uses analytics and machine learning techniques to ensure that the data used to train a model is high quality, comprehensive, and refined for its purpose. The goal of a data-centric approach is to reach a high level of performance primarily by improving the data, rather than by spending most of the effort refining the model.
Put simply, a data-centric approach to AI includes the following steps:
- Ensuring the appropriate data labels are used
- Eliminating noise
- Augmenting data
- Engineering model features
- Performing error analysis
- Having subject matter experts review training data for accuracy
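Several of the steps above can be illustrated with small plain-Python helpers. This is a minimal sketch under assumed inputs (samples as `(id, label)` pairs, images as nested lists of pixel values); the function names are hypothetical and not part of any TripleBlind or third-party API, and real pipelines would use far more sophisticated noise filtering and augmentation.

```python
from collections import Counter, defaultdict


def find_label_conflicts(samples):
    """Return IDs of samples that received more than one distinct label."""
    labels_by_id = defaultdict(set)
    for sample_id, label in samples:
        labels_by_id[sample_id].add(label)
    return sorted(sid for sid, labels in labels_by_id.items() if len(labels) > 1)


def remove_noise(values, low, high):
    """Keep only readings inside a plausible [low, high] range (a very simple noise filter)."""
    return [v for v in values if low <= v <= high]


def augment_flip(image):
    """Create an extra training example by mirroring a 2-D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def error_analysis(y_true, y_pred):
    """Count misclassifications per true label to show which classes need better data."""
    return Counter(t for t, p in zip(y_true, y_pred) if t != p)


if __name__ == "__main__":
    samples = [("img1", "scratch"), ("img1", "ok"), ("img2", "dent"), ("img2", "dent")]
    print(find_label_conflicts(samples))  # ['img1']
    print(remove_noise([0.2, 5.0, -3.1, 0.9], 0.0, 1.0))  # [0.2, 0.9]
    print(augment_flip([[1, 2, 3]]))  # [[3, 2, 1]]
    print(error_analysis(["ok", "dent", "dent"], ["ok", "ok", "scratch"]))  # Counter({'dent': 2})
```

Each helper targets the data rather than the model: conflicting labels and out-of-range readings are fixed before training, and error analysis points back to the classes whose data needs attention.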
As organizations realize the value of data-centric AI, they are increasingly shifting their processes to prioritize access to higher-quality, less biased data sets. And as available data sets continue to grow in size and complexity, organizations are likely to prioritize data-centric AI even more heavily.
Data-Centric vs. Model-Centric
Unlike data-centric AI, model-centric AI focuses on developing and refining models to boost system performance. This long-standing approach tends to treat data as a static asset and model development as the main driver of better results.
While useful, a model-centric approach to AI has challenges. For one, it often leads to the creation of different specialized models, each focused on a distinct task. This ‘model creep’ can force organizations to manage many different AI systems and data sets. This balkanized approach can also raise data collection costs, because each separate task may require its own data.
Model-centric AI is also not well suited to shifting conditions or new variables, because dealing with change can cause significant redeployment delays. Standardization is another challenge, as different teams may use different methods to develop AI solutions. In these situations, adapting or scaling AI systems can be a massive undertaking.
Advocates for data-centric AI say these legacy challenges can be addressed by developing systems that are focused on improving data and structuring it in ways that support adaptability, versatility, scalability, and standardization.
Data-Centric vs. Application-Centric
The emergence of a data-centric approach to AI mirrors calls for a more data-centric approach to enterprise architecture.
For decades, enterprise architecture has been driven by applications. The result has been a patchwork of approaches to handling data and a severe lack of interoperability.
An emerging approach to enterprise architecture is to be data-centric rather than app-centric. As with artificial intelligence, people are seeing the value in putting data first, rather than treating data as a means to an end.
Real-World Approaches to Data-Centric AI: Manufacturing
Machine vision technologies powered by AI offer an effective way for manufacturers to identify defective parts in finished products.
One of the biggest challenges in developing this type of system for manufacturers is creating a consistent approach to data management. Without good data on the types of flaws or defects to look for, an AI system can’t properly perform inspections. This problem is harder than it seems, because human experts can disagree on how to label image data, which confounds an AI system.
Further complicating matters, change is a constant in manufacturing. Supply chain issues may force a manufacturer to use slightly different parts. New product lines might be unveiled. Environmental conditions such as lighting or humidity can vary throughout the day or across seasons. All of these changes can potentially confound an AI-powered machine vision system.
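One way to quantify how much experts disagree on labels is an inter-annotator agreement statistic. The sketch below uses Cohen’s kappa, a standard measure of agreement between two raters that corrects for agreement expected by chance; the technique is not named in the original text, and the inspector labels are made-up illustrative data, not results from a real production line.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items both raters labeled the same way.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each rater labeled at random with their own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical category for every item; kappa is
        # undefined here, so report perfect agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical labels from two inspectors reviewing the same six part images.
    inspector_a = ["defect", "ok", "ok", "defect", "ok", "ok"]
    inspector_b = ["defect", "ok", "defect", "defect", "ok", "ok"]
    print(round(cohens_kappa(inspector_a, inspector_b), 3))  # 0.667
```

A kappa of 1 means perfect agreement. A noticeably lower value, as in this example, is a signal to clarify labeling guidelines with the experts before the labels are used for training.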
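To catch environmental shifts such as lighting changes before they cause false rejects, one common technique is to compare the distribution of a simple image statistic between a reference batch and newly captured images. The sketch below uses the Population Stability Index (PSI) on per-image mean brightness; PSI is a swapped-in technique not mentioned in the original, and the bin edges, the 0.25 threshold (a widely cited rule of thumb), and the brightness values are all illustrative assumptions.

```python
import math


def binned_fractions(values, edges):
    """Fraction of values in each bin [edges[i], edges[i+1]); out-of-range values are clamped."""
    if not values:
        raise ValueError("values must be non-empty")
    counts = [0] * (len(edges) - 1)
    for v in values:
        v = min(max(v, edges[0]), edges[-1])
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return [c / len(values) for c in counts]


def population_stability_index(reference, current, edges, eps=1e-4):
    """PSI = sum((cur - ref) * ln(cur / ref)) over bins; eps avoids log(0) for empty bins."""
    ref = binned_fractions(reference, edges)
    cur = binned_fractions(current, edges)
    psi = 0.0
    for r, c in zip(ref, cur):
        r, c = max(r, eps), max(c, eps)
        psi += (c - r) * math.log(c / r)
    return psi


if __name__ == "__main__":
    edges = [0, 64, 128, 192, 256]  # 8-bit brightness range split into four bins
    # Hypothetical per-image mean brightness: a daytime reference batch vs. an evening batch.
    daytime = [120, 130, 140, 135, 125, 150, 145, 110]
    evening = [60, 70, 80, 90, 100, 65, 75, 85]
    score = population_stability_index(daytime, evening, edges)
    # 0.25 is a commonly cited rule-of-thumb threshold for a significant shift.
    print("drift detected" if score > 0.25 else "stable")  # drift detected
```

Flagging drift in the data lets the team collect and label examples under the new conditions, rather than waiting for rejection rates to climb.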
Because model-centric AI focuses on the model and treats data as static, it is not well suited to the amount of change commonly seen in manufacturing. Sticking with an unchanged machine vision model in the face of change can lead to high rejection rates, as the system struggles to distinguish acceptable variation from actual defects. Correcting this problem may require significant human intervention, leading to higher costs and production slowdowns. Developers may have to spend weeks or months consulting with quality control professionals and refining machine vision models based on those consultations.
Through a data-centric AI approach, quality control experts and developers wouldn’t have to wait until problems arise before collaborating. Collaboration during the development phase would help to clearly define data, build a model around that data, assess results, and optimize the model. The resulting data-centric model would minimize back-and-forth further down the road.
A data-centric approach would also be better positioned for scaling as the manufacturer adds production lines and new facilities. Standardized methods for collecting and processing data would facilitate future training, refining, and updating of new models.
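Standardized collection can start with a shared record format that every production line and facility validates against. The sketch below is a hypothetical schema; the field names, the label set, and the `lux` lighting field are illustrative assumptions rather than a prescribed format.

```python
from dataclasses import dataclass

# Hypothetical label vocabulary shared by every line and facility.
ALLOWED_LABELS = frozenset({"ok", "scratch", "dent", "misalignment"})


@dataclass(frozen=True)
class InspectionRecord:
    """One labeled inspection image, captured the same way at every site."""
    facility: str
    line_id: str
    image_path: str
    label: str
    lux: float  # lighting level at capture time, recorded so drift can be analyzed later

    def __post_init__(self):
        # Reject records that would silently fragment the shared data set.
        if self.label not in ALLOWED_LABELS:
            raise ValueError(f"unknown label {self.label!r}")
        if self.lux <= 0:
            raise ValueError("lux must be positive")


if __name__ == "__main__":
    record = InspectionRecord("plant-a", "line-3", "images/0001.png", "dent", 450.0)
    print(record.label)  # dent
    try:
        InspectionRecord("plant-b", "line-1", "images/0002.png", "Dent", 450.0)
    except ValueError as err:
        print(err)  # unknown label 'Dent'
```

Validating at capture time means a new facility cannot quietly introduce a variant label such as "Dent" that later training runs would treat as a separate class.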
Why You Should Use TripleBlind with Your Data-Centric Approach to AI
To take a data-centric approach to AI, you need access to sufficient amounts of high-quality, unbiased data, which has traditionally been hard to come by. Until now!
Through the innovative TripleBlind Solution, our clients can access more data through secure collaborations. Our technology offers true scalability and faster processing than competing technologies. TripleBlind also supports all data and algorithm types while protecting both kinds of proprietary assets, so organizations can collaborate safely without worrying about the loss of either their data or their algorithms.
If you would like to learn more about how our technology supports data-centric AI, check out our Blind AI Tools or download our whitepaper. We remove common barriers to using high-quality data for artificial intelligence, solving key challenges AI professionals face with data access, bias, and preparation. Through a combination of privacy-enhancing techniques, the TripleBlind Solution allows new models to be trained on remote data without compromising the privacy or fidelity of sensitive data. Let us show you how by booking a live demo today.