Balancing Data Confidentiality with Utility
In recent years, data-driven fields like finance and healthcare have seen an enormous influx of data. This has fueled innovation in business models, AI, and data collaboration. However, confidentiality requirements can limit how useful that data actually is.
So, how do organizations balance the business utility of their data with the need for privacy and data protection?
In this webinar, Justin Lam and Chris Barnett discuss how enterprises can advance data confidentiality and data utility together. Justin Lam is a data security research analyst at 451 Research, part of S&P Global Market Intelligence. Chris Barnett is TripleBlind’s VP of Marketing and Partnerships and oversees all go-to-market strategies for TripleBlind AI.
The state of data usability and privacy in business
In many cases, enterprises and their partners want to analyze or monetize non-public or personally identifiable information. However, doing so tends to be at odds with a growing body of privacy regulations and with intellectual property concerns.
The trend: more data-driven companies
With the benefits of faster preparation, analysis, and decision-making, more companies are adopting a data-driven approach in today’s market. Decisions can be made with greater confidence, and mistakes can be corrected faster when the data doesn’t support them.
Market intelligence research by 451 found that 64% of respondents said “most” to “nearly all” of their strategic decisions were data-driven. That said, some organizations simply don’t need to become more data-driven (their business models don’t require it), while others can’t make the shift (because of current processes, budgets, limited data, and so on).
Second-order trend: more overlap of internal teams
With this central role data now plays in fields like finance and healthcare, internal processes must adapt as well.
One adaptation is that data security teams, development and product teams, and GRC (governance, risk management, and compliance) teams are working together more frequently and more meaningfully. It’s become increasingly important that these teams understand each other’s functions and account for each other’s priorities in their own work.
This introduces an important set of questions, such as:
- Can security teams account for how AI/ML models are developed?
- Can researchers and developers proactively incorporate security into their technology stacks?
- Can security teams translate the technical controls they’ve put in place (such as data encryption and data classification) into evidence that satisfies legal requirements?
- Can legal teams understand those technical controls and explain them in compliance terms?
Current challenges to data utility
While many organizations understand the benefits of being more data-centric, there are still some major obstacles.
The main concerns we hear from our customers center around:
- Budget limitations
- Integrating existing technology
- Security concerns
- Privacy/governance
- Not enough skilled personnel
Of these, privacy and governance are the most frequently mentioned. Complying with legal frameworks can drastically limit data utility if you don’t have the right strategy or technology in place.
A 451 study found that among companies that identify as data-driven, “data utility” (quality and consistency) was the top barrier to better data practices, followed by security and then privacy.
The future of these emerging trends
Innovation and experimentation will continue, but many projects will keep getting abandoned because of obstacles like those above.
Many businesses launch AI/ML projects without fully understanding these obstacles, and 39% of such projects get abandoned for that reason. More than half of organizations unable to get access to the data they need will ultimately abandon all relevant projects.
Additionally, the environmental, social, and governance (ESG) aspects of AI/ML are becoming increasingly relevant. This is driven by the environmental impact of AI/ML (such as the processing required to train models), immoral or unethical usage (from privacy violations to discrimination), and nascent government regulation, where lawmakers and businesses alike are still working out the best way forward.
ESG issues around AI/ML have hit the mainstream in the last few years (especially in 2022), prompting greater concerns about their impact (451’s research found 74% of people are now “somewhat” to “very” concerned about the privacy of their data online).
Businesses are prioritizing AI/ML initiatives in greater numbers, forcing more companies to get involved if they want to stay competitive. They also hold far more data than ever and want to know how to monetize it for themselves, their stakeholders, and their customers. 451’s survey found that 82% of businesses say data marketplaces (for buying and selling data) will likely be among their top five priorities over the next three years.
What’s next: how can organizations respond?
To balance data utility with confidentiality, businesses can turn to privacy-enhancing (or privacy-preserving) technologies, known as PETs. TripleBlind’s PET can help in four key areas: data aggregation, invention of new uses, distribution of models and data, and model verification.
Aggregation
Data is often siloed, typically for legal reasons to protect privacy. Instead of physically aggregating data, TripleBlind lets you “logically” aggregate it, which reduces storage and transmission costs and mitigates regulatory burdens. You can then leverage all of your siloed data sets without the hassle and expense of traditional methods.
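As a rough illustration of the pattern (a generic sketch, not TripleBlind’s actual API), each silo computes a local summary and only those aggregates ever leave the silo, yet the combined result reflects all of the data:

```python
# Minimal sketch of "logical" aggregation: raw records stay in each silo,
# and only small aggregate summaries are shared and combined.
from dataclasses import dataclass

@dataclass
class LocalSummary:
    count: int
    total: float  # sum of the field being analyzed

def summarize_silo(values: list[float]) -> LocalSummary:
    """Runs inside each silo; raw values never leave it."""
    return LocalSummary(count=len(values), total=sum(values))

def combine(summaries: list[LocalSummary]) -> float:
    """Combines per-silo aggregates into a global mean."""
    total = sum(s.total for s in summaries)
    count = sum(s.count for s in summaries)
    return total / count

# Example: three hospitals contribute only (count, sum), never patient rows.
silos = [[120.0, 130.0], [110.0], [125.0, 135.0, 140.0]]
global_mean = combine([summarize_silo(s) for s in silos])
print(f"Global mean across silos: {global_mean:.1f}")
```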
Invention of new uses
With so much new data, companies can find novel ways to extract value from it. PET improves both the efficiency and the extensibility of this process, letting companies get more use out of their data by securely leveraging a larger, more valuable pool of data sets. TripleBlind also provides exploratory data analysis and GUI-based search to discover interesting data sets.
Distribution of models/data
Enterprises want to deploy AI models and other algorithms for use by others without exposing the underlying IP. Thankfully, sharing doesn’t require sacrificing privacy, and PET makes it much easier: TripleBlind’s router can deliver algorithm results to any global location, with technology in place to protect critical IP and patient data. This lets you sidestep the impasses and legal hurdles that normally come with sharing models and data.
For monetization and monitoring purposes, TripleBlind also provides a “backend” audit trail to track usage and bill customers.
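Conceptually, an audit trail like this wraps every remote call so the caller gets only the result while the owner records who ran what, and when. The sketch below uses hypothetical names to show the idea; it is not TripleBlind’s implementation:

```python
# Hedged sketch of an audit trail for remote algorithm calls: each call is
# logged (caller, algorithm, timestamp, duration) for monitoring and billing.
import json
import time

AUDIT_LOG = "audit_trail.jsonl"  # illustrative log location

def run_with_audit(caller_id: str, algorithm_name: str, algorithm, payload):
    """Executes an algorithm on behalf of a caller and appends an audit record."""
    started = time.time()
    result = algorithm(payload)  # only the result is returned, never the model IP
    record = {
        "caller": caller_id,
        "algorithm": algorithm_name,
        "timestamp": started,
        "duration_s": round(time.time() - started, 4),
    }
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")
    return result

# Example: a partner invokes a scoring model; the owner keeps the weights private.
score = run_with_audit("partner-42", "risk_score_v1", lambda x: 0.8 * x, 5.0)
print(score)
```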
Verification of models
Typically, models are trained on specific populations or datasets, so it’s difficult to determine a model’s relevance to the broader population because of issues like data bias. For instance, blood test data from an exclusively American sample group generally can’t be extrapolated to an Indian population, so that model would need verification to determine its limits and biases.
TripleBlind’s statistical tools let you detect model drift and bias, so you can validate your models and apply them across different populations and countries.
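To make the idea concrete, here is a minimal sketch of one common drift check, the population stability index (PSI), which compares a feature’s distribution in the training cohort against a new population. This illustrates the general technique, not TripleBlind’s tooling:

```python
# Drift detection via the population stability index (PSI).
# Rule of thumb often used in practice: PSI > 0.25 signals major drift.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference sample and a new sample of the same feature."""
    # Bin edges come from the reference (training) population.
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Small floor avoids log(0) and division by zero in empty bins.
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# Example: blood-test values from the training cohort vs. a shifted population.
rng = np.random.default_rng(0)
training_cohort = rng.normal(loc=100, scale=10, size=5_000)
new_population = rng.normal(loc=92, scale=14, size=5_000)  # different mean/spread
print(f"PSI = {psi(training_cohort, new_population):.3f}")  # large value flags drift
```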
For more information on TripleBlind, check out our whitepaper on the underutilization of data.
Or learn about our recent survey on what CDOs are thinking about data privacy.