Tokenization of Data
Data tokenization is a cybersecurity method that replaces sensitive data in-the-clear with an algorithmically generated, non-sensitive token that obscures the content of the original data.
A token functions as a digital stand-in for a valuable piece of data. On their own, tokens have no intrinsic value. However, because they represent a valuable piece of information, such as a Social Security number, they can be seen as having representative value.
Similarly, a plastic poker chip has no inherent value, except when it represents a dollar value during a poker game. If someone steals a box of plastic poker chips from your house, they cannot take them to the bank and exchange them for money.
In the context of data usage, tokens are used in transactions while the sensitive data they represent can be safely kept in a secure location.
The general goal of tokenization is to remove sensitive data, such as personally identifying information, from a system. Once sensitive data is replaced with a token, that data can then be moved off the system to a highly secure location. If the business system is then breached, hackers will only have access to the tokens and not to the underlying data.
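To make the idea concrete, the sketch below shows tokenization at its simplest: a random token replaces a Social Security number in the business system, while the real value sits in a separate store that only the tokenization layer can query. The function and variable names are illustrative only, not part of any particular product.

```python
import secrets

# Minimal tokenization sketch: replace a sensitive value with a random token
# and keep the real value in a separate, secured store.
token_store = {}  # in practice this mapping lives in a hardened, off-system vault

def tokenize(sensitive_value: str) -> str:
    token = secrets.token_urlsafe(16)      # random token, no mathematical link to the data
    token_store[token] = sensitive_value
    return token

def detokenize(token: str) -> str:
    return token_store[token]              # only the secured store can resolve a token

ssn_token = tokenize("123-45-6789")
print(ssn_token)               # safe to keep or pass around in business systems
print(detokenize(ssn_token))   # original value retrieved only via the secure store
```

If the business system is breached, an attacker who steals only the token learns nothing about the underlying number.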
For an organization using them, tokens serve as distinct identifiers that can be used to retrieve sensitive data. While this may sound like a type of encryption, it differs in a key way: encryption encodes and decodes information with an encryption key, whereas a token is not directly derived from the data it replaces.
The goal of tokenization is not to stop malicious actors from hacking into networks or company systems. Instead, it is an added security layer used to protect information, not the system that holds it.
Companies looking to boost security or privacy with the tokenization of data will typically use third-party data tokenization vendors. The provider stores data in a separate secured location and issues tokens to the company for its data.
When a company wants to access its sensitive data, it passes the relevant token to its security provider. The provider then uses data tokenization tools to collect the data and pass it back to the company. The security provider’s system is the only party that is capable of reading the token, and each token is unique to the client, meaning that a provider will not use the same token for multiple clients.
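In practice this exchange usually happens over a vendor API. The endpoints and field names in the sketch below are hypothetical and used only for illustration; each provider exposes its own interface.

```python
import requests

PROVIDER = "https://tokenization.example-provider.com/v1"   # hypothetical vendor endpoint
HEADERS = {"Authorization": "Bearer <api-key-issued-by-provider>"}

# 1. Hand the sensitive value to the provider and receive a token to store locally.
token = requests.post(f"{PROVIDER}/tokenize",
                      json={"value": "123-45-6789"},
                      headers=HEADERS).json()["token"]

# 2. Later, exchange the token for the original value only when it is truly needed.
original = requests.post(f"{PROVIDER}/detokenize",
                         json={"token": token},
                         headers=HEADERS).json()["value"]
```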
People and devices generate massive amounts of data; according to Forbes, the amount of data being produced has increased tenfold over the past decade. Much of this data is underutilized due to a lack of access. In a survey from Forbes, just 3 percent of companies said they can access data of sufficient quality. And the amount of data being produced is only going to increase. According to a report from the Wall Street Journal, global spending on information technology rose by 8.4 percent, to $4.1 trillion, in 2021.
Tokenization can provide some key benefits for protecting sensitive data, especially when it comes to some specific use cases. With tokenization, companies do not have to store sensitive information or transmit it using their own systems. Evidence shows many organizations are already doing this. According to a report from Gartner, more than 50 percent of data is kept in the public cloud.
Tokenization is just one of many privacy-enhancing technologies. In some situations, it is preferred over other solutions, and in some situations, other solutions are superior. Read more about privacy-enhancing techniques.
Both tokenization and encryption are techniques used to obfuscate data with the goal of securing it during storage and in-transit. While they are similar in some ways, there are some key differences.
Encryption involves the mathematical transformation of data, which is usually plain text. Through an encryption key and algorithm, sensitive data is converted into a cipher which is then converted back into the original information through the use of a decryption key. One of the main benefits of encryption is that it can be easily scaled up as an organization needs to protect more data. Encryption also facilitates the secure transmission of original data, albeit in an encoded form.
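A brief sketch of symmetric encryption, using Python’s `cryptography` package as one example (the article does not prescribe a particular library): the same key is required to recover the original data.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # the encryption key must be protected
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"123-45-6789")   # encoded form, safe to store or transmit
plaintext = cipher.decrypt(ciphertext)        # recoverable only by a holder of the key
assert plaintext == b"123-45-6789"
```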
Conversely, a tokenization system does not create tokens that are directly representative of the original data. In a typical data breach situation, stolen tokens are far less valuable than stolen encoded data. Tokenization does not support scaling up as well as encryption and this can lead to performance issues. Tokenization also typically involves non-representative tokens being sent to a data collaborator, and the original data remaining with the company.
While tokenization is more advantageous when it comes to maintaining data formats, encryption is better suited to securely transmitting data.
Data masking involves the creation of false, yet realistic-looking data based on an original dataset. This approach helps to protect sensitive data while maintaining structural similarities that facilitate use in training, demoing, and other non-vital applications.
There are two different categories of data masking. The ‘static’ approach prevents users from seeing any original information after masking operations have been performed. The ‘dynamic’ approach allows only authorized users to view the original information.
Tokenization is actually a specialized type of data masking and this specialization makes it more rigid. While the general approach of masking allows for more dynamic relationships between the original data and the generated dataset, tokenization is locked into a one-to-one relationship between a token and its corresponding original data.
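The difference can be sketched in a few lines. The helper names below are hypothetical: masking fabricates realistic-looking values, while tokenization always returns the same single token for a given original.

```python
import random
import secrets

def mask_ssn(_ssn: str) -> str:
    # Masking: substitute a realistic but fabricated value, not tied to the original.
    return f"{random.randint(100, 999)}-{random.randint(10, 99)}-{random.randint(1000, 9999)}"

_token_map = {}

def tokenize_ssn(ssn: str) -> str:
    # Tokenization: a strict one-to-one mapping between each original and its token.
    if ssn not in _token_map:
        _token_map[ssn] = secrets.token_hex(8)
    return _token_map[ssn]

print(mask_ssn("123-45-6789"))      # a different fabricated value on each call
print(tokenize_ssn("123-45-6789"))  # a stable token for this record
print(tokenize_ssn("123-45-6789"))  # the same token again: one-to-one
```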
Hashing is a data obfuscation technique that uses a hashing algorithm to produce the same output every time it processes a given input value. Different inputs processed by a hashing algorithm will, for practical purposes, never result in the exact same ‘hash’ being produced.
In the same way that translating a book from Spanish to English maintains the integrity of the original work, hashing is meant to maintain the integrity of the original information. Unlike with language translation, a ‘hash’ cannot be converted back into the original information. However, the authentication and comparison of hashes do allow for the recognition of identical data records. They also allow for the recognition of any changes made between subsequent renditions of the same record.
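A short example with SHA-256, one common hashing algorithm, shows both properties: identical inputs can be matched by comparing their hashes, and any change to the input produces a different hash.

```python
import hashlib

record_v1 = "Jane Doe,1980-01-01,123-45-6789"
record_v2 = "Jane Doe,1980-01-01,123-45-6780"   # one digit changed

hash_v1 = hashlib.sha256(record_v1.encode()).hexdigest()
hash_v2 = hashlib.sha256(record_v2.encode()).hexdigest()

print(hash_v1 == hashlib.sha256(record_v1.encode()).hexdigest())  # True: identical records match
print(hash_v1 == hash_v2)                                         # False: any edit is detectable
```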
Hashing and tokenization are very different methods that can be used for very different purposes. With hashing, an organization can receive only hash codes and still perform authentications or other operations. The organization never has to receive original, potentially sensitive, information for authentication. Therefore, it never has to receive Social Security or credit card numbers.
Tokenization is more appropriate when an organization wants to possess the original data. An organization that wants to hold onto sensitive information like Social Security numbers can still send tokens that do not reveal any information if intercepted. Some tokenization systems keep sensitive information in a digital vault, while “vaultless” systems use a cryptographic algorithm to map tokens to values without storing the originals separately. Either way, the original data never has to leave the holder’s system.
Commonly used in e-commerce and finance, data tokenization provides an additional layer of security for sensitive information. The tokenization of data mostly prevents the unnecessary and risky passing of sensitive information within a company’s internal system, which can create security threats.
Tokenization came to prominence as a security technology in e-commerce, and organizations in healthcare and other industries are now giving the technology a look. These organizations are driven by a desire to embrace analytics and AI, both of which require massive amounts of data. The necessary data collection can raise serious compliance concerns. In the United States, regulations under the Health Insurance Portability and Accountability Act (HIPAA) require that personal healthcare data remain private. To address this mandate, healthcare organizations have been experimenting with a number of different privacy-enhancing methods.
Most national and international regulators consider data tokenization to be compliant with their data privacy rules. In some implementations, the use of tokenization to protect patient privacy is considered compliant with HIPAA regulations. However, the use of tokenization involves a major tradeoff between the degree of privacy and the level of utility.
A typical data tokenization system for patient data anonymizes records by removing identifiable information such as names and five-digit zip codes. The de-identification process is typically configured manually on a client-by-client basis, using input from the tokenization client.
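A simplified sketch of such a configuration is shown below; the field names, secret key, and choice of HMAC are assumptions for illustration. Direct identifiers are dropped, and a token is derived from a combination of identifying fields so that the same patient can be recognized across datasets.

```python
import hashlib
import hmac

CLIENT_KEY = b"per-client-configuration-secret"   # hypothetical, set during client onboarding

def deidentify(record: dict) -> dict:
    # Drop the direct identifiers named in the client-specific configuration.
    cleaned = {k: v for k, v in record.items() if k not in {"name", "zip_code"}}
    # Derive a token from a combination of identifying fields.
    identity = f"{record['name']}|{record['date_of_birth']}".encode()
    cleaned["patient_token"] = hmac.new(CLIENT_KEY, identity, hashlib.sha256).hexdigest()
    return cleaned

record = {"name": "Jane Doe", "zip_code": "64105",
          "date_of_birth": "1980-01-01", "diagnosis": "J45.20"}
print(deidentify(record))   # identifiers removed, stable patient_token added
```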
During this process, the system creates one or more tokens for each designated record so that de-identified patient records can be placed into one or more datasets. Tokens are often created based on combinations of identifying information, dependent on the system configuration.

Security Challenges of Tokenization
There are two common data tokenization solutions: vault-based tokenization and vaultless tokenization.
Let’s take a closer look at each solution to understand the advantages and disadvantages of each.
A tokenization vault stores the original plaintext information in a file or database after generating a token. When the original value must be retrieved, a call is made to the vault using the token, and retrieval occurs. Data tokenization tools then serve up the requested data.
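In outline, a vault is little more than a keyed lookup table kept away from the business systems. The sketch below uses a local SQLite database purely to illustrate the pattern; production vaults are hardened, access-controlled services.

```python
import secrets
import sqlite3

vault = sqlite3.connect("token_vault.db")   # stands in for a separately secured database
vault.execute("CREATE TABLE IF NOT EXISTS vault (token TEXT PRIMARY KEY, value TEXT)")

def vault_tokenize(value: str) -> str:
    token = secrets.token_urlsafe(16)
    vault.execute("INSERT INTO vault (token, value) VALUES (?, ?)", (token, value))
    vault.commit()
    return token

def vault_detokenize(token: str) -> str:
    row = vault.execute("SELECT value FROM vault WHERE token = ?", (token,)).fetchone()
    return row[0] if row else None           # retrieval is a lookup call against the vault

card_token = vault_tokenize("4111 1111 1111 1111")
print(vault_detokenize(card_token))
```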
The most obvious issue with this approach is that it creates a copy of de-identified, valuable data and places it in another location. This practice is commonly referred to as “moving the problem,” and it results in another attack point for those with malicious intent. Also, the use of a vault presents inherent scalability issues. A tokenization vault does not function well in distributed data ecosystems. For instance, tokenization of datasets from multiple parties could require significant coordination among data partners.
Another security issue is the fact that the provider has access to its clients’ sensitive — and often valuable — data. Although it is standard practice to put firewalls and other security measures in place, a company using the tokenization services of a provider largely depends on trust and legal agreements. Companies in this situation could remove any personally identifiable information before passing data to the provider’s system to avoid privacy disasters, but the remaining data could still have value for unauthorized users.
Vaultless data tokenization, on the other hand, does not store the original plaintext data in a secondary location. Rather, it uses a secure cryptographic device that maps tokens to plaintext values. Although this approach avoids creating a secondary point of attack, it is vulnerable to a plaintext or a ciphertext attack. In these kinds of attacks, a hacker produces tokenization requests with the intent of unlocking the tokenization device. These types of attacks can be resource-heavy, but they’ve proven to be effective.
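A rough sketch of the vaultless pattern: no table of originals is kept, and a keyed cryptographic transform maps values to tokens and back. Real systems typically use format-preserving encryption inside a hardened device; the `Fernet` cipher here is only a stand-in to show the absence of a stored mapping.

```python
from cryptography.fernet import Fernet

device_key = Fernet.generate_key()   # held inside the secure cryptographic device
device = Fernet(device_key)

def vaultless_tokenize(value: str) -> str:
    return device.encrypt(value.encode()).decode()

def vaultless_detokenize(token: str) -> str:
    return device.decrypt(token.encode()).decode()

t = vaultless_tokenize("4111 1111 1111 1111")
print(vaultless_detokenize(t))   # recovered without any lookup table
```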
In addition to having privacy and security vulnerabilities, data tokenization also presents challenges related to use and performance.
One of the most important disadvantages is the cumbersome configuration process required to de-identify records. Configuration processes must be performed whenever a new data partner is added. Sometimes, the addition of a new dataset by an existing data partner will also require a significant configuration process. Adding steps between the generation and use of data slows down the time-to-insights and can lead to some datasets essentially expiring in terms of usefulness before they are ever leveraged.
Furthermore, an aggressive de-identification configuration can strip a dataset of critical information. The approach of using tokenization or other anonymization techniques inherently leads to data degradation, as the datasets lose precision when information is stripped away and replaced with tokens.
Other performance challenges include:
TripleBlind’s encrypted-in-use approach, built on practical breakthroughs grounded in decades of trusted, verified research in cryptography and mathematics, avoids many of the issues associated with tokenization.
Our Blind Compute technology does not involve hiding, removing, or replacing data. This solution maintains full data fidelity, resulting in more accurate computational outcomes. Its key features include:
TripleBlind’s software-only API addresses a wide range of use cases, allowing for the safe and secure commercialization of sensitive data. If you would like to learn more about how the TripleBlind Solution offers a number of advantages over tokenization, contact us today to schedule a demo.
TripleBlind is built on novel, patented breakthroughs in mathematics and cryptography, unlike other approaches built on top of open source technology. The technology keeps both data and algorithms in use private and fully computable.