Tools for Securing your Data (for Developers) – Tokenization

In this and the next few blog posts I’ll talk about two useful tools that can help secure and share your data – Tokenization and Threshold Cryptography.

Tokenization refers to the process of replacing sensitive data fields with randomly generated token values, and storing the sensitive values in a logically separate data store.  Because the token is random, there is no way to map from the token back to the original value without using the tokenization system.  (This is a different approach from encrypting the data values, where the encrypted value can potentially be reversed.)  The generated token can be of the same data type and format as the original value, which makes it possible to integrate tokenization into existing legacy systems, or to use tokenization to sever sensitive data values from public cloud-based systems.
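Here's a minimal sketch of the idea in Python. The in-memory dictionaries stand in for the logically separate token vault, and all the names are illustrative; a real system would put the vault behind access controls on separate infrastructure.

```python
import secrets

# Toy stand-ins for a logically separate, access-controlled token vault.
_vault = {}    # token -> original value
_reverse = {}  # original value -> token (repeat values reuse their token)

def tokenize(value: str) -> str:
    """Replace a sensitive digit string with a random token in the same format."""
    if value in _reverse:
        return _reverse[value]
    while True:
        # Random digits of the same length, so the token can slot into
        # legacy schemas that expect, say, a 16-digit card number.
        token = "".join(secrets.choice("0123456789") for _ in value)
        if token not in _vault and token != value:
            break
    _vault[token] = value
    _reverse[value] = token
    return token

def detokenize(token: str) -> str:
    """Recover the original value -- only the vault side can do this."""
    return _vault[token]
```

Note that the token carries no information about the original value at all: without access to the vault, a stolen token is just a random number.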

Tokenization is an alternative to encryption using strong cryptography.  The two techniques can also be combined: strong encryption can secure the data and token values transferred between the application data store and the token “vault”, as well as the actual data values stored within the vault.

A lot of vendors are now talking about something called “Vault-less Tokenization”, which works like tokenization but without having to maintain a separate token repository.  (A drawback of a token database is that if you lose it, you lose your data!)  Vault-less tokenization is closer to encryption: the token value is derived from the original data plus some “secret”, or from values drawn from a lookup table, and then rendered in the data’s original format.  It has the advantages of tokenization without the cost of a separate data repository.  To recover the original value, you apply the reverse of the “secret” transformation to the token.  It’s really not that much different from strong cryptography; the main benefit is that it gives you a properly formatted token.

There are a number of properties a secure tokenization system should possess and I’ll be talking about these in a future blog post.

Another great tool in your toolkit is threshold cryptography.  This one is a bit more complicated, and I’ll be talking about it (and its applications) in my next post.  And later I’ll demonstrate where tokenization and threshold cryptography can be combined to form a secure platform for social data sharing.