How Cape Privacy Can Augment a Tokenization-Based Ecosystem for Secure Machine Learning Predictions

  • Bessie Chu
  • Yann Dupis

Many enterprises today use tokenization to protect their sensitive data, and tokenization workflows are often part of their data protection ecosystem. However, tokenization alone does not protect sensitive data while it is in use, nor does it provide the most secure environment for applying machine learning to that data.

Both tokenization and encryption secure information during transfer and storage, though the mechanics differ. They are not interchangeable, and each has trade-offs.

Encryption transforms plaintext into ciphertext (unreadable data) using a mathematical algorithm and a key; only a party holding the correct key can decrypt the data. Tokenization instead generates a random token value and stores the mapping between the token and the original value in a database. Tokenization can preserve the format of structured data fields such as credit card numbers or other numerical values. However, tokenization can be difficult to scale and to transfer, and processing across tokenized fields is difficult. Performing compute operations across tokenized fields is not practical, let alone running machine learning on them. Encryption works well on large data volumes and handles both structured and unstructured data. It also works well when data needs to be exchanged with third parties, so long as the other party has the key. With advances in technologies like secure multiparty computation (MPC), encrypted data can also be computed on while remaining protected by cryptography.
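To make the tokenization side of this comparison concrete, here is a minimal sketch of a format-preserving token vault. It is an illustration only: the `TokenVault` class, its method names, and the in-memory dictionaries are hypothetical stand-ins for the secured database a real tokenization service would use.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault; a real one would be a secured database."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Return the existing token so a value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        if not any(ch.isdigit() for ch in value):
            raise ValueError("this sketch only tokenizes values containing digits")
        for _ in range(100):  # bounded retries in case of a collision
            # Swap every digit for a random digit; keep separators so the
            # token has the same format as the original value.
            token = "".join(
                secrets.choice("0123456789") if ch.isdigit() else ch
                for ch in value
            )
            if token != value and token not in self._token_to_value:
                self._token_to_value[token] = value
                self._value_to_token[value] = token
                return token
        raise RuntimeError("could not generate a unique token")

    def detokenize(self, token: str) -> str:
        # The only way back to the original value is the stored mapping.
        return self._token_to_value[token]


vault = TokenVault()
card = "4111-1111-1111-1111"
token = vault.tokenize(card)
original = vault.detokenize(token)
```

Because the token is random, it carries no numerical relationship to the original value. Any computation on the data first requires a lookup in the vault, which is why tokenized fields are hard to compute across.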

Currently, many enterprises tokenize their data when consolidating or migrating it into cloud data platforms such as Snowflake. Many services provide this capability. In practice, however, the data ends up difficult to use because it must be de-tokenized to plaintext before predictive AI can run on it, e.g., to predict customer churn. The de-tokenization process itself can be insecure, since the data is transformed back into plaintext. As a result, the data effectively ends up “frozen” because companies do not want to assume the risks involved.

We at Cape Privacy believe our solution offers the best of both worlds while working with existing implementations. Cape Privacy’s MPC-based cryptography enables running machine learning models on encrypted data, with a specific focus on machine learning predictions on encrypted data in use.

Companies that rely on tokenization alone face a conundrum: they must de-tokenize data back into plaintext before they can use it for predictive intelligence. Cape Privacy uses secret sharing to process the data. Enterprises can instead de-tokenize directly into an AES-encrypted format for Cape, so their most sensitive data is never in plaintext, and then use Cape Privacy’s platform to upload machine learning models and run predictions from Snowflake.
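The post does not describe Cape Privacy's protocol in detail, so the following is only a generic textbook sketch of additive secret sharing, the basic idea behind computing on data that no single party sees in plaintext. The modulus, the three-party setup, the function names, and the toy linear model with public integer weights are all assumptions made for illustration.

```python
import secrets

# Assumed modulus for this sketch: the Mersenne prime 2**61 - 1.
PRIME = 2**61 - 1


def share(secret: int, n_parties: int = 3) -> list[int]:
    """Split a secret into n random-looking shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine shares; any strict subset reveals nothing about the secret."""
    return sum(shares) % PRIME


def local_linear_score(weights: list[int], feature_shares: list[int]) -> int:
    """Each party applies the public weights to its own shares only."""
    return sum(w * x for w, x in zip(weights, feature_shares)) % PRIME


# Toy example: three private features and public integer model weights.
features = [3, 5, 2]
weights = [2, 1, 4]

# shared[i][p] is party p's share of feature i.
shared = [share(x) for x in features]

# Every party computes a partial score without seeing any plaintext feature.
party_scores = [
    local_linear_score(weights, [shared[i][p] for i in range(len(features))])
    for p in range(3)
]

# Only the combined result reveals the prediction: 2*3 + 1*5 + 4*2.
prediction = reconstruct(party_scores)
```

This works because a weighted sum is linear, so it can be evaluated share by share. Real models involve fractional values and nonlinear operations, which need fixed-point encodings and additional interactive protocols not shown here.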

To learn more about Cape Privacy and our platform, visit our website or contact Cape Privacy.
