Sometimes the data you need to process is sensitive, but sometimes so are the things you're searching for. We encountered a use case that needed to sift through streaming network data and look for hits on certain IP addresses. The data itself was sensitive, but the search criteria were even more so because they identified targets of an investigation.
It was of utmost importance that the application code (intellectual property) and search terms not be compromised or known, whether by attackers or even by internal admins.
And because these investigations are sensitive, this organization wanted to further protect the search results. In addition to being transferred over TLS, the results also had to be encrypted using a customer-provided public key.
This made more sense as they shared how they envisioned multiple parties being involved, each having distinct roles and differing levels of access:
- Team A ("Alice")
  - Has an undisclosed methodology for determining IP addresses of interest (search terms) and desires to keep visibility of these to a minimum
  - Will write and deploy a function to look for those addresses within the network data passed in
  - Will encrypt the search results using their public key
  - Will not invoke the function; will receive encrypted results from others who do
  - Will use their private key to decrypt results
  - Must be the only party who can see search results
- Team B ("Bob")
  - Manages the secure search service:
    - Will consume streaming network traffic data from a Kafka topic
    - Will call the search function with that data
    - Will provide the encrypted results to "Alice" by producing messages onto a separate Kafka topic
  - Must not be able to view the results
Here is a simplified version of the system. In this article we'll focus specifically on the parts involving Cape components.
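To make Bob's role concrete, here is a minimal sketch of the relay loop, assuming the Kafka consumer, the call to the deployed function, and the Kafka producer are each wrapped behind a plain Python callable. The names `relay`, `invoke`, and `publish` are hypothetical and not part of any Cape or Kafka API. The point is only that Bob passes opaque bytes in both directions and never needs a decryption key.

```python
from typing import Callable, Iterable


def relay(
    traffic: Iterable[bytes],
    invoke: Callable[[bytes], bytes],
    publish: Callable[[bytes], None],
) -> int:
    """Send each traffic payload to the search function and forward the result.

    `traffic` stands in for messages read from the input topic, `invoke` for
    the call to the deployed function, and `publish` for the producer on the
    results topic. All three are assumptions made for illustration.
    """
    forwarded = 0
    for payload in traffic:
        # The result is already encrypted for Alice; Bob treats it as opaque.
        encrypted_result = invoke(payload)
        publish(encrypted_result)
        forwarded += 1
    return forwarded
```

In a real deployment the three arguments would be backed by a Kafka client and a Cape client; here they are left abstract so the flow of data is easy to follow.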
## The Solution: Cape Enclaves
Cape offers easy access to confidential computing in a function-as-a-service model. This means that you can delegate processing to a cloud service, where both data and compute are kept confidential by use of trusted execution environments called enclaves. Nobody, including Cape and the cloud provider (even their admins), can access the code or data inside the enclave. That is a lofty claim, but the Cape SDKs cryptographically attest that this is the case with every connection.
More specifically, functions deployed to Cape are first encrypted with a key that is specific to your enclave and transient to the session, and they are always sent over TLS. Once your session ends, the enclave is terminated, along with any keys and memory (there is no external networking or persistent storage). Decryption and processing only occur within the isolated, secure confines of the enclave. When the function is later invoked, the input data is treated exactly the same way.
This means Alice can simply write a search function in Python, with the search terms embedded, and deploy it to Cape with assurance that her code and search terms will remain confidential. And Bob can invoke it with sensitive data, also assured that the inputs are protected. In fact, this immediately meets most of our requirements.
The remaining piece is adding encryption of the results using a customer-supplied public key. Luckily, Cape also offers a Python library for Hybrid Public Key Encryption (HPKE). This makes it easy to generate keypairs, as well as encrypt and decrypt data.
Here is a summary of the key steps of our solution, each of which leverages a Cape component:
1. Generate a keypair.
2. Write a search function that checks input data for specific IP addresses, then encrypts the results using a customer-supplied public key.
3. Deploy the function to Cape.
4. Invoke the function with network traffic data.
5. Decrypt results using the private key associated with the public key used for encryption.
The volume and rate of requests will be hard to predict. Therefore, it'd be ideal to ensure the service could auto-scale to meet demand. Fortunately, this is a built-in feature of Cape.
## The Implementation
Before writing our search function, let's first standardize on an input data format. We'll accept a list of JSON objects, where each object's IP address is stored in the "hosts" field:
[{"protocol": "tcp", "hosts": "1.2.3.4", "state": "ESTABLISHED"}, {...}]
In our response, we want to indicate, for each IP address we process, whether it matches each of the IPs we are checking for. So if we were checking for the IPs 1.2.3.4 and 5.6.7.8, and we were processing 1.2.3.4, the response would be:
[{"1.2.3.4": true, "5.6.7.8": false}, {...}]
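The input and response formats above can be sketched in plain Python as follows. This is an illustration only, not the function from the original solution: the names `TARGET_IPS`, `find_matches`, and `handle` are hypothetical, the target list is a placeholder, and the step that encrypts the results with the public key is deliberately left out.

```python
import json

# Placeholder search terms; in the real system these are Alice's secret.
TARGET_IPS = ["1.2.3.4", "5.6.7.8"]


def find_matches(records, targets=TARGET_IPS):
    """Return one {target_ip: bool} dict per input record.

    Each record is expected to carry its IP address in the "hosts" field.
    A record without that field simply matches nothing.
    """
    return [
        {target: record.get("hosts") == target for target in targets}
        for record in records
    ]


def handle(input_bytes: bytes) -> bytes:
    """Decode a JSON list of records and return the JSON-encoded matches.

    The bytes-in, bytes-out shape is an assumption made for this sketch.
    The real function would encrypt the result before returning it.
    """
    records = json.loads(input_bytes)
    return json.dumps(find_matches(records)).encode("utf-8")
```

The comparison here is an exact string match, which is enough for the format shown. If the input could contain differently formatted addresses, normalizing both sides first (for example with the standard `ipaddress` module) would be a reasonable refinement.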
With those defined, we're prepared to review how we implemented each of the steps of our solution:
Your function bundle will include the public key generated in step 1.
Invoke the function with network traffic data. Try it yourself.
Alternatively, you can use PyCape for Python clients.
Decrypt results using the private key associated with the public key used for encryption.
Full code, including examples using both Cape CLI and PyCape, is available on GitHub.
## Final Words
I hope that this example has demonstrated how trivial it can be to significantly enhance the security of your applications and data. Beyond that, confidential computing can protect intellectual property, mitigate data leakage, and might even help with regulatory compliance.
I also hope that as a developer you find it quite convenient not to have to concern yourself with many of the (very important) details around getting security right. Cape helps with that, so you can spend more time focused on code and business requirements.
And remember: what happens in Cape, stays in Cape!