Secure Sentiment Analysis with Cape

  • Ellie Kloberdanz

What Is Sentiment Analysis and How Does Cape Make It Secure?

Sentiment analysis is an application of natural language processing (NLP) that classifies the sentiment of text, typically as either positive or negative. Because vast amounts of data exist in textual form, sentiment analysis has many practical applications, including social media monitoring, customer feedback analysis, news analysis, and market research. Automating this processing makes it possible to extract valuable information efficiently.

However, what if the text that we need to analyze is sensitive or must stay confidential? This is where the Cape Privacy secure enclave system comes in. Cape Privacy provides a confidential computing platform based on AWS Nitro Enclaves for security- and privacy-minded developers. Cape runs serverless functions on encrypted data and ensures that sensitive data or intellectual property within apps is protected.

Cape provides a command line interface (CLI) as well as Python and JavaScript software development kits (SDKs), called pycape and cape-js, that allow developers to deploy their apps and users to interact with them securely.

There are three essential components that enable this: cape encrypt, cape deploy, and cape run. The command cape encrypt encrypts inputs that can be sent into the Cape enclave for processing, cape deploy performs all needed actions for deploying a function into the enclave, and finally cape run invokes the deployed function with an input that was previously encrypted with cape encrypt. Learn more on the Cape docs.

How to Build a Sentiment Analysis App with Cape

Training a Text Classification Model

For sentiment analysis, the function that we wish to deploy and run is a text classification model, so we first need to define its architecture and train it.

To keep the model lightweight, we use TensorFlow Lite and its Model Maker library. The training data is SST-2 (Stanford Sentiment Treebank), a commonly used dataset published by Socher et al. (2013) that consists of over 60,000 movie reviews labeled as positive or negative. For the model architecture we use average word embedding, which produces a small model that can perform fast inference. The following code snippet defines and trains the model, then exports the trained model and its vocabulary in TensorFlow Lite format.
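To make the idea of average word embedding concrete, here is a minimal sketch in plain Python. It is not the Model Maker implementation: the toy embeddings, the classifier weights, and the helper names are all made-up values for illustration. Each word is mapped to a vector, the vectors are averaged into one fixed-size representation, and a small linear layer with softmax turns that average into class probabilities.

```python
import math

# Hypothetical toy embeddings (2-dimensional); real models learn these during training.
EMBEDDINGS = {
    "great": [1.0, 0.5],
    "film": [0.0, 0.0],
    "boring": [-1.0, -0.5],
    "<UNKNOWN>": [0.0, 0.0],
}

# Hypothetical classifier weights: one row per class (0 = negative, 1 = positive).
WEIGHTS = [[-2.0, -1.0], [2.0, 1.0]]
BIASES = [0.0, 0.0]


def average_embedding(words):
    """Average the embedding vectors of all words into one fixed-size vector."""
    words = words or ["<UNKNOWN>"]  # avoid dividing by zero on empty input
    vectors = [EMBEDDINGS.get(w, EMBEDDINGS["<UNKNOWN>"]) for w in words]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text):
    """Return (label, probability) for a piece of text."""
    avg = average_embedding(text.lower().split())
    logits = [
        sum(w * x for w, x in zip(row, avg)) + b
        for row, b in zip(WEIGHTS, BIASES)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ("positive" if best == 1 else "negative"), probs[best]


print(classify("great film"))
print(classify("boring film"))
```

Because the sentence is reduced to a single averaged vector, the model stays small no matter how long the input is, at the cost of ignoring word order.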

Install dependencies

!sudo apt -y install libportaudio2
!pip install -q tflite-model-maker-nightly

Import libraries

import numpy as np
import os

from tflite_model_maker import model_spec
from tflite_model_maker import text_classifier
from tflite_model_maker.config import ExportFormat
from tflite_model_maker.text_classifier import AverageWordVecSpec
from tflite_model_maker.text_classifier import DataLoader

import tensorflow as tf
import pandas as pd

assert tf.__version__.startswith('2')
tf.get_logger().setLevel('ERROR')

Prepare training data

df.to_csv(new_file)

# Replace the label name for both the training and test dataset. Then write the
# updated CSV dataset to the current folder.
replace_label(os.path.join(data_dir, 'train.tsv'), 'train.csv')
replace_label(os.path.join(data_dir, 'dev.tsv'), 'dev.csv')

spec = model_spec.get('average_word_vec')

train_data = DataLoader.from_csv(
    filename='train.csv',
    text_column='sentence',
    label_column='label',
    model_spec=spec,
    is_training=True)

test_data = DataLoader.from_csv(
    filename='dev.csv',
    text_column='sentence',
    label_column='label',
    model_spec=spec,
    is_training=False)
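The data-preparation snippet above calls replace_label() and uses data_dir, but their definitions are not shown. As a rough sketch of what such a helper might do, the version below converts a tab-separated file with a numeric label column into a CSV file with text labels. It is not the original helper: it uses the standard-library csv module rather than pandas, and the 0 to negative, 1 to positive mapping and the column names sentence and label are assumptions based on the column names passed to DataLoader.from_csv.

```python
import csv

# Assumed mapping from numeric labels to readable names.
LABEL_MAP = {"0": "negative", "1": "positive"}


def replace_label(original_file, new_file):
    """Read a TSV file with 'sentence' and 'label' columns and write a CSV
    file in which the numeric labels are replaced by text labels."""
    with open(original_file, newline="", encoding="utf-8") as src, \
            open(new_file, "w", newline="", encoding="utf-8") as dst:
        # QUOTE_NONE keeps quote characters inside sentences as literal text.
        reader = csv.DictReader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        writer = csv.DictWriter(dst, fieldnames=["sentence", "label"])
        writer.writeheader()
        for row in reader:
            writer.writerow({
                "sentence": row["sentence"],
                # Leave unexpected label values untouched.
                "label": LABEL_MAP.get(row["label"], row["label"]),
            })
```

With a helper like this, data_dir only needs to point at the folder that holds train.tsv and dev.tsv.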

Train model

model = text_classifier.create(train_data, model_spec=spec, epochs=10)

Evaluate model

loss, acc = model.evaluate(test_data)

Export model as TensorFlow Lite

model.export(export_dir='model')
model.export(export_dir='model', export_format=[ExportFormat.LABEL, ExportFormat.VOCAB])

Create a Function

Any function deployed with Cape must live in a file named app.py that contains a function called cape_handler(), which takes the input to process and returns the result. For the sentiment analysis app, the input is the text that we wish to classify and the output is the predicted sentiment, either negative or positive.

The code snippet below shows our app.py. We can see that the cape_handler() function loads the TensorFlow Lite model that we previously trained and also its vocabulary. Additionally, the handler also vectorizes the text inputs using the vocabulary such that the inputs are encoded as numeric vectors before we run inference on them. The model then predicts the sentiment of this encoded text and outputs its predicted sentiment.

Import libraries

import numpy as np
from tflite_runtime.interpreter import Interpreter
import contractions

Load vocabulary function

def load_vocab(path):
    # Each line of the vocabulary file holds a word and its integer ID,
    # separated by a space.
    vocabulary = {}
    with open(path, "r") as f:
        for line in f:
            item = line.strip().split(" ")
            word = item[0]
            encoding = int(item[1])
            vocabulary[word] = encoding
    return vocabulary

Text vectorization function

def vectorize_text(text, vocabulary, input_shape):
    encoded_text = []
    # Fix contractions
    expanded_words = []
    for word in text.split():
        expanded_words.append(contractions.fix(word))
    text = " ".join(expanded_words)
    text = text.split(" ")
    for word in text:
        word = word.lower()  # convert to lower case
        # account for words not in vocabulary
        if word in vocabulary:
            word_encoding = vocabulary[word]
        else:
            word_encoding = vocabulary["<UNKNOWN>"]
        encoded_text.append(word_encoding)
    # Truncate to the model's input length so the pad width below is never negative
    encoded_text = np.array(encoded_text[: input_shape[1]], dtype=np.int32)
    # Pad the remainder of the sequence with zeros
    encoded_text = np.pad(
        encoded_text, (0, input_shape[1] - len(encoded_text)), "constant"
    )
    encoded_text = np.reshape(encoded_text, (input_shape[0], input_shape[1]))
    return encoded_text
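To see what the vectorization step produces, the following sketch runs a simplified version of the lookup-and-pad logic in plain Python on a small made-up vocabulary. It leaves out the contractions step and NumPy, it also truncates inputs that are longer than the target length, and the vocabulary IDs and the input length of 6 are illustrative assumptions rather than values from the trained model.

```python
# Hypothetical miniature vocabulary; the real one is read from vocab.txt.
TOY_VOCAB = {"<PAD>": 0, "<UNKNOWN>": 1, "this": 2, "was": 3, "a": 4, "great": 5}


def encode(text, vocabulary, max_len):
    """Map each lower-cased word to its ID, then truncate or zero-pad to max_len."""
    ids = [vocabulary.get(w.lower(), vocabulary["<UNKNOWN>"]) for w in text.split()]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


print(encode("This was a great film", TOY_VOCAB, 6))  # [2, 3, 4, 5, 1, 0]
```

Here "film" is not in the toy vocabulary, so it is encoded with the <UNKNOWN> ID, and the final 0 is padding that brings the sequence up to the fixed input length.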

Cape Handler

def cape_handler(text):
    text = text.decode("utf-8")
    # Load vocabulary
    vocabulary = load_vocab("./vocab.txt")
    # Load the TFLite model and allocate tensors.
    interpreter = Interpreter(model_path="./model.tflite")
    interpreter.allocate_tensors()
    # Get input and output tensors.
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    # Predict
    input_shape = input_details[0]["shape"]
    input_data = vectorize_text(
        text=text, vocabulary=vocabulary, input_shape=input_shape
    )
    interpreter.set_tensor(input_details[0]["index"], input_data)
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]["index"])
    output_result = np.argmax(output_data)
    if output_result == 1:
        result = "positive"
    else:
        result = "negative"
    # Probability of the predicted class as a percentage; computed outside the
    # if/else so that it is defined for both outcomes
    prob = output_data[0][output_result] * 100
    return (str(float(f'{prob:.2f}')) + "% " + result) or "You've stumped me! Please try a different phrase."

Deploy with Cape

To deploy our function with Cape, we first need to create a folder that contains everything the function needs. For this sentiment analysis app, the deployment folder must contain the app.py above, the trained TFLite model, and its vocabulary. Because app.py imports some external libraries, the deployment folder needs to include those as well. We can list those dependencies in a requirements.txt file and use Docker to install them into our deployment folder, called app, as follows:

sudo docker run -v `pwd`:/build -w /build --rm -it python:3.9-slim-bullseye pip install -r requirements.txt --target ./app/

Now that we have everything ready, we can log into Cape:

cape login
Your CLI confirmation code is: GZPN-KHMT
Visit this URL to complete the login process: https://login.capeprivacy.com/activate?user_code=GZPN-KHMT
Congratulations, you're all set!

And after that we can deploy the app:

cape deploy ./app
Deploying function to Cape ...

Success! Deployed function to Cape

Function ID ➜ CzFFUHDyjq6Uqm8MCVfdVc

Checksum ➜ eb989a5ef2fabf377a11ad5464b81b67757fada90a268c8c6d8f2d95013c4681

Invoke with Cape

Now that the app is deployed, we can pass it an input and invoke it with cape run:

cape run CzFFUHDyjq6Uqm8MCVfdVc "This was a great film"
78.08% positive

JavaScript Front-end with Cape SDK

In addition to the CLI, Cape provides Python and JavaScript SDKs. The CLI also lets developers generate tokens for their functions as follows:

cape token <function ID> --expires <number of seconds>

We can then use cape-js to invoke the function deployed in the enclave. First, we need to install cape-js with:

npm install @capeprivacy/cape-sdk

Or:

yarn add @capeprivacy/cape-sdk

Then we can import it to our JavaScript program:

import { Cape } from '@capeprivacy/cape-sdk';

Within our JavaScript program that we used to create the front-end, we can use the function token to connect to the enclave using cape-js as follows. 

const client = new Cape({ functionToken: '<FUNCTION_TOKEN>' });

The function ID is then used to run the function that we previously deployed in the enclave with cape deploy.

await client.run({ id: '<FUNCTION_ID>', data: 'input' });

Using JavaScript and cape-js, we have created a front-end for the sentiment analysis application that allows users to go to a website, enter any text, click a button, and see the predicted sentiment. Go ahead and try it yourself.

Conclusion

In this blog post we walked through one example use case for Cape’s confidential computing platform based on AWS Nitro Enclaves. Specifically, we built a sentiment analysis application with TensorFlow Lite that classifies the sentiment of any text as positive or negative. We showed how this app can be deployed with Cape’s CLI so that the text is processed securely. We also showed how cape-js, Cape’s JavaScript SDK, can connect to an enclave from a JavaScript program and run any deployed function. The front-end that we built gives Cape’s users a graphical interface for interacting with the sentiment analysis app in addition to the CLI.

Check out the Getting Started Docs to try Cape for free. We’d love to hear what you think.
