Secure Sentiment Analysis with Cape

Ellie Kloberdanz

What Is Sentiment Analysis, and How Does Cape Make It Secure?

Sentiment analysis is an application of natural language processing (NLP) that classifies the sentiment of a text, typically as either positive or negative. Because vast amounts of data exist in textual form, sentiment analysis has many practical applications, including social media monitoring, customer feedback analysis, news analysis, and market research. Processing this text automatically makes it possible to extract valuable information from it efficiently.

However, what if the text we need to analyze is sensitive or must stay confidential? This is where Cape Privacy's secure enclave system comes in. Cape Privacy provides a confidential computing platform, based on AWS Nitro Enclaves, for security- and privacy-minded developers. Cape lets developers run serverless functions on encrypted data, protecting sensitive data and intellectual property within their apps.

Cape provides a command line interface (CLI) as well as Python and JavaScript software development kits (SDKs), called pycape and cape-js, that let developers deploy their apps and let users interact with them securely.

Three essential commands enable this: cape encrypt, cape deploy, and cape run. The cape encrypt command encrypts inputs so they can be sent into the Cape enclave for processing, cape deploy performs all the actions needed to deploy a function into the enclave, and cape run invokes the deployed function with an input that was previously encrypted with cape encrypt. Learn more in the Cape docs.

How to Build a Sentiment Analysis App with Cape

Training a Text Classification Model

In the case of sentiment analysis, the function we want to deploy and run is a text classification model, so we first need to define its architecture and train it. To keep the model lightweight, we use TensorFlow Lite and its Model Maker library. For training data we use SST-2 (Stanford Sentiment Treebank), a commonly used dataset published by Socher et al. (2013) that consists of more than 60,000 sentences and phrases from movie reviews, each labeled as positive or negative. For the model architecture we use an average word embedding model, which is small and can therefore perform inference quickly. The following code snippet prepares the training data, trains the model, and exports the trained model and its vocabulary in TensorFlow Lite format.
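To make "average word embedding" more concrete, here is a minimal NumPy sketch of the forward pass such a classifier computes: look up an embedding vector for every token id, average those vectors into one sentence vector, pass it through a small ReLU layer, and apply a softmax over the two classes. This reflects our understanding of the general shape of such a model; it is not Model Maker's code, and the function name, layer sizes, and random (untrained) weights are illustrative assumptions.

```python
import numpy as np


def average_word_vec_forward(token_ids, embedding, w_hidden, b_hidden, w_out, b_out):
    """Illustrative forward pass of an average-word-embedding classifier.

    token_ids: int array of shape (seq_len,), padded with id 0.
    embedding: float array of shape (vocab_size, dim).
    Returns one probability per class.
    """
    vectors = embedding[token_ids]        # (seq_len, dim): one vector per token
    sentence = vectors.mean(axis=0)       # (dim,): average over all positions
    hidden = np.maximum(0.0, sentence @ w_hidden + b_hidden)  # dense layer + ReLU
    logits = hidden @ w_out + b_out       # (num_classes,)
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()


# Hypothetical sizes and random (untrained) weights, just to run the sketch.
rng = np.random.default_rng(0)
vocab_size, dim, num_classes = 10, 16, 2
embedding = rng.normal(size=(vocab_size, dim))
w_hidden, b_hidden = rng.normal(size=(dim, dim)), np.zeros(dim)
w_out, b_out = rng.normal(size=(dim, num_classes)), np.zeros(num_classes)

token_ids = np.array([3, 4, 5, 6, 0, 0, 0, 0])  # four words, then padding
probs = average_word_vec_forward(token_ids, embedding, w_hidden, b_hidden, w_out, b_out)
print(probs)  # two probabilities that sum to 1
```

Because the whole sentence collapses into a single averaged vector, word order is ignored; that simplification is what keeps this kind of model small and fast at inference time.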

Install dependencies

sudo apt -y install libportaudio2 && pip install -q tflite-model-maker-nightly

Import libraries

import os

import numpy as np
import pandas as pd
import tensorflow as tf

from tflite_model_maker import model_spec
from tflite_model_maker import text_classifier
from tflite_model_maker.config import ExportFormat
from tflite_model_maker.text_classifier import AverageWordVecSpec
from tflite_model_maker.text_classifier import DataLoader

assert tf.__version__.startswith('2')
tf.get_logger().setLevel('ERROR')

# Prepare training data: download and extract the SST-2 dataset
data_dir = tf.keras.utils.get_file(
  fname='SST-2.zip',
  origin='https://dl.fbaipublicfiles.com/glue/data/SST-2.zip',
  extract=True)
data_dir = os.path.join(os.path.dirname(data_dir), 'SST-2')

def replace_label(original_file, new_file):
  # SST-2 is stored as TSV, so read it with a tab separator
  df = pd.read_csv(original_file, sep='\t')
  # Map the numeric labels to human-readable class names
  label_map = {0: 'negative', 1: 'positive'}
  df.replace({'label': label_map}, inplace=True)
  # Write the updated dataset to a new CSV file
  df.to_csv(new_file)

# Replace the label name for both the training and test dataset. Then write the
# updated CSV dataset to the current folder.
replace_label(os.path.join(data_dir, 'train.tsv'), 'train.csv')
replace_label(os.path.join(data_dir, 'dev.tsv'), 'dev.csv')

spec = model_spec.get('average_word_vec')

train_data = DataLoader.from_csv(
  filename='train.csv',
  text_column='sentence',
  label_column='label',
  model_spec=spec,
  is_training=True)
test_data = DataLoader.from_csv(
  filename='dev.csv',
  text_column='sentence',
  label_column='label',
  model_spec=spec,
  is_training=False)

# Train model
model = text_classifier.create(train_data, model_spec=spec, epochs=10)

# Evaluate model on the held-out dev set
loss, acc = model.evaluate(test_data)

# Export the model as TensorFlow Lite, then export its label and
# vocabulary files into the same folder
model.export(export_dir='model')
model.export(export_dir='model', export_format=[ExportFormat.LABEL, ExportFormat.VOCAB])

Create a Function

Any function deployed with Cape must live in a file named app.py, and app.py must contain a function called cape_handler() that takes the input to process and returns the result. For the sentiment analysis app, the input is the text we wish to classify (passed in as bytes, which the handler decodes to a string), and the output is its sentiment: negative or positive. The code snippet below shows our app.py. The cape_handler() function loads the TensorFlow Lite model we trained earlier, along with its vocabulary. It then vectorizes the text input with that vocabulary, encoding it as a numeric vector, and runs inference on it. The model predicts the sentiment of the encoded text, and the handler returns that prediction together with the model's confidence.
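The vectorization step can also be tried in isolation. The sketch below uses a made-up toy vocabulary (the real word ids come from the vocab.txt file exported during training) and skips contraction expansion: it lowercases and splits the text, maps each word to its id with `<UNKNOWN>` as the fallback, and truncates or zero-pads the result to a fixed sequence length. `encode` and `TOY_VOCAB` are hypothetical names used only for illustration.

```python
import numpy as np

# Made-up vocabulary with the same word -> id shape that load_vocab() builds.
TOY_VOCAB = {"<PAD>": 0, "<UNKNOWN>": 1,
             "this": 2, "was": 3, "a": 4, "great": 5, "film": 6}


def encode(text, vocabulary, seq_len):
    """Map words to ids, then truncate or zero-pad to shape (1, seq_len)."""
    unknown = vocabulary["<UNKNOWN>"]
    ids = [vocabulary.get(word.lower(), unknown) for word in text.split()]
    ids = ids[:seq_len]                                   # drop words past the model's length
    ids += [vocabulary["<PAD>"]] * (seq_len - len(ids))   # pad the remainder with <PAD>
    return np.array(ids, dtype=np.int32).reshape(1, seq_len)


print(encode("This was a GREAT movie", TOY_VOCAB, seq_len=8))
# [[2 3 4 5 1 0 0 0]]
```

"movie" is not in the toy vocabulary, so it maps to the `<UNKNOWN>` id, and the three trailing zeros are padding up to the fixed input length the model expects.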

Import libraries

import numpy as np
from tflite_runtime.interpreter import Interpreter
import contractions

# Load vocabulary function

def load_vocab(path):
  # Each line of vocab.txt holds a word and its integer id, separated by a space
  vocabulary = {}

  with open(path, "r") as f:
    for line in f:
      item = line.strip().split(" ")
      word = item[0]
      encoding = int(item[1])
      vocabulary[word] = encoding
  return vocabulary

# Text vectorization function

def vectorize_text(text, vocabulary, input_shape):
  encoded_text = []

  # Fix contractions (e.g. "don't" -> "do not")
  expanded_words = []

  for word in text.split():
    expanded_words.append(contractions.fix(word))
  text = " ".join(expanded_words)

  for word in text.split():
    word = word.lower()  # convert to lower case
    # Map words that are not in the vocabulary to the <UNKNOWN> id
    word_encoding = vocabulary.get(word, vocabulary["<UNKNOWN>"])
    encoded_text.append(word_encoding)

  # Truncate to the model's sequence length, then zero-pad up to it
  seq_len = input_shape[1]
  encoded_text = np.array(encoded_text[:seq_len], dtype=np.int32)
  encoded_text = np.pad(encoded_text, (0, seq_len - len(encoded_text)), "constant")
  encoded_text = np.reshape(encoded_text, (input_shape[0], seq_len))
  return encoded_text

# Cape Handler

def cape_handler(text):
  # Cape passes the input as bytes
  text = text.decode("utf-8")
  if not text.strip():
    return "You've stumped me! Please try a different phrase."

  # Load vocabulary
  vocabulary = load_vocab("./vocab.txt")

  # Load the TFLite model and allocate tensors.
  interpreter = Interpreter(model_path="./model.tflite")
  interpreter.allocate_tensors()

  # Get input and output tensors.
  input_details = interpreter.get_input_details()
  output_details = interpreter.get_output_details()

  # Predict
  input_shape = input_details[0]["shape"]
  input_data = vectorize_text(
    text=text, vocabulary=vocabulary, input_shape=input_shape
  )
  interpreter.set_tensor(input_details[0]["index"], input_data)
  interpreter.invoke()

  # Pick the most likely class; this code treats index 1 as "positive"
  output_data = interpreter.get_tensor(output_details[0]["index"])
  output_result = np.argmax(output_data)

  if output_result == 1:
    result = "positive"
  else:
    result = "negative"

  prob = output_data[0][output_result] * 100
  return f"{prob:.2f}% {result}"
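The post-processing at the end of cape_handler() does not depend on the model, so it can be checked on its own before deploying. Below is a minimal sketch that applies the same argmax-and-percentage logic to a hand-written output array; `format_prediction` is a hypothetical helper name, and the probabilities are made-up inputs chosen to reproduce the example output shown in the Invoke with Cape section.

```python
import numpy as np

LABELS = ("negative", "positive")  # index 1 is treated as positive, as in cape_handler()


def format_prediction(output_data):
    """Format a (1, num_classes) probability array the way cape_handler() does."""
    probs = np.asarray(output_data)[0]
    index = int(np.argmax(probs))
    return f"{probs[index] * 100:.2f}% {LABELS[index]}"


print(format_prediction([[0.2192, 0.7808]]))  # 78.08% positive
print(format_prediction([[0.9, 0.1]]))        # 90.00% negative
```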

Deploy with Cape

To deploy our function with Cape, we first need to create a folder that contains everything the function needs. For this sentiment analysis app, the deployment folder must contain the app.py above, the trained TFLite model (model.tflite), and its vocabulary (vocab.txt). Because app.py also imports external libraries (NumPy, the TensorFlow Lite runtime, and contractions), the deployment folder needs to contain those as well. We can list these dependencies in a requirements.txt file and use Docker to install them into our deployment folder, called app, as follows:

sudo docker run -v $(pwd):/build -w /build --rm -it python:3.9-slim-bullseye pip install -r requirements.txt --target ./app/
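Before running cape deploy, it can help to confirm that the folder is complete. The sketch below, with hypothetical function and constant names, checks an app folder for the files app.py reads (app.py itself, model.tflite, and vocab.txt) and for the packages it imports (numpy, tflite_runtime, contractions). It assumes that pip's --target option installed each package as a top-level directory or single .py module inside that folder.

```python
from pathlib import Path

# Files app.py needs at runtime (it opens ./model.tflite and ./vocab.txt).
REQUIRED_FILES = ("app.py", "model.tflite", "vocab.txt")
# Import names used by app.py; assumed to be installed by `pip install --target`.
REQUIRED_PACKAGES = ("numpy", "tflite_runtime", "contractions")


def missing_from_app_dir(app_dir):
    """Return the required files and packages that are absent from app_dir."""
    root = Path(app_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    for package in REQUIRED_PACKAGES:
        # A package is either a directory or a single-module .py file.
        if not ((root / package).is_dir() or (root / f"{package}.py").is_file()):
            missing.append(package)
    return missing


if __name__ == "__main__":
    problems = missing_from_app_dir("./app")
    print("Ready to deploy" if not problems else f"Missing: {problems}")
```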

Now that we have everything ready, we can log in to Cape:

cape login

Your CLI confirmation code is: GZPN-KHMT
Visit this URL to complete the login process: https://login.capeprivacy.com/activate?user_code=GZPN-KHMT
Congratulations, you're all set!

After that, we can deploy the app:

cape deploy ./app

Deploying function to Cape ...
Success! Deployed function to Cape
Function ID ➜ CzFFUHDyjq6Uqm8MCVfdVc
Checksum ➜ eb989a5ef2fabf377a11ad5464b81b67757fada90a268c8c6d8f2d95013c4681

Invoke with Cape

Now that the app is deployed, we can pass it an input and invoke it with cape run:

cape run CzFFUHDyjq6Uqm8MCVfdVc "This was a great film"

78.08% positive
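For scripting, the same invocation can be wrapped in Python. This minimal sketch assumes that the cape CLI is installed and logged in, and that cape run prints the function's return value to standard output. `run_sentiment` and `parse_result` are hypothetical helpers, and the parser only accepts the "NN.NN% positive/negative" format that cape_handler() returns.

```python
import re
import subprocess

# Matches strings such as "78.08% positive" produced by cape_handler().
RESULT_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)% (positive|negative)\s*$")


def run_sentiment(function_id, text):
    """Invoke a deployed function with `cape run` and return its stdout."""
    completed = subprocess.run(
        ["cape", "run", function_id, text],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout


def parse_result(output):
    """Split the handler's output into (confidence_percent, label)."""
    match = RESULT_PATTERN.match(output)
    if match is None:
        raise ValueError(f"Unexpected output: {output!r}")
    return float(match.group(1)), match.group(2)


print(parse_result("78.08% positive\n"))  # (78.08, 'positive')
```

Passing the arguments as a list rather than a shell string means user-supplied text is handed to the CLI verbatim, without shell quoting issues.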

JavaScript Front-end with Cape SDK

Besides the CLI, Cape also provides Python and JavaScript SDKs. The CLI additionally lets developers generate a token for a function as follows:

cape token <function ID> --expires <number of seconds>

We can then use cape-js to invoke the function deployed in the enclave. First, we need to install cape-js with:

npm install @capeprivacy/cape-sdk

Or:

yarn add @capeprivacy/cape-sdk

Then we can import it to our JavaScript program:

import { Cape } from '@capeprivacy/cape-sdk'

In the JavaScript program behind our front end, we can then use the function token to connect to the enclave with cape-js:

const client = new Cape({ functionToken: '<FUNCTION_TOKEN>' });

We then use the function ID to run the function that we previously deployed to the enclave with cape deploy:

await client.run({ id: '<FUNCTION_ID>', data: 'input' })

Using JavaScript and cape-js, we created a front end for the sentiment analysis application that lets users visit a website, enter any text, click a button, and see the predicted sentiment. Go ahead and try it yourself at https://demos.capeprivacy.com/sentiment.

Conclusion

In this blog post we walked through one example use case for Cape's confidential computing platform based on AWS Nitro Enclaves. Specifically, we built a sentiment analysis application with TensorFlow Lite that classifies the sentiment of any text as positive or negative. We showed how this app can be seamlessly deployed with Cape's CLI so that the text is processed securely. Beyond the CLI, we also showcased how cape-js, Cape's JavaScript SDK, can be used within JavaScript programs to connect to an enclave and run a deployed function. The front end we built gives Cape's users a GUI for interacting with the sentiment analysis app in addition to the CLI.

Check out the Getting Started Docs to try Cape for free. We'd love to hear what you think.
