Achieve Pay Equity While Keeping Data Private

  • Alan Wong
  • Lee Rosen
  • Rahul Ramesh
  • Shweta Sah

Equal Pay Using Cape

Organizations have historically relied on a mix of tools and policies to process data securely. In human resources, employee data is often managed in a single system, with policies in place to limit access. However, this data sometimes needs to be shared with other parties, both internal and external, such as finance, department managers, and insurance providers. Because it tends to include sensitive personal information, such as family members, social security numbers, and salaries, the ability to process it securely is important. Given the sensitivity of this personally identifiable information (PII), pay inequality findings are generally reported only to senior management and managers, since sharing them with all employees could reveal inequitable compensation.

By making their compensation data transparent, organizations can address these problems, avoid litigation, respond to new legislation, and promote an inclusive culture. With Cape, we can build a product that gives employees insight and visibility into how their salaries compare with those of their peers.

How did you implement it?

For this assignment, we will be using the Cape CLI, but it can also be implemented using PyCape or the JavaScript SDK.

Write a function

Use cape function create to create a function directory called equalpay with an example:

$ cape function create equalpay

Here is the complete code for a function that takes your role, location, salary, and equity in CSV format. Once it's deployed and run, it returns your compensation as a percentage of the average across your location for your specific role.

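#!/usr/bin/env python3
import csv

def cape_handler(arg):
  # Input is one CSV line: role,location[,salary[,equity]]
  arg_decoded = arg.decode('utf-8')
  args = [a.strip() for a in arg_decoded.split(",")]
  role = args[0]
  location = args[1]
  if len(args) > 2:
    salary = args[2]
  else:
    salary = None
  if len(args) > 3:
    equity = args[3]
  else:
    equity = None

  datafile = "./compensation_dataset.csv"
  try:
    # Convert only the values that were supplied; int(None) would raise
    person_data = { "Role" : role, "Location": location,
                    "Salary": int(salary) if salary is not None else None,
                    "Equity": int(equity) if equity is not None else None}
    result = calculate_equal_pay(person_data, datafile)
  except Exception as e:
    return "ERROR!!"
  if isinstance(result, str):
    # Too few data points: pass the message through unchanged
    return result
  if salary is None:
    # No salary supplied: result holds plain averages, not percentages
    return "Average salary is " + str(round(result[0], 2)) + "\nAverage equity is " + str(round(result[1], 2))
  return format_result(result)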

def calculate_equal_pay(person_data, datafile):
  # Require a minimum group size before returning any aggregate
  min_sample_size = 100
  data = read_data(datafile)
  filtered_data = filter_data(person_data['Role'], person_data['Location'], data)

  if len(filtered_data) < min_sample_size:
    return "Not enough data points for secure anonymous result."

  salaries = [int(line['Salary']) for line in filtered_data]
  equities = [int(line['Equity']) for line in filtered_data]
  total_salary = sum(salaries)
  total_equity = sum(equities)

  if person_data['Salary'] is None:
    # Average compensation for role and location
    return total_salary * 1.0 / len(filtered_data), total_equity * 1.0 / len(filtered_data)
  else:
    # Exclude the caller's own figures (this assumes the caller is in the dataset)
    average_salary = (total_salary-person_data['Salary']) * 1.0 / (len(filtered_data) - 1)
    average_equity = (total_equity-person_data['Equity']) * 1.0 / (len(filtered_data) - 1)
    # Your compensation as percentage of average
    return (person_data['Salary'] * 100.0) / average_salary, (person_data['Equity'] * 100.0) / average_equity

def read_data(filename):
  data = {}
  with open(filename, "r") as f:
    csv_reader = csv.DictReader(f)
    data = list(csv_reader)
  return data

def filter_data(role, location, data):
  # Keep only the rows that match both role and location
  filtered_data = []
  for line in data:
    if ((line['Role'] == role) and (line['Location'] == location)):
      filtered_data.append(line)
  return filtered_data

def format_result(result):
  # result holds (salary, equity), each as a percentage of the peer average
  salary = result[0]
  equity = result[1]
  out = "Provided salary is "

  if salary > 100:
    out = out + str(round((salary - 100) , 2)) + "% above"
  else:
    out = out + str(round((100 - salary) , 2)) + "% below"
  out = out + " average\nProvided equity is "

  if equity > 100:
    out = out + str(round((equity - 100) , 2)) + "% above"
  else:
    out = out + str(round((100 - equity) , 2)) + "% below"

  return out + " average"
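
The comparison at the heart of calculate_equal_pay can be checked by hand on a small example. The sketch below is illustrative only: the helper name percent_of_peer_average and the three salaries are invented for this example and are not part of the deployed function. It assumes, as the function does, that the caller's own value is included in the list and is removed before averaging.

```python
def percent_of_peer_average(own, all_values):
    """Return `own` as a percentage of the average of everyone else.

    Hypothetical helper: `all_values` must include `own`, which is
    subtracted out so the caller is compared against peers only.
    """
    peer_total = sum(all_values) - own
    peer_count = len(all_values) - 1
    return own * 100.0 / (peer_total / peer_count)


# Toy numbers, invented for illustration only
salaries = [70000, 100000, 110000]

# Peers average (100000 + 110000) / 2 = 105000, so 70000 is roughly
# 66.67% of the peer average, which would be reported as about 33.33% below.
print(round(percent_of_peer_average(70000, salaries), 2))
```

Leaving the caller out of the average keeps a single high or low salary from pulling the baseline toward itself, which matters most when the group is small.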

Add a dataset

Add your compensation dataset in CSV format to the equalpay directory as compensation_dataset.csv, with the headers EmployeeID,Location,Role,Salary,Equity.

Deploy function

We can deploy this to Cape with cape deploy. Assuming your function code and the dataset are in a folder named equalpay, run the following command:

$ cape deploy equalpay

Your output should look similar to:

Deploying *function* to Cape...
Function ID ➜ hMnFfqBnpwkFktLamruDVf
Checksum ➜ 66731e5ccf226680dd5c98a1d1ad52b7a4c986984042d0672d8f3153130b34a8

Note: You will use the function ID (and optionally the checksum) to invoke your function later.

Run function

After a function has been deployed, you can invoke it using cape run and provide input as a CSV file:

$ cat input.csv
Software Engineer,Miami,70000,31000

$ cape run hMnFfqBnpwkFktLamruDVf -f input.csv

Which produces:
Provided salary is 33.39% below average
Provided equity is 2.43% above average
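
To try this flow without real payroll data, a synthetic dataset with the expected headers can be generated locally. The following is a minimal sketch under stated assumptions: the function name write_sample_dataset, the roles, the locations, and the value ranges are all invented for illustration. Each role and location pair gets 120 rows by default, so that it clears the function's minimum sample size of 100.

```python
import csv
import random


def write_sample_dataset(path, rows_per_group=120, seed=0):
    """Write a synthetic compensation CSV and return the number of rows.

    Hypothetical helper: the roles, locations, and value ranges below
    are made up and carry no real compensation information.
    """
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    roles = ["Software Engineer", "Product Manager"]
    locations = ["Miami", "Austin"]
    headers = ["EmployeeID", "Location", "Role", "Salary", "Equity"]

    employee_id = 0
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=headers)
        writer.writeheader()
        for role in roles:
            for location in locations:
                for _ in range(rows_per_group):
                    employee_id += 1
                    writer.writerow({
                        "EmployeeID": employee_id,
                        "Location": location,
                        "Role": role,
                        # Whole-number values, since the function parses them with int()
                        "Salary": rng.randrange(60000, 160000, 1000),
                        "Equity": rng.randrange(0, 60000, 1000),
                    })
    return employee_id
```

Calling write_sample_dataset("equalpay/compensation_dataset.csv") before deploying would place the file where the function looks for it, assuming the equalpay directory already exists.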

This use case was implemented as part of a hackathon. Ideally, the organization would deploy the function and dataset securely using Cape. The function ID can then be shared with the organization's employees, who can securely check their compensation against that of their peers without learning any individual's compensation.

Check out the Getting Started Docs to try Cape for free. We'd love to hear what you think.
