Organizations have historically relied on a mix of tools and policies to process data securely. In human resources, an organization may keep employee data in a single system and limit access to it through policy. That data still has to be shared at times with other parties, both internal and external, such as finance, department managers, and insurance providers. Because it includes sensitive personal information about employees and their family members, such as social security numbers and salaries, processing it securely is important. Given the sensitivity of this personally identifiable information (PII), pay inequality findings are generally reported only to senior management and managers, since sharing them with all employees could reveal inequitable compensation.
By making compensation data transparent, organizations can address these problems, reduce the risk of litigation, comply with new legislation, and promote an inclusive culture. With Cape, we can build a product that lets employees see how their salaries compare with those of their peers.
For this assignment, we will use the Cape CLI, but the same function can also be implemented with PyCape or the JavaScript SDK.
Use cape function to create a function directory called equalpay with an example app.py:
$ cape function create equalpay
Here is the complete code for app.py. It takes your role, location, salary and equity in CSV format. Once deployed and run, it reports how your salary and equity compare, as a percentage, with the average for the same role and location.
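#!/usr/bin/env python3
import csv


def cape_handler(arg):
    # Input arrives as bytes: "role,location[,salary[,equity]]"
    args = [field.strip() for field in arg.decode('utf-8').split(",")]
    role = args[0]
    location = args[1]
    # Salary and equity are optional; keep None when they are not supplied
    salary = int(args[2]) if len(args) > 2 else None
    equity = int(args[3]) if len(args) > 3 else None
    datafile = "./compensation_dataset.csv"
    person_data = {"Role": role, "Location": location, "Salary": salary, "Equity": equity}
    try:
        result = calculate_equal_pay(person_data, datafile)
    except Exception:
        return "ERROR!!"
    # calculate_equal_pay returns a message string when the sample is too small
    if isinstance(result, str):
        return result
    if salary is None:
        # No salary supplied: result holds the raw averages, not percentages
        return "Average salary is {:.2f}\nAverage equity is {:.2f}".format(*result)
    return format_result(result)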
def calculate_equal_pay(person_data, datafile):
    # Refuse to answer for small groups so no individual's pay can be inferred
    min_sample_size = 100
    data = read_data(datafile)
    filtered_data = filter_data(person_data['Role'], person_data['Location'], data)
    if len(filtered_data) < min_sample_size:
        return "Not enough data points for secure anonymous result."
    salaries = [int(line['Salary']) for line in filtered_data]
    equities = [int(line['Equity']) for line in filtered_data]
    total_salary = sum(salaries)
    total_equity = sum(equities)
    if person_data['Salary'] is None:
        # Average compensation for role and location
        return total_salary / len(filtered_data), total_equity / len(filtered_data)
    else:
        # Peer averages, excluding the caller's own salary and equity
        average_salary = (total_salary - person_data['Salary']) / (len(filtered_data) - 1)
        average_equity = (total_equity - person_data['Equity']) / (len(filtered_data) - 1)
        # Your compensation as percentage of average
        return (person_data['Salary'] * 100.0) / average_salary, (person_data['Equity'] * 100.0) / average_equity
def read_data(filename):
    # Load every row of the CSV as a dict keyed by column header
    with open(filename, "r", newline="") as f:
        csv_reader = csv.DictReader(f)
        data = list(csv_reader)
    return data
def filter_data(role, location, data):
    # Keep only rows that match both the role and the location
    filtered_data = []
    for line in data:
        if line['Role'] == role and line['Location'] == location:
            filtered_data.append(line)
    return filtered_data
def format_result(result):
    # result holds (salary, equity) as percentages of the peer average
    salary = result[0]
    equity = result[1]
    out = "Provided salary is "
    if salary > 100:
        out = out + str(round(salary - 100, 2)) + "% above"
    else:
        out = out + str(round(100 - salary, 2)) + "% below"
    out = out + " average\nProvided equity is "
    if equity > 100:
        out = out + str(round(equity - 100, 2)) + "% above"
    else:
        out = out + str(round(100 - equity, 2)) + "% below"
    return out + " average"
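The code above reads the Role, Location, Salary and Equity columns from compensation_dataset.csv and needs at least 100 rows per role and location before it answers. For local experiments, a synthetic file can stand in for real HR data. The following is a minimal sketch under stated assumptions: the function name write_synthetic_dataset, the role and location values, and the salary and equity ranges are all hypothetical and not part of Cape or the original project.

```python
import csv
import random

# Column order used for the synthetic compensation_dataset.csv.
HEADERS = ["EmployeeID", "Location", "Role", "Salary", "Equity"]


def write_synthetic_dataset(path, rows_per_group=120, seed=0):
    """Write a fake compensation CSV with rows_per_group rows per (role, location)."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    roles = ["Software Engineer", "Data Analyst"]  # hypothetical values
    locations = ["Miami", "Boston"]  # hypothetical values
    employee_id = 1
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=HEADERS)
        writer.writeheader()
        for role in roles:
            for location in locations:
                for _ in range(rows_per_group):
                    writer.writerow({
                        "EmployeeID": employee_id,
                        "Location": location,
                        "Role": role,
                        # Made-up ranges, rounded to multiples of 500
                        "Salary": rng.randrange(60000, 150000, 500),
                        "Equity": rng.randrange(10000, 50000, 500),
                    })
                    employee_id += 1
    return employee_id - 1  # total number of rows written
```

Calling write_synthetic_dataset("compensation_dataset.csv") with the default rows_per_group keeps every role and location group above the 100-row minimum that calculate_equal_pay enforces.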
Add a dataset
Add your compensation dataset in CSV format to the equalpay directory as compensation_dataset.csv, with the headers EmployeeID,Location,Role,Salary,Equity.
Deploy function
We can deploy this to Cape with cape deploy. Assuming your function code and the dataset are in a folder named equalpay, run the following command:
$ cape deploy equalpay
Your output should look similar to:
Deploying *function* to Cape...
Function ID ➜ hMnFfqBnpwkFktLamruDVf
Checksum ➜ 66731e5ccf226680dd5c98a1d1ad52b7a4c986984042d0672d8f3153130b34a8
Note: You will use the function ID (and optionally the checksum) to invoke your function later.
Run function
After a function has been deployed, you can invoke it using cape run and provide the input as a CSV file:
$ cat input.csv
Software Engineer,Miami,70000,31000
$ cape run hMnFfqBnpwkFktLamruDVf -f input.csv
Which produces:
Provided salary is 33.39% below average
Provided equity is 2.43% above average
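The percentages come from comparing the supplied value with the average of everyone else in the same role and location, so the caller's own entry is left out of the average. The sketch below works through that arithmetic on a tiny, made-up group. The helper names percent_of_peer_average and describe are hypothetical, and a group this small would be rejected by the 100-row minimum in app.py; it is only meant to make the formula concrete.

```python
def percent_of_peer_average(own_value, group_values):
    """Return own_value as a percentage of the group average, excluding own_value.

    Assumes own_value is one of the entries in group_values, as app.py does.
    """
    peers_total = sum(group_values) - own_value
    peers_count = len(group_values) - 1
    peer_average = peers_total / peers_count
    return own_value * 100.0 / peer_average


def describe(label, percent):
    """Phrase a percentage the same way format_result does."""
    if percent > 100:
        return f"Provided {label} is {round(percent - 100, 2)}% above average"
    return f"Provided {label} is {round(100 - percent, 2)}% below average"


# Hypothetical salaries for one role and location; the first entry is the caller.
salaries = [70000, 100000, 110000]
percent = percent_of_peer_average(70000, salaries)
print(describe("salary", percent))  # Provided salary is 33.33% below average
```

Here the two peers average 105000, so a salary of 70000 is about 66.67% of the peer average, which is reported as 33.33% below.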
This use case was implemented as part of a hackathon. Ideally, the organization would deploy the function and dataset securely using Cape. The function ID can then be shared with the organization's employees, who can securely check their compensation against their peers without learning any individual's compensation.
Check out the Getting Started Docs to try Cape for free. We'd love to hear what you think.