Data science is all about combing through massive stores of data to challenge conventional wisdom or discover new and surprising insights that can lead to better outcomes in business practices, scientific research, public policy, health and safety, and other aspects of living in the modern world.
Data scientists have the privilege of running the models used to gather and analyze large datasets in search of those elusive insights. The better the data available to a data scientist, and the more detailed that data is, the better the end result.
Organizations that recognize the value of turning massive volumes of data into innovations and competitive advantages are driving demand for data scientists. Despite the abundant opportunities, many of these organizations have found it difficult to put effective data science programs in place to make those meaningful discoveries.
The skills, tools, and strategies for turning raw data into useful insights are still new and evolving. In addition, data privacy and security regulations can keep potentially useful data out of reach, hindering progress and frustrating both organizations and data science professionals eager to tap these resources.
How can we close this frustration gap, unleash the full potential of data science, and help data scientists and data-driven organizations find what they are looking for? Innovations in artificial intelligence (AI) and machine learning (ML) are the key. There is evidence that forward-thinking enterprises recognize this. Gartner found that enterprise use of AI grew 270% from 2015 to 2019, while IDC projects that annual spending on AI will more than double from 2019 levels and grow to $97.9B by 2023.
Those investments in AI, made for the purpose of supporting data science, need to be supplemented with access to rich sources of data to feed the virtuous cycle of machine learning algorithms that drive data science. Maintaining a technological edge in AI development has been identified as a matter of strategic economic and geopolitical importance by the United States government. Improving AI performance through ML algorithm training is vital to achieving that goal, a fact underscored by the nonprofit policy and research organization RAND Corporation.
In their 2020 report, Maintaining the Competitive Advantage in Artificial Intelligence and Machine Learning, RAND observed that, while the United States may have better AI tools than its rivals, "China has an advantage over the United States in the area of big datasets that are essential to the development of AI applications. This is partly because data collection by the Chinese government and large Chinese tech companies is not constrained by privacy laws and protections."
Because ML requires training data, ML algorithms would benefit from access to information currently protected by law. That means, for organizations in the U.S. and other Western nations, there is a fundamental conflict keeping data scientists from having access to powerful new sources of data that could be used for algorithmic improvement.
Encrypted learning is the key to overcoming this challenge. It combines machine learning and cryptography so that data scientists can extract insights from data that was previously off-limits, without violating privacy laws or confidentiality agreements. This happens through secure multi-party computation using secret sharing.
Secret sharing protects privacy by default by keeping data encrypted, even as encrypted learning allows the ML algorithms to access it for training. This removes a major barrier that has traditionally kept high-value data from data scientists. Where privacy and data protection regulations may have prevented information sharing in the past, that data can now be used without falling out of compliance.
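To make the idea of secret sharing more concrete, here is a minimal sketch of one common scheme, additive secret sharing. It is an illustration only: the article does not say which scheme any particular encrypted learning platform uses, and the modulus `PRIME`, the function names, and the salary figures are all assumptions chosen for the example.

```python
import secrets

# Assumption for this sketch: all arithmetic is done modulo a large prime.
PRIME = 2**61 - 1


def share(value, n_parties=3):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    # The first n-1 shares are uniformly random; on their own they reveal nothing.
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the secret.
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine all shares to recover the original value."""
    return sum(shares) % PRIME


def add_shared(a_shares, b_shares):
    """Each party adds the two shares it holds; no party sees either input."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]


# Hypothetical data: two private values from two different data owners.
salary_a = share(52_000)
salary_b = share(61_000)

# The parties compute on shares only; the result is revealed at the end.
total = reconstruct(add_shared(salary_a, salary_b))
print(total)  # 113000
```

In this sketch, each party holds only one share of each value, so no single party can read the inputs, yet the sum is still computed correctly. Training an ML model on secret-shared data relies on the same principle, with additional protocols for operations such as multiplication that are not shown here.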
By applying encrypted learning, organizations with large datasets relevant to highly regulated industries such as financial services, government, and healthcare can now make that data available to other organizations, and even turn it into a monetizable asset. In the past, those datasets may have lacked sufficient detail, such as demographic information, to give ML algorithms proper context. Now those blanks can be filled in without compromising privacy.
The goal is to support business strategies and decision-making with more precise insights and predictions that can save money, increase profit margins, and produce better outcomes—the kind of results that the Harvard Business Review described in the September/October 2020 article, "How to Win with Machine Learning," as a "competitive advantage in prediction."
If you'd like to learn more about encrypted learning, or about how Cape Privacy's encrypted learning platform can help you to gain a competitive advantage, you can click here for additional information.