Cape Privacy launches security-conscious collaboration platform for data science

Cape Privacy (formerly Dropout Labs), a startup developing a privacy-preserving platform for collaborative data science, today announced it has raised $5 million. It plans to use the money to accelerate its go-to-market efforts.

AI promises to transform — and indeed has transformed — entire industries, from civic planning and health care to cybersecurity. But privacy remains an unsolved challenge, particularly where compliance and regulation are concerned. Banks, health providers, and even retailers can run into problems when collaborating on AI and machine learning research involving sensitive or proprietary data, like patient records, financial documents, and supply chain details.

Cape, which was founded in 2018 by Ché Wijesinghe and GoInstant cofounder and CTO Gavin Uhma (Salesforce acquired GoInstant in 2012), aims to help enterprises securely maximize the value of data with a collaboration layer built atop privacy, machine learning, and cryptography technologies. It offers encrypted data-sharing to help teams involved with compliance, legal, and risk management work better with each other and third-party vendors.

Cape’s open source software integrates with existing data science and machine learning infrastructure to provide a workflow guiding contributors toward building projects and policies. It enables admins to decide on the placement of development tools in relation to data storage and pipeline systems, ensuring data access, privacy, and monitoring meet each project’s requirements. And it allows stakeholders to set project-specific monitoring and auditing configurations so all parties receive the logs they need and can review, approve, and amend policies from a dashboard.

Much of Cape’s platform is underpinned and informed by tf-encrypted, its community-driven suite for experimenting with private machine learning on top of Google’s TensorFlow framework. Tf-encrypted enables training, validation, and prediction over encrypted data. The data remains encrypted during the entire data science workflow, meaning machine learning models can be hosted in the cloud without decrypting the inputs or outputs of the query.
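To make the idea of prediction over protected inputs concrete, here is a minimal sketch of additive secret sharing, one common building block of secure computation. It is an illustration only and does not use the tf-encrypted API; the modulus, the function names, and the three-server setup are all assumptions chosen for the example. The model weights are treated as public and only the input features are hidden.

```python
import secrets

# Public modulus for all arithmetic (assumed value for this sketch).
MODULUS = 2**61 - 1

def share(value, parties=3):
    """Split a non-negative integer into additive shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    """Recombine shares; any incomplete subset reveals nothing about the value."""
    return sum(shares) % MODULUS

def local_score(weights, my_feature_shares):
    """One server computes a dot product using only the shares it holds."""
    return sum(w * s for w, s in zip(weights, my_feature_shares)) % MODULUS

# Hypothetical client input and public linear-model weights.
features = [3, 5, 2]
weights = [4, 1, 7]

# The client splits every feature and sends one share of each to every server.
per_feature = [share(x) for x in features]
per_party = list(zip(*per_feature))

# Each server works on random-looking numbers and returns a share of the score.
score_shares = [local_score(weights, shares) for shares in per_party]

# Only the client, holding all result shares, learns the prediction.
print(reconstruct(score_shares))
```

Because the dot product is linear, the servers' partial results add up to the true score even though no single server ever sees a feature value in the clear.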

“There are personal, competitive, and regulatory borders that sit between data and intelligence. The most valuable data is locked up today for these reasons,” wrote Uhma in a blog post. “Secure machine learning can enable access to data while complying with these borders. In other words, secure machine learning preserves the privacy of sensitive data … Hospitals could start to take advantage of cloud-based AI while managing the complexity of data privacy regulations and the natural sensitivity of health care data. Imagine assisting ophthalmologists by scanning retinal images for diabetic retinopathy, or pathologists by scanning lymph node biopsies for the spread of breast cancer.”

Uhma also touts the potential for further advances based on collaboration. “[S]ecure computation can lead to entirely new business models. Imagine multiple large banks pooling their data to train a fraud-detection model that is more accurate than what any one bank could develop on their own.”

Cape isn’t the first to propose a privacy-preserving approach to data science collaboration. Companies including Enveil, Cosmian, Duality Technologies, and Intel are also investigating homomorphic encryption for this purpose. This form of cryptography enables computation directly on ciphertexts (plaintext, such as file contents, that has been encrypted using an algorithm), so that the encrypted result, once decrypted, exactly matches the result of performing the same operations on the unencrypted data. Using homomorphic encryption, a “cryptonet” can perform computation on encrypted data and return the encrypted result to a client, which can then use its private key, which was never shared publicly, to decrypt the returned data and get the actual result.
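The property described above can be shown with a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of their plaintexts. This is a sketch for illustration, not what any of the companies named here ship. The tiny primes and the function names are assumptions, and keys this small offer no security.

```python
import math
import secrets

# Small demo primes (assumption); real deployments use primes of 1024+ bits.
P, Q = 1789, 1861

def keygen(p=P, q=Q):
    """Return (public key n, private key). Uses generator g = n + 1."""
    n = p * q
    phi = (p - 1) * (q - 1)
    mu = pow(phi, -1, n)  # modular inverse, needs Python 3.8+
    return n, (phi, mu, n)

def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n2) * pow(r, n, n2)) % n2

def decrypt(private_key, ciphertext):
    """Recover the plaintext with the private key."""
    phi, mu, n = private_key
    u = pow(ciphertext, phi, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Add two plaintexts by multiplying their ciphertexts."""
    return (c1 * c2) % (n * n)

def scale_encrypted(n, c, k):
    """Multiply a plaintext by a public constant k."""
    return pow(c, k, n * n)

public_key, private_key = keygen()

# The client encrypts its values and sends only ciphertexts to the server.
c1 = encrypt(public_key, 1200)
c2 = encrypt(public_key, 345)

# The server computes on ciphertexts without ever holding the private key.
c_sum = add_encrypted(public_key, c1, c2)

# Back on the client, decryption yields the same result as plain addition.
print(decrypt(private_key, c_sum))
```

Paillier supports only addition and multiplication by public constants. Evaluating a full neural network on encrypted data requires richer schemes, which is part of why performance remains a concern.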

In practice, homomorphic encryption libraries don’t yet fully leverage modern hardware and are at least an order of magnitude slower than conventional models. That said, newer projects like the accelerated encryption library cuHE claim speedups of 12 to 50 times on various encrypted tasks over previous implementations. And HE-Transformer, a backend for nGraph (Intel’s neural network compiler), delivers leading performance on some cryptonets.

Among the alternatives (and complements) to homomorphic encryption is federated learning, a technique that trains an AI algorithm across decentralized devices or servers (i.e., nodes) that hold data samples locally without exchanging them. This enables multiple parties to build a common machine learning model without sharing the underlying data. Federated learning goes hand in hand with differential privacy, a system for publicly sharing information about a data set by describing patterns of groups within the corpus while withholding data about individuals.
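Both ideas can be sketched in a few lines of plain Python. The first part below is a minimal form of federated averaging: each node fits a shared linear model on its own data, and only the model parameters are pooled. The second part is the Laplace mechanism, one standard way to achieve differential privacy for a counting query. The toy data, learning rate, epsilon value, and all function names are assumptions made for this example.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps for y = w*x + b on one node's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def federated_average(updates, sizes):
    """Combine node models, weighting each by its number of samples."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b

def train(nodes, rounds=20):
    """Each round, nodes train locally and share only their parameters."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in nodes]
        weights = federated_average(updates, [len(d) for d in nodes])
    return weights

def laplace_noise(scale, rng):
    """The difference of two exponential draws is Laplace-distributed."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng):
    """Noisy count; a counting query has sensitivity 1, so scale = 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical toy data: three nodes, each holding points on y = 2x + 1.
nodes = [
    [(x, 2 * x + 1) for x in (-1.0, 1.0)],
    [(x, 2 * x + 1) for x in (-2.0, -1.0, 1.0, 2.0)],
    [(x, 2 * x + 1) for x in (-0.5, 0.5)],
]
w, b = train(nodes)
print(f"shared model: w={w:.4f}, b={b:.4f}")

# Hypothetical records: release how many people are over 40 without exposing anyone.
ages = [23, 45, 31, 62, 58, 37, 41, 29]
rng = random.Random(0)
print(private_count(ages, lambda age: age > 40, 1.0, rng))
```

A smaller epsilon adds more noise and gives stronger privacy at the cost of accuracy. In practice the two techniques are often combined by adding calibrated noise to the updates each node shares.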

BOLDStart Ventures led the investment in New York-based Cape, with participation from VersionOne, Haystack, Radical, and Faktor.