DefinedCrowd raises $50.5 million for AI data set curation

Seattle-based DefinedCrowd, which describes itself as a “smart” data curation platform, today announced that it raised $50.5 million in equity financing. CEO and founder Daniela Braga says the proceeds will be used to expand the company’s existing solutions, launch subscription-based offerings, and grow DefinedCrowd’s international reach.

AI algorithms typically require high-quality labeled data to train, which is why crafting corpora can take nearly as long as, and oftentimes longer than, developing the models that ingest them. It’s a problem DefinedCrowd aims to solve with a bespoke model-training service for customers in customer service, automotive, retail, health care, and other enterprise segments.

Braga, who holds a Ph.D. in speech technology, is familiar with the ins and outs of data set curation. Prior to founding DefinedCrowd, she oversaw a $14 million effort to improve Microsoft’s AI-powered Cortana voice assistant, which she described as an uphill battle. Roughly 18 months of every product development cycle was spent procuring data to refresh the underlying models.


Above: A sampling of the tools on offer within DefinedCrowd’s suite.

Image Credit: DefinedCrowd

DefinedCrowd’s approach employs a community — Neevo — of more than 211,000 human contributors (up from 45,000 two years ago) in 195 countries who complete jobs involving labeling, typing, and speaking words and phrases. They contribute well over 500,000 samples a day to the data sets available through DefinedCrowd’s natural language processing, voice recognition, and computer vision tools.

Via APIs and a web interface, DefinedCrowd’s customers can filter demographics with a fine-tooth comb, specifying the age, location, and gender of Neevo members and even their proficiency in a given language. Supported applications include transcription, voice emotion tagging, text sentiment and semantic annotation, text collection, question and answer collection, and spontaneous speech collection. The platform supports over 50 languages and 79 dialects, or about 90% of the world’s most widely spoken languages, with a claimed accuracy of up to 98%.
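
To make the demographic filtering concrete, here is a minimal sketch of how such a selection might work. It does not use DefinedCrowd’s actual API, which the company has not detailed here; the `Contributor` record, its field names, the 1-to-5 proficiency scale, and the `filter_contributors` function are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Contributor:
    # Hypothetical record; these fields are illustrative, not DefinedCrowd's schema.
    contributor_id: str
    age: int
    country: str
    gender: str
    # Assumed scale: language code -> proficiency from 1 (basic) to 5 (native).
    language_proficiency: dict = field(default_factory=dict)


def filter_contributors(pool, *, min_age=None, max_age=None, countries=None,
                        gender=None, language=None, min_proficiency=1):
    """Return the contributors in `pool` that satisfy every supplied criterion."""
    selected = []
    for c in pool:
        if min_age is not None and c.age < min_age:
            continue
        if max_age is not None and c.age > max_age:
            continue
        if countries is not None and c.country not in countries:
            continue
        if gender is not None and c.gender != gender:
            continue
        # A missing language counts as proficiency 0, so it never qualifies.
        if language is not None and c.language_proficiency.get(language, 0) < min_proficiency:
            continue
        selected.append(c)
    return selected


pool = [
    Contributor("c1", 29, "PT", "female", {"pt": 5, "en": 4}),
    Contributor("c2", 41, "JP", "male", {"ja": 5, "en": 2}),
    Contributor("c3", 35, "US", "female", {"en": 5}),
]

# Women aged 25 to 40 with strong English, e.g. for an English transcription job.
matches = filter_contributors(pool, min_age=25, max_age=40, gender="female",
                              language="en", min_proficiency=4)
print([c.contributor_id for c in matches])  # ['c1', 'c3']
```

Each criterion is optional, so a customer in this sketch could narrow by language alone or combine every filter at once.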

But arguably its real value proposition is its flexibility. Customers can use DefinedCrowd’s platform not only to train models from scratch within budgetary and technical constraints, but to augment existing models with data sets tailored to their specific needs. Those with simpler requirements can take advantage of specialized workflows, templates, and off-the-shelf solutions or upload their own proprietary data sets, all while getting live cost estimates and a dashboard from which they can view progress.

For instance, developers of a news curation skill on Amazon’s Alexa platform could use DefinedCrowd to generate multiple datasets to improve the algorithm’s performance across markets.


Above: DefinedCrowd’s service dashboard.

Image Credit: DefinedCrowd

DefinedCrowd, which saw revenues grow 656% year-over-year last year, counts Fortune 500 companies including BMW, Mastercard, Nuance, and Yahoo Japan among its clientele. The company’s staff — which grew 176% year-over-year in 2019 to over 100 people — is spread among offices in Portugal, Seattle, and Japan, and DefinedCrowd plans to double its workforce and open additional R&D labs by 2021.

This latest round of funding (a Series B), which brings DefinedCrowd’s total raised to $63.4 million following an $11.8 million raise in July 2018, saw participation from new investors Semapa Next and Hermes GPE and existing investors Evolution Equity Partners, Kibo Ventures, Portugal Ventures, Bynd Venture Capital, EDP Ventures, and IronFire Ventures. They joined long-term backers including Amazon Alexa Fund, Sony Innovation Fund, and Mastercard.
