Scaling Scale: Riding the Artificial Intelligence Wave

by Mike Volpi



Scale has established itself as the clear leader in high-quality training and validation data for machine learning and AI applications. Today we celebrate their $100M Series C financing round.

Almost exactly a year ago, we announced our investment in Scale, an exciting young company whose mission is to accelerate the development of Artificial Intelligence by democratizing access to intelligent data. We invested because of the company’s mission and because of its talented team led by Alex Wang, a remarkable founder.

The year has flashed by in the blink of an eye, but this young company has matured and grown in leaps and bounds. Today, we celebrate their $100M financing round, led by Founders Fund with participation from Thrive and Spark. We are honored that the esteemed group of investors in this round shares our optimism about Scale’s future.

Scale today has established itself as the clear leader in high-quality training and validation data for machine learning and AI applications. To better understand the company’s success, one has to look beyond the common perception of this business. Conventional wisdom would suggest that training data for machine learning is created by large numbers of data labelers in emerging markets. While Scale has indeed assembled an unprecedented community of labelers around the world, the performance the company has shown comes from competencies that are much deeper and more subtle than meets the eye.

When a customer evaluates a training data partner, they consider four important parameters. First, the input for data labeling is significantly broader than simple images. Companies that employ machine learning have data from wide-ranging perception sources that include not only RGB images but also Lidars (and there is a broad variety of those), radars, medical imagery from MRI and CAT scans, thermal sensors, moisture detection sensors, computer-generated imagery, text, social media, and many more. A supplier of training data must be able to take input from all these sources and, in some cases, from sensor-fused imagery. Second, the accuracy of labels and annotations is critical. The garbage-in, garbage-out paradigm applies particularly well to ML models: if inaccurate data is fed to the models, they will produce faulty outputs. Labeling that is accurate to the 97%+ level is therefore very important for producing robust input data. Third, in aggregate, training data is a significant expense for companies that employ ML, so the capability to create highly accurate labeled data in a cost-effective way is critical for AI’s financial feasibility in the real world. Last but not least, training data must be delivered in a timely fashion. The longer it takes to label data, the more bottlenecks are created in the development of ML solutions.

Scale’s secret sauce is the deep collection of technology that it has developed. This technology has allowed the company to lead the market across all four of these parameters. It’s a direct result of Scale being able both to assemble a massive and efficient labeling community and to create a technology suite that empowers these labelers to outperform their counterparts by an order of magnitude in accuracy, cost-effectiveness and timeliness. The scale of the community is a product of the operational excellence necessary to recruit, manage and incentivize it. Labeling is very much a two-sided market. In achieving its current (and future) scale, Scale has built a significant moat to ward off competitors. The product suite is the work of a core technical team that rivals those of much larger companies in its understanding of how ML works and how it is utilized. Because Scale deeply understands ML and, in some cases, uses its own ML models, its labels have unmatched superpowers. In the abstract, Scale is a two-sided marketplace that transforms commodity labor into professional and differentiated craftsmanship.

It is not surprising, therefore, that Scale’s customers are a who’s who of the ML world. Today, the largest volumes of training data are used by leading autonomous vehicle (AV) players like Waymo, Cruise, Zoox, Toyota, Nuro, Nutonomy, Voyage and many unnamed others that have voted with their purchase orders to select Scale. Yet Scale’s customer base extends significantly beyond AV to include a wide array of companies like OpenAI, Pinterest, Mapbox, Uber, P&G and Liberty Mutual that are increasingly relying on AI to scale their operations. It is clear from Scale’s wide adoption that there is something special going on here.

What gets us most excited about the potential of this company is that we are really just in the early innings of the adoption of ML/AI. At their core, most ML applications develop pattern recognition by observing massive amounts of data. That’s no different from how humans learn. Just as the human mind is able to learn vast numbers of tasks that depend on pattern recognition, machines will slowly but surely do the same. For now, ML is generally very application-specific. Predictions vary on how quickly machines will master general learning. However, application-specific learning will likely be the mainstay of AI for years to come. And with that, massive amounts of data will continue to need labeling and annotation. Scale will be a huge beneficiary of this enormous wave.

Our journey with Scale has in many ways just begun. Alex and his team will continue to grow as pioneers and leaders in the field. If the last twelve months are any indication of what is to come, they are headed towards a very bright future. From the entire Index team, hearty congratulations to the team at Scale on this impressive milestone.

Published — Aug. 5, 2019