Bob van Luijt, co-founder and CEO of Weaviate, and I first met over lunch more than a year ago in San Francisco. He was in town from Amsterdam and wanted to talk about open source business models. I was looking for the opportunity to better understand the role of vector databases in the emerging AI stack.
It was one of my most memorable bowls of spaghetti; Bob and I talked for hours about the evolution of search and recommendation systems with advancements in AI, databases, community, and building open source businesses. Needless to say, I was wildly late to my next meeting.
In the time since, I’ve come to expect that whenever I spend time with Bob, his insight, depth, thoughtful optimism, and deep curiosity (he asks more questions than any other founder I’ve ever met) keep a conversation going long after a bill is paid or a calendar invite has run its course.
"I’ve come to expect that whenever I spend time with Bob, his insight, depth, thoughtful optimism, and deep curiosity (he asks more questions than any other founder I’ve ever met) keep a conversation going long after a bill is paid or a calendar invite has run its course."
— Erin Price-Wright, Partner, Index Ventures
After that first meeting I was enthralled by Bob but still trying to wrap my head around whether or not the world needed another database. At Index we’ve been investing in open source databases for over a decade, backing developer-first companies like Elastic, Confluent, Cockroach, Starburst, and ClickHouse. And we have been big believers in the transformative power of AI since our early investments in Aurora, Scale, and Cohere, plus dozens more since.
The Vector Advantage
Traditionally, databases index content to enable keyword search in text (e.g. for e-commerce, enterprise search, etc.). With the introduction of transformer models and increased adoption of AI, many are transitioning away from keyword search to something called semantic search - also known as vector search - where, rather than matching on specific keywords, you can search for a concept and get back the content that is most similar to that concept, as determined by a machine learning model.
This works by using a model to represent each piece of data as a vector of numbers, where each value corresponds to a certain feature produced by the model. This vector is called an embedding. When a user then runs a query on the database, the query is embedded the same way, and rather than looking for an exact keyword match, the database returns the items whose vectors are most similar to the query vector.
This has a few powerful implications. First, vector search matches on concepts rather than words. So, if you search for “lynx,” you’ll also get results for “bobcat.” Also, vector databases can be inherently both multilingual and multimodal. Depending on the model used for the embedding, you can represent information from pictures, videos, audio, text, etc. in the same vector space. While text is the biggest use case today, soon all media will follow.
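To make the “lynx” and “bobcat” example concrete, here is a minimal sketch of similarity search in Python. Everything in it is an assumption for illustration: the three-dimensional vectors are written by hand rather than produced by a real embedding model, the names `EMBEDDINGS`, `cosine_similarity`, and `nearest` are hypothetical, and cosine similarity is just one common way to compare vectors. It is not how any particular vector database is implemented.

```python
import math

# Hypothetical, hand-written 3-dimensional embeddings (illustration only).
# A real embedding model would produce hundreds or thousands of dimensions.
EMBEDDINGS = {
    "bobcat": [0.9, 0.8, 0.1],
    "house cat": [0.7, 0.3, 0.2],
    "pickup truck": [0.1, 0.0, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vector, embeddings, k=2):
    """Return the names of the k stored items most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Assumed embedding for the query "lynx"; note the word itself is never
# compared against the stored names, only the vectors are.
lynx_vector = [0.85, 0.75, 0.15]
results = nearest(lynx_vector, EMBEDDINGS)
print(results)
```

With these made-up vectors, “bobcat” ranks first even though the query shares no keyword with it, which is the whole point of matching on concepts. A production system would also avoid scanning every stored vector and would typically use an approximate nearest-neighbor index instead.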
Database Meets AI
AI-powered search is the core methodology powering search and recommendation algorithms at companies like Google, TikTok, and Spotify today. But it wasn’t quite obvious to us a year ago that search alone was a big enough market to support a generational company in the vector database category.
Enter ChatGPT. The wave of excitement around Large Language Models (LLMs) over the last 6 months has been quickly followed by companies, from two-person teams building an AI-native product from scratch to enterprises with a data and distribution advantage, building AI-powered applications on top of proprietary data. These use cases, such as document summarization or question-answering, require a vector database to serve as their long-term memory. We believe the size of this opportunity is massive: vector databases are a critical component in one of the largest platform shifts in software ever, on the scale of mobile or the internet itself.
Weaviate has a few key advantages here. First, it’s open source. This landscape is evolving incredibly quickly and a canonical reference architecture has not yet emerged. Because Weaviate is open, extensible, developer friendly, and super easy to integrate with, the team can benefit massively from the developer momentum across AI rather than struggle to keep up with the pace of change. They have integrations with model providers like OpenAI and Cohere and frameworks like LangChain and LlamaIndex, and developers are consistently contributing new integrations as they emerge.
Second, Weaviate was built from the ground up to serve these use cases. In particular, it elegantly ties three things together: a key-value store, an inverted index, and the vector store itself. This provides an excellent developer experience for teams who are trying to build for real use cases on top of Weaviate. It’s hard to build a database from scratch, and the strength of adoption and community love for the product speak for themselves - they have 10k downloads a day from developers, and are powering production applications in organizations all over the world.
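For readers who want a feel for why combining those three pieces matters, here is a toy sketch in Python. It is emphatically not Weaviate’s implementation or API: the `ToyStore` class, its methods, and the hand-written vectors are all hypothetical, and the design (filter candidates by keyword, then rank them by vector similarity) is only one assumed way the pieces could work together.

```python
import math
from collections import defaultdict


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToyStore:
    """Hypothetical store combining three structures (illustration only)."""

    def __init__(self):
        self.objects = {}                 # key-value store: id -> object
        self.inverted = defaultdict(set)  # inverted index: word -> ids
        self.vectors = {}                 # vector store: id -> embedding

    def add(self, obj_id, text, vector):
        self.objects[obj_id] = {"text": text}
        for word in text.lower().split():
            self.inverted[word].add(obj_id)
        self.vectors[obj_id] = vector

    def search(self, query_vector, must_contain=None, k=3):
        # Step 1: optional keyword filter via the inverted index.
        if must_contain is None:
            candidates = set(self.objects)
        else:
            candidates = self.inverted.get(must_contain.lower(), set())
        # Step 2: rank the remaining candidates by vector similarity.
        ranked = sorted(
            candidates,
            key=lambda i: _cosine(query_vector, self.vectors[i]),
            reverse=True,
        )
        # Step 3: fetch the full objects from the key-value store.
        return [self.objects[i]["text"] for i in ranked[:k]]


store = ToyStore()
# Hand-written vectors, assumed for the example rather than model output.
store.add("a", "wild bobcat sighting", [0.9, 0.8, 0.1])
store.add("b", "wild weather warning", [0.1, 0.2, 0.9])
store.add("c", "house cat adoption", [0.7, 0.3, 0.2])

query = [0.85, 0.75, 0.15]  # assumed embedding for "lynx"
filtered = store.search(query, must_contain="wild")
unfiltered = store.search(query)
print(filtered)
print(unfiltered)
```

The filtered query only considers documents containing “wild,” while the unfiltered one ranks everything by similarity. Having all three structures behind one interface is what lets a developer mix exact filters, semantic ranking, and object retrieval in a single call.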
Third, their deployment model offers customers a choice in how they run Weaviate. During a time when cloud budgets are being scrutinised and companies are anxious about data privacy and exfiltration related to their use of Large Language Models, this is a massive advantage. This is especially true for larger enterprises, who end up being the powerhouse customers for most databases. Since they launched the public beta of the Weaviate Cloud Service in the first week of April, they have already seen thousands of signups, and they’ve onboarded dozens of happy enterprises to their hybrid SaaS product.
But what really shines through with Bob, his co-founder Etienne, and the entire Weaviate team is their relentless focus on delivering customer value. They are grounded in curiosity and creativity, and can connect the dots between the latest research and what their users are actually trying to do in the real world. They understand their customers - what drives value for them, what they are worried about, what matters in the context of their business. And for nearly 5 years they have been building a product to serve them. This was evident in every customer conversation we had - developers keep coming back to the quality and responsiveness of the team as a core differentiator. That level of value orientation is rare, and when we see that at Index, we lean in.
With that, I couldn’t be more thrilled to announce our partnership with Weaviate to lead their Series B. I look forward to many more bowls of spaghetti to come as we build a generational company together. The world is likely to look very different in 5 years and I’m confident that Weaviate will help lead the way.
Published — April 20, 2023