Podcast: Edo Liberty on Vector Databases for Successful Adoption of Generative AI and LLM based Applications

MMS • Edo Liberty

Transcript

Srini Penchikala: Hi everyone. My name is Srini Penchikala. I am the Lead Editor for the AI/ML and Data Engineering community at the InfoQ website, and I’m also a podcast host.

In this episode, I will be speaking with Edo Liberty, founder and the CEO of Pinecone, the company behind the vector database product. We will be discussing the topic of vector databases, which has gotten a lot of attention recently, and also the critical role these types of databases play in the generative AI or GenAI space.

Introductions [01:24]

Hi Edo. Thank you for joining me today. First, can you introduce yourself and tell our listeners about your career and what areas have you been focusing on recently?

Edo Liberty: Hi Srini. Hello all InfoQ listeners. So my career has been a mix of science and engineering for pretty much the entire time. I did my undergraduate in physics and computer science, my PhD in computer science, and my postdoc in applied math, focusing on what was back then called big data: algorithms, theory of machine learning, theoretical computer science and so on.

Interestingly, I ended up doing my PhD thesis and also my postdoc work on something called dimensionality reduction, which deals with very large data sets that are represented as vectors in high-dimensional spaces, and which would become very useful later in my career.

Then I started my first company, moved to Yahoo to be a scientist and an adjunct professor at Tel Aviv University. I stayed there for a long time as director, eventually leading the research out of the New York team, focusing on ML infrastructure for Yahoo.

In 2016 I moved to AWS to build AI services and platforms, including a well-known platform called SageMaker, but really plugging in many different pieces of infrastructure from AWS and learning how to build managed services and managed platforms in the cloud. And in 2019 I started Pinecone to build what is now widely known as a vector database, but back then nobody really understood what it is and what it’s supposed to do, and frankly it took almost two years for people to even know what the hell I’m talking about. But we’ve been at it ever since, trying to make AI more knowledgeable, less hallucinatory and more useful.

Srini Penchikala: Yes, definitely. Vector databases is one of the, I don’t want to call it byproducts, but a nice development out of the LLM adoption. I know these databases have been around, but since the ChatGPT and the large language models have been getting a lot of attention, vector databases are also getting a lot of attention.

What are Vector Databases [03:51]

For our listeners who are new to vector database concept, can you please define what a vector database is and how is it different from traditional databases?

Edo Liberty: They’re very different. I mean, in fact, one of the most exciting things for me with creating Pinecone was the fact that it is such a completely new kind of infrastructure that people use. It’s a database but used predominantly like a search engine. It deals with these types of data that are called vectors, what’s called vector embeddings. They are the output of machine learning models and the way that generative AI models or foundational models, language models represent anything really, whether it’s text or images or what have you.

And those internal representations tend to be very semantically rich and are much more actionable in some sense than the raw object that you started with. And if you’re doing something that looks like semantic search or RAG or other use cases we can talk about later, then you really have to retrieve the right context for your LLMs. You really need to bring in relevant information at the right time and the objects that you work with are complex. They’re not rows in a table, right? They’re not rows in a database. They’re all these PDFs or images or JIRA tickets or what have you.

And so a vector database is really highly specialized to work with vectors, with those very specialized queries that search and find things by relevance, by similarity, by alignment in this numerical representation. And maybe it’s too early to dive deep at this point, but this seemingly simple change of, oh, I’ll just change the data type and how I query, ends up having far-reaching implications for what cloud architectures actually make sense, how you reduce cost, how you increase efficiency, and it ends up that you really need a completely new kind of database to do this well. Very exciting.

Srini Penchikala: Thank you. Yes. Can you talk about some background of vector databases? Like you mentioned the concept probably is older than LLMs, but how have they evolved in the last three to four or five years?

Edo Liberty: Yes. The idea of vector search is actually not new. Okay. I’ve been working on some of these algorithms even in my PhD, which was, I’m dating myself, but that’s like 20 years ago now. These algorithms have been around, many of them. They present really interesting mathematical challenges, engineering challenges and so on, and they’ve been used internally at big companies for many years: at Facebook for feed ranking, at Google for ad serving, at Amazon for shopping recommendation, and in many other places. Again, I’m just listing a few.

Those hyperscalers already had the capacity to train these embeddings and were able to really benefit from those solutions. With AI and foundational models becoming commonplace enough that most engineers need to deal with them, the need to ingest, manage, and search over a very large number of these embeddings became commonplace. And at the same time, the demands from those systems became a lot higher.

First, they needed to be a lot easier to use. Your consumer is now somebody developing an application and not a systems engineer at Google. Second, they needed to be a lot cheaper and a lot more cost-effective because you’re not running Google Ads, which is maybe the biggest cash cow in tech history. You might be running something that’s a lot less lucrative or frankly that you don’t know the ROI on yet, and you’re just trying to make sure that you build responsibly.

And the third is that the scale and the performance requirements have gotten a lot more demanding. When we started Pinecone, having a few million or 10 million vectors in an index was considered fairly large. That was a large index. Today we have customers with tens of billions of embeddings in one index, right? Sometimes broken into namespaces, sometimes not. Latency requirements became stricter, and so did throughput requirements and SLAs on latency variance. So even if your P90 is good, people really care about P99s and so on.

So the more use cases we see, the more demands are put on the system to be extremely performant at many different operating points, whereas when we built those systems internally in big companies, you were really building for one type of application so you really didn’t have to care about other operating points. Those are the main three differences in how these platforms have had to evolve. And frankly, we had the privilege to be grappling with a lot of those engineering and science issues for almost five years now, and to see them maybe a year or two before the rest of the market all the time and start working on them before people understand it’s actually a problem.
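To make the P90 versus P99 point concrete, here is a small sketch using the nearest-rank percentile definition. The latency numbers are invented for illustration and the `percentile` helper is a hypothetical name, not part of any product API.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 95 fast queries and 5 slow outliers.
latencies_ms = [10] * 95 + [200] * 5

p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)
print(p90, p99)
```

With this made-up data the P90 looks healthy while the P99 exposes the slow tail, which is why tail latency tends to be tracked separately.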

Use Cases [09:34]

Srini Penchikala: You mentioned a few use cases earlier. So are there any other specific use cases that are good candidates for a vector DB? What are some popular use cases or applications you see using the vector database solutions?

Edo Liberty: Vector databases really excel at retrieving complex objects. It’s usually text, to be honest, but also images and others, based on similarity and meaning. And as a result, they really excel at being paired up with large language models in what’s called RAG, retrieval augmented generation. That’s been a very, very strong pattern because it’s a very successful pairing. People create embeddings and push data into a vector database, use that basically as the search engine to create the context for the LLM, and even if you do something fairly basic, it already outperforms most systems out there. And then if you spend a week or two more playing with it and improving it here and there, you end up getting really impressive results.

But people are doing all sorts of really amazing things with vector databases now and, in general, with this idea of semantic or similarity search at scale. People do drug design and chemical compound search and people do security and abuse prevention online. People do cyber prevention and people do support chats for call centers and you name it, because the use cases are really limitless. And of course all the original use cases we started thinking about when we created the company, including recommendation engines and others, are still there as well.
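The RAG pattern described here (embed, store, retrieve, then hand the context to the LLM) can be sketched in a few lines. This is a toy illustration under loud assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the in-memory `store` list stands in for a vector database, and no LLM is called; the sketch only assembles the prompt.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical knowledge base; a real system would chunk and embed actual documents.
documents = [
    "refunds are issued within 14 days of purchase",
    "our office is closed on public holidays",
    "shipping takes 3 to 5 business days",
]
store = [(doc, embed(doc)) for doc in documents]

def retrieve(question, k=1):
    """Return the k stored documents most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(question, k=1):
    """Assemble the retrieved context and the question into an LLM prompt."""
    context = "\n".join(retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("how many days until refunds are issued")
print(prompt)
```

The resulting prompt would then be sent to whichever language model the application uses.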

Vector Database Terminology [11:24]

Srini Penchikala: Sounds good. Before we get into the technical details of the vector databases, I would like to ask a question about the terminology. There is the vector embedding and there is the vector index and of course the vector database. So can you talk about what these terms are and then how they are used in these databases?

Edo Liberty: Yes. So all three are parts of what it means to run, for example, RAG in production. The vector embedding is just a numerical representation of an item in your system. Say it’s like a document or a part of a text document. You would push that through an embedding model. This is a foundational model that takes this input and outputs a numerical representation, so let’s say a thousand-dimensional float array, like a thousand-long float array. That array is the embedding.

By the way, the word embedding is just a mathematical term for taking an item from one space and lodging it into a different space such that some properties are preserved. That’s how the word embedding came to be. It’s got nothing to do with embedded hardware, stuff like that, or embedded software, as people sometimes think. It’s not. It’s a mathematical term. So that is the embedding. The embedding is just that set of numbers. Okay?
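As a small illustration of an embedding being "just that set of numbers", the sketch below compares hand-written float arrays with cosine similarity. The vectors and their four dimensions are invented for the example; a real embedding model would produce them, typically with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical 4-dimensional embeddings of three items.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.8, 0.2, 0.1, 0.3]
invoice = [0.0, 0.9, 0.8, 0.1]

# Related items should end up closer together than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```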

A vector index is a piece of code, it’s an algorithm, it’s a data structure that takes a set of these embedding vectors, say it’s 1,000 or 10,000 or 100,000 or a million, and precomputes all sorts of statistics, data structures, pointers, organizations like clustering and so on, and organizes them such that given a query, you can pinpoint the most similar or otherwise best matching vectors in that set. This really is an almost always in-memory, algorithms-focused, high-performance computing type of effort. There are many competitions on running those indexes faster and so on.

We ourselves sponsored and ran one of those called BigANN. We ran it with Facebook and Microsoft and others and AWS. So this is not our competition. It’s an open thing. People can contribute to it. There’s a running leaderboard. And that is the vector index.
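The simplest possible vector index is an exact, in-memory scan, which is a useful baseline for everything that follows. The sketch below is an assumption-laden toy (the `FlatIndex` class is a hypothetical name, not any library’s API): it precomputes unit-length vectors at build time and scores every vector at query time.

```python
import heapq
import math

class FlatIndex:
    """Exact in-memory index: normalizes vectors once, scans all of them per query."""

    def __init__(self, vectors):
        # Precompute unit-length copies so a dot product equals cosine similarity.
        self.unit = {i: self._normalize(v) for i, v in vectors.items()}

    @staticmethod
    def _normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    def query(self, q, k=2):
        """Return the ids of the k most similar vectors, best first."""
        q = self._normalize(q)
        scored = ((sum(a * b for a, b in zip(q, u)), i) for i, u in self.unit.items())
        return [i for _, i in heapq.nlargest(k, scored)]

index = FlatIndex({"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]})
print(index.query([1.0, 0.05], k=2))
```

Approximate indexes trade a little of this exactness for looking at far fewer vectors per query.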

A vector database is a much more complicated object. First of all, at least Pinecone, and I think most systems today, are cloud-native systems that separate write paths and read paths and procuring hardware and so on, but they also really have to do with how you organize very large indexes on disk or on blob storage efficiently, how you access them efficiently, how you distribute loads. And they have to be able to answer way more complicated kinds of questions. This is not just vector search finding the most similar thing, but also filtering on metadata. So I want to find the most recommended item in my shop, but I want it to be in stock, or I want the most similar document, but I want it to have been written in the last month and so on. So hard filters.

You need to do what’s called boosting, or sparse boosting or sparse search, which is basically something that looks like keyword search. It’s much more general, but you can bake in something like keyword search into the same kind of index and so on, allow for fresh updates, allow for deletes, really build a whole distributed system around it. So that is a vector database. All three, of course, are objects that we deal with on a regular basis.
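The "most similar item, but it must be in stock" example can be sketched as a hard metadata filter combined with similarity ranking. This toy version filters first and then ranks; the catalog, field names, and `filtered_search` helper are all hypothetical, and production systems generally integrate filtering into the index rather than scanning a list.

```python
def filtered_search(items, query, k, predicate):
    """Rank by dot product, keeping only items whose metadata passes the hard filter."""
    candidates = [it for it in items if predicate(it["meta"])]
    candidates.sort(key=lambda it: sum(a * b for a, b in zip(query, it["vec"])), reverse=True)
    return [it["id"] for it in candidates[:k]]

# Hypothetical shop catalog with 2-dimensional embeddings and stock metadata.
catalog = [
    {"id": "boots", "vec": [0.9, 0.1], "meta": {"in_stock": False}},
    {"id": "sneakers", "vec": [0.8, 0.3], "meta": {"in_stock": True}},
    {"id": "teapot", "vec": [0.1, 0.9], "meta": {"in_stock": True}},
]

# "boots" is the closest match, but it is filtered out because it is not in stock.
result = filtered_search(catalog, [1.0, 0.0], k=1, predicate=lambda m: m["in_stock"])
print(result)
```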
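One common way to picture keyword-style boosting on top of vector search is a weighted sum of a dense similarity score and a sparse term-overlap score. The sketch below is only an illustration of that idea; the `alpha` weight, the document fields, and the scoring functions are assumptions, not a description of how any particular database implements it.

```python
def dense_score(q, v):
    """Dot product between dense embeddings."""
    return sum(a * b for a, b in zip(q, v))

def sparse_score(q_terms, doc_terms):
    """Sparse vectors as dicts of term -> weight; only shared terms contribute."""
    return sum(w * doc_terms[t] for t, w in q_terms.items() if t in doc_terms)

def hybrid_score(q_dense, q_sparse, doc, alpha=0.5):
    """Blend the two signals; alpha is a hypothetical tuning knob."""
    return alpha * dense_score(q_dense, doc["dense"]) + (1 - alpha) * sparse_score(q_sparse, doc["sparse"])

docs = [
    {"id": "d1", "dense": [0.9, 0.1], "sparse": {"laptop": 1.0}},
    {"id": "d2", "dense": [0.8, 0.2], "sparse": {"laptop": 1.0, "thinkpad": 2.0}},
]
q_dense, q_sparse = [1.0, 0.0], {"thinkpad": 1.0}

by_dense = max(docs, key=lambda d: dense_score(q_dense, d["dense"]))["id"]
by_hybrid = max(docs, key=lambda d: hybrid_score(q_dense, q_sparse, d))["id"]
print(by_dense, by_hybrid)
```

Here the exact keyword match on the second document outweighs its slightly lower dense similarity once the sparse signal is blended in.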

Vector Algorithms [15:23]

Srini Penchikala: So you mentioned about some algorithms to facilitate these vector database management, right? So can you talk about more of these algorithms, which use cases use which type of algorithms?

Edo Liberty: Yes. So first I’ll just say this is one of my favorite topics to speak about. I just gave a whole semester course at Princeton on exactly this topic, focusing specifically on the algorithms in the core of vector database system. So I’ll just say that this is potentially a very large topic.

I’ll say that there are two kinds of optimizations that are somewhat orthogonal, somewhat distinct. The first is how you organize the data and what fraction of the data you need to look at when you get a query. In Pinecone, we do this in blob storage, so we can do this incredibly efficiently. But you can organize your data in these small clumps called clusters such that at query time you can intelligently figure out, oh, out of my maybe 10,000 files, I only need to look at 1,000 and still get the right answers almost always, and if not, get a good approximation for them.

And how you do that is a whole science. There are randomized algorithms for doing this with random projections and random hashing and what’s called semantic hashing. There are clustering algorithms and so on. We internally also devised more complex versions of those that lend themselves well not only to high quality, but also to very efficient query routing.

The second type of optimization is, now that you’ve reduced the number of vectors that you look at, how do you compute the top matches very efficiently? How do you scan them in the most efficient way? And that goes back to the vector indexing. Again, there are very, very interesting trade-offs with how much memory you consume, how much disk you consume, and how much you are willing to spend in latency versus memory consumption versus what kind of instruction sets you have on your machine and so on.

But once you are scanning, basically computing the best results within one of those shards, there is again a whole world of innovation with what’s called quantization and compression and dimension reduction and very deep accelerations of compute with vectorized instruction sets like SIMD instructions and so on. And again, there are whole competitions dedicated to running this more efficiently. So there’s an infinite amount of literature to dive into if one is really into that kind of stuff.
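The idea of looking at only a fraction of the data can be sketched with a cluster-routed search: vectors are grouped around centroids, and a query scans only the closest cluster or clusters. Everything here is a toy assumption, including the hand-picked centroids (which would normally come from a clustering step such as k-means) and the hypothetical `nprobe` parameter name.

```python
def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical centroids, assumed to come from an offline clustering step.
centroids = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
vectors = {"a": [0.5, 0.2], "b": [9.5, 10.1], "c": [0.3, 9.7], "d": [10.2, 9.9], "e": [1.0, 0.1]}

# Build: assign every vector to its nearest centroid.
clusters = {i: [] for i in range(len(centroids))}
for vid, vec in vectors.items():
    nearest = min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))
    clusters[nearest].append(vid)

def search(query, k=1, nprobe=1):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))[:nprobe]
    candidates = [vid for i in probe for vid in clusters[i]]
    candidates.sort(key=lambda vid: sq_dist(query, vectors[vid]))
    return candidates[:k], len(candidates)

top, scanned = search([9.8, 10.0])
print(top, scanned, "of", len(vectors))
```

Raising `nprobe` scans more clusters, which costs more work but lowers the chance of missing a true nearest neighbor that sits just across a cluster boundary.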
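One well-known instance of hashing with random projections is the sign-of-random-projection scheme, a form of locality-sensitive hashing: each bit records which side of a random hyperplane a vector falls on, so vectors pointing in similar directions tend to share buckets. The sketch below is a minimal version under assumed parameters (`dim`, `n_bits`, and `seed` are illustrative choices), not the algorithm any specific product uses.

```python
import random

def make_hasher(dim, n_bits, seed=0):
    """Create a signature function from n_bits random Gaussian hyperplanes."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def signature(v):
        # One bit per hyperplane: 1 if v lies on its non-negative side, else 0.
        return tuple(1 if sum(p * x for p, x in zip(plane, v)) >= 0 else 0 for plane in planes)

    return signature

signature = make_hasher(dim=3, n_bits=8)
v = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]
opposite = [-1.0, -2.0, -3.0]

# Group vectors into buckets keyed by their bit signature.
buckets = {}
for name, vec in [("v", v), ("same_direction", same_direction), ("opposite", opposite)]:
    buckets.setdefault(signature(vec), []).append(name)
print(buckets)
```

At query time only the bucket matching the query signature (and possibly nearby signatures) would be scanned.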
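Quantization, in its simplest scalar form, stores each float as a small integer plus a shared scale factor, shrinking memory at the cost of a bounded rounding error. The sketch below assumes a per-vector scale and a signed 8-bit range; real systems use a variety of more elaborate schemes.

```python
def quantize(vec):
    """Map floats to integers in [-127, 127] using one scale factor per vector."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # fall back to 1.0 for an all-zero vector
    return [round(x / scale) for x in vec], scale

def dequantize(codes, scale):
    """Approximately reconstruct the original floats."""
    return [c * scale for c in codes]

vec = [0.12, -0.5, 0.33, 0.9]
codes, scale = quantize(vec)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(vec, restored))
print(codes, max_error)
```

Each code fits in one byte instead of the four bytes of a 32-bit float, and the reconstruction error per dimension is at most half the scale factor.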

Retrieval Augmented Generation (RAG) [18:10]

Srini Penchikala: Thank you. You also mentioned RAG, Retrieval Augmented Generation. Definitely I see this as being one of the main developments this year. I know a lot of people are using ChatGPT and Copilot, but I think the real value of these large language models is where the companies can train the models using their own data, their own proprietary data and ask questions about their own business domain related requirements.

So how do you see vector databases kind of helping with this RAG based applications development?

Edo Liberty: So RAG is one of the most common use cases of vector databases nowadays. We see that people are able to get better context for the LLMs and get better results without retraining or fine-tuning their models. They’re able to do it securely and in a data-governed way, which they couldn’t do before. And that became a very common use case for us. And so I agree with you a million percent that these general pre-trained LLMs are great at what they do, but for me as a business, if they can’t interact with my data somehow, then they become somewhat useless, or at least I can’t apply them to most of the problems that I care about. I shouldn’t say useless.

Security [19:39]

Srini Penchikala: Okay. We can switch gears a little bit Edo and then talk about security. So how do you see with the AI and all this power, so we definitely need to balance the security concerns. So how do you see if somebody is using vector databases? How can they make sure that their solutions are still secure and there are no other concerns?

Edo Liberty: I think we should separate two different kinds of security. One of them has to do with cyber, and then one of them has to do with data governance. I want to address both.

First in terms of not shipping data where you shouldn’t ship data and so on, it’s critical. I mean, this is why we invest so much in security features and being HIPAA-compliant and SOX-compliant and so on. Every other managed service has to put a significant amount of effort and make sure that this is one of the core operating tenets like with any data infrastructure by the way. Any infrastructure that touches your data needs to be, you have to have a high level of confidence it does the right thing. You have to have a high level of trust in the vendor and their ability to actually deliver.

There is a data governance issue, which I think people have to think about very seriously. When you fine-tune a model with data, you can’t later “delete” that data from the model. And so now you upfront commit to a future of very complex operations, because if you want to be, say, GDPR-compliant, you really can’t do it. The only solution is to keep retraining your models, keep fine-tuning them, and so on, based on whatever it is you’re allowed to do.

What we see is that by putting the data adjacent to the foundation model, in a vector database, you can now decide dynamically what information is available to the model. You can keep it fresh, so if data was added literally five seconds ago, it is now available to your system. If you had to delete some portion of the data because of GDPR compliance, five seconds later that data is gone and your model never knows about it anymore and so on. So we find that to be very, very convenient, and one of the main reasons why people choose this paradigm versus fine-tuning or retraining.
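The governance argument comes down to retrieval data being mutable in a way model weights are not. A minimal sketch, assuming a hypothetical in-memory `TinyVectorStore` class rather than any real client library, shows a deleted record vanishing from query results on the very next call.

```python
class TinyVectorStore:
    """Toy in-memory store with upsert, delete, and dot-product query."""

    def __init__(self):
        self.records = {}

    def upsert(self, doc_id, vector, text):
        self.records[doc_id] = (vector, text)

    def delete(self, doc_id):
        self.records.pop(doc_id, None)

    def query(self, vector, k=1):
        score = lambda rec: sum(a * b for a, b in zip(vector, rec[0]))
        ranked = sorted(self.records.values(), key=score, reverse=True)
        return [text for _, text in ranked[:k]]

store = TinyVectorStore()
store.upsert("u1", [1.0, 0.0], "Alice's support ticket")
store.upsert("u2", [0.0, 1.0], "Bob's support ticket")

before = store.query([1.0, 0.0])
store.delete("u1")  # for example, handling an erasure request
after = store.query([1.0, 0.0])
print(before, after)
```

Because the model only ever sees what retrieval returns, removing the record removes it from every future prompt without touching the model.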

Serverless Data [22:12]

Srini Penchikala: One of the new features in the vector databases is the serverless architecture where the developers don’t need to worry about managing or provisioning the infrastructure. Can you talk about this and how these serverless architectures are enabling the vector databases even more and maybe give some examples of how this contributes to faster market entry for Gen-AI applications?

Edo Liberty: Yes. So first of all, different people call serverless different things, and so we want to make sure that we understand what serverless is to begin with. What I mean by serverless and what we ship is a complete disassociation between your workload and the hardware that is used to run it. It’s our job to figure out what hardware to run where to answer your queries well, and in some sense we want to make all of that completely none of your problem. You want to know what data you have and what queries you run, and that’s it. And if you have some amount of data today and tomorrow that amount of data doubles or cuts in half, that’s great. You don’t need to worry about rescaling, resharding, moving data around, any of that stuff. If you cut the amount of data in half, you pay half as much and that’s it. You don’t need to worry about anything else.

And so that solved two main problems for us and for our customers at the same time. One is planning. We have a lot of users and customers who build very aggressive and very ambitious projects, and they hope that the projects are going to be adopted by say 5% of the user base. But they say, do we need to provision for 20% of the user base to adopt this thing or maybe 50%? We don’t know. And so they would go into this really complex decision-making that they just shouldn’t be worried about. They should just use as much as they need, and our system’s job is to scale to meet that demand.

By the way, we’ve had issues in the past where a customer, and I’m not going to name them, thought that if 10% of the user base would adopt the feature, that would be a stellar success. Within a week, 90% of the user base had adopted it because it was so novel and so exciting. But load also went through the roof and suddenly they had scaling issues and so on. So we really had to go in and help them fix that.

The second thing is just overall cost reduction. When you procure hardware, you are buying a certain mix of memory, disk, network and so on. You just buy a pre-packaged ratio between different kinds of resources on your machine. And if your workload doesn’t fit exactly that, that machine is severely underutilized. And so anything that operates on a node basis, on a machine basis, is inherently wasteful to varying degrees. What we were able to do with serverless is reduce the total application cost for some of our users by 50x, and for many, many of our users by 5 to 10x. That’s a massive savings. We were able to do that just because of that move.

Our ability to manage hardware a lot more automatically in a lot more sophisticated ways allows us to actually ship something that is way more cost-effective, and frankly, based on our analysis, that’s the only way to get the win.

Responsible AI [26:07]

Srini Penchikala: I want to talk about the social side of these amazing technologies. So I heard this somewhere, and I kind of like this statement. Responsible AI starts with responsible data. Can you talk about what the application developers should consider to create responsible solutions when they use vector databases?

Edo Liberty: I don’t think this has to do with vector databases at all. I think you’re right. I think this is the tension that AI and machine learning developers, engineers in general actually face all the time, balance between how fast you want to move and ship and how much you want to hold back and be responsible and be thoughtful and be careful.

And today, with an incredibly fast-moving market in this field, I think that tension has never been higher, and companies are under a tremendous amount of pressure to ship something, and they’re oftentimes very concerned that what they’re going to put out there is going to backfire in some spectacular way. And in some sense the least of their problems is if it’s just a marketing debacle. Sometimes they produce actual harm and that’s dangerous.

So I would say that being thoughtful and careful with what data you use for what, what guardrails you put in place, in general and specifically with AI is incredibly important.

That said, people still ship amazing things and companies should not be paralyzed with fear that something will potentially go wrong. And what I’ve seen happen is that people take on the slightly less risky parts of the stack, or the applications that require slightly less access to very, very governed data or that don’t touch really high-stakes decision-making, and go and ship those products first. And as a result, not only are they able to progress and get some value out of it, they actually get to learn and build talent in-house and build know-how, and then they’re able to get a lot more confidence that they can go and execute on the more complex or more potentially risky use cases.

Srini Penchikala: I heard this prediction at a recent Microsoft conference. So what they said was in no more than three years, anything that is not connected to AI will be considered broken or invisible. So that kind of tells the presence of AI in our daily lives. So how do you see AI and maybe even vector databases as part of the main part of this overall evolution playing a larger role in our work and daily lives in the future?

Edo Liberty: Yes, I mean, I agree with that statement and I think it’s much more sort of a mundane and practical statement than people make it to be. It sounds like some grandiose like prediction about general intelligence. I view this as something a lot more practical and obvious.

I have young kids. I have a nine-year-old and I have six-year-old twins, and they touch every screen and talk to every speaker. The idea that the speaker can’t hear them and talk back is weird to them, or the fact that some screen doesn’t react to being touched. That’s just the interface they grew up with. Things that don’t behave that way are just, whatever, old-timey, you know. They just don’t work, in their opinion. For us, taking a flip phone, it’s just like, oh yes, this is a thing. It used to be a thing.

Interfacing with technology through language has become such an expected interface that people will just come to expect it. I think that’s the statement. It just became an interface that every product needs to have in some capacity. In that vein, I 100% agree, right?

What part do we play in this? We play the part of the vector database, we play the part of knowledge and retrieval and managing data. We play a part in the ecosystem with models, with applications, with evaluations, with monitoring, with ETLs, with ML hosting, with a few other technologies that are required to build top-grade AI platforms. And every company is going to have to invest in many different products because again, that’s an interface that people expect.

Online Resources [30:59]

Srini Penchikala: Thank you, Edo. So one last question. Do you have any recommendations on what resources our listeners can check out to learn more about vector databases, RAG or any other related technologies?

Edo Liberty: Yes. So we have put up a ton of material at Pinecone on how to do RAG, how to use LangChain, how to use open models from all of our partners, from Anthropic and Cohere and Hugging Face and OpenAI and so on. We have taken a conscious decision very early on to not evangelize our product, but rather to teach people how to build stuff. And so as a result, there’s a lot of notebooks and examples and integrations and so on.

For other sources, I find documentation and blog posts from different technology evangelists could be a lot more useful than say, GDOs or Docs. I personally love to follow examples and to learn by doing rather than go read the manual. And frankly, I mean just extrapolating from what I see, people who are successful in building with AI are people who just start doing it. And still by far the most common mode of failure is that people get either so afraid or so, whatever, concerned that they don’t start at all, or they go into this analysis-paralysis mode where they start surveying 7,000 technologies. If you’re using the wrong thing, you’ll figure it out midway. Just start building. And people who’ve done that have by and large been successful eventually.

Srini Penchikala: Thank you. Do you have any additional comments before we wrap up today’s discussion?

Edo Liberty: No, just to say thank you again, it was lovely and I’m glad I get to tell your listeners that building with AI is not as hard as it potentially used to be, and if they get started, they can start getting something done within probably a day or two and put everything aside. It’s fun. It’s really exciting new technology, and when things start to work, it’s like an aha moment. It’s like, “Holy shit. I just can do that. I can build a lot of things with this thing.” So I highly recommend it and I’m glad I could encourage people to start whatever it is they already planned to do.

Srini Penchikala: Yes, AI and GenAI definitely bring different perspectives, different dimensions to problem solving and the multiple dimensions to solve a problem which humans cannot do as fast as machines, right? So thank you Edo, thank you very much for joining this podcast. It’s been great to discuss one of the very important topics in the AI space, the vector databases, which I see as the foundation of the GenAI and LLM evolution.

And to our InfoQ listeners, thank you for listening to this podcast. If you would like to learn more about the AI/ML topics, please check out the AI/ML and Data Engineering community on infoq.com website. I also encourage you to listen to the other podcasts and also check out the articles and news items on the website. Thank you.

Mentioned:

SageMaker
Pinecone
vector embeddings
Retrieval Augmented Generation (RAG)
LangChain
Anthropic
Cohere
Hugging Face

About the Author

Edo Liberty
