Vector databases are making a resurgence as generative AI hype sweeps across every industry. Organizations with developed graph and relational databases may question if they need to add another database, but vectors can provide benefits to any aspiring GenAI effort.
Vector databases are data stores specializing in similarity searches. Relational databases, as implied by the name, are for storing entities and their relationships with one another and enabling querying of the relationships. A graph database is a type of NoSQL data store that excels at modeling and traversing the connections between data points, among other things.
When evaluating the use of vector versus graph or relational databases, it’s not an or; it’s an and, said Sinclair Schuller, co-founder and managing partner of Nuvalence.
Vector databases have been around for decades and are a proven tool in search and recommendation systems. Interest in vector databases has recently picked up steam thanks to the popularity of GenAI services. The large language models (LLMs) that underpin GenAI services process text and other data as high-dimensional vectors in an intermediate embedded space.
In addition to traditional data warehouse structures, vector databases play a crucial role in facilitating the utilization of unstructured data, such as text, documents and images, in a format that is compatible with GenAI’s LLMs.
Relational databases remain essential for managing structured, tabular data, and graph databases hold a unique position in defining the intricate relationships between various data points. Many GenAI applications use both structured data and knowledge graph data, along with documents, to deliver comprehensive insights tailored to specific enterprise inquiries.
Vector databases can provide plenty of additional value compared to other popular tools built on relational and graph databases.
Vector vs. graph vs. relational databases
As data management teams come to grips with the popularity of vector databases compared to graph and relational approaches, some teams may debate whether it’s a choice of one or the other.
Vector databases and graph databases are more specialized than relational databases and designed for specific use cases, said Jeff Springer, principal consultant at DAS42.
Vector databases excel at handling high-dimensional data in natural language processing, LLMs and recommendation engines, all of which require semantic similarity searches. They’re also good at performing time-series analyses with vast quantities of data, such as predicting stock prices.
Graph databases are good at modeling and analyzing complex, interconnected data where it’s important to understand the relationship between data points. Examples include social networks and factors related to fraud detection. They are typically categorized into two types: property graphs and knowledge graphs.
Relational databases handle almost everything else and can answer most of the world’s business questions. A company needs a relational database unless it deals only with high-dimensional data, which means it doesn’t have any sales, finance, HR or supply chain divisions. Companies in that situation should ask themselves if a vector database is needed, Springer said.
Data teams may also wonder why they need another option if they already have multiple database models employed across relational, graph, NoSQL and other data stores. One important reason is improving data analysis and workflows around unstructured data.
“Vector databases offer the optimal solution for storing and utilizing unstructured data,” said Bret Greenstein, data and analytics partner at PwC. “They excel in converting text, documents and images into vector representations of their content.”
Enterprises are adopting vectors at an increasing rate to complement existing relational and graph databases. By incorporating vector databases into their data infrastructure, organizations can enhance their data management capabilities and unlock valuable insights from unstructured data sources.
Vector databases are fundamentally and architecturally different from other databases. It’s not just a question of modeling data differently; vector databases store, index and query data differently, Springer said.
Additionally, vector database designs typically scale horizontally by adding more servers to a cluster. In contrast, relational databases can scale horizontally or vertically, the latter by adding more resources to a single server.
Why use a vector database
It may be helpful to think of vector databases as math engines, Schuller said. As such, several distinctions are important. For example, indexes work differently in vector databases. Vector indexes excel at helping optimize mathematical operations.
Similarity search is key
Most vector databases rely on approximate nearest neighbor search to execute similarity searches. A similarity search is a query where results are typically ordered by how similar they are to the query. Other forms of databases are arguably ill suited to support similarity searches, Schuller said.
In a vector database, indexes are specifically designed around a similarity metric to optimize how searches work using vectors. A vector is a mathematical unit defined by a set of values that give a direction and magnitude or distance. For example, if one were to draw a straight line from Los Angeles to New York, one could say the line — or vector — is directionally northeast, 2,500 miles long and two-dimensional.
Vector databases store high-dimension vectors with potentially thousands of dimensions that approximate characteristics of the data each vector intends to represent. Storing vectors enables an interesting form of querying that underpins similarity search.
Going back to the Los Angeles-to-New York example, which would be more similar to the vector between the two cities: San Diego to Boston or San Francisco to Houston? A cursory map review reveals it’s San Diego to Boston.
“A vector database enables the sort of query that was just presented, except with data that represents text, images or other forms of data,” Schuller said.
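The city comparison can be worked through as a small similarity query. The following is a minimal sketch, not how any particular vector database does it: it assumes approximate longitude and latitude values for each city, treats them as points on a flat plane, and uses cosine similarity as the metric. The names `CITIES`, `displacement` and `cosine_similarity` are illustrative.

```python
import math

# Approximate (longitude, latitude) in degrees. Illustrative values only,
# treated as flat x/y coordinates for the sake of the example.
CITIES = {
    "Los Angeles": (-118.24, 34.05),
    "New York": (-74.01, 40.71),
    "San Diego": (-117.16, 32.72),
    "Boston": (-71.06, 42.36),
    "San Francisco": (-122.42, 37.77),
    "Houston": (-95.37, 29.76),
}


def displacement(origin, destination):
    """Return the two-dimensional vector pointing from origin to destination."""
    (x1, y1), (x2, y2) = CITIES[origin], CITIES[destination]
    return (x2 - x1, y2 - y1)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


query = displacement("Los Angeles", "New York")
candidates = {
    "San Diego to Boston": displacement("San Diego", "Boston"),
    "San Francisco to Houston": displacement("San Francisco", "Houston"),
}

# A similarity search returns candidates ordered by closeness to the query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
for name in ranked:
    print(f"{name}: {cosine_similarity(query, candidates[name]):.3f}")
```

With these coordinates, San Diego to Boston ranks first. Cosine similarity compares direction only; a metric such as Euclidean distance would also take the length of each line into account.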
Turbocharging LLMs
Organizations that use LLMs should consider vector databases given that similarity searches combined with LLMs improve the use of context.
Vectors play a crucial role in LLMs within GenAI, Greenstein said. Vectors enable the comparison of concepts within an LLM by measuring the distance between vectors in a high-dimensional space.
“While this concept may seem complex, it is the most practical solution available and scales effectively for enterprise applications,” he said.
One practical example within LLMs might be considering the mathematical descriptions of the concepts dog and cat within an LLM. When discussing the topic of pets, their vectors are mathematically close to each other. However, when considering the topic of species, the vectors for cat and tiger are closer to each other. Vectors capture the relationships between the similar concepts within different topics. The possibilities increase when applied to all concepts and topics within an LLM.
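A toy version of the dog, cat and tiger example might look like the sketch below. Everything in it is hypothetical: the four named dimensions and their values are hand-picked for illustration, whereas real LLM embeddings are learned and have many more dimensions with no human-readable labels. Each "topic" is modeled, as an assumption, by measuring distance over a subset of dimensions.

```python
import math

# Hypothetical, hand-built dimensions. Real embedding dimensions are learned
# and are not labeled like this.
DIMENSIONS = ["lives_in_homes", "domesticated", "feline_family", "retractable_claws"]

# Hypothetical concept vectors, one value per dimension above.
CONCEPTS = {
    "dog": [0.90, 0.95, 0.0, 0.0],
    "cat": [0.95, 0.90, 1.0, 1.0],
    "tiger": [0.00, 0.05, 1.0, 1.0],
}

# Assumption: a topic is approximated by the dimensions it cares about.
TOPICS = {
    "pets": ["lives_in_homes", "domesticated"],
    "species": ["feline_family", "retractable_claws"],
}


def distance(a, b, topic):
    """Euclidean distance between two concepts over one topic's dimensions."""
    indexes = [DIMENSIONS.index(name) for name in TOPICS[topic]]
    return math.sqrt(
        sum((CONCEPTS[a][i] - CONCEPTS[b][i]) ** 2 for i in indexes)
    )


for topic in TOPICS:
    print(
        f"{topic}: dog-cat={distance('dog', 'cat', topic):.2f}, "
        f"cat-tiger={distance('cat', 'tiger', topic):.2f}"
    )
```

In this toy setup, dog and cat sit closer together under the pets topic, while cat and tiger sit closer together under the species topic.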
Vectors are also essential for encoding prompts and enterprise data, as they enable the calculation of distances and facilitate effective answers within the LLM. They help organizations unlock the full potential of their data and prompt the LLM to generate insightful and relevant responses.
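One way to picture how encoded prompts and enterprise data work together is the retrieval sketch below. It is a simplified illustration under stated assumptions: `embed` is a toy word-count stand-in for a real embedding model, `DOCUMENTS` is made-up sample data, and the prompt template is hypothetical. A production system would call an actual embedding model and a vector database instead of scanning a list.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a sparse vector of word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(question, documents, k=2):
    """Return the k documents most similar to the question."""
    query_vector = embed(question)
    return sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vector, embed(doc)),
        reverse=True,
    )[:k]


def build_prompt(question, documents, k=2):
    """Place the most relevant documents in the prompt as context."""
    context = "\n".join(f"- {doc}" for doc in top_k(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical enterprise snippets.
DOCUMENTS = [
    "Refunds are issued within 14 days of a returned order.",
    "The warehouse ships orders Monday through Friday.",
    "Employees accrue vacation days every month.",
]

question = "How long do refunds take after a returned order?"
prompt = build_prompt(question, DOCUMENTS, k=1)
print(prompt)
```

The similarity search decides which enterprise text reaches the LLM, which is why the quality of the vectors directly shapes the relevance of the response.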
Vector databases face different challenges
Vector databases also introduce several new challenges to existing processes compared with relational and graph databases. According to Greenstein, the main challenges include machine readability, the maturity of tools and access, and new approaches for information retrieval:
- Machine readability. Vector databases store representations optimized for machine understanding rather than human comprehension. It requires additional work to index the vectors effectively, enabling applications to identify the relevant vectors for a given question or prompt.
- Maturity of tools and data access. Vector databases support concepts such as role-based access control, but the tools and approaches for data access are still evolving. Although options exist, it is essential to consider the specific requirements and ensure the appropriate tools are in place to facilitate seamless data access and management.
- New approaches for search and information retrieval. Vector databases necessitate fresh approaches to search and information retrieval, particularly when dealing with large-scale unstructured data, which can make up 80% of all enterprise data. Indexing and content chunking techniques, and the skills behind them, need to be tailored to each use case to achieve optimal results.
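Content chunking, mentioned in the last item above, can be sketched as follows. This is one simple approach among many, assuming fixed-size word windows with overlap; the function name `chunk_words` and its default sizes are illustrative, and real pipelines often split on sentences, headings or tokens instead.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some of its surrounding context.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(str(n) for n in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk would then be converted to a vector and indexed separately. Chunk size and overlap are the kind of settings that typically need tuning per use case.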
Bridging the skills gap
Working with vector databases requires skills and abilities not commonly found in the current data analytics field, Springer said. Relational databases and SQL are commonly used, and working with them is table stakes for any data team. Vector databases are newer, and the individuals with the ability to work with them are much rarer.
However, efforts to integrate traditional relational tools with vector databases may help flatten the learning curve, Springer said. For example, the KX vector database is making tremendous strides in bridging workflows and skills across vector and relational domains. The KX partnership with Snowflake may further reduce the challenges in implementing vector databases for many organizations.
In the long run, integrating the tools may help enterprises consider how to use vector, relational and graph databases together.
“There are a number of valuable reasons to create multi-data store architectures, with the primary reason being the right tool for the job,” Schuller said. Structured data is always a need, and the new approaches using vector and graph databases can help complement existing data management operations.
George Lawton is a journalist based in London. Over the last 30 years, he has written more than 3,000 stories about computers, communications, knowledge management, business, health and other areas that interest him.