On Thursday, Google announced a whole series of database and data analytics improvements to its cloud data architecture.
In this article, we’ll focus on the substantial improvements to Spanner and Bigtable (two of Google’s cloud database offerings). These announcements substantially increase interoperability and open the door to additional AI implementations through the use of new features Google is showcasing.
Spanner is Google’s globally distributed cloud database. It excels at providing worldwide consistency (which is far harder to implement than it may seem, thanks to a plethora of time-related issues Google has solved). It’s also scalable, meaning the database can grow very large and span countries and regions. It’s multi-model, meaning it can handle more than one kind of data model, not just relational tables. And it’s all managed through SQL (Structured Query Language) queries.
Bigtable is also hugely scalable (hence the “big” in Bigtable). It’s a wide-column store: columns can be added on the fly and don’t need to be uniformly defined across all rows. It also offers very low latency and high throughput. Until now, it’s been characterized as a NoSQL database, a term used to describe non-relational databases that allow for flexible schemas and data organization.
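If you haven’t worked with a wide-column store, here’s a rough conceptual sketch in Python (no client library involved) of how two rows in such a store can carry different columns, grouped into column families and versioned by timestamp. The row keys, families, and values are made up for illustration.

```python
# Conceptual sketch of two wide-column rows. In real Bigtable, cell values are
# bytes and timestamps are server-assigned; plain strings and integers are
# used here purely for readability.
row_user_123 = {
    "profile": {                              # column family
        "name": [("Ada", 1722470400)],        # column -> list of (value, timestamp) cells
        "city": [("Zurich", 1722470400)],
    },
    "metrics": {
        "clicks_2024-07-31": [("42", 1722470400)],
    },
}

row_user_456 = {
    "profile": {
        "name": [("Grace", 1722470500)],
        # no "city" column here: rows don't have to share the same columns
    },
    "metrics": {
        "clicks_2024-07-30": [("7", 1722384000)],
        "clicks_2024-07-31": [("13", 1722470500)],
    },
}
```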
Both of these tools support giant enterprise databases. Spanner is generally the better choice for applications that need a globally distributed database with strong, immediate consistency and complex transactions. Bigtable is better when high throughput matters most. Bigtable does offer a form of consistency, but propagation delays mean that data becomes consistent eventually rather than immediately.
Bigtable announcements
Bigtable is primarily queried through API calls. One of the biggest and most game-changing features announced today is SQL queries for Bigtable.
This is huge from a programming skills point of view. In a 2023 Stack Overflow survey of programming language use, SQL ranked fourth, with 48.66% of programmers using it. There was no mention of Bigtable in the Stack Overflow survey, so I turned to LinkedIn for some perspective. A quick search of jobs containing “SQL” resulted in 400,000+ results. Meanwhile, a search for “Bigtable” resulted in 1,561 results, less than 1% of the SQL number.
So, while any number of folks who know SQL could have learned how to make Bigtable API calls, SQL support means the learning curve has been flattened to nearly zero. Almost one out of every two developers can now use the new SQL interface to Bigtable to write queries whenever they need to.
One note, though: this Bigtable upgrade doesn’t support all of SQL. Google has, however, implemented more than 100 functions and promises more to come.
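To make that concrete, here’s a minimal sketch of the kind of SQL statement this opens up, held in a Python string. The table name, the column family, and the exact dialect details are my own illustrative assumptions, not something copied from Google’s documentation.

```python
# Hedged sketch of the kind of SQL Bigtable's new interface accepts.
# Assumptions for illustration: a table "user_events" with a "metrics" column
# family; in Bigtable's SQL dialect a column family is exposed roughly as a
# map from column qualifier to value, which the bracket syntax reflects.
BIGTABLE_SQL = """
SELECT _key, metrics['clicks'] AS clicks
FROM user_events
WHERE _key LIKE 'user#%'
LIMIT 10
"""

# The query string would be submitted through Google's Bigtable client
# libraries or tooling; no specific client call is shown here because the
# point is the query language, not the transport.
print(BIGTABLE_SQL.strip())
```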
Also on the Bigtable table is the introduction of distributed counters. Counters are aggregations like counts, sums, averages, and related math functions. Google is introducing the ability to compute these aggregations in real time, at a very high level of throughput, across multiple nodes in a Bigtable cluster, which lets applications run analysis and aggregation concurrently across data sources.
This lets you do things like calculate daily engagement, find maximum and minimum values from sensor readings, and so on. With Bigtable, you can deploy these on very large-scale projects that need rapid, real-time insights and can’t tolerate the bottlenecks that normally come from aggregating per node and then aggregating across nodes. It’s big numbers, fast.
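To see why that matters, here’s a small conceptual Python sketch of the two-step pattern (aggregate on each node, then merge the partial results) that applications previously had to build themselves. The new distributed counters are designed to push this kind of work into Bigtable itself; the data below is made up.

```python
# Conceptual sketch: per-node partial aggregation followed by a merge step.
from math import inf

node_readings = {                     # sensor readings held by three nodes
    "node-a": [21.5, 22.0, 19.8],
    "node-b": [23.1, 20.4],
    "node-c": [18.9, 24.2, 22.7],
}

# Step 1: each node computes a partial aggregate locally.
partials = [
    {"sum": sum(vals), "count": len(vals), "min": min(vals), "max": max(vals)}
    for vals in node_readings.values()
]

# Step 2: a coordinator merges the partials into the final answer.
total = {"sum": 0.0, "count": 0, "min": inf, "max": -inf}
for p in partials:
    total["sum"] += p["sum"]
    total["count"] += p["count"]
    total["min"] = min(total["min"], p["min"])
    total["max"] = max(total["max"], p["max"])

print("average:", total["sum"] / total["count"],
      "min:", total["min"], "max:", total["max"])
```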
Spanner announcements
Google has a number of big Spanner announcements, all of which move the database toward supporting AI projects. The big one is the introduction of Spanner Graph, which adds graph database capabilities to the globally distributed database functionality at the core of Spanner.
Don’t confuse “graph database” with “graphics.” The term means the nodes and connections of the database can be illustrated as a graph. If you’ve ever heard the term “social graph” in reference to Facebook, you know what a graph database is. Think of the nodes as entities, like people, places, items, etc., and the connections (also called edges) as the relationships between the entities.
Facebook’s social graph of you, for example, contains all the people you have relationships with, and then all the people they have relationships with, and so on and so on.
Spanner can now natively store and manage this type of data, which is big news for AI implementations. It gives them a global, highly consistent way to represent vast amounts of relationship information across regions. That’s powerful for traversal (finding a path or exploring a network), pattern matching (identifying groups that match a certain pattern), centrality analysis (determining which nodes are more important than others), and community detection (finding groups of nodes that form a community of some sort, like a neighborhood).
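If the node-and-edge idea is new to you, here’s a tiny Python sketch of a social graph and a breadth-first traversal that finds the shortest chain of relationships between two people, the kind of path-finding a graph database does at scale. The people and relationships are made up.

```python
# Tiny social graph: nodes are people, edges are "knows" relationships.
from collections import deque

edges = {
    "ada":   ["grace", "alan"],
    "grace": ["ada", "linus"],
    "alan":  ["ada"],
    "linus": ["grace"],
}

def shortest_path(graph, start, goal):
    """Breadth-first traversal: find the shortest chain of relationships."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

print(shortest_path(edges, "alan", "linus"))  # ['alan', 'ada', 'grace', 'linus']
```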
Along with the graph data representation, Spanner now supports GQL (Graph Query Language), an industry-standard language for querying graph data. It works alongside SQL, which means developers can use both SQL and GQL within the same query. That can be a big deal for applications that need to sift through row-and-column data and discern relationships in the same query.
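Here’s a schematic example of what GQL-style pattern matching looks like, stored as a query string in Python. The graph name, node labels, and exact syntax are illustrative assumptions on my part, not copied from Spanner Graph’s documentation.

```python
# Schematic GQL-style pattern match. "SocialGraph", the labels, and the exact
# dialect details are illustrative assumptions, not verified Spanner syntax.
GQL_QUERY = """
GRAPH SocialGraph
MATCH (p:Person {name: 'Ada'})-[:FRIENDS_WITH]->(friend:Person)
RETURN friend.name, friend.city
"""

# In practice a string like this would go through the Spanner client's normal
# query path, alongside (or combined with) SQL over the same data.
print(GQL_QUERY.strip())
```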
Google is also introducing two new search modalities to Spanner: full-text and vector. Full-text is something most folks are familiar with – the ability to search within text like articles and documents for a given pattern.
Vector search turns words (or even entire documents) into numbers that are mathematical representations of the data. These are called “vectors,” and they essentially capture the intent, meaning, or essence of the original text. Queries are also turned into vectors (numerical representations), so when an application performs a lookup, it looks for other vectors that are mathematically close to each other – essentially computing similarity.
Vectors can be very powerful because matches no longer need to be exact. For example, a query for “detective fiction” would also surface “mystery novels,” “home insurance” would match “property coverage,” and “table lamps” would match “desk lighting.”
You can see how that sort of similarity matching would be beneficial for AI analysis. In Spanner’s case, those similarity matches could work on data that’s stored in different continents or server racks.
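To show what “mathematically close” means, here’s a toy Python sketch that computes cosine similarity, a common way to measure how near two vectors are. Real embeddings come from a trained model and have hundreds or thousands of dimensions; the three-dimensional numbers below are purely illustrative.

```python
# Toy cosine-similarity check between made-up 3-dimensional "embeddings".
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

detective_fiction = [0.9, 0.1, 0.3]   # illustrative numbers only
mystery_novels    = [0.85, 0.15, 0.35]
table_lamps       = [0.1, 0.8, 0.2]

print(cosine_similarity(detective_fiction, mystery_novels))  # close to 1.0 -> similar
print(cosine_similarity(detective_fiction, table_lamps))     # much lower -> dissimilar
```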
Opening up data for deeper insights
According to Google’s Data and AI Trends Report 2024, 52% of the non-technical users surveyed are already using generative AI to provide data insights. Almost two-thirds of the respondents believe that AI will cause a “democratization of access to insights,” essentially allowing non-programmers to ask new questions about their data without requiring a programmer to build it into code. 84% believe that generative AI will provide those insights faster.
I agree. I’m a technical user, but when I fed ChatGPT some raw data from my server, and the result was some powerfully helpful business analytics in minutes, without needing to write a line of code, I realized AI was a game-changer for my business.
Here’s the problem. According to the survey, 66% of respondents report that at least half of their data is dark. What that means is that the data is there, somewhere, but not accessible for analysis.
Some of that has to do with data governance issues, some has to do with the data format or a lack thereof, some of it has to do with the fact that the data can’t be represented in rows and columns, and some of it has to do with a myriad of other issues.
Essentially, even though AI systems may “democratize” access to data insights, that’s only possible if the AI systems can get at the data.
That brings us to the relevance of today’s Google announcements. These features all increase access to data, whether through a new query mechanism, the ability of programmers to use existing skills like SQL, the ability of large databases to represent data relationships in new ways, or the ability of search queries to find similar data. They all open up what may previously have been dark data to analysis and insight.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.