Opportunities exist to increase efficiency and save money by moving to a document database and practicing appropriate data modeling techniques.
Mar 23rd, 2023 8:25am
Rising costs and uncertain economic times are causing many organizations to look for ways to do more with less. Databases are no exception. Fortunately, opportunities exist to increase efficiency and save money by moving to a document database and practicing appropriate data modeling techniques.
Document databases save companies money in two ways:
- Object-centric, cross-language SDKs and schema flexibility mean developers can create and iterate production code faster, lowering development costs.
- Less hardware is necessary for a given transaction throughput, significantly reducing operational costs.
Developer Efficiency
All modern development uses the concept of objects. Objects define a set of related values and methods for reading, modifying and deriving results from those values. Customers, invoices and train timetables are all examples of objects. Objects, like all program variables, are transient and so must be made durable by persisting them to disk storage.
We no longer manually serialize objects into local files the way Windows desktop developers did in the 1990s. Nowadays data is stored not on the computer running our application, but in a central place accessed by multiple applications or instances of an application. This shared access means not only do we need to read and write data efficiently over a network, but also implement mechanisms to allow concurrent changes to that data without one process overwriting another’s changes.
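One common such mechanism is optimistic concurrency control: every record carries a version number, and a write succeeds only if the version the writer read is still current. A minimal in-memory sketch in Python (the `Store` class is illustrative, not a real database API):

```python
class ConflictError(Exception):
    """Raised when another writer changed the record first."""

class Store:
    """Illustrative in-memory store with optimistic concurrency control."""
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        return self._data[key]  # returns (version, value)

    def write(self, key, expected_version, value):
        version, _ = self._data.get(key, (0, None))
        if version != expected_version:
            raise ConflictError(f"expected v{expected_version}, found v{version}")
        self._data[key] = (version + 1, value)

store = Store()
store.write("customer:1", 0, {"name": "Sally"})
v, doc = store.read("customer:1")
store.write("customer:1", v, {"name": "Sally Smith"})  # succeeds: version matches
try:
    store.write("customer:1", v, {"name": "Sal"})      # fails: stale version
except ConflictError:
    pass  # the losing writer re-reads and retries
```

A real database applies the same version check atomically on the server; the loser of the race retries with a fresh read instead of silently overwriting the other writer's change.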
Relational databases predate the widespread use and implementation of object-oriented programming. In a relational database, data structures are mathematical tables of values. Interaction with the data happens via a specialized language, SQL, that has evolved in the past 40 years to provide all sorts of interaction with the stored data: filtering and reshaping it, converting it from its flat, deduplicated interrelated model into the tabular, re-duplicated, joined results presented to applications. Data is then painfully converted from these rows of redundant values back into the objects the program requires.
Doing this requires a remarkable amount of developer effort, skill and domain knowledge. The developer has to understand the relationships between the tables. They need to know how to retrieve disparate sets of information, and then rebuild their data objects from these rows of data. There is an assumption that developers learn this before entering the world of work and can just do it. But this is simply untrue. Even when a developer has had some formal education in SQL, it’s unlikely that the developer will know how to write nontrivial examples efficiently.
Document databases start with the idea of persisting objects. They allow you to persist a strongly typed object to the database with very little code or transformation, and to filter, reshape and aggregate results using exemplar objects, not by trying to express a description in the broken English that is SQL.
Imagine we want to store a customer object where customers have an array of some repeating attribute, in this case, addresses. Addresses here are weak entities not shared between customers. Here is the code in C#/Java-like pseudocode:
class Address : Object {
    Integer number;
    String street, town, type;
    Address(number, street, town, type) {
        this.number = number
        this.street = street
        this.town = town
        this.type = type
    }
    //Getters and setters or properties as required
}
class Customer : Object {
    GUID customerId;
    String name, email;
    Array<Address> addresses;
    Customer(id, name, email) {
        this.customerId = id
        this.name = name
        this.email = email
        this.addresses = new Array<Address>()
    }
    //Getters and setters or properties as required
}
Customer newCustomer = new Customer(new GUID(),
    "Sally Smith", "sallyport@piratesrule.com")
Address home = new Address(62, "Swallows Lane", "Freeport", "home")
newCustomer.addresses.push(home)
To store this customer object in a relational database management system (RDBMS) and then retrieve all the customers in a given location, we need the following code or something similar:
//Connect
RDBMSClient rdbms = new RDBMSClient(CONNECTION_STRING)
rdbms.setAutoCommit(false)
// Add a customer (parent row first so the address rows can reference it)
insertCustomerSQL = "INSERT INTO Customer (name,email,customerId) values(?,?,?)"
preparedSQL = rdbms.prepareStatement(insertCustomerSQL)
preparedSQL.setString(1, newCustomer.name)
preparedSQL.setString(2, newCustomer.email)
preparedSQL.setObject(3, newCustomer.customerId)
preparedSQL.executeUpdate()
insertAddressSQL = "INSERT INTO Address (number,street,town,type,customerId) values(?,?,?,?,?)"
preparedSQL = rdbms.prepareStatement(insertAddressSQL)
for (Address address of newCustomer.addresses) {
    preparedSQL.setInt(1, address.number)
    preparedSQL.setString(2, address.street)
    preparedSQL.setString(3, address.town)
    preparedSQL.setString(4, address.type)
    preparedSQL.setObject(5, newCustomer.customerId)
    preparedSQL.executeUpdate()
}
rdbms.commit()
//Find all the customers with an address in Freeport,
//returning every address of each matching customer
freeportQuery = "SELECT ct.*, ads.* FROM Address ad
    INNER JOIN Address ads ON ad.customerId = ads.customerId AND ad.town = ?
    INNER JOIN Customer ct ON ct.customerId = ad.customerId
    ORDER BY ct.customerId"
preparedSQL = rdbms.prepareStatement(freeportQuery)
preparedSQL.setString(1, "Freeport")
ResultSet rs = preparedSQL.executeQuery()
String customerId = ""
Customer customer;
//Convert rows back to objects
while (rs.next()) {
    //New customerId value - we have moved on to the next customer
    if (rs.getObject("ct.customerId").toString() != customerId) {
        if (customerId != "") { print(customer.email) }
        customerId = rs.getObject("ct.customerId").toString()
        customer = new Customer(rs.getObject("ct.customerId"),
            rs.getString("ct.name"),
            rs.getString("ct.email"))
    }
    customer.addresses.push(new Address(rs.getInt("ads.number"),
        rs.getString("ads.street"),
        rs.getString("ads.town"),
        rs.getString("ads.type")))
}
if (customerId != "") { print(customer.email) }
This code is not only verbose and increasingly complex as the depth or number of fields in your object grows, but adding a new field will require a slew of correlated changes.
By contrast, with a document database, your code would look like the following and would require no changes to the database interaction if you add new fields or depth to your object:
//Connect
mongodb = new MongoClient(CONNECTION_STRING)
customers = mongodb.getDatabase("shop").getCollection("customers", Customer.class)
//Add Sally with her addresses
customers.insertOne(newCustomer)
//Find all the customers with an address in Freeport
Customer freeportCustomer = new Customer()
freeportCustomer.set("addresses.town", "Freeport")
FindIterable<Customer> freeportCustomers = customers.find(freeportCustomer)
for (Customer customer : freeportCustomers) {
    print(customer.email) //These have the addresses populated too
}
When developers encounter a disconnect between the programming model (objects) and the storage model (rows), they’re quick to create an abstraction layer to hide it from their peers and their future selves. Code that automatically converts objects to tables and back again is called an object relational mapper or ORM. Unfortunately, ORMs tend to be language-specific, which locks development teams into that language, making it more difficult to use additional tools and technologies with the data.
Using an ORM doesn't free you from SQL either; more complex operations still require writing it by hand. And because the underlying database knows nothing about objects, an ORM can do little to make storage or query processing more efficient.
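The glue an ORM generates can be sketched with reflection. A hypothetical Python example using `dataclasses` and SQLite (the `save` helper and table layout are illustrative, not a real ORM's API):

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

@dataclass
class Customer:
    customer_id: str
    name: str
    email: str

def save(conn, obj):
    # Derive the INSERT statement from the object's fields --
    # the kind of mapping code an ORM writes for you.
    cols = [f.name for f in fields(obj)]
    placeholders = ",".join("?" * len(cols))
    sql = f"INSERT INTO {type(obj).__name__} ({','.join(cols)}) VALUES ({placeholders})"
    conn.execute(sql, astuple(obj))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (customer_id TEXT, name TEXT, email TEXT)")
save(conn, Customer("c1", "Sally Smith", "sallyport@piratesrule.com"))
row = conn.execute("SELECT name FROM Customer").fetchone()
```

Because this leans entirely on Python's introspection machinery, the equivalent Java or C# mapper shares nothing with it, which is exactly the language lock-in described above.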
Document databases like MongoDB persist the objects that developers are already familiar with so there’s no need for an abstraction layer like an ORM. And once you know how to use MongoDB in one language, you know how to use it in all of them and you never have to move from objects back to querying in pseudo-English SQL.
It’s also true that PostgreSQL and Oracle have a JSON data type, but you can’t use JSON to get away from SQL. JSON in an RDBMS is for unmanaged, unstructured data, a glorified string type with a horrible bolt-on query syntax. JSON is not for database structure. For that you need an actual document database.
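To see what that bolt-on syntax looks like in practice, here is a sketch using SQLite's JSON functions as a stand-in (PostgreSQL and Oracle each have their own, different operators; the table layout here is illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT, doc TEXT)")  # JSON kept as text
doc = {"name": "Sally Smith",
       "addresses": [{"town": "Freeport", "type": "home"}]}
conn.execute("INSERT INTO customer VALUES (?, ?)", ("c1", json.dumps(doc)))

# The filter is a path string pasted into SQL; the schema knows nothing
# about the structure inside 'doc', and matching any array element (rather
# than a fixed index) requires further machinery such as json_each().
rows = conn.execute(
    "SELECT json_extract(doc, '$.name') FROM customer "
    "WHERE json_extract(doc, '$.addresses[0].town') = 'Freeport'"
).fetchall()
```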
Less Hardware for a Given Workload
A modern document database is very similar to an RDBMS internally, but unlike the normalized relational model where the schema dictates that all requests are treated equally, a document database optimizes that schema for a given workload at the expense of other workloads. The document model takes the idea of the index-organized table and clustered index to the next level by co-locating not only related rows as in the relational model, but all the data you are likely to want to use for a given task. It takes the idea that a repeating child attribute of a relation does not need to be in a separate table (and thus storage) if you have a first-order array type. Or in other terms, you can have a column type of “embedded table.”
This co-location or, as some call it, the implicit joining of weak entity tables reduces the costs of retrieving data from storage as often only a single cache or disk location needs to be read to return an object to the client or apply a filter to it.
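The co-located record described above might look like the following sketch (field names follow the earlier pseudocode; the id value is made up):

```python
# One document holds the customer and all of her addresses -- the
# "embedded table" column described above. A single read returns everything.
customer_doc = {
    "customerId": "6b2f0a0e-4f1c-4d57-9b1a-0c8e7d2a9f11",
    "name": "Sally Smith",
    "email": "sallyport@piratesrule.com",
    "addresses": [  # repeating child attribute as a first-order array
        {"number": 62, "street": "Swallows Lane",
         "town": "Freeport", "type": "home"},
    ],
}

# Filtering on an embedded field needs no join:
has_freeport = any(a["town"] == "Freeport" for a in customer_doc["addresses"])
```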
Compare this with needing to identify, locate and read many rows to return the same data, plus the client-side hardware necessary to reconstruct an object from those rows. That cost is so great that many developers put a secondary, simpler key-value store in front of their primary database to act as a cache.
These developers know the primary database cannot reasonably meet the workload requirements alone. A document database requires no external cache in front of it to meet performance targets but can still perform all the tasks of RDBMS, just more efficiently.
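That external cache is usually run as cache-aside: check the key-value store first, fall back to the primary database, then populate the cache. A sketch in Python (the `db_fetch` function stands in for the expensive multi-row relational read described above):

```python
cache = {}  # stands in for an external key-value store

def db_fetch(customer_id):
    """Stands in for the expensive multi-row read from the primary database."""
    return {"customerId": customer_id, "name": "Sally Smith"}

def get_customer(customer_id):
    # Cache-aside: serve from the cache when possible...
    if customer_id in cache:
        return cache[customer_id]
    # ...otherwise hit the primary database and remember the result.
    doc = db_fetch(customer_id)
    cache[customer_id] = doc
    return doc

first = get_customer("c1")   # misses the cache, reads the database
second = get_customer("c1")  # served from the cache
```

The pattern buys read throughput at the price of a second system to provision, secure and keep consistent, which is the operational burden a document database avoids when one read of a co-located document is already cheap.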
How much more efficiently? I built a test harness to measure the efficiency gains and cost savings of a document database versus a standard relational database. In these tests, I sought to quantify the transactional throughput per dollar for a best-in-class cloud-hosted RDBMS versus a cloud-hosted document database, specifically MongoDB Atlas.
The use case I chose represents a common, real-world application where a set of data is updated frequently and read even more frequently: an implementation of the U.K. Vehicle Inspection (MOT) system and its public and private interfaces, using its own published data.
The tests revealed that create, update and read operations are considerably faster in MongoDB Atlas. Overall, on similarly specified server instances with a similar instance cost, MongoDB Atlas manages approximately 50% more transactions per second. This increases as the relational structure becomes more complex, making the joins more expensive still.
In addition to the basic instance costs, the hourly running costs of the relational database varied between 200% and 500% of the Atlas costs for these tests due to additional charges for disk utilization. The cost of hosting the system, highly available to meet a given performance target, was overall three to five times less expensive on Atlas. In the simplest terms, Atlas could push considerably more transactions per second per dollar.
Independent tests confirm the efficiency of the document model. Temenos, a Swiss-based software company used by the largest banks and financial institutions in the world, has been running benchmark tests for more than 15 years. In its most recent test, the company ran 74,000 transactions per second (TPS) through MongoDB Atlas.
The tests resulted in throughput per core that was up to four times better than a similar test from three years ago while using 20% less infrastructure. This test was performed using a production-grade benchmark architecture with configurations that reflected production systems, including nonfunctional requirements like high availability, security and private links.
During the test, MongoDB read 74,000 TPS with a response time of 1 millisecond, all while also ingesting another 24,000 TPS. Plus, since Temenos is using a document database, there’s no caching in the middle. All the queries run straight against the database.
Summary
Unless you intend to have a single database in your organization used by all applications, moving your workloads from relational to a document model would allow you to build more in less time with the same number of developers, and to spend considerably less running it in production. Your business almost certainly adopted object-oriented programming long ago. So why haven't you started using object-oriented document databases?