MongoDB vs. ScyllaDB: Performance, Scalability and Cost – The New Stack

Posted on mongodb google news. Visit mongodb google news

MongoDB vs. ScyllaDB: Performance, Scalability and Cost

We performed an in-depth benchmarking study comparing the two databases across 133 measurements. Here are the results.


Jan 8th, 2024 6:20am


We previously compared the technical characteristics of two important NoSQL databases: the market-leading general-purpose NoSQL database MongoDB, and its performance-oriented challenger ScyllaDB. Both MongoDB and ScyllaDB promise a highly available, performant and scalable architecture. But the ways they achieve these objectives differ far more than you might think at first glance.

To quantify the performance impact of these architectural differences, we performed an in-depth benchmarking study comprising 133 performance and scalability measurements. This article shares the high-level results.

TL;DR: ScyllaDB is best suited for applications that operate on data sets in the terabyte range and that require high throughput (over 50 kOps/s) while providing predictable low latency for read and write operations.

About This Benchmark

The NoSQL database landscape is continuously evolving. Over the past 15 years, it has already introduced many options and trade-offs when it comes to selecting a high-performance and scalable NoSQL database. We recently benchmarked MongoDB versus ScyllaDB to get a detailed picture of their performance, price performance and scalability capabilities under different workloads.

To create the workloads, we used the Yahoo! Cloud Serving Benchmark (YCSB), an open source and industry-standard benchmark tool. Database benchmarking is often said to be nontransparent and to compare apples to oranges. To address these challenges, this benchmark comparison was based on benchANT’s scientifically proven Benchmarking as a Service platform. The platform ensures a reproducible benchmark process (for more details, see the associated research papers on Mowgli and benchANT), which follows established guidelines for database benchmarking.

This benchmarking project was conducted by benchANT and sponsored by ScyllaDB to provide a fair, transparent and reproducible comparison of both database technologies. For this purpose, all benchmarks were carried out on the database vendors’ DBaaS offerings, namely MongoDB Atlas and ScyllaDB Cloud, to ensure a comparable production-ready database deployment. Further, the benchmarking tool used was the standard YCSB benchmark, and all applied configuration options are exposed.

The DBaaS clusters ranged from three to 18 nodes, classified in three scaling sizes that are comparably priced. The benchmarking study comprised three workload types that cover read-heavy, read-update and write-heavy application domains with data set sizes from 250 GB to 10 TB. We compared a total of 133 performance measurements that range from throughput (per cost) to latencies to scalability. ScyllaDB outperformed MongoDB in 132 of 133 measurements:

  • For all the applied workloads, ScyllaDB provides higher throughput (up to 20 times) than MongoDB.
  • ScyllaDB achieves P99 latencies below 10 milliseconds for insert, read and update operations for almost all scenarios. In contrast, MongoDB achieves P99 latencies below 10 ms only for certain read operations, while the MongoDB insert and update latencies are up to 68 times higher compared to ScyllaDB.
  • ScyllaDB achieves near-linear scalability, while MongoDB shows less efficient horizontal scalability.
  • The price-performance ratio clearly shows the strong advantage of ScyllaDB with up to 19 times better price-performance ratio depending on the workload and data set size.

To ensure full transparency and reproducibility of the presented results, all benchmark results are publicly available on GitHub. This data contains the raw performance measurements, as well as additional metadata such as DBaaS instance details and VM details for running the YCSB instances. You can reproduce the results on your own even without the benchANT platform.

MongoDB vs. ScyllaDB Benchmark Results Overview

The complete benchmark covers three workloads: social, caching and sensor.

  • The social workload is based on the YCSB Workload B. It creates a read-heavy workload, with 95% read operations and 5% update operations. We use two shapes of this workload, which differ in terms of the request distribution patterns, namely uniform and hotspot distribution. These workloads are executed against the small database scaling size with a data set of 500 GB and against the medium scaling size with a data set of 1 TB.
  • The caching workload is based on the YCSB Workload A. It creates a read-update workload, with 50% read operations and 50% update operations. The workload is executed in two versions, which differ in terms of the request distribution patterns, namely uniform and hotspot distribution. This workload is executed against the small database scaling size with a data set of 500 GB, the medium scaling size with a data set of 1 TB and a large scaling size with a data set of 10 TB.
  • The sensor workload is based on YCSB and its default data model, but with an operation distribution of 90% insert operations and 10% read operations that simulates a real-world Internet of Things (IoT) application. The workload is executed with the latest request distribution pattern. This workload is executed against the small database scaling size with a data set of 250 GB and against the medium scaling size with a data set of 500 GB.
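
The three operation mixes above can be written down compactly. The following Python sketch is an illustration, not the configuration used in the study: it renders each mix using the property names of YCSB's core workload (readproportion, updateproportion, insertproportion, requestdistribution). The helper name ycsb_properties is hypothetical, and all other benchmark settings (record counts, field sizes, hotspot fractions) are deliberately left out because the article does not state them.

```python
# Operation mixes as described in the article (fractions of all operations).
WORKLOADS = {
    "social": {"readproportion": 0.95, "updateproportion": 0.05, "insertproportion": 0.0},
    "caching": {"readproportion": 0.50, "updateproportion": 0.50, "insertproportion": 0.0},
    "sensor": {"readproportion": 0.10, "updateproportion": 0.0, "insertproportion": 0.90},
}


def ycsb_properties(workload: str, distribution: str) -> str:
    """Render one workload mix as YCSB-style key=value lines.

    `distribution` is a YCSB request distribution name such as
    "uniform", "hotspot" or "latest".
    """
    mix = WORKLOADS[workload]
    # The proportions of a mix must add up to 100% of the operations.
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError(f"proportions for {workload!r} do not sum to 1")
    lines = [f"{key}={value}" for key, value in mix.items()]
    lines.append(f"requestdistribution={distribution}")
    return "\n".join(lines)


if __name__ == "__main__":
    for name, dist in [("social", "hotspot"), ("caching", "uniform"), ("sensor", "latest")]:
        print(f"# {name}\n{ycsb_properties(name, dist)}\n")
```

Keeping the mixes in one table makes it easy to check that the only differences between workloads are the operation proportions and the request distribution.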

The following summary sections capture key insights into how MongoDB and ScyllaDB compare across different workloads and database cluster sizes. A detailed description of results for all workloads and configurations is provided in the extended benchmark report.

Performance Comparison Summary: MongoDB vs. ScyllaDB

For the social workload, ScyllaDB outperforms MongoDB with higher throughput and lower latency for all measured configurations.

  • ScyllaDB provides up to 12 times higher throughput.
  • ScyllaDB provides significantly lower update latencies than MongoDB (up to 47 times lower).
  • ScyllaDB provides up to five times lower read latencies.

For the caching workload, ScyllaDB outperforms MongoDB with higher throughput and lower latency for all measured configurations.

  • Even a small three-node ScyllaDB cluster performs better than a large 18-node MongoDB cluster.
  • ScyllaDB provides consistently higher throughput, with an advantage that grows with the data size to up to 20 times.
  • ScyllaDB provides significantly better update latencies than MongoDB (up to 68 times lower).
  • ScyllaDB read latencies are also lower for all scaling sizes and request distributions, by up to 2.8 times.

For the sensor workload, ScyllaDB outperforms MongoDB with higher throughput and lower latency, except for the read latency in the small scaling size.

  • ScyllaDB provides consistently higher throughput, with an advantage that grows with the data size to up to 19 times.
  • ScyllaDB provides lower update latencies than MongoDB (up to 20 times lower).
  • MongoDB provides lower read latency for the small scaling size, but ScyllaDB provides lower read latencies for the medium scaling size.

Scalability Comparison Summary: MongoDB vs. ScyllaDB

For the social workload, ScyllaDB achieves near-linear scalability with a throughput scalability of 386% (of the theoretically possible 400%). MongoDB achieves a scaling factor of 420% (of the theoretically possible 600%) for the uniform distribution and 342% (of the theoretically possible 600%) for the hotspot distribution.

For the caching workload, ScyllaDB achieves near-linear scalability across the tests. MongoDB achieves 340% of the theoretically possible 600%, and 900% of the theoretically possible 2400%.


For the sensor workload, ScyllaDB achieves near-linear scalability with a throughput scalability of 393% of the theoretically possible 400%. MongoDB achieves a throughput scalability factor of 262% out of the theoretically possible 600%.
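
Because the theoretical maximum differs between the two databases (400% versus 600% or 2400%), the scaling factors are easier to compare once each is divided by its theoretical maximum. The Python sketch below does this for the figures quoted in this section. The "efficiency" normalization is an illustration added here, not a metric reported by the study, and the function name scaling_efficiency is hypothetical.

```python
def scaling_efficiency(achieved_pct: float, ideal_pct: float) -> float:
    """Fraction of the theoretically possible throughput scaling that was achieved."""
    return achieved_pct / ideal_pct


# (workload, database, achieved %, theoretically possible %), as quoted in the article.
RESULTS = [
    ("social", "ScyllaDB", 386, 400),
    ("social, uniform", "MongoDB", 420, 600),
    ("social, hotspot", "MongoDB", 342, 600),
    ("caching", "MongoDB", 340, 600),
    ("caching", "MongoDB", 900, 2400),
    ("sensor", "ScyllaDB", 393, 400),
    ("sensor", "MongoDB", 262, 600),
]

if __name__ == "__main__":
    for workload, database, achieved, ideal in RESULTS:
        efficiency = scaling_efficiency(achieved, ideal)
        print(f"{workload:16} {database:9} {achieved:>4}% of {ideal:>4}% = {efficiency:.1%}")
```

Normalized this way, the quoted ScyllaDB figures land above 96% of the ideal, while the quoted MongoDB figures range from 37.5% to 70%.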


Price-Performance Results Summary: MongoDB vs. ScyllaDB

For the social workload, ScyllaDB provides five times more operations per dollar compared to MongoDB Atlas for the small scaling size and 5.7 times more operations per dollar for the medium scaling size. For the hotspot distribution, ScyllaDB provides nine times more operations per dollar for the small scaling size and 12.7 times more for the medium scaling size.

For the caching workload, ScyllaDB provides 12 to 16 times more operations per dollar compared to MongoDB Atlas for the small scaling size, and 18 to 20 times more operations per dollar for the medium and large scaling sizes.

For the sensor workload, ScyllaDB provides 6 to 11 times more operations per dollar compared to MongoDB Atlas. In both the caching and sensor workloads, MongoDB is able to scale the throughput with growing instance/cluster sizes, but the operations per dollar it delivers decrease as the cluster grows.
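
The article does not spell out how operations per dollar is computed. A minimal Python sketch, assuming it is sustained throughput divided by the hourly cluster price, looks as follows. The function names and every number in the example are hypothetical placeholders chosen for easy arithmetic, not measurements or prices from the study.

```python
def ops_per_dollar(throughput_ops_per_s: float, cluster_cost_per_hour: float) -> float:
    """Operations served per dollar, assuming cost is billed per cluster-hour."""
    return throughput_ops_per_s * 3600 / cluster_cost_per_hour


def price_performance_ratio(ops_per_dollar_a: float, ops_per_dollar_b: float) -> float:
    """How many times more operations per dollar system A delivers than system B."""
    return ops_per_dollar_a / ops_per_dollar_b


if __name__ == "__main__":
    # Hypothetical clusters at the same hourly price.
    a = ops_per_dollar(100_000, 10.0)
    b = ops_per_dollar(20_000, 10.0)
    print(f"A delivers {price_performance_ratio(a, b):.1f}x the operations per dollar of B")

    # Hypothetical scale-up: throughput doubles while the price triples,
    # so throughput still grows but operations per dollar fall.
    small = ops_per_dollar(20_000, 10.0)
    medium = ops_per_dollar(40_000, 30.0)
    print(f"small: {small:,.0f} ops/$, medium: {medium:,.0f} ops/$")
```

The second example shows the effect described for MongoDB in the caching and sensor workloads: when throughput grows more slowly than the cluster price, operations per dollar fall even though absolute throughput rises.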

Technical Nugget: Caching Workload, 12-Hour Run

In addition to the default 30-minute benchmark run, we also selected the large scaling size with the uniform distribution for a long-running benchmark of 12 hours.

For MongoDB, we used the previously determined configuration of eight YCSB instances with 100 threads per YCSB instance and ran the caching workload in uniform distribution for 12 hours with a target throughput of 40 kOps/s.

The throughput results show that MongoDB sustains the 40 kOps/s target consistently over time, as expected.

The P99 read latencies over the 12 hours show some peaks that reach 20 ms and 30 ms, and spikes become more frequent after four hours of runtime. On average, the P99 read latency for the 12-hour run is 8.7 ms; for the regular 30-minute run, it is 5.7 ms.

The P99 update latencies over the 12 hours show a spiky pattern over the entire 12 hours, with peak latencies of 400 ms. On average, the P99 update latency for the 12-hour run is 163.8 ms, while for the regular 30-minute run it is 35.7 ms.


For ScyllaDB, we used the previously determined configuration of 16 YCSB instances with 200 threads per YCSB instance and ran the caching workload in uniform distribution for 12 hours with a target throughput of 500 kOps/s.

The throughput results show that ScyllaDB sustains the 500 kOps/s target consistently over time, as expected.
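
To put the two load-generator setups side by side, the Python sketch below derives the total number of client threads and the request rate each YCSB instance and thread has to sustain. It assumes the overall target throughput is split evenly across instances, which the article does not state. The LoadSetup class is a hypothetical helper for the arithmetic only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadSetup:
    database: str
    ycsb_instances: int
    threads_per_instance: int
    target_ops_per_s: int

    @property
    def total_threads(self) -> int:
        return self.ycsb_instances * self.threads_per_instance

    @property
    def target_per_instance(self) -> float:
        # Assumption: the overall target is divided evenly across YCSB instances.
        return self.target_ops_per_s / self.ycsb_instances

    @property
    def target_per_thread(self) -> float:
        return self.target_ops_per_s / self.total_threads


# Values as stated for the 12-hour caching run.
SETUPS = [
    LoadSetup("MongoDB", ycsb_instances=8, threads_per_instance=100, target_ops_per_s=40_000),
    LoadSetup("ScyllaDB", ycsb_instances=16, threads_per_instance=200, target_ops_per_s=500_000),
]

if __name__ == "__main__":
    for s in SETUPS:
        print(
            f"{s.database}: {s.total_threads} threads, "
            f"{s.target_per_instance:,.0f} ops/s per instance, "
            f"{s.target_per_thread:,.2f} ops/s per thread"
        )
```

Under that assumption, the ScyllaDB run uses four times as many client threads as the MongoDB run and asks each thread for roughly three times the request rate.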


The P99 read latencies over the 12 hours stay constantly below 10 ms, except for one peak of 12 ms. On average, the P99 read latency for the 12-hour run is 7.8 ms.

The P99 update latencies over the 12 hours show a stable pattern over the entire 12 hours, with an average P99 latency of 3.9 ms.
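
The averages quoted above are averages of P99 values over the run. One common way to obtain such a figure is to compute the P99 per reporting window and then take the mean of those window values. The Python sketch below assumes that procedure and a nearest-rank percentile definition. The article does not say which window length or percentile method the study used, and the function names are hypothetical.

```python
def p99(samples):
    """99th percentile of a non-empty sequence, using the nearest-rank definition."""
    ordered = sorted(samples)
    # Nearest rank = ceil(0.99 * n), computed with integers to avoid float rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def mean_windowed_p99(latencies_by_window):
    """Average of the per-window P99 values; empty windows are skipped."""
    values = [p99(window) for window in latencies_by_window if window]
    if not values:
        raise ValueError("no latency samples")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Synthetic latencies in milliseconds, for illustration only.
    calm_window = [1.0] * 99 + [5.0]
    spiky_window = [1.0] * 90 + [40.0] * 10
    print(p99(calm_window), p99(spiky_window))
    print(mean_windowed_p99([calm_window, spiky_window]))
```

With this definition a single slow request in a window of 100 does not move the window's P99, but a burst of slow requests does, which is why spiky runs show up clearly in the averaged P99.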


Technical Nugget: Caching Workload, Insert Performance

In addition to the three defined workloads, we also measured the plain insert performance for the small scaling size (500 GB), medium scaling size (1 TB) and large scaling size (10 TB) in MongoDB and ScyllaDB. It needs to be emphasized that batch inserts were enabled for MongoDB but not for ScyllaDB (since YCSB does not support it for ScyllaDB).

The results show that for the small scaling size, the achieved insert throughput is on a comparable level. For the medium scaling size, ScyllaDB achieves three times higher insert throughput. For the large scaling size, MongoDB was not able to ingest the full 10 TB of data due to client-side errors, resulting in only 5 TB of inserted data (for more details, see Throughput Results). Even so, ScyllaDB outperformed MongoDB by a factor of 5.

Technical Nugget: Caching Workload, Client Consistency Performance Impact

In addition to the standard benchmark configurations, we also ran the caching workload in the uniform distribution with weaker consistency settings. Namely, we enabled MongoDB to read from the secondaries (readPreference=secondaryPreferred), and for ScyllaDB we set the read consistency level to ONE.

The results show an expected increase in throughput: 23% for ScyllaDB and 14% for MongoDB. This throughput increase is lower compared to the client consistency impact for the social workload since the caching workload is only a 50% read workload, and only the read performance benefits from the applied weaker read consistency settings. It is also possible to further increase the overall throughput by applying weaker write consistency settings.
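
On the MongoDB side, a read preference is typically passed to clients as a connection string option. The Python sketch below builds such a URI using only the standard library. The host names, database name and replica set name are placeholders, the helper mongo_uri is hypothetical, and the sketch does not reproduce the exact connection settings used in the study.

```python
from urllib.parse import urlencode


def mongo_uri(hosts, database, **options):
    """Build a mongodb:// connection string with the given options."""
    query = urlencode(options)
    uri = f"mongodb://{','.join(hosts)}/{database}"
    return f"{uri}?{query}" if query else uri


if __name__ == "__main__":
    # Placeholder hosts and names; only the readPreference option mirrors the article.
    print(
        mongo_uri(
            ["node1.example:27017", "node2.example:27017"],
            "ycsb",
            replicaSet="rs0",
            readPreference="secondaryPreferred",
        )
    )
```

With secondaryPreferred, reads go to secondaries when one is available and fall back to the primary otherwise, so the primary is relieved of read traffic at the price of possibly stale reads.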

Conclusion: Performance, Costs and Scalability

The complete benchmarking study comprises 133 performance and scalability measurements that compare MongoDB against ScyllaDB. The results show that ScyllaDB outperforms MongoDB for 132 of the 133 measurements.

For all of the applied workloads, namely caching, social and sensor, ScyllaDB provides higher throughput (up to 20 times) and better throughput scalability results compared to MongoDB. Regarding the latency results, ScyllaDB achieves P99 latencies below 10 ms for insert, read and update operations for almost all scenarios. In contrast, MongoDB achieves P99 latencies below 10 ms only for certain read operations, while the insert and update latencies are up to 68 times higher compared to ScyllaDB. These results validate the claim that ScyllaDB’s distributed architecture is able to provide predictable performance at scale (as explained in the benchANT technical comparison).

The scalability results show that both database technologies scale horizontally with growing workloads. However, ScyllaDB achieves nearly linear scalability while MongoDB shows less efficient horizontal scalability. The ScyllaDB results were expected, to a certain degree, given its multiprimary distributed architecture, although near-linear scalability is still an outstanding result. For MongoDB, the less efficient scalability results were also expected because of its different distributed architecture (as explained in the benchANT technical comparison).

When it comes to price performance, the results show a clear advantage for ScyllaDB, with up to 19 times better price-performance ratio depending on the workload and data set size. Therefore, achieving comparable performance to ScyllaDB would require a significantly larger and more expensive MongoDB Atlas cluster.

In summary, this benchmarking study shows that ScyllaDB provides a great solution for applications that operate on data sets in the terabyte range and that require high throughput (over 50 kOps/s) and predictable low latency for read and write operations. This study does not consider the performance impact of advanced data models (such as time series or vectors) or complex operation types (aggregates or scans), which are subject to future benchmark studies. Even for these aspects, the current results show that carrying out an in-depth benchmark before selecting a database technology will help you choose a database that significantly lowers costs and prevents future performance problems.

For complete setup and configuration details, additional results for each workload and a discussion of technical nuggets, see the extended benchmark.
