Google Offers New Licensing and Pricing Options for Its Cloud Platform

MMS Founder
MMS Steef-Jan Wiggers

Article originally posted on InfoQ. Visit InfoQ

Google recently announced a new licensing option called Flex Agreements, allowing customers to migrate their workloads to the cloud with no up-front commitments. As part of this new licensing option, Google Cloud customers still get access to unique incentives (credits, discounts, services).

Previously, customers had to make an upfront commitment, such as signing a considerable multi-year Google Cloud agreement to benefit from certain incentives such as cloud credits or discounts based on monthly spending or committed use. Google’s Committed Use Discounts (CUDs) are an instance of such incentives, offering reduced prices in exchange for committing to utilizing a minimum amount of cloud resources for a specific period.

In addition to Flex Agreements, Google also intends to provide flexibility for its customers to choose features and functionality based on their stages of cloud adoption and the complexity of their business requirements. The company will release new product pricing editions—Standard, Enterprise, and Enterprise Plus—in their cloud portfolio in the upcoming quarters.

Enterprise Plus will grant access to various services, such as compute, storage, networking, and analytics, with advanced features, including high availability, multi-region support, regional failover, and disaster recovery.

Next to the top-tier Enterprise Plus, the Enterprise level provides customers with various features designed to accommodate workloads requiring high scalability, flexibility, and reliability. Finally, the Standard tier includes easy-to-use managed services that come with all essential capabilities and are designed to be cost-efficient. The services, including autoscaling, are optimized to meet the core workload requirements of customers.

Kelly Ducourty, a Vice President of Cloud GTM Strategy and Operations and SMB Sales at Google, stated in a LinkedIn post:

I am excited to announce that this week, we introduced Flex Agreements – a new type of agreement that offers targeted incentives for customers to migrate workloads and grow on Google Cloud, even without a commit. This is an exciting alternative for businesses that aren’t ready to make a multi-year commitment, and I can’t wait to see how our customers benefit!

In addition, Ducourty and Joe Matz, vice president of Business Planning and Pricing, wrote in a Google blog post:

Every organization is on its own unique cloud journey. To help, we’re developing new ways for customers to consume and pay for Google Cloud services. We’re doing this by removing barriers to entry, aligning cost to consumption, and providing contractual and product flexibility.

Lastly, Holger Mueller, principal analyst and vice president at Constellation Research Inc., told InfoQ:

As the distanced #3 in the cloud, Google keeps pushing the big two competitors – second billing, 100% green, Anthos, verticals … All initiatives AWS and Microsoft had to respond. Now Kurian and team target where it matters most in an economy facing headwinds – the budget. New pricing that gives away discounts as you go is innovative and will be welcomed by enterprises. If it works, we will see when AWS and Azure respond.



Building Large-Scale Real-Time JSON Applications

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts


The blog post explains how to build large-scale, real-time JSON applications using Aerospike’s NoSQL database. It discusses data modeling, indexing, and querying techniques, as well as how to optimize performance and scalability.



DataStax clones decentralized blockchains into centralized AstraDB database

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

Cassandra NoSQL database supplier DataStax has announced an Astra Block service to clone and store Ethereum blockchains in its Astra DB cloud database. 

A blockchain is a decentralized distributed ledger recording transactions in a shared, immutable way in a peer-to-peer network with no central authority. Cryptographically protected chain linkages allow ledgers to be updated and viewed. An entire Ethereum blockchain can be cloned and stored in AstraDB, and is then updated in real time as new blocks are mined. DataStax claims this streamlines and transforms the process of building and scaling Web3 applications. It plans to expand this Astra Block service to other blockchains in the future, based on user demand.


Ed Anuff, chief product officer at DataStax, said: “These distributed ledgers open up a whole new world of innovation similar to what we saw with digital content 20 years ago or social media 15 years ago – that is, the possibilities are only limited by our imaginations. Crypto currencies, non-fungible tokens, and smart contracts have drawn a lot of attention, but there are many other areas that will benefit from blockchain innovation: healthcare, real estate, IoT, cybersecurity, music, identity management, and logistics, to name a few.”

DataStax says its new service allows advanced querying and real-time analytics to be run at sub-second speeds, enabling developers to build blockchain-based functionalities into their applications. For example, developers can build applications with the capability to analyze any transaction from the entire blockchain history, including crypto or NFTs, for instant, accurate insights. 
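
To make the querying model concrete, below is a minimal sketch of how an application might read such data with the DataStax Java driver, which supports Astra DB through a secure connect bundle. The keyspace, table, and column names are hypothetical placeholders; DataStax has not published the exact schema Astra Block provisions, so treat this purely as an illustration of the access pattern.

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

import java.nio.file.Paths;

public class AstraBlockQueryExample {
    public static void main(String[] args) {
        // Connect to an Astra DB instance using the secure connect bundle from the Astra console.
        try (CqlSession session = CqlSession.builder()
                .withCloudSecureConnectBundle(Paths.get("secure-connect-mydb.zip"))
                .withAuthCredentials("clientId", "clientSecret")
                .withKeyspace("blockchain")                  // hypothetical keyspace
                .build()) {

            // Hypothetical table of decoded Ethereum transactions, keyed by block number.
            ResultSet rs = session.execute(
                    "SELECT tx_hash, from_address, to_address, value "
                    + "FROM transactions_by_block WHERE block_number = ?",
                    16_500_000L);

            for (Row row : rs) {
                System.out.printf("%s: %s -> %s (%s)%n",
                        row.getString("tx_hash"),
                        row.getString("from_address"),
                        row.getString("to_address"),
                        row.getBigDecimal("value"));
            }
        }
    }
}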

Blockchain computations are compute-intensive, and it can take seconds to access blockchain data. Analyzing and tracking blockchain transactions is difficult, making many use cases untenable – particularly real-time ones. DataStax says that, according to Gartner’s 2022 Hype Cycle for Blockchain and Web3, “By 2024, 25 percent of enterprises will interact with their customers or partners using decentralized Web3 applications.” But developers have struggled to access this data, having to resort to hundreds of API connections, building their own indexers, and manually managing the data infrastructure.

Astra Block removes these problems by, ironically, providing a centralized copy of the decentralized blockchain – thus subverting blockchain’s design philosophy.

Centralized vs decentralized

Peter Greiff, data architect leader at DataStax, said in answer to this point: “Astra Block is not about centralizing blockchain data but addressing some of the dilemmas that developers of Web3 distributed applications, or dApps, have with using those distributed ledgers – accessing that data is hard, slow and expensive.”

Greiff reckons “Astra Block provides a solid base for a hybrid architecture for your dApp, using Astra Block for very low latency reads to access data and then writing info and data back to the blockchain distributed ledger directly. Astra Block manages the cloning of the chain into an operational database for dApp reads to be performed. Then, you’re only writing back the absolute minimum necessary to the chain. Those transactions might be slow and expensive compared to the transactions taking place in Astra Block, but you are still using that distributed approach where you need it.”

So: “DataStax operates blockchain nodes for its customers, and whenever a new block is mined, Astra Block detects that event, processes it, and does all the enrichment. Your Astra account is kept up to date with that data via built-in CDC (change data capture) synchronization. Block is able to use CDC for Astra DB to propagate any further change events to your Astra Block database.”

He asserted that: “You get all the features you would expect from a cloud-managed database, like a multi-tenant system, globally distributed, push button cloud clusters, intelligent auto scaling, and Data API Gateways, and then you can combine that with the fully distributed and trusted approach that blockchain or distributed ledger deployments can offer.”

Customer Elie Hamouche, founder of Clearhash, says: “AstraDB with Astra Block removes much of the complexity that comes with collecting and storing unbounded blockchain datasets to power real-time analytics workloads.”

The free tier of AstraDB offers a 20GB partial blockchain dataset to get started. The paid tier gives developers the ability to clone the entire blockchain, which is then updated in real time as new blocks are mined.

Microsoft and Ankr

Separately, Ankr and Microsoft have partnered to provide a reliable, easy-to-use blockchain node hosting service. The enterprise node deployment service will offer global, low-latency blockchain connections for any Web3 project or developer. Ankr and Microsoft intend to make this service available soon through Microsoft’s Azure marketplace, providing a readily accessible gateway to blockchain infrastructure.

Customers will be able to launch enterprise-grade blockchain nodes with custom specifications for global location, memory, and bandwidth. When it’s launched, customers will be able to optimize data querying for high levels of speed and reliability on their choice of dozens of different blockchains with serverless functionality utilizing GeoIP, failovers, caching rules, and monitoring. They can track the performance of their nodes anytime, anywhere. Enterprise RPC clients can access usage data and advanced telemetry across 38+ blockchains.



SapMachine Vitals Provides Operating System and JVM Statistics

MMS Founder
MMS Johan Janssen

Article originally posted on InfoQ. Visit InfoQ

SapMachine, a downstream distribution of OpenJDK, has introduced a new monitoring tool, SapMachine Vitals, that keeps a condensed history of operating system and JVM statistics. The monitoring feature is activated by default and may be used to retrieve information such as heap usage, metaspace size, container memory statistics and limits, the number of classes loaded and the number of threads spawned.

Vitals is a combination of sysstat operating system tools and Java Virtual Machine (JVM) statistics. The data is retrieved every ten seconds, by default, and buffered for ten days inside the JVM. The buffer contains more detailed information about the last hour. While the overhead of Vitals is relatively low, with about 300 KB of memory consumption and little CPU consumption, it's still possible to disable the feature with the -XX:-EnableVitals flag.

The jcmd VM.vitals command may be used to create the report for a running application. The report is also part of the hs_err_pid.log which the JVM generates as part of an abnormal exit. Lastly, when the JVM is started with the -XX:+PrintVitalsAtExit argument, the Vitals report is printed to stdout before the exit is completed. Alternatively, the -XX:+DumpVitalsAtExit argument may be used to create the following files: sapmachine_vitals_pid.txt and sapmachine_vitals_pid.csv. The names of the files may be customized with the argument -XX:VitalsFile=.
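
As a quick illustration of the flags mentioned above, a SapMachine JVM might be launched as follows; the application JAR and the vitals file name are placeholders:

java -XX:+PrintVitalsAtExit -jar myapp.jar
java -XX:+DumpVitalsAtExit -XX:VitalsFile=myapp_vitals -jar myapp.jar
java -XX:-EnableVitals -jar myapp.jar

The first command prints the Vitals report to stdout on exit, the second dumps it to the given text and CSV files, and the third disables the feature entirely.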

The various suboptions of VM.vitals are displayed with the help argument:

jcmd 11818 help VM.vitals
…
Options: (options must be specified using the <key> or <key>=<value> syntax)
scale : [optional] Memory usage in which to scale. Valid values are: k, m, g 
    (fixed scale) or "dynamic" for a dynamically chosen scale. (STRING, dynamic)
csv : [optional] csv format. (BOOLEAN, false)
no-legend : [optional] Omit legend. (BOOLEAN, false)
reverse : [optional] Reverse printing order. (BOOLEAN, false)
raw : [optional] Print raw values. (BOOLEAN, false)
now : [optional] Sample now values (BOOLEAN, false)

The Vitals report has three subsections: system (including container stats), process stats and JVM stats. Each subsection contains a list of the metrics and their abbreviations, and at the end of the report each metric is listed in a column with the measurements displayed in rows. To improve readability, this article combines the metrics and measurements per subsection from the jcmd 11818 VM.vitals command.

First the list of system and container metrics and their corresponding abbreviations is displayed:

------------system------------
avail: Memory available without swapping [host] [krn]
comm: Committed memory [host]
crt: Committed-to-Commit-Limit ratio (percent) [host]
swap: Swap space used [host]
si: Number of pages swapped in [host] [delta]
so: Number of pages swapped out [host] [delta]
p: Number of processes
t: Number of threads
tr: Number of threads running
tb: Number of threads blocked on disk IO
cpu-us: CPU user time [host]
cpu-sy: CPU system time [host]
cpu-id: CPU idle time [host]
cpu-st: CPU time stolen [host]
cpu-gu: CPU time spent on guest [host]
cgroup-lim: cgroup memory limit [cgrp]
cgroup-slim: cgroup memory soft limit [cgrp]
cgroup-usg: cgroup memory usage [cgrp]
cgroup-kusg: cgroup kernel memory usage (cgroup v1 only) [cgrp]

The measurements related to the system and container metrics:

                  	---------------------------------system----------------------------------
                                                           	-----cpu------ -----cgroup------
                  	avail comm  crt swap si so p   t   tr tb us sy id st gu lim slim usg kusg
2023-02-14 19:51:57   25,7g 11,4g  64   0k  0  0 305 804  1  0  3  1 95  0  0   	0k 64m	 
2023-02-14 19:51:47   25,7g 11,4g  64   0k  0  0 304 802  5  0  5  1 94  0  0   	0k 64m	 
2023-02-14 19:51:37   25,7g 11,3g  64   0k  0  0 304 800  2  0  5  2 92  0  0   	0k 64m    
2023-02-14 19:51:27   25,7g 11,4g  64   0k  0  0 305 800  4  0  5  2 92  0  0   	0k 64m    
2023-02-14 19:51:17   25,7g 11,4g  64   0k  0  0 305 804  2  0  9  2 88  0  0   	0k 64m    
2023-02-14 19:51:07   23,8g 13,4g  76   0k   	 308 915  2  0                  	0k 26m   

The list of process metrics and their corresponding abbreviations is displayed:

-----------process------------
virt: Virtual size
rss-all: Resident set size, total
rss-anon: Resident set size, anonymous memory [krn]
rss-file: Resident set size, file mappings [krn]
rss-shm: Resident set size, shared memory [krn]
swdo: Memory swapped out
cheap-usd: C-Heap, in-use allocations (may be unavailable if RSS > 4G) [glibc]
cheap-free: C-Heap, bytes in free blocks (may be unavailable if RSS > 4G) [glibc]
cpu-us: Process cpu user time
cpu-sy: Process cpu system time
io-of: Number of open files
io-rd: IO bytes read from storage or cache
io-wr: IO bytes written
thr: Number of native threads

The measurements related to the process metrics:

--------------------------process--------------------------
  	-------rss-------  	-cheap-- -cpu- ----io-----    
virt  all anon file shm swdo usd free us sy of rd   wr  thr
10,9g 85m  50m  35m  0k   0k 24m   9m  0  0  6  76k  0k  20
10,9g 85m  50m  35m  0k   0k 24m   9m  0  0  6  76k  0k  20
10,9g 85m  50m  35m  0k   0k 24m   9m  0  0  6  76k  0k  20
10,9g 85m  50m  35m  0k   0k 24m   9m  0  0  6  76k  0k  20
10,9g 85m  50m  35m  0k   0k 24m   9m  1  0  6 724k <1k  20
10,6g 36m  12m  24m  0k   0k 21m   2m    	 5           18

The list of JVM metrics and their corresponding abbreviations is displayed:

-------------jvm--------------
heap-comm: Java Heap Size, committed
heap-used: Java Heap Size, used
meta-comm: Meta Space Size (class+nonclass), committed
meta-used: Meta Space Size (class+nonclass), used
meta-csc: Class Space Size, committed [cs]
meta-csu: Class Space Size, used [cs]
meta-gctr: GC threshold
code: Code cache, committed
nmt-mlc: Memory malloced by hotspot [nmt]
nmt-map: Memory mapped by hotspot [nmt]
nmt-gc: NMT "gc" (GC-overhead, malloc and mmap) [nmt]
nmt-oth: NMT "other" (typically DBB or Unsafe.allocateMemory) [nmt]
nmt-ovh: NMT overhead [nmt]
jthr-num: Number of java threads
jthr-nd: Number of non-demon java threads
jthr-cr: Threads created [delta]
jthr-st: Total reserved size of java thread stacks [nmt] [linux]
cldg-num: Classloader Data
cldg-anon: Anonymous CLD
cls-num: Classes (instance + array)
cls-ld: Class loaded [delta]
cls-uld: Classes unloaded [delta]

The measurements related to the JVM metrics:

----------------------------------jvm----------------------------------
--heap--- ---------meta----------  	--jthr--- --cldg-- -----cls-----
comm used comm used csc  csu gctr code num nd cr num anon num  ld   uld
508m  23m   9m   9m   1m  1m  21m   7m  11  1  0  36   32 2515	0   0
508m  23m   9m   9m   1m  1m  21m   7m  11  1  0  36   32 2515	0   0
508m  23m   9m   9m   1m  1m  21m   7m  11  1  0  36   32 2515	0   0
508m  23m   9m   9m   1m  1m  21m   7m  11  1  0  36   32 2515	0   0
508m  23m   9m   9m   1m  1m  21m   7m  11  1  2  36   32 2515 1841 0
508m   6m 448k 217k 128k  5k  21m   7m  11  1      4    1  674 

The Vitals feature is only available as part of the SapMachine builds of OpenJDK. Thomas Stuefe, JVM engineer at SAP, indicated that the feature was not accepted upstream as its features overlap with Java Flight Recorder. More information can be found on GitHub and in the SapMachine Vitals blog.



Jarviz Delivers Inspection and Insights to JAR Files

MMS Founder
MMS Shaaf Syed

Article originally posted on InfoQ. Visit InfoQ

A new Java JAR inspection and insights tool, called Jarviz, helps developers find different bytecode versions in a JAR, querying it for attributes, services, and more. Sonatype statistics show that there are 517,231 unique artifacts on Maven Central. At the same time, a new version of Java is released by the OpenJDK community every six months, with each LTS release supported for a minimum of two years.

A Java application could use a lot of different libraries, and those libraries could depend on many others. It can be challenging to keep track of all the different artifacts a Java application depends on relative to the various Java versions it can run on. Jarviz sets out to solve this problem by providing a user-friendly CLI to inspect different JAR files and their dependencies.

Executing the following command checks which classes in the JAR file use a particular bytecode version. In this example, the output shows the classes targeting Java 9, which corresponds to bytecode version 53:

$ jarviz bytecode show --gav com.fasterxml.jackson.core:jackson-core:2.14.1 --bytecode-version 53 --details

Versioned classes 9. Bytecode version: 53 total: 1
module-info

Or check all the different bytecode versions that exist in the JAR file:

$ jarviz bytecode show --gav com.fasterxml.jackson.core:jackson-core:2.14.1

Unversioned classes. Bytecode version: 52 total: 160
Versioned classes 9. Bytecode version: 53 total: 1

InfoQ interviewed Jarviz creator Andres Almiray on the occasion of the recent 0.2.0 release to discuss Jarviz and its future.

InfoQ: What was the inspiration to start this project?

Andres Almiray: I write CLI tools and libraries as a hobby. One of the many aspects that I have to take care of is binary compatibility. We know that tools and libraries may bring additional dependencies, which in turn must conform to the binary compatibility rules set in place. Say, for example, that a Maven plugin sets its bytecode baseline to Java 8 (bytecode 52). This means none of its dependencies must exceed that number. If that were to be the case, then consumers of the plugin would be forced to upgrade to the next compatible Java version, and sometimes that cannot happen for various reasons.

Furthermore, classes have the final say no matter what the source or the documentation may state. The build might pull the wrong dependency without knowing, or a shaded dependency may bring invalid classes. There are other ways in which a class may find its way into a JAR and break your bytecode expectations. This is why inspecting JAR files is the only way to be sure.

Jarviz was created with these issues in mind. Once the plumbing mechanism was put in place to inspect a JAR file, it became obvious that additional data could be queried, for example, if a given manifest entry were to be available and extract its value or the names of declarative services (typically found inside /META-INF/services) and their given implementations.

InfoQ: Does the Java community need more tools like Jarviz?

Almiray: I think so, yes. I would love to see a resurgence of command-line tools written in Java and/or targeting the JVM ecosystem. GraalVM Native Image makes it simple to build single platform-specific executables if one wants to run said commands without a JVM or if startup time and memory footprint prove to be performance drivers.

Alternatively, these tools could be bundled with a minimal Java Runtime crafted using jlink. There is no shortage of options for packaging these kinds of tools. The vastness and reach of Java libraries found at Maven Central, paired with the richness of the Java Standard Library, should make it easier to implement a particular use case.

InfoQ: Are there any specific features that you would like to implement in the near future?

Almiray: Yes, specifically handling multi-resource JARs and Java modules. In this regard, recent versions of the Java SDK (Java 17 and onwards) enhance the capabilities provided by its tools, such as the JAR tool. This is fine as long as you have access to a Java 17 SDK, but if you are stuck with older versions, well, then you may be out of luck. Jarviz sets Java 11 as its baseline, thus making it easier for those running older JVMs.

Moreover, reporting at the moment occurs using a plain-text format. It could be enhanced to support additional formats such as JSON, XML, YAML, markdown tables, or others.

InfoQ: What is next for Jarviz?

Almiray: Besides new features, the next step would be offering more options to execute the tool. If you look at the version 0.1.0 release, you’ll notice there are three types of deliverables: universal (plain zip/tar), a tool-provider as a single executable JAR which requires a Java 11 runtime, and standalone, which includes a bundled Java runtime for 8 different platforms. Additional deliverables could be native executables crafted with GraalVM Native Image.

Jarviz also offers more advanced features, such as inspecting or querying the contents of the MANIFEST.MF file based on attributes. In most cases, it is important to see which services the JAR file depends on without needing to unarchive it and search through it.



How Discord Migrated Trillions of Messages to ScyllaDB – The New Stack

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts


Popular social networking service Discord migrated its messages cluster from the open source Cassandra database system to distributed data store ScyllaDB and reduced latencies from 200 milliseconds to 5 milliseconds.

The ScyllaDB Summit 2023 opened last Wednesday with a talk, “How Discord Migrated Trillions of Messages from Cassandra to ScyllaDB,” given by Discord Senior Software Engineer Bo Ingram. It’s not a question per se, but the answer is a lot of work. There was creative engineering in the software and hardware that brought Discord from Cassandra’s hot partitions to ScyllaDB’s consistent latencies.

In order to improve efficiency and reduce latencies, Discord engineers built a data service library to streamline queries going to and from the database. This would limit the number of requests putting stress on the database infrastructure.

They also created a “Superdisk” made of persistent disks and NVMe SSDs to optimize their hardware for both speed and efficiency. Lastly, modifications were made to ScyllaDB to increase compatibility with the Superdisk. The end result is consistent latencies in a highly trafficked cluster.

Cassandra was working to a point, but the bigger Discord grew, the more difficult it became to support the NoSQL store. Discord experienced cascading latencies caused by Cassandra’s hot partitions when high traffic was driven to a single partition. Garbage collection was another issue. “We really don’t like garbage collection,” Ingram confirmed.

ScyllaDB caught the attention of Ingram and his team. “We were looking kind of jealously at ScyllaDB. We were very curious,” he said. ScyllaDB is written in C++, meaning there is no garbage collection which, to Ingram, “sounds like a dream.”

In 2022, Discord migrated most of its data to ScyllaDB. The core messaging database was the holdout because “we didn’t want to learn all our lessons with our messages database,” Ingram said. After gaining more experience with ScyllaDB and learning how best to optimize it, the messages would move over too.

Here are some of the components:

The Data Service Library

Building the data service library was the first step in simplifying the messages workflow. Written in Rust, it sits between the API and Cassandra and communicates via gRPC. The data service library protects the database, and its main purpose is request coalescing.

Request coalescing reduces multiple requests for the same message into a single database query by using a worker thread. When a giant server with tons of users makes a big announcement that pings tons of people who then open their apps/computers to check out the details, a worker thread is spun up in response to the requests that come in. The worker thread queries the database once and returns the message to all subscribed requests.
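
Discord’s data service library is written in Rust, but the coalescing pattern itself is language-agnostic. The following is a minimal Java sketch of the idea, not Discord’s actual implementation: concurrent requests for the same key share a single in-flight database query. A caller would construct it with the real asynchronous lookup (for example a gRPC or database call), and every concurrent get for the same message ID then resolves from the one shared future.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal sketch of request coalescing: concurrent callers asking for the same key
// share one in-flight lookup instead of each issuing its own database query.
public class RequestCoalescer<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, CompletableFuture<V>> loader; // the real (asynchronous) database query

    public RequestCoalescer(Function<K, CompletableFuture<V>> loader) {
        this.loader = loader;
    }

    public CompletableFuture<V> get(K key) {
        CompletableFuture<V> existing = inFlight.get(key);
        if (existing != null) {
            return existing;                      // join the query already in flight
        }
        CompletableFuture<V> created = new CompletableFuture<>();
        existing = inFlight.putIfAbsent(key, created);
        if (existing != null) {
            return existing;                      // another caller won the race
        }
        loader.apply(key).whenComplete((value, error) -> {
            inFlight.remove(key);                 // later requests will trigger a fresh query
            if (error != null) {
                created.completeExceptionally(error);
            } else {
                created.complete(value);
            }
        });
        return created;
    }
}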

The upstream of request coalescing is consistent routing, which is what makes request coalescing possible. Discord routes on the channel ID, but in reality any routing key could be used. All requests that go to a specific channel go via the same message instance. So in the big message scenario: same message, same channel. Same instance equals one query.

Ingram credits Rust with making this happen. Rust, being the performant language that it is, “lets us handle this use case without breaking a sweat. It’s awesome,” he said.

The Superdisk

The disk contenders: the faster NVMe SSDs with a higher risk of quorum loss and downtime vs. slower persistent disks, which could reignite cascading latencies. Neither was a good solution alone, but together they make the Superdisk!

It started with a 1.5TB persistent disk alongside 1.5TB of local NVMe SSDs striped together as RAID0. The RAID0 provided a large logical volume equivalent in size to the persistent disk. Discord then took the logical disk (RAID0) and combined it with the persistent disk via a RAID1 array. The RAID1 provided mirroring functionality so data on the RAID0 local SSDs matched the data on the persistent disk.
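
The article does not say which tooling Discord used to build this, but on Linux the same layering could be sketched with the md software RAID stack via mdadm; the device names below are placeholders:

# Stripe the two local NVMe SSDs into one logical volume (RAID0).
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1

# Mirror the RAID0 volume with the persistent disk (RAID1), marking the
# persistent disk write-mostly so reads are preferentially served by the SSDs.
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/md0 --write-mostly /dev/sdb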

The Superdisk in Action

The persistent disks are marked as write-mostly and reads go to the RAID1 array. Discord predominantly serves reads from the RAID0 SSDs.

In the event of a host error, it’s safe to assume the loss of the local NVMe SSDs, but that’s OK now because disk mirroring provides a copy. RAID will execute a recovery operation to bring the SSDs back to parity with the persistent disks once they’re back up and running.

It didn’t work perfectly out of the box. Discord worked with Scylla to implement Duplex IO, which allowed them to split reads and writes onto their own channels. The write intent bitmap was turned off.

Time for the Migration

The migration plan for the cassandra-messages database was straightforward — use ScyllaDB for recent data and migrate historical data behind it. Discord tuned ScyllaDB’s Spark migrator and got ready to migrate. By this time (May 2022), Ingram had had enough of what he called Cassandra “firefighting,” so when the ETA for the cassandra-messages migration turned out to be three months, he decided it was three months too long.

Faster meant a rewrite of ScyllaDB’s data migrator in Rust. The rewrite took Ingram and two colleagues about a day, and it was faster! The updated data migrator sent 3.2 million records per second. Three months was reduced to just nine days.

Data validation was performed once the data was sent over, with random reads sampled to make sure the data was consistent and correct.

Almost One Year Later, No Regrets

ScyllaDB is more efficient, and because of its higher storage density it uses half the nodes of the Cassandra cluster. ScyllaDB’s better disk efficiency is measured in a 53% reduction in disk utilization. Latencies in Cassandra’s message cluster fell somewhere in the wide range of 5 to 500 milliseconds. With ScyllaDB, Ingram is sure “it’s five milliseconds on the dot. No microseconds… I’m going to be specific here.”



Stanford Researchers Develop Brain-Computer Interface for Speech Synthesis

MMS Founder
MMS Anthony Alford

Article originally posted on InfoQ. Visit InfoQ

Researchers from Stanford University have developed a brain-computer interface (BCI) for synthesizing speech from signals captured in a patient’s brain and processed by a recurrent neural network (RNN). The prototype system can decode speech at 62 words per minute, 3.4x faster than previous BCI methods.

The system was described in a paper published on bioRxiv. Working with a patient who lost speech ability due to amyotrophic lateral sclerosis (ALS), the team used microelectrodes implanted in the patient’s brain to capture neural activity signals generated when the patient attempted to speak. These signals were passed to an RNN, specifically a gated recurrent unit (GRU) model, which was trained to decode the neural signals into phonemes for speech synthesis. When trained on a limited vocabulary of 50 words, the system achieved a 9.1% error rate, and a 23.8% error rate on a 125k word vocabulary. According to the researchers:

[We] demonstrated a speech BCI that can decode unconstrained sentences from a large vocabulary at a speed of 62 words per minute, the first time that a BCI has far exceeded the communication rates that alternative technologies can provide for people with paralysis…Our demonstration is a proof of concept that decoding attempted speaking movements from intracortical recordings is a promising approach, but it is not yet a complete, clinically viable system.

The use of deep learning models to interpret human brain activity is an active research area, and InfoQ has covered several BCI projects involving assistive devices. Many of these use sensors that are implanted in a patient’s brain, because these provide the best signal quality; in 2019 InfoQ covered a system developed by Meta which uses such signals to allow users to “type” by imagining themselves speaking. InfoQ has also covered systems that use external or “wearable” sensors, such as the one developed by Georgia Tech in 2021, which allows users to control a video game by imagining activity.

The Stanford system uses four microelectrode arrays implanted in the patient’s ventral premotor cortex and Broca’s area. To collect data for training the RNN, the patient was given a few hundred sentences each day which she “mouthed,” or pantomimed speaking, generating neural signals that were captured by the microelectrodes. Overall, the team collected 10,850 sentences. Using “custom machine learning methods” from the speech recognition domain, the researchers trained the RNN to output a sequence of phonemes.

To evaluate the system, the team had the patient mouth sentences that were never used in training; the test sentences included some using only the 50 word vocabulary as well as the 125k one. The researchers also experimented with adding a language model to the decoder, which improved error rate from 23.8% to 17.4%, and with reducing the time between training and testing the RNN, to eliminate the day-to-day changes in neural activity. Their conclusion was that the system could see “substantial gains in performance” with further work on language modeling and more robust decoding techniques.

Lead researcher Frank Willett posted about the work on Twitter and answered several questions. In response to a question about whether the RNN predicted the next word that would be spoken, Willett replied:

No next word prediction – the language model simply outputs the best explanation of all RNN outputs produced so far.

Willett also said that the team would publish their code and data after the work is “published in a peer-reviewed journal.”



MicroStream Becomes Eclipse Foundation Member

MMS Founder
MMS Johan Janssen

Article originally posted on InfoQ. Visit InfoQ

MicroStream, a Java object-graph persistence framework, has announced its participation in the Eclipse Foundation as a Committer Member. MicroStream offers Micro Persistence, allowing for low-latency and in-memory data processing. Java objects and documents can be stored in various storage solutions such as AWS S3, Hazelcast, Kafka, MongoDB, Redis and various SQL databases.
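
As a rough illustration of the programming model (not taken from the announcement), a MicroStream application persists a plain Java object graph through an embedded storage manager. The storage directory and root type below are arbitrary, and the package names correspond to recent MicroStream releases:

import one.microstream.storage.embedded.types.EmbeddedStorage;
import one.microstream.storage.embedded.types.EmbeddedStorageManager;

import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class MicroStreamExample {
    public static void main(String[] args) {
        // Start an embedded storage manager backed by a local directory; other targets
        // (AWS S3, Kafka, MongoDB, Redis, SQL databases) are configured via connectors.
        EmbeddedStorageManager storage = EmbeddedStorage.start(Paths.get("data"));
        try {
            @SuppressWarnings("unchecked")
            List<String> root = (List<String>) storage.root();
            if (root == null) {
                root = new ArrayList<>();
                storage.setRoot(root);
                storage.storeRoot();   // persist the new root of the object graph
            }
            root.add("hello");
            storage.store(root);       // persist the modified object
        } finally {
            storage.shutdown();
        }
    }
}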

The Eclipse Foundation offers four membership levels: strategic, contributing, associate and committer. Committer members are developers of the Eclipse projects and are allowed to commit changes to the project source code. Committer members are also represented on the board of directors and may stand for election as a representative in the first quarter of each year.

Over the past twenty years, The Eclipse Foundation has created and maintained hundreds of open source and open specification projects and collaborations. Traditionally, when organizations collaborate, they often create a new association. However, The Eclipse Foundation Working Groups offer members a “Foundation in a Box” designed to speed up and increase the success of the collaboration. Members of the Eclipse Foundation have an active role in the Eclipse projects and Working Groups. The Eclipse Foundation lists five reasons why they believe a working group is a better alternative to an association: reduced risk due to a proven governance and legal framework, improved time to market, vendor neutrality, no boundaries for collaboration and increased ecosystem health and visibility due to the established brand and infrastructure. Examples of some of the current working groups include Adoptium, the Eclipse IDE, Jakarta EE and MicroProfile.

MicroStream has been very active in the Java community as they have recently joined the Micronaut Foundation, a non-profit organization that aims to evangelize and define the future of the Micronaut framework, as a Silver Sponsor. MicroStream is integrated via the Micronaut MicroStream module, but they also provide integrations with Helidon, Spring Boot and CDI. These close collaborations simplify the integration of MicroStream in new and existing applications.



Podcast: Colin McCabe Updates on Apache Kafka KRaft Mode

MMS Founder
MMS Colin McCabe

Article originally posted on InfoQ. Visit InfoQ


Introduction

Wes Reisz: In October of 2020, we did a podcast with Colin McCabe and Jason Gustafson about KIP-500, the removal of the ZooKeeper dependency in Kafka. Today, we catch up with Colin on how the project is going, and what’s planned for the current release of Kafka.

Hi, my name is Wes Reisz and I’m a technical principal with Thoughtworks and co-host of the InfoQ Podcast. I also have the privilege of chairing the San Francisco edition of the QCon software conference held in October – November of each year. If you enjoy this podcast and are looking to meet and discuss topics like today’s directly with engineers working on these types of problems, just like Colin, the next QCon will be held in London, March 27th to 28th. There you can meet and hear from software leaders, like Leslie Miley, Radia Perlman, Crystal Hirschorn, Justin Cormack, Gunnar Morling, on topics ranging from computer networks, web assembly, team topologies, and of course, the meat and potatoes of every single QCon, modern software architectures. If you’re able to attend, please stop me and say hi.

As I mentioned on today’s podcast, I’m speaking with Colin McCabe. Colin is a principal engineer at Confluent, where his focus is on performance and scalability of Apache Kafka. Most recently, he’s been center stage with the implementation of KRaft. KRaft is Kafka plus Raft, and it’s where Kafka removes its dependency on ZooKeeper by implementing a new quorum controller service based on the Raft protocol. Today, we catch up with the progress of the project, so we talk a lot about community reception, we talk about the upgrade path for shops that are currently on ZooKeeper and will be moving to a KRaft version, what that looks like and what you need to do. We talk about the overall plan for deprecation, and we talk about lessons learned on the implementation of KRaft.

As always, thank you for joining us for another edition of the InfoQ Podcast. Colin, welcome to the InfoQ Podcast.

Colin McCabe: Thanks. It’s great to be here.

Wes Reisz: I think last time we spoke was maybe a year ago, maybe a little bit over a year ago, but it was pretty much when KIP-500 was first accepted, which is KRaft mode. It’s basically replacing ZooKeeper by putting Kafka with Raft, KRaft mode, into Kafka, so a self-managed metadata quorum. I think that was available with 3.3.1. How’s the reception been with the community?

Colin McCabe: Yeah. People have been tremendously excited about seeing this as production ready. We’ve been doing some testing ourselves. We’ve heard of some people doing testing. It takes a while to get things into production, so we don’t have a huge number of production installations now. One of the most exciting things is starting to bring this to production and starting to get our first customers, so that’s really exciting.

What do you see as the distribution around versions of Kafka?

Wes Reisz: What do you see as the distribution of people on versions with Kafka? Do you see people in a certain area or do they tend to stay up-to-date?

Colin McCabe: Yeah, that’s a good question. I don’t have ultra precise numbers, and you’re right. I’m not sure if I’d be allowed to share them if I did. I will say that in general, we have sort of two sets of customers right now. We have our cloud customers and we have our on-premise customers. Our on-premise customers tend to be a few versions behind. Not all of them, but many of them are. In the cloud, we’re very up-to-date. So, there’s a little bit of the divide there, and I don’t know the exact distribution. One of the interesting things about the last few years, of course, has been so many more things are moving to cloud, and of course Kafka is huge in the cloud now, so we’re starting to see people get a lot more up-to-date just because of that.

What is KRaft mode?

Wes Reisz: Let’s back up for a second. What is KRaft mode? It improves partition scalability, it improves resiliency, and it simplifies deployments with Kafka in general. What is KRaft? What was the reason behind going to KRaft?

Colin McCabe: Basically, KRaft is the new architecture for Kafka metadata. There’s a bunch of reasons which you touched on just now for doing this. The most surface level one is, okay, now I have one service. I don’t have two services to manage. I have one code base, I have one configuration language, all that stuff. But I think the more interesting reason, as a designer, is this idea that we can scale the metadata layer by changing how we do it, and we’re changing from a paradigm where ZooKeeper was a multi writer system and basically you never really knew if your metadata was up-to-date. So, every operation you did had to kind of be a compare and swap, sort of, well, if this, then that. We ended up doing a lot of round trips between the controller and ZooKeeper, and of course all of that degrades performance, it makes it very difficult to do certain optimizations.

With KRaft, we have a much better design in the sense that we believe we can push a lot more bandwidth through the control plane, and we also have a lot more confidence in what we do push, because we have this absolute ordering of all the events that happen. So, the metadata log is a log, it has an order, so you know where you are. You didn’t know that in ZooKeeper mode. You got some metadata from the controller and you knew a few things about it, like you knew when you got it, you knew the controller epoch, some stuff like that. But you didn’t really know if what you had was up-to-date. You couldn’t really diff where you were versus where you are now.

In production, we see a lot of issues where people’s metadata gets out of sync, and there’s just no really good way to trace that, because in ZooKeeper, we’re in the land of just like I got an RPC and I changed my cache. Okay, but I mean, we can’t say for sure that it arrived in the same order on every broker. Basically, I would condense this into a few bullet points. This manageability thing like, hey, one service, not two. Scalability, and also just the ability to, what I would call, safety. The safety of the design is greater.

What is the architectural change that happens to Kafka with KRaft mode?

Wes Reisz: If you’re using ZooKeeper today, what is the architectural change that will be present with KRaft?

Colin McCabe: Yeah. If you’re using ZooKeeper today, your metadata will be stored in ZooKeeper. For example, if you create a topic, there will be znodes in ZooKeeper that talk about that topic, which replicas are in sync, what is the configuration of this topic. When you move to KRaft mode, instead of being stored in ZooKeeper, that metadata will be stored in a record or series of records that is in the metadata log. Of course, that metadata log is managed by Kafka itself.
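
For context, a dedicated KRaft controller is configured through a handful of server properties; the following sketch shows typical settings, with node IDs, hostnames, ports, and paths as placeholders:

process.roles=controller
node.id=1
controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
listeners=CONTROLLER://controller1:9093
controller.listener.names=CONTROLLER
log.dirs=/var/lib/kafka/kraft-metadata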

Wes Reisz: Understood. Nice. You mentioned a little bit about scalability. Are there any performance metrics, numbers you can talk to?

Colin McCabe: Yeah. We’ve done a bunch of benchmarks, and we’ve gotten some great improvements in controller shutdown, and we’ve gotten some great improvements in number of partitions, and we are optimistic that we can hopefully 10X the number of partitions we can get at least.

Wes Reisz: 10X. Wow.

Colin McCabe: Yeah. I think our approach to this is going to be incremental and based on numbers and benchmarking. Right now, we’re in a mode where we’re getting all the features there, we’re getting everything stable, but once we shift into performance mode, then I think we’ll really see huge gains there.

What does the upgrade path look like for moving to KRaft?

Wes Reisz: You mentioned just a little while ago about moving to KRaft mode from ZooKeeper. What does that upgrade path look like?

Colin McCabe: Yeah, great question. The first thing you have to do is you have to get your cluster on a so-called bridge release, and we call them that because these bridge releases can act as a bridge between the ZooKeeper versions and the KRaft versions. This is a little bit of a new concept in Kafka. Previously, Kafka supported upgrading from any release to any other release, so it was full upgrade support. Now with KRaft, we have actually limited the set of releases from which you can upgrade to KRaft. There’s some technical reasons why that limitation needs to be in place.

But anyway, once you’re on this release and once you’ve configured these brokers so that they’re ready to upgrade, you can then add a new controller quorum to the cluster, then that will take over as the controller. At that point, you’ll be running in this hybrid mode where your controllers are upgraded but not all of your brokers are upgraded. Your goal in this mode is probably to roll all of the brokers so that you’re now in a full KRaft mode. And then, when you’re in full KRaft mode, everybody’s running just as they would in KRaft mode, but with one difference, which is that you’re still writing data to ZooKeeper. We call this dual write mode, because we’re writing the metadata to the metadata log, but also writing it to ZooKeeper.

Dual write mode exists for two reasons. One of them is there’s a technical reason why while we’re doing the rolling upgrade, you need to keep updating ZooKeeper so that these old brokers can continue to function while the cluster’s being rolled. The second reason is it gives you the ability to go back if something goes wrong, you still have your metadata in ZooKeeper, so you can then go back and you’re like, all right, we’re going to go back from the upgrade and you have your metadata. Of course, we hope not to have to do that, but if we do, then we will.

Wes Reisz: What I heard though is it’s an in place upgrade so you don’t have to bring it offline or anything to be able to do your upgrade?

Colin McCabe: Yeah, absolutely. It’s a rolling upgrade and it doesn’t involve taking any downtime. This is very important to us, of course.

Wes Reisz: Yeah, absolutely. That’s awesome. Back to the bridge release for a second. The bridge release is just specifically to support ZooKeeper to KRaft. It isn’t a new philosophy that you’re going to have bridge releases for upgrades going forward, you’re still going to maintain being able to upgrade to any point, is that still the plan?

Colin McCabe: Yeah. I mean, I think that going forward, we’ll try to maintain as much upgrade compatibility as we can. This particular case was a special case, and I don’t think we want to do that in the future if we can avoid it.

Wes Reisz: If I’m right, I believe we’re recording this right around the time of the code freeze for 3.4. What’s the focus of 3.4? Is that going to be that upgrade path you were talking about?

Colin McCabe: The really exciting thing to me obviously is this upgrade path, and this is something users have been asking for a long time. Developers, we’ve wanted it, and so it’s huge for us I think. Our goal for this release is to get something out there and get it into this pre-release form. There may be a few rough edges, but we will have this as a thing we can test and iterate on, and that’s very important, I think.

Any surprises on the implementation of KRaft?

Wes Reisz: Kind of going back to the implementation of Raft with Kafka. Anything surprising during that implementation that kind of comes to mind or any lessons that you might want to share?

Colin McCabe: Yeah, that’s a great question. There’s a few things that I thought were interesting during this development, and one of them is we actually got a chance to do some formal verification of the protocol, which is something I hadn’t really worked on before, so I thought that was really cool. The ability to verify the protocol with TLA+ was really interesting.

Wes Reisz: Just to make sure we level set, what does formal verification mean when it comes to something like Raft?

Colin McCabe: TLA+ is a form of model checking. You create this model of what the protocol is doing, and then you have a tool to actually check that it’s meeting all of your invariance, all your assumptions. It sort of has its own proof language to do this. I think TLA+ has been used to verify a bunch of stuff. Some people use it to verify their cloud protocols. Some of the big cloud providers have done that. I thought it was really cool to see us doing that for our own replication protocol. To me, that was one of the cool things.

Another thing that I thought was really interesting during this development process was I learned the importance of deploying things often. So, we have a bunch of test clusters internally, soak clusters, and one thing that I learned is that development just proceeded so much more smoothly if I could get these clusters upgraded every week and just keep triaging what came through every week.

When you have a big project, I guess you want to attack it on multiple fronts. The testing problem, I mean. You want to have as many tests as you can that are small and self-contained, as many unit tests as possible. You want to have as much soak testing going on as you can. You want to have as many people trying it. So, I really thought that I learned a few things about how to keep a big project moving that way, and it’s been really interesting to see that play out. It’s also been interesting that the places where we spend our time have not exactly been where we thought. We thought, for example, we might spend a lot of time on Raft itself, but actually that part was done relatively quickly, relative to the whole process. The initial integration was faster than I thought, but then it took a bit longer than I thought to work out all the corner cases. I guess maybe that wasn’t so unexpected after all.

What languages are used to implement Kafka and how did you actually implement KRaft?

Wes Reisz: Kafka itself is built in Scala now, right? Is that correct?

Colin McCabe: Well, Kafka has two main languages that we use. One of them is Java and the other is Scala, and the core of Kafka is Scala, but a lot of the new code we’re writing is Java.

Wes Reisz: Did you leverage libraries for implementing Raft? How did you go about the implementation with Raft?

Colin McCabe: Well, the implementation of Raft is our own implementation, and there’s a few reasons why we decided to go this path. One of them is that Kafka is first and foremost a system for managing logs. If Kafka outsources its logs to another system, it’s like, well, maybe we should just use that other system. Secondly, we wanted the level of control and the ability to inspect what’s going on that we get from our own implementation. I guess thirdly is we had a lot of the tools laying around. For example, we have a log layer, we have fetch requests, we have produce requests, so actually we were able to reuse a very large amount of that code.

What are some of the implications of Raft with multi-cluster architectures?

Wes Reisz: KRaft mode changes how leaders are elected in the cluster and how the metadata is stored. What does it mean, or does it mean anything, for multi-cluster architectures?

Colin McCabe: Yeah, that’s a good question. I think when a lot of people ask about multi cluster architectures, they’re really asking, “I have N data centers. Now, where should I put things?” This is an absolutely great question to ask. One really typical setup is, hey, I have everything in one data center. Well, that one’s easy to answer. Now, if you have three data centers, then I would tell you, well, you put one controller in each data center and then you split your brokers fairly evenly.

This is really no different than what we did with ZooKeeper. With ZooKeeper, we would have one ZooKeeper node in each cluster, therefore if a cluster goes down, you still have your majority. Where I think it generally gets interesting, if interesting is the right word, is when you have two data centers. A lot of people find themselves in that situation because maybe their company has invested in two really big data centers and now they’re trying to roll out Kafka as a new technology and they want to know what to do. It’s a really good question. The challenge, of course, is that the Raft protocol is all majority based. When you have only two data centers, if you lose one of those data centers, you have a chance of losing the majority, because if you had two nodes in one and one node in another, well then the data center with two nodes is a single point of failure.

People generally find workarounds for this kind of problem, and this is what we saw with ZooKeeper. People would, for example, they would have multiple data centers but they would have some way of manually reestablishing the quorum when they lost the data center that had their majority. There’s a bunch of different ways to do that. The one that I’ve always liked is putting different weights on different nodes. The Raft protocol actually does not require that every node have the same weight. You could have nodes that have different weights, so that gives you the flexibility to change the weight of a node dynamically, and then the node can then say, okay, well now, I’m the majority. Even though I’m a single node, I’m the majority. That allows you to get back up and running after a failure.

There’s also other customers we’ve seen who actually, they have two big data centers but they want to have a third place for things to be. So, they hire out a cloud provider and they’re like, “Well, we only have this one ZooKeeper node in the cloud. Everything else is on premise, but this is what we’ve got.” That can be a valid approach. It kind of depends on the reasons why they’re on premise. If the reason why you’re on premise is regulatory, this data can’t leave this building, then obviously that’s not going to work. But if the reason you’re on premise is more like, “Well, we looked at the cost structure, this is what we like,” then I think that sort of approach could work.

Wes Reisz: In that three data center architecture that you talked about where you put a controller maybe in each data center, what are you seeing with constraints around latency, particularly with KRaft mode? Are there any considerations that you need to be thinking about?

Colin McCabe: Right now, most of our testing has been in the cloud, so we haven’t seen huge latencies between the regions, because typically we run a multi-region setup when we’re doing this testing. It’s kind of hard for me to say what the effect of huge latency would be. I do think if you had it, you would then see metadata operations take longer, of course. I don’t know that it would be a showstopper. It kind of depends on what the latency is, and it also depends on what your personal pain threshold is for latency.

We’ve seen very widely varying numbers on that. We had a few people who were like, “Man, I can’t let the latency go over five milliseconds roundtrip,” which is pretty low for a system, one of the systems we’re operating. Other people are like, “Well, it’s a few seconds and okay.” And we’ve seen everything in between. So, I think if you really want low latency and you own the hardware, then you know what to do. If you don’t own the hardware, then I think the question becomes a little more interesting. Then, we can talk about what are the workarounds there.

Are there any other architectural considerations to think about with KRaft?

Wes Reisz: What about on a more general level, are there any specific architectural considerations that you should consider with KRaft mode that we haven’t already spoke about?

Colin McCabe: I think from the point of view of a user, we really want it to be transparent. We don’t want it to be a huge change. We’ve bent over backwards to make that happen. All the admin tools should continue to work. All the stuff that we supported we’re going to continue to support. The exception I think is there’s a few third party tools. Some of them are open source, I think, that would just do stuff with ZooKeeper, and obviously that won’t continue to work when ZooKeeper’s gone, but I think those tools hopefully will migrate over. A lot of those tools already had problems with versions, they would have sort of an unhealthy level of familiarity with the internals of Kafka. But if you’re using any supported tool, then we really want to support you. We really want to keep everything going.

Wes Reisz: That was a question I wanted to ask before I forgot: we're talking about KRaft and we're talking about removing ZooKeeper, but as someone who's using Kafka, there's no change to the surface. It still feels and operates like Kafka always has. And as you just said, all the tools, for the most part, will continue to work unless you're trying to deal directly with ZooKeeper.

Colin McCabe: Yeah, absolutely. If you look at a service like Confluent Cloud, there really isn't any difference from the user's point of view. We already blocked direct access to ZooKeeper. We had to, because if you have direct access to ZooKeeper, you can do a lot of things that break the security paradigm of the system. Over time, Kafka has been moving towards hiding ZooKeeper from users, and that's true even in an on-premise context, because if you have the ability to modify znodes, Kafka's brain is right there and you could do something bad. So, we generally put these things behind APIs, which is where I think they should be. I mean, APIs that have good stability guarantees, that have good supportability guarantees, are really what our users want.
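As a hedged illustration of what "behind APIs" means in practice (not something from the conversation itself), the sketch below inspects the cluster through Kafka's public Admin client rather than by reading znodes; the bootstrap address is a placeholder.

    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.DescribeClusterResult;

    public class DescribeClusterExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Placeholder bootstrap address; point this at your own brokers.
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // Cluster metadata is read through the supported, access-controlled
                // Admin API rather than by touching ZooKeeper znodes directly.
                DescribeClusterResult cluster = admin.describeCluster();
                System.out.println("Cluster ID: " + cluster.clusterId().get());
                System.out.println("Nodes: " + cluster.nodes().get());
            }
        }
    }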

Wes Reisz: That’s some of the reason for the bridge release was to move some of those things to a point where you didn’t have direct access to ZooKeeper.

Colin McCabe: The bridge release is more about the internal API of Kafka, if you want to call it that. The brokers can now communicate with a new controller just as well as they could communicate with the old controller. In previous releases, we were dealing with external APIs. There used to be znodes that people would poke in order to start a partition reassignment. Obviously, that kind of thing is not going to be supported when we're in KRaft mode. Over time, we phased out those old znode-based APIs, and that happened even before the bridge release.
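For comparison with the old znode-based approach, here is a minimal sketch of a partition reassignment driven through the public Admin API; the topic name, partition number, and target broker IDs are made up for illustration.

    import java.util.List;
    import java.util.Map;
    import java.util.Optional;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewPartitionReassignment;
    import org.apache.kafka.common.TopicPartition;

    public class ReassignPartitionExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Placeholder bootstrap address for illustration only.
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // Hypothetical topic, partition, and target replica set (broker IDs 1, 2, 3).
                TopicPartition partition = new TopicPartition("example-topic", 0);
                NewPartitionReassignment target = new NewPartitionReassignment(List.of(1, 2, 3));
                // The reassignment goes through a supported API instead of a znode poke.
                admin.alterPartitionReassignments(Map.of(partition, Optional.of(target)))
                     .all()
                     .get();
            }
        }
    }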

What does the future roadmap look like for Kafka?

Wes Reisz: Okay. The last question I have is about future roadmap. What are you looking at beyond 3.4?

Colin McCabe: We’ve talked a little bit about this in some of our upstream KIPs. KIP-833 talks about this a bit. The big milestone to me is probably kafka 4.0 where we hope to remove ZooKeeper entirely, and this will allow us to focus all of our efforts on just KRaft development and not have to implement features twice or have older stuff we’re supporting. So, I think that’ll be a really red letter day when we can do that.

In the shorter term, obviously the upgrade from ZooKeeper is the really cool new thing that we're doing. It's not production ready yet, but it will be soon. Hopefully in 3.5; I don't think we've agreed on that officially, but I would really like to make it production ready then. There's a bunch of features that we need to implement, like JBOD. We want to implement delegation tokens, we want to implement a few other things, and once we've gotten to full feature parity with ZooKeeper mode, we can then deprecate ZooKeeper mode.

Wes Reisz: I’m curious about that. What does deprecation mean for kafka? Does it mean it’s marked eligible for deletion in some point in the future, or is there a specific timeline? What does deprecation mean?

Colin McCabe: Deprecation generally means that when you deprecate a feature, you are giving notice that it's going to go away. In Kafka, we do not remove features until they've been deprecated in the previous major release. In order for us to remove ZooKeeper in 4.0, it's necessary for us to officially deprecate ZooKeeper in a 3.X release. We originally wanted to do it in 3.4. That's not going to happen because of the aforementioned gaps. But as soon as those gaps are closed, then I think we really do want to deprecate and basically give notice that, hey, the future is KRaft.

Wes Reisz: Well, Colin, I appreciate you taking the time to catch us up on all things happening around KRaft mode and the removal of ZooKeeper from Kafka.

Colin McCabe: It’s been such an interesting project because we have this vision of where we want to be with Kafka, and I think we’re so close to getting there, so that’s what’s exciting to me.

Wes Reisz: That’s awesome. We appreciate you giving us kind of a front row seat and taking us along with you on your journey. Once again, Colin, thanks for joining us on the InfoQ Podcast.

Colin McCabe: Thank you very much, Wesley. It’s always a pleasure to speak with you.

About the Author

From this page you also have access to our recorded show notes. They all have clickable links that will take you directly to that part of the audio.



Java News Roundup: Gradle 8.0, Maven, Payara Platform, Piranha, Spring Framework, MyFaces

MMS Founder
MMS Michael Redlich

Article originally posted on InfoQ. Visit InfoQ

This week’s Java roundup for February 13th, 2023 features news from OpenJDK, JDK 20, JDK 21, Native Build Tools 0.9.20, Spring 6.0.5, Spring Cloud Data Flow 2.10.1, Quarkus 2.16.3, Payara Platform, Micronaut 3.8.5, Helidon 3.1.2, Vert.x 3.9.15, Hibernate Search 6.2.0.Alpha2, MyFaces 4.0-RC5, Grails 5.3.2, Reactor 2022.0.3, Micrometer Metrics 1.11.0-M1 and Micrometer Tracing 1.1.0-M1, Maven 3.9.0, Gradle 8.0 and Piranha 23.2.0.

OpenJDK

Ron Pressler, consulting member of the technical staff at Oracle and project lead of Project Loom, has submitted JEP Draft 8302326, Implicit Classes and Enhanced Main Methods (Preview). This feature JEP proposes to “evolve the Java language so that students can write their first programs without needing to understand language features designed for large programs.” This JEP builds on the September 2022 blog post, Paving the on-ramp, by Brian Goetz, Java language architect at Oracle.
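For context, the snippet below is a minimal sketch of the kind of program the JEP draft describes: an implicit class whose main method takes no String[] parameter. As a preview feature, it would need to be compiled and run with --enable-preview.

    // HelloWorld.java - illustrative sketch of the simplified on-ramp the JEP draft describes.
    void main() {
        System.out.println("Hello, World!");
    }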

JDK 20

Build 36 of the JDK 20 early-access builds was made available this past week, featuring updates from Build 35 that include fixes to various issues. More details on this build may be found in the release notes.

JDK 21

Build 10 of the JDK 21 early-access builds was also made available this past week, featuring updates from Build 9 that include fixes to various issues. Further details on this build may be found in the release notes.

For JDK 20 and JDK 21, developers are encouraged to report bugs via the Java Bug Database.

GraalVM Native Build Tools

On the road to version 1.0, Oracle Labs has released version 0.9.20 of Native Build Tools, a GraalVM project consisting of plugins for interoperability with GraalVM Native Image. This latest release provides: a new showPublications Gradle task that lists all Group | Artifact | Version (GAV) coordinates published on Maven; a guarantee that only a single task can concurrently access the reachability metadata service, avoiding deadlock when collecting metadata; and new quickstart guides for beginners using a clean Java project. More details on this release may be found in the changelog.

Spring Framework

The release of Spring Framework 6.0.5 features: early support for JDK 21; the deprecation of the ConcurrentExecutorAdapter class for removal in version 6.1; support for Optional in the PayloadMethodArgumentResolver class; and support for the @JsonNaming annotation when converting to a native image with GraalVM. Further details on this release may be found in the release notes.

The release of Spring Cloud Data Flow 2.10.1 features: library updates to Spring Boot 2.7.8, Spring Framework 5.3.25 and Spring Shell 2.1.5; and updates to dependency projects such as: Spring Cloud Dataflow Build 2.10.1, Spring Cloud Deployer Kubernetes 2.8.1 and Spring Cloud Common Security Config 1.8.1. More details on this release may be found in the release notes.

Quarkus

Red Hat has released Quarkus 2.16.3.Final featuring support for custom Flyway credentials and URL. Other bug fixes and improvements include: registering a CDI bean when an @ConfigMapping annotation is marked with the @Unremovable annotation; a simplified workflow in Quarkiverse Hub, the place to host and build Quarkus extensions; and a fix for quarkus:dev when the project.build.directory property is overridden by a profile. Further details on this release may be found in the release notes.

Payara

Payara has released their February 2023 edition of the Payara Platform, which includes Community Edition 6.2023.2 and Enterprise Edition 5.48.0. Both versions share two improvements: renaming the MicroProfile OpenAPI property from mp.openapi.scan.lib to mp.openapi.extensions.scan.lib, a breaking change; and making it easier to locate and log an expired certificate. The Community Edition also includes a migration to the Jakarta Persistence 3.0 namespace for EJB Timer services. Notable bug fixes for both versions include: improved application deployment on JDK 11 and JDK 17; a fix for the time out of the Asadmin CLI utility commands start/stop/restart-deployment-group; and a revert of the removal of the JobManager interface due to issues. More details on these releases may be found in the Community Edition release notes and the Enterprise Edition release notes.

Micronaut

The Micronaut Foundation has released Micronaut 3.8.5 featuring bug fixes, improvements in documentation, a dependency upgrade to Netty 4.1.87.Final and updates to modules, Micronaut OpenAPI and Micronaut Oracle Cloud. Further details on this release may be found in the release notes.

Helidon

Helidon 3.1.2, a bug fix release, ships with: a deprecation of the name() and filename() methods in the BodyPart interface, to be replaced with the isNamed() method; a fix to the functionality of OIDC logout; improvements in the Helidon Config component; and a backport of the OpenTelemetry specification to the Helidon 2.x release train.

Eclipse Vert.x

Despite the end-of-life for the 3.9 release train of Eclipse Vert.x in 2022, security updates will be made available through 2023. Version 3.9.15 delivers upgrades to Jackson 2.14.0, Netty 4.1.89 and Hazelcast 3.12.13 to address vulnerabilities CVE-2022-41881, CVE-2022-41915 and CVE-2022-36437. More details on this release may be found in the release notes.

Hibernate

The second alpha release of Hibernate Search 6.2.0 provides: compatibility with Elasticsearch 8.6 and OpenSearch 2.5; an upgrade of -orm6 artifacts to Hibernate ORM 6.2.0.CR2; simpler and/or/not predicates; mass indexing for multiple tenants; and a switch to UUIDs for identifiers in the outbox-polling coordination strategy.

Apache Software Foundation

The fifth release candidate of MyFaces Core 4.0.0, a compatible implementation of the Jakarta Faces specification, features: integration of the jsf.js next generation scripts; a migration of all unit tests to JUnit 5; a warning displayed if the selectOne attribute renders no selected item; and updated logging in the WebConfigParamsLogger class. Further details on this release may be found in the release notes.

Grails

Versions 5.3.2 and 5.3.1 of Grails were released this past week, as version 5.3.2 patched version 5.3.1 due to an issue with upgrading the Maven coordinate, org.apache.maven:maven-resolver-provider, from version 3.8.3 to 3.9.0. Otherwise, version 5.3.1 consisted of dependency upgrades such as: Micronaut 3.8.4, Grails Gradle Plugin 5.3.0, com.netflix.nebula:gradle-extra-configurations-plugin 9.0, Vue 5.0.3 and io.methvin:directory-watcher 0.18.0.

Project Reactor

Project Reactor 2022.0.3, the third maintenance release, provides dependency upgrades to reactor-core 3.5.3, reactor-netty 1.1.3 and reactor-kafka 1.3.16.

Micrometer

The first milestone release of Micrometer Metrics 1.11.0 delivers new features such as: support for the Azul Prime C4 Garbage Collector and Apache HttpClient 5.x; and a new method, observe(Function function), in the Observation interface to complement the existing observe(Runnable runnable) and observe(Supplier supplier) methods.
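For readers unfamiliar with the Observation API, the sketch below shows the existing Supplier-based observe method that the new Function-accepting overload complements; the observation name and registry setup are placeholders.

    import io.micrometer.observation.Observation;
    import io.micrometer.observation.ObservationRegistry;

    public class ObservationExample {
        public static void main(String[] args) {
            // Placeholder registry; real applications typically register handlers on it.
            ObservationRegistry registry = ObservationRegistry.create();
            // Existing Supplier-based overload; 1.11.0-M1 adds a Function-accepting
            // variant alongside it, per the release notes above.
            String result = Observation.createNotStarted("example.operation", registry)
                    .observe(() -> "work done");
            System.out.println(result);
        }
    }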

The first milestone release of Micrometer Tracing 1.1.0 features support for no-operation implementations of the PropagatingSenderTracingObservationHandler and PropagatingReceiverTracingObservationHandler classes, and for custom Mapped Diagnostic Context (MDC) keys for the Slf4JEventListener class.

Maven

Maven 3.9.0 has been released with new features such as: a new MAVEN_ARGS environment variable; support for building an application in multiple local repositories; the ability to store snapshots in a separate local repository; a warning related to a deprecated Mojo plugin; and simplified integration of the Redis Java Client (Redisson) and Hazelcast for the Maven artifact resolver.

Gradle

After five release candidates, the release of Gradle 8.0 delivers: a new Kotlin DSL that provides an alternative syntax to the Groovy DSL; improvements in the buildSrc builds; a configuration cache, an incubating new feature; and improvements in Java toolchains. More details on this release may be found in the release notes, and InfoQ will follow up with a more detailed news story.

Shortly after the GA release, Gradle 8.0.1, a patch release, provides fixes for these issues: documentation of the Scala plugin's integration with toolchains and problems with the target flag; the removal of the --no-rebuild command-line option without prior warning or a deprecation notice; and a Scala build failure that reports the value, isBlank, as not a member of the String class.

Piranha Cloud

The release of Piranha 23.2.0 provides notable changes such as: the deprecation of the LoggingExtension and MimeTypeExtension classes; the relocation of the debug module in the pom.xml file to the test directory; and the introduction of a new static utility class, WarFileExtractor, for extracting WAR files. Further details on this release may be found in their documentation and issue tracker.

About the Author
