Harbor Capital Advisors Inc. Sells 2,226 Shares of MongoDB, Inc. (NASDAQ:MDB)

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Harbor Capital Advisors Inc. lessened its holdings in shares of MongoDB, Inc. (NASDAQ:MDB) by 63.1% during the 4th quarter, according to its most recent filing with the Securities and Exchange Commission. The firm owned 1,301 shares of the company’s stock after selling 2,226 shares during the quarter. Harbor Capital Advisors Inc.’s holdings in MongoDB were worth $303,000 as of its most recent SEC filing.

Other hedge funds and institutional investors have also recently added to or reduced their stakes in the company. Hilltop National Bank increased its position in shares of MongoDB by 47.2% during the fourth quarter. Hilltop National Bank now owns 131 shares of the company’s stock worth $30,000 after acquiring an additional 42 shares in the last quarter. Quarry LP boosted its stake in MongoDB by 2,580.0% in the 2nd quarter. Quarry LP now owns 134 shares of the company’s stock worth $33,000 after purchasing an additional 129 shares during the period. Brooklyn Investment Group bought a new position in MongoDB during the 3rd quarter worth about $36,000. Continuum Advisory LLC raised its stake in shares of MongoDB by 621.1% in the 3rd quarter. Continuum Advisory LLC now owns 137 shares of the company’s stock valued at $40,000 after purchasing an additional 118 shares during the period. Finally, GAMMA Investing LLC lifted its holdings in shares of MongoDB by 178.8% in the third quarter. GAMMA Investing LLC now owns 145 shares of the company’s stock valued at $39,000 after purchasing an additional 93 shares in the last quarter. 89.29% of the stock is currently owned by institutional investors.

MongoDB Price Performance

MDB traded up $7.40 on Wednesday, hitting $249.81. 121,113 shares of the stock were exchanged, compared to its average volume of 1,432,734. MongoDB, Inc. has a 12 month low of $212.74 and a 12 month high of $509.62. The firm has a market capitalization of $18.60 billion, a price-to-earnings ratio of -91.17 and a beta of 1.25. The business has a 50-day moving average of $280.67 and a 200-day moving average of $269.51.

MongoDB (NASDAQ:MDB) last posted its earnings results on Monday, December 9th. The company reported $1.16 earnings per share for the quarter, topping analysts’ consensus estimates of $0.68 by $0.48. The firm had revenue of $529.40 million for the quarter, compared to analyst estimates of $497.39 million. MongoDB had a negative net margin of 10.46% and a negative return on equity of 12.22%. The company’s revenue for the quarter was up 22.3% compared to the same quarter last year. During the same period in the previous year, the firm posted $0.96 EPS. Equities research analysts expect that MongoDB, Inc. will post -1.86 earnings per share for the current fiscal year.

Analysts Set New Price Targets

A number of research analysts recently issued reports on MDB shares. Rosenblatt Securities started coverage on MongoDB in a report on Tuesday, December 17th. They set a “buy” rating and a $350.00 price objective for the company. Stifel Nicolaus raised their price target on shares of MongoDB from $325.00 to $360.00 and gave the stock a “buy” rating in a research note on Monday, December 9th. Tigress Financial boosted their price objective on shares of MongoDB from $400.00 to $430.00 and gave the stock a “buy” rating in a research report on Wednesday, December 18th. Monness Crespi & Hardt cut shares of MongoDB from a “neutral” rating to a “sell” rating and set a $220.00 target price for the company in a report on Monday, December 16th. Finally, Mizuho lifted their price target on shares of MongoDB from $275.00 to $320.00 and gave the company a “neutral” rating in a research note on Tuesday, December 10th. Two equities research analysts have rated the stock with a sell rating, four have given a hold rating, twenty-two have given a buy rating and one has given a strong buy rating to the company. According to data from MarketBeat.com, the company presently has an average rating of “Moderate Buy” and a consensus target price of $364.64.


Insider Transactions at MongoDB

In other MongoDB news, CFO Michael Lawrence Gordon sold 5,000 shares of the stock in a transaction that occurred on Monday, December 16th. The stock was sold at an average price of $267.85, for a total transaction of $1,339,250.00. Following the transaction, the chief financial officer now directly owns 80,307 shares in the company, valued at approximately $21,510,229.95. This represents a 5.86% decrease in their position. The transaction was disclosed in a filing with the Securities & Exchange Commission. Also, CAO Thomas Bull sold 169 shares of MongoDB stock in a transaction on Thursday, January 2nd. The shares were sold at an average price of $234.09, for a total transaction of $39,561.21. Following the completion of the sale, the chief accounting officer now owns 14,899 shares of the company’s stock, valued at $3,487,706.91. This represents a 1.12% decrease in their position. Insiders sold a total of 23,776 shares of company stock worth $6,577,625 over the last three months. 3.60% of the stock is owned by company insiders.

MongoDB Company Profile


MongoDB, Inc., together with its subsidiaries, provides a general-purpose database platform worldwide. The company provides MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.


Institutional Ownership by Quarter for MongoDB (NASDAQ:MDB)

Before you consider MongoDB, you’ll want to hear this.

MarketBeat keeps track of Wall Street’s top-rated and best performing research analysts and the stocks they recommend to their clients on a daily basis. MarketBeat has identified the five stocks that top analysts are quietly whispering to their clients to buy now before the broader market catches on… and MongoDB wasn’t on the list.

While MongoDB currently has a “Moderate Buy” rating among analysts, top-rated analysts believe these five stocks are better buys.


7 Stocks to Own Before the 2024 Election

Looking to avoid the hassle of mudslinging, volatility, and uncertainty? You’d need to be out of the market, which isn’t viable. So where should investors put their money? Find out with this report.



Article originally posted on mongodb google news. Visit mongodb google news



How Database Storage Engines Have Evolved for Internet Scale – The New Stack

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts



How Database Storage Engines Have Evolved for Internet Scale


Out-of-place updates drive excellent write performance relative to in-place updates, but sacrifice read performance in the bargain.





The design of database storage engines is pivotal to their performance. Over decades, SQL and NoSQL databases have developed various techniques to optimize data storage and retrieval.

Database storage engines have evolved from early relational systems to modern distributed SQL and NoSQL databases. While early relational systems relied on in-place updates to records, modern systems — both distributed relational databases and NoSQL databases — primarily use out-of-place updates. The term “record” is used here to refer both to tuples in a relational database and to key-values in a NoSQL store.

Out-of-place updates became popular as a result of the extremely heavy write workloads that modern databases encountered with the advent of internet-scale user events, as well as automated events from sensors (e.g., Internet of Things) flowing into a database.

These two contrasting approaches — in-place updates and out-of-place updates — show how out-of-place updates drive excellent write performance relative to in-place updates, but sacrifice read performance in the bargain.

Layers of a Storage Engine

Let’s begin with an overview of the layered architecture of storage engines. A database storage engine typically consists of three layers:

  1. Block storage: The foundational layer, providing block-level access through raw devices, file systems or cloud storage. Databases organize these blocks for scalable data storage.
  2. Record storage: Built atop block storage, this layer organizes records into blocks, enabling table or namespace scans. Early relational systems usually updated records in place, while more modern storage engines use out-of-place updates.
  3. Access methods: The topmost layer includes primary and secondary indexes, facilitating efficient data retrieval. Updates to access methods also can be in place or out of place, as we will see shortly. Many current systems apply the same methodologies, in-place updates or out-of-place updates, for both the record storage and access methods. We will therefore talk about these two layers together in the context of how they are updated.

Let’s delve deeper into each layer.

Block Storage

At its core, the block storage layer organizes data into manageable units called blocks (B1 and B2 in Figure 1 below). These blocks act as the fundamental storage units, with higher layers organizing them to meet database requirements. Figure 1 illustrates a basic block storage system. Record storage and access methods are built on top of the block storage. There are two broad categories of record storage and access methods corresponding to whether updates happen in place or out of place. We will describe the record storage and access methods under these categories next.

Figure 1: Block storage showing blocks B1 and B2.

Storage and Access Methods With In-Place Updates

The approach of updating records and the access methods in place was the standard in early relational databases. Figure 2 (below) illustrates how a block in such a system is organized and managed to provide a record storage API. Notable features of such a record storage layer include:

  • Variable length records: Records often vary in size, and the size may change during updates. To minimize additional IO operations during updates, the record storage layer actively manages block space to accommodate updates within the block.
  • One level of indirection: Each record within a block is identified by a slot number, making the record ID (RID) a combination of the block ID and slot number. This indirection allows a record to move freely within the block without changing its RID.
  • Slot map: A slot map tracks the physical location of each record within a block. It grows from the beginning of the block while records grow from the end, leaving free space in between. This design allows blocks to accommodate a variable number of records depending on their sizes, and supports dynamic resizing of records within the available space.
  • Record migration: When a record grows too large to fit within its original block, it is moved to a new block, resulting in a change to its RID.
Figure 2: Record storage for in-place updates, showing how a block is organized internally.
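To make the slot map concrete, here is a minimal sketch in Python of a slotted page with in-place updates. The block size, the 8-byte slot entries, and all names are illustrative assumptions rather than details of any particular engine:

```python
# Minimal sketch of a slotted page with in-place updates.
# BLOCK_SIZE and the 8-byte slot entry are illustrative assumptions.

BLOCK_SIZE = 4096

class Block:
    def __init__(self, block_id):
        self.block_id = block_id
        self.slots = []   # slot map: slot number -> record bytes
        self.used = 0     # bytes consumed by record payloads

    def free_space(self):
        # Records grow from one end, the slot map from the other;
        # whatever is left in between is free.
        return BLOCK_SIZE - self.used - 8 * len(self.slots)

    def insert(self, payload: bytes):
        """Insert a record and return its RID (block ID, slot number)."""
        if len(payload) + 8 > self.free_space():
            return None   # caller must place the record in another block
        self.slots.append(payload)
        self.used += len(payload)
        return (self.block_id, len(self.slots) - 1)

    def update(self, slot: int, payload: bytes):
        """Update a record in place. Fails if the grown record no longer
        fits, in which case it must migrate to a new block (new RID)."""
        delta = len(payload) - len(self.slots[slot])
        if delta > self.free_space():
            return False  # triggers record migration
        self.slots[slot] = payload
        self.used += delta
        return True
```

Because a record is addressed by (block ID, slot number) rather than by a physical offset, it can be reorganized within its block without its RID changing; only migration to another block assigns a new RID.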

Access methods are built on top of record storage to efficiently retrieve records. They include:

  • Primary indexes: These indexes map primary key fields to their corresponding RIDs.
  • Secondary indexes: These indexes map other field values (potentially shared by multiple records) to their RIDs.

If the index is completely in memory, then self-balancing trees, such as red-black (RB) trees, are used. If the index is primarily on disk (with parts possibly cached in memory), B+-trees are used. Figure 3 shows a B+-tree on top of record storage. Both the primary index and the secondary indexes use the same format for their entries (field value and RID).

Figure 3: B+-tree on top of record storage.

Combining Access Methods and Record Storage

In some systems, the access method and record storage layers are integrated by embedding data directly within the leaf nodes of a B+-tree. The leaf level then essentially becomes record storage that is additionally sorted on the index key. This combination makes range queries efficient compared to an unsorted record storage layer. However, to access the records by other keys, we would still need an access method (an index on those keys) on top of this combined storage layer.

Storage and Access Methods With Out-of-Place Updates

Most modern storage engines, both distributed NoSQL and distributed SQL engines, use out-of-place updates. In this approach, all updates are appended to a current write block maintained in memory, which is then flushed to disk in one IO when the block fills up. Note that the durability risk for data that has not yet reached disk (should the node fail first) is mitigated by replication within the distributed database. Blocks are immutable, with records packed and written only once, eliminating the need for space management overhead. The older version of the record will be garbage-collected by a cleanup process if that is desired. This approach, sketched in code after the list below, has two advantages:

  1. Amortized IO cost: All the records in the write block together need one IO compared to at least one IO per record for in-place updates.
  2. Exploits sequential IO: These techniques were invented in the era of magnetic hard disk drives (HDDs), where sequential IO was far superior to random IO. Even in the era of SSDs, sequential IO is still beneficial, and the append-only nature of these systems lends itself to sequential IO.
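A minimal sketch of this write path in Python, assuming a simple key-value model (the block capacity, file naming, and JSON format are arbitrary illustrative choices):

```python
import json

WRITE_BLOCK_CAPACITY = 1024   # records per block; an arbitrary choice

class AppendOnlyStore:
    def __init__(self):
        self.write_block = []   # the current in-memory write block
        self.flushed = 0        # count of immutable blocks already on disk

    def put(self, key, value):
        # Updates are never applied in place: every put is an append.
        self.write_block.append((key, value))
        if len(self.write_block) >= WRITE_BLOCK_CAPACITY:
            self.flush()

    def flush(self):
        # One sequential IO covers every record in the block, so the IO
        # cost is amortized across all of them (advantage 1 above).
        path = f"block-{self.flushed:06d}.json"   # illustrative file name
        with open(path, "w") as f:
            json.dump(self.write_block, f)
        self.flushed += 1
        self.write_block = []   # the block on disk is now immutable
```

Older versions of a key simply accumulate in earlier blocks until a cleanup process garbage-collects them.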

The most well-known and commonly used out-of-place update storage engines are built on a data structure called the log-structured merge-tree (LSM-tree). In fact, LSM-trees are used by almost all modern database storage engines, such as BigTable, Dynamo, Cassandra, LevelDB and RocksDB. Variants of RocksDB are employed by systems like CockroachDB and YugabyteDB.

LSM-Trees

The foundational concepts for modern LSM-tree implementations originate from the original paper on the concept, as well as from the Stepped-Merge approach, which was developed concurrently.

The Stepped-Merge algorithm arose from a real, critical need: managing the entire call volume of AT&T’s network in 1996 and recording all call detail records (CDRs) streaming in from across the United States. This was an era of complex phone billing plans — usage-based, time-of-day-based, friends-and-family-based, etc. Accurately recording each call detail was essential for future billing purposes.

However, the sheer volume of calls overwhelmed the machines of the time, leading to the idea of immediately appending CDRs to the end of record storage, followed by periodic “organization” to optimize lookups for calculating bills. Bill computations (reads) were batch jobs with no real-time requirements, unlike the write operations.

The core idea behind solving the above problem was to accumulate as many writes as possible in memory and write them out as a sorted run at level 0 once memory fills up. After a certain number, T, of level 0 runs are available, they are all merged into a longer sorted run at level 1. During the merge, duplicates can be eliminated if required.

This process of merging T sorted runs at level i to construct a longer run at level i+1 continues for as many levels as required, drawing inspiration from the external sort-merge algorithm. This idea is very similar to the original LSM-tree proposal and forms the basis of all modern LSM-based implementations, including the concept of T components per level. The merge process is highly sequential-IO friendly, with the cost of each sequential IO amortized over the many records it writes.
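The level-to-level merge itself can be sketched with Python's standard-library stream merger; the newest-first ordering of runs and the first-wins duplicate policy are assumptions of this sketch (real engines track sequence numbers):

```python
import heapq

def merge_runs(runs):
    """Merge T sorted runs from level i into one longer run for level i+1.

    Each run is a list of (key, value) pairs sorted by key. Runs are given
    newest-first, so for duplicate keys the first pair seen wins.
    """
    merged = []
    # Tag each pair with the run's index so ties on key break toward the
    # newest run, then stream-merge: the external sort-merge pattern the
    # Stepped-Merge approach draws on.
    tagged = ([(key, age, value) for key, value in run]
              for age, run in enumerate(runs))
    last_key = object()   # sentinel: matches no real key
    for key, _age, value in heapq.merge(*tagged):
        if key != last_key:           # duplicate elimination
            merged.append((key, value))
            last_key = key
    return merged

# Example: T = 3 runs at level 0 merge into one level 1 run.
run_a = [("k1", "newest"), ("k4", "a")]
run_b = [("k1", "older"), ("k2", "b")]
run_c = [("k3", "c")]
print(merge_runs([run_a, run_b, run_c]))
# [('k1', 'newest'), ('k2', 'b'), ('k3', 'c'), ('k4', 'a')]
```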

However, the reads, in the worst case, must examine every sorted run at each level, incurring the penalty of not updating in place. Yet, looking up a key in a sorted run is made efficient by an index, such as a B+-tree, specific to that sorted run. These B+-trees directly point to the physical location (as opposed to a RID), since the location remains constant. Figure 4 illustrates an example of an LSM-tree with three levels and T=3 components per level.

The sorted runs are shown as B+-trees to optimize read operations. Notice that the leaf level represents the sorted run, while the upper levels are constructed bottom-up from the leaf (a standard method for bulk loading a B+-tree). In this regard, an LSM-tree can be considered a combination of an access method and a record-oriented storage structure. While sorting typically occurs on a single key (or a combination of keys), there may be cases requiring access via other keys, necessitating secondary indexes on top of the LSM-tree.

Figure 4: Example LSM-tree with three levels on disk and three components per level.
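A sketch of that read path, with Python's bisect module standing in for the per-run B+-tree; the level and run layout here is an illustrative assumption:

```python
import bisect

def lookup(key, levels):
    """Point lookup across an LSM-tree's sorted runs.

    `levels` is a list of levels, newest first; each level is a list of
    sorted runs, newest first; each run is a list of (key, value) pairs
    sorted by key. bisect stands in for the per-run B+-tree: it locates
    the key inside one sorted run without scanning it.
    """
    for level in levels:
        for run in level:
            keys = [k for k, _ in run]          # a real engine would use
            i = bisect.bisect_left(keys, key)   # the run's B+-tree instead
            if i < len(run) and run[i][0] == key:
                return run[i][1]                # newest version wins
    return None   # worst case: every run on every level was examined
```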

Comparing In-Place and Out-of-Place Updates

To compare key features of the storage engines of early relational systems with those of modern storage engines, assume that one record is being written and one primary key value is being read. For early relational systems, assume a B+-tree index on the primary key (whether the leaf level contains the actual data or a record identifier does not significantly affect this discussion). For the LSM-tree (the most common modern storage engine), assume that the sorted runs (and their B+-trees) are keyed on the primary key. Under these assumptions, the trade-off described above is: the in-place engine pays at least one IO per record written but serves a point read with a single index lookup, while the LSM-tree amortizes one sequential IO across all the records in a write block but, in the worst case, must examine a sorted run at every level to serve a read.

Conclusion

Storage engines have evolved to handle the heavy write workloads many database systems encountered with the advent of internet scale. LSM-trees have become popular as a solution to this challenge. However, LSM-trees do give up some read performance relative to the in-place update storage engines used in early relational systems. Under some circumstances, it may be wise to find a system that blends the best of both ideas: use out-of-place updates for record storage to continue handling write-heavy workloads, but use in-place updates for access methods to minimize the read overhead.

Visit our website to learn more about Aerospike Database.










Presentation: Where is the Art? A History in Technology

MMS Founder
MMS Andy Piper

Article originally posted on InfoQ. Visit InfoQ

Transcript

Piper: In October 1971, a gentleman called Frieder Nake published a note in PAGE, the Bulletin of the Computer Arts Society, entitled, “There Should Be No Computer Art”. “Soon after the advent of computers, it became clear that there was a great potential application for them in the area of artistic creation”, he began. “Before 1960, digital computers helped to produce poetic text and music. Analog computers, or only oscilloscopes, generated drawings of sets of mathematical curves and representations of oscillations. It was not before the first exhibitions of computer produced pictures were held in 1965 that a greater public took notice of this threat, as some said, progress, as some thought. I was involved in this development from its beginning onward in 1964.

I found the way the art scene reacted to the new creations, interesting, pleasing, and stupid. I stated in 1970 that I was no longer going to take part in exhibitions. I find it easy to admit that computer art did not contribute to the advancement of art, if we judge advancement by comparing the computer products to all existing works of art. In other words, the repertoire of results of aesthetic behavior has not been changed by the use of computers. This point of view, namely, that of art history, is shared and held against computer art by many art critics. There is no doubt in my mind”, he said, “that interesting new methods have been found which can be of some significance for the creative artist”.

As you might imagine, this was a bit of a controversial take. Here was a man who had for part of the previous decade been an insider, been an advocate for the use of algorithmic and generative processes to create art. He’d taken part in exhibitions around the world in that mid-60’s period up to 1970, and his output in that period was estimated at around 300 to 400 works in ink produced on a high precision flatbed plotter. Frieder Nake’s Wikipedia page, says this, “His statement was rooted in a moral position.

The involvement of computer technology in the Vietnam War and in massive attempts by capital to automate productive processes and thereby generate unemployment, should not allow artists to close their eyes and become silent servants of the ruling classes by reconciling high technology with the masses of the poor and suppressed”. I’ll just finish this piece by reading another piece from what he posted in that article, which is, “Questions like, is a computer creative, or is a computer an artist, or the like, should not be considered serious questions, period. In the light of the problems we are facing at the end of the 20th century, those are irrelevant questions”.

Background

I’m Andy. I live primarily online on the federated social web. For the past 25 years, I’ve worked alongside you in the technology industry, and I expect to continue to do so. I graduated in 1997 with a degree in modern history, which has always made me something of an interesting person at career fairs, when my employers have been rolling me out to encourage folks to get involved in technology. I’m self-taught as a developer. I’m not here to talk about AI and large language models, and generative creation of art using those means. Instead, we’re going to go on a journey and look at one aspect of computer history, and that is creative technology and art, and the ways in which it’s been considered a threat and misunderstood at different times in our shared history. We’ll also find out how I’ve accidentally become an artist. We start out in some despair. We’re going to go through some discovery. I hope, I certainly have, we will find some delight.

The Event

Let’s talk about what happened, the event. This is my euphemistic term for my recent career path. I was laid off from my dream job at a company that no longer exists. I spent nine years of my life working there, and I spent 15 years of my life on that platform, passionate about enabling people to communicate in real-time, openly, around the world. For the first time in our shared history, I think that platform created a lot of opportunities and new ways to communicate that we’ve inherited from it. All it took was one spiteful billionaire to change everything in a moment and tear it all down. It was a dramatic change, and I knew that I was going to have to take some time away from what I’d been doing, take a step back.

During and since the pandemic, my wife and I had found our house filling up with our hobbies, gubbins, just lots of things I like to play around with, electronics and retro gadgets and things. My wife likes to sew and do other things with her handcrafting, and the house was filling up. We thought we’d try to find a space for those. We ended up renting an art studio and moving our hobbies there. There’s a longer story here, if you’re interested, about the difficulties of renting office space if what you want to do is hot soldering. I’ve had some interesting conversations about insurance there. I’ll tell you about this adventure, and then we’ll come on to the art history piece I mentioned.

While I was working through that layoff process, which took a longer time than you might expect, for reasons, we were also getting settled into this art studio. We’ve got a space over in Southwest London. It’s just outside Wimbledon. It’s an old converted paper warehouse. It’s got a range of artists, painters, sculptors, ceramicists, photographers, folks that do picture framing as well. My wife, Heidi, and I, moved in there, and we didn’t really think of ourselves as going there to do art. We just wanted a space for our hobbies. I excitedly started going to IKEA, getting lots of shelving, getting all my stuff set up. I grew up in the 1980s and early ’90s, and loved the 8-bit era of computers, where you probably may or may not have taken the thing apart and had a look inside. I had an Acorn Electron at home, and I had BBC Micros at school.

For me, the reemergence in the last 10 to 15 years of affordable, accessible technology with an educational focus, we’re talking about things like Arduino and Raspberry Pi, has really enabled me to re-engage with my early passion for technology. I love solving problems with code, and I also love to tinker around with electronics. In fact, about three years ago, I started to get involved with the MicroPython project. MicroPython is a rewrite of Python, implementation of Python that runs on microcontrollers. Once you’ve connected up a few sensors to your little small computing board, you might want to put that into a case. As luck would have it, my friend was getting rid of an old 3D printer, so I inherited that from him. Before long, I had this whole studio, and I had three 3D printers. Not that I printed more, but I could have done that as well.

The Makerspace

Within a few months, at the beginning of 2023, my wife and I built up this small makerspace for the two of us. She was using her cutting machine to create crafts with vinyl, and I only had to 3D print, but we still didn’t really know what we were doing there. We just had this space. I was really in denial. It was quite a traumatic experience watching what happened at my former employer. Several of our immediate neighboring artists in the studios are very traditional painters. Have been there for quite a long time, very different styles. We didn’t really feel very connected to them. The Wimbledon Art Studios run a show twice a year. Our first show was this time last year, last May, and we were invited to take part. We thought, it sounds great. A lot of the other artists were encouraging us and saying, “Great commercial opportunity”. What are we going to sell? I thought I could print some bits and pieces, some pots and some trays.

My wife was creating things with her vinyl cutter, creating bags and T-shirts and things. I thought it’d be good to have something a bit more interesting than that. I got a little toy 3D printer. You can get them for less than £100, not very high quality, but great for playing and learning. I put one of those out. Then I’d also seen this thing in a magazine, which I thought I’d have a go at making, so that if people came into the studio, I could talk about technology. This is BrachioGraph. BrachioGraph means arm-writer. It was created by a gentleman called Daniele Procida. He presented it at PyCon UK in 2019. It’s super simple, you can see here. It’s made up of lollipop sticks. It’s got three small servomotors, a little clip to hold the pen, and beyond that, you’ve got a Raspberry Pi Zero. The code is all in Python. It’s all open source. The recipe is online, and for about £20, just a little bit over, you too can build your own drawing robot. It’s lovely and basic.
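Driving it from Python is pleasingly short. This is a sketch based on the project's documented usage; the arm lengths and input file here are assumptions, so check the BrachioGraph docs for the values matching your build:

```python
# A sketch of driving BrachioGraph from Python; arm lengths and the
# input file are assumptions, not values from the talk.
from brachiograph import BrachioGraph

bg = BrachioGraph(inner_arm=8, outer_arm=8)   # arm lengths in cm
bg.plot_file("images/demo.json")              # a pre-processed line drawing
```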

One of the things I love about it is its limitations. It’s two arms on rotational joints, like your arm. It can only draw curves, however; it cannot move in straight lines. The lines are wiggly and inaccurate. They’re very cheap motors, and it’s very slow. It’s going to draw a little bit for us. I put this out on display, and I had a few things being drawn as people came through the studios, and we were talking about 3D printing and other things. Again, we were very different to what all the other artists were doing. I’d get to talk to people, especially the youngsters, who would go, “Dad, look, this robot’s drawing things”. If they liked whatever it had produced, I just gave them the bit of paper to take home with them. I enjoyed explaining the limitations and how it worked with the arm, and the fact that it can’t draw straight lines.

In order to draw straight lines, you’re going to need high precision. You’re going to need an x and y axis to move your pen around. Something else has an x and y axis and also has a z axis, and that’s a 3D printer. A plotter is a 2D printer. A 3D printer has a z axis, moves up and down, as well as left and right and around. You put an extruder on the top and squirt hot plastic through, and you’ve got a 3D printer. Things were just starting to come together, in my mind, as we did this. This was never going to threaten our neighbors in the studios, and I don’t want to threaten our neighbors in the studio. This is not something that alarmed anybody in the studios as the new fancy artist in town.

Reemergence of Pen Plotters

After that show, I thought, I’m interested in this. I’m going to go and buy a proper one, because everybody got really engaged in my plotter. This is a proper plotter that you could buy commercially. It’s called an AxiDraw. This is a nice one. I got that in. My wife got really interested as well. This is a piece that actually we’re going to have in the show in May. It’s got a lovely open-source ecosystem, this machine. The hardware is not open source, but you can drive it using Inkscape. It’s got a Python API. Once that arrived, my wife got interested. This is a sped-up thing that you can see. It’s nice and accurate. This is just drawing a quick postcard. I immediately regretted that I only got the A4 version, because I want to now do really big things. Once I started to think more about the space that we’d found ourselves in, several things began to emerge.

The first one is that as I looked at what I was doing, and then starting to discover what other people were doing with plotters, I realized that this was not new at all. There’s quite a renaissance as people today are starting to use pen plotters a bit more again, but as we’ll see, people have been using pen plotters for a long time. I also realized that while we could transform images to lines and draw them out using a plotter, in the case of the BrachioGraph, I gave it a picture, and it transformed that picture into what was drawn. You can also go directly without having an intermediate picture stage. You can write some code and drive the plotter. You don’t need to be creative to come up with something first, if you don’t want to. You don’t have to be artistic. You don’t have to have some huge artistic vision to come up with something that’s interesting.

Another thing that I particularly fell in love with was that this is about tangible, physical output. I think digital creations are amazing and fantastic and fun, but there’s a whole new rabbit hole you find yourself going down when you start getting interested in the different materials that you’re using as well.

We had another show coming up in November last year, and I, because I’m constantly living online, reading what people are doing, discovered this article in HackSpace magazine, which is available for free online, as well as in the shops, in paper form, if you prefer. A gentleman called Ben Everard had bought a cheap plotter online with laser cut pieces using an Arduino, and found that the code didn’t work. He got frustrated by that, and decided that he wanted to recreate the thing himself. He wrote an article about it. I tried to follow this article and found that it had quite a few missing details. The process of building this plotter was a bit more complicated. I needed to 3D print some parts. I needed to put together a small circuit using a Raspberry Pi Pico, which is a microcontroller.

This plotter works as a hanging plotter, called a polargraph. You have this central gondola that moves around on strong cables, otherwise known as cotton, pieces of cotton running over pulleys. It’s pretty cheaply made. It’s a little bit annoying because it’s polar: it always needs to recenter in the middle of the board, and there’s nothing automatic to do that. You’re trying to press buttons to get the thing to come back to the center each time. I’m not going to give you a blow-by-blow account of how I built this. You can go and read about it on my website, if you would like. There’s a project page for it. I’ll tell you a little bit about it. G-code, in 3D printing, is simply a set of instructions that tells a 3D printer how to move the head in x, y, and z, at what points to heat up the filament, and at what temperature to push it through. Plotters work on a very similar set of principles: G-code gives them a set of x and y instructions.

Open-source software here has benefited both the 2D plotter and 3D printer areas. CNC machines and drilling machines also use the same set of instructions. This plotter simply runs a piece of code originally written for the Arduino, called GRBL. You just transform your image into this G-code, fire it over a serial port to, in this case, the Raspberry Pi Pico, and it sends instructions, and it moves around and draws things. Away goes the plotter. It’s simple and it works.
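Streaming that G-code from Python might look like the following sketch, assuming GRBL's simple call-and-response protocol (send a line, wait for "ok"); the port name, baud rate, and file name are assumptions for a typical setup:

```python
# A sketch of streaming G-code to a GRBL-based plotter over serial.
# Port name, baud rate, and file name are assumptions.
import serial  # pyserial

with serial.Serial("/dev/ttyACM0", 115200, timeout=5) as port, \
        open("drawing.gcode") as gcode:
    for line in gcode:
        line = line.strip()
        if not line or line.startswith((";", "(")):
            continue                          # skip blanks and comments
        port.write((line + "\n").encode())
        response = port.readline().decode().strip()
        if not response.startswith("ok"):     # GRBL answers "ok" or "error:N"
            print(f"GRBL said {response!r} for line {line!r}")
            break
```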

As well as having built that, my wife and I actually used the AxiDraw, the proper plotter, to start to create some art ourselves. There’s a lovely piece of software (software solves all problems, or a lot of them anyway) that enables us to take our images and decompose them into lines that the plotter can create. On the right-hand side of the image, you’ll see some pictures of lighthouses. Those are fairly low-quality digital camera images from around the Great Lakes in the U.S., where Heidi is from, and they were taken about 20 years ago. They weren’t high quality, but if you transform them into some plotter art, you get these quite nice effects.

On the left-hand side of the image are some things I came up with which are a bit more abstract, decomposed, circuit-like diagrams. We hung these up on the wall outside. If you met me during the event here, and I gave you a card, you got a small Andy Piper original on the back of the card. We had those outside. We had those up for sale. I put the hanging cluster inside the door, and people came through. It was the 60th anniversary of the best TV show in the world, which Jeremy knows all about, because he made the first computer game for it. I had that drawing on the wall as well. It was a little bit of fun. It definitely got people talking about what we were doing. We could draw them in and say, come and see a machine drawing things.

Where is the Art?

A lot of the visitors were much more interested. We had things on the wall outside the studio this time; we had something to talk about. One woman in particular came up while we were looking at the lighthouses and said to my wife, “Where’s the art in this?” My wife has a little bit less patience than me. I’ve been doing developer relations for 15 years, so I am hopefully a little bit more tactful.

Before she got too annoyed, I jumped in and started talking this lady through the process of choosing the image, working out which algorithm would be good for processing it into line art, the choice of pens and materials, actually putting it through the plotter, the whole process. It’s not the same as taking a photo on your phone of Big Ben, and going home and sending it to your photo printer and printing out 10 copies. Very few people print out copies of photos anyway, I think. It’s not the same: each one of those copies is a carbon copy, whereas each one of these is a unique thing. She seemed quite satisfied once she got through her inquiry and made her point. I think this is really interesting because it comes back to Frieder Nake and what was happening in 1970, when he found the traditional art world saying, no.

Contemporary Plotter Artists

One of my favorite 1960s art pieces is this 1968 piece by Georg Nees called Schotter, which means gravel in English. This was a period, the mid-1960s, when computers were not small, didn’t have rich graphical displays, and didn’t have easy-to-use input devices. I think this is quite a lovely thing. It’s a very simple algorithm: you draw a square, and then you repeat that square, adding a little bit of noise for each iteration, and you get this lovely collapsing effect that I find visually pleasing. Nees was interested in the relationship between order and chaos.

This piece is now in the collection at the Victoria and Albert Museum, so it must be art. This was entirely created using code. It’s often difficult today to take the programs from 1968, in this case, and rerun them. This was written in ALGOL. Of course, computer systems have moved on. You certainly wouldn’t be running something today at the same speed that a system was creating it in 1968.

Often, the compilers and the interpreters have gone away as well, and the input and output devices are completely different. It’s incredibly cool to me that you can download a Rust package called whiskers today, fire it up, and do that exact same thing in real-time, with sliders that let you modify all of the parameters, experiment, and see what the effects would be. Today we have programming environments like Processing and p5.js. There are packages in Rust and Python and others that let you do some incredibly fun things in an experimental way. There’s another strand here which I’m going to leave hanging and let you research on your own if you’re interested, around the preservation of our shared computer history and how we can preserve those original ideas.
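If you want to play with the idea without hunting down ALGOL, here is a sketch of the Schotter algorithm in Python that writes an SVG; the grid dimensions, noise scaling, and output format are my assumptions, not Nees's original parameters:

```python
# A sketch of the Schotter idea: a grid of squares whose random rotation
# and displacement grow with each row. All parameters are assumptions.
import math
import random

COLS, ROWS, SIZE = 12, 22, 20.0   # grid and square size

def square(cx, cy, size, angle):
    """Corner points of a square centred at (cx, cy), rotated by angle."""
    half = size / 2
    pts = []
    for dx, dy in [(-half, -half), (half, -half), (half, half), (-half, half)]:
        x = cx + dx * math.cos(angle) - dy * math.sin(angle)
        y = cy + dx * math.sin(angle) + dy * math.cos(angle)
        pts.append(f"{x:.2f},{y:.2f}")
    return " ".join(pts)

polygons = []
for row in range(ROWS):
    disorder = row / ROWS                 # noise grows as we descend
    for col in range(COLS):
        cx = (col + 1) * SIZE + random.uniform(-1, 1) * disorder * SIZE / 2
        cy = (row + 1) * SIZE + random.uniform(-1, 1) * disorder * SIZE / 2
        angle = random.uniform(-1, 1) * disorder * math.pi / 4
        polygons.append(
            f'<polygon points="{square(cx, cy, SIZE, angle)}" '
            f'fill="none" stroke="black"/>')

width, height = (COLS + 2) * SIZE, (ROWS + 2) * SIZE
with open("schotter.svg", "w") as f:
    f.write(f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">\n')
    f.write("\n".join(polygons))
    f.write("\n</svg>\n")
```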

I went to the V&A, and a lot of the pieces are not on permanent display. They do have cyclical exhibitions. Schotter, you can find it on their website. You get the information from the page. You send them an email, say, I’d like to see this piece, please, and you get invited to go and have a look. You’ve got to give them about a week’s notice. I read the piece of information about the original A4 size sheet. I sat in the library there at the V&A. The lady wheeled out a trolley with the items I’d ordered. This was one of them. This is, in fact, a 1-meter-high lithograph, because this was the display piece. This was the piece that was actually displayed in Montreal in 1972. It’s quite fascinating.

Down in the corner, you see this little detail, and you’ll find this across Nees’ pieces, because Nees, as it happened, worked for Siemens in the 1960s: a Siemens System 4004. This is a Siemens System 4004, as you can see, highly portable, very easy to tinker with. This was the computer in the 1971 movie, “Willy Wonka & the Chocolate Factory”, that helped to find the location of the final golden ticket; it was IBM System/360 compatible.

Let me show you a couple of other pieces from the V&A. This is one from a gentleman called Peter Struycken. This one’s from 1969. I love this one because we actually have the original code preserved here, and because this is QCon, let’s have a look at the code. I don’t know what language this is. I had a look. There are some familiar structures there. I cut and pasted it into Google Gemini. I said, “Can you help me figure out what language this is in?” It said, “It might be ALGOL, it might be Pascal, I can’t quite tell. Give me some more context”. I said, it was written by Peter Struycken. Gemini gave me the Wikipedia entry for him. It didn’t tell me how to rerun this code. This is another amazing piece from the collection.

This one is fascinating to me because it actually dates from 1962, as you’ll see. It’s from a British artist called Desmond Paul Henry. This is not from a programmable computer; it is from a mechanical computer, an army analog bombsight computer, actually, that he repurposed. He rebuilt that computer into three drawing machines, taking their swinging arm parts and attaching pens to them. I did like the texture, and you can see the line the Biro drew on the card. Let’s come back to the coding aspect. I’ve got another little bit to read for you here, because I do love the way that this has been written about.

Before I do that, I’ll point out that Georg Nees, Frieder Nake, who are two people I’ve mentioned, were only a small number of a group of pioneers that included folks like Vera Molnár, and they were exploring this aspect of code primarily from a mathematical standpoint, into art pieces.

As we move from the ’60s into the ’70s, we also then start to see more capable output devices. This book, “Tracing the Line”, was published just earlier this year, and covers a variety of contemporary plotter artists. The introduction has a lovely background, and I was just going to read you this part. “The first attempts at generative art date back to the 1960s where no graphic software existed, let alone a screen or computer mouse. Frieder Nake, one of the pioneers of this art form was 25 years old and studying mathematics at the University of Stuttgart, when he began experimenting with a ZUSE Graphomat Z64, a drawing machine that was a predecessor to modern plotters, capable of creating intricate graphics by writing programs.

These programs had to be transferred to punched cards, which were then processed by a big computer, which then came out with a punched tape, which was fed into the Graphomat to create the art pieces”. The ZUSE Graphomat supported the use of four pens. If you actually go and look at Frieder Nake’s work, which, again, you can do in the V&A, you’ll see that he uses four colors. Referring to conceptual art, Frieder Nake repeatedly said that the program itself was the artwork; the execution, the image, merely represented the surface. This is where the backlash comes in. We’ve got this group of young, 25-year-old mathematicians starting to use computers to generate things that people are getting interested in. The traditional art world said, “Witchcraft. Computers are not thinking machines. They must not be allowed to create art”.

Harold Cohen (Traditional Artist), and the AARON Program

There was a traditional artist, though, a British artist, in fact, called Harold Cohen. He was an existing artist. He had a career as a painter. He started to think about how computers could be applied to his existing practice. He taught himself to code, like me, actually, which is quite a nice parallel. He started to consider the choices that could be taken away from his artistic practice, like where to put the color or how to arrange lines by giving those choices to a program or a computer. He continued to impose rules on the output that the computer created. Cohen moved to the University of California in 1968, and he went through this transitional period in the ’60s and ’70s, where he was moving from working with mixed media, which is what he had previously done, to co-creating with the computer.

What you can see here is a detail of a piece that’s printed on dot matrix paper. I’ve taken a closer-up picture so you can see, here on the right-hand side, a pattern of numbers printed in a diamond shape, and then he’s gone in with a felt tip and drawn over clusters of those numbers to create the artwork. In 1970, there’s a remarkable sequence where Cohen uses code to draw these shapes and label them with the colors that he’s going to apply. He’s using some rules in the program to determine where the colors may go, and which colors may or may not touch one another. Then, on the top right, you can see where he’s inked that in.

The final piece at the bottom there is an artwork that he’s created in acrylic. He’s thinking about, all the time as he’s doing this, as he’s moved to California, starting to work with the systems they had and give up more of his artistic practice. He started to formulate this idea of a program to co-create with him. He asked this question, what are the minimum conditions under which a set of marks, functions as an image? Boiling it all the way back to the very basics, at what point is this an image?

He created a program, and he called this program AARON. He built machines that enabled AARON to do the drawing, as well as showing where things should go. One of these, interestingly, was a turtle. Jeremy was famously immortalized on the packaging for the turtle that the BBC had for the BBC Micro. I grew up in the 1980s. There was a programming language called Logo that you could run on the BBC Micro. You had a little digital turtle, and you gave it instructions like forward and left, and it would move around. Harold Cohen’s turtle moved around the floor of the gallery, and drew shapes and painted, which was very cool, I think. I love this connection, unknown to me until I did the research. AARON started out written in C.

Cohen rewrote it into Lisp because he got frustrated with C’s limitations. By the mid-1990s, Harold Cohen was giving the program the ability to paint for him or apply the paint for him as well. You can see it’s a bit splotchy, but this was, I think, from 1995. This was mind blowing to me, because just before Christmas, so around November last year, I went to an exhibition and a workshop with a lady called Licia He, who is a contemporary plotter artist. She has got an AxiDraw, but she has converted it not to drive a pen around, but to actually move around and dip a brush into paint and draw and apply the paint and ink to the paper.

That’s a bit more complicated than a pen, because you need to know how much paint to load up on the brush. You need to know how much you can apply before the paper goes soggy and wears through. You need to be able to wash the brush in between colors. I thought this was brilliant and brand new, and it is brilliant. Licia is a phenomenal artist, but not new. This has happened before. This is actually Licia’s setup that I saw.

If you Google, Harold Cohen, AARON, then you will find that AARON is often called an artificial intelligence. I think that’s a very interesting thing to stop and think about, given the current hype cycle that we’re caught up in. Cohen said this, “If what AARON is making is not art, what is it exactly, and in what other ways beyond its origin, other than its origin, does it differ from the real thing? If it is not thinking, what exactly is it doing?” You can go and look at Harold Cohen’s work. There is a small gallery called the Gazelli Art House over in Mayfair, just close to Green Park station. You can walk in, you can look at some of the original pieces of work. If you’d like to see his work with AARON, that’s currently on display in New York at the Whitney by the High Line, the American Art Museum there.

How to Be Creative

I’m going to have to skip through 20 years of computer art. I did want to give you that background about the 1960s and 1970s, and give you a sense of what’s been happening, probably outside of most of our domains. Let’s get back to being creative today, and round out by looking at how we can be creative. A pen plotter is not an inkjet. An inkjet or a laser printer is going to reproduce things for us. We know how annoying printers are today; they’re probably the most annoying pieces of technology in most people’s lives. You’re going to need a plotter. We’ve talked about several types: there’s the BrachioGraph, there’s the polargraph, there’s the x-y plotter. You’re going to need some line art. There’s software that enables you to convert bitmap to vector art; there’s a piece of software called DrawingBot that’s written in Java. It’s desktop software. You’re going to need some materials.

Materials are where things get really exciting, because you can choose the different weights and textures and colors of paper. You can choose whether you want to use fineliners, or fountain pens, or Sharpies, or metallic ink for whatever you create. As you build up those lines on your piece of art, as you watch the plotter move and build up the ink, every single pen stroke is unique. It’s a little bit unpredictable every time exactly how it’s going to lay down on the paper. This is a piece I made for the art show last November, and it’s all printed on cotton rag paper. This is a Cistercian numeral. The Cistercian monks in the 13th century had a numeric system that enabled them to write any number as a single character like this, depending on where the lines appeared on the stave. This number is 1984. I drew this using a plotter, with sepia ink on cotton rag paper. The cotton rag paper is not even.

A plotter really wants your surface to be completely flat. This one requires a bit of babysitting, because if you’re not fully flat, then you might end up dragging, and things like that. It’s really quite an interesting, tangible process. Growing up in the 1980s, I occasionally saw a plotter in an office or at school, but then they fell out of use, and along came inkjets and laser printers, and we all got them at home. That means nobody wants them anymore, so you can get them on eBay. This is my current project. It’s a Roland DXY-1100. It’s an A3 plotter, and it takes up to eight different colors of pen. You need to figure out how to plug it into your modern computer. This uses a 25-pin serial interface. You need to wire that up to a USB.

Fortunately, somebody’s written a Python library that lets you talk to this, which is very handy. The little purple adapters on the left-hand side are ones I’ve had to print for the pens. This was the first example of the big ink manufacturers trying to lock us into their devices, so that you got these tiny, stubby pens that would only go in these plotters. Now I’m 3D printing my own adapters for modern pens.

Code = Art

Am I a technologist, am I a historian, or am I an artist? I’m a coder. I write code. I’m sure many, if not most of us, do the same. Code can transform art. Code can create art. Code can be art. Computers and technology have, since the ’60s, at least, been at the heart of tension in our society, and certainly with the art world. The nature of art, I think, to me, is to comment on that tension and the things that we see in our society and the experiences we have. Isn’t tension an element of art? There’s a link, andypiper.url.lol/wita. It’s just a simple little page. There’s a lot of links on there, if you want to go and explore any of the elements I’ve spoken about.

Conclusion

This is something I plotted out. It sits on the wall inside the studio. It’s Georg Nees’ artist statement from his 1972 portfolio in Montreal. “Computer art is sort of artificial genetics. Its DNA is on punched cards. Information originally emanating from the brains of programmers, yet to be mutated and augmented in complex ways by dice-gaming computers, emerging finally into the environment of rejecting, and/or, as one may observe, promoting culture”. We live in an amazing time. Jeremy said, we think we’re in the technology business, but we’re actually in the people business. I love that. We are physical beings. That’s really important as well.

We live at a time of ephemerality, and fleeting digital moments. A single line of code might not be unique, a few lines of code might be more unique, but every stroke of a pen or a brush is unique. Making something digital is fun, but mostly ephemeral. Making something tangible and physical has the opportunity to endure. Go create. Go take the opportunity of all of the open-source software and hardware and beautiful code that we have, and make something wonderful.

See more presentations with transcripts



Google Releases PaliGemma 2 Vision-Language Model Family

MMS Founder
MMS Anthony Alford

Article originally posted on InfoQ. Visit InfoQ

Google DeepMind released PaliGemma 2, a family of vision-language models (VLM). PaliGemma 2 is available in three different sizes and three input image resolutions and achieves state-of-the-art performance on several vision-language benchmarks.

PaliGemma 2 is an update of the PaliGemma family, which was released in 2024. It uses the same SigLIP-So400m vision encoder as the original PaliGemma, but upgrades to the Gemma 2 LLM. The PaliGemma 2 family contains nine different models, combining LLM sizes of 2B, 9B, and 27B parameters with vision encoder input resolutions of 224, 448, and 896 pixels square. The research team evaluated PaliGemma 2 on a variety of benchmarks, where it set new state-of-the-art records, including optical character recognition (OCR), molecular structure recognition, and radiography report generation. According to Google:

We’re incredibly excited to see what you create with PaliGemma 2. Join the vibrant Gemma community, share your projects to the Gemmaverse, and let’s continue to explore the boundless potential of AI together. Your feedback and contributions are invaluable in shaping the future of these models and driving innovation in the field.

PaliGemma 2 is a combination of a pre-trained SigLIP-So400m image encoder and a Gemma 2 LLM. This combination is then further pre-trained on a 1B-example multimodal dataset. Besides the pre-trained base models, Google also released variants that were fine-tuned on the Descriptions of Connected and Contrasting Images (DOCCI) dataset, a collection of images and corresponding detailed descriptions. The fine-tuned variants can generate long, detailed captions of images, which contain “more factually aligned sentences” than those produced by other VLMs.

Google created other fine-tuned versions for benchmarking purposes. The benchmark tasks included OCR, table structure recognition, molecular structure recognition, optical music score recognition, radiography report generation, and spatial reasoning. The fine-tuned PaliGemma 2 outperformed previous state-of-the-art models on most of these tasks.

The team also evaluated performance and inference speed for quantized versions of the model running on a CPU instead of a GPU. Reducing the model weights from full 32-bit precision to mixed-precision quantization showed “no practical quality difference.”

In a Hacker News discussion about the model, one user wrote:

Paligemma proves easy to train and useful in fine-tuning. Its main drawback was not being able to handle multiple images without being partly retrained. This new version does not seem to support multiple images as input at once. Qwen2vl does. This is useful for vision RAG typically.

Gemma team member Glenn Cameron wrote about PaliGemma 2 on X. In response to a question about using it to control a robot surgeon, Cameron said:

I think it could be taught to generate robot commands. But I wouldn’t trust it with such high-stakes tasks…Notice the name of the model is PaLM (Pathways Language Model). The “Pa” in PaliGemma stands for “Pathways”. It is named that because it continues the line of PaLI  (Pathways Language and Image) models in a combination with the Gemma family of language models.

InfoQ previously covered Google’s work on using VLMs for robot control, including Robotics Transformer 2 (RT-2) and PaLM-E, a combination of their PaLM and Vision Transformer (ViT) models.

The PaliGemma 2 base models as well as fine-tuned versions and a script for fine-tuning the base model are available on Huggingface. Huggingface also hosts a web-based visual question answering demo of a fine-tuned PaliGemma 2 model.
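As a sketch of what inference might look like with the transformers library (the checkpoint name, prompt, and image file here are assumptions; the model card on Huggingface has the authoritative usage):

```python
# A sketch of running a PaliGemma 2 checkpoint with transformers.
# The checkpoint name, prompt, and image file are assumptions.
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from PIL import Image

model_id = "google/paligemma2-3b-pt-224"   # assumed: 2B-class LLM, 224px encoder
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("example.jpg")          # any local test image
prompt = "caption en"                      # task-prefix style prompt
inputs = processor(text=prompt, images=image, return_tensors="pt")

output = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(output[0], skip_special_tokens=True))
```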



Nvidia Announces Arm-Powered Project Digits, Its First Personal AI Computer

MMS Founder
MMS Sergio De Simone

Article originally posted on InfoQ. Visit InfoQ

Capable of running 200B-parameter models, Nvidia Project Digits packs the new Nvidia GB10 Grace Blackwell Superchip to allow developers to fine-tune and run AI models on their local machines. Starting at $3,000, Project Digits targets AI researchers, data scientists, and students, allowing them to create models on a desktop system and then deploy them on cloud or data center infrastructure.

Nvidia Grace Blackwell brings together Nvidia’s Arm-based Grace CPU and Blackwell GPU with the latest-generation CUDA cores and fifth-generation Tensor Cores connected via NVLink®-C2C. A single unit will include 128GB of unified, coherent memory and up to 4TB of NVMe storage.

According to Nvidia, Project Digits delivers up to 1 petaFLOP of performance at 4-bit floating-point (FP4) precision, which means you can expect that level of performance for inference using quantized models but not for training. Nvidia has not disclosed the system’s performance for 32-bit floating point or provided details about its memory bandwidth.
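
Simple back-of-the-envelope arithmetic shows why 4-bit quantization is what makes a 200B-parameter model plausible within 128GB of unified memory. The sketch below is illustrative only and ignores the KV cache, activations, and runtime overhead:

```python
# Back-of-the-envelope memory math for fitting model weights in 128GB.
# Illustrative only: ignores KV cache, activations, and runtime overhead.
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for bits in (16, 8, 4):
    gb = weight_memory_gb(200, bits)
    fits = "fits" if gb <= 128 else "does not fit"
    print(f"200B params at {bits}-bit: ~{gb:.0f} GB -> {fits} in 128GB")
# 16-bit: ~400 GB, 8-bit: ~200 GB, 4-bit: ~100 GB
```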

The announcement of Project Digits led some developers to ponder whether it could be a preferable choice to an Nvidia RTX 5090-based system. In comparison to a 5090 GPU, Project Digits has the advantage of coming in a compact box and not requiring the huge fan used on the 5090. On the other hand, the use of low-power DDR5 memory on Project Digits seems to imply reduced bandwidth compared to the 5090’s GDDR7 memory, which further hints at Project Digits being optimized for inference. However, lacking final details, it is hard to say how the two solutions compare performance-wise.

Another interesting comparison that has been brought up is with Apple’s M4 Max-based systems, which may pack up to 196GB of memory and are thus suitable for running large LLMs for inference. Here there seem to be more similarities between the two systems, including the use of low-power DDR5X unified memory, so Nvidia seems to be aiming, among other things, to provide an alternative to that kind of solution.

Project Digits will run Nvidia’s own Linux distribution, DGX OS, which is based on Ubuntu and includes an Nvidia-optimized Linux kernel with out-of-the-box support for GPU Direct Storage (GDS). Nvidia says the first units will be available in May this year.



AJ Styles comments on his recovery from injury, Chelsea Green wants Matt Cardona in WWE

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

On the October 4th, 2024 edition of SmackDown, AJ Styles suffered an injury in a match against Carmelo Hayes. Styles later revealed that he had suffered a “mid foot ligament sprain”. A fan recently asked Styles if he could provide an update on his injury, and the response wasn’t as positive as we all would have hoped.

Chelsea Green expressed her desire to see her husband, Matt Cardona, return to WWE. She said “I want to see Matt in WWE, honestly more than anything else, anything else that I even could want out of my career. I feel guilt because first of all, he supports me like no other. He’s so happy for me. He watches everything I do. He’s at shows when I’m winning championships. But at the end of the day, I go home and I know that this was his dream. I joke with you about the fact that I googled how to be a WWE Diva, but he didn’t. He literally came out of the womb wanting to be a WWE Superstar. So I just want him so badly to come back and have that final closure, that ending that he so deserves as, I mean, he was with WWE for a very, very, very long time. I think the fans want it too. Like, I don’t want to speak for anyone, but I just, I get a lot of people asking, you know, when’s he coming back? When’s he coming back? Gosh, I would love, love, love to see him back.“

Article originally posted on mongodb google news. Visit mongodb google news



How to create realistic, safe, document-based test data for MongoDB – Security Boulevard

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

An Overview of MongoDB

MongoDB is a NoSQL database platform that uses collections of documents to store data rather than tables and rows like most traditional Relational Database Management Systems (RDBMS). It derives its name from the word ‘humongous’ — ‘mongo’ for short. It is an open source database with options for free, enterprise, or fully managed Atlas cloud licenses.

Development on MongoDB began as early as 2007 with plans to release a platform as a service (PaaS) product; however, the founding software company 10gen decided instead to pursue an open source model. In 2013, 10gen changed its name to MongoDB to unify the company with its flagship product, and the company went public in 2017.

MongoDB was built with the intent to disrupt the database market by creating a platform that would ease the development process, scale faster, and offer greater agility than a standard RDBMS. Before MongoDB’s inception, its founders — Dwight Merriman, Kevin P. Ryan, and Eliot Horowitz — were founders and engineers at DoubleClick. They were frustrated with the difficulty of using existing database platforms to develop the applications they needed. MongoDB was born from their desire to create something better.


As of this writing, MongoDB ranks first on db-engines.com for document stores and fifth among all database management systems.

Being document-based, Mongo stores data in JSON-like documents of varying sizes that mimic how developers construct classes and objects. MongoDB’s scalability can be attributed to its ability to define clusters with hundreds of nodes and millions of documents. Its agility results from intelligent indexing, sharding across multiple machines, and workload isolation with read-only secondary nodes.
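
As a quick illustration of how that indexing shows up in practice, the pymongo sketch below creates an index and then asks the query planner whether it is used. The database, collection, and field names are hypothetical:

```python
# Indexing in MongoDB with pymongo: create an index, then confirm via the
# query planner that it is used. Names here are hypothetical examples.
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

orders.insert_many([{"customer_id": i % 100, "total": i * 1.5} for i in range(1000)])
orders.create_index([("customer_id", ASCENDING)])

# explain() reveals whether the planner chose an index scan (IXSCAN)
# instead of a full collection scan (COLLSCAN).
plan = orders.find({"customer_id": 42}).explain()
print(plan["queryPlanner"]["winningPlan"])
```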


Challenges Creating Test Data in MongoDB

While the ease of creating documents to store data in MongoDB is valuable for development purposes, it entails significant challenges when attempting to create realistic test data for Mongo. Unlike traditional RDBMS platforms with predefined schemas, MongoDB functions through JSON-like documents that are self-contained with their own individual definitions. In other words, it’s schema-less. The elements of each document can develop and change without requiring conformity to the original documents, and their overall structure can vary. Where a field contains a string in one document, the same field in another document may hold an integer.
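
To make that concrete, here is a small pymongo sketch (database, collection, and field names are hypothetical) showing how one collection can hold documents whose shared field differs in type and nesting, which is exactly the variability a test-data tool has to discover:

```python
# Demonstrates MongoDB's schema flexibility: the same field can hold
# different types in different documents of one collection.
# Assumes a local mongod and pymongo installed; names are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
patients = client["demo"]["patients"]

patients.insert_many([
    {"name": "Ada", "zip": "02139", "notes": "Follow up in 2 weeks."},    # zip as string
    {"name": "Grace", "zip": 10001, "notes": ["allergic to penicillin"]}, # zip as int, notes as array
    {"name": "Alan", "contact": {"zip": "SW1A 1AA"}},                     # zip nested, no notes
])

# Tally the type of "zip" across documents: a tiny version of the
# schema discovery a test-data generator must perform.
for doc in patients.find():
    value = doc.get("zip", doc.get("contact", {}).get("zip"))
    print(doc["name"], type(value).__name__)
```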

The JSON file format itself introduces its own level of complexity. JSON documents have great utility because they can be used to store many types of unstructured data from healthcare records to personal profiles to drug test results. Data of this type can come in the form of physician notes, job descriptions, customer ratings, and other formats that aren’t easy to quantify and structure. What’s more, it is often in the form of nested arrays that create complex hierarchies. A high level of granularity is required to ensure data privacy when generating test data based on this data, whether through de-identification or synthesis. If that granularity isn’t achieved, the resulting test data will, at best, fail to accurately represent your production data and, at worst, leak PII into your lower environments.

A high degree of privacy paired with a high degree of utility is the gold standard when generating test data based on existing data. Already it can take days or weeks to build useful, safe test data in-house using a standard RDBMS. The variable nature of MongoDB’s document-based data extends that in-house process considerably. It’s the wild west out there, and you’d need to build a system capable of tracking every version and format of every document in your database to ensure that nothing is missed—a risky proposition.

It’s also worth noting that there aren’t many tools currently available for de-identifying and synthesizing data in MongoDB. This speaks to the challenges involved—challenges we’re gladly taking on.

Solutions for Mimicking Document-based Data with Tonic

Safely generating mock data in a document-based database like MongoDB requires best-in-class tools that can detect and locate PII across documents, mask the data according to its type (even when that type varies within the same field across different documents), and give you complete visibility so you can ensure no stone has been left unturned.

De-identifying a MongoDB collection in Tonic

At Tonic, we provide an integrated, powerful solution for generating de-identified, realistic data for your test environments in MongoDB. For companies working with data that doesn’t fit neatly into rows and columns, Tonic enables aggregating elements across documents to realistically anonymize sensitive information while providing a holistic view of all your data in all of its versions. Here are a few ways we accomplish this goal:

  • Schema-less Data Capture: For document-based data, Tonic builds a hybrid document model to capture the complexity of your data and carry it over into your lower environments. Our platform automatically scans your database to create this hybrid document, capturing all edge cases along the way, so you don’t miss a single field or instance of PII.
  • Granular NoSQL Data Masking: With Tonic, you can mask different data types using different rules, even within the same field. Regardless of how varied your unstructured data is, you can apply any combination of our growing list of algorithm-based generators to transform your data according to your specific field-level requirements, as illustrated in the sketch after this list.
  • Instant Output Preview: After applying the appropriate generators to your data, you can preview the masked output directly within the Tonic UI. This gives you a complete and holistic view of the data transformation process across your database.
  • Cross-database support: Achieve consistency in your test data by working with your data across database types. Tonic matches input-to-output data generated across your databases from MongoDB to PostgreSQL to Redshift to Oracle. Our platform connects natively to all of your database types to consistently and realistically de-identify your entire data ecosystem.
  • De-identify non-Mongo NoSQL data too: You can use Tonic with Mongo as a NoSQL interface, to de-identify NoSQL data stored in Couchbase or your own homegrown solutions. By using MongoDB as the go-between, Tonic is able to mask a huge variety of unstructured/NoSQL data.
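
As a rough illustration of the granular, type-aware masking described above, the sketch below walks a collection and applies a different rule depending on each value’s runtime type, recursing into nested documents and arrays. It is a toy stand-in with hypothetical names, not Tonic’s actual API:

```python
# Toy type-aware masking pass over a MongoDB collection.
# Not Tonic's API -- just an illustration of masking by runtime type,
# including recursion into nested documents and arrays.
import hashlib
from pymongo import MongoClient

def mask(value):
    if isinstance(value, str):
        # Replace strings with a deterministic, irreversible token.
        return "str_" + hashlib.sha256(value.encode()).hexdigest()[:8]
    if isinstance(value, int) and not isinstance(value, bool):
        # Shift integers into a fixed synthetic range.
        return value % 1000
    if isinstance(value, list):
        return [mask(v) for v in value]
    if isinstance(value, dict):
        return {k: mask(v) for k, v in value.items()}
    return value  # leave other types (dates, bools, ...) untouched

client = MongoClient("mongodb://localhost:27017")
src = client["demo"]["patients"]
dst = client["demo"]["patients_masked"]

for doc in src.find():
    masked = {k: mask(v) for k, v in doc.items() if k != "_id"}
    dst.insert_one(masked)
```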

We’re proud to be leading the industry in offering de-identification of semi-structured data in document-based databases. Are you ready to start safely creating mock data that mimics your MongoDB production database? Check out a recording of our June launch webinar, which includes a demo of our Mongo integration. Or better yet, contact our team, and we’ll show you the ropes live.

*** This is a Security Bloggers Network syndicated blog from Expert Insights on Synthetic Data from the Tonic.ai Blog authored by Expert Insights on Synthetic Data from the Tonic.ai Blog. Read the original post at: https://www.tonic.ai/blog/how-to-create-realistic-safe-document-based-test-data-for-mongodb



Podcast: Building Green Software with Anne Currie and Sara Bergman

MMS Founder
MMS Anne Currie Sara Bergman

Article originally posted on InfoQ. Visit InfoQ

Transcript

Thomas Betts: What does it mean to be green in IT? That’s the question that begins chapter one in Building Green Software. Today I’m joined by two of the book’s authors. Anne Currie has been a passionate leader in green tech for many years. She’s a community co-chair of the Green Software Foundation, as well as lead of the GSF’s Green Software Maturity Matrix project. Sara Bergman is a senior software engineer at Microsoft. She’s an advocate for green software practices at Microsoft and externally, and an individual contributor to the Green Software Foundation. Unfortunately, their co-author, Sarah Hsu, was unable to join us today. Anne and Sara, welcome to The InfoQ Podcast.

Anne Currie: Thank you very much for having us.

Sara Bergman: Thank you for having us. Excited to be here.

Thomas Betts: I guess I should say welcome back Anne. You’ve been on before, but it’s been three or four years probably. You talked with Wes and Charles in the past, so welcome back.

What does it mean to be green in IT? [01:10]

Thomas Betts: So, let’s start with that opening question from your book: what does it mean to be green in IT? Anne, you want to start?

Anne Currie: It’s an interesting question. What does it mean to be green in IT? The O’Reilly book came out in April and is going well so far. People generally like it. It gives a good overview of what we’ll need to do. The good thing about writing a book is you learn a lot about the subject while you are writing the book. Going into the book, so when we started on chapter one, I think our view of what is green software is software that produced the minimum amount of carbon to operate it. So, without dropping any of the functionality, without dropping any of the SLAs, it’s produced less carbon, and that usually means it’s more efficient, uses fewer servers to do the job, and/or it runs when the sun’s shining or the wind’s blowing, so there’s more renewable power available.

But I think by the end, my thoughts on it had changed quite a lot. And nowadays I tend to think that green software is software that is optimized to run on renewable power because that is the power of the future. That is the only way we’re really going to get green in the future. So, all of those things, being more efficient, being more able to shift and shape work to when the sun is shining or the wind is blowing, it reduces carbon, but it is what is required to run on renewable power. That’s a little bit of a confusing answer, so Sara, what do you think?

Sara Bergman: I think it’s good… It explains also what our minds went through when we wrote the book, so that’s good. Or I don’t know, should people have a closer insight to our minds? I don’t know if that would be good or confusing, but anyway, no, I agree with your answer. I do think, because we’re on a journey, the energy transitions are moving from these fossil fuel, always-on energy sources that have dominated the energy sector, well, they’re going away slowly and being replaced by renewables. We aren’t quite there yet though, so it still matters. The other things still matters. Energy efficiency matters. It’s going to matter less, I dare say, once the energy transition is complete.

Then these shifting capabilities are going to come into full swing. I also think hardware efficiency is fascinating, so that means doing more with less, being very mindful of the carbon debt that lands in your hand when you get a shiny new device. And there are many ways we can do this, and people are already doing it.

Fundamental principles of green computing [03:36]

Thomas Betts: I think the two of you hit on a couple different themes, and maybe we should dive into these and explain what the terms mean. So, we talked a little bit about just energy efficiency overall, but also I think, Sara, you mentioned hardware and hardware efficiency, and there’s an aspect of carbon awareness that you discuss in the book. So, what are those fundamental principles of green computing?

Energy efficiency [03:57]

Anne Currie: Well, shall I do efficiency and you do hardware efficiency, and then we’ll both talk a little bit about carbon awareness? These are our specialist subjects. So, carbon efficiency is about producing the same functionality with fewer watts required, which really means with fewer CPU cycles, because CPU is very correlated with energy use in servers. So, a more efficient system does the same stuff for you… well, uses less power to do it. So, that’s incredibly useful at times when the sun isn’t shining and the wind isn’t blowing. When you’re running off non-renewables, when you’re running off fossil fuels, you want to be using less power to produce the same output. And there are various ways you can do it.

You can do it through operational efficiency, so just being a bit cleverer about the way you use systems so they’re not, for example, over-provisioned. And you can also do it through code efficiency, so not running 10 lines of code when one would do. That’s a much more controversial and difficult subject. We’ve got a whole chapter on it in the book, but in addition to that, we’ve got hardware efficiency.

Hardware efficiency [05:03]

Sara Bergman: Yes. In case you haven’t thought about it, hardware is a very labor-intensive and carbon-intensive thing to produce, because minerals are mined all over the world, shipped to other places, and assembled, sometimes in very energy-intensive ways, before they land in your hand or in your server hall or in your closet where you keep servers, I don’t know. I don’t judge. But it’s a significant carbon debt, and that’s already paid. Once you get it and it’s brand new, you’ve paid the substantial… or the environment’s paid a substantial cost for that to happen. So, we want to be mindful of that. One of the best things that we can do is to hold onto it for longer, so that we amortize this cost over time and postpone the additional cost of buying a new shiny thing. And if you are a software person, and I do identify as a software person, it’s easy to say, “Well, that’s not my problem. It sounds like a hardware person’s problem”.

And they are certainly aware of it, and certainly thinking about it and working on it, but we as software people are not off the hook, because the type of software that we run highly impacts the type of hardware that we have. If you just look at any enterprise software today, you could not run it on a machine that’s 30 years old, for example, because machines have massively improved. So, they move in lockstep: we get better machines with more complicated software, and so forth. Also on the client side, and now I’m thinking PCs, smartphones, smart TVs, for example, there we see that the manufacturing cost in terms of carbon is actually much higher than the use-time cost. So, it’s very, very important to be mindful of that cost. And you can do that in multiple ways, and that’s a whole chapter in itself, but I think that’s the short version.

Anne Currie: A lot of the terms in this whole area are very confusing: that carbon that you have to pay out upfront is sometimes called embodied carbon and sometimes called embedded carbon, but it is basically the same. It’s the theoretical amount of carbon that was emitted into the atmosphere when your device was created.

Sara Bergman: And it’s basically… I believe it’s a borrowed expression. It comes from the building industry. They use the same terminology.

Anne Currie: I didn’t realize that. That’s good. And one of the things that’s quite interesting is in the tech industry, when you’re talking about consumer devices like phones, it’s all embodied carbon. That’s a major issue. Huge issue with embodied carbon in phones and user devices, but in data centers, embodied carbon is not the main problem in data centers because the hardware data center is really well managed compared to how you manage your devices at home. You don’t keep them as highly utilized. You don’t keep them as long. In data centers, it’s mostly about the electricity used to power your servers, how that was generated. That’s the biggest cause of carbon emissions into the atmosphere.

Carbon awareness [08:05]

Sara Bergman: So, carbon awareness I think is the third thing we mentioned that we should maybe dive into. But it sounds so sci-fi. Whenever I mention it to an unsuspecting crowd, I’m like, “Yes, now we’re going to go off the deep end”. But it really isn’t super complicated. So, the thing is energy can be produced in different ways. We all know that. Most countries don’t have one source, they have multiple. So, the grid mix varies throughout the day, throughout the year, and over the course of years as the grid evolves. And carbon awareness is adapting to this fluctuation. It’s about listening in real time: what energy do I have right now? Or listening to forecasts, and then planning how you operate or what you serve to customers based on that.

Anne Currie: Yes, it’s interesting, almost everybody talks mostly about efficiency, and that’s a good place to start. It’s usually the easiest place to start. It’s not hard for people to just go through, and do fairly manual stuff to do with sorting out provisioning and cut their carbon emissions by 50%. That’s really not that difficult to do. Carbon awareness is more difficult. It requires you to design your systems, to architect your system, so you have parts of your system that, particularly CPU-heavy parts of your system, workloads that can be shifted in time that can be delayed to when the sun comes up or when the wind starts blowing again.

So, in many ways that’s very hard, but in the long term that is absolutely where all the win is because power that’s generated from photovoltaic cells, from wind farms, if you can use it directly, is going to be 10 times cheaper, even than fossil fuels. So the wind, if you can find a way to use it, is you will get electricity that’s 10 times cheaper, but it’s a longer term planning exercise to work out how you’re going to produce systems that can shift work, take advantage of that.

What can engineers do to make software more green? [09:59]

Thomas Betts: So, this is what happens with every time I talk about green software, you go into 17 different aspects that we all need to consider and it kind of becomes overwhelming. And it’s easy for someone to say, “Well, it’s embedded carbon in the iPhone, and so that’s clearly not my problem. I’m just an app developer, and so nothing I do makes a difference”, or, “It’s up to the data centers because they’re the ones that are actually plugged in, and are they running on fossil fuels or solar?” So, as the lowly developer or architect, how can you convince those people, no, what you’re doing does make a difference?

Anne Currie: Well, that’s good because it really does. AWS, Azure I think use the same terminology. They talk about it as a shared responsibility model. Even in the clouds where they’re putting tons of effort into providing services that… I know that you hear pros and cons of the cloud. And a big company is not one thing. There’s loads of things going on in their big company, but they do want to produce systems that will in the long run deliver zero carbon power because they want some of that 10X cheaper electricity too, thank you very much. So, they want us to produce systems that will run on that cheap green power, but as developers, you have to use those systems. You have to build your applications to run on those offerings that are effectively green platforms that can provide that.

If you don’t, if you run, if you just lift and shift from your data center, from your on-prem data center, for example, into the cloud and you just run on a dedicated server, you really don’t get that much benefit. You get some benefits, you don’t get that much benefit and the clouds will never be able to be green based on that. In order to be green, you have to start modernizing. You have to start using those services like spot instances, the right kind of instance type in order to get the benefit of the investment that they’re making, serverless, that kind of thing. Both sides are going to have to step up, and if either side doesn’t, you don’t end up with a very good solution.

Sara Bergman: Yes, and I think at the very least, use your consumer power. Do your research before choosing where you host your software. And continue to use your consumer power to apply necessary pressure on the topics you’re interested in. That goes for everything, of course. And don’t eat the elephant in one bite. Is that the saying?

Anne Currie: Oh yes, don’t try and eat the elephant in one bite. Yes, that is the saying.

Sara Bergman: Sorry, the two native English speakers need to correct the Swedish person. I don’t even know what the Swedish equivalent would be. Anyway, so don’t eat the elephant in one bite. Try to break it down. I think most people who have any piece of software, you know where your inefficiencies are. You know the thing you’re doing that’s kind of a little bit stupid, but you’re doing it anyways because of reasons. So, if you can find, for example, a sustainability argument to motivate fixing it, I think you’re very, very likely going to be able to find other reasons for doing it as well. Maybe this is also cheaper. Maybe this also makes your service more reliable. Maybe it increases performance, lowers latency. So, bundle those arguments together. Sadly, most people, if you come to them and say, “Hey, we’re going to make this 5% greener”, they’re like, “Okay, you also have to deliver other value”.

Publicly talk about making software more green [13:23]

So, bundle it together with other things. And then… Maybe this is a… I don’t know if it’s a secret tip or not, but I think there’s a lot of goodwill to be had around publicly talking about this. Say you did something, you could see that it decreased your emissions, and you go and talk about it. It can be a positive story. So, even if you are being carbon efficient, delivering the same value for less carbon, that value can still be that you can talk about it externally. Say, “This is what we did”, and your customers will think, “Hey, what a nice company. I want to buy their stuff because I want to be green too”, because consumers want to be green.

Thomas Betts: Yes, a lot of corporate sustainability reports are talking about, “Here’s our carbon impact. We’re trying to go carbon neutral”. And it’s all playing with the numbers. There are ways of manipulating it so that you look good in a report, but it is for that public campaign saying, “Hey, this is important to us and we’re making strides to reduce our footprint”. And it started with, “Okay, we’re just going to make sure the lights are turned off in the building”. And then it turns out everyone went home for COVID and they just never turned the lights back on in the building, so there, big win. But now it is looking at the, what is it, the second and third order effects. We’ve now put all of our stuff in the cloud, so we have to understand: what is the carbon impact of our software running in the cloud, as opposed to the data center where it’s a little easier to manage and measure?

Anne Currie: There is one downside of advertising and talking about what you’re doing, although I really like it when people do. It’s fantastic when they do. If you go into the ESG area, they are very keen on getting really accurate numbers. The trouble is that a lot of the time it’s really quite hard to get accurate numbers at the moment. And therefore they kind of go, “Well, if I can’t get the accurate numbers, I’m not even going to start”. But there are significant knock-on benefits to being green even if you don’t have the numbers. There are things you can do that are kind of proxy measures for being green and are worthwhile in and of themselves.

So, I often say, “Well, your hosting bill isn’t a perfect proxy metric for carbon emissions, but it’s not bad and it’s a really good place to start”. The very basic things that you start with to clean up your systems, like turning off stuff that you’re not using anymore or that just has low value, or right-sizing everything that’s over-provisioned, have an immediate impact on your hosting bill.

And I always think that in terms of green, if it isn’t at least a double win, it’s probably not green. So, to start with, the double win is you cut your cost. And then as you get a bit more sophisticated, the way that you then cut your carbon again is usually by using more modern resilience techniques like autoscaling. Well, there’s a triple win there: you cut your hosting costs, you cut your carbon emissions, but most importantly you get a more resilient system, because autoscaling is a lot more resilient than having a cold or even a hot backup somewhere. So again, you’re looking at that double win. And if you don’t have the double win, ironically it’s probably not green, because it won’t scale.
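
The autoscaling point maps onto a simple control rule. Kubernetes’ Horizontal Pod Autoscaler, for example, computes desired replicas from the ratio of observed to target utilization; the formula below matches the documented HPA behavior, while the numbers around it are illustrative:

```python
# The core scaling rule behind Kubernetes' Horizontal Pod Autoscaler:
# desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
# Fewer idle replicas means less wasted energy and lower cost.
from math import ceil

def desired_replicas(current_replicas: int, current_util: float,
                     target_util: float = 0.6, max_replicas: int = 20) -> int:
    desired = ceil(current_replicas * current_util / target_util)
    return max(1, min(desired, max_replicas))

# Nightly lull: 10 replicas at 12% utilization scale down to 2.
print(desired_replicas(10, 0.12))  # -> 2
# Morning peak: 2 replicas at 95% utilization scale up to 4.
print(desired_replicas(2, 0.95))   # -> 4
```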

Proxy measures for how green you are [16:26]

Thomas Betts: You mentioned there were several different proxies. So, the bill is the one, what else can we look at for some sort of measure to say, “Am I green?” beyond just the bill that I’m getting?

Anne Currie: Well, I do quite like the most obvious proxy other than the bill. The bill is the nice easy one to start with. The more sophisticated one is performance, because as we said, each CPU cycle costs you time, and it costs you money, and it costs you carbon emissions. So, basically if you can improve the performance of your system, as Sara said, if you can just reduce those niggles, all those performance bugs that you kind of always meant to fix but never really got round to fixing. Fix those performance bugs, and you go faster and you release less carbon into the atmosphere.

Sara Bergman: The reason why that works is because you’re then cramming more work into your CPU in a shorter amount of time, meaning you might not need two instances or three instances. You can scale down. Now, if you’re in the cloud, that’s great, because someone else can use that hardware. If you’re on-prem, then you might say, “Oh, but I already have all those machines”. However, you want to think forward as well, into the future in terms of capacity planning. You may not need to buy additional servers if you’re using the ones you have more frugally. So, you might be able to even turn some off and save them for a bit. That’s why performance is a good metric. Don’t just speed things up for the sake of it. You speed things up so that you can reduce your footprint.

Green does not require the cloud [17:59]

Thomas Betts: I like that you brought up the on-prem versus in the cloud because I think there’s some mindset that, “Oh, I have to go to the cloud or we can’t be green”, and you can apply these same techniques in your own data center. It’s the same thing. You can run microservices yourself, you can host all those things if you need to on your own hardware. Plenty of companies do. And that’s the idea, that I shifted from… You said you don’t want to just do the straight lift and shift, that’s inefficient for all the reasons, but you also have to start thinking about: how do I design my system so that it can auto-scale up and down even though that’s on my hardware, and when it scales down, can something else run in its place, so that machine that’s still sitting there running is not completely useless, it’s actually producing some value?

Anne Currie: Well, or ideally it gets turned off. The ideal is you just turn the stuff off when you’re not using it. That is absolutely the dream. In fact, from what Sara said, obviously we’re completely in agreement on this, the way I quite like to say it is start with operational efficiency and use cost as your way of measuring that. If you could just run fewer machines, then it’s reflected in your energy bill if you’re on-prem or your hosting bill if you are in the cloud because you’ve right-sized or you’ve got rid of systems that are not doing anything. And then only after you’ve done that do you think about performance and effectively code level tuning. Because if you start doing the code level tuning before you’ve got really good at the operations, then you just end up with a whole load of machines that are only partially utilized. So, you’ve got to start with ops first. So, cost is your first proxy, performance is only a later proxy.

Sara Bergman: Yes, do it in the right order. And I think also, maybe something that, if you’re not that into green software, you maybe haven’t heard about: the reason why we say, “Turn things off”, is because there’s something called energy proportionality in servers. This was, I believe, first introduced in 2007 by two Google engineers who wrote an amazing paper, which I recommend people read. But basically, when you turn a server on, it consumes a baseline of power that’s pretty high. In their paper it’s around 50%. That was an aggregation over… Yes. Point is, it’s pretty high, and it’s not doing anything.

And then as utilization increases, the curve goes up, but even if you just have an idle machine, you’re consuming a big chunk of energy, so you want to be mindful of that and have properly utilized machines, or turn them off. And I think their paper originally proposed fully energy-proportional hardware, so that the curve starts at zero and then increases with utilization. I haven’t seen any servers that can do that yet. That would be amazing, so I’m looking forward to it. It’s very much a research area still, but we’re not there yet. So, idle servers consume lots of energy.
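
The 50% idle figure is easy to turn into intuition with the simple linear power model used in that line of research: an idle server already draws a large share of peak power, so the energy cost per unit of work drops sharply as utilization rises. A small illustrative sketch, assuming a 400W peak and a 50% idle fraction:

```python
# Simple linear server power model illustrating energy proportionality.
# Assumes idle power is 50% of peak, per the 2007 paper discussed above.
PEAK_WATTS = 400
IDLE_FRACTION = 0.5

def power_draw(utilization: float) -> float:
    """Power in watts at a given CPU utilization (0.0 to 1.0)."""
    idle = PEAK_WATTS * IDLE_FRACTION
    return idle + (PEAK_WATTS - idle) * utilization

for u in (0.0, 0.1, 0.5, 0.8, 1.0):
    watts = power_draw(u)
    # Energy per unit of work: watts divided by useful utilization.
    per_work = watts / u if u else float("inf")
    print(f"util {u:4.0%}: {watts:5.0f} W, {per_work:6.1f} W per unit of work")
```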

Anne Currie: Again, that’s something you can do on-prem, is that you can set yourself a target of saying, “Average across my entire estate, have I got between 50 and 80% CPU utilization?” So, above 80% is problematic because you end up with all kinds of other downsides of having all machines that are a little bit highly utilized, but on average, if you can achieve 50% across your estate, then that’s a great target.

Thomas Betts: Yes, I think some people will look at a metric and go, “Oh, my CPU is sitting there at like five or 10%, I’m doing great”. And that’s a mindset you have to get out of: if that’s the only thing that machine’s doing, it’s over-provisioned.

Anne Currie: Absolutely, yes.

Thomas Betts: You can get the same performance while running at 50%. You’re not introducing latency, you’re not introducing any other concerns, it’s just a smaller-sized CPU. And like you said, someone else can use that CPU for something else if you’re in the cloud.

Anne Currie: Yes, indeed. But on-prem too, it just requires a little bit of thought.

Thomas Betts: Right.

Sara Bergman: Yes, yes, you can definitely build the most sustainable data center ever yourself. There is no secret to the cloud. It takes a lot of engineering power.

Multi-tenancy is the efficiency secret sauce in the cloud [21:55]

Anne Currie: Well, there is one magic secret to the cloud, which you can do on-prem, but not very many businesses have the range of workloads to achieve it, which is multi-tenancy. So, the secret sauce of the cloud is that they’ve got such a mixture of tenants, and they all have different demand profiles, and they’re very good at making sure that you’re sharing a machine with people who have different demand profiles. So, maybe they’re on the other side of the world, so they’re busy during the day while you’re at a different time of day. So, basically they’re saying, “Well, how do we align these demand profiles so that we never have a quiet machine and we never have an overloaded machine?” So, some companies can do that.

The most famous is probably Google. One of the reasons why they’re very far ahead on all of this stuff, and why a lot of the interesting thinking in this area came from Google, is because from the get-go, from a very early stage, they were effectively their own multi-tenant. And they did quite a lot of work to become even more their own multi-tenant. And the classic example is YouTube video. When they bought YouTube, that suddenly made them an excellent multi-tenant, because YouTube has a very, very CPU-intensive workload associated with it, which is video transcoding and encoding. And they were always very careful. So, right from the start, there’s no SLA on that. They’ll say, “Do you know, it could be done in five minutes, could be done in five hours, could be done tomorrow. We’ll just let you know when it’s done”.

So, that gives them something they can move around in time to make sure they get really high utilization on their machines. So, excellent example of finding something that effectively could be used to make themselves their own multi-tenant. So, one thing that any enterprise can do is if you can find that CPU-intensive task that you can… If you’re in the cloud, stick it in a spot instance. If you’re on-prem, you’re going to have to orchestrate that yourself in some way. Kubernetes does have a spot instance tool associated with it, so you can do it on-prem. But yes, the secret sauce of the cloud is multi-tenancy. It’s hard to manage otherwise.

Sara Bergman: Also, Google’s YouTube processing, the media processing, has been carbon aware since 2021, I believe, which is another fun fact about them.

Shifting in time is better than shifting in space [24:12]

Thomas Betts: But that’s the thing, they have the incentive because they are at such a scale, yes, there’s the carbon benefit, there’s also the cost benefit for them. Because I’m not buying the GCP instance. Google is paying for that. Even though it’s one division of Google is paying another division of Google, like, “Please host this software for me”, there’s still a lot of costs. And so those big things that add up to multi-comma numbers, we can do this and we can get cheaper energy, like you said, if we can run this on solar somewhere else later, then let’s just do that. It also gives you the ability, I guess, to follow the sun.

We talk about that for software support, that we have people all around the globe so that we can support our users, but when you say, “I want to move this software”, it’s not I’m always going to move it to Sweden or Norway or somewhere, I’m going to move it to wherever it might make sense right now, so I don’t have to necessarily wait 18 hours. I could just pick a different place potentially?

Anne Currie: You can do, but I have to say, we are more keen on a shift in time than a shift in space. Well, it depends. If it’s something with a low amount of data associated with it, then shift it in place. There’s no problem. You’re not moving a great deal of data. But if you are moving a great deal of data, you’re often better off coming up with some cunning wheeze which will enable you to just do the work, either beforehand… Pre-caching is a big thing. You do things in advance, like a CDN does a whole lot of work in advance, or do it later. But yes, that just involves architects coming up with some clever plans for how they can do things early or late.

Sara Bergman: Because we do worry about network costs. We’re spending carbon in the networks because, spoiler alert, networks consume a lot of energy. They have a lot of equipment that eats a lot of electricity. So, sending a lot of data on the internet over long distances means we’re spending carbon. So, then we worry that that is actually counteracting the benefit you had of moving the work to a greener time zone. And again, you can’t necessarily easily see where your network is consuming electricity. That’s just harder, the way the internet works. Also, there are, of course, a lot of, especially in Europe I want to say, legal rules around where you can move data, and you have to be mindful of those things as well.

Various viewpoints and roles for green software [26:29]

Thomas Betts: Well, so you’ve brought up a couple of different things, the two of you in the last minute. So, Sara was talking about networks. Anne said, “Architects have to come up with these clever solutions”. So, the InfoQ audience is engineers, architects, DevOps, SREs. We have all these different personas. And so think about those different people and the different viewpoints, how would a software engineer think about changing their job to be more green versus an architect versus say a DevOps engineer or a platform engineer that’s responsible for deploying all this stuff and setting things up?

Anne Currie: I would say that you start with the platform. So, the platform engineers talk to the software engineers, who talk to the architects, and they come up with a platform that’s going to work well for what they do. And it has to be a platform where, in the long run, the platform creators, the people who support that platform, are committed to it being a platform for the future, a platform that will be able to run on top of renewable power. So, just for example, the Kubernetes folks are working on all of this, but the clouds… There are some things that the clouds do that are green platforms, and that they talk about as green platforms, and some things that clearly are not, and therefore they don’t really mention green in association with them.

So, this is again where Sara was saying you have to use your consumer power. You have to ask. You have to say, “Is that a green platform? Is that a platform that’s going to be able to carry me all the way through to 24/7 carbon-free electricity in 2030 or 2035, or is it not? Is it going to be a dead end for me, where I’m going to have to re-platform at some later point?” So, the first thing is make sure you choose the right platform. And then after that, and this is where the techies come in, where the architects come in, it’s using that platform as it was intended to be used. So, using that platform really well, which means reading all the docs, learning how to do it, and grasping that platform well. As a platform engineer now, you need to step back and say, “Is this a green platform?” And if it’s not, what’s my plan for moving to one?

Well-architected frameworks from cloud providers [28:47]

Sara Bergman: And one place to start: all three major cloud providers have their own well-architected frameworks, and they all have a sustainability section within that, so that can be a good place to get going. I think all three of them could be a bit more comprehensive, but I think they’re a very good start. So, no matter who you use, that’s a good starting point.

Anne Currie: And it is well worth saying: have a look at the well-architected framework sustainability section for all of them, and that will give you information. You can ask your account manager, but quite often they’ll tell you stuff that’s not so useful, because they don’t always know.

A lot of them have a sustainability architectural expert, and you can ask to speak to them, and they will know what the green platforms are. But if you just say, “Is this a green platform?” quite often they will say, “Oh, this is already a green platform because we offset everything”. So, this is a big thing at the moment. A lot of them are saying, “Oh, well we offset everything, so we’ll pay somebody else to…”. But that is not enough. Carbon neutral is not carbon zero. It’s not getting us there. It’s the goal of 10 years ago. And all too often, when you ask a hyperscaler these days, “Is it a green platform?” they’ll say, “Yes, because we offset everything”, and that is not sufficient.

Thomas Betts: Yes. I still hear companies talking about, “We’re going carbon neutral”, or, “We made it to carbon neutral”, and it’s like, “Okay, and next”. So, the goal after that is carbon zero, and that’s for the data centers, that’s for software, that’s for hopefully every company eventually. It might not be tomorrow, but it’s a goal. You can have these long-term goals.

Anne Currie: Yes, indeed. Carbon neutral, if you’re not already there, you are quite behind the times, but it’s the first step.

Sustainable AI and LLMs [30:29]

Thomas Betts: Let’s go to something that’s not carbon neutral: machine learning, AI, LLMs. They get a lot of press of power consumption, data center consumption, CPU. It’s like the new Bitcoin mining, “I’m going to go and train an LLM”. Some of these numbers are hard to come by. I don’t know if you guys have looked into how much it is. And there’s two aspects. There’s the training of the LLMs and the AI models, and then there’s hosting them, which can be less, but if they get used inappropriate, like any software they can also have significant consumption sides to that. So, where are we at with that? What does it look like? Do you have a sense of the impact? And then what do we do about it?

Anne Currie: So, I can talk to where we are and where I think we’re likely to be. Sara is the expert in techniques to reduce the carbon associated with creating models or running inference. But from my perspective, yes, there is a great deal of power being used by LLMs at the moment. Do I think that’s a problem? Not actually. I’m going to be controversial and say no, not really, because it’s very much at the beginning. And it feels to me like it’s a gold rush: all hardware is being used like crazy, and nobody’s waiting for the sun to shine. Everybody’s running everything all the time, that kind of stuff. But fundamentally AI, both training and inference, has the potential to be something that could be shifted to when and where the sun is shining or the wind is blowing. There’s loads of potential asynchronicity about those loads.

They’re very CPU-intensive, and right now they tend to be urgent, but that is because we’re really at this stage terrible at AI. We don’t know how to do it. We don’t know how to operate it. Inference, for example, is often described as something that has to run instantly, that maybe the training you could do in advance, but inference is instant. But we’ve got tons of experience of things that need to be done instantly that are not done instantly on the internet.

For example, I mentioned CDNs earlier, content delivery networks. When you want to watch Game of Thrones, it’s a bit of an old school reference now, but in the olden days when you wanted to watch Game of Thrones, you’d say, “I want to watch Game of Thrones now, now. It has to be delivered to me”. And back in the very old days, we would’ve said, “Oh, no, it has to be delivered from the US to the UK now, instantly. That’s going to be impossible. Oh my goodness me, it’s going to overload the internet and everything’s going to collapse”.

But we learned that that wasn’t the way we needed to do things. What we would do is we’d say, “Well, everybody wants to watch a Game of Thrones”, so we move it to the UK overnight and we sit it on a CD not very far away from where you live. And it seems like getting it from the US, but you’re actually getting it from 100 yards away and it’s very quick, and it’s kind of a magic trick. You do things in advance. It’s a form of pre-caching. Because humans are not that different. They want to watch the same shows, and they want to ask the same questions of AI as well, so there’s tons of opportunity for us to get good at caching on inference, I think. And I think that’s where we’ll go.

But yes, so at the moment it’s horrendous, but I don’t think it’s as bad as Bitcoin. Well, it’s similar to Bitcoin at the moment in many ways, but it’s not innately as bad as Bitcoin. Bitcoin is purely about using power. And this is actually about achieving a result. And if you can achieve that result using less power, we will. Sara, this is actually your area rather than mine.

Sara Bergman: Yes, we seem to be talking about this a lot lately. I guess everyone in our industry is talking a lot about it lately. I think one of the good things is that when I started my journey into how we build greener software, a lot of the research that was out there was actually AI-related, and specifically on the training side. There are very smart people who’ve been thinking about this for a long time, which is good. And I think especially on the training side, because if you go a few years back, that was maybe the bigger source of emissions, or at least it was a bit easier to do research on perhaps, so there is a bias there. So, there are tons of things we can do. We can time shift, obviously. There are a bunch of methods for making the model smaller. There are tons of research papers that describe techniques for how you can achieve that.

You can also think about, on the inference side, how you deploy it. If you think about federated learning, are you utilizing your edge networks by moving data and models closer to your users? And all these things that we talked about before, like how you do operational efficiency for AI workloads, are not super foreign… You use the same tools, maybe turn them in slightly different ways, but it’s very similar. So, we’re in good hands there. I think we will learn over time to apply these to the workloads, because, and this is something Anne and I have talked about before, we need to make it green. There isn’t an option where we say, “Use AI or be green”, because where we are at this point in time, that’s not a fair choice. We need to make AI green. That’s the thing. And we will definitely, as an industry, get there, I believe.

Anne Currie: Yes, I agree. It’s amazing how often we run across people going, “Oh, well, yes, the only green AI is no AI. You turn the AI off”. Well, there’s only one thing we can be really, really certain of in the tech industry, which is that AI is coming and the energy transition is coming. Both of those things need to be done at once. We cannot pick and choose there.

Sara Bergman: Yes. And I think also what’s interesting, because now we have said AI a lot, but I think what we all have meant is generative AI and LLMs. And it’s really only one branch of AI. I think we’re going to see more approach, and that they will maybe have completely different characteristics that will… And I’m curious to see how that goes. But I think specifically with large language models, we’re seeing the rise of small language models, and they’re often outperform… So, a specialized small language model often outperforms a more general large language model on specific tasks they’re trained for, which is obvious when you think about it. And so I think we’re also going to see more of that. Then they also have the benefit they can be closer to you, maybe even on your device because they’re small. It’s right there in the name.

Thomas Betts: Yes. One of the techniques that is often used for getting the large language model to be better is to throw a larger prompt at it. And it turns out if you have a small language model that’s basically trained on what you’re prompting with every time, the analogy I liked was you can send someone to med school or whatever and they’ve spent decades going to school, or somebody wants to be a CPA and be an accountant, they only need to go to a two-year program and they learn how to be an accountant. I need the accounting LLM for my accounting software. I don’t need to ask it all about Taylor Swift. If it knows about Taylor Swift, it’s not actually a good language model for me to have in my software, and now it’s this larger thing that I’m not fully utilizing, just like the larger server that I’m not fully utilizing.

So, I like the idea of get something smaller, and you’re going to get better benefits of it. So, it kind of plays into your… There’s these co-benefits. So, smaller, faster, AI is greener AI, is greener software.

Thomas Betts: So, I have to call out, there was a talk at QCon San Francisco just last month about GitHub Copilot, and it talked about those same ideas, that they were… It was specifically talking about how to make the requests very quick because they needed to act like this is software running on your machine, in your IDE. And to make that happen, it has to be making local network calls.

So, they have it calling the local Azure instance that has the local AI model, or LLM, because they can’t afford to go across the pond to look it up. It would just take too long, and then the software wouldn’t work. So, all those things are being discovered as we’re finding these large use cases for things like GitHub Copilot.

The Green Software Maturity Matrix [38:28]

Thomas Betts: I do want to give you a chance to talk about the Green Software Maturity Matrix. I mentioned a little bit in your bio, Anne, and so tell us, what is that?

Anne Currie: Well, actually that harks back to something we talked about earlier today in the podcast, that it’s very, very complicated. This is something that Sarah and Sara and I really hit writing the book. And we knew from going out and talking at conferences and things about this stuff, there’s lots of different things that you can do to be green, and there’s lots of different ways you can approach it. And there are some ways to approach it that are just vastly more successful than others. And we spent quite a lot of time thinking in the book about how we could communicate the process that you should use. So, don’t try and leap to the end. Don’t try and be Google in one leap because you’ll fail and then it won’t be green. It might be… If you set your goals too high, your bar too high, you’ll fail to achieve it, and then you’ll go, “Well, it’s impossible to be green”.

And that isn’t the case. You just needed to do it in a more regulated way. So, we put a… A maturity matrix is a project… It’s quite an old fashioned project management techniques that basically says, “Well, what does good look like?” But also more importantly it says, “But you’re not going to get there in one go. You actually do need to get there in..”. A maturity matrix rather arbitrarily chooses five steps to get there. And if you do it through those five steps, you will almost certainly get there. If you don’t do it through those five steps, if you try and skip a step, then you will fail miserably.

So, in the maturity matrix, step one is basically just wanting to do it at all. Step two is basic operational efficiency, so turning off things that are not in use and right-sizing. Step three is high-end operational efficiency: modern operational techniques, so autoscaling, using the right services properly. And only at levels four and five of the matrix do you start doing things like rewriting things in Rust. And ideally most enterprises would never have to reach four or five, because they will adopt a platform at level three that will do four and five for them. So, that’s kind of the idea. It should be, if you follow it, a very successful way to get modern systems that achieve, or can run on, 24/7 carbon-free electricity, by stopping you from trying too hard too soon.

Sara Bergman: And it also, I think, nicely separates the vast area of sustainability into swim lanes. So, depending on your role, you can start with the swim lane that feels most natural to you. If you’re a product manager, you start in the product. You don’t have to start in one of the other ones. And you sort of go from there. And I think that’s also a good mental model for people because sometimes when something is new, it can be a bit overwhelming. And spreading it out, not just as five layers, but also nine different categories, really shows that, “Okay, you start with one thing. Just do this cell and then you can sort of expand from there. Just pick any cell in the first level and just start there, and sort of take it step by step”.

I always these days start my conference talks with that slide because I think it’s a good way to set the expectation, like, “This area is big. Today we’re going to focus on this because I can’t reasonably talk about everything about sustainability in IT in one hour or how long we have. So, we’re going to focus on this. Just know there is more stuff”. It also helps prevent the question, “What if I just rewrite everything in Rust?” like Anne said. Because somehow we always get asked that. Somehow I can never escape that question.

Anne Currie: Everybody wants to rewrite their systems in Rust or C. Why? Why?

Sara Bergman: The fan base is massive. Kudos to the Rust folks because you have some hardcore fans out there.

The Building Green Software book club [42:30]

Thomas Betts: We just had a Rust track at QCon, so yes, it’s popular. So, this has been a great conversation. I think we’re running a little long. I want to give you guys a minute to give a little plug for the Building Green Software Book Club. I attended the first meeting last month, I think it was. What is the book club and how can people find out more about it?

Sara Bergman: We're aiming to cover one chapter of the book each month. We've done one, so so far we're on track. I think there will be a bit of a break now over the holidays, but we'll get back on track afterwards. You can find a sign-up link on our LinkedIn pages. It's on Zoom, and it's very casual. We chat, and you can ask us questions. As you might have noticed in this podcast, we're happy to talk about stuff, so even if there are no questions, we'll talk anyway. People enjoyed it last time. I enjoyed it. It was very cozy, and it was very nice to see people face-to-face, virtually.

Thomas Betts: Well, I'll be sure to add links to the book, the maturity matrix, and the book club in the show notes. Anne and Sara, thanks again for joining me today.

Anne Currie: Thank you again for having us. It was lovely.

Sara Bergman: It was really awesome. Thank you for having us.

Thomas Betts: And listeners, we hope you’ll join us again soon for another episode of The InfoQ Podcast.


Java News Roundup: WildFly 35, Jakarta EE 11 Update, Java Operator SDK 5.0-RC1

MMS Founder
MMS Michael Redlich

Article originally posted on InfoQ. Visit InfoQ

This week's Java roundup for January 6th, 2025 features news highlighting: the release of WildFly 35; Java Operator SDK 5.0-RC1; Spring Cloud 2023.0.5; Micronaut 4.7.4; Quarkus 3.17.6; Arquillian 1.9.3; and an update on Jakarta EE 11.

JDK 24

Build 31 of the JDK 24 early-access builds was made available this past week featuring updates from Build 30 that include fixes for various issues. Further details on this release may be found in the release notes.

JDK 25

Build 5 of the JDK 25 early-access builds was also made available this past week featuring updates from Build 4 that include fixes for various issues. More details on this release may be found in the release notes.

For JDK 24 and JDK 25, developers are encouraged to report bugs via the Java Bug Database.

Jakarta EE 11

In his weekly Hashtag Jakarta EE blog, Ivar Grimstad, Jakarta EE Developer Advocate at the Eclipse Foundation, provided an update on Jakarta EE 11, writing:

Jakarta EE Core Profile 11 was released in December. You can check out all the details on the updated Jakarta EE Core Profile 11 specification page. The next out will be Jakarta EE Web Profile 11, which will be released as soon as there is a compatible implementation that passes the refactored TCK. The Jakarta EE Platform 11 will follow after the Web Profile.

The road to Jakarta EE 11 has so far included four milestone releases and the release of the Core Profile, with the potential for release candidates as necessary before the GA releases of the Web Profile and the Platform in 1Q 2025.

Spring Cloud

Spring Cloud 2023.0.5, codenamed Leyton, has been released featuring bug fixes and notable updates to sub-projects: Spring Cloud Kubernetes 3.1.5; Spring Cloud Function 4.1.5; Spring Cloud Stream 4.1.5; and Spring Cloud Circuit Breaker 3.1.4. This release is based upon Spring Boot 3.4.0. Further details on this release may be found in the release notes.
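
As a point of reference for one of the updated sub-projects, the following is a minimal sketch of using Spring Cloud Circuit Breaker from a Spring service, assuming a backing implementation such as Resilience4j is on the classpath; the service URL, breaker name, and fallback value are placeholders:

    import org.springframework.cloud.client.circuitbreaker.CircuitBreaker;
    import org.springframework.cloud.client.circuitbreaker.CircuitBreakerFactory;
    import org.springframework.stereotype.Service;
    import org.springframework.web.client.RestTemplate;

    @Service
    public class QuoteService {

        private final CircuitBreaker circuitBreaker;
        private final RestTemplate restTemplate = new RestTemplate();

        public QuoteService(CircuitBreakerFactory<?, ?> factory) {
            // Breaker named "quotes"; thresholds come from the underlying
            // implementation's configuration (e.g., Resilience4j defaults).
            this.circuitBreaker = factory.create("quotes");
        }

        public String fetchQuote() {
            // The fallback runs when the call fails or the circuit is open.
            return circuitBreaker.run(
                    () -> restTemplate.getForObject("http://quotes.example/api", String.class),
                    throwable -> "Service unavailable, please retry later");
        }
    }

The abstraction lets the resilience implementation be swapped without touching the calling code, which is why patch releases like Circuit Breaker 3.1.4 tend to be drop-in upgrades.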

WildFly

The release of WildFly 35 primarily focuses on support for MicroProfile 7.0 and its updated specifications, namely: MicroProfile Telemetry 2.0; MicroProfile OpenAPI 4.0; MicroProfile REST Client 4.0; and MicroProfile Fault Tolerance 4.1. Along with bug fixes and dependency upgrades, other enhancements include: a refactoring of the WildFlyOpenTelemetryConfig class, as it had become too large and unmanageable; and the addition of profiles in the source code base for a “cleaner organization of the build and testsuite execution so the base and expansion parts can be independently built, and, more importantly, can be independently tested.” More details on this release may be found in the release notes. InfoQ will follow up with a more detailed news story.
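
As a small illustration of the MicroProfile 7.0 specifications now supported, here is a minimal sketch using MicroProfile Fault Tolerance annotations; the inventory client and its method bodies are hypothetical:

    import jakarta.enterprise.context.ApplicationScoped;
    import org.eclipse.microprofile.faulttolerance.Fallback;
    import org.eclipse.microprofile.faulttolerance.Retry;
    import org.eclipse.microprofile.faulttolerance.Timeout;

    @ApplicationScoped
    public class InventoryClient {

        // Retry transient failures up to three times, give up after 500 ms,
        // and fall back to a cached value if all attempts fail.
        @Retry(maxRetries = 3)
        @Timeout(500)
        @Fallback(fallbackMethod = "cachedCount")
        public int remoteCount(String sku) {
            // Placeholder for a real remote call to an inventory service.
            throw new IllegalStateException("remote service unavailable");
        }

        int cachedCount(String sku) {
            return 0; // hypothetical cached/default value
        }
    }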

Micronaut

The Micronaut Foundation has released version 4.7.4 of the Micronaut Framework, featuring Micronaut Core 4.7.11, bug fixes, and patch updates to the Micronaut Serialization and Micronaut Discovery Client modules. Further details on this release may be found in the release notes.

Quarkus

Quarkus 3.17.6, the fifth maintenance release (3.17.1 was skipped due to a regression), ships with bug fixes, dependency upgrades, and notable resolutions to issues such as: a NullPointerException caused by the mappingToNames() method, defined in the BuildTimeConfigurationReader class, when using the SmallRye Config PropertyName class for mapping names; and an application crash when bootstrapping with the Dev Console. More details on this release may be found in the changelog.

Java Operator SDK

The first release candidate of Java Operator SDK 5.0.0 ships with continued improvements and new features, such as: Kubernetes Server-Side Apply elevated to a first-class citizen, now the default approach for patching the status resource; and a change in responsibility for the EventSource interface, which now monitors resources and handles accessing cached resources, filtering, and additional capabilities that were once maintained by the ResourceEventSource subinterface. Further details on this release may be found in the changelog.
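
To make the status-patching change concrete, here is a minimal reconciler sketch in the style of the SDK's Reconciler interface, assuming the 5.0 API; the WebApp custom resource and its status are hypothetical, and details may differ in the final release:

    import io.fabric8.kubernetes.client.CustomResource;
    import io.fabric8.kubernetes.model.annotation.Group;
    import io.fabric8.kubernetes.model.annotation.Version;
    import io.javaoperatorsdk.operator.api.reconciler.Context;
    import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
    import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;

    // Hypothetical custom resource; spec and status kept trivial for the sketch.
    @Group("example.com")
    @Version("v1")
    class WebApp extends CustomResource<Void, WebAppStatus> {}

    class WebAppStatus {
        public boolean ready;
    }

    public class WebAppReconciler implements Reconciler<WebApp> {

        @Override
        public UpdateControl<WebApp> reconcile(WebApp resource, Context<WebApp> context) {
            // ... reconcile desired state against the cluster here ...
            WebAppStatus status = new WebAppStatus();
            status.ready = true;
            resource.setStatus(status);

            // Per the 5.0 release notes, patching the status resource is now
            // the default approach, applied via Kubernetes Server-Side Apply.
            return UpdateControl.patchStatus(resource);
        }
    }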

Arquillian

A week after the release of version 1.9.2, Arquillian 1.9.3 provides dependency upgrades and improvements to the ExceptionProxy class to produce a meaningful stack trace when the exception class is missing on a client. More details on this release may be found in the release notes.


Google Expands Gemini Code Assist with Support for Atlassian, GitHub, and GitLab

MMS Founder
MMS Renato Losio

Article originally posted on InfoQ. Visit InfoQ

Google recently announced support for third-party tools in Gemini Code Assist, including Atlassian Rovo, GitHub, GitLab, Google Docs, Sentry, and Snyk. The private preview enables developers to test the integration of widely-used software tools with the personal AI assistant directly within the IDE.

Offering functionality similar to that of market leader GitHub Copilot, Gemini Code Assist provides AI-assisted application development with code assistance, natural-language chat, code transformation, and local codebase awareness. Launching these tools in private preview brings real-time data and external application access directly into the coding environment, enhancing functionality while reducing distractions. Ryan J. Salva, senior director at Google, and Prithpal Bhogill, group product manager at Google, write:

Recognizing the diverse tools developers use, we’re collaborating with many partners to integrate their technologies directly into Gemini Code Assist for a more comprehensive and streamlined development experience. These partners, and more, help developers stay in their coding flow while accessing information through tools that enhance the SDLC.

According to the documentation, the supported third-party tools can convert any natural language command into a parameterized API call, based on the OpenAPI standard or a YAML file provided by the user. GitHub Copilot Enterprise also includes extensions to reduce context switching. Richard Seroter, senior director and chief evangelist at Google Cloud, comments:

Google often isn’t first. There were search engines, web email, online media, and LLM-based chats before we really got in the game. But we seem to earn our way to the leaderboard over time. The latest? Gemini Code Assist isn’t the first AI-assisted IDE tool. But it’s getting pretty good!

With coding assistance being one of the most promising areas for generative AI, Salva and Bhogill add:

Code Assist currently provides developers with a natural language interface to both traditional APIs and AI Agent APIs. Partners can quickly and easily integrate to Code Assist by onboarding to our partner program. The onboarding process is as simple as providing an OpenAPI schema, a Tool config definition file, and a set of quality evals prompts used to validate and tune the integration.

This is not the only recent announcement affecting Code Assist: support for Gemini 2.0 Flash is another significant one. Powered by Gemini 2.0, Code Assist now offers a larger context window, enabling it to understand more extensive enterprise codebases. According to Google, the new LLM aims to enhance productivity by providing higher-quality responses and lower latency, allowing users to “stay in an uninterrupted flow state for longer.” In “The 70% problem: Hard truths about AI-assisted coding”, Addy Osmani warns:

AI isn’t making our software dramatically better because software quality was (perhaps) never primarily limited by coding speed (…) What AI does do is let us iterate and experiment faster, potentially leading to better solutions through more rapid exploration (…) The goal isn’t to write more code faster. It’s to build better software. Used wisely, AI can help us do that. But it’s still up to us to know what “better” means and how to achieve it.

Code Assist currently supports authentication to partner APIs via the OAuth 2.0 Authorization Code grant type, with Google planning to add support for API key authentication in the future. Pricing is based on per-user, per-month licenses, with monthly or annual commitments. Licenses range from $19 USD to $54 USD per user per month. A Google form is available to request access to the private preview of Code Assist tools.
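
For readers unfamiliar with the grant type, here is a minimal, generic sketch of the OAuth 2.0 Authorization Code flow in Java; the endpoints, client credentials, and redirect URI are placeholders, not documented Code Assist or partner values:

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    public class AuthorizationCodeFlow {

        // Placeholder values; a real integration would use the partner's
        // documented endpoints and a registered OAuth client.
        static final String AUTHORIZE_URL = "https://partner.example/oauth/authorize";
        static final String TOKEN_URL = "https://partner.example/oauth/token";
        static final String CLIENT_ID = "my-client-id";
        static final String CLIENT_SECRET = "my-client-secret";
        static final String REDIRECT_URI = "https://app.example/callback";

        /** Step 1: the URL the user visits to grant consent. */
        static String authorizationUrl(String state) {
            return AUTHORIZE_URL
                    + "?response_type=code"
                    + "&client_id=" + encode(CLIENT_ID)
                    + "&redirect_uri=" + encode(REDIRECT_URI)
                    + "&state=" + encode(state);
        }

        /** Step 2: exchange the code returned to the redirect URI for tokens. */
        static String exchangeCode(String code) throws Exception {
            String body = "grant_type=authorization_code"
                    + "&code=" + encode(code)
                    + "&redirect_uri=" + encode(REDIRECT_URI)
                    + "&client_id=" + encode(CLIENT_ID)
                    + "&client_secret=" + encode(CLIENT_SECRET);
            HttpRequest request = HttpRequest.newBuilder(URI.create(TOKEN_URL))
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            return response.body(); // JSON containing access_token, etc.
        }

        private static String encode(String value) {
            return URLEncoder.encode(value, StandardCharsets.UTF_8);
        }
    }

The Authorization Code grant suits this use case because the user, not the tool, delegates access: the assistant only ever holds a token scoped to what the user approved.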
