Top AI Stocks Show Promising Growth in Q3 Reports – Fagen Wasanni Technologies

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Evercore ISI has identified the top artificial intelligence (AI) players in the market as the technology continues to transform businesses worldwide. The positive prospects of AI have driven up the shares of companies involved in the sector. In particular, Nvidia’s exceptional Q3 results have sparked even more enthusiasm among investors. In light of this, Evercore ISI’s senior managing director, Julian Emanuel, has compiled a list of Russell 3000 stocks that have mentioned AI in their earnings calls since May and have received positive reactions to their latest reports. These stocks have also experienced upward revisions in their next 12 months’ EPS estimates and are ranked in the top 75% of the index.

Among the notable companies on the list are Alphabet and Meta Platforms, both of which have witnessed significant growth in response to AI demand this year. Alphabet, in particular, emphasized its long-standing focus on AI integration in its recent quarterly report, highlighting how the technology improves search and advertising. Meta, on the other hand, highlighted the exponential growth of AI-recommended content on Facebook’s feed.

Travel company Booking Holdings referred to its use of AI to enhance the personalized booking experience. They also highlighted their generative AI Priceline travel assistant called Penny, which has received positive feedback. These reports led to an 8% surge in Booking’s shares, and EPS expectations have increased by over 4% on a 12-month forward basis.

Several semiconductor and software companies, including Arista Networks and MongoDB, were also included in the list. Arista Networks’ EPS estimates have risen by more than 5% since May, fueled by strong earnings and positive guidance. MongoDB experienced a 28% surge in shares after beating expectations, with EPS estimates up by over 44% since May.

Other companies meeting Evercore’s criteria include Intel, Pure Storage, JPMorgan Chase, and Domino’s Pizza. These stocks have shown promising growth in their Q3 reports, reinforcing their position as significant players in the AI landscape.

Article originally posted on mongodb google news. Visit mongodb google news



MongoDB Unusual Options Activity – Benzinga

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Someone with a lot of money to spend has taken a bullish stance on MongoDB (NASDAQ: MDB).

And retail traders should know.

We noticed this today when the big position showed up on publicly available options history that we track here at Benzinga.

Whether this is an institution or just a wealthy individual, we don’t know. But when something this big happens with MDB, it often means somebody knows something is about to happen.

So how do we know what this whale just did?

Today, Benzinga's options scanner spotted 12 uncommon options trades for MongoDB.

This isn’t normal.

The overall sentiment of these big-money traders is split between 50% bullish and 50% bearish.

Out of all of the unusual options trades we uncovered, 6 are puts, for a total amount of $643,302, and 6 are calls, for a total amount of $318,075.

What’s The Price Target?

Taking into account the Volume and Open Interest on these contracts, it appears that whales have been targeting a price range from $300.0 to $400.0 for MongoDB over the last 3 months.

Volume & Open Interest Development

Looking at the volume and open interest is a powerful move while trading options. This data can help you track the liquidity and interest for MongoDB’s options for a given strike price. Below, we can observe the evolution of the volume and open interest of calls and puts, respectively, for all of MongoDB’s whale trades within a strike price range from $300.0 to $400.0 in the last 30 days.

MongoDB Option Volume And Open Interest Over Last 30 Days

Biggest Options Spotted:

Symbol | PUT/CALL | Trade Type | Sentiment | Exp. Date | Strike Price | Total Trade Price | Open Interest | Volume
MDB | PUT | SWEEP | BULLISH | 01/17/25 | $300.00 | $215.4K | 159 | 148
MDB | PUT | TRADE | BEARISH | 01/17/25 | $300.00 | $142.7K | 159 | 34
MDB | PUT | TRADE | NEUTRAL | 01/17/25 | $300.00 | $137.0K | 159 | 221
MDB | CALL | TRADE | BEARISH | 01/17/25 | $300.00 | $111.2K | 51 | 65
MDB | CALL | SWEEP | BULLISH | 01/19/24 | $390.00 | $72.3K | 89 | 16

Where Is MongoDB Standing Right Now?

  • With a volume of 352,529, the price of MDB is up 1.25% at $360.68.
  • RSI indicators hint that the underlying stock may be oversold.
  • Next earnings are expected to be released in 17 days.

What The Experts Say On MongoDB:

  • JMP Securities has maintained its Outperform rating on MongoDB with a price target of $425.
  • KeyBanc has maintained its Overweight rating on MongoDB with a price target of $462.

Options are a riskier asset compared to just trading the stock, but they have higher profit potential. Serious options traders manage this risk by educating themselves daily, scaling in and out of trades, following more than one indicator, and following the markets closely.

If you want to stay updated on the latest options trades for MongoDB, Benzinga Pro gives you real-time options trades alerts.

Article originally posted on mongodb google news. Visit mongodb google news



MongoDB, Inc. (NASDAQ:MDB) Receives Average Rating of “Moderate Buy” from Analysts

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

MongoDB, Inc. (NASDAQ:MDB) has received a consensus rating of “Moderate Buy” from the twenty-four brokerages that are currently covering the stock, MarketBeat reports. One equities research analyst has rated the stock with a sell recommendation, three have issued a hold recommendation and twenty have given a buy recommendation to the company. The average 12-month price target among analysts that have issued ratings on the stock in the last year is $378.09.

A number of research analysts recently weighed in on the company. 22nd Century Group reissued a “maintains” rating on shares of MongoDB in a research note on Monday, June 26th. 58.com reaffirmed a “maintains” rating on shares of MongoDB in a report on Monday, June 26th. Citigroup lifted their target price on shares of MongoDB from $363.00 to $430.00 in a report on Friday, June 2nd. VNET Group reaffirmed a “maintains” rating on shares of MongoDB in a research report on Monday, June 26th. Finally, Sanford C. Bernstein boosted their price objective on MongoDB from $257.00 to $424.00 in a research report on Monday, June 5th.

Read Our Latest Stock Report on MongoDB

MongoDB Stock Down 1.0%

MDB stock opened at $356.22 on Monday. The company has a 50-day moving average of $394.61 and a 200-day moving average of $289.96. The company has a quick ratio of 4.19, a current ratio of 4.19 and a debt-to-equity ratio of 1.44. MongoDB has a 12-month low of $135.15 and a 12-month high of $439.00. The company has a market cap of $25.14 billion, a price-to-earnings ratio of -76.28 and a beta of 1.13.

MongoDB (NASDAQ:MDB) last announced its quarterly earnings results on Thursday, June 1st. The company reported $0.56 earnings per share (EPS) for the quarter, beating the consensus estimate of $0.18 by $0.38. The business had revenue of $368.28 million during the quarter, compared to analyst estimates of $347.77 million. MongoDB had a negative return on equity of 43.25% and a negative net margin of 23.58%. The firm’s revenue was up 29.0% on a year-over-year basis. During the same period in the previous year, the company earned ($1.15) EPS. On average, equities analysts anticipate that MongoDB will post -2.8 earnings per share for the current fiscal year.

Insiders Place Their Bets

In other news, Director Dwight A. Merriman sold 3,000 shares of the business’s stock in a transaction on Thursday, June 1st. The shares were sold at an average price of $285.34, for a total value of $856,020.00. Following the sale, the director now owns 1,219,954 shares in the company, valued at approximately $348,101,674.36. The sale was disclosed in a filing with the Securities & Exchange Commission, which can be accessed through the SEC website. Also, CEO Dev Ittycheria sold 20,000 shares of the stock in a transaction that occurred on Thursday, June 1st. The stock was sold at an average price of $287.32, for a total value of $5,746,400.00. Following the sale, the chief executive officer now owns 262,311 shares in the company, valued at $75,367,196.52. This sale was also disclosed in a filing with the SEC. Insiders sold 102,220 shares of company stock worth $38,763,571 over the last three months. 4.80% of the stock is currently owned by corporate insiders.

Institutional Inflows and Outflows

A number of institutional investors have recently made changes to their positions in the business. Price T Rowe Associates Inc. MD raised its stake in shares of MongoDB by 13.4% in the first quarter. Price T Rowe Associates Inc. MD now owns 7,593,996 shares of the company’s stock worth $1,770,313,000 after acquiring an additional 897,911 shares during the last quarter. Vanguard Group Inc. boosted its position in MongoDB by 2.1% in the 1st quarter. Vanguard Group Inc. now owns 5,970,224 shares of the company’s stock valued at $2,648,332,000 after buying an additional 121,201 shares during the last quarter. Jennison Associates LLC increased its stake in shares of MongoDB by 101,056.3% in the 2nd quarter. Jennison Associates LLC now owns 1,988,733 shares of the company’s stock valued at $817,350,000 after buying an additional 1,986,767 shares during the period. Franklin Resources Inc. raised its position in shares of MongoDB by 6.4% during the fourth quarter. Franklin Resources Inc. now owns 1,962,574 shares of the company’s stock worth $386,313,000 after acquiring an additional 118,055 shares during the last quarter. Finally, State Street Corp lifted its stake in shares of MongoDB by 1.8% during the first quarter. State Street Corp now owns 1,386,773 shares of the company’s stock valued at $323,280,000 after acquiring an additional 24,595 shares during the period. 89.22% of the stock is owned by institutional investors.

MongoDB Company Profile


MongoDB, Inc. provides a general-purpose database platform worldwide. The company offers MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.


This instant news alert was generated by narrative science technology and financial data from MarketBeat in order to provide readers with the fastest and most accurate reporting. This story was reviewed by MarketBeat’s editorial team prior to publication. Please send any questions or comments about this story to contact@marketbeat.com.


Article originally posted on mongodb google news. Visit mongodb google news



Java News Roundup: Payara Cloud, MicroProfile Telemetry, Foojay.io Calendar, JVM Language Summit

MMS Founder
MMS Michael Redlich

Article originally posted on InfoQ. Visit InfoQ

This week’s Java roundup for August 7th, 2023 features news from JDK 22, JDK 21, GraalVM Native Build Tools 0.9.24, Spring Cloud 2023.0.0-M1, Spring Modulith 1.0-RC1, Payara Cloud, Quarkus 3.2.4, MicroProfile Telemetry 1.1, OpenXava 7.1.4, Foojay.io calendar and JVM Language Summit 2023.

JDK 21

Build 35 of the JDK 21 early-access builds was made available this past week, featuring updates from Build 34 that include fixes to various issues. Further details on this build may be found in the release notes.

JDK 22

Build 10 of the JDK 22 early-access builds was also made available this past week featuring updates from Build 9 that include fixes to various issues. More details on this build may be found in the release notes.

For JDK 22 and JDK 21, developers are encouraged to report bugs via the Java Bug Database.

GraalVM

On the road to version 1.0, Oracle Labs has released version 0.9.24 of Native Build Tools, a GraalVM project consisting of plugins for interoperability with GraalVM Native Image. This latest release provides notable changes such as: support for Profile-Guided Optimization (PGO); a migration away from ImageClassLoader in favor of ClassLoader for test discovery by JUnitPlatformFeature, which eliminates an eager class-initialization error at native-image build time; and improved GraalVM installation instructions. Further details on this release may be found in the changelog.

Spring Framework

The first milestone release of Spring Cloud 2023.0.0-M1, codenamed Leyton, ships with: an implementation of Spring MVC and Jakarta Servlet; support for the Java HttpClient class; and milestone upgrades to sub-projects such as Spring Cloud Commons 4.1.0-M1 and Spring Cloud Task 3.1.0-M1. More details on this release may be found in the release notes.

The first release candidate of Spring Modulith 1.0.0 provides bug fixes, dependency upgrades and new features such as: avoiding premature initialization of the SpringModulithRuntimeAutoConfiguration class to prevent a proxy warning; improved database interaction for marking event publications as completed; and allowing the ApplicationModulesExporter class to write output to a file. Further details on this release may be found in the release notes. The GA release is scheduled to be announced at the SpringOne at VMware Explore conference in late August 2023, and InfoQ will follow up with a more detailed news story.

Payara Cloud

Payara has announced a new 15-day free trial of Payara Cloud, its cloud-native runtime service, for organizations considering their options for these kinds of services. Payara claims that using Payara Cloud will shorten development cycles, improve operational efficiency and save money on training developers in Kubernetes, since Payara Cloud handles Kubernetes in the background.

Quarkus

Red Hat has released version 3.2.4.Final of Quarkus with notable changes such as: documentation of Maven configuration options that may be relevant when running tests; a fix for the @RouteFilter annotation that stopped working with WebSocket requests in Quarkus 3.2.0.Final; and a fix for the OpenTelemetry (OTel) SDK autoconfiguration ignoring the OTel service name in favor of the Quarkus application name. More details on this release may be found in the changelog.

MicroProfile

On the road to MicroProfile 6.1, the MicroProfile Working Group has provided the first release candidate of the MicroProfile Telemetry 1.1 specification featuring notable changes such as: a clarification of which API classes must be available to users; an implementation of tests that are not timestamp-dependent; and a clarification of the behavior of the Span and Baggage beans when the current span or baggage changes. Further details on this release may be found in the list of issues.

OpenXava

The release of OpenXava 7.1.4 features dependency upgrades and notable fixes such as: the @DisplaySize annotation being ignored when its value is greater than 50 and it is used with @Column(length=255); files being lost when uploading several files at the same time while creating a new entity; and moving columns to customize a list not working if the application name contains underscores. More details on this release may be found in the release notes.

Foojay.io

The Foojay.io community calendar now has the ability to automatically import individual Meetup pages maintained by Java User Groups. This eliminates the need for JUGs to manually enter their Meetup events on the Foojay.io calendar. To get started, JUG leaders need to register on the Foojay.io Slack channel and specify either a daily or weekly basis for automatic updates.

JVM Language Summit

Sharat Chander, senior director of Java and Container Native Product Management and Developer Relations at Oracle, provided InfoQ with a summary of the 2023 JVM Language Summit, which featured a full agenda of sessions and an OpenJDK Committers’ Workshop.

Last week represented the 15th edition of the JVM Language Summit. Taking place in Santa Clara, California, this three day summit hosted by Oracle’s Java language and JVM teams offered an open technical collaboration among language designers, compiler writers, tool builders, runtime engineers, and VM architects.

The summit welcomed the creators of both the JVM and programming languages for the JVM to share their experiences. It also invited non-JVM developers of similar technologies to attend or speak on their runtime, VM, or language of choice. The attendee mix comprised participants from 15 companies and 30 countries, as well as 11 members of the Java Champions luminary program and 16 Java User Group leaders and organizers.

Beyond sessions exploring Projects Leyden, Loom, Panama and Valhalla, the summit also offered insight into areas such as Generational ZGC, the Class-File API preview feature, and more.

Chander stated that “session recordings will be made available soon so keep an eye out for an update!”



Deploying a Scalable and Secure Node.js REST API with MongoDB on Aptible

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Too Long; Didn’t Read

Scalable and Secure REST API Deployment: This tutorial explains how to deploy a scalable and secure Node.js REST API with MongoDB on Aptible, focusing on building modern web applications while maintaining data security and user trust.

Content Overview: The tutorial covers the significance of deploying scalable and secure REST APIs, prerequisites for setting up the development environment, connecting to MongoDB, building the Node.js REST API using Express, deploying on the Aptible platform, and why Aptible is a suitable choice for containerization and orchestration.

Importance of Scalable and Secure APIs: REST APIs are crucial for web apps, requiring scalability for increased demand and robust security to safeguard sensitive data. Aptible offers a powerful platform for deployment, hosting, scaling, and security solutions.

Development Environment Setup: Steps include creating a project directory, initializing the project with npm, and installing the Express framework. This establishes the foundation for building the REST API.

Connecting to MongoDB: Setting up a connection to MongoDB involves creating a database.js file, importing the MongoDB module, and establishing the connection using the MongoClient. This step ensures the database integration for the API.

Building the REST API: Utilizing Express and MongoDB, the tutorial guides through creating models for different data entities, defining REST endpoints for CRUD operations, and setting up the server to listen for incoming requests.

Docker Containerization: The Dockerfile is provided to containerize the Node.js app, specifying dependencies, application code, port exposure, and starting command. Docker enables consistent deployment across various environments.

Aptible Deployment: Aptible’s PaaS is introduced for secure and compliant deployment. Steps include creating an environment, adding an endpoint, setting up a database, and deploying the app. Aptible’s managed infrastructure, security, compliance, scalability, monitoring, and support are highlighted.

Why Use Aptible: The benefits of using Aptible for containerization and orchestration are outlined, including managed infrastructure, security, compliance, scalability, monitoring, ease of use, cost-effectiveness, and excellent support.

Conclusion: By following this tutorial, you’ve learned to deploy a secure Node.js REST API with MongoDB on Aptible, ensuring scalability and data security for modern web applications. This guide empowers you to build, secure, and scale web apps effectively on the Aptible platform.

Article originally posted on mongodb google news. Visit mongodb google news



The Way to Handle Large Data Sets

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts


Tackling Big Data challenges head-on? By implementing the right techniques and embracing a holistic approach, you can conquer the complexities and unleash the full potential of your data.

Life is simpler when data is available in a small and structured form. Such data can be easily processed with tools that even non-technical people can use, and it is easy for humans to interpret. However, as the volume, velocity and variety of data increase, new challenges arise. From storage to transformation to gaining insights, a significant amount of data needs to be processed within a given timeframe. This size and scale add complexity. According to a study, 98% of all data being generated today is unstructured. To process such data, traditional tools and techniques are not enough. We need technologies that are inherently designed to handle large and complex data sets.

Challenges of large data sets

Data engineering teams face several challenges when processing large data sets. Some of these challenges include:

  • Loading large files: If a large file needs to be loaded (into, say, a table), it is necessary to consider network bandwidth limitations and page-size issues.
  • Tools/algorithms may crash: Some tools or algorithms are not designed to handle large data sets, resulting in failures when applied to such data.
  • Out of memory issues: The heap memory or allocated memory (RAM) may not be sufficient to handle large data sets.
  • Programs might run infinitely: The processing time required to handle large data might grow exponentially.
  • Complex transformation requires redesign and more development effort: Data processing teams often need to re-design and re-engineer their data pipelines, leading to higher development efforts and project timeline impacts.

Techniques for handling large data sets

To address the common challenges that we just discussed, several techniques can be employed.

  1. Allocate more memory to meet the higher memory requirements.
  2. Choose instances with high memory/CPU ratio when the volume is high, and the transformations/computations are low.
  3. Avoid unnecessary and complex joins wherever possible.
  4. Divide the data set into smaller chunks and process them (see the sketch after this list).
  5. Choose a data format that is optimised for memory utilisation.
  6. Stream data or process in (near) real-time so that data processing happens when the data is available rather than storing and processing in large batches.
  7. Progressively load data into memory by using appropriate timer techniques.
  8. Use an appropriate Big Data platform that meets the needs of the business.
  9. Use a suitable programming language and technology to cater to the data processing needs.
  10. Leverage distributed processing patterns.
  11. Use parallel computing architecture.
  12. Build a workflow and orchestration platform.
  13. Use in-memory data sets for small data sets.
  14. Leverage in-built operators for efficient processing.
  15. Use appropriate partitioning techniques.
  16. Use indexing for fetching as much data as required.
  17.  Leverage garbage collection techniques.
  18. Flush data from memory and cache for efficiency.
  19. Create dynamic clusters based on input size patterns.
  20. Use lazy loading techniques.
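
As a concrete illustration of technique 4 (and of processing data as it arrives rather than in one large batch, per techniques 6 and 7), here is a minimal sketch in Python using pandas. The file name, column names and chunk size are placeholders chosen for the example, not values from any particular system.

import pandas as pd

CHUNK_SIZE = 100_000  # rows per chunk; tune to the available memory

running_total = 0
row_count = 0

# read_csv with chunksize returns an iterator of DataFrames,
# so the full file is never loaded into memory at once
for chunk in pd.read_csv("events.csv", chunksize=CHUNK_SIZE):
    # process each chunk independently, e.g. a simple aggregation
    running_total += chunk["amount"].sum()
    row_count += len(chunk)

if row_count:
    print(f"average amount: {running_total / row_count:.2f}")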

A few technical tips

The data engineering team needs to have a holistic approach to handle large data sets. Some of the technical tips for this include:

  1. Specify the correct and appropriate data types while importing data. For example, in Pandas, we can use uint32 for positive integers.
  2. Store the files (data sets) in zipped format such as the tar.gz file format.
  3. If the processing requirements are for data warehousing and reporting, use columnar NoSQL databases like Cassandra.
  4. Persist intermittent data sets to avoid re-computing the entire pipeline in case of failures. For example, use the df.write() method.
  5. Use appropriate system methods to understand memory availability and dynamically handle large data sets in chunks (partitions). Examples of such methods (used in the sketch after this list) include:
    a. df.memory_usage()
    b. gc.collect()
    c. sys.getsizeof()
  6. For storing transactional data, use PostgreSQL or MySQL. They also allow clustering techniques and replication capabilities which can be leveraged appropriately.
  7. Always read from external sources in appropriate fetch sizes and write in appropriate batch sizes. For example, in Spark's JDBC data source, one can use the fetchsize and batchsize options.
  8. Create a database connection pool; for example, configure setMinPoolSize and setMaxPoolSize.
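
The short Python sketch below ties several of these tips together: declaring compact data types on import (tip 1), reading a compressed file directly (tip 2), inspecting memory with df.memory_usage() and sys.getsizeof(), and releasing memory with gc.collect() (tip 5). The file and column names are placeholders for illustration.

import gc
import sys

import pandas as pd

# Tip 1: declare compact dtypes up front instead of letting pandas
# default to 64-bit types for every column
dtypes = {"user_id": "uint32", "clicks": "uint32", "country": "category"}

# Tip 2: pandas can read a gzip-compressed file directly
df = pd.read_csv("events.csv.gz", dtype=dtypes, compression="gzip")

# Tip 5a: per-column memory footprint in bytes
print(df.memory_usage(deep=True))

# Tip 5c: approximate in-memory size of the whole DataFrame object
print(sys.getsizeof(df))

# Once the frame is no longer needed, drop the reference and
# trigger garbage collection (tip 5b) so memory is returned sooner
del df
gc.collect()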

Handling large data sets requires a holistic approach. Data engineers need to carefully consider the entire data pipeline, including the frequency and size of data input. They should include appropriate data quality checks in the pipeline to ensure the sanctity of the data. Also, a proper error handling mechanism, graceful shutdown techniques, and retry mechanisms in data pipelines are essential.



Oracle Data Provider for .Net: Asynchronous Programming, OpenTelemetry, and More

MMS Founder
MMS Giorgi Dalakishvili

Article originally posted on InfoQ. Visit InfoQ

Oracle has prereleased a new version of Oracle Data Provider for .NET (ODP.NET), its ADO.NET data access library for the Oracle database. The new version, 23c, is available as a Developer Release, which means it is not yet ready for production use. However, it already brings two of the most highly requested community features: asynchronous programming and support for OpenTelemetry.

With the new version of the ODP.NET library, .NET developers can use the .NET task-based asynchronous programming (TAP) model to execute data access operations asynchronously using the async and await keywords. As a result, database operations are executed asynchronously and the caller thread is not blocked, leaving it free to serve other requests and consequently increasing scalability. Below is an example of the new asynchronous methods:

// Note: OracleConnection is provided by the Oracle.ManagedDataAccess.Client namespace
using (var connection = new OracleConnection(connectionString))
{
  // Open the connection asynchronously; the calling thread is not blocked
  await connection.OpenAsync(CancellationToken.None);

  using (var cmd = connection.CreateCommand())
  {
    cmd.CommandText = "select * from documents";

    // Execute the query asynchronously and obtain a data reader without blocking
    var reader = await cmd.ExecuteReaderAsync();
  }
}

The TAP model is compatible with Oracle Database 19c or higher.

The new release also includes support for OpenTelemetry – an open-source observability framework for instrumenting, generating, collecting, and exporting telemetry data. This makes monitoring and analyzing ODP.NET operations easier in cloud computing environments, distributed systems, and microservices. Developers can use the ODP.NET OpenTelemetry NuGet package to customize the ODP.NET metrics collected.

In addition to these features, .NET developers can now use ODP.NET Application Continuity (AC) for end-to-end high availability in case of planned or unplanned outages. In case of downtime, in-flight database sessions are recovered without any work disruption. AC is enabled by default when connecting to Oracle Database 19c or higher.

ODP.NET 23c also supports the new features introduced in Oracle Database 23c. Among these features are JSON Relational Duality, which allows relational data to be accessed as JSON documents; boolean data type columns; SQL domains, dictionary objects that belong to a schema and encapsulate a set of optional properties and constraints for common values; and database annotations for centrally storing and retrieving metadata about database objects.

In addition to these features, many improvements related to usage in cloud environments are planned for the final release of the library, such as centralized configuration providers and new single sign-on capabilities.

ODP.NET 23c will be able to retrieve application configuration data (such as connection strings and tuning data) from a centralized, secure location, making configuration management simpler and more flexible. It will be possible to store the configuration data in Microsoft Azure using Azure App Configuration and Azure Key Vault or in Oracle Cloud Infrastructure (OCI) via OCI Object Storage and OCI Vault, or on the local file system as a JSON configuration file.

The final release will also support single sign-on with Microsoft Entra ID or Oracle Identity and Access Management service. Single sign-on makes user management and account management easier for administrators and end users.

More details about the new version of the library can be found at the ODP.NET 23c Developer’s Guide. .NET developers can try the new available features and share their feedback on Oracle Forums.



Podcast: Building a Reliable Kafka Data Processing Pipeline With Lily Mara

MMS Founder
MMS Lily Mara

Article originally posted on InfoQ. Visit InfoQ

Intro

Thomas Betts: Hey folks. Before we get to today’s podcast, I wanted to let you know that InfoQ’s international software development conference, QCon, is coming back to San Francisco from October 2nd to the 6th.

At QCon, you’ll hear from innovative senior software development practitioners talking about real world technical solutions that apply emerging patterns and practices to address current challenges. Learn more at qconsf.com. We hope to see you there.

Hello and welcome to the InfoQ Podcast. I’m Thomas Betts. Today I’m joined by Lily Mara. Lily is an engineering manager at OneSignal in San Mateo, California. She manages the infrastructure services team, which is responsible for in-house services used by other OneSignal engineering teams.

Lily is the author of Refactoring to Rust, an early access book by Manning Publications about improving the performance of existing software systems through the gradual addition of Rust code.

Today, we’ll be talking about building a reliable Kafka data processing pipeline. OneSignal improved the performance and maintainability of its highest-throughput HTTP endpoints, which are backed by a Kafka consumer written in Rust, by making them asynchronous.

The shift from synchronous to asynchronous processing can reduce operational costs, but it introduces new challenges, which we’ll dive into. Lily, welcome to the InfoQ Podcast.

Lily Mara: Thanks, Thomas. I’m really excited to be here and to have an opportunity to chat about this with more people.

Messaging at OneSignal [01:06]

Thomas Betts: So you spoke at QCon New York and your title of your talk was, and this is a little long, “How to Build a Reliable Kafka Data Processing Pipeline, Focusing on Contention, Uptime and Latency.” That’s a lot of stuff to focus on.

I want to get to all those things and we’ll break it down, but I want to back up a bit. Let’s start with what does OneSignal do for the people that aren’t familiar with it, and then what were the data processing pipelines specifically that you were trying to optimize?

Lily Mara: Definitely. I do also want to say that talk title was not chosen by me. The track hosts I think filled it in as a kind of temporary holding thing until I filled it in with something a little bit snappier, but I never ended up doing that and it was so long I couldn’t get it to fit on the title slide in my slide deck. So my title slide just said “The One About Kafka.”

I think at the original conference where I gave that talk, it had a snappier title, but QCon likes the very specific titles that walk everybody through what to expect.

I think the original snappy title of the talk was A Kafkaesque Series of Events because there’s quite a confounding series of steps and debuggings and incidents that we went through during that process at OneSignal.

So OneSignal is a customer messaging company. We allow our customers who are website builders, app builders, marketers, developers to contact their own user bases more effectively.

So this might be things like sending people real-time information about live events happening in the world. It might be letting people know about promotions that are customized to their own particular interests or needs. It might be letting them know that they have abandoned carts on the website that they should come back and take a look at.

There’s all kinds of ways that people use our products and it would be probably a waste of my time to try and come up with all of them because customers are so much more creative with open-ended platforms like ours than our developers ever could be.

Before Kafka, OneSignal had a Rails monolith [03:08]

Thomas Betts: But that messaging is the core element that OneSignal is definitely trying to provide communication pathways. And so you’re doing that with Kafka underneath it or did you have something before you introduced Kafka?

Lily Mara: So before we used Kafka, OneSignal was a single Rails monolith that absolutely everything happened inside of, right? We had Rails for HTTP handling and we sent asynchronous jobs off to Sidekiq for processing.

I think from the origin, I think we were always doing some amount of asynchronous processing because the nature of our original product offering, we’ve since grown much broader, but the nature of our original product offering was you have a large user base and you want to send that user base or some subsection of that user base, a large messaging campaign.

You want to let everybody know that there’s a sale and you can’t really do that synchronously in an HTTP handler because you don’t want to be doing N things in an HTTP handler. You want to be doing one thing, and that one thing is shoving a little bit of metadata into Postgres and then firing off a Sidekiq job.

And then that Sidekiq job can have queuing applied to it so that it doesn’t overwhelm your database. And later on when there’s capacity to process that notification delivery, you can use your Postgres database to enumerate all the users that a notification is going to go to and deliver the messages.

So that was kind of our first foray into asynchronous stuff, and it was really there from the beginning.

But yeah, what I want to talk about today is several years down the line for OneSignal was the point where we started having some trouble with one of our seemingly more simple but much more high throughput HTTP endpoints.

This is the endpoint that allowed our customers to update properties on each subscription in the database, like each device record, each app installed on an iPhone, each user profile on a website, things of this nature. We allow our customers to store key value pairs as well as some other related metadata on every subscription that’s out there.

And anytime we got an HTTP request from a customer, we were taking the metadata from that request and just shoveling it directly into Postgres. And CRUD APIs work fine up to a certain scale, but there will come a point eventually as you start scaling up and up and up where those updates and requests are going to start to overwhelm Postgres, right?

Postgres is a great database, but it certainly has some limitations and we started to run into those pretty aggressively.

Why async processing was needed [05:39]

Thomas Betts: And so you’re talking about the input stream, like you were getting more requests than you can handle. It wasn’t that the async going out was being too slow, it was you got thousands or millions of requests a second and that was overloading Postgres?

Lily Mara: That’s correct, yeah. So we’re at the point now where we haven’t even introduced this asynchronous processing for this particular endpoint. So this is all happening synchronously inside of our HTTP handlers.

And at that point we decided we want to take this synchronous work and we want to add queuing to it so that we can much more carefully control how much concurrency is hitting Postgres and how much load we’re applying to Postgres.

And that was the point that we shifted from a synchronous workload in our HTTP handler and we turned that into a Kafka produce that was later picked up by a Rust Kafka consumer, which we wrote and that Kafka consumer would apply the data change to our Postgres database.

And there were a lot of interesting engineering challenges that came along with this, not the least of which was at that time especially absolutely no one was writing Kafka consumers in Rust.

We based our stack on the RdKafka library. There’s a Rust wrapper library for RdKafka that we use very heavily, but beyond the basic, “Please give me another Kafka message,” there really wasn’t much in the way of control for writing a high level Kafka consumer.

So we actually developed our own library for controlling concurrency within our consumers, which we very originally called Kafka Framework internally. And we’ve had talks to open source this because presumably someone else is just as silly as us and wants to write production Kafka consumers in Rust, but we have so far not been able to do that.

Why OneSignal chose to write a Kafka consumer in Rust [07:23]

Thomas Betts: So what was the decision behind using Rust if it didn’t exist already? Did you look at other options or is it, “We have other things. We’re using Rust and we’re going for Rust on all these other platforms, we want to use it here”?

Lily Mara: So we had bet big on Rust pretty early. One of the core pieces of OneSignal’s technology is OnePush, which is the system that powers all of our deliveries.

I think we wrote about that publicly on our engineering blog in, I want to say like 2016, which is about a year after Rust’s 1.0 release, maybe even less than a year, maybe like eight or 10 months after the 1.0 release of Rust.

So we were pretty early in putting Rust into production at OneSignal, and we were really happy with both the speed of runtime once it’s deployed in production, the low overhead, the low cost of running it because of that low overhead, and also frankly, the ease of development.

Everybody talks about how Rust is kind of difficult to learn, but I think once you have a certain level of competency with it, it actually lends itself to being a very productive language for people to use in production.

So many classes of bugs that you have to worry about in other languages, you just don’t even have to think about those in Rust. And that’s really convenient for folks.

Everybody who reports to me at least is, I would say, primarily a Rust developer, and they also do some amount of Go at this point, but I have taken people from absolutely no Rust experience whatsoever to being pretty proficient Rust developers. So I have definitely seen what it takes for folks to get spun up there and it’s, I think, honestly much less than the internet Hype Machine makes it out to be.

The tech stack: Go, Kafka, Rust, and Postgres [09:06]

Thomas Betts: You went with Rust, and so how did it fit into the stack right now? So you had an HTTP endpoint and there was that on the Rails monolith you had and were you changing those?

Lily Mara: So at that point the HTTP handling for that particular endpoint had already been moved out of Rails. So I want to say in 2018, we had taken some of our highest throughput HTTP endpoints and we had actually moved them to a Go HTTP server.

We had wanted to put them in a Rust HTTP server, but at that point there really wasn’t a Rust HTTP library that we felt was kind of production grade. There really wasn’t anything that had the kind of feature set that we were looking for in an HTTP library.

I think the situation is much different now, but we’re talking about five years ago. I think the libraries were Iron and I don’t remember what the other ones were, but it was a pretty complex story for HTTP in Rust.

And compared with write something that manages concurrency, the level of engineering that we would have to put into, “Write your own HTTP server library,” was enormous and we weren’t ready to take that on.

Thomas Betts: No, that makes sense. So you had Go for the HTTP server, and then how did you get to the Rust code? Where does that live in the stack?

Lily Mara: So basically everything that we have today, we have Ruby and Go at the HTTP layer. So touching customer data directly, and then I would say everything underneath that customer data handling layer is a mix of Rust and Go.

In the future, we are developing new services exclusively in Rust. So everything that’s kind of under the skin of OneSignal is written in Rust with some older stuff that is written in Go.

Thomas Betts: So is there a Rust service that the Go HTTP just calls and says, “Please handle this request,” and then it’s very thin at the HTTP endpoint?

I’m trying to go back to you said that you had a lot of requests coming in and the Rust into Kafka is going to be really quick, but now we’ve got Go on top of it. I’m just trying to understand where’s the handoff go from one to the next?

Lily Mara: So we’ve got to Go HTTP server and basically what it does is it dumps an event into Apache Kafka, right? And then later on we have a Rust Kafka consumer that picks that event off of Kafka. So those two things don’t speak to each other directly. They kind of use Kafka as a bridge to go in between them.

Thomas Betts: Gotcha. So the Go is straight into Kafka, and it’s the consumer pulling it out of Kafka is what you wrote in Rust?

Lily Mara: That is correct. Yeah.

Thomas Betts: It wasn’t the Kafka producer. I was conflating the two in my head, but you had a Go Kafka producer, and now a Rust Kafka consumer. And so again, that Go HTTP server, very thin. All it does is sticks it into Kafka and you’re done. And then it’s async after that.

The Rust Kafka consumer [11:43]

Thomas Betts: And so now it’s running, you’ve got Rust, and what does it do and how does it handle concurrency?

Lily Mara: That system more or less takes the event data off of Kafka, deserializes it into a struct that the rest library can understand, and then it sends Postgres update calls across the wire to Postgres, and the concurrency story there went through a lot of phases.

There’s far too much traffic, of course, for us to just process one event at a time per process–that wouldn’t really work for us. Kafka has some inbuilt support for concurrency already. It has a data structure called partitions, and basically each partition of a topic inside of Kafka is an independent stream of events.

So you can consume those totally independently from one another. And this is kind of the first layer of concurrency that we have. This particular data stream is divided into, I want to say 720 partitions. So we could, in theory, if we did no additional concurrency support, we could consume 720 events concurrently and be sending Postgres 720 Postgres update calls concurrently.

That’s turned out to not be enough concurrency for us, because we had more events than that. So what we did is we developed a strategy for doing concurrency within each partition.

And so we would have a number of worker threads for each partition and we would ask Kafka for a number of events at once and shuffle those events to the various worker threads and process them concurrently.

Maybe this works fine for certain workloads, but we had an issue which was we didn’t want to process any updates for the same subscription concurrently or out of order.

So if my iPhone has two property updates coming for it, one setting a property to 10 and one setting it to 20, and the order that we received those at the HTTP layer was 10 and then 20, we don’t want to apply 20 to the database and then 10 milliseconds later apply 10. We want the most recent thing that a customer sent us to be the thing that gets persisted into the database.

Why a customer would send us updates like that is not really our concern. We just want to be accurate to what the customer does send us.

Thomas Betts: Right. Can’t control what the customer is going to do with your system.

Lily Mara: You certainly cannot.

Thomas Betts: You at least acknowledge it could be a problem.

Lily Mara: Yes, we acknowledged it could be a problem. And so what were we to do? We can’t have unlimited concurrency. So we use a property on each event, which is the subscription ID, more or less the row ID in Postgres.

We hash that and we make a queue in memory that’s for each one of the worker threads. So each worker thread, maybe there’s four worker threads per partition, each one of those four worker threads has a queue that is dedicated to it.

And this is like a virtual queue because there’s a Kafka partition that’s like the real queue. We can’t really muck around with that too much. But for each worker thread, we have kind of a virtual queue, like a view onto the Kafka partition that only contains events that hash to the same worker thread ID.

And what this means is if there are two updates that are destined for the same row in Postgres, the same subscription, that means they’re going to have to be processed by the same worker thread, which means they’re going to sit on the same queue, which means they are going to be processed in the order that we receive them at the HTTP layer.
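
The routing scheme Lily describes can be sketched in a few lines. The snippet below is purely illustrative and written in Python rather than the Rust used at OneSignal; the event fields, worker count and the apply_update_to_postgres stub are invented for the example. The key idea is that hashing the subscription ID fixes which worker queue an event lands on, so updates for the same row are always handled by the same thread, in arrival order.

import hashlib
import queue
import threading

NUM_WORKERS = 4  # illustrative: one pool of worker threads per Kafka partition

# one in-memory "virtual queue" per worker thread
worker_queues = [queue.Queue() for _ in range(NUM_WORKERS)]

def apply_update_to_postgres(event):
    # placeholder: in the real system this would issue an UPDATE against Postgres
    print("applying", event)

def route(event):
    # hash the subscription ID so every update for the same row
    # maps to the same worker queue, preserving arrival order
    digest = hashlib.sha256(event["subscription_id"].encode()).digest()
    worker_index = int.from_bytes(digest[:8], "big") % NUM_WORKERS
    worker_queues[worker_index].put(event)

def worker(q):
    while True:
        event = q.get()
        apply_update_to_postgres(event)
        q.task_done()

for q in worker_queues:
    threading.Thread(target=worker, args=(q,), daemon=True).start()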

How to handle partitioning [15:18]

Thomas Betts: So you’ve got a couple layers, you said they’ve got the partition is the first layer and then you’ve got these virtual queues. How are you partitioning?

You said you’ve got 720 partitions, you said that was the max. Are you using the max and then what are you doing to separate that? Is there some sort of logical structure for choosing those partitions to help you down the line so you don’t have some data going to one place and some going to another?

Lily Mara: Absolutely. So 720 is not the maximum number of partitions you can use in Kafka. I’m not sure what the maximum is. I would assume it’s something obscenely large that no one actually wants to use in production, but yeah, it is definitely not the max.

We picked 720 because it has a lot of divisors. And so when you start up Kafka consumers, Kafka will assign each consumer a number of partitions to consume from.

And basically we wanted a number of partitions that would very easily divide into an even number of partitions per consumer. And also the second half of your question, an even number of partitions should be assigned to each database.

So the partitioning strategy that we use is to divide messages onto particular partitions that align with Postgres databases. So the way our system works is internally we have a number of Postgres databases that are sharded and they’re sharded based on the customer’s app ID or dataset ID, right?

I think it’s the first byte of their app ID determines which database shard they’re going to live on. So occasionally when we are having performance issues or incidents or things like that, we will publish a status page update that says, “Customers who have app IDs beginning with D might be having some performance issues right now that is because we’re having some problems with one particular Postgres database.”

So we use this to determine which Kafka partition things are going to be assigned to. So this means that one customer’s app, they are always going to have all of their Kafka events go to the same Kafka partition.

And datasets which are going to the same Postgres server, they’re going to be either on the same or slightly different Kafka partition, not necessarily guaranteed to be the same, but it could be the same.

Thomas Betts: So that’s part of the customer app ID. So that’s in the request that’s coming in. So from that HTTP request, “Here’s the data I want to update and here’s who I am.” That’s when you start having the key that says, “I know where to send this data to Kafka,” and that’s going to correspond to the shard in Postgres as well?

Lily Mara: That’s correct.

Thomas Betts: Okay. So that’s built into the design at the data level of how your data is structured?

Lily Mara: That’s correct. At the end of the day, most of the problems that our company faces are Postgres scaling problems. So as we have gotten our stuff to be more and more and more reliable, a lot of that work has come down to making more of our systems aware of the fact that they have to queue things up based on which Postgres server particular kinds of traffic are going to.

Thomas Betts: I think that covers a lot of the partitioning, which is complicated and I think you’ve explained it pretty well, at least I can understand it now.

Changing metrics for sync vs. async [18:24]

Thomas Betts: When you shifted from synchronous to asynchronous workflows with this data processing, that changes how you measure it. I guess you had performance issues before.

How do you know that this improved performance and how you have to measure the state of the system change because it’s now asynchronous and it’s not the same… the server’s not getting overloaded, the CPU is not spiking, but is it healthy?

Lily Mara: Definitely. Yeah. So this very dramatically changed how we thought about our metrics. It very dramatically increased the scope of monitoring that we had to do on our systems.

When you have a synchronous system that’s just serving a bunch of HTTP traffic, you care about how long does each request take to serve, and how many successes, and how many errors do I have.

I don’t want to say it’s simple, because of course it is not simple, but you could be pretty effective by monitoring those three things and driving response time down and driving errors down.

But as soon as you move to an asynchronous system like this, if those are the only three metrics that you care about, you might not ever be processing any of your updates.

A successful HTTP response doesn’t mean very much in a world where actually applying the data that’s in that HTTP request might happen 12 hours later. And of course we don’t want it to happen 12 hours later, but these are the nature of some of the metrics we have to track now.

The key one that we originally started looking at was message lag, the number of messages that are currently sitting in Kafka that have not yet been processed. So normally when the system is healthy, this is fairly close to zero and it might go as high as a couple of hundred thousand and it can drop down to zero again very, very quickly. That is kind of normal, nominal operation.

When things start to get to be in a bad state, we can get tens or I think in rare cases we may have even seen hundreds of millions of messages sitting in Kafka waiting to be processed. Right?

This is not a good thing, and this is an indication that your system is not healthy because you have a bunch of messages that you expected to be able to process that you somehow are lacking the capacity to process.

There’s a number of different reasons why this could be happening. Maybe your Postgres server is down, maybe there’s… in one case for us, maybe you have a customer who sent a ton of updates that were destined for a single row in Postgres, and this kind of destroys your concurrency logic because all of the messages are being queued for a single row in Postgres, which means all of your worker threads are sitting there doing nothing.

But the nature of why you have a bunch of messages in lag is kind of broad. We’re talking about what are the metrics, that you have the number of messages sitting in Kafka.

And then one that we’ve started looking at more recently as a helpful metric is what we’re calling time lag, because every message that goes into Kafka has a timestamp on it that says this is the time that this message was added to the Kafka queue. So we’ve recently added a metric that reports the timestamp of the most recently processed message on the consumer side.

So we’re able to determine, “Right now it’s 2:30 PM. Well, the last message that we processed was actually put onto Kafka at noon.” So why are there two and a half hours between when this message was produced and when we’re consuming it? That means we are probably pretty far off from keeping up with reality and we need to update things.

This metric can be a little bit confusing for folks though, because in a normal state when things are nominal, that timestamp is always ticking up at a fairly steady state. It’s always basically keeping up with reality, right?

It’s 2:30 now, and the most recently processed message was also at 2:30. That makes sense. And if you have an elevated production rate, you have more HTTP requests than can be handled by your Kafka consumer coming in for a sustained period of time, you’re maybe going to see a slightly lower slope to that line. You’re not going to be keeping up with reality and at 2:30 you’re still going to be processing messages from 2:00 PM or something like that.

But something kind of confusing can happen if you have a pattern like… Let’s say your normal state of message production is like 2000 messages per second are coming across the wire. If you have say a one-minute period where that jumps up to 20,000 messages per second, say you have a big burst of traffic that is abnormal, that happens over a short period of time, and enqueues a bunch of messages into Kafka, your keeping up with real-time metric might actually plateau because say from two o’clock to 2:10, you have somebody sending you 20,000 messages a second.

So over this 10 minute period, you have maybe 12 million messages come across the wire and they’re all sitting there in Kafka waiting to be processed. You can’t process 20,000 messages a second. Maybe you can process 4,000. You can process slightly above your steady state of incoming messages, but you are going to fall behind reality. And so your time latency is going to kind of plateau. It’s going to go, this maybe starts at 2:00 PM and by 2:30 you have maybe processed up to like 2:03.

And this can be kind of confusing for people because if they’re just looking at a message latency graph, if they’re looking at the thing that shows me number of messages sitting in Kafka, it looks like there was a big growth in the number of messages sitting in Kafka.

But now that line is starting to steep down really dramatically. Now it’s starting to return back to normal really dramatically because we are now processing those messages and it’s starting to go back to normal.

But your time lag is going to keep getting worse and worse until you manage to catch up, because the wall time continues to tick forward at one second per second, while the timestamp of the most recently processed Kafka message, the thing the time lag is measured against, keeps falling further and further behind reality.

So that can be a little bit confusing for folks to wrap their heads around, but we’ve found it to be a bit more useful for actually determining how real-time the system is.

Number of messages sitting in Kafka is great as a first baseline health metric, but it’s not great for answering the question, how far behind reality are we? Maybe there’s 20 million messages in lag, but maybe they’ve only been there for two minutes. So is that really a problem? I don’t know.
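
To make that burst scenario concrete, here is a toy simulation sketch using the hypothetical numbers from the conversation (2,000 messages per second in steady state, a ten-minute burst at 20,000 per second, and a consumer that handles 4,000 per second); it is an illustration only, not OneSignal’s system, and it shows the backlog shrinking again soon after the burst while the time lag keeps growing until the backlog is fully drained:

```rust
use std::collections::VecDeque;

/// Simulate one second per loop iteration and print the two metrics side by
/// side: messages sitting in Kafka (backlog) versus time lag in seconds.
fn main() {
    let steady_rate: u64 = 2_000;  // normal production, msgs/sec
    let burst_rate: u64 = 20_000;  // burst production, msgs/sec
    let consume_rate: u64 = 4_000; // consumer throughput, msgs/sec

    let mut backlog: u64 = 0;
    // Queued messages modeled as (production second, message count) batches.
    let mut queue: VecDeque<(u64, u64)> = VecDeque::new();
    let mut last_processed_ts: u64 = 0;

    for t in 0..7_200u64 {
        // Burst lasts the first 600 seconds ("2:00 PM to 2:10 PM").
        let incoming = if t < 600 { burst_rate } else { steady_rate };
        queue.push_back((t, incoming));
        backlog += incoming;

        // Consume up to consume_rate messages, oldest first.
        let mut budget = consume_rate.min(backlog);
        backlog -= budget;
        while budget > 0 {
            let (ts, count) = queue.front_mut().expect("budget implies non-empty queue");
            last_processed_ts = *ts;
            if *count <= budget {
                budget -= *count;
                queue.pop_front();
            } else {
                *count -= budget;
                budget = 0;
            }
        }

        if t % 900 == 0 {
            println!("t={:5}s backlog={:8} time_lag={:5}s", t, backlog, t - last_processed_ts);
        }
    }
}
```

Running it shows the backlog peaking around 9.6 million messages at the end of the burst and then draining at roughly 2,000 messages per second, while the time lag keeps climbing until the queue empties well over an hour later.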

Thomas Betts: Yeah, I like that the default response is that the queue length for a queue should be close to zero. That’s the standard way you know any queue is healthy. And Kafka is just much bigger than typical queues, but it’s the same idea of how many messages are waiting.

And the message count gives you one metric, and the first thing you do is say, “Okay, there’s a problem, we can see there’s a problem, but we don’t understand why there’s a problem.” And then the time lag is also measured on the back end, on the consumer side.

So for both of those, because you’re talking about the producer side and the consumer side building up, you wanted to start digging in and doing analysis. What did you get to next? Did you have to add more metrics? More logging, more dashboards? What was the next step in the troubleshooting process?

Lily Mara: Yeah, definitely. So we have found distributed tracing really valuable for us. At OneSignal we use Honeycomb and we use OpenTelemetry absolutely everywhere in all of our services.

And that has been really invaluable for tracking down performance issues, bugs, and behavior problems, all kinds of stuff across the company, across the tech stack. We produce OpenTelemetry data from our Ruby code, from our Go code, from our Rust code. And that has been really, really valuable to us.

OpenTelemetry and Honeycomb [26:29]

Thomas Betts: And so that OpenTelemetry data is in that message that’s being put into Kafka. So it comes in from the request and then it always sticks around. So when you pull it out, whether it’s a minute later or two and a half hours later, you still have the same trace.

So you go look in Honeycomb and you can see, “This trace ran for two and a half hours,” and then you can start looking at all the analysis of what code was running at what time, and maybe you can start seeing, “Well, this little bit of it sat and waited for an hour and then it took a sub-second to process.” Right?

Lily Mara: We don’t actually tie the produce to the consume. We have found that to be rather problematic. We did try it at one point when we were kind of very optimistic, but we found that that runs into some issues for a number of reasons.

One of those issues is that we use Refinery for sampling all of our trace data. Refinery is a tool that’s made by Honeycomb, and more or less it stores data from every span inside of a trace and uses that to determine a kind of relevancy score for every trace and uses that to determine whether or not a trace should be sent to Honeycomb.

And that’s because each trace that you send up to the storage server costs you money to store. So we have to make a determination if we want to store everything or not. And of course, we cannot afford to store absolutely everything.

So if we were to match up the produce trace with the consume trace, we would have to persist that data in Refinery for potentially several hours. And that is problematic. That system was designed for stuff that is much more short-lived than that, and it doesn’t really work with that super well.

One thing that we have found that works a bit better is span links. So I am not sure if this is part of the OpenTelemetry spec or if it’s something that’s specific to Honeycomb, but more or less you can link from one span to another span, even if those two things are not a part of the same trace.

So we can link from the consume span to the root of the produce trace. And this is helpful for matching those two things up, but it’s generally not even that valuable to link them.

You might think that it is helpful to have the HTTP trace linked up with the Kafka consumed trace, but in general, we have found that the Kafka stuff has its own set of problems and metrics and concerns that really are kind of orthogonal to the HTTP handling side.

You might think, “Oh, it’s useful to know that this thing was sitting in the Kafka queue for two and a half hours,” or whatever, but we can just put a field on the consume span that says, “This is the difference between the produced timestamp and the time it was consumed.”

So there’s not really a whole lot of added value that you would be getting by linking up these two things.
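
A hedged sketch of what attaching that field to the consume span could look like, assuming a `tracing`-based setup (the conversation does not name the exact instrumentation crate OneSignal uses) and hypothetical field names:

```rust
use std::time::{SystemTime, UNIX_EPOCH};
use tracing::info_span;

/// Process one consumed Kafka message, recording how long it sat in the queue
/// as a field on the consume span. `produced_ts_ms` is the producer-assigned
/// timestamp carried by the message; the field names are hypothetical.
fn handle_message(payload: &[u8], produced_ts_ms: u64) {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before the Unix epoch")
        .as_millis() as u64;
    let produce_to_consume_ms = now_ms.saturating_sub(produced_ts_ms);

    // The delta becomes a queryable field on this span in the tracing backend,
    // without tying the consume trace to the original HTTP produce trace.
    let span = info_span!(
        "kafka_consume",
        produce_to_consume_ms,
        payload_len = payload.len() as u64
    );
    let _enter = span.enter();

    // ... actual message processing would happen here ...
}

fn main() {
    // Hypothetical invocation; in the real consumer this would be driven by
    // the Kafka consume loop, using the timestamp carried on each message.
    handle_message(b"example payload", 0);
}
```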

Measure the consumer and producer sides independently [29:20]

Thomas Betts: Well, it’s helpful. I mean, I think you’ve just demonstrated what happens when someone like me comes in with a naive approach, like, “Oh, clearly this is the structure of the trace I want to have, from when the person put in the request to when I handled it.”

And you’re actually saying you separate them: issues on the consuming side are different from issues on the producing side, and basically in the middle is Kafka.

And so are we having problems getting into Kafka, is that slow, or are we having problems getting stuff out of Kafka into Postgres, and that’s where the issue is? And you tackle those as two separate things rather than one end-to-end consumer problem.

Lily Mara: Yeah, absolutely. And if we want to do things like analyze our incoming HTTP traffic to determine what shape it has, how many requests each customer is sending us in each time span, we actually can just look at the data that are in Kafka, as opposed to looking to a third-party monitoring tool for that. Because Kafka is already an event store that basically stores every single HTTP request that’s coming across our wire.

So we’ve developed some tools internally that allow us to inspect that stream and filter it on various criteria. Basically, we add new criteria every time there’s an investigation that requires something new. But we have used that stream very extensively for debugging, or maybe not debugging, but investigating the patterns of HTTP traffic that customers send us.

So yeah, it’s a combination of looking at the Kafka stream directly to determine what data are coming in, as well as looking at monitoring tools, logs, metrics, Honeycomb, for determining why the consumer is behaving the way it is.

They sometimes have the same answer: the consumer is behaving the way it is because we received strange traffic patterns. But it is also often the case that the consumer is behaving the way it is because of a bug in the consumer, or because of a network issue, or because of problems with Postgres that have nothing to do with the shape of traffic that came into the tool.
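
As a rough illustration of that kind of ad-hoc stream inspection, here is a sketch built on the `rdkafka` crate with its Tokio-based stream consumer; the broker address, topic, group id, and `app_id` filter are all hypothetical, and OneSignal’s internal tooling is presumably more elaborate than this:

```rust
use rdkafka::config::ClientConfig;
use rdkafka::consumer::{Consumer, StreamConsumer};
use rdkafka::message::Message;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical broker, group, and topic names.
    let consumer: StreamConsumer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .set("group.id", "traffic-inspection")
        .set("auto.offset.reset", "earliest")
        .create()?;
    consumer.subscribe(&["http-requests"])?;

    // Hypothetical filter criterion: messages mentioning a particular app id.
    let needle: &[u8] = b"\"app_id\":\"1234\"";
    let mut matches: u64 = 0;

    loop {
        let msg = consumer.recv().await?;
        if let Some(payload) = msg.payload() {
            if payload.windows(needle.len()).any(|w| w == needle) {
                matches += 1;
                println!(
                    "match #{} at partition {} offset {}",
                    matches,
                    msg.partition(),
                    msg.offset()
                );
            }
        }
    }
}
```

In practice the filter criteria would be whatever the current investigation needs, added over time as described above.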

Current state and future plans [31:13]

Thomas Betts: So what’s the current state of the system? Have you solved all of your original needs to make this asynchronous? Is it working smoothly or is it still a work in progress and you’re making little tweaks to it?

Lily Mara: I don’t know if things could ever really be described as done and we can walk away from them. Our systems are very stable; we’ve been especially pleased with the stability of our Rust applications in production.

Something that we have noted in the past as almost an annoyance point with our Rust applications is they’re so stable that we will deploy them to production and then the next time we have to make a change to them, we’ll realize that they are very behind on library version updates because they have just kind of never needed anything to be done to them. They have really, really great uptime.

So the systems are very reliable, but as our customers’ traffic patterns change, we do continue to have to change our systems to evolve with them. We are exploring new strategies for making things concurrent in different ways.

And as we add features that require us to look at our data in different ways, we are also changing concurrency strategies as well as data partitioning strategies, all that type of stuff.

Thomas Betts: So all the typical questions: “Make it more parallel. Split it up more. How do we do that, and how do we have to think about all those things?”

A lot of interesting stuff. And if you have lots of changes in the future, we hope to hear from you again. What else are you currently working on? What’s in your inbox?

Lily Mara: I am currently pitching a course to O’Reilly about teaching experienced developers how to move over to Rust as their primary development language. I think there is a lot of interest in Rust right now across the industry.

You have organizations like Microsoft saying they’re going to rewrite core parts of Windows in Rust because of all of the security issues that they have with C and C++ as their core languages.

I believe last year there was a recommendation, from NIST or the DoD, that internal development be done in memory-safe languages, or something like that.

And I believe there are also a number of large players in the tech industry who are looking into Rust. And I think there’s a lot of people who are nervous because Rust is talked about as this big scary thing that is hard to learn.

It’s got ownership in there. What is that? And so the goal of this course is to help those people who already have experience in other languages realize that Rust is relatively straightforward, and you can definitely pick up those concepts and apply them to professional development in Rust.

Thomas Betts: Sounds great. Well, Lily Mara, once again, thank you for joining me today on the InfoQ Podcast.

Lily Mara: Thank you so much, Thomas. It was great to talk to you.

Thomas Betts: And we hope you’ll join us again for another episode in the future.

Lily Mara: Absolutely.

Ai4 2023 Panel Discussion: Generative AI in Business and Society

MMS Founder
MMS Anthony Alford

Article originally posted on InfoQ. Visit InfoQ

The recent Ai4 conference featured a panel discussion titled “Generative AI in Business and Society.” Some key takeaways are that generative AI offers many opportunities for operational efficiency and product personalization, that companies need to balance privacy concerns with personalization, and they need to understand how generative AI is used across their organization.

The panel was moderated by Aaron Rosenberg, partner at Radical Ventures. Panelists included Alon Yamin, CEO and co-founder of Copyleaks; Chrissie Kemp, chief data and digital product officer at Jaguar Land Rover; Mark Stutzman, CTO at AREA15; and Eiman Ebrahimi, CEO at Protopia AI. The panelists discussed questions about generative AI posed by Rosenberg.

Rosenberg began by asking Stutzman about the use of generative AI at his company, which provides immersive entertainment to its guests. Stutzman said that AREA15 uses ChatGPT for “a lot of boring stuff,” such as their customer service chatbot, where it resolves “close to 85 to 90%” of guest questions. They also used a generative image AI to create images for theming a new restaurant at their venue, which he described as “like walking into a video game.” He also hinted at future plans for personalized interactive experiences generated dynamically by AI.

Next Rosenberg asked Kemp what adoption of generative AI looked like for a large enterprise such as Jaguar Land Rover. Kemp replied that they were being “cautious,” especially with respect to security and privacy. However, she said that adopting generative AI would allow them to deliver more personalized in-vehicle services, calling out the company’s partnership with NVIDIA. She also said that in the enterprise itself there would be “huge opportunities to drive productivity and efficiency.”

Rosenberg then asked Ebrahimi how his company, Protopia AI, and others like it are enabling enterprises to adopt generative AI. Ebrahimi noted that one of the biggest challenges is how to properly handle sensitive data. He called back to AREA15 and Jaguar Land Rover both wanting to provide personalized experiences, but needing to balance collecting the personal data needed for that with privacy concerns. He referred to his company’s product, which transforms data into a form that’s not human-understandable, so that it can be used by machine learning algorithms while preserving privacy.

Rosenberg next asked Yamin what generative AI concerns he was seeing and how to address them. Yamin replied that he saw “amazing excitement” about opportunities in enterprises, along with worry about how to mitigate risks. He pointed out that many enterprises do not have a full picture of where generative AI is already being used within their organization. He recommended that companies define policies around the use of generative AI and build tools to enforce the policies. He also recommended they check the accuracy of model output even more closely than they would human-created content.

Rosenberg then asked each panel member for “one piece of practical advice” on how organizations could experiment with generative AI. Stutzman said, “play with it, [but] be careful.” Yamin reiterated the need to know how generative AI was used across the organization, along with a need for clear policies. Kemp advised “invest in your data,” as models are only as good as the data used to train them. Ebrahimi cautioned against hoping that “the legal system is going to come save us,” and instead recommended looking for technological solutions to privacy and compliance problems.

Finally, Rosenberg asked the panelists what they were most excited about when thinking about the future of generative AI. For Ebrahimi, it was personalized health care. Kemp predicted “another renaissance era in terms of creativity,” particularly in art. Yamin was excited about education, noting that there was already visible progress in the field. Stutzman seconded Ebrahimi’s excitement on health care, but added that he predicted a fully-automated marketing tech stack. Rosenberg concluded by sharing his own excitement about AI’s potential for advancing biology and physics.



MySQL Changes Versioning Model Announcing First Innovation Release 8.1.0

MMS Founder
MMS Renato Losio

Article originally posted on InfoQ. Visit InfoQ

Oracle recently announced a change in the versioning model for MySQL, introducing Innovation and Long-Term Support (LTS) releases. The first Innovation release is MySQL 8.1.0, which includes InnoDB Cluster read replicas.

The announcement marks a significant change in the MySQL release cycle. Until today, MySQL 8.0 followed a continuous delivery model with quarterly releases. While this approach allowed MySQL to introduce new features more frequently, it also presented challenges for projects and applications that needed only critical patches with fewer behavior changes.

Going forward, there will be separate Innovation and Long-Term Support (LTS) releases as explained by Kenny Gryp, MySQL product management director at Oracle, and Airton Lastori, MySQL product manager at Oracle:

MySQL database version 8.1.0 will be our first Innovation release, and 8.0.34+ will transition to only bug fixes until 8.0 End-Of-Life (EOL) scheduled for April-2026. Approximately one year from now, MySQL version 8.x will eventually become LTS which will provide ample time for users to migrate from 8.0.x to the 8.x LTS version.

Innovation releases will follow a similar model to MySQL 8.0 continuous development (< 8.0.34), incorporating bugfixes, security patches, and new features. According to Oracle, a new LTS version will be released every 2 years, and an 8.x LTS release is expected before the EOL of version 8.0. Gryp and Lastori add:

The current cadence goal is to have an Innovation release every quarter, incrementing the minor version number (eg. 8.2, 8.3, etc.). Innovation releases will also be Generally Available and are recommended to be used in production environments. Bugfixes and security patches will generally be included in the next Innovation or LTS release and not as part of a patch release within that Innovation release.

Among other changes, the new release introduces InnoDB Cluster Read Replicas, a new integrated solution for read scale-out, with router awareness. David Stokes, technology evangelist at Percona, comments:

I like the idea of the long-term support edition, as too many have been caught out on some of the tweaks in the quarterly releases. This should add stability to production environments and make life simpler for many. The announcement of 8.1 was long anticipated, and new features are always interesting and, hopefully, helpful. Seeing 8.0 becoming a bug-fix only for the next few years until the EOL date seems a little bittersweet.

For instance, release 8.0.29 had to be removed last year due to a critical issue related to the new INSTANT ALTER TABLE usage, which could lead to an unrecoverable database.

While the status of MySQL 8.1/9.0 and the lack of information about it have long been a topic of discussion in the community, Michael Larabel, founder of Phoronix.com, instead highlights other changes that MySQL 8.1 introduces:

Among the additions with MySQL 8.1 is allowing the EXPLAIN FORMAT=JSON output to be used with an INTO option to allow the JSON-formatted EXPLAIN output to be stored into a user variable for use with MySQL JSON functions. Also on the JSON front, MySQL 8.1 also adds the SHOW PARSE_TREE statement for showing a JSON-formatted parse tree for a SELECT statement.

As adopters of the innovation releases will require more frequent updates for bugfixes and security patches, the MySQL team published an upgrade and downgrade support matrix, confirming that it will be possible to replicate from an LTS or Innovation release to the next LTS release and any Innovation release up until the next LTS release.

Both the new LTS and Innovation releases are already available on MySQL HeatWave.
