Wealthfront Advisers LLC Reduces Holdings in MongoDB, Inc. (NASDAQ:MDB)

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Wealthfront Advisers LLC trimmed its stake in MongoDB, Inc. (NASDAQ:MDB) by 22.5% during the first quarter, according to its most recent 13F filing with the Securities and Exchange Commission (SEC). The fund owned 2,226 shares of the company’s stock after selling 645 shares during the period. Wealthfront Advisers LLC’s holdings in MongoDB were worth $519,000 at the end of the most recent quarter.

A number of other hedge funds have also recently modified their holdings of MDB. Raymond James & Associates grew its position in shares of MongoDB by 32.0% in the 1st quarter. Raymond James & Associates now owns 4,922 shares of the company’s stock valued at $2,183,000 after buying an additional 1,192 shares during the last quarter. PNC Financial Services Group Inc. grew its position in shares of MongoDB by 19.1% in the 1st quarter. PNC Financial Services Group Inc. now owns 1,282 shares of the company’s stock valued at $569,000 after buying an additional 206 shares during the last quarter. MetLife Investment Management LLC purchased a new position in shares of MongoDB in the 1st quarter valued at $1,823,000. Panagora Asset Management Inc. grew its position in shares of MongoDB by 9.8% in the 1st quarter. Panagora Asset Management Inc. now owns 1,977 shares of the company’s stock valued at $877,000 after buying an additional 176 shares during the last quarter. Finally, Vontobel Holding Ltd. grew its position in shares of MongoDB by 100.3% in the 1st quarter. Vontobel Holding Ltd. now owns 2,873 shares of the company’s stock valued at $1,236,000 after buying an additional 1,439 shares during the last quarter. 89.22% of the stock is owned by hedge funds and other institutional investors.

Insider Transactions at MongoDB

In other news, CRO Cedric Pech sold 15,534 shares of MongoDB stock in a transaction dated Tuesday, May 9th. The shares were sold at an average price of $250.00, for a total transaction of $3,883,500.00. Following the completion of the sale, the executive now directly owns 37,516 shares in the company, valued at $9,379,000. The sale was disclosed in a legal filing with the SEC, which is accessible through this link. Also, Director Dwight A. Merriman sold 1,000 shares of MongoDB stock in a transaction dated Tuesday, July 18th. The stock was sold at an average price of $420.00, for a total transaction of $420,000.00. Following the sale, the director now owns 1,213,159 shares of the company’s stock, valued at $509,526,780. The disclosure for this sale can be found here. Insiders sold a total of 118,427 shares of company stock worth $41,784,961 over the last ninety days. 4.80% of the stock is currently owned by corporate insiders.

Analysts Set New Price Targets

Several brokerages have issued reports on MDB. Needham & Company LLC raised their price objective on shares of MongoDB from $250.00 to $430.00 in a research report on Friday, June 2nd. William Blair reissued an “outperform” rating on shares of MongoDB in a research report on Friday, June 2nd. Oppenheimer raised their price objective on shares of MongoDB from $270.00 to $430.00 in a research report on Friday, June 2nd. Royal Bank of Canada raised their price objective on shares of MongoDB from $400.00 to $445.00 in a research report on Friday, June 23rd. Finally, Guggenheim downgraded shares of MongoDB from a “neutral” rating to a “sell” rating and raised their price objective for the stock from $205.00 to $210.00 in a research report on Thursday, May 25th. They noted that the move was a valuation call. One equities research analyst has rated the stock with a sell rating, three have given a hold rating and twenty have issued a buy rating to the company. According to MarketBeat.com, the company presently has an average rating of “Moderate Buy” and a consensus target price of $375.59.

MongoDB Stock Performance

Shares of MDB opened at $409.78 on Monday. The company has a current ratio of 4.19, a quick ratio of 4.19 and a debt-to-equity ratio of 1.44. The stock has a market cap of $28.92 billion, a P/E ratio of -87.75 and a beta of 1.13. The company has a 50-day moving average of $362.53 and a two-hundred day moving average of $268.31. MongoDB, Inc. has a 12-month low of $135.15 and a 12-month high of $439.00.

MongoDB (NASDAQ:MDB) last announced its earnings results on Thursday, June 1st. The company reported $0.56 earnings per share (EPS) for the quarter, topping analysts’ consensus estimates of $0.18 by $0.38. The company had revenue of $368.28 million during the quarter, compared to analysts’ expectations of $347.77 million. MongoDB had a negative net margin of 23.58% and a negative return on equity of 43.25%. The firm’s quarterly revenue was up 29.0% compared to the same quarter last year. During the same quarter in the previous year, the business posted ($1.15) earnings per share. As a group, research analysts predict that MongoDB, Inc. will post -2.8 EPS for the current fiscal year.

About MongoDB


MongoDB, Inc. provides a general-purpose database platform worldwide. The company offers MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.

Further Reading

Institutional Ownership by Quarter for MongoDB (NASDAQ:MDB)



Receive News & Ratings for MongoDB Daily – Enter your email address below to receive a concise daily summary of the latest news and analysts’ ratings for MongoDB and related companies with MarketBeat.com’s FREE daily email newsletter.

Article originally posted on mongodb google news. Visit mongodb google news

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



MongoDB at $1.5B in Revenue — An Epic Growth Story – SaaStr

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Some SaaS and Cloud leaders have seen big impacts from the post-2020 hangover, but others keep accelerating. MongoDB is one of those that has never stopped. 

Instead of looking at those who are struggling, let’s take a look at the epic growth story of MongoDB, a company crushing it in 2023. 

MongoDB is at a stunning $1.5B in ARR, 29% overall growth, and is worth $29B. That’s about as good as it gets in SaaS and Cloud! 

While many less efficient public SaaS companies are trading at tough multiples of 3-4x ARR, and the average is about 6x ARR, MongoDB is almost 20x ARR. 

Let’s look at nine interesting learnings about this leading company.



#1 — 2022 Saw A Slowdown In Usage Growth, But 2023 Saw A Potential Bounce Back.

Even with MongoDB’s epic growth, 2022 saw a big slowdown in usage growth. And when you have partial utility-based pricing like MongoDB does, you see the impact fast. 

Folks pulled usage down as much as they could to save money, although logo churn remained low. 

But last quarter saw a bounce back. Usage was up, although not quite like the days of 24 months ago. So we’re seeing signs of coming off the lows. 

Are we near or past the lows in SaaS and Cloud buying patterns? We’ll see.

#2 — New Workloads Are Only A Small Percentage Of Today’s Revenue, But Are The Majority Of Tomorrow’s.

MongoDB thrives with workloads. You put more workloads in the database, and then you add more over time. That ACV grows and grows gently over time to $1M+ deals. 

But new workloads and new use cases of MongoDB are only about 10% of new bookings. 

Why?

Because it takes time, and this is the power of NRR and playing the long game. MongoDB isn’t trying to force it, even if they could. 

This chart shows the awesome power of high NRR and how you have to invest in customers that scale with you over time.

#3 — New Customer Count Is Up 24%—a Very Good Sign.

MongoDB is growing 29% at $1.4B in ARR, but what a lot of people at scale have trouble with is getting new customers. They milk the base, and the new customer count doesn’t remotely approach the new revenue growth rate.

Upselling your existing base and raising prices to keep the top line growing isn’t adding real long-term value. Adding new customers does, ideally growing the new customer base at least half as fast as revenue.

If you’re growing at 30%, it’s good to see at least 15% growth in your new customer base. If you’re growing at 50%, you want to see at least 25% growth. That’s your future, not just your present. 

MongoDB is crushing it, adding +24% new customers while growing +29% overall — a bullish sign for the future.

#4 — $1M+ Customers Are Up 30%, Even With Economic Headwinds. 

$1M+ and bigger customers fuel MongoDB’s growth. Many people complain that there’s no budget in Enterprise and that it’s impossible to close bigger deals. 

Sure, it’s harder to close bigger deals. Budgets are tight, and folks are looking to shrink their number of vendors. 

Yet leaders like MongoDB are growing quicker than startups, and it’s usually the opposite. 

CIOs and others want to concentrate their budget on vendors they trust, so if you have a strong brand, you can benefit from this trend. 

Mongo’s $1M+ customers are up 30%, which is pretty epic. 

If your million-dollar customers are growing at that rate and new logos are growing at almost 30%, at the end of the day, you’re probably going to breeze to $10B in ARR.

#5 — The Average Customer’s Spend More Than Doubles After Two Years.

High NRR is magical, but NRR isn’t a GAAP metric. It can be gamed a bit and is subject to interpretation.

But you know what isn’t? Revenue growth. That’s what really matters. 

Here you can see the magic in MongoDB’s business model. 

After 24 months, the average customer has grown 2.1x. And that includes its biggest customers. They still grow 2x in spend over the first 24 months. 

What does this mean? 

It means you do not have to rip off the customer or get every single dollar upfront. As you build up a great sales team, they’ll get better at getting revenue up front, and that’s great up to a point. 

You don’t want to break it. Mongo lets you try its products for free and scale up. They know across thousands and thousands of customers that customer spend will grow 2.1x after two years, so they can lean in and invest in that knowledge.

#6 — Top 100 Customers Are 33% Of Revenue. Enterprise Customers Are 75% Of Revenue. 

Why is this important? 

MongoDB has become pretty Enterprise, and that’s common for companies north of $1B ARR. Slack started as a free tool, and as it crossed $1B, the majority of its revenue became Enterprise. At that scale, you need $100k and $1M+ deals.

This stat isn’t just a case study of Go To Market. It’s also a case study of PLG. 

Mongo doesn’t always describe themselves this way, but they’re very PLG. They have thousands of users using it for very little or free at the long end of the tail. 

PLG is great but can take many different paths. It’s not a one-size-fits-all motion. 

Some people want to move to a PLG model because they think it’s cheaper than Enterprise sales. And the truth is, it hasn’t been very efficient for Mongo until recently. 

There are plenty of companies with great PLG self-serve viral motions that still spend a ton on sales and marketing.

#7 — 45% Of MongoDB’s Revenue Is Outside North America. Go Global If You Can, Folks! 

This is a small but important point. 

45% of MongoDB’s revenue comes from outside North America. This is true for many leaders like HubSpot, Monday, DataDog, and others. 

You need to localize your product and do it early. It’s not a waste. You will go global early. 

If you want to scale beyond early adopters and tech-focused folks, you need local offices, ideally once you have $1M in ARR in a geography. You won’t close the old-school customers without it.

The point? 

If you focus too much on the American market, you may be giving up half of your revenue.

#8 — Most Of MongoDB’s Top 100 Customers Have Been Customers For 5+ Years. 

It’s interesting to see the benefits of high NRR in this way. Most of MongoDB’s top hundred customers, which typically represent a third of their revenue and are fairly concentrated, have been customers for 5+ years. 

That’s one key to their jaw-dropping numbers — keeping their top customers forever. 

Almost three-quarters of their customers have been with them for 4+ years. They treat them well, lock them in, and grow their spend over time. 

You don’t always have to close a $1M deal upfront. You can grow into them. MongoDB does and earns it.

#9 — Almost All Cloud Leaders Got Radically More Efficient In One Year. 

How efficient should you be, and can you be in SaaS? 

In 2019, 2020, 2021, no one really cared if you were efficient, but boy, has that changed. 

A year and a half ago, Mongo had very negative margins, and it makes sense because growth was epic. 

And for the first time in their recent history, they swung to being operating margin positive or “being profitable.” They went from -40% to +5%. 

How did leaders like MongoDB, HubSpot, Monday, and Salesforce go from being pretty inefficient to efficient in a single year? 

They kept the headcount kind of flat. Everyone got about 15-20% more efficient, and everyone had to do 15-20% more work. And it worked! 

MongoDB has one of the highest public multiples in SaaS and Cloud. Efficiency is a complicated metric, and Mongo isn’t wildly efficient, even at $1.5B in ARR. But they are wildly successful. 

So the existential question remains: Are they mortgaging the future and under-hiring during a downturn? 

First, this isn’t a downturn—more of a side turn. And for now, we’ll see if they revert to less efficiency if the market continues to bounce back.

Article originally posted on mongodb google news. Visit mongodb google news

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



Simplicity Solutions LLC Purchases 1,144 Shares of MongoDB, Inc. (NASDAQ:MDB)

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Simplicity Solutions LLC acquired a new position in MongoDB, Inc. (NASDAQ:MDB) in the 1st quarter, according to its most recent Form 13F filing with the Securities and Exchange Commission. The firm acquired 1,144 shares of the company’s stock, valued at approximately $267,000.

A number of other hedge funds and other institutional investors also recently made changes to their positions in the business. Arizona State Retirement System lifted its position in shares of MongoDB by 3.6% during the 1st quarter. Arizona State Retirement System now owns 19,975 shares of the company’s stock valued at $4,657,000 after acquiring an additional 695 shares during the period. Chicago Partners Investment Group LLC acquired a new position in shares of MongoDB during the first quarter valued at about $257,000. Fiera Capital Corp boosted its position in shares of MongoDB by 40.3% during the 1st quarter. Fiera Capital Corp now owns 222,112 shares of the company’s stock worth $51,779,000 after purchasing an additional 63,760 shares in the last quarter. Valley National Advisers Inc. grew its stake in shares of MongoDB by 37.2% in the 1st quarter. Valley National Advisers Inc. now owns 568 shares of the company’s stock valued at $127,000 after buying an additional 154 shares during the period. Finally, Xcel Wealth Management LLC increased its position in MongoDB by 1.3% in the 1st quarter. Xcel Wealth Management LLC now owns 5,988 shares of the company’s stock valued at $1,396,000 after buying an additional 79 shares in the last quarter. 89.22% of the stock is owned by institutional investors and hedge funds.

Wall Street Analysts Weigh In

Several equities research analysts have recently commented on the company. Needham & Company LLC boosted their price objective on MongoDB from $250.00 to $430.00 in a report on Friday, June 2nd. The Goldman Sachs Group upped their price objective on shares of MongoDB from $420.00 to $440.00 in a report on Friday, June 23rd. Capital One Financial began coverage on shares of MongoDB in a report on Monday, June 26th. They set an “equal weight” rating and a $396.00 price objective for the company. Sanford C. Bernstein upped their target price on shares of MongoDB from $257.00 to $424.00 in a report on Monday, June 5th. Finally, Piper Sandler lifted their price target on MongoDB from $270.00 to $400.00 in a research note on Friday, June 2nd. One equities research analyst has rated the stock with a sell rating, three have given a hold rating and twenty have given a buy rating to the company. According to data from MarketBeat, the stock presently has a consensus rating of “Moderate Buy” and a consensus price target of $375.59.

MongoDB Stock Down 0.7%

Shares of NASDAQ MDB opened at $409.78 on Monday. The company has a debt-to-equity ratio of 1.44, a current ratio of 4.19 and a quick ratio of 4.19. The stock has a market capitalization of $28.92 billion, a PE ratio of -87.75 and a beta of 1.13. MongoDB, Inc. has a one year low of $135.15 and a one year high of $439.00. The business’s fifty day moving average price is $362.53 and its 200-day moving average price is $268.31.

MongoDB (NASDAQ:MDB) last released its quarterly earnings data on Thursday, June 1st. The company reported $0.56 EPS for the quarter, beating the consensus estimate of $0.18 by $0.38. MongoDB had a negative net margin of 23.58% and a negative return on equity of 43.25%. The business had revenue of $368.28 million for the quarter, compared to the consensus estimate of $347.77 million. During the same period in the prior year, the firm earned ($1.15) earnings per share. The company’s revenue was up 29.0% compared to the same quarter last year. On average, equities research analysts anticipate that MongoDB, Inc. will post -2.8 EPS for the current year.

Insider Activity at MongoDB

In related news, Director Dwight A. Merriman sold 2,000 shares of the stock in a transaction dated Thursday, May 4th. The stock was sold at an average price of $240.00, for a total transaction of $480,000.00. Following the sale, the director now owns 1,223,954 shares of the company’s stock, valued at approximately $293,748,960. The sale was disclosed in a legal filing with the Securities & Exchange Commission, which is accessible through the SEC website. Also, Director Dwight A. Merriman sold 3,000 shares of the company’s stock in a transaction on Thursday, June 1st. The shares were sold at an average price of $285.34, for a total value of $856,020.00. Following the transaction, the director now owns 1,219,954 shares of the company’s stock, valued at $348,101,674.36. The transaction was disclosed in a legal filing with the SEC, which is available at the SEC website. In the last ninety days, insiders sold 118,427 shares of company stock worth $41,784,961. Corporate insiders own 4.80% of the company’s stock.

MongoDB Profile


MongoDB, Inc. provides a general-purpose database platform worldwide. The company offers MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.

Read More

Want to see what other hedge funds are holding MDB? Visit HoldingsChannel.com to get the latest 13F filings and insider trades for MongoDB, Inc. (NASDAQ:MDB).

Institutional Ownership by Quarter for MongoDB (NASDAQ:MDB)



Receive News & Ratings for MongoDB Daily – Enter your email address below to receive a concise daily summary of the latest news and analysts’ ratings for MongoDB and related companies with MarketBeat.com’s FREE daily email newsletter.

Article originally posted on mongodb google news. Visit mongodb google news

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



$1000 Invested In MongoDB 5 Years Ago Would Be Worth This Much Today – Benzinga

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

MongoDB (NASDAQ: MDB) has outperformed the market over the past 5 years by 36.9% on an annualized basis, producing an average annual return of 46.96%. Currently, MongoDB has a market capitalization of $28.82 billion.

Buying $1000 In MDB: If an investor had bought $1000 of MDB stock 5 years ago, it would be worth $6,652.65 today based on a price of $408.34 for MDB at the time of writing.

MongoDB’s Performance Over Last 5 Years

Finally — what’s the point of all this? The key insight to take from this article is to note how much of a difference compounded returns can make in your cash growth over a period of time.
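For readers who want to check the math, here is a small, illustrative Java sketch of the compound-growth arithmetic; the roughly 46% annualized return is taken from the figures above, and the result lands near (not exactly on) the quoted $6,652.65 because of rounding and timing differences.

```java
// Illustrative only: the compound-growth arithmetic behind the figures quoted above.
public class CompoundReturn {
    public static void main(String[] args) {
        double initialInvestment = 1_000.0;
        double annualReturn = 0.46;   // approximate annualized return from the article
        int years = 5;

        // future value = principal * (1 + r)^n
        double futureValue = initialInvestment * Math.pow(1 + annualReturn, years);
        System.out.printf("$1,000 compounded at %.0f%% for %d years is roughly $%,.2f%n",
                annualReturn * 100, years, futureValue);
    }
}
```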

This article was generated by Benzinga’s automated content engine and reviewed by an editor.

Article originally posted on mongodb google news. Visit mongodb google news

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



Java News Roundup: WildFly 29, JDK 21 in RDP2, Helidon 4.0-M1, Oracle Critical Patch Updates

MMS Founder
MMS Michael Redlich

Article originally posted on InfoQ. Visit InfoQ

This week’s Java roundup for July 17th, 2023 features news from JDK 22, JDK 21, JDK 20 and BellSoft, along with releases of Spring Boot, Spring Framework, Spring for GraphQL, Spring Session, Spring Integration, Spring HATEOAS, WildFly 29, Quarkus 3.2.1, Helidon 4.0-M1, Micronaut 4.0.1, Hibernate 6.3-CR1, MicroProfile Config 3.1, Infinispan 14.0.13, PrimeFaces 12.0.5, OpenXava 7.1.3 and Gradle 8.3-RC1.

JDK 20

JDK 20.0.2, the second maintenance release of JDK 20, along with security updates for JDK 17.0.8, JDK 11.0.20 and JDK 8u381, were made available as part of Oracle’s Critical Patch Update for July 2023.

JDK 21

As per the JDK 21 release schedule, Mark Reinhold, chief architect, Java Platform Group at Oracle, formally declared that JDK 21 has entered Rampdown Phase Two to signal continued stabilization for the GA release in September 2023. Critical bugs, such as regressions or serious functionality issues, may be addressed, but must be approved via the Fix-Request process.

JDK 21 will ship with a final set of 15 features.

Build 32 of the JDK 21 early-access builds was also made available this past week featuring updates from Build 31 that include fixes to various issues. Further details on this build may be found in the release notes.
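Among the features finalized for JDK 21 are virtual threads (JEP 444) and record patterns (JEP 440). The following minimal example, written for illustration rather than taken from the release notes, exercises both:

```java
// Minimal illustration of two finalized JDK 21 features:
// record patterns (JEP 440) and virtual threads (JEP 444).
public class Jdk21Demo {
    record Point(int x, int y) {}

    static String describe(Object obj) {
        // The record pattern deconstructs the Point directly in the switch.
        return switch (obj) {
            case Point(int x, int y) -> "Point at (" + x + ", " + y + ")";
            default -> "Something else";
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // The task runs on a cheap virtual thread.
        Thread vt = Thread.ofVirtual().start(
                () -> System.out.println(describe(new Point(3, 4))));
        vt.join();
    }
}
```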

JDK 22

Build 7 of the JDK 22 early-access builds was also made available this past week featuring updates from Build 6 that include fixes to various issues. More details on this build may be found in the release notes.

For JDK 22 and JDK 21, developers are encouraged to report bugs via the Java Bug Database.

BellSoft

Also concurrent with Oracle’s Critical Patch Update (CPU) for July 2023, BellSoft has released CPU patches for versions 17.0.7.0.1, 11.0.19.0.1 and 8u381 of Liberica JDK, their downstream distribution of OpenJDK. In addition, Patch Set Update (PSU) versions 20.0.2, 17.0.8, 11.0.20 and 8u382, containing CPU and non-critical fixes, have also been released.

Spring Framework

The first milestone release of Spring Boot 3.2.0 delivers bug fixes, improvements in documentation, dependency upgrades and new features such as: support for the JDK HttpClient class and Jetty in the ClientHttpRequestFactories class; the ability to set a key password for an instance of the PemSslStoreBundle class; and the deprecation of the DelegatingApplicationContextInitializer and DelegatingApplicationListener classes in favor of registering each delegate programmatically or in the spring.factories property. Further details on this release may be found in the release notes.

Versions 3.1.2, 3.0.9 and 2.7.14 of Spring Boot have also been released with dependency upgrades, improvements in documentation and notable bug fixes such as: the ImportsContextCustomizer test class does not support the @AliasFor annotation; the equals() method defined in the ConfigurationPropertyName class is not symmetric when the element contains trailing dashes; and an auto-configuration failure with a NoSuchMethodError exception due to the removal of Oracle-related methods from the Flyway FluentConfiguration class. More details on these releases may be found in the release notes for version 3.1.2, version 3.0.9 and version 2.7.14.

The third milestone release of Spring Framework 6.1 delivers new features such as: new configuration options for virtual threads on JDK 21 with a dedicated VirtualThreadTaskExecutor class and a new setVirtualThreads() method added to the SimpleAsyncTaskExecutor class; Spring MVC now throws NoHandlerFoundException or NoResourceFoundException exceptions to allow for consistent handling of HTTP 404 errors that includes the RFC 7807, Problem Details for HTTP APIs, error response; and support for the BeanPropertyRowMapper and DataClassRowMapper classes in the R2DBC project. Further details on this release may be found in the what’s new page.
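As a rough sketch of how the new virtual-thread options might be wired up (based only on the class and method names cited above, not on code from the release notes), a configuration could look like this:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.AsyncTaskExecutor;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class VirtualThreadConfig {

    // Sketch: enable virtual threads on SimpleAsyncTaskExecutor via the new
    // setVirtualThreads() method mentioned above; requires running on JDK 21.
    @Bean
    public AsyncTaskExecutor applicationTaskExecutor() {
        SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor("app-");
        executor.setVirtualThreads(true);
        return executor;
    }
}
```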

The release of Spring for GraphQL 1.2.2 delivers bug fixes, improvements in documentation, dependency upgrades and new features such as: support for Kotlin coroutines in the @GraphQlExceptionHandler annotation; and support for the ValueExtractor interface in the ArgumentValue class for improved bean validation. More details on this release may be found in the release notes.

The first milestone release of Spring Session 3.2.0 provides dependency upgrades and these new features: a new SessionIdGenerationStrategy interface that specifies a strategy for generating session identifiers; and eliminating multiple calls to the commitSession() method in the private SessionRepositoryRequestWrapper class defined in the SessionRepositoryFilter class upon calling the include() method defined in the RequestDispatcher interface. Further details on this release may be found in the release notes.

The first milestone release of Spring Integration 6.2 ships with notable changes such as: removal of the unused JDK ThreadLocal class from the RedisStoreMSource class; an optimization of the maybeIndex() method in the JsonPropertyAccessor class; and the addition of the @LogLevels annotation for the SftpRemoteFileTemplateTests class for tracing diagnostics. More details on this release may be found in the release notes.

Versions 2.2.0-M2, 2.1.2 and 2.0.6 of Spring HATEOAS have been released that feature a fix for a regression in the AOT reflection metadata generation so that applications building native images continue to work on the upcoming releases of Spring Boot. Further details on these releases may be found in the release notes for version 2.2-M2, version 2.1.2 and version 2.0.6.

WildFly

Red Hat has released version 29 of WildFly featuring bug fixes, internal housekeeping from the migration to Jakarta EE 10 and new features such as: the ability to secure the management console with WildFly’s native support for OpenID Connect; a new Keycloak SAML Adapter feature pack that uses Galleon Provisioning to add Keycloak’s SAML adapter to an installation of WildFly; and support for MyFaces 4.0, a compatible implementation of Jakarta Faces specification, using the new WildFly MyFaces 1.0.0.Beta1 feature pack.

Quarkus

Red Hat has also released version 3.2.1.Final of Quarkus, featuring the ability to publish the OpenAPI schema and UI when the management interface is enabled, configured via the quarkus.smallrye-openapi.management.enabled property. Other notable changes include: support for serialization of class fields with Jackson in native mode; avoiding a race condition on adding a content-length header if it already exists, which resulted in the response never being sent; and the removal of a default token customizer from an OIDC Microsoft provider that broke signature verification. More details on this release may be found in the release notes.

Helidon

The first milestone release of Helidon 4.0.0 delivers notable changes: removal of the Helidon Reactive WebServer and WebClient that were based on Netty as Helidon fully commits to Project Níma with new implementations based on virtual threads that have a blocking style API; conversion of other modules with reactive APIs to blocking style APIs; the introduction of Helidon Injection, a deterministic, source-code-first, compile time injection framework; support for MicroProfile 6 and Jakarta 10 Core Profile running on the Níma WebServer; and initial adoption of Helidon Builders, a builder code generation framework. It is important to note that this is a preview release and the new features are a work-in-progress. Further details on this release may be found in the release notes.

Micronaut

The Micronaut Foundation has provided version 4.0.1 of Micronaut Framework, the first maintenance release since the release of Micronaut 4.0, featuring: dependency upgrades; improvements in documentation; a fix for a constructor copy; and the addition of a propagated context to the reactive controller methods in the ReactorPropagation class. More details on this release may be found in the release notes.

Hibernate

Versions 6.3.0-CR1 and 6.2.7 of Hibernate ORM have been released, shipping with: new documentation that adds an introductory Hibernate 6 guide and a Hibernate Query Language syntax and feature guide; the ability to generate DAO-style methods for named queries as part of the JPA static metamodel generator; and a new @Find annotation with which arbitrary methods may now be processed by the generator to create finder methods similar to query methods.
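As a rough, hypothetical sketch of how a @Find-annotated method might be declared (the annotation's package and the Book entity here are assumptions made for illustration, not taken from the release notes):

```java
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import org.hibernate.annotations.processing.Find;

// Hypothetical sketch: this entity and the annotation's package are assumed.
// The JPA static metamodel generator processes the abstract method below and
// emits a matching finder implementation at compile time.
@Entity
class Book {
    @Id
    String isbn;
    String title;
}

interface BookQueries {
    @Find
    Book bookByIsbn(String isbn);   // finder derived from Book's isbn attribute
}
```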

MicroProfile

On the road to MicroProfile 6.1, the MicroProfile Working Group has provided the first release candidate of MicroProfile Config 3.1 featuring notable changes such as: the MissingValueOnObserverMethodInjectionTest class fails due to the ConfigObserver bean, defined as @ApplicationScoped and final, which is beyond the scope of the DeploymentException the test was designed to throw; failing TCK tests due to empty beans being declared as all when they should be declared as annotated; and an upgrade of all tests to be compatible with Jakarta Contexts and Dependency Injection 4.0. Further details on this release may be found in the list of issues.

Infinispan

Versions 14.0.13 and 14.0.12 of Infinispan have been released with notable changes such as: a fix for the SearchException while rebuilding an index; filter out illegal characters found in memory pool names; and a fix for the RemoteStore class leaking Netty event loop client threads. More details on these releases may be found in the release notes for version 14.0.13 and 14.0.12.

PrimeFaces

Versions 12.0.5, 11.0.11, 10.0.18 and 8.0.23 of PrimeFaces have been released with notable fixes such as: the FullAjaxExceptionHandler class does not display an error page when an exception is thrown during an AJAX request; a NullPointerException from the encodeScript() method defined in the InputTextRenderer class due to an empty counter bypassing the conditional that checks for null; and fixes for the user interface. Further details on these releases may be found in the changelogs for version 12.0.5, version 11.0.11, version 10.0.18 and version 8.0.23.

OpenXava

The release of OpenXava 7.1.3 features dependency upgrades and notable fixes such as: one security vulnerability in the dependencies; the project does not start if the workspace is in a path with special characters or accents; and use of the @OnChange annotation on the first property of a collection view in an @OneToMany annotation with no cascade displays an error. More details on this release may be found in the release notes.

Gradle

The first release candidate of Gradle 8.3 delivers improvements such as: support for JDK 20; faster Java compilation using worker processes to run the Java compiler as a compiler daemon; the ability to experiment with the Kotlin K2 compiler; and improved output from the CodeNarc plugin. Further details on this release may be found in the release notes.

About the Author

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



Big Data Analytics in Telecom Market 2023 Trends with Analysis on … – Glasgow West End Today

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Global Big Data Analytics in Telecom Market Growth (Status and Outlook) 2022-2028

Press Release – (Orbisresearch.com) – An extensive and in-depth analysis of the worldwide Big Data Analytics in Telecom market is provided by the worldwide Big Data Analytics in Telecom Market Assessment. This study seeks to offer insightful analyses of the market interactions, trends, chances for expansion, and difficulties faced by industry participants. The following are some distinctive features that set this report apart from others:

Rigorous Research Methodology: The report uses a thorough research methodology that draws on both primary and secondary data sources, such as interviews with industry professionals, annual reports from companies, and databases for market analysis. This guarantees the veracity and accuracy of the information provided.

Detailed Market Assessment: The study provides a comprehensive picture of the global Big Data Analytics in Telecom market, taking into account numerous categories, geographies, and industry verticals. Making informed decisions is made possible by its thorough knowledge of market trends, drivers, restraints, and opportunities.

        Request a pdf sample report : https://www.orbisresearch.com/contacts/request-sample/6677139               

Company Situation: The research offers a thorough analysis of the market’s competitors, including the major players’ market strategies, product lines, and most recent advancements. This aids readers in comprehending the degree of market competition and placement of significant players in the Big Data Analytics in Telecom sector. 

Market Resources and Methods: 

Several marketing tools and techniques were used to create the Global  Big Data Analytics in Telecom Market Report, including:

Market segmentation: The market is divided into groups depending on a variety of variables, including type, application, end-user, and geography. This enables a thorough examination of the market share and development potential of each sector.

Data Gathering and Validation: To ensure accuracy and validity, data were gathered from dependable primary and secondary sources. Utilizing sophisticated statistical tools and models, market size estimation and forecasts were carried out.

SWOT Assessment: A thorough SWOT analysis was conducted to determine the internal and external elements affecting the Big Data Analytics in Telecom market. SWOT stands for Strengths, Weaknesses, Opportunities, and Threats. Key market trends and obstacles are made clear by this analysis.

            Buy the report at https://www.orbisresearch.com/contact/purchase-single-user/6677139

Market Types:

Cloud-based
On-premise

Big Data Analytics in Telecom Market Applications:

Small and Medium-Sized Enterprises
Large Enterprises  

Derivation of Market Value and CAGR: 

The Global  Big Data Analytics in Telecom Market Report’s market valuation and compound annual growth rate, or CAGR, are calculated using a careful procedure that includes:

 Data collecting: To obtain market-related information, including historical and current data, extensive data collection is done through primary and secondary sources.

Market Size Estimation: To calculate the market size and predict future trends, a number of statistical approaches are used, including regression analysis, time series analysis, and exponential smoothing.

CAGR Calculation: The CAGR is computed by calculating the yearly compound growth rate over a given time period. This offers a consistent way to assess the rate of market expansion. 

Regional evaluations 

Comprehensive regional studies covering important areas including North America, Europe, the Asia-Pacific region, Latin America, and the Middle East & Africa are included in the Global  Big Data Analytics in Telecom Market Report. The analyses of the regions emphasize:

Market Size and Growth: The research evaluates each region’s market size, potential for growth, and market share. The main market growth drivers and new business possibilities in each region are identified.

Competitive Environment: The regional studies shed light on the competitive environment by highlighting the presence of important competitors, their market share, and their strategic activities in each region.

Regulation Environment: The paper assesses how each region’s regulatory environment, business practices, and government programs affect the Big Data Analytics in Telecom market.

         Key Players in the Big Data Analytics in Telecom market:

Microsoft Corporation
MongoDB
United Technologies Corporation
JDA Software, Inc.
Software AG
Sensewaves
Avant
SAP
IBM Corp
Splunk
Oracle Corp.
Teradata Corp.
Amazon Web Services
Cloudera

Requirements for the report on the global  Big Data Analytics in Telecom market: 

The following criteria were taken into account to provide a thorough and accurate Global  Big Data Analytics in Telecom Market Report:

·        Thorough market analysis and research

·        Gathering trustworthy and current information from primary and secondary sources

·        Application of cutting-edge statistical methods

·        Comprehensive knowledge of market dynamics, industry trends, and the competitive environment Integration of both quantitative and qualitative information

·        Analyzing the market’s motivators, inhibitors, possibilities, and problems 

For the client’s benefit: 

In comparison to other reports, our Global  Big Data Analytics in Telecom Market report gives clients a competitive edge in the following areas: 

·        Insights about industry trends, development factors, and obstacles that are accurate and trustworthy.

·        Detailed competition analysis to assist businesses in understanding market positioning and making strategic decisions.

·        Thorough regional assessments that provide a thorough insight of market dynamics in various locations.

·        Strategies for new entrants and established firms to enter the market that can be put into practice.

 
      Do Inquiry before Accessing Report at: https://www.orbisresearch.com/contacts/enquiry-before-buying/6677139         

COVID-19 Influence and Avoidance Techniques: 

The global  Big Data Analytics in Telecom market has been considerably influenced by the COVID-19 epidemic. Insights on the pandemic’s impact are provided in this report, including: 

·        Challenges to the supply chain and market disruption.

·        Consumer behavior and demand patterns changing.

·        economic effects and governmental regulations.

·        Market participants have used tactics including remote work, digital transformation, and novel product offerings to overcome obstacles.

For business professionals, investors, and other interested parties looking to gain a thorough knowledge of the Big Data Analytics in Telecom market’s current situation, future possibilities, and business potential, this Global  Big Data Analytics in Telecom Market Research is an invaluable resource.

About Us:

Orbis Research (orbisresearch.com) is a single point aid for all your market research requirements. We have a vast database of reports from leading publishers and authors across the globe. We specialize in delivering customized reports as per the requirements of our clients. We have complete information about our publishers and hence are sure about the accuracy of the industries and verticals of their specialization. This helps our clients to map their needs and we produce the perfect required market research study for our clients.

Contact Us:

Hector Costello
Senior Manager – Client Engagements
4144N Central Expressway,
Suite 600, Dallas,
Texas – 75204, U.S.A.

Phone No.: USA: +1 (972)-591-8191 | IND: +91 895 659 5155 

Article originally posted on mongodb google news. Visit mongodb google news

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



Building Cyber-physical Systems with Agile: Learnings from QCon New York

MMS Founder
MMS Ben Linders

Article originally posted on InfoQ. Visit InfoQ

In her QCon New York 2023 talk, Success Patterns for Building Cyber-Physical Systems with Agile, Robin Yeman explored how we can use agile practices at scale for large initiatives with multiple teams, building cyber-physical safety-critical systems with a scope that includes software, firmware, and hardware development.

Development approaches like agile and DevOps have benefitted small initiatives with a single team building software, Yeman said. They help them to respond to change, reduce product delivery schedules, reduce product cost, increase product quality, and increase employee morale.

Agile helps to deal with complexity, Yeman said. The motivation to migrate to agile is a demand for faster development, difficulty managing change, and increased product complexity, she argued.

We need to decompose by products instead of functions to break down systems correctly and make agile work, Yeman said. Starting with epics that focus on the business outcome, we can come up with features that are comprised of stories describing the needed functionality. Stories can be split up into tasks of no more than eight hours.

It is more expensive to make changes in hardware due to the constraints of physicality, but there are tools like simulators, emulators, digital shadows, digital twins, and 3D printers that can help us get faster feedback. Yeman suggested beginning with digital twins, virtual replicas of physical assets, to try out different scenarios. Moving physical systems into the digital space enables teams to bring down risks and decrease the cost of learning, Yeman said.

Regulatory and safety standards are not in conflict with agile, Yeman mentioned; it is possible to comply with standards and regulatory requirements using agile. Yeman gave examples of agile hardware development, such as building a car with Joe Justice and the agile hardware team at Lockheed Martin that develops fleet ballistic missiles.

Yeman concluded her talk by presenting the industrial DevOps principles. Some examples of them are:

  • Organize for the Flow of Value: align your multiple product teams to enable the flow and delivery of value.
  • Architect for Speed: the use of architecture to reduce dependencies and improve the speed of change.
  • Integrate Early and Often: the different levels and types of integration points across large complex systems.
  • Apply a Growth Mindset: the need to continuously learn, innovate, and adapt to the changes around us in order to stay competitive.

About the Author

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



New Empirical Research Report on Enterprise Database Management Suite Market by …

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Worldwide Market Reports announces the publication of its most recently generated research report titled, “Enterprise Database Management Suite Market – Forecast to 2030”, which offers a holistic view of the Enterprise Database Management Suite market through systematic segmentation that covers every aspect of the target market. The report spans 100+ pages of market data tables, pie charts, graphs, and figures, together with easy-to-understand detailed analysis. The information is gathered based on current trends and demand related to the relevant services and products. The Enterprise Database Management Suite Market report includes historic data, present market trends, the market environment, technological innovation, upcoming technologies, and the technical progress in the related industry.

Click Here to Get a Sample Copy of the Report: https://www.worldwidemarketreports.com/sample/945840

Our Sample Report May Includes:

📈 2030 Updated Report Introduction, Overview, and In-depth industry analysis.
📈 100+ Pages Research Report (Inclusion of Updated Research).
📈 Provide Chapter-wise guidance on Requests.
📈 2023 Updated Regional Analysis with Graphical Representation of Size, Share & Trends
📈 Includes Updated List of tables & figures.
📈 Updated Report Includes Top Market Players with their Business Strategy, Sales Volume, and Revenue Analysis.

A specialized assessment review team in conjunction with core business professionals has compiled this detailed study on the Enterprise Database Management Suite market through an accurate representation of this industry’s landscape. This will allow for analytics-based business strategies to be formulated. The objective of this report is to aid our esteemed client to develop strategies that will optimize existing business strategies and practices, thereby enabling them to achieve success. The frameworks included in this comprehensive dossier will include insightful data on prospective mergers & acquisitions, as well as generated revenues recorded by various players in the Enterprise Database Management Suite market. A comprehensive understanding of this industry’s segmentation will allow our client to make the right decisions when engaged in the Enterprise Database Management Suite market. 

The report further explores the key business players along with their in-depth profiling:

Oracle
Microsoft
IBM
SAP
AWS
MongoDB
Google
Broadcom
MarkLogic
MariaDB
InterSystems
Cloudera
Teradata
Vertica
Alibaba Cloud
Knack

Enterprise Database Management Suite Market Segmentation:

Enterprise Database Management Suite Market Types:

Relational Database
Nonrelational Database

Enterprise Database Management Suite Market Application/ End-Users:

SMEs
Large Enterprise 

Enterprise Database Management Suite Market Regional Analysis –

Geographically, the following regions utilization, revenue, market share, and growth rate are studied in detail:

‣ North America (United States, Canada, Mexico)
‣ Europe (Germany, UK, France, Italy, Spain, Others)
‣ Asia-Pacific (China, Japan, India, South Korea, Southeast Asia, Others)
‣ The Middle East and Africa (Saudi Arabia, UAE, South Africa, Others)
‣ South America (Brazil, Argentina, Others)

Enquire Before Purchasing this Report- https://www.worldwidemarketreports.com/quiry/945840

Trends and Opportunities of the Global Enterprise Database Management Suite Market

The global Enterprise Database Management Suite market has seen several trends in recent years, and understanding these trends is crucial to stay ahead of the competition. The global Enterprise Database Management Suite market also presents several opportunities for players in the market. The increasing demand for Enterprise Database Management Suite in various industries presents several growth opportunities for players in the market. 

Key Benefits for Stakeholders:

⏩ The study includes a comprehensive analysis of current Enterprise Database Management Suite Market trends, estimations, and market size dynamics from 2023 to 2030 in order to identify the most potential prospects.
⏩ The five forces study by Porter underlines the role of buyers and suppliers in aiding stakeholders in making profitable business decisions and expanding their supplier-buyer network.
⏩ In-depth research, as well as market size and segmentation, can assist you in identifying current Enterprise Database Management Suite Market opportunities.
⏩ The largest countries in each area are mapped based on their market revenue contribution.
⏩ The Enterprise Database Management Suite Market research report provides an in-depth analysis of the top competitors in the Enterprise Database Management Suite Market.

The Following Topics are Covered in the Report:

☛ A worldwide manufacturing market research approach that is systematic.
☛ Comprehensive industry analysis with main analyst perspectives.
☛ An in-depth examination of the macro and micro factors that influence the industry, accompanied by key recommendations.
☛ Regional legislation and other government policies affecting the comprehensive industry are examined.
☛ Business determinants that are stimulating the Enterprise Database Management Suite industry are discussed.
☛ Market segments are detailed and comprehensive, with sales forecasts distributed regionally.
☛ Profiles of industry leaders in depth, as well as recent developments.

Buy this Premium Report Here Available Now: https://www.worldwidemarketreports.com/promobuy/945840

[FAQ]

1. What is the scope of this report?
2. Does this report estimate the current market size?
3. Does the report provide the market size in terms of value (US$ Mn) and volume (thousand ton/metric ton/cubic meter)?
4. Which segments are covered in this report?
5. What are the key factors covered in this report?
6. Does this report offer customization?

The report concludes with a summary of the key findings, implications for stakeholders in the Enterprise Database Management Suite market, and recommendations for future actions based on the report’s analysis.

Overall, the Enterprise Database Management Suite market research report is a valuable tool for businesses and investors seeking to gain a deeper understanding of the Enterprise Database Management Suite market and make informed decisions based on the analysis provided.

Contact Us:

Mr. Shah
Worldwide Market Reports
533 Airport Boulevard, Suite 400, Burlingame,
CA 94010, United States

📞 U.S.:+1 415 871 0703
📞 UK :+44-203-289-4040
📞 Japan :+81-50-5539-1737
📞 India :+91-848-285-0837
Email: [email protected]
Website: https://www.worldwidemarketreports.com

Article originally posted on mongodb google news. Visit mongodb google news

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



Implementing Application Level Encryption at Scale: Insights from Atlassian’s Use of AWS and Cryptor

MMS Founder
MMS Eran Stiller

Article originally posted on InfoQ. Visit InfoQ

Atlassian recently published how it performs Application Level Encryption at scale on AWS while achieving high cache hit rates and maintaining low costs. Atlassian’s solution runs over 12,500 instances and manages over 1,540 KMS keys. It performs over 11 billion decryptions and 811 million encryptions daily, costing $2,500 per month versus a potential $1,000,000 per month using a naive solution.

Cryptor is an encryption library developed by Atlassian to suit their specific Application Level Encryption (ALE) needs at scale in multi-region environments. It is a thin wrapper over the AWS Encryption SDK. Atlassian engineers designed it to offer automated key management, high availability (similar to Atlassian’s Tenant Context Service), distributed caching, and the enforcement of soft limits to enable high-scale operations. Developers can integrate Cryptor as a library or a sidecar, exposing its functionality as HTTP and gRPC APIs.

David Connard, Principal Developer at Atlassian, explains why Atlassian chose to implement ALE wherever possible:

With ALE, sensitive data is encrypted before storage and only decrypted when required (i.e. at the point of use, in the application code). An attacker who gains access to the datastore (or, more commonly, who gains access to a historic replica of it, for example, a backup stored in a less secure location) does not automatically gain access to your sensitive data.

Connard explains that implementing ALE creates significant operational concerns. Implementors should never lose the ability to decrypt the data, encryption key integration should always be protected, and engineers should consider the performance impacts of adding encryption, as ALE adds significant computational effort to the application.

At the heart of Atlassian’s ALE is Envelope Encryption, a cryptographic technique used to secure data. It works by encrypting the data with a unique key called a “data key”. Engineers then encrypt the data key itself with another key, the “root key”. They then bundle the encrypted ciphertext and the encrypted data key into an “envelope encrypted payload” and persist this payload to the data store.

The benefits of using envelope encryption over direct encryption with the root key are that each data key is only used for a small subset of your data, the encryption materials can be cached and re-used across multiple encryption requests, and it allows for fast symmetric encryption algorithms.
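Conceptually, the flow can be sketched with plain JDK cryptography. The following is an illustrative sketch only, not Atlassian's Cryptor and not the AWS Encryption SDK; the local "root key" parameter stands in for a KMS key purely for demonstration.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;

// Conceptual sketch of envelope encryption using plain JDK crypto.
// In a real system the root key would live in KMS and never leave it.
public class EnvelopeEncryptionSketch {

    record Envelope(byte[] wrappedDataKey, byte[] iv, byte[] ciphertext) {}

    static Envelope encrypt(byte[] plaintext, SecretKey rootKey) throws Exception {
        // 1. Generate a fresh data key for this payload.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey dataKey = keyGen.generateKey();

        // 2. Encrypt the payload with the data key (AES-GCM).
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher dataCipher = Cipher.getInstance("AES/GCM/NoPadding");
        dataCipher.init(Cipher.ENCRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
        byte[] ciphertext = dataCipher.doFinal(plaintext);

        // 3. Wrap the data key with the root key and bundle everything together.
        Cipher wrapCipher = Cipher.getInstance("AESWrap");
        wrapCipher.init(Cipher.WRAP_MODE, rootKey);
        byte[] wrappedDataKey = wrapCipher.wrap(dataKey);

        return new Envelope(wrappedDataKey, iv, ciphertext);
    }
}
```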

Envelope Encryption is well-supported by the AWS Encryption SDK. However, the SDK is mainly designed for single-region scenarios, whereas Atlassian has a heavily multi-region use case, with KMS keys stored and services running in multiple regions. Also, AWS’ SDK enforces strict correctness, which makes sense at lower performance scales. However, Atlassian had to loosen some restrictions and enforce them softly to handle its high-scale operations.

Atlassian also encrypts all of its data at rest. However, encryption at rest provides no defence against many types of data exfiltration, such as a failure to restrict access to the data store, an authorised application doing something unsafe with restricted data at runtime, or legitimate access to data stores by staff for debugging purposes or to resolve incidents.

Atlassian may open source the library one day; however, it is not currently on their internal roadmaps. According to Connard, “It is certainly something we would consider if the demand and interest exist.”

About the Author

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



Presentation: Streaming from Apache Iceberg – Building Low-Latency and Cost-Effective Data Pipelines

MMS Founder
MMS Steven Wu

Article originally posted on InfoQ. Visit InfoQ

Transcript

Steven Wu: I’m going to talk about streaming from an Iceberg data lake. We all want to build data pipelines that have low latency, are cost effective, and are easy to operate. I’m going to show you how to achieve those goals by using Apache Iceberg as a streaming source. I’ll first give a quick introduction to Apache Iceberg and Apache Flink. Then we’ll look at some of the motivation for using Apache Iceberg as a streaming source. Next, we’ll talk about the high-level design of the Iceberg streaming source and some of the benefits it brings. After that, we’re going to dive deeper into Flink watermark alignment and how it works in the Iceberg streaming source. Finally, we’re going to look at the evaluation results.

Introduction to Iceberg and Flink

Apache Iceberg is an open table format for huge analytic datasets. What is a table format? You might be familiar with a file format like Apache Parquet, which is about organizing records within a data file. A table format like Apache Iceberg is about organizing data files within a table. Apache Iceberg was designed to address the correctness and performance issues of the old Hive table format. Iceberg brings numerous benefits.

First, it provides serializable isolation: all table changes are atomic, and readers never see partial or uncommitted data. Iceberg supports fast scan planning with advanced filtering using partitions and column-level statistics. It supports safe schema and partition evolution. It supports time travel, which enables reproducible queries against the exact same snapshot of the Iceberg table. Recently, Iceberg added branching and tagging. With branching, we can implement the write, audit, and publish pattern: you write new data to a staging branch first and run data validation. Once the validation passes, you merge the staging branch into the main branch, much like merging a branch in a Git workflow on GitHub.

Where does Apache Iceberg fit in the big data ecosystem? At the bottom level, we have data files like Parquet or ORC. They’re stored on a distributed file system like HDFS, or on cloud storage like S3. On top of that, we have a table format like Apache Iceberg, Delta Lake, or Apache Hudi, which provides SQL table semantics; table formats are all about metadata management. Compute engines like Apache Flink, Apache Spark, and Trino integrate with the table format to access the underlying data files. Apache Flink is a distributed stream processing engine, one of the most popular, and it is highly scalable.

A single Flink job can process trillions of events per day, run on a sizable number of cores, and maintain terabytes of state. Flink checkpoints provide strong state consistency, or exactly-once processing semantics. If the sink also supports transactional writes, like Iceberg, we can achieve end-to-end exactly-once as well. Flink also offers event-time processing semantics: it uses watermarks to reason about event-time progress within the application. Watermarks also provide a flexible way to trade off between the completeness and the latency of the results. Flink provides a layered API, from the low-level DataStream API to the high-level Table API and SQL.

Motivation

Next, I’m going to explain why we’re thinking about Iceberg as a streaming source. Here is a data pipeline you might be familiar with: a device sends raw events to edge API services, which then ingest the raw data into a Kafka message queue. This ingest [inaudible 00:04:46], a second or less. Then a stream processing engine like Flink reads data from Kafka and writes the raw events to a data lake like Iceberg. After that, it is mostly batch jobs for ETL, feature engineering, and maybe offline model training. Those batch jobs are typically scheduled hourly or daily. The overall latency is at least a few hours, if not days. We know latency matters for machine learning pipelines.

The industry is increasingly shifting towards real-time machine learning for online learning and online inference. How can we reduce the latency of our data pipeline? Kafka is probably the most popular streaming storage for Flink, and it can achieve sub-second read latency. If you care about latency, why not switch everything to Kafka and Flink? There are pipelines built that way. Kafka is a fantastic streaming storage; I love Kafka. Sub-second read latency is great if you have a really low latency requirement. As we know, though, system design is all about tradeoffs; nothing is good at everything. There are a few things in Kafka that are not very pleasant to work with. If you operate a stateful storage system, you know it’s not easy. Setting up queries can be painful. Because we cannot easily autoscale a stateful storage system, we have to do careful capacity planning. We have to worry about bursty backfill workloads and how to achieve isolation so that the bursty backfill workload does not affect the live traffic.

In a Flink meetup, Sundaram from Netflix demonstrated that it’s 38 times more expensive to store long-term data in Kafka compared to Iceberg. Kafka recently introduced tiered storage, which can help reduce the difference, but it won’t completely bridge the gap. That’s why many companies adopt a tiered storage practice: they use Kafka to store recent data for the last few hours or days, and they store long-term data in the Iceberg data lake. Kafka serves the stream processing workload, and Iceberg serves the batch workload.

For availability reasons, Kafka brokers are typically placed in different availability zones, which incurs cross-AZ network traffic from producer to broker, broker to broker for replication, and broker to consumer. I did a back-of-envelope calculation using publicly listed pricing for a major cloud provider. My calculation shows that the cross-AZ network cost can be 10 times more expensive than the broker cost, compute and storage combined. Kafka does provide rack-aware partition assignment, which can avoid the cross-AZ traffic from broker to consumer, but we still have the producer-to-broker and broker-to-broker cross-AZ traffic, which is still pretty significant. The Kafka source doesn’t provide filtering or projection on the broker side. If we have different consumer jobs interested in different subsets of the data, by filter or projection, all of them have to pull down the full stream and apply the filter and projection on the consumer side. This is obviously inefficient.

People have been setting up routing jobs just to apply the filter and projection to produce smaller substreams, so that downstream jobs only need to consume the smaller substream. This definitely helps to improve efficiency, but it also creates an additional routing job to maintain, plus data duplication. In Kafka, a partition is an unbounded stream of records. The Kafka source statically assigns the partitions to the workers during job initialization. This assignment remains static throughout the lifecycle of the job unless the topology changes, for example when adding a new worker.

This leads to a few limitations. First, in production we sometimes see an outlier worker node whose performance is significantly lower than its peers. Because a partition is an unbounded stream, there is no work sharing or work stealing: even if other workers have extra capacity to spare, they cannot pick up the slack. Second, Kafka source parallelism is limited by the number of Kafka partitions. Adding a new worker won’t improve read throughput because it doesn’t get a new partition assigned to it. Overprovisioning the number of Kafka partitions can help alleviate the problem, but too many partitions carry a performance penalty on the broker side and also on the consumer side.

Autoscaling can help improve cost efficiency and reduce the operational burden. Let’s assume that initially we have six partitions assigned to three workers; each worker gets two partitions, and we get a well-balanced workload assignment. Now let’s say we’re autoscaling, and we need to scale up the number of workers because traffic grows. We add a new worker, so now we’re assigning six partitions to four workers, and we get an unbalanced workload assignment. Again, overprovisioning the number of partitions can help, but it also has performance implications. Given these tradeoffs, is there an alternative streaming storage we can leverage?

Streaming from Iceberg

Next, I’m going to show you the high-level design of the Iceberg streaming source and the benefits it brings. As I showed earlier in the data pipeline graph, we typically have a stream processing engine like Flink read data from Kafka, write it into data files, and commit the data files to the Iceberg table. The Flink Iceberg sink commits the data files after every successful Flink checkpoint. Commit intervals of one to 10 minutes are pretty common.
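
For reference, wiring a stream into the Flink Iceberg sink looks roughly like the sketch below, based on the connector’s documented builder; the table path is a placeholder, and the exact builder options vary by connector version.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.table.data.RowData;
    import org.apache.iceberg.flink.TableLoader;
    import org.apache.iceberg.flink.sink.FlinkSink;

    public class IcebergSinkSketch {
        // Data files written by the sink are committed to the Iceberg table on each
        // successful Flink checkpoint, so the checkpoint interval is the commit interval.
        static void appendToIceberg(DataStream<RowData> rows) {
            TableLoader tableLoader = TableLoader.fromHadoopTable("s3://warehouse/db/events"); // placeholder path
            FlinkSink.forRowData(rows)
                .tableLoader(tableLoader)
                .append();
        }
    }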

If we commit too frequently, say every second, we create a lot of small files and a lot of metadata files for Iceberg to keep track of; too many metadata files can stress the Iceberg metadata system. If we commit too infrequently, say every hour, we delay the availability of the data to downstream consumers. The question we are asking is: can a downstream Flink job stream data out of the Iceberg table as it is committed by the upstream job? The answer is yes. Let’s say the Iceberg table currently has snapshot N at its head.

When the upstream job commits new data files, it creates a new snapshot in the Iceberg table, snapshot N+1. Iceberg provides an API to discover the new data files appended between two snapshots; this is called an incremental scan. The Iceberg source discovers the new files, and the readers read the records from those data files and emit them to the downstream operators.
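
To make the incremental scan idea concrete, here is a sketch using the Iceberg core Java API; the catalog, warehouse path, and table name are placeholders, and recent Iceberg versions expose this as newIncrementalAppendScan (older versions have an equivalent appendsBetween on the table scan).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.FileScanTask;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.catalog.TableIdentifier;
    import org.apache.iceberg.hadoop.HadoopCatalog;
    import org.apache.iceberg.io.CloseableIterable;

    public class IncrementalScanSketch {
        public static void main(String[] args) throws Exception {
            HadoopCatalog catalog = new HadoopCatalog(new Configuration(), "s3://warehouse/"); // placeholder
            Table table = catalog.loadTable(TableIdentifier.of("db", "events"));               // placeholder

            long toSnapshotId = table.currentSnapshot().snapshotId();   // snapshot N+1, just committed upstream
            long fromSnapshotId = table.currentSnapshot().parentId();   // snapshot N, already consumed

            // Plan only the data files appended between the two snapshots.
            try (CloseableIterable<FileScanTask> tasks = table.newIncrementalAppendScan()
                    .fromSnapshotExclusive(fromSnapshotId)
                    .toSnapshot(toSnapshotId)
                    .planFiles()) {
                tasks.forEach(task -> System.out.println("new file: " + task.file().path()));
            }
        }
    }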

For a streaming job, this split discovery cycle continues forever, because a streaming job by definition never finishes. On the left, I have the code snippet for constructing the Kafka source in Flink: we set the bootstrap servers, the Kafka topic, the starting-offset strategy (here, the latest offset), and the deserializer. To construct an Iceberg source in Flink, we provide a tableLoader, which tells the Iceberg source how to load the Iceberg table from a catalog like the Hive metastore or Glue. We also set a starting strategy (here, starting from the latest snapshot in the Iceberg table) and a monitorInterval, which controls how often the Iceberg source polls the table to discover new data files. Both the Kafka and Iceberg sources are also available in Flink SQL.
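
The slide code isn’t reproduced in the transcript, but based on the options described, the two sources look roughly like the sketch below. The broker address, topic, and table path are placeholders, and the exact builder method names may differ slightly across Flink and Iceberg connector versions.

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.data.RowData;
    import org.apache.iceberg.flink.TableLoader;
    import org.apache.iceberg.flink.source.IcebergSource;
    import org.apache.iceberg.flink.source.StreamingStartingStrategy;

    public class SourceComparisonSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Kafka source: bootstrap servers, topic, starting-offset strategy, deserializer.
            KafkaSource<String> kafkaSource = KafkaSource.<String>builder()
                .setBootstrapServers("broker-1:9092")                 // placeholder brokers
                .setTopics("events")                                  // placeholder topic
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

            // Iceberg source: table loader, streaming starting strategy, monitor interval.
            TableLoader tableLoader = TableLoader.fromHadoopTable("s3://warehouse/db/events"); // placeholder
            IcebergSource<RowData> icebergSource = IcebergSource.forRowData()
                .tableLoader(tableLoader)
                .streaming(true)
                .streamingStartingStrategy(StreamingStartingStrategy.INCREMENTAL_FROM_LATEST_SNAPSHOT)
                .monitorInterval(Duration.ofSeconds(30))
                .build();

            env.fromSource(kafkaSource, WatermarkStrategy.noWatermarks(), "kafka-source").print();
            env.fromSource(icebergSource, WatermarkStrategy.noWatermarks(), "iceberg-source",
                    TypeInformation.of(RowData.class)).print();

            env.execute("source-comparison-sketch");
        }
    }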

This paradigm of streaming from Iceberg works because we observe that many streaming use cases are actually fine with minute-level latency; they’re not looking for sub-second latency. With that, we can build low-latency data pipelines chained by Flink jobs streaming from Iceberg, achieving end-to-end latency at the minute level. In a 2019 Flink Forward presentation, Stephan Ewen placed data processing applications on a spectrum of latency. At the most real-time end, we have transaction processing, the request-response model of a microservice, where latency is typically milliseconds. Then we have event-driven applications; this is probably where Kafka mostly comes in, with latencies of maybe sub-seconds. Then we have streaming analytics, data pipelines, continuous processing, and batch processing. Batch processing is scheduled hourly or daily, so its latency is at the level of hours or days.

Stephan argued that the four categories in the middle fit the stream processing paradigm. I think the Flink Iceberg streaming source fits well in the data pipelines and continuous processing categories, where the latency expectation is probably minutes. You may wonder whether this is just a micro-batch model. It is. How is it different from incremental batch processing? Let me first explain what I mean by incremental batch processing: scheduling batch runs at a much shorter interval than we typically do, like every few minutes, or even every minute.

Typically, a batch job runs hourly or daily. Each batch run processes the new files added since the last run and produces a bookmark recording where it ended, so the next run can discover the incremental data files from there. As the schedule interval shortens from hourly or daily to every few minutes or every minute, the line between streaming and batch becomes blurry; you probably cannot tell which is batch and which is streaming.

What are the limitations of incremental batch processing compared to the streaming execution we’re talking about? First, as we shorten the scheduling interval to every minute or every couple of minutes, it becomes expensive to tear down the job and bring it up again just a few seconds later. We might as well keep the job running all the time, which is essentially a streaming job, because a streaming job runs forever and never terminates. Second, I’ve heard from batch users who favor daily scheduling over hourly scheduling, even though hourly scheduling brings lower latency. The reason is that with hourly scheduling, you get 24 times more batch runs, a higher chance of job failures, and more need to do backfill. That operational burden pushes them away from hourly scheduling to daily scheduling.

Now imagine scheduling the batch runs every minute or every couple of minutes: we would get a lot more batch runs, a much higher chance of job failures, and more need to do backfill. This operational burden can be too high. Third, most non-trivial data processing involves state. For stateful batch processing, intermediate results are discarded after each run and recomputed in the next run. With stream processing, the intermediate results are stored in Flink state and are also checkpointed for fault tolerance.

FLIP stands for Flink Improvement Proposal; it is used to describe major new features or public API changes. FLIP-27 introduced a new source interface in Flink, designed to address the limitations of the old source function. The key idea in the FLIP-27 source interface is to separate work discovery from the actual reading. The enumerator runs on the JobManager, which is the coordinator, and is responsible for discovering work. The parallel readers run on the TaskManagers, the worker nodes, and are responsible for actually reading the data from storage. In FLIP-27, a unit of work is defined as a split. In the Kafka source, a split is a Kafka partition.

In the Iceberg source, a split is a file, a slice of a large file, or a group of small files. A split can be unbounded, as in the Kafka source, or bounded, as in the Iceberg source. In the Iceberg source, the enumerator discovers splits from the Iceberg table and keeps track of the pending splits in an internal queue. A reader requests a split during job initialization, or when it is done with its current split, and the enumerator assigns it one. A reader requests one split at a time; it is a pull-based model. FLIP-27 unifies batch and streaming sources; the only difference is whether the split discovery is one-time, for batch execution, or periodic, for streaming execution.
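
To illustrate the pull-based model, here is a small conceptual sketch; these are not the real Flink SplitEnumerator and SourceReader interfaces, just a simplified picture of how a queue of bounded splits lets idle readers keep asking for more work.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;
    import java.util.Optional;

    public class SplitAssignmentSketch {

        // In the Iceberg source, a split wraps a data file, a slice of a file, or a group of files.
        record Split(String dataFilePath) {}

        static class Enumerator {
            private final Deque<Split> pending = new ArrayDeque<>();

            // Periodic split discovery (the incremental scan) feeds this queue.
            void onSplitsDiscovered(List<Split> splits) {
                pending.addAll(splits);
            }

            // A reader calls this on startup and whenever it finishes its current split.
            Optional<Split> handleSplitRequest() {
                return Optional.ofNullable(pending.poll());
            }
        }

        public static void main(String[] args) {
            Enumerator enumerator = new Enumerator();
            enumerator.onSplitsDiscovered(List.of(
                new Split("s3://warehouse/db/events/f1.parquet"),    // placeholder file paths
                new Split("s3://warehouse/db/events/f2.parquet")));

            // Because splits are bounded, a fast reader simply asks again when it finishes,
            // which is what lets other workers pick up slack from an outlier node.
            enumerator.handleSplitRequest()
                .ifPresent(split -> System.out.println("assigned " + split.dataFilePath()));
        }
    }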

What are the benefits of using Iceberg as a streaming source? They’re tied to some of the pain points we talked about earlier. First, Iceberg can leverage managed cloud storage like S3, which offloads the operational burden to the cloud provider: system upgrades, capacity planning, bursty workloads, and isolation. Cloud storage is also very scalable and cost effective. This simplifies the storage architecture to a unified storage, because both recent data and historic data are stored in Iceberg. Iceberg serves both the streaming workload and the batch workload, and it unifies the source for the live job and the backfill job.

Most cloud blob storage like S3 doesn’t charge for cross-AZ network traffic within a region, so there is none of the cross-AZ network cost we were seeing earlier. The Iceberg source supports advanced data pruning with filters and projection. You can provide a filter expression so that Iceberg scan planning effectively prunes out data files not matching the expression. With projection, the Iceberg source only deserializes the columns it is interested in, because we use a column-oriented file format like Parquet. In the Iceberg source, the split assignment is dynamic: it is pull-based, triggered by the reader. With an outlier worker node, other workers can pick up the slack because the splits are bounded; that is an opportunity for the other workers to steal work from the outlier node by simply picking up more data files and processing them.
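
As an illustration of this pruning, the sketch below uses the Iceberg core scan API; the catalog, table, and column names are hypothetical, and the Flink Iceberg source exposes equivalent filter and projection options on its builder.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.FileScanTask;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.TableScan;
    import org.apache.iceberg.catalog.TableIdentifier;
    import org.apache.iceberg.expressions.Expressions;
    import org.apache.iceberg.hadoop.HadoopCatalog;
    import org.apache.iceberg.io.CloseableIterable;

    public class PruningSketch {
        public static void main(String[] args) throws Exception {
            HadoopCatalog catalog = new HadoopCatalog(new Configuration(), "s3://warehouse/"); // placeholder
            Table table = catalog.loadTable(TableIdentifier.of("db", "events"));               // placeholder

            // The filter prunes whole data files using partition values and column-level min/max
            // statistics; the projection limits which Parquet columns get deserialized.
            TableScan scan = table.newScan()
                .filter(Expressions.equal("event_type", "click"))   // hypothetical column
                .select("event_time", "user_id");                   // hypothetical columns

            try (CloseableIterable<FileScanTask> tasks = scan.planFiles()) {
                tasks.forEach(task -> System.out.println(task.file().path()));
            }
        }
    }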

With the Iceberg source, we can have a lot more file segments than the number of Kafka partitions, which brings some operational benefits. For example, with a higher number of file segments, we can support higher parallelism, which is useful in cases like backfill: you may want to backfill much faster than the live job, so you want higher parallelism, and with more file segments you can do that. More file segments are also more autoscaling friendly, because it is easier to assign the files to the readers in a balanced fashion. We have merged this FLIP-27 Flink Iceberg source into the Apache Iceberg project. It is fully merged, but the one thing I want to call out is that, for the streaming read, right now we only support append-only records. It does not support CDC reads of updates and deletes; that needs to be addressed in the future.

Watermark Alignment

Next, I’m going to dive deeper into Flink watermark alignment and how it works in the Flink Iceberg source. First, a quick recap of what a watermark is. In an event-time application, records can arrive out of order, so it’s necessary to buffer and wait to tolerate late-arriving data. We cannot wait forever, otherwise the result would be delayed for a long time and the Flink state could grow unboundedly. At a certain point, we have to stop waiting, and that is what the watermark does: it basically tells us that all the data before the watermark has already arrived, so it’s OK to emit a result. The watermark is a heuristic and can occasionally be wrong, hence it is a tradeoff between the completeness and the latency of the results. That covers watermarks.

Then, what is watermark alignment, and why do we need it? Let’s look at a stateful join use case, for example joining ad impressions with ad clicks. For a stateful join, we typically need to define a join window, which can be determined by how late data can come; in this case, let’s assume a 6-hour join window. Assume everything works well in steady state with live traffic. Now let’s say we need to replay the two Kafka sources to 24 hours ago, maybe because of an outage or a data issue. Those two data streams may have different data volumes; let’s assume the ad impression stream has 4x the data of the clickstream. Because of the different data volumes, the two Kafka sources will proceed at different paces. In Flink, the pace is measured by the watermark.

If we zoom into the Flink job a little bit, it can be simplified to four operators. We have two Kafka source operators, each reading from one Kafka stream. Then we have the co-process function, which does the stateful join; this is the stateful operator. Then we have the sink operator, which writes the join output to the sink storage. Because the clickstream is much smaller, let’s assume source 2 catches up much faster and reaches the live traffic, so its watermark is at now. Source 1 is catching up on a much bigger stream, so it catches up much more slowly; let’s assume its watermark is at now minus 18 hours. Because Flink calculates the watermark as the minimum value of all the inputs, the stateful co-process function will have a watermark at now minus 18 hours. This is a problem: because the watermark advances more slowly in the stateful operator, it now needs to buffer 24 hours of clickstream data versus 6 hours during steady state.

This excessive data buffering can lead to performance and stability issues for the stateful Flink job. This is actually one of the biggest pain points we experienced when running large stateful jobs in Flink. That’s why Flink introduced the watermark alignment strategy in the source. It basically tries to ensure that both sources progress at a similar pace: a source will stop reading from Kafka when its local watermark is too far ahead of the global watermark. This is the alignment part, and it avoids excess data buffering in the downstream stateful operator. For example, in this case, if we allow one hour of maximum watermark drift, then the co-process function only needs to buffer 7 hours of click data, which is very close to the steady state. Here is how Flink watermark alignment works internally.

Let’s assume we have three readers, each reading from one Kafka partition. The Kafka readers extract a watermark from a timestamp field in the records they consume from Kafka. They periodically send their local watermark to the enumerator for global aggregation. The enumerator calculates the global watermark as the minimum of all the reported local watermarks, and then broadcasts the global watermark to all the readers. The readers check the difference between their local watermark and the global watermark to determine whether throttling is needed. Let’s assume the maximum allowed watermark drift is 30 minutes. In this case, Reader-0 will stop reading because its local watermark is too far ahead. This is where the alignment comes in: we need to align the progress of all the source readers.
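
In Flink’s public API, watermark alignment is enabled on the watermark strategy (available since Flink 1.15 for FLIP-27 sources such as the Kafka source); the event type, timestamp field, group name, and drift values below are hypothetical. The Iceberg source support described next is a separate upstream contribution.

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;

    public class AlignmentSketch {

        // Hypothetical event type, for illustration only.
        static class AdEvent {
            long eventTimeMillis;
        }

        public static void main(String[] args) {
            // Sources sharing the "join-group" alignment group pause reading whenever their local
            // watermark runs more than 30 minutes ahead of the group's global watermark.
            WatermarkStrategy<AdEvent> strategy = WatermarkStrategy
                .<AdEvent>forBoundedOutOfOrderness(Duration.ofMinutes(5))
                .withTimestampAssigner((event, recordTs) -> event.eventTimeMillis)
                .withWatermarkAlignment("join-group", Duration.ofMinutes(30), Duration.ofSeconds(1));

            System.out.println("strategy configured: " + strategy);
        }
    }

The same strategy object is then passed to env.fromSource(...) for each source that should participate in the alignment group.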

Before I talk about how watermark alignment works in the Iceberg streaming source, let’s recap some differences between the Kafka source and the Iceberg source. In the Kafka source, the splits are unbounded partitions; in the Iceberg source, they are bounded data files or file segments. In the Kafka source, the split assignment is static, done during job initialization; in the Iceberg source, the split assignment is dynamic and pull-based from the readers. In Kafka, records are ordered within a partition, first-in, first-out; in the Iceberg source, records may not be sorted by the timestamp field within a data file. In the Kafka source, only the readers can extract the watermark information, from the records they read from Kafka; that’s the only way. In the Iceberg source, because Iceberg keeps track of column-level statistics, like min-max values, in the metadata files, the enumerator can extract the watermark information from the min-max column-level statistics in the metadata files without downloading or reading the data files.

The key idea is that the Iceberg source enumerator assigns the splits ordered by time. Let’s assume the enumerator discovered 9 files, and we order those 9 files by the minimum timestamp value from the column-level statistics. Here the notation means that F1 contains records with timestamps from 9:00 to 9:03. Let’s assume the enumerator assigns the first three files to the three readers. The readers derive their local watermark from the minimum timestamp value of their data file.

Reader-0 calculates its local watermark as 9:00, because that’s the minimum timestamp value in F1. The global watermark is calculated as the minimum of all the local watermarks, so the global watermark here is 9:00. The readers check the difference between their local watermark and the global watermark to decide if throttling is needed; in this case, all three readers are OK to proceed because they are within the threshold of 10 minutes of watermark drift. Let’s assume that some time later, Reader-2 finishes its data file, requests a new file, and gets F4 back. Reader-2 advances its local watermark to 9:13, because that’s the minimum timestamp value in F4. Now Reader-2 needs to stop reading, because its local watermark is too far ahead, given that the maximum allowed drift is 10 minutes. That’s the throttling, or alignment, part.

Let’s assume that some time later, Reader-0 also finishes its file and gets a new file, F5, and advances its local watermark to 9:16, because that’s the minimum timestamp value in F5. Now Reader-0 also needs to stop reading because its local watermark is too far ahead. After some propagation delay, maybe a second or two, the global watermark advances to 9:04, because that’s the minimum of the other local watermarks. At that point, Reader-2 can resume reading, because its local watermark is now within the threshold of the latest global watermark, and it resumes normal reading. The maximum out-of-orderliness we can get with the Iceberg source watermark alignment is equal to the maximum allowed watermark drift plus the maximum timestamp range of any data file. Let’s assume the maximum timestamp range, from F6, is 8 minutes. If the maximum allowed watermark drift is 10 minutes, then the maximum out-of-orderliness is 18 minutes.

Why is it important to keep the max out-of-orderliness small? Remember the earlier graph I showed you: when we keep the max out-of-orderliness small, we can avoid excessive data buffering in the Flink stateful operator. My colleague Peter Vary is working on the upstream contribution for this watermark alignment in the Flink Iceberg source; it is not fully merged yet.

Evaluation Results

Finally, we’re going to look at some evaluation results. This is the test pipeline setup: we have a Kafka source job that reads data from Kafka and writes the same data into an Iceberg table, and an Iceberg source job that streams data from that input Iceberg table and writes the same data to an output Iceberg table; it’s like an Iceberg mirror maker. The traffic volume is about 4,000 messages per second, with each message about 1 kilobyte. The TaskManager container has 1 CPU and 4 gigabytes of memory, so it’s a small test setup.

What are we evaluating? First, we want to look at the read latency of the Iceberg streaming source, because for a streaming job, latency matters. Second, we want to understand how the upstream commit interval affects downstream consumption, that is, how bursty the downstream consumption is. Third, we want to compare the CPU utilization of the Kafka source and the Iceberg source. We measure the latency from the Kafka broker to the Iceberg source using processing time. If we break it down [inaudible 00:35:14], there are three segments. First, we have the Kafka source’s read latency, which is typically very fast, sub-second. Then we have the commit interval, how often the upstream job commits data files to the Iceberg table. The third segment is the poll interval, how often the Iceberg source polls the Iceberg table to discover new data files.

The latency is mostly determined by the last two segments, the commit interval and the poll interval. Here we look at a latency histogram with a 10-second commit interval and a 5-second poll interval. The x-axis is time, the y-axis is latency in milliseconds. We can see the max latency is less than 40 seconds. The median latency fluctuates around 10 seconds, which corresponds to the 10-second upstream commit interval; that makes sense, because the upstream commits a batch of files and the downstream reads a batch of files. Because the Iceberg data commit is a global transaction, it can lead to a stop-and-go consumption pattern.

The top graph shows the CPU usage for the Kafka source job. The x-axis is time, the y-axis is the number of cores used; remember, we’re using a 1-core container. From this graph, we can see the Kafka source job is always busy, always pulling data from Kafka and processing it. If we look at the Iceberg source job, here using a 5-minute commit interval and a 30-second poll interval, we can see it is busy processing a batch of files for maybe 2 to 3 minutes; after it’s done with the batch, it’s idle for maybe the next 2 minutes, until the next batch is committed, and then it’s busy again. This stop-and-go consumption pattern is what we expect from the global transactional write in Iceberg. I want to show you that as we shorten the upstream commit interval and the Iceberg source poll interval, the CPU usage becomes much smoother for the Iceberg source job. The top graph, from the previous slide, is for a 5-minute commit interval and a 30-second poll interval.

The middle graph is for a 60-second commit interval and a 10-second poll interval. You can see we no longer have those long idle gaps of 2 or 3 minutes, although it is still choppier and burstier compared to the Kafka source job. As we further shorten the intervals to a 10-second commit interval and a 5-second poll interval, we can hardly see any difference in CPU usage between the Iceberg source job and the Kafka source job; the patterns look very similar.

Finally, we compare the CPU usage of the Kafka source job and the Iceberg source job. The only difference between the two jobs is the streaming source, one reading from Kafka and the other from Iceberg; everything else is pretty much the same. For the CPU usage comparison, we apply a smoothing function so the CPU usage doesn’t look choppy and it’s easier to read the overall trend. You can see that the Kafka source job has CPU utilization around 60%, while the Iceberg source job is around 36%. I know benchmarks are tricky; I think the main takeaway is that the Iceberg source job is as performant as the Kafka streaming source, if not better.

Key Takeaway

We can build low-latency data pipelines chained by Flink jobs streaming from Iceberg, achieving overall latency at the minute level. They are cost effective, and they are also operationally friendly.

Flink and Spark vs. Data Lake Solutions, for Batch Processing

For people coming from a batch background, is there a difference between compute engines like Flink and Spark? Do they work better with one data lake solution, like Iceberg, Hudi, or Delta Lake? I think initially, for example, Iceberg came from a batch analytics background, and Hudi came from streaming use cases; that’s why Hudi probably has better support for CDC reads. Over the last couple of years, we have seen some convergence among those data lake solutions: they are learning from each other and trying to address the weaknesses in their own systems. I think all the common data lake solutions, Iceberg, Delta Lake, and Hudi, now have pretty good integration with the popular engines like Flink, Spark, and Trino. I wouldn’t say one works significantly better with another; it often depends on which technology you are more familiar or comfortable with.

How Kafka Key Topics are Handled in Iceberg Source

Kafka supports keyed topics, where records with the same key go to the same Kafka partition. How is that handled with the Iceberg source? Iceberg supports a partition scheme called bucketing, which is like hashing: you can send all the records with the same bucket ID, which is basically the hash function [inaudible 00:42:00] by the number of buckets, to the same partition bucket. That way, all the records for the same key go to the same bucket, and you can assign the files for the same bucket to the same reader.
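
For concreteness, a bucketed partition spec in the Iceberg Java API looks roughly like this sketch; the schema, column name, and bucket count are hypothetical.

    import org.apache.iceberg.PartitionSpec;
    import org.apache.iceberg.Schema;
    import org.apache.iceberg.types.Types;

    public class BucketingSketch {
        public static void main(String[] args) {
            // Hypothetical schema: bucketing hashes the key into a fixed number of buckets,
            // so all records with the same key land in the same bucket, much like a keyed Kafka topic.
            Schema schema = new Schema(
                Types.NestedField.required(1, "user_id", Types.LongType.get()),
                Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()));

            PartitionSpec spec = PartitionSpec.builderFor(schema)
                .bucket("user_id", 32)   // 32 buckets, hypothetical
                .build();

            System.out.println(spec);
        }
    }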

The Difference in CPU Benchmark for Kafka Source vs. Iceberg Source

In the CPU benchmark, the Kafka source had 60% CPU usage and the Iceberg source had 36%; why is there such a difference, and did we look into it? I did not look into what exactly is causing the difference. I think a couple of things might contribute. In the Iceberg source, you pull in data in much bigger chunks, because the data comes from S3; we get a data file and [inaudible 00:43:08], so it’s a much bigger batch compared to a Kafka source read. That’s probably one of the reasons. Again, I don’t want to read too much into it and say Iceberg is more efficient than Kafka; it’s at least as performant.

Q&A

Participant 1: In the graphs, you had the smoother graphs versus the very spiky graphs. On the smoother ones, as you shorten the intervals, do you run a greater risk with backups?

Greater Risk with Backups with Shortened Intervals

Wu: Not really. As we shorten the interval, the problem is small data files. If we commit a data file every second, we only write 1 second of data into each file, so the file is typically going to be much smaller, maybe a few megabytes, compared to an ideal data file size of maybe 100 megabytes. Producing much smaller data files is the downside; that’s why we typically don’t recommend going very low, and why a 1-to-10-minute commit interval is pretty common.

Iceberg Optimization Based on its Latency, Considering Flink Checkpoints and Recoverability

Participant 2: I have a question about the optimization for Iceberg based on the latency of Iceberg itself; you said 1 to 10 minutes was the optimal interval. Interestingly, you’re not considering Flink checkpoints and recoverability, because you just mentioned operational costs; the larger the checkpoints, the more costly they become to maintain.

Wu: The Iceberg sink only commits data after every successful checkpoint, which is why the commit interval and the Flink checkpoint interval are actually identical; they’re the same. The main thing is that you don’t want to commit, or checkpoint, too frequently, because it produces small data files and too many metadata files. You also don’t want to commit too infrequently, which delays the results. In production, the most common setting I have seen is 1 to 10 minutes. Some people go far shorter, like every 5 or 10 seconds, and some go longer than 15 minutes, but 1 to 10 minutes is most common.
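
Since the Iceberg commit rides on the Flink checkpoint, the commit interval is effectively set through the checkpoint configuration; here is a minimal sketch, with a hypothetical 60-second interval within the 1-to-10-minute range mentioned above.

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointIntervalSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // The Flink Iceberg sink commits data files on each successful checkpoint, so this
            // interval is also the Iceberg commit interval (and the unit of downstream freshness).
            env.enableCheckpointing(60_000L); // 60 seconds, hypothetical
        }
    }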

Pros and Cons of Hybrid Source & Lambda Source, Compared to Iceberg

Participant 3: In the previous presentation, given by Sharon, she talked about the hybrid source and the Lambda source, where you read the historical data from the data lake and the latest data from Kafka. How do you see the pros and cons of a pure Iceberg source compared with the notion of the Lambda source?

Wu: Iceberg can be used to store long-term data; you can store months or years of data in Iceberg, no problem, with the data in blob storage like S3. When you backfill, let’s say, 30 days of data, the main concern is not to read all of that data into state or memory at once; you want to use watermark alignment to process the old data first and move along steadily, rather than reading new data and old data at the same time. The Lambda source is implemented in Flink as the hybrid source: the key idea is that you read the historical data first, then transition to the streaming source.

