Month: April 2022
MMS • Matt Saunders
Article originally posted on InfoQ. Visit InfoQ
CRM software company Salesforce has revealed its approach to service reliability using service-level indicators and objectives (SLIs and SLOs). After building a platform to monitor SLOs, the company saw massive adoption, with 1,200 services onboarded in the first year. The platform gives service owners deep, actionable insights into how to improve or maintain the health of their services: it surfaces dips in SLIs, identifies dependent services that are not meeting their own SLOs, and overall provides a better understanding of customers’ experience with those services.
Building a platform to monitor service reliability abstracts away organizational complexity and toil, allowing teams to focus on driving business value. Tripti Sheth talks through how it was crucial for Salesforce to agree on a definition of ‘highly reliable’ across a range of tech stacks, and across the many products and individual supporting services within the organisation. This allowed them to frame reliability in terms of SLIs and SLOs.
As documented by Google Cloud, Site Reliability Engineering (SRE) begins with the idea that availability is a prerequisite for success. A Service-Level Objective (SLO) is a precise numerical target for service availability. A Service-Level Agreement (SLA) defines a promise to a service user that the SLO will be met over a specific time period, and Service-Level Indicators (SLIs) are direct measurements of the service’s performance. These generally accepted definitions are often used to show customer experience in a clear, quantitative and actionable way.
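To make these definitions concrete, here is a minimal, illustrative sketch (not Salesforce’s implementation; the function names and numbers are invented) of computing an availability SLI from request counts and checking how much of the error budget implied by an SLO remains:

```python
def availability_sli(successful_requests: int, total_requests: int) -> float:
    """SLI: the measured fraction of requests served successfully."""
    if total_requests == 0:
        return 1.0  # no traffic means nothing has failed
    return successful_requests / total_requests


def error_budget_remaining(sli: float, slo: float) -> float:
    """Fraction of the error budget (1 - SLO) still unspent."""
    allowed_failure = 1.0 - slo
    actual_failure = 1.0 - sli
    if allowed_failure == 0.0:
        return 1.0 if actual_failure == 0.0 else 0.0
    return max(0.0, 1.0 - actual_failure / allowed_failure)


# 999,500 of 1,000,000 requests succeeded against a 99.9% availability SLO:
sli = availability_sli(999_500, 1_000_000)    # SLI = 0.9995
budget = error_budget_remaining(sli, 0.999)   # about half the budget remains
```

When the SLI equals the SLO the budget is fully spent; anything better leaves headroom that teams can deliberately spend on releases and experiments.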
In the past, Salesforce’s teams had assembled SLOs manually, meaning that updating these metrics and reporting on them was a time-consuming and error-prone task. Additionally, different teams would calculate and store these values in different ways, preventing the company from gaining a clear picture of customer experience.
Forming a standardized view of service availability was crucial, and Salesforce approached this in three areas:
Standardised Measurements: Salesforce used a previously established SLO framework based on five readings of request rate, errors, availability, duration/latency, and saturation (READS) to define standardised measurement of product and service health.
Standardised Tooling: a dedicated SLO platform for hosting the definitions of SLIs, SLOs and services, including ownership, health thresholds and alert configurations. This metadata is held in a single data store, with long-term storage and retention to give visibility of historical health trends. Automated alerts can be set up based on the data collected.
Standardised Visualisation: as soon as a new service is added to the platform, an out-of-the-box standard view of metrics is generated, with the standard READS SLIs and any custom SLIs added for that specific service. The visualisation includes a dedicated Grafana dashboard for real-time monitoring, automatically generated and populated with real-time data. The service is also added to the service analytics dashboard, which is regularly reviewed to drive conversations about service health and availability.
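As a rough sketch of how such standardised onboarding might work (the service name, team and metric-naming scheme below are invented for illustration, not Salesforce’s actual schema), each registered service could be expanded into its standard READS SLI set plus any custom SLIs:

```python
from dataclasses import dataclass

# The five standard READS SLIs: Request rate, Errors, Availability,
# Duration/latency, Saturation.
READS_SLIS = ("request_rate", "errors", "availability", "duration", "saturation")


@dataclass
class ServiceRegistration:
    name: str
    owner: str
    custom_slis: tuple = ()


def default_sli_set(service: ServiceRegistration) -> list:
    """Every onboarded service gets the standard READS SLIs plus
    any service-specific custom SLIs."""
    return [f"{service.name}.{sli}"
            for sli in READS_SLIS + tuple(service.custom_slis)]


svc = ServiceRegistration(name="checkout", owner="payments-team",
                          custom_slis=("cart_conversion",))
metrics = default_sli_set(svc)  # five READS SLIs plus one custom SLI
```

Generating dashboards from this single definition is what makes the visualisation consistent across every service on the platform.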
The combination of these three areas creates many benefits:
- Confidence that SLOs are calculated in a standardized way
- Insights from visualized SLI and SLO metrics
- Using granular targets on SLOs to judge if a service is meeting expectations
- Alerting on SLI and SLO metrics
- Correlation of breaches with incidents
- Identification of service dependencies
The SLO platform architecture comprises multiple components. It is centered around a service registry and configuration store – keeping service ownership information, service statuses and service-specific configuration, and data on SLIs, SLOs and the thresholds required for triggering alerting. Peripheral to this are data stores for change and release information, collected for future use in correlating changes with SLO breaches, and a time-series monitoring platform and pipelines for collecting and aggregating metrics.
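A hypothetical shape for one registry entry, together with the kind of threshold check such a platform might run, is sketched below (the field names and values are assumptions for illustration, not Salesforce’s schema):

```python
# Hypothetical service registry entry; field names are invented.
service_entry = {
    "service": "checkout",
    "owner": "payments-team",
    "slos": {
        "availability": {"target": 0.999, "alert_below": 0.9995},
        "latency_p95_ms": {"target": 300, "alert_above": 350},
    },
}


def slo_alerts(entry: dict, measurements: dict) -> list:
    """Return the SLIs whose current measurement crosses the
    configured alert threshold."""
    alerts = []
    for name, cfg in entry["slos"].items():
        value = measurements.get(name)
        if value is None:
            continue  # no data collected yet for this SLI
        if "alert_below" in cfg and value < cfg["alert_below"]:
            alerts.append(name)
        elif "alert_above" in cfg and value > cfg["alert_above"]:
            alerts.append(name)
    return alerts


triggered = slo_alerts(service_entry, {"availability": 0.9991,
                                       "latency_p95_ms": 310})
# triggered contains "availability": 0.9991 is below the 0.9995 alert line
```

Keeping thresholds in the same store as ownership data is what lets alerts be routed to the right team automatically.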
The unified service health dashboard has become a focal point for operational reviews. The team has used these metrics to trigger architectural reviews and to stimulate discussions around strategic investments and tactical improvements.
Future work will enable a more comprehensive view of a service’s dependencies, with the goal of pinpointing exactly where a failure occurs and minimising recovery times. Furthermore, having collected these data per service, and with a realistic view of each service’s dependencies, Salesforce will be able to set realistic SLIs across the entire stack.
The full article with further detail is available on Medium.
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
Fiera Capital Corp lowered its position in shares of MongoDB, Inc. (NASDAQ:MDB – Get Rating) by 11.5% in the fourth quarter, according to its most recent 13F filing with the Securities & Exchange Commission. The institutional investor owned 81,627 shares of the company’s stock after selling 10,621 shares during the quarter. Fiera Capital Corp owned approximately 0.12% of MongoDB worth $43,209,000 at the end of the most recent quarter.
Other institutional investors have also recently made changes to their positions in the company. Arlington Partners LLC bought a new position in MongoDB during the 4th quarter worth approximately $30,000. Winch Advisory Services LLC raised its holdings in shares of MongoDB by 54.2% in the third quarter. Winch Advisory Services LLC now owns 74 shares of the company’s stock valued at $35,000 after purchasing an additional 26 shares during the last quarter. HBC Financial Services PLLC raised its holdings in shares of MongoDB by 3,233.3% in the fourth quarter. HBC Financial Services PLLC now owns 400 shares of the company’s stock valued at $39,000 after purchasing an additional 388 shares during the last quarter. First Horizon Advisors Inc. raised its holdings in shares of MongoDB by 194.2% in the third quarter. First Horizon Advisors Inc. now owns 203 shares of the company’s stock valued at $78,000 after purchasing an additional 134 shares during the last quarter. Finally, Lindbrook Capital LLC raised its holdings in shares of MongoDB by 56.5% in the third quarter. Lindbrook Capital LLC now owns 180 shares of the company’s stock valued at $85,000 after purchasing an additional 65 shares during the last quarter. Institutional investors and hedge funds own 89.57% of the company’s stock.
A number of equities analysts recently weighed in on the stock. UBS Group upgraded shares of MongoDB from a “neutral” rating to a “buy” rating and boosted their target price for the stock from $345.00 to $450.00 in a report on Friday, March 18th. Royal Bank of Canada assumed coverage on shares of MongoDB in a research report on Tuesday, March 1st. They issued an “outperform” rating and a $505.00 target price for the company. Barclays cut their price objective on shares of MongoDB from $556.00 to $410.00 and set an “overweight” rating for the company in a research report on Wednesday, March 9th. Stifel Nicolaus dropped their target price on shares of MongoDB from $550.00 to $425.00 in a report on Wednesday, March 9th. Finally, Zacks Investment Research downgraded shares of MongoDB from a “hold” rating to a “sell” rating in a research report on Thursday, February 3rd. One investment analyst has rated the stock with a sell rating, one has issued a hold rating and fifteen have assigned a buy rating to the stock. According to MarketBeat, the company presently has an average rating of “Buy” and a consensus target price of $496.72.
In other MongoDB news, CEO Dev Ittycheria sold 35,000 shares of the stock in a transaction that occurred on Wednesday, April 6th. The stock was sold at an average price of $412.38, for a total transaction of $14,433,300.00. Following the transaction, the chief executive officer now owns 204,744 shares in the company, valued at $84,432,330.72. The sale was disclosed in a document filed with the Securities & Exchange Commission, which is accessible through the SEC website. Also, Director Charles M. Hazard, Jr. sold 1,666 shares of the firm’s stock in a transaction that occurred on Tuesday, February 1st. The shares were sold at an average price of $406.70, for a total value of $677,562.20. The disclosure for this sale can be found here. Insiders sold 145,833 shares of company stock worth $57,329,693 in the last 90 days. Company insiders own 7.40% of the company’s stock.
Shares of MDB traded down $22.72 during mid-day trading on Friday, hitting $354.93. 801,656 shares of the company’s stock traded hands, compared to its average volume of 991,906. The company has a quick ratio of 4.02, a current ratio of 4.02 and a debt-to-equity ratio of 1.70. The business’s 50-day moving average price is $383.70 and its 200-day moving average price is $445.17. The firm has a market capitalization of $23.98 billion, a P/E ratio of -74.88 and a beta of 0.84. MongoDB, Inc. has a 1 year low of $238.01 and a 1 year high of $590.00.
MongoDB (NASDAQ:MDB – Get Rating) last posted its quarterly earnings results on Tuesday, March 8th. The company reported ($0.09) earnings per share for the quarter, beating the Thomson Reuters’ consensus estimate of ($1.26) by $1.17. The business had revenue of $266.50 million for the quarter, compared to the consensus estimate of $243.42 million. MongoDB had a negative return on equity of 66.70% and a negative net margin of 35.12%. The company’s quarterly revenue was up 55.8% compared to the same quarter last year. During the same quarter in the previous year, the business earned ($1.01) EPS. Sell-side analysts anticipate that MongoDB, Inc. will post ($5.48) earnings per share for the current fiscal year.
MongoDB Profile (Get Rating)
MongoDB, Inc. provides a general purpose database platform worldwide. The company offers MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premise, or in a hybrid environment; MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
In the last trading session, MongoDB Inc. (NASDAQ:MDB) saw 0.8 million shares change hands, with its beta currently measuring 0.89. The stock closed at $354.93, down $22.72 or 6.02% on the day, for a market valuation of $26.31B. That closing price is at a 66.23% discount from its 52-week high of $590.00 and a 32.94% premium to its 52-week low of $238.01. The company’s average trading volume over the past three months is 1.24 million shares.
For MongoDB Inc. (MDB), analysts’ consensus recommendation is a Buy, with a mean rating of 1.90. Breaking down the data, out of 20 analysts covering the stock, 0 rated it a Sell, 1 recommended Overweight, 4 suggested Hold, 15 rated it a Buy, and 0 advised Underweight. The company is expected to post an EPS of -$0.22 in the current quarter.
Down 6.02% in the last session, MDB traded in the red over the past five days, hitting its week high on Friday, 04/29/22, when the stock touched the $354.93 price level. MongoDB Inc.’s shares have moved -32.95% year-to-date and -4.34% over the past five days, and the stock showed a performance of -17.50% over the past 30 days.
Wall Street analysts have assigned a consensus price target of $463.20 to the stock, which implies upside of 23.37% from its current value. Analysts project a low price target of $325.00 and a high target of $650.00. Reaching the projected high would mean a gain of 83.13% from the stock’s current price, while falling to the targeted low would mean a loss of 8.43%.
MongoDB Inc. (MDB) estimates and forecasts
Statistics highlight that MongoDB Inc. is scoring comparatively higher than other players in its industry. The company’s shares lost 29.66% of their value over the past six months, while showing an annual growth rate of 33.90% against the industry’s 5.70%. Apart from that, the company raised its revenue forecast for fiscal year 2022. The company estimates revenue growth of 33.30% in the current quarter and a 13.30% decrease in the next quarter. Revenue growth this year is estimated at 43.80% over the last financial year.
Twelve industry analysts have estimated the company’s current-quarter revenue at an average of $241.76 million, and 13 analysts estimate the company will make revenue of $253.75 million in the next quarter, ending in April 2022.
Weighing up the company’s earnings over the past five years and the next five-year period, we find the company posted an annual earnings growth rate of 3.70% during the past five years.
MongoDB Inc. is likely to release its next quarterly report between March 07 and March 11, and investors are confident the company will announce better current-quarter results despite the issues it has been facing from mounting debt.
MongoDB Inc. (NASDAQ:MDB)’s Major holders
Insiders hold 7.58% of the company’s total shares, while institutions hold 90.05%, with the stock having a share float of 97.44%. Investors also watch the number of corporate investors in a company closely; for MongoDB Inc., institutions currently hold 90.05% of shares. Capital World Investors is the top institutional holder of MDB, with 6.8 million shares worth $3.21 billion; as of Sep 29, 2021, it held 10.19% of the company’s outstanding shares.
The second-largest institutional holder is Price (T.Rowe) Associates Inc, which held about 6.68 million shares as of Sep 29, 2021. That position represents 10.01% of outstanding shares, with a total worth of $3.15 billion.
On the other hand, Growth Fund Of America Inc and Smallcap World Fund are the top two mutual funds holding the company’s shares. As of Nov 29, 2021, the former held 4.81 million shares worth $2.39 billion, or 7.20% of the total outstanding shares. The latter held 1.91 million shares as of Sep 29, 2021, a stake worth around $898.76 million, or 2.86% of the company’s stock.
MMS • Sergio De Simone
Article originally posted on InfoQ. Visit InfoQ
The latest release of GitHub’s official GUI client app for macOS and Windows, GitHub Desktop 3.0, brings new features that, while not striking on the surface, may improve collaboration and development workflows, including new notifications and an improved checks UI.
GitHub recently added support for re-running failed jobs in a GitHub Action without having to re-run the whole Action. This can save a lot of time with complex workflows that take a long time to execute, says GitHub.
Our users build complex workflows that often rely on multiple jobs and dependencies. Features like job matrices can trigger multiple variant jobs with just a few lines of YAML. However, these workflows can take a while to run, so when a small part fails, it is frustrating to have to wait to run it again.
GitHub Desktop 3.0 improves on the initial implementation of this feature by making it possible to re-run a single check instead of all failed jobs. This feature is available under the drop-down menu associated with a PR. Checks can be re-run only once the spawning Action has completed its execution, so the re-run button is disabled while an Action is still in progress.
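Under the hood, these re-runs map onto GitHub’s REST API, which exposes endpoints both for re-running all failed jobs of a workflow run and for re-running a single job. The helper below is an illustrative sketch that builds the documented endpoint URLs (the owner, repo, and IDs are placeholders); an authenticated POST to either URL triggers the re-run:

```python
API_ROOT = "https://api.github.com"


def rerun_failed_jobs_url(owner: str, repo: str, run_id: int) -> str:
    """Endpoint that re-runs only the failed jobs of a workflow run."""
    return f"{API_ROOT}/repos/{owner}/{repo}/actions/runs/{run_id}/rerun-failed-jobs"


def rerun_job_url(owner: str, repo: str, job_id: int) -> str:
    """Endpoint that re-runs a single job (and its dependents)."""
    return f"{API_ROOT}/repos/{owner}/{repo}/actions/jobs/{job_id}/rerun"


# Issue an authenticated POST to one of these URLs to trigger the re-run:
url = rerun_job_url("octocat", "hello-world", 42)
```

GitHub Desktop simply wraps this capability in the checks drop-down, so the same constraint applies: the run must have completed before a re-run can be requested.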
To improve developer workflow with checks, GitHub Desktop 3.0 also introduces a new notification when a check fails. This is useful, according to GitHub, to speed up the pull request review process while reducing the risk that your teammates start reviewing a pull request before it is final.
Not addressing the issue promptly could result in your teammates reviewing code different than what you intended to merge, or needing to ask them for an additional approval once the checks pass again.
The new notification leads you to a dialog with more detail about the failed checks and provides a button to re-run them if you suspect the problem might not be related to your code. Otherwise, you can switch to the PR branch with a click and start working on a fix.
To improve team collaboration, GitHub Desktop includes new notifications for a number of key events, such as a teammate requesting a change or adding a comment. Additionally, you will be notified when a PR has been approved.
As a last remark about notifications, GitHub Desktop 3.0 introduces a new setting to only enable high-signal pull request notifications. When this is enabled, GitHub Desktop does not send a notification for every single event happening in repositories you collaborate on, but only for significant, PR-related events.
Finally, GitHub Desktop 3.0 also deals with an annoying warning about the default merge policy to use when pulling from a repo by defaulting to merging (as opposed to rebasing).
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
MongoDB (NASDAQ:MDB – Get Rating) and CCC Intelligent Solutions (NYSE:CCCS – Get Rating) are both computer and technology companies, but which is the superior stock? We will contrast the two businesses based on the strength of their profitability, institutional ownership, risk, valuation, analyst recommendations, dividends and earnings.
Insider and Institutional Ownership
89.6% of MongoDB shares are owned by institutional investors. Comparatively, 86.2% of CCC Intelligent Solutions shares are owned by institutional investors. 7.4% of MongoDB shares are owned by insiders. Strong institutional ownership is an indication that endowments, hedge funds and large money managers believe a company is poised for long-term growth.
This is a breakdown of recent ratings and price targets for MongoDB and CCC Intelligent Solutions, as reported by MarketBeat.com.
| | Sell Ratings | Hold Ratings | Buy Ratings | Strong Buy Ratings | Rating Score |
| --- | --- | --- | --- | --- | --- |
| CCC Intelligent Solutions | 1 | 3 | 1 | 0 | 2.00 |
MongoDB currently has a consensus price target of $492.13, suggesting a potential upside of 38.66%. CCC Intelligent Solutions has a consensus price target of $11.25, suggesting a potential upside of 21.89%. Given MongoDB’s stronger consensus rating and higher probable upside, analysts clearly believe MongoDB is more favorable than CCC Intelligent Solutions.
Earnings and Valuation
This table compares MongoDB and CCC Intelligent Solutions’ top-line revenue, earnings per share and valuation.
| | Gross Revenue | Price/Sales Ratio | Net Income | Earnings Per Share | Price/Earnings Ratio |
| --- | --- | --- | --- | --- | --- |
| MongoDB | $873.78 million | 27.45 | -$306.87 million | ($4.74) | -74.88 |
| CCC Intelligent Solutions | $688.29 million | 8.23 | -$248.92 million | N/A | N/A |
CCC Intelligent Solutions has lower revenue, but higher earnings than MongoDB.
This table compares MongoDB and CCC Intelligent Solutions’ net margins, return on equity and return on assets.
| | Net Margins | Return on Equity | Return on Assets |
| --- | --- | --- | --- |
| CCC Intelligent Solutions | N/A | 10.21% | 4.59% |
Volatility and Risk
MongoDB has a beta of 0.84, indicating that its stock price is 16% less volatile than the S&P 500. Comparatively, CCC Intelligent Solutions has a beta of 0.92, indicating that its stock price is 8% less volatile than the S&P 500.
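The beta-to-volatility arithmetic used here is simple: a beta of 1.0 tracks the S&P 500, and each 0.01 below (or above) 1.0 reads as 1% less (or more) volatile than the market. A trivial sketch of that conversion:

```python
def relative_volatility_vs_market(beta: float) -> float:
    """Percent difference in volatility implied by beta, relative to
    the S&P 500 (beta 1.0 == market volatility)."""
    return (beta - 1.0) * 100


mdb = relative_volatility_vs_market(0.84)   # about -16: 16% less volatile
cccs = relative_volatility_vs_market(0.92)  # about -8: 8% less volatile
```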
MongoDB beats CCC Intelligent Solutions on 7 of the 12 factors compared between the two stocks.
About MongoDB (Get Rating)
MongoDB, Inc. provides a general purpose database platform worldwide. The company offers MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premise, or in a hybrid environment; MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB. It also provides professional services comprising consulting and training. The company was formerly known as 10gen, Inc. and changed its name to MongoDB, Inc. in August 2013. MongoDB, Inc. was incorporated in 2007 and is headquartered in New York, New York.
About CCC Intelligent Solutions (Get Rating)
CCC Intelligent Solutions Holdings Inc. provides cloud, mobile, AI, telematics, hyperscale technologies, and applications for the property and casualty insurance economy. Its SaaS platform digitizes mission-critical AI-enabled workflows, facilitates commerce, and connects businesses across the insurance economy, including insurance carriers, collision repairers, parts suppliers, automotive manufacturers, financial institutions, and others. The company offers CCC Insurance solutions, including CCC workflow, CCC estimating, CCC total loss, CCC AI and analytics, and CCC casualty; CCC Repair solutions, such as CCC network management, CCC repair workflow, and CCC repair quality; CCC Other Ecosystem solutions, comprising CCC parts solutions, CCC automotive manufacturer solutions, CCC lender solutions, and CCC payments; and CCC International solutions. CCC Intelligent Solutions Holdings Inc. was founded in 1980 and is headquartered in Chicago, Illinois.
Database as a Service (DBaaS) Provider Market 2022 Segments Analysis by Top Key Players
MMS • RSS
Posted on mongodb google news. Visit mongodb google news
New Jersey, United States – Mr Accuracy Reports has published new research on the global Database as a Service (DBaaS) Provider market, covering micro-level analysis of competitors and key business segments (2022-2029). The report explores a comprehensive study of segments such as opportunities, size, development, innovation, sales and overall growth of major players. The research draws on primary and secondary statistical sources and consists of both qualitative and quantitative detail.
Some of the Major Key players profiled in the study are IBM, Zoho Creator, Ninox, AWS, Oracle, MongoDB Atlas, Beats, Azure, Aiven, Kintone, Fusioo, Google Cloud Bigtable, SAP, DataStax, Caspio
Get PDF Sample Report + All Related Table and Graphs @: https://www.mraccuracyreports.com/report-sample/401251
Various factors responsible for the market’s growth trajectory are studied at length in the report. In addition, the report lists the restraints that pose a threat to the global Database as a Service (DBaaS) Provider market. The report consolidates primary and secondary research, providing market size, share, dynamics, and forecasts for various segments and sub-segments, considering macro and micro environmental factors. It also gauges the bargaining power of suppliers and buyers, the threat from new entrants and product substitutes, and the degree of competition prevailing in the market.
Global Database as a Service (DBaaS) Provider Market Segmentation:
Database as a Service (DBaaS) Provider Segmentation by Type:
Database as a Service (DBaaS) Provider Segmentation by Application:
Large Enterprises, SMEs
Key market aspects are illuminated in the report:
Executive Summary: It covers a summary of the most vital studies, the Global Database as a Service (DBaaS) Provider market’s growth rate, current circumstances, market trends, drivers and problems, as well as macroscopic pointers.
Study Analysis: Covers major companies, vital market segments, the scope of the products offered in the Global Database as a Service (DBaaS) Provider market, the years measured and the study points.
Company Profile: Each firm profiled in this section is screened based on its products, value, SWOT analysis, capabilities and other significant features.
Manufacture by region: This Global Database as a Service (DBaaS) Provider report offers data on imports and exports, sales, production and key companies in all studied regional markets.
Market Segmentation: By Geographical Analysis
The Middle East and Africa (GCC Countries and Egypt)
North America (the United States, Mexico, and Canada)
South America (Brazil etc.)
Europe (Turkey, Germany, Russia UK, Italy, France, etc.)
Asia-Pacific (Vietnam, China, Malaysia, Japan, Philippines, Korea, Thailand, India, Indonesia, and Australia)
The cost analysis of the Global Database as a Service (DBaaS) Provider Market has been performed while keeping in view manufacturing expenses, labor cost, and raw materials and their market concentration rate, suppliers, and price trend. Other factors such as Supply chain, downstream buyers, and sourcing strategy have been assessed to provide a complete and in-depth view of the market. Buyers of the report will also be exposed to a study on market positioning with factors such as target client, brand strategy, and price strategy taken into consideration.
Key questions answered in the report include:
- Who are the key market players in the Database as a Service (DBaaS) Provider Market?
- Which major regions for different trades are expected to witness astonishing growth in the Database as a Service (DBaaS) Provider Market?
- What are the regional growth trends and the leading revenue-generating regions for the Database as a Service (DBaaS) Provider Market?
- What will be the market size and the growth rate by the end of the forecast period?
- What are the key Database as a Service (DBaaS) Provider Market trends impacting the growth of the market?
- What are the major Product Types of Database as a Service (DBaaS) Provider?
- What are the major applications of Database as a Service (DBaaS) Provider?
- Which Database as a Service (DBaaS) Provider technologies will top the market over the next 7 years?
Please click here today to buy full report @ https://www.mraccuracyreports.com/checkout/401251
Table of Contents
Global Database as a Service (DBaaS) Provider Market Research Report 2022 – 2029
Chapter 1 Database as a Service (DBaaS) Provider Market Overview
Chapter 2 Global Economic Impact on Industry
Chapter 3 Global Market Competition by Manufacturers
Chapter 4 Global Production, Revenue (Value) by Region
Chapter 5 Global Supply (Production), Consumption, Export, Import by Regions
Chapter 6 Global Production, Revenue (Value), Price Trend by Type
Chapter 7 Global Market Analysis by Application
Chapter 8 Manufacturing Cost Analysis
Chapter 9 Industrial Chain, Sourcing Strategy and Downstream Buyers
Chapter 10 Marketing Strategy Analysis, Distributors/Traders
Chapter 11 Market Effect Factors Analysis
Chapter 12 Global Database as a Service (DBaaS) Provider Market Forecast
If you have any special requirements, please let us know and we will tailor the report to your needs. You can also get individual chapter-wise sections or region-wise report versions, such as North America, Europe or Asia.
MMS • Raymond Roestenburg, Sergey Bykov
Article originally posted on InfoQ. Visit InfoQ
Breck: I work in real-time systems, and have for a while. I think Astrid’s talk discussed thinking of the electrical grid as a distributed system, and how we are going to manage all these renewable devices on the grid. That’s inherently a real-time problem. Our shift to these really event-based, real-time systems is out of necessity. It’s not just a fad. The same goes for the work that I do. If we think of ride sharing, or of financial transactions, these are all very real-time, event-based systems. We’ve had a lot of success in the transition to these systems. Often, we don’t hear about some of the really hard parts, especially in living with these systems over time. That’s what I want to focus this panel discussion on.
I’m Colin Breck. I lead the cloud platforms team at Tesla, in the Energy Division, mainly focusing on real-time services for battery charging and discharging, solar power generation, and vehicle charging.
Bykov: I’m Sergey Bykov. Currently I work at Temporal Technologies, a startup that is focused on building a workflow engine for code-first workflows to automate a lot of things. Before joining Temporal, I spent like forever at Microsoft, moving through groups from servers to embedded, eventually going into cloud and building cloud technologies with an angle on actors and streaming, highly biased towards gaming scenarios and event-based systems. Very important.
Roestenburg: My name is Raymond Roestenburg. I’m a tech lead at Lightbend. I’m a member of the Akka team, where I work on Akka Serverless, Akka Platform, and related technologies. I specialize in event-based systems, highly scalable distributed systems, and data streaming. I’ve been working with event-based streaming for about 15 years. I counted the other day: I’ve been using the JVM for 25 years or so. That’s a quarter of a century. Also, I wrote a book on Akka, “Akka in Action.” A second edition is now available for early access. That’s what I do nowadays.
Batch Systems and Real-Time Systems
Breck: There was lots of talk about the Lambda architecture a few years ago, where we would have parallel batch systems and real-time systems. There’s a lot of tension in that; in particular, maintaining data models and code that does aggregations and the like in two parallel systems is a real burden. Even answering certain queries across those two systems becomes a huge problem. I think there was maybe a notion that, as we move to purely event-based, real-time systems, we can do away with the batch system. In my experience, that isn’t the case. We’re still living with batch systems. Do you also see that? Do you have opinions on how batch systems and real-time systems are maybe complementary?
Roestenburg: I think they are complementary in many cases. I think Lambda in some shape or form is still used. People obviously want to move away from the complexity of these systems. For instance, in machine learning, it makes a lot of sense to train on your larger historical data, and then do the inference or scoring on the streaming information that passes by. That’s a very useful case for a Lambda-like architecture. There’s obviously, as we know, lots of work in trying to figure out a better architecture that is simpler to use and removes some duplication in the code. Well known and much written about is the Kappa architecture, which is very often used with Kafka. There is also the Delta architecture, which is something that Databricks is doing with Delta tables: they basically make it very easy for you to do upserts, applying real-time information on top of very large historical data.
At the same time, it also depends on how you look at streaming. You can look at streaming from an in-application operation, or as a data oriented ETL pipeline. There’s quite a few different areas in which you could use streaming, of course. If we’re talking about streaming between microservices, for me, it’s quite different from the streaming case where you’re doing ETL and you’re basically purifying data, and doing ML Ops or machine learning type systems.
Bykov: I generally agree. I think these are complementary scenarios, and they come from optimizing for different things. Batch systems have always been much more efficient than granular per-event processing, which is more expensive, but with per-event processing the latency goes way down. It can be much more contextualized. It can react more rapidly, whether it is device operation, some alerts, or a financial transaction like fraud prevention. Another factor I see is that there are different kinds of tooling. For people that do Hadoop, Hive queries, or machine learning, it’s a different skill than writing queries in more real-time systems. Again, they’re complementary.
None of these technologies die. I was at Costco the other day, and while I was solving some problems, I saw they still use an AS/400 for the backend. I’m sure Lambda will stay for a long time. I think Kappa may be more like what I’ve been seeing recently, where there is some ingestion layer where things go through the streaming system, but then they diverge: one fork goes into more real-time processing, and the other goes literally into a blob store to create batches for the batch system. Some cross between Lambda and Kappa is, I think, what’s more popular.
Data and Code Governance in Batch and Event-based Systems
Breck: I definitely see in our work that things like late-reporting IoT devices, and doing certain types of aggregation, say a daily rollup and these kinds of things, are just much easier, more reliable, and easier to iterate on in batch systems than in purely event-based systems. I think a lot of the tension there is how you maintain data models or code where maybe you’re doing aggregations or derivations and those kinds of things. Do you have experience, or have you seen people having success, managing data governance or code governance across systems when they’re doing one thing in an event-based system, in a real-time system, but then also bringing that maybe to their data warehouse or their batch systems?
Roestenburg: What I’ve seen so far is that those are very often separate systems. It’s easier to do it that way. Very often the real-time processing is different because of what you just said: the possibilities for aggregation are different. If you’re doing batch, you can do more query-like things. There’s far more state available. The moment you are doing stateful processing, it makes a lot of sense to have batch or micro-batch, for instance, Spark on top of Parquet, those kinds of systems. On the real-time side, you keep stuff in memory, but it’s more transient. You’re processing, you keep some information, but you can’t bring the whole world in there because then it will take too long to stay in real-time. The actual logic will be different. Even though it might seem like you’re duplicating, what’s being duplicated is very often the schemas and the data formats. There is a need to combine both the real-time and the historical data, so there’s a similarity there. I think the logic is very often quite different.
Bykov: I agree with that. In developer experience, if you make a change, speaking about schema evolution and evolution of systems, for real-time you have different constraints with rollout of the new version of your streaming system. You cannot usually go much further back and reprocess. While in the batch system, you can say, let’s go a month back, or even a year in some cases, and reprocess everything.
The other difference I saw is that real-time systems are much less concerned with data protection and governance laws. In the batch system, you have to think upfront about how you’re going to delete this data, or how you’re going to prove that you handle it correctly. In real-time systems, the data is more ephemeral; you have aggregates, or some products are persisted, so there are fewer objects to handle.
Breck: I think this is a particularly interesting subject area, actually, you have a lot of decisions being made in real-time systems that then aren’t necessarily represented historically. You can imagine in some operational technology, where an operator or an algorithm makes a certain decision based on real-time data, but that’s actually not persisted. You can’t actually audit or evaluate how the system made its decision, I think it’s actually quite interesting.
State of the Art for Workflow Management in Event-based Systems
One of the challenges I keep seeing over and over again in event-based systems is that there’s a chain of events that happen when we get an event. You can see that in Astrid’s talk as well; there’s a time component to things. It’s not just that we’ve made a decision to discharge a battery, but we want it to follow this profile over time, and maybe have other triggers in the system that are dependent on it. It blows up into, essentially, a workflow. There’s a workflow defined through a series of events that trigger it. Of course, there are implications there for state management, whether that’s state of devices, or some notion of timing and retries, or giving up on things, or event windowing and watermarks that are used in streaming systems. I know that’s close to your heart, Sergey, and what you have been working on recently. What do you think the state of the art is for workflow management in event-based systems?
Bykov: I’m not sure I can claim something as state of the art. I think what you point to is the key question, because data needs to be processed in context. That context is like a user or a device, some session. Within that context, you have recent history, you have timing, which is much harder to reason about in the context of millions of users or devices. This is where you trade the efficiency of batching for less efficient processing that is much easier to understand, reason about, and evolve. You have watermarks and events that happened a certain time ago, and this is where you get what you could call a workflow or business process, which is some set of rules that says, ok, if this happens three times, and then if more time than x has passed, do this action. Then you can change that logic, and somebody reviewing the change will understand: ok, we just did it this way. Maybe I’m biased towards the more real-time, more contextualized processing at small scale versus batch, but that’s where my thinking is.
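The kind of contextual rule Bykov gives ("if this happens three times within some time window, do this action") can be sketched as a tiny per-context workflow. This is a hypothetical illustration, not any particular product's API; the class name, threshold, and alert format are invented.

```python
from collections import defaultdict, deque

class Workflow:
    """Per-context rule: if an event fires `threshold` times within
    `window` seconds, trigger an action for that context.

    Each context (a user, device, or session) keeps only its own recent
    history, which is what makes the logic easy to reason about.
    """
    def __init__(self, threshold=3, window=60.0, action=None):
        self.threshold, self.window = threshold, window
        self.action = action or (lambda ctx: f"alert:{ctx}")
        self.history = defaultdict(deque)   # context id -> event timestamps

    def on_event(self, ctx, ts):
        h = self.history[ctx]
        h.append(ts)
        while h and ts - h[0] > self.window:   # drop events outside the window
            h.popleft()
        if len(h) >= self.threshold:
            h.clear()                          # reset after firing
            return self.action(ctx)
        return None
```

Because each context is independent, changing the rule means changing one small piece of code a reviewer can actually follow.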
Developer Experience in Event-based Workflows
Breck: Maybe talking about the developer experience there: there are certain engines for managing workflows. I haven’t seen developers get that excited about using these, and they often become quite constraining after a certain point. You get a POC going, and it’s ok. Especially if you’re driving real-time services, like imagine bidding into the energy market and these kinds of things, those workflow engines haven’t served people that well. Then I’ve also seen people that don’t even go near that stuff, and they’re essentially building their own workflow engine from scratch, and dealing with all these hard problems. How do you see that evolving? Do you think the developer experience in terms of event-based workflow will get better and we’ll be building on top of platforms that serve those needs?
Bykov: This is a hot topic, actually. I think the contention is between code-first versus declarative or data-driven definitions of workflows. I’m squarely in the code-first camp because there, the developer writes code, and then can reason about and debug what’s going on, versus things that are defined in some XML, like XAML, or YAML, all those things. They look simple, like you said. In the POC, we have three steps, and then we add more. Then they evolve over time, and before you know it, you have this huge blob of some ad hoc DSL, and it’s very difficult to figure out what went wrong. Again, developer experience, and especially dealing with failures, I think, is paramount and is often underestimated when people get into this: “I’ll just write some DSL and handle it.” Then two years later, somebody who inherited the system cannot manage it. It’s like, leave it as-is. My bias is squarely towards code-first, or code-based, to be developer friendly.
Roestenburg: For me as well, I’ve always been in the code-first camp. I would agree with you there, Sergey. I’ve seen a lot of systems where people start using a few Kafka topics at first, and then they get more Kafka topics. Then over time, it becomes very difficult to track where everything goes. You can obviously add all kinds of logging, all kinds of tracing stuff. It’s definitely not a simple thing. Being able to easily debug and have a good developer experience is, I think, something that the future hopefully brings us.
Bykov: On the very same visit to Costco where I saw the AS/400, I ran into an old friend of mine who is working at this huge payment processing company, and he was describing exactly the scenario you were mentioning. He’s reviewing the architecture of systems and services, and people ask what’s wrong. He sees this a lot: it starts with a Kafka topic, and then gets into a dead-letter queue. Then there are three layers of dead-letter queues and some ad hoc logic to deal with that. Because yes, it’s simple from the start, before you get into these real-life cases.
Dealing with Complexity in Event-based Systems
Breck: I’m definitely dealing with this: you transition fully into these event-based architectures, often largely around some messaging system at the core. For each new problem you have, like wanting a dead-letter queue, you produce back to Kafka, and you tack on one more layer. It reminds me a little bit of the third talk in the track, where the microservice sprawl comes first. Then you need to figure out solutions for data governance, for federation, for solving the hard problems. Yes, I’m not sure we’ve quite got there yet with event-based systems. How do you see that evolving? Say, a company has standardized on Kafka, and it’s pretty easy to create a new service against it or create a new topic, and eventually that becomes fairly intertwined. It’s hard to tell where events are going. It’s hard to change the system over time. Are there companies that are doing that well? What are the techniques they’re using?
Roestenburg: There are different things that you can do, though eventually you have to build up a picture of what’s going on. You can imagine a whole bunch of topics with processing parts in between, and you need to understand where things go. One of the things that you can do is, in the metadata or the header information, similar to CloudEvents, for instance, put contextual information about what the message was, but you would still have to extract it from every topic and then build up a bigger picture. The same thing you can do with log tracing systems where you can see which components called which. This is definitely not easy to do. You have to start doing it from the beginning. This is one of the difficult issues with event-based systems: you have to build these things in from the beginning, because otherwise you can’t see later on what’s going on.
For instance, one very simple thing: if you have a processor that reads from a topic and writes to another topic, when you’re processing from that topic and you get any corruption, basically the only option you have is to drop the message and continue, because otherwise your system is stuck; you have a poisoned queue. The dead-letter queues that you mentioned are very often not very useful, because it was corrupted data. How are you going to read this? It needs a lot of human intervention. You would have to go to previous versions of the code, see what it was, maybe. By that time your system has already moved a lot further on. These interactions become very difficult.
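The drop-and-continue behavior Roestenburg describes can be sketched as a consumer loop that shunts anything it cannot decode to a dead-letter collection instead of blocking the stream. This is an illustrative sketch, assuming JSON payloads and an in-memory dead-letter list rather than a real queue.

```python
import json

def consume(raw_messages, handle, dead_letters):
    """Keep the stream moving past corrupt messages instead of blocking.

    A message that fails to decode or process is a poison message: it is
    recorded with its error for later human inspection, and processing
    continues with the next message in order.
    """
    processed = []
    for raw in raw_messages:
        try:
            processed.append(handle(json.loads(raw)))
        except Exception as exc:
            # Poison message: park it and move on, so the queue never sticks.
            dead_letters.append({"raw": raw, "error": repr(exc)})
    return processed
```

As the discussion notes, the dead-letter entries often need human intervention anyway; the point of the pattern is only that the live stream is never blocked.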
Yes, I think building up more context per message, and keeping this context with the message, either inside the message itself in its own domain or as metadata on the message, can help you later on create a picture of what’s going on. As you can understand, the moment you lose a message, that’s something you don’t see; it’s very difficult to then see where it actually went. I haven’t seen any stream lineage tools that can automatically trace back to where things began and how they fanned out. I haven’t seen that yet so far.
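The per-message context Roestenburg is describing can be sketched as a small CloudEvents-like envelope whose lineage grows at each processing hop. The field names (`hops`, `source`) and the wrapper functions are invented for this sketch; CloudEvents itself defines a different, standardized attribute set.

```python
import uuid

def new_event(payload, source):
    """Wrap a payload in a minimal CloudEvents-style envelope.

    `id` and `source` echo CloudEvents attributes; `hops` is an invented
    lineage field recording every stage the message has passed through.
    """
    return {"id": str(uuid.uuid4()), "source": source,
            "hops": [source], "data": payload}

def process(event, stage, fn):
    """Transform the data while appending lineage metadata for tracing."""
    out = dict(event)
    out["data"] = fn(event["data"])
    out["hops"] = event["hops"] + [stage]   # build up per-message lineage
    return out
```

After a few hops, the envelope itself tells you where the message began and which processors touched it, which is exactly the picture that is otherwise so hard to reconstruct.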
Bykov: We see a bunch of things with customers that we started referring to as a queues-and-duct-tape architecture. The queues by themselves are great, but there’s this challenge of when there’s a poisoned message, or just a pure failure to talk to another system, which may be down, or maybe there’s some intermittent failure, and retries, exponential backoffs, backpressure. This set of concerns is pretty standard when you go from one stage to another or from one microservice to another. The challenge there is to have a simple-to-use, unambiguous set of tools for developers, with which it is very clear what the behavior of the system will be, without writing code for this duct tape. Have the duct tape be part of the system, and then you can reason about what’s going on. You can trace what’s going on by looking at a standardized, unified view of the progress of this processing, within context. I think context comes first.
Visibility into State Management & Workflow Progress for Millions of Instances
Breck: Maybe back to that context, let’s maybe think of an event-based workflow that has lots of this context, maybe applied to an IoT scenario, when you have millions of them. How do we get visibility into state management, workflow progress, those kinds of things, when that is defined not as a single instance, but say millions of them?
Bykov: It depends on the scenario. In some scenarios, like in IoT, there may be transient state where it’s ok, because there’ll be a subsequent update, maybe a minute later, that will invalidate the previous state anyway. It’s not important to alert in this case. Or telemetry, another example, where there may be some hiccup in some metrics, and generally at high volume it’s ok because they keep being reissued. Then there are different scenarios, like payment transactions, where it’s absolutely unacceptable to lose them; then you need to alert. Then you need to stop processing, and maybe even page a developer or an engineer to go figure it out.
Using CloudEvents to Solve Metadata Problems
Breck: Are you seeing anybody use CloudEvents to solve the metadata problems you’re just talking about of traceability of events, or correlating events to certain actions?
Bykov: I don’t. Maybe people do that, but I have not myself seen that.
Roestenburg: Yes, we are using CloudEvents in Akka Serverless, for instance, and there is some additional information that we keep, but it’s not necessarily meant, at this point in any case, to be extended for this information. You could use Kafka’s normal headers as well. It doesn’t have to be CloudEvents; you could use anything. Although it is handy to have a particular format that you keep for all your stuff, whatever you’re doing, so you can very easily inspect it: ok, if I put everything in CloudEvents, I can very easily see at least some of these metrics, some of these headers. The other thing you could obviously do is, in your stateful processing, keep information about what’s going on and then provide that in telemetry again. It’s very much a case-by-case basis, depending on what you want to do.
Bykov: I think telemetry is key. If you emit enough telemetry, then you can separate systemic issues from one-off things, and then you can decide where you want to invest time.
Breck: That’s been my experience. I think CloudEvents sounds very interesting, but I haven’t seen much adoption of it. I’ve seen more what you’re mentioning, Ray, with using Kafka headers, or propagating request IDs, even all the way through Kafka. You can even tie a gRPC or an HTTP request to an event that ended up in Kafka and got processed later on to do something.
Roestenburg: You could do it inside your messages or outside as a header, it depends. The nice thing about putting it in the headers is that they are always readable, so you wouldn’t get a corrupt header unless something really horrible happened to Kafka. That’s the benefit of using these envelopes.
Breck: I’ve seen people wanting to do transactional things actually using the Kafka offsets and propagating those as metadata, especially when you move an event from one topic to another topic, because if you maintain the original offset, then you can actually do some transactional things against that, which is interesting.
Roestenburg: We’ve done that, but mostly from the topic where you’re producing or where you’re processing; you do it in your writes to the output. When you write to any output, even to a database, in a transaction you also write the offset where you left off, so in that way you can do effectively-once processing within that service. It’s very hard; I don’t know how to do that over many services. One of those hops is fine, and so you can build these services that can die at any time and continue where they left off. You might have to deal with some duplication. In the case where you write the offsets to the output where you’re writing, if that has a transactional model, then you can get effectively-once processing, basically.
Bykov: Same here, I’ve seen it within the scope of a service, but crossing service boundaries usually involves a different mechanism.
Developer Experience and Ecosystems
Breck: We’re going to talk a bit about developer experience and ecosystems. A lot of event-based systems have their own libraries, their own developer experience. I think of Kafka and Kafka Streams and the systems people build around that. Most of my experience in the last five years has been using Akka Streams, even around Kafka, because I can interface with all sorts of different systems, and I’m not just tied into one ecosystem. Do you think that’s a model that will continue to evolve, with libraries in lots of different languages, or do you think there are certain advantages of having ecosystem-specific libraries? Do you have opinions there? What are you seeing develop?
Bykov: My opinion here is that what’s most important is establishing patterns, and Kafka did establish one: it’s essentially the partitioned log. It’s not even a queuing system. For example, when I had to explain years ago what Azure Event Hubs is: it’s essentially hosted Kafka. It’s the same thing, same with Kinesis. Establish this pattern, and then people can use different services like that in the same manner. Then switching from library [inaudible 00:29:46] is speaking the same language. In terms of abstractions, I think it’s easy. Trying to unify it into one layer that hides everything, I think that’s dangerous. I think we made that mistake early in streaming, back in the days when we provided a single interface for queuing systems with very different semantics, like log-based, versus SQS-style queues, versus even in-memory direct messages over TCP. I think it’s a dangerous thing to try to hide that.
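The partitioned-log pattern Bykov identifies as the common core of Kafka, Event Hubs, and Kinesis can be reduced to a toy sketch. The class and method names here are invented; real systems add replication, retention, and consumer groups on top, but the essential contract is just this: append by key, read from an offset.

```python
class PartitionedLog:
    """Minimal partitioned log: appends return a (partition, offset) pair,
    and consumers read forward from any offset they choose.

    Same key always lands in the same partition, so per-key ordering is
    preserved, which is the property the pattern exists to provide.
    """
    def __init__(self, partitions=2):
        self.partitions = [[] for _ in range(partitions)]

    def append(self, key, value):
        p = hash(key) % len(self.partitions)      # same key -> same partition
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1     # (partition, offset)

    def read(self, partition, offset):
        """Return all records in `partition` from `offset` onward."""
        return self.partitions[partition][offset:]
```

Because the contract is this small, services built on one partitioned-log implementation tend to map cleanly onto another, which is the "speaking the same language" point above.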
Roestenburg: I definitely agree with that. I think it’s very good to take advantage of these very specific libraries. Streaming is not just streaming; there are so many different things you can do. You can do what you said with Akka Streams, which obviously I like. It’s a particular kind of in-application streaming that you do there. The moment you want to do data-oriented streaming, it’s a whole different thing. For instance, Akka Streams doesn’t have things like windowing and watermarks. It’s possible to build those things, but there’s far better support in Spark or Flink, which are completely different models, so you basically need specific libraries. I don’t think this has a place inside a programming language; it’s a danger as well to expect that of a programming language. These are very specific needs, once you look further than just in-memory processing of data.
Breck: Are you seeing any use of Apache Beam as the model for solving some of those problems?
Roestenburg: I sadly have not. I’ve always been hoping to see Apache Beam being used more. I do see things like Flink. Whether the actual Apache Beam standard is being used is not something I know. I was expecting it; it’s a very broad standard for these things.
The Evolution of Developer Experience for Building Event-based Systems
Breck: Let’s talk maybe about the evolution of these systems. I think a lot of people have built events-first systems, streaming systems. I have lots of experience with actors over the last many years, and I know both of you have a lot of experience with actor-like systems. I know, Sergey, you wrote an article saying it’s such a loaded term, it’s hard to use sometimes. Most of what I build, we run it ourselves. We’re building these distributed, stateful actor systems that model entities, and we’re building it ourselves. It seems like, more recently, we’re starting to see this evolution on top of cloud platforms with Azure Durable Functions, or the Cloudflare Durable Objects. It seems like this location-transparent, stateful actor model is almost becoming the way of programming in the cloud in the future. Do you agree? What do you think that evolution is in terms of developer experience for building event-based systems?
Bykov: It’s been my belief that it would come to that: serverless, cloud-hosted systems based on actor or context-oriented computations. I think it’s inevitable because there’s this data gravity. Yes, you can deploy and run your own replicated databases and the whole estate, but most people don’t want to. They want to outsource this problem. When it comes to hosting your compute, of course we can all run on Kubernetes clusters; whether we want to becomes the question, whether it’s cost effective to run it at the cloud provider. Then before you know it, you outsource most of this platform, and you can start to focus on the actual application. I think that’s inevitable for a lot of scenarios, if not for most; it would be much easier, more economical to just run it in the cloud.
Roestenburg: In all cases, you want to focus on the business. A lot of these systems, which are wonderful, are also very hard to run. Kafka is quite notorious in this respect. You can get these distributed logs, Pulsar, Kafka, run for you, nowadays even with infinite storage, tiered storage; those things are very hard to set up yourself. Even if you were able to manage a Kafka cluster, how are you going to set up all of this extra stuff that you need to not run out of space? A similar thing you can see with Google Spanner, where there’s also an infinite amount of storage. There, I think it’s very interesting that people love transactions for the simplicity. That comes back as well in what Databricks is doing with the Delta Lake stuff. I think people are moving towards: ok, it’s very interesting, all these technologies, but I want to focus on a simple model, and have a lot of the Ops taken care of for me, instead of having to deal with all these complexities.
Bykov: One other thing I forgot to mention is compliance. If you hosted yourself, getting SOC 2 compliance, or other compliance is just a pain. Yes, if it’s outsourced, then you just present their SOC 2 documents and you’re good.
Roestenburg: Even security, to some extent, will be better arranged than what you would do yourself. Although there’s obviously still always a big open problem there.
The Saga Pattern
Breck: Maybe back to workflows. I hope most people are familiar with the Saga pattern, the notion of compensating transactions and taking care of things when they fail. Is that actually being successfully applied? My impression is that most people that try this learn that it’s actually really difficult to apply the Saga pattern correctly: there is really no such thing as rolling back the world. You can compensate in certain ways sometimes, but you can’t go backwards.
Bykov: I’m not sure I agree with you, actually. Yes, it’s compensation. You’re not rolling the clock back; in the classic example of reservations, yes, you can cancel the other reservations if one of them failed or got declined. I think there is nothing wrong with Saga as a pattern. I think Saga is a restricted case of a general workflow. It’s: what do you want to do if this step in your business logic fails? Saga just says, you unwind what you allocated before. In a more general case, you might want to do something else. I don’t see it as a special, separate case that’s different from a normal workflow.
Breck: What I’ve seen is depending on the state that you arrive in you can’t actually compensate. The reason you can’t move forward is also the reason you can’t compensate. It’s almost like head-of-line blocking again, you’re stuck.
Bykov: That’s fine. Because in this case, like in a general workflow logic, you need to have a step or a case for notifying. Saying, this thing needs to be manually handled, or call manager, or something. It’s a more general escape hatch for something that did not progress automatically on the predefined simple logic.
Roestenburg: I think one of the issues is that you really have to keep it in the business domain, in the logic of the business. Modeling things as Sagas purely in the business domain makes sense, because you can have these different scenarios: you compensate or you continue. The really tricky part is if you get confused into thinking you could use Sagas for intermittent or technical errors. The service wasn’t there, and then you might have already called it. You might have already reserved a table, you just didn’t get a response back. On the business level, where you have more context, you can then say, this workflow ends because something went wrong and I’m just going to get someone to look at it. If you try to solve technical problems with the Saga, then you’re going to have a big problem.
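The Saga behavior the panel describes (unwind completed steps on failure, and escalate to a human when even compensation can't proceed) can be sketched in one small function. The step shape and the return format are invented for the sketch; real saga frameworks persist state between steps, which is omitted here.

```python
def run_saga(steps):
    """Execute (name, action, compensate) triples as a Saga.

    On a failed action, run the compensations of completed steps in
    reverse order. If a compensation itself fails, record that step for
    manual handling, Bykov's "call manager" escape hatch, rather than
    pretending the world was rolled back.
    """
    done = []
    for name, action, compensate in steps:
        try:
            action()
            done.append((name, compensate))
        except Exception:
            manual = []
            for done_name, comp in reversed(done):
                try:
                    comp()
                except Exception:
                    manual.append(done_name)   # can't unwind: page a human
            return {"status": "compensated", "failed": name, "manual": manual}
    return {"status": "completed", "manual": []}
```

Note how the compensation lives in the business domain (cancel the flight), which is the panel's point: the pattern models business decisions, not retries for technical faults.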
MMS • Tejas Chopra
Chopra: My name is Tejas Chopra. I’ll be talking about Netflix Drive. I work on infrastructure solutions, specifically software that deals with billions of assets and exabyte-scale data that is generated and managed by Netflix studios and platforms.
We’ll go over a brief explanation of what Netflix Drive is. Some of the motivations for creating a software such as Netflix Drive. The design and the architecture, lifecycle of a typical instance of Netflix Drive, and some learnings from this process.
What Is Netflix Drive?
You can think of Netflix Drive as an edge software that runs on studio artists’ workstations. It’s a multi-interface, multi-OS cloud file system, and it is intended to provide the look and feel of a typical POSIX file system. In addition to that, it also behaves like a microservice in that it has REST endpoints. It has backend actions that are leveraged by a lot of the workflows and automated use cases where users and applications are not directly dealing with files and folders. Both interfaces, the REST endpoint as well as the POSIX interface, can be leveraged together for a Netflix Drive instance. They are not mutually exclusive. The other main aspect of Netflix Drive is that it is a generic framework. We intended it to be generic so that different types of data and metadata stores can be plugged into the Netflix Drive framework. You could imagine Netflix Drive working on cloud datastores and metadata stores, as well as hybrid datastores and metadata stores. An example could be Netflix Drive with DynamoDB as the metadata store backend and S3 as the datastore backend. You could also have MongoDB and Ceph Storage as the backend datastores and metadata stores for Netflix Drive. Event and alerting backends can also be configured as part of the framework; eventing and alerting are first-class citizens in Netflix Drive.
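The pluggable-backend idea described above can be sketched as a pair of small store interfaces composed by a drive object. This is a hypothetical illustration of the design principle only; the class and method names are invented and do not reflect Netflix Drive's actual internals.

```python
class DataStore:
    """Interface a pluggable data backend (e.g. S3, Ceph) would satisfy."""
    def put(self, key, blob): raise NotImplementedError
    def get(self, key): raise NotImplementedError

class MetadataStore:
    """Interface a pluggable metadata backend (e.g. DynamoDB, MongoDB) would satisfy."""
    def tag(self, key, **meta): raise NotImplementedError
    def lookup(self, key): raise NotImplementedError

class InMemoryData(DataStore):
    def __init__(self): self._d = {}
    def put(self, key, blob): self._d[key] = blob
    def get(self, key): return self._d[key]

class InMemoryMeta(MetadataStore):
    def __init__(self): self._m = {}
    def tag(self, key, **meta): self._m.setdefault(key, {}).update(meta)
    def lookup(self, key): return self._m.get(key, {})

class Drive:
    """Composes whichever store implementations are plugged in."""
    def __init__(self, data, meta):
        self.data, self.meta = data, meta

    def save(self, path, blob, **tags):
        """Store an asset's bytes, and tag it with workflow metadata."""
        self.data.put(path, blob)
        self.meta.tag(path, **tags)
```

Swapping `InMemoryData` for an S3-backed implementation would not change `Drive` at all, which is the point of keeping the framework generic.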
Let us get into some of the motivations for having Netflix Drive. Netflix is, in general, pioneering the idea of a studio in the cloud. The idea is to give artists the ability to work from different corners of the world on creating stories and assets to entertain the world. In order to do so, the platform layer needs to provide a distributed, scalable, and performant infrastructure. At Netflix, assets, which you can think of as collections of files and folders, have data and metadata that are stored and managed by disparate systems and services. Starting at the point of ingestion, where data is produced right out of the camera, till the point where the data eventually makes its way to movies and shows, these assets get tagged with a variety of metadata by different systems based on the workflow of the creative process. At the edge, where artists work with assets, the artists’ applications and the artists themselves expect a file and folder interface so that there can be seamless access to these assets. We wanted to make working with studio applications a seamless experience for our artists. This is not just restricted to artists; it can actually extend to more than just the studio use case as well. A great example is all the asset transformations that happen during the rendering of content, for which Netflix Drive is being used today.
The other thing is that studio workflows have a need to move assets across various stages of the creative iterations. At each stage, a different set of metadata gets tagged with an asset. We needed a system that could provide the ability to add and support attaching different forms of metadata with the data. Along with that, there is also a level of dynamic access control, which can change per stage, which projects only a certain section of assets to the applications, users, or workflows. Looking at all of these considerations, we came up with the design of Netflix Drive, which can be leveraged in multiple scenarios. It can be used as a simple POSIX file system that can store the data on cloud and retrieve data from cloud, but also has a much richer control interface. It is a foundational piece of storage infrastructure to support a lot of Netflix studios and platforms’ needs.
Netflix Drive Architecture
Let us dive a bit into the architecture of Netflix Drive. Netflix Drive actually has multiple types of interfaces. The POSIX interface just allows simple file system operations, such as creating a file, deleting a file, opening a file, renames, moving, close, all of that. The other interface is the API interface. It provides a control interface and a controlled I/O interface. We also have events and telemetry as a first-class citizen of the Netflix Drive architecture. The idea is that different types of event backends can be plugged into the Netflix Drive framework. A great example of where this may be used is audit logs that keep track of all the actions that have been performed on a file or a set of files by different users. We’ve also abstracted out the data transfer layer. This layer abstracts the movement of data from the different types of interfaces that are trying to move the data. It deals with bringing files into a Netflix Drive mount point on an artist's workstation or machine, and pushing files to the cloud as well.
Getting a bit deeper into the POSIX interface: it deals with the data and metadata operations on Netflix Drive. All the files stored in Netflix Drive get read, write, create, and other requests from different applications and users, or there could be separate scripts and workflows that do these operations. This is similar to any local file system that you use.
The API interface is of particular interest to a lot of workflow management tools or agents. It exposes some form of control operations on Netflix Drive. The idea is that a lot of these workflows used in the studio actually have some notion and awareness of assets or files. They want to control the projection of these assets onto the namespace. A simple example: when Netflix Drive starts up on a user's machine, the workflow tools will initially only allow a subset of the large corpus of data to be available to the user to view. That is managed by these APIs. They are also available for dynamic operations, such as uploading a particular file to the cloud, or downloading a specific set of assets dynamically, surfacing them, and attaching them at specified points in the namespace.
Events carry telemetry information. You could have a situation where you want audit logs, metrics, and updates to all be consumed by services that run in the cloud. Making it a generic framework allows different types of event backends to be plugged into the Netflix Drive ecosystem.
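The pluggable event backend framework described above could be sketched roughly as follows. This is a minimal illustration only; the class and method names are assumptions, not Netflix Drive's actual API.

```python
# Hypothetical sketch of a pluggable event framework: Netflix Drive emits
# events, and any number of backends (audit log, metrics, ...) consume them.

class EventBackend:
    """Interface that any event backend plugs into."""
    def publish(self, event: dict) -> None:
        raise NotImplementedError

class AuditLogBackend(EventBackend):
    """Keeps a trail of actions performed on files, per user."""
    def __init__(self):
        self.entries = []
    def publish(self, event: dict) -> None:
        self.entries.append((event["user"], event["action"], event["path"]))

class EventBus:
    """Generic framework: every registered backend receives every event."""
    def __init__(self):
        self.backends = []
    def register(self, backend: EventBackend) -> None:
        self.backends.append(backend)
    def emit(self, user: str, action: str, path: str) -> None:
        for b in self.backends:
            b.publish({"user": user, "action": action, "path": path})

audit = AuditLogBackend()
bus = EventBus()
bus.register(audit)
bus.emit("artist1", "write", "/drive/assets/shot42.exr")
```

The point of the abstraction is that a metrics backend or a cloud-consumer backend could be registered alongside the audit log without changing the emitting code.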
The data transfer layer is an abstraction that deals with transferring data out of Netflix Drive to multiple tiers of storage. Netflix Drive does not send data inline to the cloud, for performance reasons. The expectation is that Netflix Drive will perform as close as possible to a local file system. What we do is leverage local storage, if available, to store the files. Then we have strategies to move the data from local storage to the cloud. There are two typical ways in which our data is moved to the cloud. The first is dynamically issuing APIs through the control interface to allow workflows to move a subset of the assets to the cloud. The other is auto-sync, which is the ability to automatically sync all the files in local storage to the cloud. You can think of this the same way that Google Drive stores your files to the cloud. Here, we have different tiers of cloud storage as well. We have specifically called out media cache and Baggins here. Media cache is a region-aware caching tier that brings data closer to the edge at Netflix, and Baggins is our layer on top of S3 that deals with chunking and encrypting content.
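The tiering just described implies a read path that checks the nearest tier first and fills closer tiers on the way back. The tier names come from the talk, but the lookup-and-promote logic below is an assumption about how such a hierarchy typically behaves, not Netflix Drive's implementation.

```python
# Sketch of a tiered read: local storage first, then the region-aware
# media cache, then the S3-backed store, promoting data toward the edge.

def make_tiers():
    return {"local": {}, "media_cache": {}, "s3": {}}

def read(tiers, path):
    for name in ("local", "media_cache", "s3"):   # nearest tier first
        if path in tiers[name]:
            data = tiers[name][path]
            tiers["local"][path] = data           # promote toward the edge
            tiers["media_cache"][path] = data
            return data
    raise FileNotFoundError(path)

tiers = make_tiers()
tiers["s3"]["asset.exr"] = b"pixels"
read(tiers, "asset.exr")   # served from S3, now cached in nearer tiers
```

After the first read, subsequent reads are satisfied from local storage, which is how the "as close as possible to a local file system" expectation is met.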
Overall, the picture of the Netflix Drive architecture looks as follows. You have the POSIX interface that handles data and metadata operations, the API interface that deals with different types of control operations, and the event interface that tracks all the state change updates. In fact, events are also used to build on top of Netflix Drive: notions of shared files and folders can be built using this event interface. Then, finally, we have the data transfer interface that abstracts moving the bits in and out of Netflix Drive to the cloud.
Netflix Drive Anatomy
Let us now discuss some of the underpinnings of the design that go back to the motivation, by discussing the anatomy of Netflix Drive. Here is some terminology that we'll use. CDrive is a studio asset-aware metadata store used at Netflix. Baggins is Netflix's S3 datastore layer that deals with chunking and encrypting content before pushing it to S3. Media cache is an S3 region-aware caching tier whose intention is to bring data closer to the applications and users. Intrepid is an internally developed high-leverage transport protocol used by a number of Netflix applications and services to transfer data from one service to another.
This is a picture of the Netflix Drive interface, or Netflix Drive in general. We have the interface layer, which is the top layer, and this has all the FUSE file handlers alongside the REST endpoints. The middle layer is the storage backend layer. One thing to note is that Netflix Drive provides a framework where you can plug and play different types of storage backends. Here we have the abstract metadata interface and the abstract data interface. In our first iteration, we have used CDrive as our metadata store, and Baggins and S3 as our datastore. Finally, we have the Intrepid layer, which is the transport layer that transfers the bits from and to Netflix Drive. One thing to note is that Intrepid is not just used to transport the data, but here it is also used to transfer some aspects of the metadata store as well. This is needed to save some state of the metadata store on cloud.
To look at it another way, we have the abstraction layers in Netflix Drive. You have libfuse, because this is a FUSE-based file system, which handles the different types of file system operations. You initially start Netflix Drive and bootstrap it with a manifest. You have your REST APIs and control interface as well. The abstraction layer abstracts the default metadata stores and datastores. You can have different types of data and metadata stores here. In this particular example, we have the CockroachDB adapter as the metadata store and an S3 adapter as the datastore. We can also use different types of transfer protocols, which are also a plug-and-play interface in Netflix Drive. The protocol layer that is used can be REST or gRPC. Finally, you have the actual storage of data.
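The plug-and-play idea above can be sketched as the file system coding only against abstract metadata and data interfaces, with concrete adapters (CockroachDB, S3, Ceph, ...) slotted in behind them. The in-memory adapters below are stand-ins; the real adapter APIs are not public.

```python
# Sketch of the abstract backend interfaces with swappable adapters.

class MetadataStore:
    def put(self, path, meta): raise NotImplementedError
    def get(self, path): raise NotImplementedError

class DataStore:
    def put(self, key, blob): raise NotImplementedError
    def get(self, key): raise NotImplementedError

class InMemoryMetadataStore(MetadataStore):  # stand-in for a CockroachDB adapter
    def __init__(self): self._m = {}
    def put(self, path, meta): self._m[path] = meta
    def get(self, path): return self._m[path]

class InMemoryDataStore(DataStore):          # stand-in for an S3 adapter
    def __init__(self): self._d = {}
    def put(self, key, blob): self._d[key] = blob
    def get(self, key): return self._d[key]

class Drive:
    """The file system layer talks only to the abstract interfaces."""
    def __init__(self, meta: MetadataStore, data: DataStore):
        self.meta, self.data = meta, data
    def create(self, path, blob):
        self.data.put(path, blob)
        self.meta.put(path, {"size": len(blob)})

drive = Drive(InMemoryMetadataStore(), InMemoryDataStore())
drive.create("/assets/shot1.exr", b"pixels")
```

Swapping the backend then means constructing `Drive` with a different adapter pair, without touching the file system logic.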
This here shows the different services and how they are split between workstation and cloud. You have the typical Netflix Drive API and POSIX interface on the workstation machine that sends the bits and bytes to the transport agent and library. You have a bunch of services on cloud as well. Namely, they’re your metadata store, which is CDrive in our case. You have a media cache, which is a middle caching tier of storage. You finally have object storage in S3. Netflix Drive on your local workstation will talk to the metadata store and the datastore using the transport agent and the library.
One thing to note here is that we also use local storage to cache reads and writes, and to deliver a lot of the performance that users expect out of Netflix Drive. Security is a first-class citizen in Netflix Drive. We wanted to provide two-factor authentication in Netflix Drive. The reason is that a bunch of these cloud services are used by a lot of applications, and they front the entire corpus of assets in Netflix. It is essential to make these assets secure, and to only allow users with proper permissions to view the subset of assets that they are allowed to use and view.
Let us discuss a typical lifecycle of Netflix Drive and some runtime aspects of it. Given the ability of Netflix Drive to dynamically present namespaces and bring together disparate datastores and metadata stores, it is essential to discuss the lifecycle. This may not apply to typical file systems, where you do not necessarily have such a stream of events in the life cycle. In Netflix Drive's case, we initially bootstrap Netflix Drive using a manifest. An initial manifest could be an empty manifest as well. You have the ability to allow workstations or workflows to download some assets from the cloud and preload and hydrate the Netflix Drive mount point with this content. The workflows and the artists would then modify these assets. They will periodically, either by snapshotting using explicit APIs or by leveraging the auto-sync feature of Netflix Drive, upload these assets back to the cloud. This is how a typical Netflix Drive instance will run.
Let us get into the bootstrapping part of it. During the bootstrap process, Netflix Drive typically expects a mount point to be specified; some amount of user identity, for authentication and authorization; the location of the local storage, where files will be cached; the endpoints, meaning the metadata store endpoint and the datastore endpoint; optional fields for preloading content; and also a persona. Netflix Drive is envisioned to be used by different types of applications and workflows. The persona gives Netflix Drive its flavor when working with applications. For example, a particular application may rely specifically on the REST control interface because it is aware of the assets, and so it will explicitly use APIs to upload files to the cloud. Some other application may not necessarily know when it wants to upload files to the cloud, so it would rely on the auto-sync feature of Netflix Drive to upload files to the cloud in the background. That is defined by the persona of Netflix Drive.
Here is a sample bootstrap manifest. A particular Netflix Drive mount point can have several Netflix Drive instances that are separate from each other. You have a local file store, which is the local storage used by Netflix Drive to cache files. The instances get manifested under the mount. In this case, we have two separate instances, a dynamic instance and a user instance, with different backend datastores and metadata stores. In the first instance, the dynamic instance, you have a Redis metadata store and an S3 datastore. You also uniquely identify a workspace for data persistence. In the second one, you have CockroachDB as the metadata store and Ceph as the datastore.
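A hedged reconstruction of such a manifest, written as a plain dict, might look like the following. The field names are illustrative assumptions; the real manifest format is internal to Netflix Drive.

```python
# Illustrative bootstrap manifest: one mount, two instances with
# different metadata/data backends, per the description above.

manifest = {
    "mount_point": "/netflix-drive",
    "local_file_store": "/var/cache/netflix-drive",
    "instances": [
        {"name": "dynamic",
         "metadata_store": "redis",
         "data_store": "s3",
         "workspace": "ws-dynamic-001"},   # uniquely identifies persistence
        {"name": "user",
         "metadata_store": "cockroachdb",
         "data_store": "ceph",
         "workspace": "ws-user-001"},
    ],
}

def validate(m):
    """Minimal check that the mount is specified and every instance
    names both a metadata store and a datastore."""
    assert m["mount_point"] and m["local_file_store"]
    for inst in m["instances"]:
        assert inst["metadata_store"] and inst["data_store"]
    return True
```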
The namespace of Netflix Drive is all the files viewed inside Netflix Drive. There are two options for creating the namespace. Netflix Drive can create the namespace statically at bootstrap time, where you specify the exact files and folders that you need to pre-download and hydrate the current instance with. For this, you present a file session and Netflix Drive container information. You have workflows that can prepopulate your Netflix Drive mount point with some files, so that subsequent workflows can be built on top of it. The other way to hydrate a namespace is to explicitly call Netflix Drive APIs in the REST interface. In this case, we use the stage API to stage the files, pull them from the cloud, and attach them to specific locations in our namespace. One thing to note is that these two interfaces are not mutually exclusive.
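The two hydration paths can be sketched side by side: static preload at bootstrap time versus an explicit stage call that pulls a file from the cloud and attaches it at a chosen point in the namespace. The function names and the dict-based cloud store below are assumptions for illustration.

```python
# Sketch: static vs. dynamic namespace hydration.

cloud = {"assets/shot1.exr": b"frame-1", "assets/shot2.exr": b"frame-2"}

def bootstrap(preload):
    """Static hydration: pre-download the listed files at start-up."""
    return {path: cloud[path] for path in preload}

def stage(namespace, cloud_path, attach_at):
    """Dynamic hydration via the REST interface: pull one file from the
    cloud and attach it at a specific location in the namespace."""
    namespace[attach_at] = cloud[cloud_path]

ns = bootstrap(["assets/shot1.exr"])               # hydrated at bootstrap
stage(ns, "assets/shot2.exr", "scene1/shot2.exr")  # attached on demand later
```

As the talk notes, the two are not mutually exclusive: a workflow can preload a base set and stage additional assets as it goes.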
Let us now get into some of the Netflix Drive operations. Modifications to Netflix Drive content can happen through the POSIX interface or REST APIs. File system POSIX operations that can modify a file would be open, rename, move, read, write, and close. There is also a subset of REST APIs that can modify a file: for example, staging a file, which pulls the file from the cloud; checkpointing a file; and saving a file, which explicitly uploads the file to the cloud. An example of how a file is uploaded to the cloud is using the publish API. We have the ability to autosave files, which periodically checkpoints the files to the cloud, and also the ability to have an explicit save. The explicit save would be an API that is invoked by different types of workflows to publish content.
A great example of where these different APIs can be used is the case where artists are working on a lot of ephemeral data. A lot of this data does not have to make it to the cloud because it's work in progress. In that case, for those workflows, explicit save is the right call. Once they are sure of the data, and they want to publish it to the cloud to be used by subsequent artists or in subsequent workflows, that's when they would invoke this API. It will snapshot the files in the Netflix Drive mount point, then pick them up, deliver them to the cloud, and store them in the cloud under the appropriate namespace. That is where you can see the difference between autosaving, which is like the Google Drive way of saving files, and an explicit save that is called by artists or workflows.
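The contrast between auto-save and explicit save can be made concrete with a small sketch: with explicit save, ephemeral work-in-progress never leaves the workstation until the artist publishes it. All names here are hypothetical, not Netflix Drive's actual API.

```python
# Illustrative contrast: auto-save everything vs. publish a chosen subset.

local, cloud = {}, {}

def autosave():
    """Periodic checkpoint of everything local to cloud, Google Drive-style."""
    cloud.update(local)

def publish(paths):
    """Explicit save: snapshot and deliver only the chosen files to cloud."""
    for p in paths:
        cloud[p] = local[p]

local["wip/scratch.nk"] = b"ephemeral"   # work in progress
local["final/shot.exr"] = b"done"        # ready to share
publish(["final/shot.exr"])              # the scratch file stays local
```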
Given that Netflix Drive is used in multiple personas by different types of workflows, here are some of the learnings we had while developing Netflix Drive. The number one learning was that there were several points at which we had to make choices for our architecture. We intended it to be a generic framework into which any datastore and metadata store can be plugged. Also, a lot of our architectural choices were dictated by the performance and latency aspects of files, workflows, and the artist workstation experience that we wanted to provide with Netflix Drive. A great example is that we used a FUSE-based file system and implemented a lot of the code of Netflix Drive in C++. We compared other languages, and we thought that C++ gave us the best performance results, and performance was a critical feature of Netflix Drive that we wanted to provide.
The other is that designing a generic framework for several operating systems is very difficult. In our case, we support Netflix Drive on CentOS, OS X, and Windows. We leverage the FUSE file system, so we had to investigate a lot of the alternatives for FUSE-based file systems on these different operating systems. That also multiplied our testing and supportability matrix.
The third learning is that the key to scalability is handling metadata. In our case, because we work with disparate backends, and we have different layers of caching and tiering, we actually rely heavily on metadata operations being cached in Netflix Drive. That gives us a great performance for a lot of the studio applications and workflows that are very metadata heavy. Having multiple tiers of storage can definitely provide performance benefits. When we designed Netflix Drive, we did not restrict ourselves to just the local storage or cloud storage, we in fact wanted it to be built in a way that different tiers of storage can easily leverage Netflix Drive framework and be added as a backend for Netflix Drive. That came through in our design, in our architecture, and in our code.
Having a stacked approach to software architecture was very critical for Netflix Drive. A great example, again, is the idea of shared namespaces. We are currently working on the ability to share files between different workstations or between different artists. This is built on top of our eventing framework, which is itself part of the Netflix Drive architecture. When one Netflix Drive instance has a file added to its namespace, it generates an event, which is consumed by different cloud services, which then use the REST interface of the other Netflix Drive instances to inject that file into their namespaces. This is how you can build on top of the existing primitives of Netflix Drive.
If you would like to learn more about Netflix Drive, we have a tech blog available on the Netflix tech blog channel.
Questions and Answers
Watson: Being an application you built natively on the cloud, what was your biggest challenge around scalability? I assume everything wasn’t just flawless day one. Did you have to make some technical tradeoffs to achieve the scale you wanted given how many assets Netflix handles?
Chopra: We are targeting Netflix Drive to serve exabytes of data and billions of assets. Designing for scalability was one of the cornerstones of the architecture itself. When we think in terms of scaling a solution on cloud, oftentimes, we think that the bottleneck would be the datastore. Actually, it is the metadata store that becomes the bottleneck. We focused a lot on the metadata management, how we could reduce the amount of calls that are done to metadata stores. Caching a lot of that data on the Netflix Drive locally, was something that gave us great performance.
The other thing is, in terms of the datastore itself, we explored having file systems in the cloud, like EFS. With file systems, you cannot scale beyond a point without impacting performance. If you really want to serve billions of assets, you need to use some form of object store, not a file store. That meant that the files and folders our artists are used to had to be translated into objects. The simplest thing to do is have a one-to-one mapping between every file and an object. That is very simplistic, because sometimes file sizes may be bigger than the maximum supported object size. You really want the ability to have a file mapped to multiple objects, and therein lies the essence of deduplication as well: if you change a pixel in the file, you then only change the object that has that chunk of the file. Building that translation layer was a tradeoff and was something that we did for scalability. These are examples of how we designed for the cloud, thinking about the scalability challenges there.
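The file-to-object translation layer described here can be sketched as a file mapped to multiple fixed-size chunks, each stored as its own object, so that a small edit rewrites only one object. The chunk size below is tiny for demonstration (the talk mentions chunk sizes on the order of 64 MB), and the upload logic is an assumption, not Netflix Drive's code.

```python
# Sketch of the translation layer: file -> fixed-size chunks -> objects.

CHUNK = 4  # bytes, tiny for demonstration purposes

def chunk_file(data: bytes):
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

def upload(store, name, data):
    """Store each chunk under (name, index); return the keys that changed."""
    changed = []
    for i, c in enumerate(chunk_file(data)):
        key = (name, i)
        if store.get(key) != c:
            store[key] = c
            changed.append(key)
    return changed

store = {}
upload(store, "film.exr", b"AAAABBBBCCCC")            # three chunks uploaded
changed = upload(store, "film.exr", b"AAAABXBBCCCC")  # one byte differs
```

Only the middle chunk is re-uploaded on the second call, which is the "change a pixel, change one object" property the talk describes.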
Watson: How would you compare Netflix Drive with other cloud storage drive systems like Google Drive?
Chopra: That was one of the first things we were asked when we were designing Netflix Drive. We compared the performance of Google Drive with Netflix Drive. The advantage that Netflix Drive has is that we have a tiered architecture of storage, where we leverage our media cache. Media cache is nothing but a media store, a caching layer that is closer to the user and the application. We also cache a lot of data on the local file store. Google Drive doesn't do that, so we could always get local file system performance compared to Google Drive. Even if you remove the local file system from the picture, because we have our media caches spread across the globe, we can still perform much better than Google Drive.
Watson: It probably doesn’t help to log your data in S3 as well, I don’t know if Google Drive does well.
What prompted you to invent Netflix Drive over, for example, using the AWS File Storage Gateway?
Chopra: One of our cornerstones in thinking about Netflix Drive is security. We wanted to design a system that could give our artists, who are globally distributed, a secure way of accessing only the data that is relevant to them. We investigated AWS File Storage Gateway, but its performance and security aspects did not match our requirements. The second thing is that File Storage Gateway, I don't think, when we investigated it, could translate files into objects on the backend. For us, we really wanted the power to have these objects closer to the user in our media caches, and to control where these objects are stored. A great example is the case where, let's say, multiple artists are working on an asset. If every iteration of these assets is stored in the cloud, your cloud costs will explode. What we wanted to do was enable these assets to be stored in media caches, which is something that we own, and only the final copy goes to the cloud. That way, we can leverage a hybrid infrastructure and control what gets pushed to the cloud versus what stays locally, or in shared storage. Those are parameters that File Storage Gateway did not provide to us.
Watson: We have similar challenges at Pinterest, where people will say, why don’t you use EKS? Ninety percent of the work we do is integrating our paths around the cloud offering. It’s like you still have to do all that work with security.
How many requests can your system handle in an hour? Do you have any sense of like rough metrics?
Chopra: I ran fio to compare Netflix Drive with other alternatives, and we perform much better than Google Drive. Google Drive, I think, has a lot of limits on the number of folders and files that you can put in; Netflix Drive has no such limits.
Would EFS work as a substitute to some extent for Netflix Drive?
EFS could have worked as a substitute, but EFS will not scale beyond a point. Because, in general, if you have to design scalable systems, you have to pick object stores over file stores. Object stores give you more scalability, and you can also cache pieces of objects. Also, deduplication is a challenge because if you change a pixel of a file, then you have to probably have that synced to EFS, and have it stored as a file in multiple tiers of caches. That is not something that we felt was giving us the performance that we needed, so we went with object stores.
Why was CockroachDB selected as a database? Was there a specific use case for it? Did you guys compare it with other options?
We actually have a lot of databases that are built inside Netflix, and we have layers on top of different databases. CockroachDB was the first one that we picked because it gave us a seamless interface to interact with it. We wanted to leverage something that was built in-house, and it's a SQL database that is horizontally scalable. That's the reason why we went with it. Also, you can deploy it on-premise or in the cloud, so if we ever wanted to go into a hybrid situation, we always had the option of doing that. That being said, CockroachDB was the first one that we picked, but our code is written in a way that we could have picked any other. We also built a lot of security primitives and event mechanisms for CockroachDB. We wanted to leverage what was already built, so we went with CockroachDB.
Watson: Having been at Netflix myself for quite a few years, I know you have so many partners there. Have your partners in the industry adopted use of the drive? I assume you have to work with different publishing houses and producers.
Chopra: At this point we are working with artists that are in Netflix, and they have adopted Netflix Drive. We haven’t reached out to the publishers as yet. The independent artists work with our tools that actually would be supported by Netflix Drive. Those tools will be a part of Netflix Drive.
You’ve described how big objects are split, and every small bit of it is referenced in a file, and then, that there was a challenge deduplicating, how is that different from symlinking?
In object stores, you have versioning. Every time you change, or mutate a small part of an object, you create a new version of the object. If you imagine a big file, but only a small pixel of it gets changed, that means that in a traditional sense, if your file was mapped to an object, you would have to write the entire file again as an object, or send the entire bits again. You cannot just send the delta and apply that delta on cloud stores. By chunking, you actually reduce the size of the object that you have to send over to the cloud. Choosing the appropriate chunk size is more of an art than a science, because now if you have many smaller chunks, you will just have to manage a lot of data and a lot of translation logic, and your metadata will just grow. The other aspect is also encryption, because today we encrypt at a chunk granularity. If you have many chunks, you will have to then have so many encryption keys, and manage the metadata for that. That is how it is challenging to deduplicate content also, in such scenarios.
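The chunk-level deduplication being described can be sketched with content-addressed chunks: only the chunk containing the changed pixel produces a new object, and identical chunks are shared by hash. As noted above, per-chunk encryption would add one key per distinct chunk, which is part of the metadata-management cost. The scheme below is illustrative only, not Netflix Drive's actual format.

```python
# Sketch of content-addressed chunk storage for deduplication.
import hashlib

CHUNK = 4  # bytes, tiny for demonstration

def store_chunks(objects, data):
    """Content-address each chunk; return the file's chunk-hash manifest."""
    manifest = []
    for i in range(0, len(data), CHUNK):
        c = data[i:i + CHUNK]
        h = hashlib.sha256(c).hexdigest()
        objects[h] = c                  # dedup: identical chunk -> same key
        manifest.append(h)
    return manifest

objects = {}
v1 = store_chunks(objects, b"AAAABBBBCCCC")
v2 = store_chunks(objects, b"AAAABXBBCCCC")   # one byte changed
```

The two versions share their first and last chunks, so only one new object is created for the second version, instead of rewriting the whole file.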
Do you use copy-on-write techniques in Netflix Drive to optimize your storage footprint?
At this point, we do not use that. We could definitely explore using copy-on-write to optimize the storage footprint. Yes, that’s a good suggestion.
What is the usual chunk size?
Chunk size, it depends. I think if I remember correctly, it’s in megabytes, like 64 megabytes is what I remember last seeing. It is configurable. Configurable in the sense that it is still statically assigned at the start, but we can change that number to suit our performance needs.
Watson: How are security controls managed across regions, it’s usually an external API or built into the drive?
Chopra: It’s actually built into our CRDB, as a layer on top of CockroachDB that we have. Netflix Drive leverages that layer to handle all the security primitives. There are several security services that are built within Netflix, so we leverage those at this point. We don’t have external APIs that we can plug in to at this point. We plan to abstract them out as well, whenever we release our open source version so that anyone can build those pluggable modules as well.
Watson: You mentioned open sourcing this, and I know open sourcing is always a hurdle and takes a lot of effort. Do you have a general sense as to when this would be open source?
Chopra: We are working towards it for next year. We hope to have it open source next year because we have a lot of other folks that are trying to build studios in the cloud, reach out to us. They want to use Netflix Drive as well, the open source version of it, and build pluggable modules for their use cases. We do intend to prioritize this and get it out next year.
C++ for performance reasons. Have you considered using Rust?
We did not consider using Rust because, I think, one of the reasons was that a lot of FUSE file system support was not present in Rust at that time. Secondly, C++ was a choice. First of all, you pick a language for performance reasons. The second thing is you pick a language that you're more familiar with. I'm not familiar with Rust; I have not used it. It would take me some time to get used to it, explore it, and use it in the best way possible. That would have meant less time devoted to the development of Netflix Drive. There is nothing that stops us from investigating it in the future, once we've made it a framework and released it out there.
MMS • RSS
Posted on nosqlgooglealerts.
Rockset today announced a new integration with SQL Server that will allow customers to continuously index production data as it hits the relational database and deliver sub-second analytics on that data.
Rockset is a real-time analytics database designed to enable customers to run analytics on fast-moving data. One of the key elements of the offering is its Converged Index, which indexes every field in customers' data, including structured data but also JSON, geo, and time-series data. Once it's indexed, Rockset uses SQL to query the data, which can include joins, windowing functions, sorts, aggregations, and order-bys, among others.
The new integration with SQL Server will enable customers to move away from batch-oriented ETL workflows and instead use change data capture (CDC) functionality to get at fresh data as soon as it’s entered into the relational database, the company says.
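The CDC flow described here — applying each insert, update, or delete event to the analytics index as it arrives, instead of periodic batch ETL — can be sketched as follows. The event shape and function names are illustrative assumptions, not Rockset's actual CDC format.

```python
# Sketch of applying a change-data-capture stream to a live index.

index = {}  # primary key -> latest document

def apply_cdc(event):
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):     # upsert semantics
        index[key] = event["doc"]
    elif op == "delete":
        index.pop(key, None)

for e in [
    {"op": "insert", "key": 1, "doc": {"status": "shipped"}},
    {"op": "update", "key": 1, "doc": {"status": "delivered"}},
    {"op": "insert", "key": 2, "doc": {"status": "pending"}},
    {"op": "delete", "key": 2},
]:
    apply_cdc(e)
```

Because every event is applied as it arrives, queries against the index see the freshest state of each row rather than the state as of the last batch load.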
The combination of CDC with the Rockset index gives customers a new capability to analyze fast-moving data, says Shruti Bhat, the chief product officer at Rockset.
“While leading cloud providers now provide ways to tap into changes as they occur in the form of database CDC streams, most warehouses are not equipped to consume and analyze these CDC streams in real-time,” Bhat says in a press release. “As a real-time analytics platform built for the cloud, Rockset is fully mutable, so it elegantly handles upserts and stays in sync with SQL Server tables, in real-time.”
Rockset says the new integration will enable analysts to query data generated in SQL Server using visualization tools like Tableau, Retool, Redash, and Superset. It will also enable customers to build new applications that incorporate real-time analytics in the areas of e-commerce, logistics, delivery tracking, gaming leaderboards, fraud detection systems, health and fitness trackers, and social media newsfeeds, the company says.
The San Mateo, California company boasts a collection of integrations and connectors for a variety of data sources, including data buses like Apache Kafka and Amazon Kinesis; NoSQL databases like MongoDB and DynamoDB; and relational databases like PostgreSQL and, now, SQL Server.
Rockset was founded by former Facebook engineers, including some of the team that developed Newsfeed and Search. Venkat Venkataramani, the company’s co-founder and CEO, is also a Datanami Person to Watch for 2022.
MMS • Keren Halperin
Article originally posted on InfoQ.
Shane Hastie: Good day, folks. This is Shane Hastie from InfoQ Engineering Culture Podcast. Today, I have the privilege of sitting down across many miles with Keren Halperin. Keren, welcome. Thanks for taking the time to talk to us today.
Keren Halperin: Thank you, Shane, for hosting me. It’s a great honor.
Shane Hastie: You’re in Israel, I’m in New Zealand, so we’re pretty much at the opposite time of the day and almost as far apart as we could be, but a good starting point is, who’s Keren?
Keren Halperin: Almost 50, proud to say so. Related to my job, I have an extensive background in global high tech and startup companies in rapid growth stages and dynamic environments. I love the hectic phase, you know, when things are not so certain, when things are very hard and difficult, when it looks like impossible, and to see how you can make things possible. I specialize in startup growth. My passion is to see teams growing from scratch and help companies to connect with employees through values, ethic, proactive approach, and better communication.
I truly believe in actions. This is why I choose startups. I truly believe in employee experience. I think this is one of my latest understandings about employee experience. I know it's a little bit awkward for an HR leader to say so, but during the last few years, I realized that it's almost the most important thing for me as an HR leader. I always hope and believe that the company will win and swim; I think it's going to be incredible. But one thing I can promise the employees is that they can have a great experience, above the money. There's the paycheck and everything, but the experience is something you cannot compare to a paycheck. So this is one of my biggest missions.
Shane Hastie: Let’s explore that a little bit. With your background in high growth startups and technology organizations, what is it that these technical people are looking for today in the workforce?
Things technologists look for in an employer [02:40]
Keren Halperin: Technical people have a few things that are major. I think one is the culture. Another is the technology stack; it's not a secret, it's important for them. The leaders, of course: they want to see tech leaders with great people skills they can learn from. So they are looking for a great leader or senior developers to be part of the team. They are looking for a great product. Most of them are looking for a product that can make a difference, that really brings value. I wouldn't say everyone, because for some developers the technology stack is enough, whatever it does; they love the complexity of tech questions and care less about the value of the product. But a lot are curious about the product itself. I think today they are looking for flexibility: anything related to working from home, the work-life balance. And of course, they are looking for a great team to be part of.
I know when I talk with developers, they are reviewing LinkedIn, checking each and every person to see their background, whether they are good enough, whether they are a team they want to be part of. So in general, these are the major things that I'm familiar with. And of course, salary.
Shane Hastie: Let’s dig into great teams. What do we need to do to create a great team environment?
What’s needed to create a great team environment? [04:04]
Keren Halperin: Wow! This is a big question. You know, each and every company has their own secret sauce, but I can say that the best teams that I'm familiar with, and also if you ask developers, they are looking for a team where they have a good balance of very good people: smart, but at the same time kind and friendly. A great leader, someone they can learn from, someone who cares for them, who builds career development for them, and where they can learn from each other. For example, I can talk about Swimm related to this topic. We have a culture of "Swimminars", seminars where people have the chance to teach each other about a specific topic. In this way, we keep nurturing each other. They want a fun environment, not just coding: fun events, the chance to get together when the work day ends, and to meet each other over coffee or dinner.
I can say about the Swimm team that this is something I'm really proud of ... not of me, but of Swimm: the R&D team here is very, very connected to each other. They have a very good vibe. They even asked to go together on a vacation to London to see The Book of Mormon, because they love to be with each other, which is a very, very good signal.
Yeah, so I think these are the major factors for developers and great teams. And of course, the ability to experience complexity, the ability to take a feature end to end, the freedom to come up with ideas, and a team leader and a team that consider those ideas seriously.
Shane Hastie: We’re in a world where we hear a lot about the talent shortage today. How real is that, and what can organizations do about it?
The talent shortage is real [06:02]
Keren Halperin: Maybe it's a good moment to share my experience. About four years ago, I started to do some research about the talent shortage. And back then, it was already very clear that the talent shortage is here to stay, and even going to get worse. Fast forward to 2020: when COVID broke into our lives, a CEO asked me, "What do you think? What is going to happen now?" And I said, "I don't know." I think some CEOs and entrepreneurs were assuming that maybe now the world was going to change; in a way, it was going to be the employers' time. And I said, "I don't know." But I know for sure from my experience, and I have enough experience to look back and say this, that recruitment and talent has always been an issue.
It's never easy to attract and hire strong talent. And when we talk about talent, we are talking about the people you want to attract. But it's here to stay. COVID showed us that technology is accelerating. More and more money is invested in technology because it's a very unique moment: COVID showed that technology kept economic transactions going and kept communication between people, helping us stay connected, and more and more investment went into these areas.
So there is a lot of money in technology, and there is going to be, I guess, even more money in technology, and the talent pool is not growing as fast as the investments. So I think it's going to be a challenge for at least a few years, if not more. Not to mention the digital revolution, which has made it even more challenging. Because, again, the fact that you from New Zealand and I from Israel can talk right now: it's happening.
Shane Hastie: So if I’m a manager in a technology organization looking to build out my team and to grow my team, what do I do?
Advice for managers looking to recruit [08:03]
Keren Halperin: I think it starts with a mindset of flexibility. You first need to understand: what is your challenge? What kind of people are you looking for? If you are in Israel, or any other place, what are your possibilities? What is the competition? What is your value proposition to the talent you are looking to attract? And weigh the pros and cons of every option. When I say flexibility, it's really to put on paper all the options you have, from a local team, through fully remote, through maybe friends or friends of friends who work remotely, through a remote site. You should also think about hiring for potential and building an educational program or bootcamp within the company. What are the possibilities? What is your challenge? Put it on paper and make sure you are not missing anything.
If you decide to start with a certain direction, that's fine, but at a certain point you also need to say, "Okay, do I keep this direction? Do I explore the other options I put on paper?" That's what I suggest for hiring managers. And of course, building a great culture is always a very, very good foundation for hiring, and also awareness, like brand awareness: help talent out there recognize you, send signals about why they should join you. It's important to understand how you compete in the market. Do you have a real value proposition for the local talent? Do you have a value proposition for other talent out there? It's like when you sell a product. A career is a complex product; it's not something where people say, "Okay, I will try it, and if it's not good, never mind, I will move to another product." A career is a very big decision for people.
Shane Hastie: You mentioned flexibility, and one of the things we touched on in our conversation before we started recording was the way people earn money today. What are these very, very different ways, and how do organizations accommodate them?
Remote teams require leaders with stronger emotional intelligence skills [10:16]
Keren Halperin: I think for me it's a question, but not that much; I guess some big companies are experiencing this in a much more holistic way. I think the two main things related to this question are communication and personalization. In everything you do, you need to consider that remote is complicated, because communication is always difficult. When you write something on Slack, or email, or another communication channel, it always comes across as less soft; it can sometimes be perceived as rougher. So you need to be more sensitive. I think the emotional intelligence of managers who manage remote teams needs to be much, much stronger.
The other thing is to be more personal, because when you don't see someone, you can ignore things simply because you don't see them, and you need to be even more cautious about how to engage people, how to let them feel that you see them, you hear them, you understand them, and that they are integrated into the company. That means everything from how you hire, through how you onboard people (changing the onboarding process, being more personal, using more videos, using more communication), through how you integrate people with company ceremonies, events, and meeting them. It means more visiting each other, inviting people to headquarters, and so on. So it's an effort. There is a cost and a lot of investment, and you need to act as a global organization, and also a diverse one. You can't just keep the headquarters and say, okay, we have additional people around the world, and that's fine.
Shane Hastie: How do we tap into things like the gig economy?
Multiple engagement options [12:10]
Keren Halperin: I think that's part of the challenge: today people have more options to earn money. An organization is not an easy environment, in many cases, for some people; it depends on the company. And I think Swimm is not there yet, but again, it comes down to a mindset. You measure people on outcomes, and less on how many hours you see them work. Of course, it makes things more difficult, but the gig economy is here to stay, and it's going to grow; it's not stopping. And from what I understand, in the U.S. people started to develop their income from home, and a lot of people did not come back to the workplace. So it's not going to get easier.
Shane Hastie: Another challenge that we see in the technology industry is a lack of diversity in many, many ways. How do we address that?
Tackling the lack of diversity in technology [13:05]
Keren Halperin: I think it starts at the top of the funnel, and it also starts with the mindset, always the mindset, and by mindset I mean believing that diversity is good: it's good for the business, it enriches the teams, and it produces more success. If you understand this, then the rest will follow. Assuming this, you start to build hiring plans that attract diverse people into the process. I don't think you need to give them a chance by lowering your bar. People from all across the board, wherever they come from, are super talented, and you need to give them the chance to be part of the process.
I can give an example from Swimm: we have a hiring process for developers, and it's a story that can give you an idea of how Swimm deals with this, not to mention using channels that attract diverse people. Beyond a coding question, we use a question that shows how the person thinks. It's exceptional; it's not something we do so often, but we hired someone very young, not coming from a university, even without experience, only self-educated and self-sufficient, because he solved that question, and it was not about coding. We hired him. It took him a while. He worked very hard because he appreciated the opportunity, coming very early in the morning, leaving late. And after about four months, he started to produce in a very, very good way. We bet on him, but we bet on him assuming he knows how to think and not only how to code.
So this is an example of giving a chance sometimes. And when you talk about diversity, I think again: see what people can give you, not only their experience. Because some people don't have the income or the option to ... You know, their parents cannot afford a university or something like this, but they have the passion, they are self-sufficient, they work hard. That's one example. You can also think about, let's say, the example of Booking.com.
Booking.com did something very interesting. They did their research and identified very good universities, even in the middle of nowhere, in places nobody would go to, and they started to hire from them. They decided to build their teams by talent, not by location: they looked for the talent, and wherever the talent is, they will open a site. So this is another way of thinking: not starting from what is comfortable, but from what can help the company bring in good minds, great minds. So I don't know if I'm answering your question, but that's what I can say. I think it comes from the top of the funnel.
Shane Hastie: One of those elements of diversity is generational. You mentioned the X and Y generations, and people younger than us. You mentioned you were nearly 50; I'm just over 60. So we are at the other end of that, but most of the workplace is not.
Generational diversity [16:13]
Keren Halperin: Yeah. It's a very good question. I'm still learning; I'm still trying to understand their minds. I can read the books about Generation Z and Generation Y, but I'm still learning, to be honest with you. They need a lot of feedback. I can tell you that we just launched our Swimm Up program for performance and development, and we put a lot of emphasis on career. Because these generations, again considering the talent shortage and their other options and choices besides signing with a company, they ask, "What's in it for me?" And you need to deliver. You need to convince them that they have a real career path in the organization, that you really care about them, that they can have a great experience. You need to give a lot of feedback.
For example, we are not doing the traditional biyearly or yearly performance talk. It's no longer relevant, it's not efficient, and you can't usefully talk with someone about something that happened a few months ago. This is why we are integrating and developing a culture of continuous feedback, relying on one-on-ones and developing our managers to become more coaches than bosses. All these kinds of leadership styles are important to attract and drive young people. And of course, the communication: all the emojis, and Slack, and fast communication. If you post an update on Slack today about something that happened yesterday, it's weird. Everything is happening here and now; it's instant, it's immediate, and this is the way leadership needs to update people. For example, Oren, our CEO, after every board meeting, sends employees a full letter about what happened at the board. Everything is transparent. They know about everything. So that's part of the change we are experiencing.
Shane Hastie: You touched on an interesting point. Again, thinking of our audience, these technical leaders, some of them are people who are moving into the field, but others have been there for a while. What does it mean to be a manager as a coach, or leader as a coach?
Manager/leader as coach [18:27]
Keren Halperin: Coming back to your question, I think it's a very good practice to promote young people who have the leadership DNA and the motivation, and help them become leaders, because this is a good way for leadership, which is usually more mature in age, to connect with young people. Manager as a coach means you don't tell people what to do. You give them the tools to do the right things. You give them space. You also take into consideration that they can make mistakes, and that's fine. You support them, you facilitate their environment, and you let them grow. It's not, "You need to do this, this, and that," without any reason. It won't work. It simply won't work.
You need to be flexible. You need to be sensitive about their work-life balance. If you are going to be strict and they feel, or even suspect, that you don't care about their private life, that's also not a good practice. A manager as a coach is really about asking the questions, not coming up with the answers; growing people from within, helping them grow and bring themselves into the environment. It's a process; it doesn't happen in one day. It relies a lot on on-the-job training. Of course, you also build with them their career development, their career path. They define where they want to go, and you are the one to help them reach their career goals.
Shane Hastie: This is very different to traditional management.
Leaders as coaches is a necessary positive shift [20:06]
Keren Halperin: I agree. I'm very happy, by the way, that it's happening. I think it's a very, very positive direction. Even though it's a challenge, I learn from these young people. "What's in it for me?" can be perceived as self-centered, and in the past it was a question for selfish people: if you asked the organization, "What's in it for me?", you were told you shouldn't ask this question, you are here for the organization, you get money for this. I think it's a very, very positive direction. I really like that young people put themselves at the center. I think we can learn from it.
It's a challenge, because it requires more sophisticated management. And this is why I think putting younger managers who are skilled and have the leadership DNA into managerial positions can really help drive the organization, because they understand their generation; they understand their people better than I do, I can say. I'm sometimes surprised. But you know, you need to be committed. I think one of the things we are struggling with as organizations is that commitment is weaker than it was in the past. People are less committed, and by the way, organizations are less committed as well. The commitment is not as strong as it was.
Shane Hastie: Where do I learn these skills as a manager?
Help people build the skills needed for the new ways of leading [21:26]
Keren Halperin: You know, for managers today, and I think also in the past, especially in technology organizations, it's always a challenge. First and foremost, they need to have people skills, which means they need to see the people first and then the task or the output. They need to understand that they are actually managing the emotions of their people, their team, their colleagues, their peers, and so on. If, in their mind, they understand that people are human, then it's a good start. It means: see the people, care for them, help them understand where the team is going, where the focus is. Help them remove obstacles.
I think one of the things that's really frustrating for developers, especially in startups, is when the startup is growing and suddenly you realize it's very hard to move; in order to code or to produce, you need to ask DevOps, or suddenly things are very difficult. It's frustrating. In many cases, they can even break in such situations. So a good manager knows how to keep their team focused on creation, on coding, on producing, in a fun and engaging environment; and to listen to them, give them feedback, see where they need help, support them, and guide them. So I don't know if I'm answering your question, but this is what I think about when I think about a development team and what makes it a good team.
Shane Hastie: Thanks very much for taking the time to talk to us today. If people want to continue the conversation, where do they find you?
Keren Halperin: They can find me on LinkedIn; I'm constantly active there, so that's a great channel. They can also contact me directly at my Swimm email, Keren@swimm.io.
Shane Hastie: Thank you. I really enjoyed the conversation.
Keren Halperin: Shane, thank you very, very much.