2 Top Tech Stocks Under $250 to Buy in 2025 – The Globe and Mail

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Artificial intelligence (AI) has taken the world by storm. It is transforming the technology industry by driving innovation, efficiency, and new opportunities across multiple sectors. Top tech companies such as Amazon (AMZN), Microsoft (MSFT), and Nvidia (NVDA) continue to dominate the stock market.

Aside from these well-known tech stocks, there are a few other undervalued options under $250 that could be valuable additions to your portfolio in 2025. With a market capitalization of $466 billion, Oracle (ORCL) has been a long-standing player in the tech industry. 

Meanwhile, Goldman Sachs analysts, led by Ryan Hammond, believe platform stocks such as MongoDB (MDB) could be the “primary beneficiaries of the next wave of generative AI investments.” Let’s see if now is a good time to buy these great tech stocks.

Tech Stock #1: Oracle Corporation

The first tech stock on my list is Oracle Corporation, the largest enterprise-grade database and application software provider. Its products and services include Oracle Database, Oracle Fusion Cloud, and Oracle Engineered Systems, among others. 

The company’s better-than-expected earnings in 2024 boosted investor confidence, leading to a surge in its stock price. The stock soared 60.1% in 2024, outperforming the S&P 500 Index’s ($SPX) gain of 24%.

[Chart of ORCL’s 2024 stock performance; source: www.barchart.com]

The company operates in three segments. The cloud and license segment generates the majority of Oracle’s revenue. It includes Oracle cloud service subscriptions as well as on-premises software license support. The company’s cloud offerings include Oracle Cloud Infrastructure (OCI) and Fusion Cloud Applications, among others. The other two segments are hardware, which includes Oracle Engineered Systems, and services, which assist customers in optimizing the performance of their Oracle applications and infrastructure.

Oracle has integrated AI capabilities into its cloud services and applications, thereby enhancing their functionality and appeal. In the third quarter, total revenue increased 9% to $14.1 billion, with the cloud and license segment up 11% year-over-year. Adjusted earnings per share increased 10% to $1.47. The total remaining performance obligation (RPO), which refers to contracted revenue that has yet to be earned, increased by 49% to $97 billion. 

Oracle also pays dividends, which adds to its appeal to income investors. It yields 0.95%, compared to the technology sector’s average of 1.37%. Its low payout ratio of 19.5% also makes the dividend payments sustainable for now. 

The global cloud computing market is expected to reach $2.29 trillion by 2030. Oracle’s investments in cloud infrastructure and applications position it to benefit from this growth. However, it operates in a highly competitive environment, with rivals such as Microsoft Azure, Amazon’s AWS, and Google (GOOGL) Cloud, which together account for 63% of the cloud market. Oracle owns just 3% of this market.

Oracle’s prospects are dependent on its ability to implement its growth strategy effectively. Sustained double-digit growth in cloud services is critical to maintaining investor confidence. 

At the end of the quarter, Oracle’s balance sheet showed cash, cash equivalents, and marketable securities totaling $11.3 billion. The company also generated free cash flow of $9.5 billion, which allowed it to effectively manage its debt while funding acquisitions and returning capital to shareholders via dividends. 

While Oracle’s balance sheet remains strong, competitors such as Amazon and Microsoft have significant capital and resources, posing a constant threat. 

Analysts who cover Oracle stock expect its revenue and earnings to increase by 8.9% and 10.7%, respectively, in fiscal 2025. Revenue and earnings are further expected to grow by 12.5% and 14.5%, respectively, in fiscal 2026. Trading at 23x forward 2026 earnings, Oracle is a reasonable tech stock to buy now, backed by its strong financial performance, competitive advantages, and exposure to high-growth markets. 

What Does Wall Street Say About ORCL Stock?

Overall, analysts’ ratings for Oracle are generally positive: of the 32 analysts covering the stock, 20 maintain a “Strong Buy” or “Outperform” rating, 11 recommend a “Hold,” and one suggests a “Strong Sell.” The average target price for Oracle stock is $193.63, representing potential upside of 16.2% from current levels. The high price estimate of $220 suggests the stock could rally as much as 32% this year. 

[Screenshot of ORCL analyst ratings; source: www.barchart.com]

Tech Stock #2: MongoDB

The second stock on my list is MongoDB, an emerging AI company. With a market cap of $17.3 billion, MongoDB is a leading name in the database management space. Its business is built around database solutions, with the cloud-based database-as-a-service (DBaaS) MongoDB Atlas serving as its flagship product. Atlas is deployed on major cloud providers such as AWS, Azure, and Google Cloud, accounts for a majority of MongoDB’s revenue, and has been a key driver of its growth.

MongoDB stock has fallen 36% over the past 52 weeks compared to the broader market’s 24% gain. This dip could be a great buying opportunity, as Wall Street expects the stock to soar this year. 

[Chart of MDB’s 52-week stock performance; source: www.barchart.com]

In the third quarter of fiscal 2025, total revenue increased by an impressive 22% year over year to $529.4 million, with Atlas revenue growing by 26%. The company’s subscription-based model provides a consistent stream of recurring revenue, which increased by 22% in the quarter.

MongoDB offers consulting, training, and implementation services to help businesses make the most of their database solutions. Services revenue increased by 18% to $17.2 million in Q3. Adjusted earnings per share stood at $1.16, an increase of 20.8% from the prior-year quarter.

Compared to $1.68 billion in revenue and EPS of $3.33 in fiscal 2024, management expects fiscal 2025 revenue of $1.975 billion and adjusted EPS of $3.02. Analysts predict that the company’s revenue will increase by 17.6%, while earnings may decline to $3.05 per share, slightly above the company’s own estimate.

In fiscal 2026, however, the company’s earnings could increase by 9.3% to $3.33 per share, alongside a 17.2% increase in revenue. MDB stock is trading at seven times forward 2026 sales, compared to its five-year historical average of 21x.

What Does Wall Street Say About MDB Stock?

Overall, Wall Street rates MDB stock a “Moderate Buy.” Out of the 32 analysts covering the stock, 22 rate it a “Strong Buy,” three suggest it’s a “Moderate Buy,” five rate it a “Hold,” and two recommend a “Strong Sell.”

The average target price for MDB stock is $378.86, representing potential upside of 62.7% from its current levels. The high price estimate of $430 suggests the stock can rally as much as 84.7% this year. 

[Screenshot of MDB analyst ratings]

On the date of publication, Sushree Mohanty did not have (either directly or indirectly) positions in any of the securities mentioned in this article. All information and data in this article is solely for informational purposes. For more information please view the Barchart Disclosure Policy here.

Article originally posted on mongodb google news. Visit mongodb google news



Improving Threads’ iOS Performance at Meta

MMS Founder
MMS Sergio

Article originally posted on InfoQ. Visit InfoQ

An app’s performance is key to making users want to use it, say Meta engineers Dave LaMacchia and Jason Patterson. This includes making the app lightning-fast, battery-efficient, and reliable across a range of devices and connectivity conditions.

To improve Threads’ performance, Meta engineers measured how fast the app launches, how easy it is to post a photo or video, how often it crashes, and how many bug reports people filed. To this end, they defined a number of metrics: frustrating image-render experience (FIRE), time-to-network content (TTNC), and creation-publish success rate (cPSR).

FIRE is the percentage of people who experience a frustrating image-render experience, one that may lead them to leave the app while an image is still rendering across the network. Roughly, FIRE is the number of users who leave the app before an image has fully rendered, divided by the number of users who attempted to view that image. Measuring this metric lets Threads developers detect regressions in how images load for users.
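As a rough, hypothetical illustration of that quotient (the event fields below are assumptions, not Meta’s actual logging schema), the calculation could look like this in Python:

```python
# Hypothetical sketch of the FIRE calculation described above; the event
# fields are assumptions, not Meta's actual schema.
def fire_rate(render_attempts: list[dict]) -> float:
    """Share of image-render attempts abandoned before the image finished rendering."""
    if not render_attempts:
        return 0.0
    abandoned = sum(1 for e in render_attempts if e.get("left_before_render"))
    return abandoned / len(render_attempts)

# Example: 2 of 4 attempts were abandoned mid-render -> FIRE = 0.5
events = [
    {"user": "a", "left_before_render": False},
    {"user": "b", "left_before_render": True},
    {"user": "c", "left_before_render": True},
    {"user": "d", "left_before_render": False},
]
print(fire_rate(events))
```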

Time-to-network content (TTNC) is roughly the time required for the app to launch and display the user’s feed. A long loading time is another experience killer that may lead users to abandon the app, and keeping the app’s binary size small is paramount to launching fast:

Every time someone tries to commit code to Threads, they’re alerted if that code change would increase our app’s binary size above a configured threshold.
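A commit gate of that kind could be sketched along the following lines; the threshold, path, and wiring into CI are illustrative assumptions, not Meta’s actual setup:

```python
# Hypothetical CI gate in the spirit of the check described above: fail the
# build if the app binary grows past a configured threshold.
import os
import sys

SIZE_THRESHOLD_BYTES = 50 * 1024 * 1024  # illustrative 50 MB limit, not Meta's value

def check_binary_size(binary_path: str) -> None:
    size = os.path.getsize(binary_path)
    if size > SIZE_THRESHOLD_BYTES:
        print(f"FAIL: binary is {size} bytes, over the {SIZE_THRESHOLD_BYTES}-byte limit")
        sys.exit(1)  # a non-zero exit blocks the change and alerts the committer
    print(f"OK: binary is {size} bytes")

if __name__ == "__main__":
    check_binary_size(sys.argv[1])  # path to the built app binary
```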

Additionally, they removed unnecessary code and graphics assets from the app bundle, resulting in a binary one-quarter the size of Instagram’s.

As to navigation latency, this is possibly even more critical than launch time. Meta engineers carried out A/B tests, finding that:

With the smallest latency injection, the impact was small or negligible for some views, but the largest injections had negative effects across the board. People would read fewer posts, post less often themselves, and in general interact less with the app.

To ensure that no changes cause a regression in navigation latency, Meta engineers created SLATE, a logging system that tracks relevant events such as the triggering of a new navigation, the UI being built, activity spinners being shown, and content from the network (or an error) being displayed.

It’s implemented using a set of common components that are the foundation for a lot of our UI and a system that measures performance by setting “markers” in code for specific events. Typically these markers are created with a specific purpose in mind.
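In spirit, such a marker system resembles a small span logger. The sketch below is a generic Python approximation for illustration, not SLATE itself:

```python
# Generic approximation of marker-based latency measurement, in the spirit
# of SLATE but not Meta's actual implementation.
import time

class NavigationTrace:
    """Records named markers for one navigation and derives latencies from them."""

    def __init__(self, destination: str):
        self.destination = destination
        self.markers: dict[str, float] = {}

    def mark(self, event: str) -> None:
        # e.g. "navigation_triggered", "ui_built", "spinner_shown", "content_shown"
        self.markers[event] = time.monotonic()

    def latency(self, start: str, end: str) -> float:
        return self.markers[end] - self.markers[start]

trace = NavigationTrace("profile")
trace.mark("navigation_triggered")
time.sleep(0.05)  # stand-in for building the UI and fetching content
trace.mark("content_shown")
print(f"{trace.destination}: {trace.latency('navigation_triggered', 'content_shown'):.3f}s")
```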

Creation-publish success rate (cPSR) measures how likely it is for a user to successfully complete the process of posting some content. On iOS, posting a video or large photo is especially tricky, since the user could background the app after posting their content without waiting for the upload to complete, in which case the app may be terminated by the OS.

Here, the approach taken by Meta was aimed at improving the user experience in those cases when posting failed. This was accomplished by introducing a new feature, called Drafts, to allow users to manage failed posts in more flexible ways instead of just providing the option to retry or abort the operation.

We discovered that 26 percent fewer people submitted bug reports about posting if they had Drafts. The feature was clearly making a difference.

Another approach was to reduce perceived latency, as opposed to absolute latency, by showing that a request has been received once the data upload completes, but before it has been processed and published.

Last but not least, Meta engineers saw a great improvement in app stability after they adopted Swift’s complete concurrency checking, which, they say, does a great job of preventing data races and the hard-to-debug problems they cause.




MongoDB Inc (MDB) Shares Up 4.42% on Jan 2 – GuruFocus

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Shares of MongoDB Inc (MDB, Financial) surged 4.42% in mid-day trading on Jan 2. The stock reached an intraday high of $247.00, before settling at $243.10, up from its previous close of $232.81. This places MDB 52.30% below its 52-week high of $509.62 and 14.27% above its 52-week low of $212.74. Trading volume was 1,281,051 shares, 56.7% of the average daily volume of 2,257,831.

Wall Street Analysts Forecast


Based on the one-year price targets offered by 32 analysts, the average target price for MongoDB Inc (MDB, Financial) is $377.12 with a high estimate of $520.00 and a low estimate of $180.00. The average target implies an upside of 55.13% from the current price of $243.10. More detailed estimate data can be found on the MongoDB Inc (MDB) Forecast page.

Based on the consensus recommendation from 35 brokerage firms, MongoDB Inc’s (MDB, Financial) average brokerage recommendation is currently 2.1, indicating “Outperform” status. The rating scale ranges from 1 to 5, where 1 signifies Strong Buy, and 5 denotes Sell.

Based on GuruFocus estimates, the estimated GF Value for MongoDB Inc (MDB, Financial) in one year is $506.50, suggesting an upside of 108.35% from the current price of $243.10. GF Value is GuruFocus’ estimate of the fair value at which the stock should trade. It is calculated based on the historical multiples the stock has traded at previously, as well as past business growth and future estimates of the business’ performance. More detailed data can be found on the MongoDB Inc (MDB) Summary page.

This article, generated by GuruFocus, is designed to provide general insights and is not tailored financial advice. Our commentary is rooted in historical data and analyst projections, utilizing an impartial methodology, and is not intended to serve as specific investment guidance. It does not formulate a recommendation to purchase or divest any stock and does not consider individual investment objectives or financial circumstances. Our objective is to deliver long-term, fundamental data-driven analysis. Be aware that our analysis might not incorporate the most recent, price-sensitive company announcements or qualitative information. GuruFocus holds no position in the stocks mentioned herein.

Article originally posted on mongodb google news. Visit mongodb google news



MongoDB’s Future Surge! Why Tech Investors Are Buzzing – Mi Valle

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

In an era where data reigns supreme, MongoDB Inc. (NASDAQ: MDB) is capturing the spotlight of tech-savvy investors, promising unprecedented growth within the evolving landscape of database management solutions. As businesses transition from traditional SQL databases to more flexible, scalable solutions, MongoDB’s NoSQL architecture is emerging as a vital asset for modern enterprises.

The reason for this newfound investor enthusiasm goes beyond MongoDB’s adaptability. With the rise of machine learning, artificial intelligence, and IoT technologies, organizations are increasingly prioritizing solutions that can handle diverse and unstructured data. MongoDB’s document-oriented database model is tailor-made for such demands, offering versatility that rigid relational databases struggle to match.

Furthermore, MongoDB’s strategic partnerships and expansion efforts, including collaborations with cloud giants like AWS and Google Cloud, have bolstered its market position. This integration with leading cloud platforms positions MongoDB as a prime player in the cloud-native database sector, an industry projected for substantial growth over the next few years.

What sets MongoDB apart is not just its technological prowess, but its commitment to innovation and adaptation. Initiatives like MongoDB Atlas, its cloud-based Database as a Service (DBaaS), reflect its dedication to staying ahead in the data solutions arena.

For forward-thinking investors, MongoDB represents a compelling prospect: a company not just keeping pace with today’s data demands but setting the stage for tomorrow’s advancements. With an expanding market and growing technological relevance, MongoDB’s stock is quickly becoming a top consideration for those chasing the future of data.

Why MongoDB is the Future of Data Management: Insights and Innovations

In today’s data-driven world, MongoDB Inc. (NASDAQ: MDB) is quickly becoming a standout choice for investors focused on the transformative future of database management solutions. While the underlying buzz around MongoDB stems from its adaptive NoSQL architecture, there are several additional aspects driving its rise in prominence worth exploring.

MongoDB: Features and Innovations

MongoDB’s document-oriented database model addresses the demands of handling diverse, unstructured data by providing unparalleled versatility. This feature makes it especially suitable for businesses delving into machine learning, artificial intelligence, and IoT applications. The platform’s ability to manage vast and varied datasets gives it an edge over traditional relational databases which typically lack such flexibility.

Key Features (see the sketch below):

– Document-Based Storage: Allows more flexible data structures.
– Scalability and Performance: Easily accommodates growing data needs.
– Cloud Integration: Seamlessly works with AWS and Google Cloud services.
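To make the document model concrete, here is a minimal sketch using pymongo, MongoDB’s Python driver; the connection string, database, and collection names are placeholders:

```python
# Minimal sketch of MongoDB's flexible document model using pymongo.
# The connection string and names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
products = client["shop"]["products"]

# Documents in the same collection need not share a rigid schema.
products.insert_one({"name": "sensor", "tags": ["iot"], "specs": {"range_m": 30}})
products.insert_one({"name": "ebook", "author": "J. Doe", "formats": ["epub", "pdf"]})

# Nested and array fields can be queried directly, with no ALTER TABLE in sight.
print(products.find_one({"tags": "iot"}))
```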

MongoDB Atlas: Cloud-Based Solution

One of MongoDB’s standout innovations is MongoDB Atlas, a cloud-based Database as a Service (DBaaS) offering. This solution underscores MongoDB’s commitment to pioneering cloud-native data management solutions, facilitating easier and more efficient interactions with cloud infrastructures. By providing automated operational tasks like backups, scaling, and updates, MongoDB Atlas helps organizations manage their data pipelines with reduced overhead, placing it at the forefront of cloud data services.

Strategic Partnerships and Market Expansion

The strategic alliances MongoDB has formed, particularly with cloud service giants like AWS and Google Cloud, provide it with a fortified position in the cloud-native database market. Such partnerships enhance MongoDB’s capabilities and expand its reach, making it a crucial tool for enterprises venturing into the technological future. The cloud-native database industry is projected for significant growth, further increasing MongoDB’s attractiveness as a long-term investment prospect.

Predictions and Market Trends

As businesses continue to prioritize data-driven decision-making, the demand for databases that can support big data analytics and innovative tech solutions will soar. MongoDB is strategically placed to capture a substantial share of this market. Analysts predict a strong upward trend in MongoDB’s growth trajectory, driven by its pioneering approach to database management and sustained investment in innovation and partnerships.

Conclusion

For those seeking investment opportunities at the intersection of technology and data management, MongoDB presents a compelling prospect. With its robust features, cloud-first orientation, and innovative spirit, MongoDB is not only adapting to today’s data demands but is also setting the stage for future advancements in the field. As the data landscape continues to evolve, MongoDB’s strategic positioning makes it a frontrunner in the pursuit of next-gen database solutions.

For more information, visit the official MongoDB website.



Database Trends: A 2024 Review and a Look Ahead – The New Stack

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts



Database Trends: A 2024 Review and a Look Ahead

Here’s a round-up of the big database influences in 2024 — like vector store, GraphQL, and open table formats — and what they portend for 2025.


Jan 2nd, 2025 6:28am



Image by Diana Gonçalves Osterfeld.

For databases, 2024 was a year for both classic capabilities and new features to take priority at enterprises everywhere. As the pressure increased for organizations to operate in a data-driven fashion, new and evolving data protection regulations all over the world necessitated better governance, usage, and organization of that data.

And, of course, the rise of artificial intelligence shined an even brighter light on the importance of data accuracy and hygiene, enabling accurate, customized AI models and contextualized prompts to be constructed. Key to managing all of this have been the databases themselves, be they relational, NoSQL, multimodel, specialized, operational, or analytical.

In 2025, databases will continue to grow in importance, and that’s not going to stop. Even if today’s incumbent databases are eventually eclipsed, the need for a category of platforms that optimize the storage and retrieval of data will persist because they are the essential infrastructure for high-value, intelligent systems.

Remember, there’s nothing magic or ethereal about “data.” Data is simply a set of point-in-time recordings of things that happened, be it temperatures that changed, purchases that were made, links that were clicked, or stock levels that went up or down. Data is just a reflection of all the people, organizations, machines, and processes in the world. Tracking the current, past, and expected future state of all these entities, which is what databases do, is a timeless requirement.

The most dominant database platforms have been with us for decades and have achieved that longevity by adopting new features reflecting the tech world around them while staying true to their core mission of storage and querying with the best possible performance.

Decades ago, it was already apparent that all business software applications were also database applications, and that’s no less true today. But now that truth has expanded beyond applications to include embedded software at the edge for IoT; APIs in the cloud for services and SaaS offerings; and special cloud infrastructure for AI inferencing and retrieval.

What’s Our Vector, Victor?

One big change of late, and one that will continue into 2025, is the use of databases to store so-called vectors. A vector is a numerical representation of something complex. In physics, a vector can be as simple as a magnitude paired with a direction. In the data science world, a vector can be a concatenated encoding of machine learning model feature values.

In the generative AI world, the complex entities represented by vectors include the semantics and content of documents, images, audio, and video files, or pieces (“chunks”) thereof. A big trend that started in past years but gained significant momentum in 2024 and that will increase in 2025 is the use of mainstream databases to store, index, search and retrieve vectors. Some databases are serving as platforms on which to generate these vector embeddings, as well.
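As a toy illustration of how vector similarity works (the vectors below are made up; real embeddings come from a model and have hundreds or thousands of dimensions):

```python
# Toy illustration of vector similarity between two embeddings.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

doc_vec = [0.12, 0.87, 0.33]    # embedding of a stored chunk
query_vec = [0.10, 0.80, 0.40]  # embedding of a search query
print(cosine_similarity(doc_vec, query_vec))  # closer to 1.0 means more similar
```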

This trend goes beyond the business-as-usual practice of operational database players adding to their feature bloat. In this case, it’s a competitive move meant to counter vector database pure-play vendors like Pinecone, Zilliz, Weaviate, and others. The big incumbent database platforms, including Microsoft SQL Server, Oracle Database, PostgreSQL, and MySQL on the relational side, and MongoDB, DataStax/Apache Cassandra, Microsoft Cosmos DB, and Amazon DocumentDB/DynamoDB on the NoSQL/multimodel side, have all added vector capabilities to their platforms.

These capabilities usually start with the addition of a few functions to the platform’s SQL dialect to determine vector distance and then extend to support for a native VECTOR data type, including string and binary implementations. Many platforms are also adding explicit support for the retrieval augmented generation (RAG) programming pattern that uses vectors to contextualize the prompts sent to large language models (LLMs).
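A minimal sketch of the RAG pattern itself might look as follows, where `vector_search` and `llm_complete` are hypothetical stand-ins for a vector store query and an LLM call, not a specific vendor API:

```python
# Hedged sketch of retrieval-augmented generation (RAG): retrieve the
# nearest chunks by vector similarity, then contextualize the prompt.
def answer_with_rag(question: str, vector_search, llm_complete, k: int = 3) -> str:
    chunks = vector_search(question, top_k=k)   # nearest chunks from a vector store
    context = "\n\n".join(chunks)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)                 # contextualized prompt goes to the LLM
```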

Where does this leave specialist vector databases? It’s hard to say. While those platforms will emphasize their higher-end features, the incumbents will point out that using their platforms for vector storage and search will help avoid the complexity that adopting an additional, purpose-specific database platform can bring.

GenAI for Developers and Analysts

Vector capabilities are not the only touch point between databases and generative AI, to be sure. Sometimes, the impact of AI is not on the database itself but rather on the tooling around it. In that arena, the biggest tech at the intersection of databases and generative AI (GenAI) is a range of natural language-to-SQL interfaces. The ability to query a database using natural language is now so prevalent that it has become a “table stakes” capability. But there’s a lot of room left for innovation here.

For example, Microsoft provides not just a chatbot for generating SQL queries but also allows inline comments in SQL script to function as GenAI prompts that can generate whole blocks of code. It also provides for code completion functionality that pops up on the fly right as developers are composing their code.

On the analytics side, Microsoft Fabric Copilot technology lends a hand in notebooks, pipelines, dataflows, real-time intelligence assets, data warehouses and Power BI, both for reports and DAX queries. DAX — or Data Analysis eXpressions — is Microsoft’s specialized query language for Power BI and the Tabular mode of SQL Server Analysis Services. It’s notoriously hard to write in the opinion of many (including this author), and GenAI technology makes it much more accessible.

Speaking of BI, analytical databases have AI relevance, too. In fact, in July of this year, OpenAI acquired Rockset, a company with one such platform, based on the open source RocksDB project, to accelerate its platform’s retrieval performance. Snowflake, a relational cloud data warehouse-based platform, also supports a native VECTOR type along with vector similarity functions, and its Cortex Search engine supports vector search operations. Snowflake also supports six different AI embedding models directly within its own platform. Other data warehouse platforms support vector embeddings, including Google BigQuery. On the data lakehouse side, Databricks is in the vector game, too.

‘OLxP’

Staying with analytical databases for a minute, another trend to watch for in 2025 will be that of bringing analytical and operational databases together. This fusion of OLTP (online transactional processing) and OLAP (online analytical processing) sometimes gets called operational analytics. It also garners names like “translytical,” and HTAP (hybrid transactional/analytical processing).

No matter what you call it, many vendors are bullish on the idea. This includes SAP, whose HANA platform was premised on it, and SingleStore, whose very name (changed from MemSQL in 2020) references the platform’s ability to handle both. Snowflake’s Unistore and Hybrid Tables features are designed for this use case as well. Databricks’ Online Tables also use a rowstore structure, though they’re designed for feature store and vector store operations, rather than OLTP.

Not everyone is enamored of this concept, however. For example, MongoDB announced in September of this year that its Atlas Data Lake feature, which never made it out of preview, is being deprecated. MongoDB seems to be the lone contrarian here, though.

Data APIs and Mobile

That’s not the only territory MongoDB has retreated from while others have rushed in. MongoDB also announced the deprecation of its Atlas GraphQL API. Meanwhile, Oracle REST Data Services (ORDS), Microsoft’s Data API builder, and AWS AppSync add GraphQL capabilities to Oracle Autonomous Database/Oracle 23ai, Microsoft Azure SQL Database/Cosmos DB/SQL Database for Postgres, and Amazon DynamoDB/Aurora/RDS, respectively.

What about mobile databases? At one time, they were a major focus area for Couchbase, for Microsoft, and for MongoDB, with its Atlas Device SDKs/Realm platform. Couchbase Mobile is still a thing, but Microsoft Azure SQL Edge at least nominally shifts the company’s focus to IoT use cases, and MongoDB has officially deprecated Atlas Realm, its Device SDKs, and Device Sync (though the on-device mobile database will continue to exist as an open source project). It’s starting to look like purpose-built small-footprint databases, including SQLite, and perhaps Google’s Firebase, have withstood the shakeout here. Clearly, using one database platform for every single use case is not always an efficacious choice.

Multimodel or Single Platform?

Is the same true for NoSQL/multimodel databases, or can conventional relational databases be a one-stop shop for customers’ needs? It’s hard to say. Platforms like SQL Server, Oracle and Db2 added graph capabilities in past years, but adoption of them would seem to be modest. Platforms like MongoDB and Couchbase still dominate the document store world. Cassandra and DataStax are still big in the column-family category, and Neo4j, after years of competitive challenges, still seems to be king of the graph databases.

But the RDBMS jack-of-all-trades phenomenon isn’t all a mirage. Mainstream relational databases have bulked up their native JSON support immensely, with Microsoft having introduced in preview this year a native JSON data type on Azure SQL Database and Azure SQL Managed Instance. Microsoft also announced at its Ignite conference in November that SQL Server 2025 (now in private preview) will support a native JSON data type as well.

Oracle Database, MySQL, Postgres and others have for some time now had robust JSON implementations too. And even if full-scale graph implementations in mainstream databases have had lackluster success, various in-memory capabilities in the major database platforms have nicely ridden out the storm.

Multimodel NoSQL has shown real staying power as well. Microsoft’s Cosmos DB supports document, graph, column-family, native NoSQL and full-on Postgres relational capabilities, in a single umbrella platform. Similarly, DataStax explicitly supports column-family, tabular, graph and vector, while Couchbase supports document and key-value modes.

Data Lakes, Data Federation and Open Table Formats

One last area to examine is that of data virtualization and federation, along with increasing industry-wide support for open table formats. The requirement of cross-data-source querying has existed for some time. Decades ago, the technology existed in client-server databases for querying external databases, with technologies like Oracle Database Links, Sybase Remote Servers and Microsoft SQL Server linked servers. Similarly, a killer feature of Microsoft Access over 30 years ago was its Jet database engine’s remote table functionality, which could connect to data in CSV files, spreadsheets, and other databases.

With the advent of Hadoop and data in its storage layer (i.e., what later came to be known as data lakes), bridging conventional databases to “big data” became a priority, too. Starting with a technology called SQL-H in the long-gone Aster Database, acquired by Teradata in 2011, came the concept of an “external” table. By altering the CREATE TABLE syntax, a logical representation of the remote table could be created without physically copying it, but the query engine could still treat it as part of the local database.

Treating remote data as local is also called data virtualization. Joining a remote table and a local one (or multiple remote tables across different databases) in a single SQL SELECT is called executing a federated query. To varying degrees, both data virtualization and federation have been elegant in theory but often lacking in performance.
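One accessible way to see the idea in action is DuckDB’s Python API, which lets a single SELECT join a local table against an external Parquet file; the file path below is a placeholder:

```python
# Illustrative federated-style query with DuckDB: one SELECT joins a local
# table against an external Parquet file, queried in place.
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE customers (id INTEGER, name VARCHAR)")
con.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")

rows = con.execute("""
    SELECT c.name, SUM(o.total) AS revenue
    FROM customers AS c
    JOIN read_parquet('./orders.parquet') AS o  -- external data source
      ON o.customer_id = c.id
    GROUP BY c.name
""").fetchall()
print(rows)
```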

To help address this performance gap, open table formats have come along, and this year, they have become very important. The top contenders are Delta Lake and Apache Iceberg, with Apache Hudi coming in as somewhat of an also-ran. In practical terms, all three are based on Apache Parquet. Unlike CSV, other text-based file formats, or even Parquet itself, data stored in open table formats can be queried, updated, and managed in a high-performance, high-consistency manner.
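For a taste of working with an open table format from code, the `deltalake` Python package (bindings to the delta-rs implementation) can read a Delta Lake table directly; the table path is a placeholder:

```python
# Reading an open-table-format table with the `deltalake` package.
from deltalake import DeltaTable

dt = DeltaTable("./data/events")  # directory containing a Delta Lake table
print(dt.version())               # tables are versioned, enabling time travel
df = dt.to_pandas()               # materialize the current snapshot
print(df.head())
```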

In fact, for its Fabric platform, Microsoft reworked both its original data warehouse engine and its Power BI engine to use Delta Lake as a native format. Snowflake did the same with Iceberg and other vendors have followed suit. Meanwhile, there are still a variety of database platforms that connect to data stored in open table formats as external tables, rather than as truly native ones.

Next year, look for open table format support to become increasingly robust, and get ready to devise a data strategy based upon it. With support for these formats, there’s a good chance that many, if not most, database engines will be able to share the same physical data stored in these formats, query it at native speeds, and operate on it in both a read and write capacity. Proprietary formats may slowly be giving way, and platforms in the future may succeed on their innate capabilities more than the feedback loop between their dominance and resulting data gravity.

Feature and Skillset Equilibrium

Ultimately, a single database platform cannot be all things to all customers and users. Many of the incumbent platforms try to support the full complement of use cases, but most end up having a litany of new features added over the years that turn out to be more gimmick than mainstay.

Luckily, the new-fangled features in incumbent databases tend to go through a Darwinist process, eventually distilling down to a core set of capabilities that likely won’t achieve parity with those of specialized database platforms but will nonetheless be sufficient for a majority of customers’ needs. As the superfluous capabilities are whittled away, important workhorse features get onboarded, adopted, and added to the mainstream database and application development canon.

The market works as it should: incumbents add features in response to competitive pressures from new players that they likely would not have added on their own initiative. This allows room for innovative new players but also lets customers who wish to leverage their investments in existing platforms do just that.

It’s interesting how the fundamentals of relational algebra, SQL queries, and the like have stayed relevant over decades, but the utility, interoperability and applicability of databases keep increasing. It means the skillsets and technologies are investments that are not only safe but also, like good financial investments, grow in value and pay dividends.

Some customers will want a first-mover advantage, and so brand-new, innovative platforms will appeal to them. But many customers won’t want to re-platform or trade away their skillset investments and will prefer that vendors widen the existing road rather than introduce detours in it. Those customers should demand and place their bets with the vendors that embrace such approaches. And in 2025, they should expect vendors to welcome and accommodate them.


TNS owner Insight Partners is an investor in: SingleStore, Databricks.


The 9 Largest NYC Tech Startup Funding Rounds of December 2024 – AlleyWatch

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Armed with some data from our friends at CrunchBase, I broke down the largest NYC startup funding rounds of December 2024. The analysis includes supplementary details like industry, company descriptions, round types, and total equity funding raised to provide a more comprehensive view of the venture capital landscape in NYC.


9. Cofactr $17.2M
Round: Series A
Description: Cofactr allows companies to optimize every step of their electronics supply-to-manufacturing journey. Founded by Matthew Haber and Phillip Gulley in 2021, Cofactr has now raised a total of $28.4M in total equity funding and is backed by Y Combinator, Pioneer Fund, Correlation Ventures, Bain Capital Ventures, and DNX Ventures.
Investors in the round: Bain Capital Ventures, Broom Ventures, DNX Ventures, Floating Point, Y Combinator
Industry: Software, Supply Chain Management
Founders: Matthew Haber, Phillip Gulley
Founding year: 2021
Total equity funding raised: $28.4M
AlleyWatch’s exclusive coverage of this round: Cofactr Raises $17.2M to Help Aerospace and Defense Companies Navigate Complex Supply Chains


8. Stainless $25.0M
Round: Series A
Description: Stainless is building the platform for high-quality, easy-to-use APIs. Founded by Alex Rattray in 2021, Stainless has now raised a total of $28.5M in total equity funding and is backed by Sequoia Capital, Felicis, The General Partnership, Zapier, and MongoDB Ventures.
Investors in the round: Felicis, MongoDB Ventures, Sequoia Capital, The General Partnership, Zapier
Industry: Developer APIs, Developer Platform, Developer Tools, Enterprise Software, SaaS
Founders: Alex Rattray
Founding year: 2021
Total equity funding raised: $28.5M


7. Sollis Health $33.0M
Round: Series B
Description: Sollis Health offers concierge medical centers that provide on-demand care for families, including same-day visits and virtual care options. Founded by Andrew Olanow, Benjamin Kruger, and Dr. Bernard Kruger in 2017, Sollis Health has now raised a total of $80.4M in total equity funding and is backed by Foresite Capital, Torch Capital, Montage Ventures, Arkitekt Ventures, and Friedom Partners.
Investors in the round: Foresite Capital, Friedom Partners, Montage Ventures, One Eight Capital, Read Capital, Torch Capital
Industry: Health Care, Health Diagnostics, Medical, Personal Health
Founders: Andrew Olanow, Benjamin Kruger, Dr. Bernard Kruger
Founding year: 2017
Total equity funding raised: $80.4M
AlleyWatch’s exclusive coverage of this round: Sollis Health Raises $33M to Transform Emergency Healthcare with Concierge Model


6. Basis $34.0M
Round: Series A
Description: Basis provides AI agents that automate accounting workflows for professionals. Founded by Ryan Serhant in 2023, Basis has now raised a total of $37.6M in total equity funding and is backed by Khosla Ventures, BoxGroup, Daniel Gross, Jeff Dean, and Kyle Vogt.
Investors in the round: Aaron Levie, Adam D’Angelo, Amjad Masad, Avid Ventures, Azeem Azhar, Better Tomorrow Ventures, BoxGroup, Claire Hughes Johnson, Clem Delangue, Daniel Gross, Douwe Kiela, Jack Altman, Jeff Dean, Jeff Wilke, Khosla Ventures, Kyle Vogt, Larry Summers, Lenny Rachitsky, Michele Catasta, Nat Friedman, NFDG Ventures, Noam Brown
Industry: Accounting, Artificial Intelligence (AI), Information Technology
Founders: Ryan Serhant
Founding year: 2023
Total equity funding raised: $37.6M


5. Sage $35.0M
Round: Series B
Description: Sage is an operations management system that enhances senior caregiving efficiency. Founded by Ellen Johnston, Matthew Lynch, and Raj Mehra in 2020, Sage has now raised a total of $59.0M in total equity funding and is backed by IVP, Friends & Family Capital, Maveron, ANIMO Ventures, and Goldcrest Capital.
Investors in the round: ANIMO Ventures, Distributed Ventures, Friends & Family Capital, Goldcrest Capital, IVP, Maveron, Plus Capital
Industry: Apps, Social, Software
Founders: Ellen Johnston, Matthew Lynch, Raj Mehra
Founding year: 2020
Total equity funding raised: $59.0M
AlleyWatch’s exclusive coverage of this round: Sage Raises $35M to Modernize Senior Living Operations


4. S.MPLE by SERHANT $45.0M
Round: Venture
Description: S.MPLE by SERHANT provides an AI-powered professional support team for real estate agents. Founded by Ryan Serhant in 2020, S.MPLE by SERHANT has now raised a total of $45.0M in total equity funding and is backed by Camber Creek and Left Lane Capital.
Investors in the round: Camber Creek, Left Lane Capital
Industry: Commercial Real Estate, Real Estate, Real Estate Brokerage
Founders: Ryan Serhant
Founding year: 2020
Total equity funding raised: $45.0M
AlleyWatch’s exclusive coverage of this round: Real Estate Mega Broker Ryan Serhant Raises $45M for S.MPLE to Free Agents From Admin Work


3. Precision Neuroscience $102.0M
Round: Series C
Description: Precision Neuroscience is a neural platform that engages in brain-computer interface technology. Founded by Benjamin Rapoport, Demetrios Papageorgiou, Mark Hettick, and Michael Mager in 2021, Precision Neuroscience has now raised a total of $248.0M in total equity funding and is backed by General Equity Holdings, Alumni Ventures, B Capital, Draper Associates, and Duquesne Family Office.
Investors in the round: B Capital, Duquesne Family Office, General Equity Holdings, Steadview Capital
Industry: Medical, Medical Device, Neuroscience, Product Research
Founders: Benjamin Rapoport, Demetrios Papageorgiou, Mark Hettick, Michael Mager
Founding year: 2021
Total equity funding raised: $248.0M


2. Public $105.0M
Round: Series D
Description: Public is a fractional investing platform that allows members to build diverse portfolios, including stocks, ETFs, crypto, and NFTs. Founded by Jannick Malling, Leif Abraham, Matt Kennedy, Peter Quinn, and Sean Hendelman in 2019, Public has now raised a total of $413.5M in total equity funding and is backed by Accel, Bossa Invest, Scott Belsky, Inspired Capital Partners, and Lakestar.
Investors in the round: Accel
Industry: Cryptocurrency, FinTech, Stock Exchanges, Trading Platform
Founders: Jannick Malling, Leif Abraham, Matt Kennedy, Peter Quinn, Sean Hendelman
Founding year: 2019
Total equity funding raised: $413.5M


1. Cleerly $106.0M
Round: Series C
Description: Cleerly is a digital healthcare company that offers heart disease diagnosis solutions. Founded by James K. Min in 2017, Cleerly has now raised a total of $386.5M in total equity funding and is backed by Novartis, Fidelity, Insight Partners, T. Rowe Price, and Sands Capital Ventures.
Investors in the round: Battery Ventures, Insight Partners
Industry: Apps, Artificial Intelligence (AI), Health Care, Medical, Wellness
Founders: James K. Min
Founding year: 2017
Total equity funding raised: $386.5M


Article originally posted on mongodb google news. Visit mongodb google news



How to Go from Copy and Paste Deployments to Full GitOps

MMS Founder
MMS Ben Linders

Article originally posted on InfoQ. Visit InfoQ

InnerSource helped reduce the amount of development work involved when introducing GitOps by sharing company-specific logic, Jemma Hussein Allen said at QCon London. In her talk, she showed how they went from copy and paste deployments to full GitOps. She mentioned that a psychologically safe environment is really important for open and honest discussions that can help resolve pain points and drive innovation.

Their version control tool at the time was Subversion; they later moved to a more popular distributed version control system – GitHub, the Git-based developer platform.

When they started the implementation of GitOps, their servers were running a LAMP stack (Linux, Apache, MySQL and PHP). This was a standard software stack for PHP web applications, Hussein Allen mentioned, which they didn’t want to change as the application was running well and the focus was on deployment automation.

The CI/CD tool of choice was Jenkins because of the flexibility in pipeline building block configuration and the large number of plugin integrations with other tools that were available, Hussein Allen said. Puppet was used for configuration management as it was already implemented for the more recent deployments and worked well, so the decision was made to continue with it, she mentioned.

Hussein Allen mentioned the four GitOps principles:

  • Declarative – the desired system state needs to be defined declaratively
  • Versioned and Immutable – the desired system state needs to be stored in a way that is immutable and versioned
  • Automatic pulls – the desired system state is automatically pulled from source without manual interventions
  • Continuous reconciliation – the system state is continuously monitored and reconciled

The principles they found the most important were a declarative system state definition, solid versioning and continuous reconciliation, because of the benefits they brought in terms of faster development and deployments, Hussein Allen said.
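To make the reconciliation principle concrete, here is a minimal, self-contained sketch of a pull-based reconciliation loop in Python. The in-memory DESIRED_IN_GIT and LIVE_SYSTEM stand-ins and all function names are hypothetical placeholders for illustration, not part of any particular GitOps tool:

    import time

    # In-memory stand-ins for the Git repo and the running system, purely
    # hypothetical so that the loop below is runnable as a demonstration.
    DESIRED_IN_GIT = {"web": "v1.4.2", "worker": "v2.0.1"}  # manifests in Git
    LIVE_SYSTEM = {"web": "v1.4.1"}                          # what runs now

    def fetch_desired_state() -> dict:
        # A real agent would pull the repository and parse its manifests.
        return dict(DESIRED_IN_GIT)

    def read_live_state() -> dict:
        # A real agent would query the running environment.
        return dict(LIVE_SYSTEM)

    def apply_changes(diff: dict) -> None:
        # A real agent would roll out only the changed resources.
        LIVE_SYSTEM.update(diff)

    def reconcile_once() -> bool:
        """One pass: pull desired state, diff against live state, converge.
        Returns True if anything had drifted."""
        desired = fetch_desired_state()
        live = read_live_state()
        diff = {k: v for k, v in desired.items() if live.get(k) != v}
        if diff:
            apply_changes(diff)
        return bool(diff)

    # Continuous reconciliation: the loop, not a human, closes the gap.
    while reconcile_once():
        time.sleep(1)  # a real agent would poll Git or subscribe to events
    print(LIVE_SYSTEM)  # now matches DESIRED_IN_GIT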

The removal of all manual interventions is a principle they applied more cautiously, Hussein Allen said, as it required a fully comprehensive set of automated tests, high availability, a very mature monitoring solution, and compatibility with team ways of working.

Developers customised their workspaces with Docker, as Hussein Allen explained:

We set up an image registry where developers could download base images and use these as building blocks to develop and test new features locally before integrating them into the multi-environment test and deployment workflow.

Before the changes, developers would change code locally and deploy to the development environment to test, Hussein Allen said. The challenge with this testing method came when multiple developers needed to test different changes in the development environment at the same time, she explained:

When the development environment included multiple changes, it didn’t allow for testing in isolation and didn’t provide reliable results. The image registry meant that developers could test their own changes locally, against the code running in production, in isolation before integrating the change into the main development testing pool.

A common theme that emerged from developer feedback on GitOps was the need for building blocks and a "quick start" guide to help teams adopt it more quickly. Introducing an InnerSource capability encouraged developers to create and contribute to these building blocks and boilerplates, Hussein Allen said. The increased contribution to shared resources directly improved the speed at which developers could adopt new tooling.

As developer requirements evolve, creating an open dialogue within a psychologically safe environment is invaluable for understanding their needs, Hussein Allen said. Establishing regular connections and solid offline communication channels ensures developers stay up to date with the platform roadmap and any alterations that could impact their work. These channels also provide valuable opportunities for developer feedback and suggestions, which can be integrated into the platform strategy, she concluded.

InfoQ interviewed Jemma Hussein Allen about adopting GitOps.

InfoQ: What did you do to help developers with the transition from the old way of working?

Jemma Hussein Allen: We spent time training or pair programming with developers. Some teams took longer to transition to the newer way of working, mainly due to heavy workloads, as it took time for developers to familiarise themselves with the new process and become as efficient using it as they were with the old way of working.

Working with team product owners and stakeholders to show the benefits of the new process helped to give developers the bandwidth to learn and adopt the new tooling into their daily work.

InfoQ: What’s your approach to knowing the needs of the developers and keeping the dialogue going?

Hussein Allen: What we found helpful in understanding developer needs was providing the opportunity for platform engineers and developers to work more closely together. In organisations with a centralised platform team structure, initiatives such as “Walk a day in my shoes” where platform engineers are embedded into product teams for a short time and vice versa can be really valuable to get an understanding of any pain points or improvements that can be made to the platform.



Presentation: Panel: What Does the Future of Computing Look Like

MMS Founder
MMS Julia Lawall Matt Fleming Joe Rowell Thomas Dullien

Article originally posted on InfoQ. Visit InfoQ

Transcript

Dullien: We will discuss a little bit about how the hardware has changed from the mental model that we all used to have when we started computing, and what that implies for things like operating system interfaces, what this implies for things like observability, what this implies for benchmarking. Because one of the themes I think we’ve seen in the track was that, quite often, software is optimized for both the hardware platform and a use case that is no longer current. Meaning, we build some software, we design it for a particular use case, for a particular hardware, and then by the time that anybody else gets to use that software, it’s years later, and everything has changed.

The computing infrastructure is undergoing more change at the moment than it used to undergo, let’s say, from 2000 to 2020 with the arrival of accelerators, with the arrival of heterogeneous cores, with the extreme availability of high I/O NVMe drives and so forth. We’ll discuss a little bit how this is impacting everything. When was the last time that you saw something in software that was clearly designed for hardware that is no longer a reality?

Fleming: Data systems, various kinds of databases, streaming applications that don't respect the parallelism required to achieve peak performance from disks.

Rowell: I see a lot of code on a regular basis that’s written that assumes a very particular model of x86 that has not existed for the last 30 years and will not exist for the next 30.

Lawall: There’s a lot of applications that when they want to be parallel, they just make themselves n threads on n cores, without regarding the fact that maybe those are hyper-threads, and it’s not always useful to overload the actual physical threads, not taking into account P cores and E cores, performance and energy saving cores, not taking into account perhaps virtualization issues. Perhaps we need some better model to allow the applications to figure out what is the best way to get performance out of the given machine.

Existing Programming Languages and GPU Throughput Computing

Dullien: Joe’s talk talked about code mixing regular CPU bound code and GPU code. I don’t think any of us ever expected that we would be running one C program in one address space on which two entirely different CPUs would be operating at the same time. What’s everybody’s thoughts on, are the programming languages and paradigms we use at the moment actually very adapted to this, or will we need different programming languages to deal with the arrival of GPU throughput computing in our service?

Fleming: I’m a big believer in modularization being one of the superpowers of this industry. I think we’ll find a way to basically compartmentalize this type of hardware, so that we only have to deal with one language at once. I think we’ll maybe get new languages for specific things, but not an all-encompassing language. We’ll still manage to treat these things as composable units.

Rowell: I think I feel the same way, in so far as I think a lot of the issues I raised in my talk can be properly handled by good coding discipline. For example, there's been a trend in some C projects lately to not actually use pointers, but to instead hand out a handle, essentially, and then use that in relation to some base object. This probably would have solved some of the issues that I had, because if you have a different object, suddenly that handle has no meaning. I think you could still use these things. I think we need to start treating the old patterns as antipatterns, rather than as something that we should just do because we want to, essentially.

Lawall: I would also think that maybe it's not new programming languages. Maybe we don't necessarily want to expose all of these details to the programmer. A programming language is the way a person thinks about the software they're trying to develop, and maybe it's the software's modular structure, as has been mentioned, that should address these issues of where this code should run, on what kind of hardware, using what kind of resources, and so on.

The Wall Between Pure Software Engineering and Underlying Hardware

Dullien: What I got from the entire row was that it's more about the actual software engineering and less about the programming language. For a lot of my youth, I was told: don't worry about the low-level details, you write software at a high level, the application level, and the magical compiler engineers and the magical electrical engineering wizards will make everything fast in the end. What I'm observing, at least, is a bit of a dissolution of that wall between pure software engineering, in the sense of application programmers and the idea that they don't need to know anything about the underlying hardware, as the hardware becomes more complicated or more heterogeneous. What are your thoughts on that wall and its dissolution?

Fleming: I think a lot of this may be cyclical, so you have peaks and waves where hardware accelerates and maybe you don’t have to care, and then times where you do have to care. I think we’re in that time now where it’s very important for programmers to know how the hardware works and to take full advantage.

Rowell: I think I agree with the sentiment, but I actually think it’s a slightly different issue, which is, you get to these situations where you can’t express efficiently what it is you want to say, or at least the compiler is not clever enough, and so you write something that does exactly what you want, but then any future compiler will never be able to write something better. It’s like with Richard’s talk. Because everything was written in terms of shifts, the compiler was never going to go, that should be a multiplication instead, I know better. I think you end up locking yourself into a situation where you’ve written something that’s good now but won’t be good forever. If any of you are clever enough to do this, I’d really love to see someone write a tool that will automatically go through and try to undo all of these performance tweaks and then see what you get out the other side, because I think that would be a really interesting case study.

Lawall: Maybe it’s naive on my part, but I would hope that compilers would figure this out over time, once people know what they want, or once the evolution of the hardware stabilizes, reaches a plateau for a certain amount of time, then presumably the compiler should be able to step up. That’s what has happened in the past. It can definitely be necessary that maybe you need to give some options or something like that, like favor a GPU for this or something, but hopefully the compilers would fill in the gap.

Dullien: Historically, we’ve had both successes on the sufficiently smart compiler and we had some failures on the sufficiently smart compiler front. It’s interesting to see that CUDA is somewhere in the middle. Because CUDA is C code, but it’s also a special dialect of C code that provides extra information.

Making The Linux Scheduler More User Configurable

Another thing I wanted to talk about is the heterogeneity of workloads: Meta, or Google, or whoever, having very specific workload problems that may not be shared by everybody else in the industry. The Linux kernel tries to serve many masters in terms of the scheduler, and the C++ language likewise. We're seeing a push on the language design front from Google, or Meta, or whatever, to change parts of the spec. Also, in terms of operating systems, we're seeing a push from these giants to provide certain features, like a configurable scheduler that can help them with their problems. I would love to hear a little bit about the efforts to make the Linux scheduler more user configurable.

Lawall: You have your particular software, it has particular scheduling needs, and so you'd like to write a scheduling policy that is particularly tailored to your piece of software. The difference between the Meta and Google efforts is that Meta keeps you at the kernel level. You write your code in BPF, and then you obtain a kernel module. It's basically like putting the scheduler in a kernel module, but you're doing it in a safer way, in the sense that you're writing BPF. On the other hand, the Google effort works at user level. You write your code in whatever language you like, perhaps using your debugger and your traditional development environment, and then efficient messages are sent down to the kernel to make whatever you requested happen.

Both of these efforts reflect a frustration, perhaps, that the current scheduler that tries to do everything for everyone is actually not succeeding at being optimal for particular jobs that these companies are particularly interested in. Not just Google, Meta, but other companies also obviously have particular jobs that they want to work well. Some kind of database you could imagine has some very particular scheduling requirements. It seems like it’s very hard to resolve this distance between, we have very specific requirements, we want very high performance and so on, and the goal of being completely general. There’s an effort to open things up.

Then there’s also the question of, what do you make available? Do you make everything available in terms of, what can you configure? If you allow people to configure everything, then maybe they should just be writing their own scheduler in C and integrating it to the kernel. If you think in advance about, people will likely want to update these things, then you may miss something that people actually need, and then it will be somehow unsatisfactory, because if they’re missing some expressivity, then they won’t be able to use the approach at all. These things are just evolving. We haven’t reached a perfect version at the moment.

One aspect is just to somehow speed up the evolution time, to be able to write and maintain policies that are adapted to particular software. Another aspect is to speed up the testing time. We talked a little bit during my talk about how maybe we don't want to recompile the kernel, because it takes some time and it's a bit obscure how to do it. Once you learn how to do it, it's pretty easy, but I cannot deny that it takes some time. There are also certain kinds of execution environments that require a lot of setup time, and so the whole development and test cycle can get very long.

If you can just update things dynamically from the user level, then you don't have the reboot time and the cost of restarting your entire execution environment. It definitely shows an interest across different communities in actually thinking about the scheduler, and even about other operating system components. You can think about how you could dynamically change the memory manager, and there's a long history of letting applications specify their own networking policies. This idea that you should bypass at least some of the kernel and manage these resources on your own is starting to spread to other resource management problems. It's something interesting that's evolving, and we'll see how it goes in the coming years.

Introspection (Pain Points)

Dullien: When it comes to performance work, I think we've all run into the issue that the systems we're working on are not as inspectable as we would like them to be. Can you name a tool that you don't have yet that you would like to have? Can you think of the last time you tried to resolve an issue and wished you had better introspection into x in order to do y? Is there an itch you would like to scratch when it comes to introspection?

On my main CPU, I know how the operating system can help me profile stuff. Everything related to GPUs is super proprietary: 250 layers of licensing restrictions, NDAs, and so forth from NVIDIA to do anything. I would just like to have a clean interface to actually measure what a GPU is doing.

Fleming: I’d like to have basically all the tools I have now, but they can tell me the cost of running certain functions, like monetary cost in the cloud. Like this function cost this amount of money. This network request costs this amount of money. Maybe I’ve been using them so long, I think the interfaces for the tools that I have are pretty good but it’s missing that aspect of the price performance problem, which is like, what is the price?

Rowell: The first thing I'd really like is a magic disassembler that takes every piece of proprietary code and tells me what it does. I spend a lot of time working with GPUs, and you get to a point very quickly where you have no idea what's going on. In fact, even if you open it in, say, GDB, you will see that you do have the functions and you do have the stack, but the names of these functions do not correlate to anything you would think they would. They're like _CUDA54 or things like that. They're completely opaque. The second thing I'd really quite like is better causal profiling.

Causal profiling is this idea that your system is very complicated, and so rather than sampling your call stack constantly, what you do is slow down one of the threads and see how much that changes the overall behavior of your application. The point is that rather than just speeding up a single hotspot, you're actually working out what the performance dependencies are in your program. Every time I've tried to use one of these, especially in a GPU context, it's actually ended up being harmful to my ability to understand what's going on. Having a better version of that would be really good for me.
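A toy rendition of the idea Rowell describes: inject an artificial slowdown into one stage of a two-stage pipeline and observe the effect on end-to-end throughput. This is only a conceptual illustration, not a real causal profiler such as Coz:

    import queue
    import threading
    import time

    def run_pipeline(stage_a_delay: float, duration: float = 2.0) -> int:
        """Two-stage pipeline; returns the number of items completed.
        `stage_a_delay` is extra latency injected into stage A only."""
        q: queue.Queue = queue.Queue(maxsize=16)
        done = threading.Event()
        completed = 0

        def stage_a() -> None:
            while not done.is_set():
                time.sleep(0.001 + stage_a_delay)  # base work + slowdown
                try:
                    q.put(1, timeout=0.1)
                except queue.Full:
                    pass

        def stage_b() -> None:
            nonlocal completed
            while not done.is_set():
                try:
                    q.get(timeout=0.1)
                except queue.Empty:
                    continue
                time.sleep(0.002)  # stage B is the actual bottleneck here
                completed += 1

        for target in (stage_a, stage_b):
            threading.Thread(target=target, daemon=True).start()
        time.sleep(duration)
        done.set()
        return completed

    # If slowing stage A barely changes throughput, stage A is not on the
    # critical path, so optimizing it would not help overall performance.
    baseline = run_pipeline(stage_a_delay=0.0)
    slowed = run_pipeline(stage_a_delay=0.001)
    print(f"baseline: {baseline} items; stage A slowed: {slowed} items")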

Lawall: I was actually really inspired by what Matt talked about with change point detection (CPD), which can show you where the change points are in your execution, and by the idea of being able to zoom in immediately onto those change points, figure out what changed there, and see what different resources are involved. You have a long execution, maybe it runs for hours, and you find that its overall execution time is slower than you expected, so the ability to zoom in on exactly the place where things started going badly would be very nice.

Computer Science/Developer Ed and Emphasis on Empirical Science

Dullien: One thing I’ve observed in Matt’s talk was, at the end of the talk, there was somebody asking, have you worked with a data scientist on your problem. One thing that haunts me personally when doing performance work, my background is originally pure mathematics with a minor in computer science, and it turns out that there’s very little empirical science like hypothesis testing, statistics, if you choose that education. What are your thoughts on, does computer science education or software developer education need more emphasis on empirical science in order to deal with the complexity of modern systems? Because the reality is, in my computer science studies, it was all about, here’s an abstract model of computation, here’s some asymptotic analysis, and so forth.

The fact that a modern computer is a bunch of interlocking systems that cannot be reasoned about from first principles but need empirical methods just wasn’t a thing. With the increased complexity of modern hardware, do we need a change in computer science education to have more focus on empirical methods to understand your systems?

Fleming: Yes. I think if you're doing interesting work, sooner or later you come across a problem that nobody, or very few people, have hit before. It doesn't happen a lot, but eventually you run into a compiler issue, a library issue, something that there is no Stack Overflow answer or GitHub issue for. The ability to move quickly through the problem space comes down to hypothesis testing and being able to cut off certain branches of the decision tree as you move through. In my experience, I was never taught to do this, and I've not seen a lot of people demonstrate it, apart from the really good debuggers and engineers. I think it's something the whole industry would benefit from.

Rowell: I think that we’re actually sitting in a very exciting period of time. For those of you who have grown up in the UK, you’ll know that for a long time, computer science education in the UK was basically Excel. You sat down, you went to class, you just singled ICT, which was how you made PowerPoints and stuff like that. As a result, at university level, there wasn’t really much background that you could assume. Actually, if you had done any computer science before, maybe it would be your third year of university before you learned something that was truly novel to you. I really hope that in future years, this changes. I hope that we take this opportunity with having computer science actually being taught at a younger age to update what we’re teaching in further education.

Lawall: When I talk with people, I see a lot of the feeling that we have to think through this somehow. Thinking through things, trying to understand what the algorithms are and so on, is definitely very important, but I think there needs to be more of a balance between trying to reason through things and doing experiments, and more thought about how we can do those experiments and get out the relevant information. People tend to try to just think things through independently, because it's very hard at the moment to extract actual information from the huge amount of data that's collected if you try to trace your code. We talked in the beginning about accelerators and so on. Things are going to get even more complicated with the different kinds of hardware that are available, and teasing apart all that information to show you, as in Joe's talk, that your memory is behaving badly because of sharing with your GPU. Something is going badly, but what is it? That seems currently very hard to do at very large scale.

Path Dependence in Tech

Dullien: There’s often a path dependence in technology, where, for example, we write some code, we use a compiler to compile this code, and then we design the next CPU so it runs the code that we’ve already compiled, more quickly, which locks us into one path, because now trying anything else will make us more slower. Have you encountered something that looks like this was a path dependence that nobody would ever build again in the same way if they could start from scratch in computing?

Fleming: I think I’ve seen the consequences of that, rather than actual clear, bona fide examples of that. I think that this comes back to people’s inability to assess things from first principles, particularly performance and systems. They assume that what they have today doesn’t need revisiting. That what is there is there and it doesn’t need to be changed. This has a lot of implications for secondary systems, where actually, if you redesigned it, you would get more cheaper performance. I don’t think that we take enough looks at stuff like that.

Rowell: There’s a very famous example that came up recently, which is floating-point control words. If you ever look at any of the floating-point specifications, there’s various flags that control how rounding is done, how things are discarded, and stuff like that. I think it was last year we found out that if you ran one program that was compiled with, I don’t care, do whatever, it set that flag for all of the programs running on your CPU. That’s completely unbelievable. Of course, it’s a legacy from when, actually, we didn’t care about this so much, where maybe you had one program or it was a system-wide decision. I don’t think anyone would ever design it like that now. I think it’s just asinine.

Lawall: At least, I think operating systems are very much a collection of heuristics. This goes back to what I was saying before about the user level scheduling. The existing operating systems are large collections of heuristics. It’s very hard to tweak those. You can add more heuristics, but it’s hard to think about actually changing them, because you don’t know exactly why they are in that way anymore, and so that might break something that’s critical somehow, so people would like this kind of programmability so they can just throw away all those heuristics and start with their own thing. I think, in general, we’re stuck with this because we’ve always done it this way, and we need to maintain the performance that we had, so we can’t actually go to new design strategies or something like that.

Dullien: We have a bit of a lock-in to a local maximum that might not be a global maximum anymore.

Building/Developing with the Future in Mind

Given that constants or magic parameters chosen at one point in time for one hardware platform, which then expire, are so ubiquitous, is it a sensible idea to try annotating the parts of source code that are likely to expire with an expiry date? One of my favorite examples is a compression algorithm called Brotli; Brotli is in your browser these days. When it was created, the author of Brotli trained, essentially, a dictionary of common words based on the web corpus at the time, to compress the web corpus better.

At that point in time, Brotli got much better results than its competitors, but that was more than 10 years ago, and the web corpus has changed since. Nowadays, the Brotli spec contains this large collection of constant data that is of no use anymore, but can't be swapped out because it's the standard now. What are your thoughts on how we, on the software engineering side, can better manage things that are likely to expire in the future?

Fleming: I’ve definitely seen this problem. I’ve seen issues in the Linux scheduler, where ACPI tables from 2008 CPUs were used, when AMD EPYC came out, the values pulled out of the table were completely relevant to like a machine that was built 10 years later. I don’t know that people really think this way, and I think that’s the problem, that the thing I’m building now is designed for the performance of the systems today. I don’t think people would necessarily annotate or write documents in that way, though they should. If they did, I have this feeling that the annotations would be lost over time. I think it’s a much bigger problem than it would seem. It’s a mindset shift.

Rowell: I think I’d go a step further and say that it’s unknowable. I’ll give you an example. If you’ve ever written any C++ code, you will know that in any method, you have an implicit this pointer everywhere. Actually, the fact that you have a pointer everywhere means that you can never pass these objects in registers. You always have to push them onto the stack because you have to be able to take their address. I don’t think whenever they design a language, they would have known this, that this would have actually had an impact on performance. It’s called verification. We have a tendency to fix design points in our space to make it easier for us to reason about them. I think that’s unknowable in a way. It’s the same with security. I think we end up fixing certain things to make it easier to understand. I would agree. I would love it if my code refused to compile past a certain date, so that I could go back and fix things.

Lawall: It’s not just constants, it’s any kinds of design decisions if you have some labels like, this is a P core relevant process, and this is a E core relevant process. That might change over time as well. I think there need to be somehow more specifications of like, what was the purpose? More explanation in some way of what is the purpose of doing this particular computation or classification and so on. On the other hand, you could say, but developers will never want to do these things.

There’s like Rust, and Rust requires one to put all kinds of strange annotations on one’s types and things like that. Maybe there’s hope in the future that developers will start to become more aware that there’s like some knowledge in their head, and then they transmit this knowledge onto paper, and there’s more awareness of, there’s an information loss between the head and the paper, and that that lost information is not going to be able to be reconstructed easily in the future, and that’s going to be an important thing in the future.

Dullien: That makes the case not so much for better programming languages as for really expressive type systems. The advantage of a strong type system is that, while documentation gets out of date, the type system will at some point make the compiler refuse to compile.

Where Performance Data Collection and System Transparency Converge

Joe, you mentioned security. Speaking as somebody working between security and performance: there was a famous attempt at backdooring a compression library to then get a backdoor into OpenSSH, which would have been the dream of every attacker and fairly disastrous for every defender. The person who noticed it did so because it created 500 milliseconds of extra lag during an SSH login, and then used essentially performance tools as a first step to investigate what was going on and analyze the backdoor.

For me, having worked in security and now working in performance, it was gratifying to see the convergence: more introspectable systems are easier to reason about from a performance standpoint, but they also help you deal with security incidents. What are your thoughts on, first, the convergence between gathering performance data that can also be used for other purposes, such as security, and second, the importance of transparency in systems? We mentioned CUDA and NVIDIA's closed ecosystem before.

Fleming: Having this openness in open-source software is important, because security and performance are about different tradeoffs: you can usually increase security at the expense of functionality, and performance has a similar tradeoff where you get more performance for a specific use case. The ability for people to understand what's going on in their systems, given these tradeoffs, is critical. Repeatability, to me, is lumped into this whole openness thing as well, and maybe verification too: you need to be able to verify that the claims somebody makes are true, or that you can see the things other people are seeing.

Rowell: I’d agree with all of that. I think it’s very interesting when you consider that actually both of these things are observability problems and also performance problems in different ways. Just to give you some information on my background: prior to doing performance work, I did cryptography. In cryptography, you very much want your algorithms to always run at exactly the same speed no matter what. The reason why is because there are these very clever attacks where if you use certain instructions, you can essentially leak private information. That’s also a performance problem, but it’s a very different kind of performance problem. It’s not maximum throughput. It’s not having always the best you can possibly do. It’s just, don’t leak information. In that sense, observability is good and also bad, because if you can observe that, that’s the backdoor. It’s clear that observability is really very important to all of this.

Lawall: Performance is one thing you can observe, but you could observe other things as well. Other issues can also arise that might indicate security considerations. I would be inclined to put more emphasis on specifications, and on being able to continually ensure that the specifications match what the software does, as a way of ensuring that things are going in the same direction. Definitely, one needs to bring everything together.

Performance Engineering – Looking into the Future

Dullien: If you were to give somebody embarking on a career in performance engineering some advice about what to look into for the next couple of years – which is always terrible, because the nature of any advice is telling people about a world that doesn't exist yet, and you have no idea what's going to happen – if you were to advise yourself, or somebody coming into the field now, where would you put the emphasis in the next couple of years with regard to performance engineering?

Lawall: Take inspiration from the people who work on data, because there's a lot of opportunity to draw incorrect conclusions if you have bad methodologies: to believe that performance is improving or decreasing based on insufficient information. I think the data science people have a lot of interesting answers that we should be looking into.

Fleming: It’s kind of a golden answer or evergreen answer, which is, look for the places where people haven’t reassessed systems in a long time. I think that’s the interesting place to be. This happens in cycles, and you get it. Database is having a resurgence at the moment, there are people that are reevaluating the way you design databases. I would urge them to look for adjacent types of systems where maybe we haven’t reevaluated the way they’re designed.

Rowell: I think I’ll go in a slightly different direction by repeating the phrase that if you have a hammer, everything is a nail. I’d recommend that you all just learn random things. Because actually, in my own personal experience, oftentimes I’ve tried to apply standard tools like perf, or pahole, or whatever, to looking at a problem. Actually, normally, the insight that has helped me has been something completely random in the back of my head that I never would have told anyone else to learn. It’s important when you’re doing anything that has such general impact, to try to be a generalist in some way. Try to learn about as many different things as you can.

Dullien: It’s one of the nice things about the full-stack list of performance, you get an excuse to scavenge in everybody’s library.

Unikernels

Unikernels used to be quite heavily discussed a couple of years ago. The idea of a unikernel is essentially specializing a kernel to a particular workload, with some support from the hypervisor, to run an operating system tailored to that workload. What has happened to them? Where has this gone? Is this still a thing?

Lawall: From what I know about the work on unikernels, the idea is more about taking out certain components that are not relevant. If you are not doing networking, it's better to take out the networking code, because that code might be vulnerable, or might be doing some polling, which is time consuming, and so on. Likewise scheduling, and perhaps also memory management. These are very tightly intertwined subsystems that are not designed in a very modular way.

I think it would be hard to meaningfully extract things from the scheduler to get one that, say, has no preemption if you don't need preemption. The direction people seem to be going in is to provide some kind of rewritten, specialized scheduler for a particular purpose. There, you would want to add more interfaces to core operating system services, but with the caveat I mentioned before: the interface might not give all the expressivity that is wanted.

Fleming: I think the unikernel folks would argue that unikernels are definitely still in vogue. In a world where most of our software runs on cloud systems we don't own but rent, with various operating system images that maybe we don't inspect very well, I think there's a case for building custom operating system images. From a performance perspective, you get a non-negligible amount of noise from services that run as part of the base OS image. That's one of the reasons I'm looking at unikernels now: it's a nice idea to basically strip all that out, like Julia said, and have just the application and the essential libraries and operating system pieces required. I think there's still a case for this.

Dullien: One of the things I've seen a little bit is not so much unikernel deployment in production in large numbers, but people trying to get the kernel out of the way: having user space talk directly to the NIC via kernel bypass, or user space talking more or less directly to the storage infrastructure. We might not get real unikernels as we imagine them, but we may get a system where more pieces talk to each other directly, by just having shared memory between them, without going through the kernel on the way.

Tools For Diagnosing and Debugging Surprising Production Problems

Recommendations on tooling for diagnosing and debugging surprising production problems.

Fleming: I don’t have a specific recommendation for a tool. In my experience, you need to have either the ability to replay the traffic or have something that’s continuously on and is low overhead, which sounds like a weird answer, but you need one or the other. Because this idea that you can diagnose things after the fact without having enough information, in my experience, just doesn’t work, and you will miss performance regressions. I don’t have a recommendation, but something low overhead that is on all the time, or the ability to replay traffic with shadow traffic or something.

Rowell: I’d echo that point. In my experience with any continuous profiling, for me, it’s really been about going, “That looks weird. That’s slower than I expected”. Then trying as hard as I possibly can to make a reproducible case. Actually, turns out most of the time I do need to go to replay network traffic. That would be my advice, would be, I tend to use it to catch first problems and then try very hard to reproduce.

Dullien: Having a continuous profiler in any form, in combination with bpftrace. There used to be a Kubernetes plugin called kubectl trace, which essentially allows you to schedule a bpftrace program on any node. The continuous profiler provides profiling data all the time, and then you can dig into a particular node by putting a kprobe somewhere in the kernel to measure what's going on. I found that combination very useful; of course it doesn't solve all my problems, but it solves the first 40% of them, and then I've got new ones.

See more presentations with transcripts



Presentation: Leveraging Internal Developer Portals to Achieve Strategic Initiatives

MMS Founder
MMS Frank Fodera

Article originally posted on InfoQ. Visit InfoQ

Transcript

Fodera: I’m going to be talking about how CarGurus leverages our internal developer portal to achieve our strategic initiatives. I want to know, how many folks actually know what an internal developer portal is? An internal developer portal is really like a centralized hub that allows us to improve developer experience, developer efficiency by reducing a whole bunch of cognitive load. It’s usually internal, but the whole point is to centralize information into it.

We’ll talk about launching Showroom. Showroom is our internal developer portal that we built at CarGurus. How we actually achieved critical mass of adoption with our internal developers. Then, the foundation that we really built that helped us to set ourselves up for the future of our strategic initiatives that we’re trying to accomplish.

My name is Frank Fodera. I'm a director of engineering of developer experience. I've been at the company for about six years. My background is primarily in backend development, architecture, and platform engineering; sometimes I'll jump into the frontend as needed. I always found myself on customer-facing teams, but CarGurus really helped me find my passion for improving the developer experience, and that's currently where I'm at. I do love staying technical. I found enjoyment in coaching and helping others grow, as well as in achieving a strategic vision. I really like making a wide impact at the company, so I unexpectedly started gravitating towards leadership roles.

Developer Experience (DevX) – Our Mission

Before we jump into the actual tool, I want to talk a little bit about developer experience and what we're trying to accomplish. Our mission statement is to improve the developer experience at CarGurus by enabling team autonomy and optimizing product development. We do this in a couple of different ways. We have an architecture team that invests in making sure we have scalable architecture and system design, for both the frontend and backend.

We have a Platform as a Service team, which invests in a platform offering that provides a great developer experience for developer workflows, our environments, and really all our day-to-day work. We have a pipelines and automation team, which focuses on build and delivery, how we get everything into production, and the quality gates we invest in to make sure we do that seamlessly. Then we have our tools team, which is really the focus of this talk: the internal developer portal and the internal tools that help us improve the developer experience at our company.

Launching Showroom

Launching Showroom. When we first started Showroom, it really was just a catalog. We knew we had a problem, and it eventually evolved over time into what we call an internal developer portal; it's our homegrown solution. In this presentation, we'll talk about what problems justified creating this ourselves, what outcomes those solutions provided, and how we achieved critical mass of people using the product. Throughout, we'll talk about the strategic initiatives we piggybacked on in order to invest in this product. At the end, we'll wrap up with the foundation we created that lets us continue to invest in it, and leverage it to move faster on those strategic initiatives.

Our journey really started in 2019. Many of the talks here have covered tech modernization, or optimizing the way we move into the cloud, and that's where our journey started. We wanted to start moving into microservices; we called it our decomposition initiative. Our monolith was starting to get slower to develop in. We wanted to develop more features, but we found it difficult. We knew something needed to change; we needed to transform the way we approached development. Thus, we embarked on our microservice journey. The problems we faced: we knew we were going to have hundreds of services, we already had thousands of jobs, and ownership was unknown or unclear.

One of the engineers on my team actually said in our Slack: everything that has no owner is eventually going to be owned by infrastructure. We were an infrastructure team, so it was definitely a motivation for us to make sure we had clear ownership, because we didn't want to end up owning everything. Production readiness was another problem. We didn't always know what platform integrations we had, or whether we were ready for production as we introduced these new services; there really wasn't much transparency into that. Overall, we found it very difficult to create new artifacts. It required a very heavy platform investment, took a lot of handholding to get across the line, and was very time consuming. We knew there were some big problems we wanted to solve.

This is what our monolith actually looked like. We bought a tool to look at the dependency graph within our monolithic architecture. We had all of these packages and modules with almost no restrictions on how they could call each other, and we ended up with a big ball of mud. We knew we were dealing with something pretty bad, and that it would be a difficult challenge to solve. We thought our architecture looked like this.

We thought: yes, we have this monolith, which has this frontend, all these packages it can call, and then it relies on a database. Nice and clean. That wasn't the reality, as we just saw. What we were targeting was what we call vertical slices. Each vertical slice had everything it needed, from the frontend all the way to the database, but depended only on a minimum set of things. We wanted those to be more isolated. We wanted to go into that microservice architecture and provide a more decoupled way of operating.

We also knew that a lot of these services were going to be created, so we prepared for the service explosion by making it clearer who owned things, and by enforcing registration and ownership. We started very basic, with our service catalog. It had a JavaScript React-based frontend; developers could go in there and see who owned things, and it pulled out a whole bunch of information for them. If they needed to contact a team, they even had a link to the team's Slack channel. We had a REST-based API layer, well documented with Swagger, talking to a containerized Java Spring Boot service running in Docker and Kubernetes, with a MySQL database in RDS at the end.

That really isn’t anything special. It’s pretty basic. Allowed us to catalog things, but really didn’t provide anything other than just centralizing that information. Where the big secret came in was this concept called RoadTests. RoadTests was something that we introduced into our CI system, which is Concourse, and it ran when you actually opened a PR. What this did was it used one of the APIs on our cataloging system and said, is this new artifact that you’re generating in our monorepo? We actually have a monorepo, so that worked to our advantage here. Said, is that service actually registered in our catalog? We use the concept of canonical names.

Those canonical names enforce that: if a service isn't registered, we block your PR from getting merged. You need to go into the catalog and register your service. We made it easy, since we didn't want to add too much overhead for developers: it was a few button clicks, and you could register it right there. This helped us maintain and enforce ownership as we continued to develop hundreds of new microservices.
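A minimal sketch of what a registration gate like RoadTests could look like as a CI step. The catalog URL, endpoint shape, and the way the canonical name is passed in are all hypothetical, purely to illustrate the blocking check:

    import sys
    import urllib.error
    import urllib.request

    # Hypothetical catalog endpoint; the real check would point at the
    # internal cataloging system's API.
    CATALOG_URL = "https://catalog.example.internal/api/v1/services"

    def is_registered(canonical_name: str) -> bool:
        """Return True if the catalog knows this canonical name."""
        try:
            with urllib.request.urlopen(f"{CATALOG_URL}/{canonical_name}") as resp:
                return resp.status == 200
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return False
            raise

    def main() -> int:
        canonical_name = sys.argv[1]  # e.g. derived from the artifact's path
        if not is_registered(canonical_name):
            print(f"FAIL: '{canonical_name}' is not in the service catalog.")
            print("Register the service in the catalog, then re-run this check.")
            return 1  # non-zero exit fails the CI step and blocks the PR
        print(f"OK: '{canonical_name}' is registered.")
        return 0

    if __name__ == "__main__":
        sys.exit(main())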

We talked about jobs. Jobs were another issue: we did have a whole bunch of jobs cataloged, but they were spread across four different instances, between regions and environments, in a system called Rundeck, where we ran a lot of these batch jobs. We leveraged the Rundeck APIs and took a different approach. Instead of having jobs added manually as individuals created new ones, we used the system that already had all of them, rather than making people remember which instances to look into. We scheduled a nightly job using the Spring Batch framework within our code base, with a few APIs that pulled out the various pieces of information we felt would be relevant for our developers.

On a nightly basis, we had a sync that pulled all of those in and put them into our catalog. What really helped us is that we developed a way to intelligently classify ownership, with about 90% accuracy, and those classifications ended up in our catalog. For the ones we couldn't classify, we had a banner at the top of the UI saying, in effect: some jobs don't have ownership, maybe you should go classify them. Developers could click on that link, see all the jobs without owners, and manually claim them.
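The nightly sync could be sketched as follows: list job definitions through Rundeck's REST API and upsert them into the catalog, guessing an owner where possible. Rundeck does expose a job-listing API, but the hosts, exact paths, and the classify_owner heuristic below are illustrative assumptions:

    import requests  # third-party HTTP client

    RUNDECK_URL = "https://rundeck.example.internal"   # hypothetical host
    RUNDECK_TOKEN = "redacted"                         # API token from config
    CATALOG_URL = "https://catalog.example.internal"   # hypothetical catalog

    def fetch_jobs(project: str) -> list:
        # Rundeck exposes a job-listing endpoint over its REST API; the
        # API version and path may differ per installation.
        resp = requests.get(
            f"{RUNDECK_URL}/api/41/project/{project}/jobs",
            headers={"X-Rundeck-Auth-Token": RUNDECK_TOKEN,
                     "Accept": "application/json"},
        )
        resp.raise_for_status()
        return resp.json()

    def classify_owner(job: dict):
        """Hypothetical heuristic: infer the owning team from the job's
        group path, e.g. 'payments/nightly-reconcile' -> 'payments'."""
        group = job.get("group") or ""
        return group.split("/")[0] or None

    def sync(project: str) -> None:
        for job in fetch_jobs(project):
            record = {
                "name": job["name"],
                "rundeck_id": job["id"],
                "owner": classify_owner(job),  # None shows up as unclaimed
            }
            # Upsert into the catalog (hypothetical endpoint).
            requests.put(
                f"{CATALOG_URL}/api/v1/jobs/{job['id']}", json=record
            ).raise_for_status()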

What did we accomplish with this? We knew our problem: services and jobs were unknown, and ownership was unclear. The outcomes? 100% of our services and jobs were now cataloged. We had zero undefined ownership, meaning infrastructure no longer owned everything. Service registration was enforced and the job catalog was automatically synced. This is where we started with our first two pillars of Showroom: discoverability, the ability to have that critical information consolidated in one place (our services and jobs catalog), and governance, our RoadTests, which ensured we continued to maintain ownership while developing at a pretty rapid pace.

Achieving Critical Mass

That didn’t help us to achieve critical mass. That was all great. It’s a great foundation. If you’re not looking for information about what services or jobs or owners, you’re not really going to be enticed to go into that UI. What did we have to do in order to achieve critical mass? We focused on another problem, still within that decomposition initiative. We had manual checklists that people were going through, either Excel or wiki docs, where they’re saying, have you actually done all of these checks? Are you actually going and incorporating all of these different things before you bring your service to production? We said, we know we can do better. We use that same Spring Batch framework. We introduced something called compliance rules. These compliance rules ran on a nightly basis as well.

It would check things like, have you actually documented your APIs? If you’re in a separate repository, are you using the right repo settings? Or, if you’re in the monorepo, are you actually in the right folder to make it easy to find where your service exists? Is your pipeline configured appropriately? Are you reporting your errors in all of the different environments to our Sentry system? Are you using sane memory defaults or memory settings for Kubernetes? What’s your test coverage like? Are you actually reporting test coverage? Where are your metrics going? Do you have metrics? Are you flying blind on observability? What really made this powerful was the fact that we made it extremely pluggable, so anything that had an API, you could go and easily extend this and introduce a new rule.

That made it so anybody, whether our own team maintaining the system or developers outside our team, was able to introduce new rules as we came up with things we considered golden path best practices, things you want to make sure you're designing for and incorporating before you go into production. This really helped us focus on standardization: now we knew who was following the best practices and who was not. You got a compliance score right in the UI. What we found was that our developers cared about the score. They thought of it as a bit of a game; they wanted to get to 100%, to the high green 90s, and that helped bring a little more traffic to our product.
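The pluggable design Fodera describes can be sketched as a small rule interface plus a scorer, where each rule wraps one API-backed check. The rule names, the service record fields, and the percentage scoring below are illustrative assumptions, not CarGurus' actual implementation:

    from abc import ABC, abstractmethod

    class ComplianceRule(ABC):
        """One golden-path check; anything with an API can back a rule."""
        name: str

        @abstractmethod
        def passes(self, service: dict) -> bool: ...

    class HasApiDocs(ComplianceRule):
        name = "API documentation published"

        def passes(self, service: dict) -> bool:
            return bool(service.get("swagger_url"))  # hypothetical field

    class ReportsErrors(ComplianceRule):
        name = "Errors reported to Sentry in all environments"

        def passes(self, service: dict) -> bool:
            envs = set(service.get("sentry_envs", []))  # hypothetical field
            return {"staging", "production"} <= envs

    def compliance_score(service: dict, rules: list) -> float:
        # Nightly batch: run every registered rule, compute a percentage.
        passed = sum(1 for rule in rules if rule.passes(service))
        return 100.0 * passed / len(rules)

    rules = [HasApiDocs(), ReportsErrors()]
    svc = {"swagger_url": "https://docs.example.internal/svc",
           "sentry_envs": ["production"]}
    print(f"compliance: {compliance_score(svc, rules):.0f}%")  # -> 50%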

What did this actually accomplish? Integrations were now transparent. Services were automatically scored. No longer did we have to keep these checklists. Production readiness was seen upfront. You didn’t actually have to do this after you were in production and check everything. You got to see this right as you were introducing the service, and the first time that this service was actually going and being run in production hardware. This was an enhancement to our governance pillar. Really making sure that as we’re developing new features into our internal developer portal, are we actually investing into things that we feel like should be there. We were like, yes, this made sense. It’s governance. It wasn’t as much of a hard enforcement as the RoadTests. You still were allowed to merge in. You still were allowed to go into production. We found that this was very useful, because our developers cared about making sure that they could observe their systems, that they had proper logs, as they were going into production.

Next, we started on a feature called workflows, and this was all about self-service. We had the problem that it was taking very long to get our services into production, and we wanted to make that faster. We introduced this concept of workflows consisting of steps: propose your service at the beginning and bring it all the way to production in an automated fashion. We used the Spring Batch framework here as well, so you could keep track of progress as you went through all of these steps, because there are a lot of things to do along the way. We'd start off with collecting information. You no longer had to worry about saving your service into the catalog, because this did it for you automatically. If you needed approval from your manager, or wanted to make sure your service's canonical name was accurate and wasn't going to change, that was checked upfront.

If you needed additional approvals, we could incorporate that. Best of all, it notified others that a new service was being created, providing visibility into all of these new services. You want to move forward, your stuff looks good, you get your approvals. We provided templates to make it easier, so over time you didn't have to worry as much about those platform integrations, because the templates provided a lot of them out of the box. If you use our best practices and our templates, you get a lot of those features automatically. We cloned the template for you, updated the variables, set up your development environment, and said: it's ready, start testing it out, make sure it works as you want. Then you can move forward when you're ready, into the staging environment.

We automatically generate that pipeline for you. We verify that the pipeline is going to be successful, and we let it deploy into staging. We sync a whole bunch of data, and we'll talk a little more about why that's important later on. Then we say, yes, go and start using staging. Make sure everything looks good. Then when you're ready, come back and move into production. We ask, is your service going to be P1? P1 is priority 1, which has some additional checks you're going to need to introduce. If the person tells us, yes, I believe my service is going to be P1, we'll add that label for you, and it triggers a whole bunch of other processes. We'll verify that the production pipeline is set up. We'll use the portal itself to deploy your service into production and verify that it's up and running in Kubernetes. Then if you told us you needed a database, we'll actually go and create a database schema for you. Then we notify everyone that this brand-new service is in production.
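As a rough illustration of the Spring Batch approach described above, a propose-to-production workflow might be wired as a job whose steps are tracked one by one. The step names and wiring below are an assumption for illustration, not the actual CarGurus code.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

// Hedged sketch: a propose-to-production workflow as a Spring Batch job,
// so progress is tracked step by step. Step names are illustrative.
@Configuration
public class ServiceWorkflowConfig {

    @Bean
    public Job serviceToProduction(JobRepository repo, PlatformTransactionManager tx) {
        return new JobBuilder("service-to-production", repo)
            .start(step("collect-and-catalog", repo, tx))   // save the service into the catalog
            .next(step("seek-approvals", repo, tx))         // manager / canonical-name checks
            .next(step("clone-template", repo, tx))         // platform integrations out of the box
            .next(step("deploy-staging", repo, tx))         // generate and verify the pipeline
            .next(step("deploy-production", repo, tx))      // P1 checks, Kubernetes verification
            .next(step("notify", repo, tx))                 // tell everyone the service exists
            .build();
    }

    private Step step(String name, JobRepository repo, PlatformTransactionManager tx) {
        return new StepBuilder(name, repo)
            .tasklet((contribution, chunkContext) -> {
                // Real steps would call out to Git, CI, Kubernetes, and so on.
                System.out.println("Running workflow step: " + name);
                return RepeatStatus.FINISHED;
            }, tx)
            .build();
    }
}
```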

What did this accomplish? We knew it was complex to introduce new artifacts, that it required heavy platform integration, and that it was very time consuming. We brought the time to get a new service into production from 75 days down to under 7 days. It was completely self-service: minimal handholding, no tickets. Nobody needed to depend on another team just to get a service into production. It was really great. It helped with developer happiness. You could innovate faster, introduce your services into production, and get them rolling. This was our third pillar, self-serviceability. We wanted to make sure our internal developer portal actually invested in team autonomy. We said our mission was team autonomy and productivity, and this allowed for faster iteration.

Our next main initiative was our data center and cloud migrations. A bit unexpectedly, we ended up having to migrate out of our data center in 2019. However, we knew that wasn't our long-term play. We knew we wanted to be in the cloud, so we did a lift-and-shift into a new data center to get us there, then lifted and shifted again into AWS. That gave us the time between 2019 and 2022, when we finally moved to AWS, to really prepare ourselves. One of the problems we faced was that we were going to be changing host names quite often, because we were going from data center 1 to data center 2, and then eventually into the cloud.

Our developers used a lot of bookmarks and things like that to help them find their services, but those were going to become stale very quickly. We knew that was going to be a problem. Deployments were very error prone, and we were now going to be deploying in twice as many regions, across multiple data centers, so we were going to experience even more issues with human error complicating deployments. Then, what we realized from our data center 1 to data center 2 migration was that we really lacked dynamic configuration management, because host name changes were complex. We invested in that between the second data center migration and the cloud migration.

First, we started off with data collection. I talked about bookmarks. Our data collection feature is essentially a centralized bookmark repository that's visible to everybody. We leveraged that Spring Batch framework and kicked off a nightly job, or in this case, you could run it on demand. You click into a service, into what we call our service details page, and you'd find that it had collected all of this information for you. Where is my pipeline found? What's my repo, or what folder am I in within the monorepo? Are there any associated jobs connected to the service that I should be aware of? Where do my Sentry errors go? Where am I reporting metrics to? How do I find my logs for all of my different configurations, across all of the environments and regions? What are my internal host names that I can use to start testing?

Once again, we made this very easy to extend, so really anything with an API, we could start collecting information from. That wasn't all. We could collect a whole bunch of information in an automated fashion, but we also gave individuals the ability to go into this page and add custom bookmarks. Maybe there's a really important documentation page that we wanted to capture. This let developers pay it forward. Next time a teammate was looking for that critical information, or a runbook for what happens when the service goes down, there could be a link to the exact page that tells you, here's how you can start triaging. Your mind is not really all there when you're in an emergency situation, but if you know you have a centralized place to go to, you can find all of that critical information. It was very helpful for our developers.
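A minimal sketch of that "anything with an API" extension point might look like the following, assuming one collector per integration and an aggregator that tolerates individual failures. All names here are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Sketch: each integration implements one collector, and a nightly (or
// on-demand) job aggregates the links onto the service details page.
public interface LinkCollector {
    // e.g. "sentry", "ci-pipeline", "logs", "metrics"
    String source();

    // Return display-name -> URL for one service, fetched from the tool's API.
    Map<String, String> collect(String serviceCanonicalName);
}

// Run every collector and store the results for the service details page.
class LinkAggregator {
    private final List<LinkCollector> collectors;

    LinkAggregator(List<LinkCollector> collectors) {
        this.collectors = collectors;
    }

    Map<String, Map<String, String>> collectAll(String service) {
        var results = new java.util.HashMap<String, Map<String, String>>();
        for (LinkCollector c : collectors) {
            try {
                results.put(c.source(), c.collect(service));
            } catch (Exception e) {
                // A broken integration shouldn't hide everyone else's links.
                results.put(c.source(), Map.of());
            }
        }
        return results;
    }
}
```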

What did this accomplish? We knew information was quickly becoming stale, so what were the outcomes? We automatically collected thousands of relevant links and provided them all in one spot, relevant to the specific service you wanted to look at. Across all of these services and thousands of links, you could search, filter, and find the exact one you wanted. It was extremely helpful. You no longer had to bookmark things, and no longer had to remember the query syntax for the log statement you were trying to find. It was all just there for you. This was our fourth pillar, transparency: providing transparency in a single pane of glass for awareness and visibility, and data collection was its first feature.

Next was deployments. We talked about a lot of human error. What we found was that when people were trying to roll back, sometimes they chose the wrong build. When people were choosing a build, sometimes they didn't check whether it had actually passed all of our integration tests and other automated checks. What we did was integrate with GitHub to get the list of builds. You could even view your impactful commits. It got even more complicated in the monorepo, because when you make a commit there, the system needs to be intelligent enough to know which artifacts you're impacting. We had a very intelligent way to determine that. Now developers could click this link, see exactly which changes they were going to deploy in that build, and were less likely to deploy something they didn't want to.
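The talk doesn't detail the exact mechanism, but one common way to map a monorepo commit to impacted artifacts is a longest-prefix match from changed file paths to owning artifacts, roughly like this sketch. The folder-to-artifact index is invented for illustration, and CarGurus' real logic presumably also followed build-graph dependencies.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch: map changed paths in a monorepo commit to the artifacts they impact
// by taking the longest matching folder prefix for each path.
public class ImpactedArtifacts {

    // Folder prefix -> artifact name; descending order makes longer prefixes win.
    private static final TreeMap<String, String> OWNERS = new TreeMap<>(Map.of(
        "services/search/", "search-service",
        "services/listings/", "listings-service",
        "libs/common-auth/", "common-auth"
    ));

    static Set<String> impacted(List<String> changedPaths) {
        Set<String> artifacts = new HashSet<>();
        for (String path : changedPaths) {
            OWNERS.descendingMap().entrySet().stream()
                .filter(e -> path.startsWith(e.getKey()))
                .findFirst()
                .ifPresent(e -> artifacts.add(e.getValue()));
        }
        return artifacts;
    }

    public static void main(String[] args) {
        System.out.println(impacted(List.of(
            "services/search/src/Query.java",
            "libs/common-auth/src/Token.java")));
        // -> e.g. [search-service, common-auth]
    }
}
```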

They could also very easily check, through our CI system integration, whether the build had actually passed all of the different checks. Alex talked a little bit about a CarGurus concept called Pit Stop Days: one day about once a month where we really allow developers to brainstorm new ideas and innovate. The funny thing is, this project actually started with that. We brainstormed this particular feature and said, this is a big problem; we know we could do better at eliminating this human error, so we invested in a design. We got a team together at our next hackathon and invested in this. We talked about the value it would provide and how it could benefit the strategic initiative we were investing in. It was extremely successful in the hackathon; we got a very functional prototype. Then we were given buy-in as part of the strategic initiative to go and invest in this.

A developer comes in here and hits deploy. What happens under the covers? Once again, we used Spring Batch, but this time it was a little more complicated. We now lived in an environment with both a monolith and microservices. Depending on which one you were in, you'd use a different system to deploy: either Rundeck or Concourse. To the developer, it didn't matter. We were able to completely abstract that away and provide the exact same developer experience, regardless of whether you were working on the monolith, on a microservice, or later on, even in a separate repository. We provided a lot of convenience features. If you wanted to see your logs, we knew you were deploying a canary server for this service at this time with this build, so that log link was dynamically generated for you. You could see the logs right in the UI.
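A minimal sketch of that abstraction, assuming a single deployer interface with Rundeck and Concourse implementations chosen behind the scenes; the class names and the monolith check are illustrative, not the actual implementation.

```java
// One interface hides two deploy systems: the developer just hits "deploy",
// and we pick Rundeck or Concourse based on where the artifact lives.
public interface Deployer {
    void deploy(String service, String build, String environment, String region);
}

class RundeckDeployer implements Deployer {
    @Override
    public void deploy(String service, String build, String environment, String region) {
        // Call the Rundeck API for monolith-style deploys.
    }
}

class ConcourseDeployer implements Deployer {
    @Override
    public void deploy(String service, String build, String environment, String region) {
        // Trigger the Concourse pipeline for microservice deploys.
    }
}

class DeploymentService {
    Deployer deployerFor(String service) {
        // The developer never sees this decision.
        return isMonolith(service) ? new RundeckDeployer() : new ConcourseDeployer();
    }

    private boolean isMonolith(String service) {
        return "cargurus-monolith".equals(service); // illustrative check
    }
}
```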

If anything looked bad at the end of your deployment, you'd have this rollback button. It's as simple as that: you just click roll back, and it picks the build for you, because it knows exactly which build you were previously on. If everything looks good and your service is up and running in Kubernetes, you can proceed forward. We also had this culture at CarGurus where developers really wanted to get notified through Slack. We have a very Slack-heavy culture, so we integrated notifications as well: as you progressed through the various phases, we'd send you custom Slack messages about which commits were going out, who was impacted, and how many commits there were, with a link back to the portal.
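On the Slack side, a bare-bones version of such a phase notification could go through a standard incoming webhook, something like the sketch below. The real notifications were richer, with commits, authors, and a portal link; the webhook URL and payload here are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: notify a Slack channel as a deployment moves through its phases,
// assuming a standard Slack incoming webhook.
public class DeployNotifier {
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    static void notifyPhase(String webhookUrl, String service, String phase, int commits) {
        String payload = """
            {"text": "Deploying %s: %s (%d commits). Track it in the portal."}
            """.formatted(service, phase, commits);

        HttpRequest request = HttpRequest.newBuilder(URI.create(webhookUrl))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(payload))
            .build();

        // Fire and forget; a failed notification shouldn't fail the deploy.
        HTTP.sendAsync(request, HttpResponse.BodyHandlers.discarding());
    }
}
```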

This not only kept people in their familiar Slack workflow, but encouraged them to come back to our UI and look at the visual status, rather than just the Slack notification. What did this accomplish? We eliminated human error during deployments almost wholesale. We found there was almost no human error left, because we were able to design it away. It saved us about 7,000 developer hours in just the time since it launched. Really a huge success, and we'll talk about why this was so critical to achieving critical mass.

This is our last pillar: operational efficiency. We really wanted to minimize fragmentation and cognitive load. We didn't want developers to have to remember to go to all of these different regions to deploy. We wanted them to deploy by just clicking a button. They had the commits that were going out; they knew that upfront. They got to choose which commits were going out, rather than blindly picking one because it happened to be the latest version, which may or may not have passed its checks. Then we made the logs really easy to get to, so you could see them right in the UI.

We talked about configuration management. We knew host names were going to keep changing, and we knew they weren't easy to manage; we had gone through that painful process in our first data center migration. What we did was introduce this concept of configuration management. We introduced a service, used primarily through a CLI, called Mach5. We like our car puns in naming. This Mach5 service did really three things. It managed our environments for us, and it automated our dev deployments, plus staging and production, so that we maintained parity. Then it introduced the concept of configuration management, both static and dynamic.

We introduced a UI within our internal developer portal that allowed developers to go in and change their configuration for the things that were static, things that weren't going to change across the different environments and regions you were running in. This was injected right into your service on startup; the Mach5 service handled that for you. Then we also had dynamic configuration. Dynamic configuration took away the whole need to even know or care about what the host name was going to be for a particular service in an environment within a region. You just said, I'm deploying in North America, for my production environment, and I depend on service x, and it automatically knew the host name for service x. It completely abstracted that away.
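As a sketch of that dynamic-configuration idea, resolution might take a dependency plus the caller's environment and region and hand back the right host, so nothing is hard-coded during a migration. The lookup data and names below are invented for illustration; in reality this would be served by the config service, not a literal map.

```java
import java.util.Map;

// Sketch of dynamic host resolution behind a service like Mach5: a service
// declares "I depend on service-x" plus its own environment and region, and
// the resolver returns the right host name.
public class DynamicConfigResolver {

    // (service | environment | region) -> host name; illustrative data only.
    private static final Map<String, String> HOSTS = Map.of(
        "service-x|production|na", "service-x.prod.na.internal",
        "service-x|production|eu", "service-x.prod.eu.internal",
        "service-x|staging|na",    "service-x.stage.na.internal"
    );

    static String resolve(String dependency, String environment, String region) {
        String host = HOSTS.get(dependency + "|" + environment + "|" + region);
        if (host == null) {
            throw new IllegalStateException("No host registered for " + dependency);
        }
        return host;
    }

    public static void main(String[] args) {
        // "I'm deploying in North America, production, and I depend on service-x."
        System.out.println(resolve("service-x", "production", "na"));
    }
}
```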

For our development workflows, it gave us the opportunity to have development, staging, and production all deploying in Kubernetes in a very similar way, with environmental parity as close as we could get it. We didn't get perfect parity, but close. That allowed us to catch a lot of the issues that previously were just, it worked in staging, it worked in development, but it didn't work in production. It eliminated a lot of that. These environments were fully managed for you, so you didn't have to worry about them. That's where we created the second feature here, which is environmental visibility.

Because Mach5 was primarily CLI based, we added the ability to show it all visually: which services do you have deployed in your personal namespace? Which services does your service depend on? You could click into your service and ask, I'm experiencing an issue with my service right now; is it actually my code, or is a service I depend on having an issue at the moment? Visually you could see, there's a big red dot right there, so that service is probably having an issue. Let me click into that service and see who owns it, so I can go talk to them and find out whether the stable version I'm depending on is not so stable at the moment.

What did this accomplish for us? We now had proper configuration management, enabling all host names to be dynamic. Our static configuration was centralized in one place, and we actually eliminated the pull request process; it was fully self-service as well. We executed three successful migrations: one for North America, one for the EU, and then, once again, into AWS. Three successful migrations, which is a pretty huge feat. We didn't miss anything, because we had everything cataloged and knew exactly what we had to lift and shift. That also was huge. This is where we enhanced two more pillars. As we developed these new features, we constantly had to ask ourselves, did this align with our vision for the internal developer portal? Yes, these did. It provided self-service for configurations and eliminated the manual process of approving pull requests. It provided transparency into your service, so you now had visibility into your environment: you knew exactly what was running, what you were depending on, and whether that service was having an issue. We provided more insights to those developers.

One of our more recent initiatives: we had primarily been operating in a monorepo, but we really wanted to move to what we called multi-repo, multiple repositories. It was 2022, and we were officially in the cloud. We had many microservices at this point, but coupling remained an issue. We were like, I thought microservices were going to solve everything for us. No, that's not what happened. We did make a good number of attempts to ensure proper build-time and compile-time isolation, but it was still proving difficult while everyone was operating in a single monorepo. We found that more microservices made it difficult to find real-time information. We couldn't easily create new libraries and repositories; it was complex and time consuming.

Overall, and most importantly, it was proving very inefficient from a build, deploy, and development perspective to operate in a single repository. We had this architecture: in the monorepo, we introduced a concept we called embankment, which tried to mimic a multi-repo environment within the monorepo. It encouraged us to avoid dependencies on what we called mainline, which was where all of our existing artifacts lived. It also allowed us to introduce reusable libraries that were more properly versioned. But it wasn't good enough. We wanted to move to multi-repo, where each artifact is produced out of one repository. Those artifacts could depend on each other, and on a whole bunch of reusable libraries that are properly SemVered.

We had this real-time information problem. Services were now spread further apart from each other, and we really wanted to provide a single pane of glass for all the information you needed. We said, let's integrate with Prometheus to find out how much CPU and memory your service is using. If we could find your service in Kubernetes, we'd tell you how many pods you have, what the status of those pods is, and whether they're restarting. We also wanted teams to feel empowered to improve their workflows and get more efficient, so we implemented all four DORA metrics and provided visibility wholesale across all of the microservices. Now they could see what their change failure rate was, what their lead time for changes was, and so on. We also let you see, are you on the latest build?

Did somebody check something in that nobody remembered to deploy, because we were still deploying manually? We also cared very deeply about security and quality, so we integrated with our security system and our quality system to provide all of that upfront. Now you could see: do I have vulnerabilities? Do I have good coverage? Are there bugs I could go and fix? We really centralized all of this. Now you had health, DORA metrics, build, security, and quality metrics all available and easy to find, in one place. This once again aligned with our pillars. We had governance, the security analysis, giving us that visibility more upfront and shifting further left. And we had the statistics, which were all about transparency, really reinforcing what we had.
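For the Prometheus piece, a minimal sketch might query the standard /api/v1/query HTTP endpoint with a per-service PromQL expression. The base URL, metric name, and pod-label convention below are assumptions that depend on the cluster setup.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch: pull per-service CPU from Prometheus' HTTP API for the
// single-pane-of-glass view.
public class ServiceStats {
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    static String cpuUsageJson(String prometheusBase, String service) throws Exception {
        // Summed CPU rate over 5m for all pods belonging to this service
        // (assumes pods are named "<service>-...").
        String promql = "sum(rate(container_cpu_usage_seconds_total{pod=~\""
                + service + "-.*\"}[5m]))";
        String url = prometheusBase + "/api/v1/query?query="
                + URLEncoder.encode(promql, StandardCharsets.UTF_8);

        HttpResponse<String> response = HTTP.send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        return response.body(); // JSON with the current value, rendered in the UI
    }
}
```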

Then we had a reusable workflows framework. This looks very familiar: we had steps and tasks, but what we learned this time was that we could generalize it, so we invested in refactoring it. We made it more robust and easier to extend, with options to ask any questions, collect any information, catalog that information, seek any approvals, execute any kind of action, and notify. You could verify that things were actually set up successfully. Really, you could run any arbitrary task, and that allowed us to introduce a whole bunch of new self-service workflows: things like creating a new library, or creating a new backend service directly in multi-repo.

New application, that’s what Alex’s talk was all about. All about creating those new applications in Remix. We actually even provided an ease of use to say, during a certain period of time, we’re going to allow you to have an automated way, as automated as we could, to migrate yourself to multi-repo. Then, if you forgot to actually say that you needed a database, or you didn’t know when you were introducing the service, you can now self-service that at any point in time and create a database or create a dashboard for yourself.

This allowed us to further improve time to production for services and libraries. We went from that 7 days down to 2.5. Libraries had been created manually, taking on average about 10 days from start to publishing; now it took one day to introduce a bare-bones, published library. Really some huge wins. Once again, this fed into our discoverability pillar, helping us invest in cataloging: we introduced library cataloging, and then team cataloging too. Overall, though, I do want to talk a little bit about what this multi-repo project provided from an outcomes perspective. We helped accomplish a lot of this with the internal developer portal.

A lot of work was put in across all of the engineers at CarGurus to really invest in this tech modernization. What did we accomplish? Our lead time for changes went down by 60%. Change failure rate dropped from 25% for our monolith to 5%. Build times were 96% faster, because we no longer had to go through the centralized pipeline on the monorepo. Deploy times were, on average, 70% faster. Best of all, we found that our developers were 220% more efficient. They were happier because they were able to move faster and accomplish more, with fewer roadblocks and less overhead. Very powerful for them.

Foundation for the Future

Now we had a foundation. We had these five pillars, and they really allowed us to continue to invest. You'll see I've added a few more features here that I haven't covered in this talk, but we've invested in a whole bunch of different features that kept aligning with these five pillars, and we've stayed true to that. We're not just adding everything because we have a centralized portal; we add things because they align with our mission and our vision of providing that autonomy and that productivity boost for our developers.

I also want to talk a little bit about what we're currently working on: another initiative called time-to-market acceleration. The problems here are, yes, it's 60% faster to get services into production, but it still takes days from commit to production. Quality issues are often found too late in our development cycle. Deploying to production is still manual; you still have to click a few buttons. We plan on heavily leveraging our compliance rules to determine whether you're actually ready for CI/CD. We plan to leverage labels within our catalogs to track who's migrating to this new full CI/CD model, which is the goal of the project.

Then we'll continue to provide a great experience through that single pane of glass, regardless of which deployment model you're in. The outcomes we predict are getting our lead time for changes down to under 60 minutes, maintaining a change failure rate of 5% despite moving multiple times faster, lowering our defect escape rate, and improving developer efficiency by eliminating that manual deploy step at the end.

Secondarily, we talked about how we lifted and shifted into the cloud from our data center. Not great, but it helped us do it very quickly and very successfully. Another initiative we're launching is cloud maturity. We're operating fully in AWS, but we're not really leveraging all the offerings we have, and our services aren't always built with a cloud-first mindset, so we can do better there. It's also hard to understand the cost implications of a design decision now that we're operating in the cloud. We plan to use our catalog to know what's available, so you can reuse things instead of developing net-new. We plan to invest more in our workflows to self-service infrastructure provisioning, making it easier for developers while still providing that 20% escape hatch for the power users who want it.

We'll use data collection and real-time statistics to provide cost transparency, hopefully even upfront, although we've learned how difficult that might be. Then integrations with our catalog will ensure we do proper cost attribution as we invest in more cloud offerings. We hope to accomplish faster adoption of cloud features, more services built with a cloud-first mindset, and cost transparency upfront, which should reduce the overall cost of operating in AWS. We'll get faster time to market through easier provisioning of that infrastructure. And overall, once again, our goal is always improved developer efficiency and experience.

The big question we always get, though, is, what would you have done differently? There are two things. First, find that daily feature sooner. It really wasn't until we released the deployments feature that we achieved critical mass, because that's a daily activity developers have to do, so it brought them into that UI experience every day. My recommendation would be: find the feature that makes sense to invest in, the one that's going to drive traffic in there on a daily basis. I still strongly believe the right foundation is starting with those catalogs, because that's how you're going to learn about all of those systems and provide that value. But getting to that daily feature sooner is really important.

Secondly, minimize the usage of team names. Teams change. What we found in our experience was that the service canonical name, which we embedded everywhere, was very unlikely to change. It stayed pretty consistent. Whereas teams change: a reorg happens, people change their team names, they shift under different managers, and all of a sudden the infrastructure where you're organizing things is out of sync with your actual catalog and system of record. I'd recommend really leaning into service canonical names and minimizing your dependency on team canonical names; it's just easier that way. This goes for everything from Kubernetes to even just how you organize things in folders.

Questions and Answers

Participant 1: The numbers and the outcomes that you’ve shown were nothing short of amazing, the testimonials too. In hindsight, everything is 20/20. What was your process to handle pushback, especially in the beginning of this process?

Fodera: Early on, we started small. We actually only had one developer working on this for, I think, all of 2019, with some spot assistance from a frontend developer. If you're getting resistance to investing in this, how do you keep going? I think it's really important to start small. Don't try to say, this is a six-person team, we're going to invest in it all out of the gate, I need a couple million dollars to do this. That's not going to win. What we found was to leverage our innovation days. I talked about how we used the hackathon to prove the value of eliminating human error.

Also, piggyback off the initiatives that investment is already being made in, and show how you can accelerate them. If you can show that you're already investing in a data center migration, and you approach your leadership and say, I can make that a lot smoother, with a higher chance of success, or faster, by investing in this feature in parallel, that will help you get the investment.

Participant 2: In one of the slides, you show that developer productivity increased by 220%. How do you measure that?

Fodera: We leveraged the DORA metrics pretty heavily to show, from a flow perspective, how much faster a team was working. We also got a lot of developer testimonials, which let us measure it from a qualitative perspective. What we found was that given the exact same task in the monolith, or even the monorepo, versus that same task in a multi-repo service, developers were able to do it about two times more quickly. That was pretty much how we leaned into it. A lot of that was eliminating what we call developer waste. We also outlined, generally, what it would look like to work on a feature in a sprint, and how much faster you could accomplish that by removing a lot of that developer waste.

Participant 3: How do you incorporate operations into this? When I say operations, I'm talking about infrastructure, infrastructure of services, architecture. Do you incorporate any service templates or architecture templates into this developer platform that speed up the teams?

Fodera: We have the advantage that our team is part of our platform and infrastructure organization, so we sit very closely with a peer of mine who runs more of the cloud infrastructure, and I'm constantly collaborating with them. I think that collaboration helps us move in lockstep. What was most important is that, with our templates, we ensured we had all those integrations out of the box. As those templates were created, we made sure they worked well from an infrastructure perspective. Really staying in lockstep: he's a peer of mine and works very closely with me. That also helped from an organizational perspective; we were set up for success.

Participant 4: I noticed that you showed a lot of UI-based tools. However, I also know that infrastructure as code is important, especially when it comes to deployments and configuration. In which cases did you use the UI versus infrastructure as code, and how did you combine the two?

Fodera: Cloud maturity is an initiative we're actively working on, and that question actually comes up. There's a great talk by another company about going with that 80/20 rule. What we found, and this was still true as we worked with our own developers: talk with your customers, who in this case are your developers, and see what they want. It's not going to be one-size-fits-all for every company.

What we found at CarGurus was that about 80% of people just wanted a few button clicks to introduce a new service or get a database or whatever, and they really didn't want to have to learn Terraform, which is what we're using under the covers for infrastructure provisioning. Lean into that 80/20 rule: whatever 80% of your customers want, cater to that. If they want the UI, lean into that. If not, go with the approach of giving them the ability to self-service. If you have a company where everybody knows Terraform, it's probably not worth abstracting it away with a UI.

Participant 5: Part of your journey was the migration from a monolith to microservices. Can you tell us a little more about that journey, what went well, and lessons learned? What recommendations could you give to other people who are going through this journey right now?

Fodera: Actually, the vertical slice model that I showed didn't work well. I have a blog post on cargurus.dev that talks about how we failed a few times in our microservices journey; it covers our path from the monolith to microservices specifically. The vertical slice approach didn't work. That was trying to vertically slice and detangle that big ball of mud, and it proved to be very inefficient. That's where we started with more of the strangler fig pattern, where we made it very easy to introduce new services instead of trying to detangle the existing ones.

Then we tried to instill a culture where we said, as you're introducing new features, do you actually need to put them in the monolith, or could you introduce them as a new service? We started with backend services only, and that worked really well. Then we used the embankment approach I talked about to help with the frontend services, and that helped a little bit. Then our shift to multi-repo, where we invested in a Remix template, was really the solidifying factor that helped us decouple from a frontend perspective.
