Google AlloyDB Omni: PostgreSQL-Compatible Database for On-Premises and VMware Cloud Foundation

MMS Founder
MMS Renato Losio

Article originally posted on InfoQ. Visit InfoQ

Google Cloud recently announced the general availability of AlloyDB Omni, a downloadable version of the PostgreSQL-compatible database service AlloyDB. The new version is designed to run on-premises, has built-in support for generative AI, and will be available on VMware Cloud Foundation.

Initially introduced in preview last March, AlloyDB Omni maintains compatibility with PostgreSQL, enabling customers to deploy a containerized AlloyDB database engine in their Linux-based computing environment. Although it leverages the same engine as AlloyDB for PostgreSQL, it offers a subset of functionalities, including the index advisor, the AlloyDB columnar engine, automatic memory management, and adaptive autovacuum of stale data.

AlloyDB Omni operates within a container, which is installed using a command-line program provided by Google, and it does not support features that rely on operation within Google Cloud. However, the new option supports vector embeddings and AI, taking advantage of the recently announced AlloyDB AI, a set of capabilities separately covered on InfoQ. Kevin Jernigan, senior product manager at Google, explains:

At GA, AlloyDB Omni includes support for AlloyDB AI, an integrated set of capabilities built into AlloyDB for PostgreSQL, to help developers build enterprise-grade gen AI applications using their operational data. AlloyDB AI delivers 10x faster vector queries than standard PostgreSQL, easy embeddings generation using SQL, and integrations with the Google Cloud AI ecosystem including the Vertex AI Model Garden and open-source gen AI tools.

Following the GA of AlloyDB Omni, the cloud provider announced the preview of the AlloyDB Omni Kubernetes operator, which simplifies database provisioning, backups, secure connectivity, and observability, allowing customers to run AlloyDB Omni in most CNCF-compliant Kubernetes environments.

At the recent VMware Explore in Barcelona, Google Cloud and VMware announced a partnership to bring AlloyDB Omni to VMware Cloud Foundation, starting with on-premises private clouds. Thomas Kurian, CEO of Google Cloud, writes:

We’re expanding our partnership with VMware to deliver Google Cloud’s AlloyDB Omni database on VMware Cloud Foundation, making it even easier for customers to modernize their workloads and build gen AI applications.

Taylor Justin Stacey, senior solutions engineer at SADA, highlights the value of the developer edition:

What’s extra cool about Omni is there’s a quick way to answer, “Is AlloyDB worth the switch?” Set it up with the developer edition (for free) then test your queries with the google_columnar_engine.enabled flag set to on or off. See how the performance compares.
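
The comparison Stacey describes can be scripted against the free developer edition. Below is a minimal Python sketch; the connection details and the sample query are placeholders, and how the google_columnar_engine.enabled flag is applied (per session or via the instance configuration and a restart) depends on your setup.

```python
# Rough sketch of the comparison described above: time the same analytical
# query against an AlloyDB Omni developer-edition instance, once with
# google_columnar_engine.enabled on and once with it off.
# Connection details and the sample query are placeholders, not from the article.
import time

import psycopg2

QUERY = "SELECT category, avg(amount) FROM orders GROUP BY category;"  # hypothetical table

def average_latency(dsn: str, runs: int = 5) -> float:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(QUERY)          # warm-up run so caching doesn't skew the numbers
        cur.fetchall()
        start = time.perf_counter()
        for _ in range(runs):
            cur.execute(QUERY)
            cur.fetchall()
        return (time.perf_counter() - start) / runs

if __name__ == "__main__":
    dsn = "host=localhost port=5432 dbname=postgres user=postgres password=secret"
    print(f"average latency: {average_latency(dsn) * 1000:.1f} ms")
    # Re-run after toggling google_columnar_engine.enabled in the instance
    # configuration (restarting if required) and compare the two numbers.
```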

For production usage, AlloyDB Omni is available with a monthly subscription starting at 1,295 USD per month for a 16-vCPU pack, with 100-vCPU packs at 6,995 USD per month and discounts available for 1-year and 3-year commitments. Jernigan adds:

The vCPUs in each pack can be used across multiple servers or virtual machines as needed; there is no requirement to use all of the vCPUs in a pack on a set number of servers, nor on a single server.

AlloyDB Omni offers enterprise support and software updates for security patches and new features.

About the Author

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



KubeCon NA 2023: Ishan Sharma on Real-Time Generative AI for Gaming Apps Running on Kubernetes

MMS Founder
MMS Srini Penchikala

Article originally posted on InfoQ. Visit InfoQ

Kubernetes provides a great platform for applications using generative artificial intelligence (GenAI) for both game development and gameplay. Ishan Sharma from Google spoke at the recent KubeCon CloudNativeCon NA 2023 Conference about real-time GenAI inference integrated with distributed game servers running on Kubernetes.

With the launch of ChatGPT and Bard, the term GenAI has become mainstream, not just in the technical community. Over the last decade, AI and ML technologies have been steadily improving, and AI has been beating humans in perception tests in domains such as handwriting recognition, speech recognition, image recognition, reading comprehension, and language understanding. The generative capabilities of AI have also improved a lot in the last nine years: from very pixelated black-and-white images in 2014, to very realistic images just three years later (2017), to text-to-image generation from prompts by 2021, AI has come a long way.

GenAI offers a lot of support for online gaming applications. With the help of a chart showing global GenAI predictions for the gaming market from 2022-2032, Sharma said GenAI is being used first for game development use cases, which will later be eclipsed by new game experiences such as smart non-player characters, level generation, image enhancement, scenarios, and stories. In game development, the applications of GenAI are boundless: creating art assets, auto-generating game code, life-like conversations with bots, and generating levels from player input.

Generative AI is evolving the games industry and will transform live service games into living games: from boxed software games in the past, to live service games today, to what are called living games in the near future. In living games, three aspects (Developer, Game, and Player) will interact with each other to enrich the user experience. Here, game developers will need to develop AI responsibly and safely, protecting intellectual property while at the same time respecting the player's privacy and safety.

Classification of GenAI use cases in games includes two categories: improving productivity during game development and improving player experience during gameplay.

In the game development phase, we can use GenAI to accelerate time-to-launch and time-to-market by creating content and simplifying development. This includes the development of game assets such as characters, props, audio, and video. Turnkey APIs like Vertex AI, Amazon SageMaker, and ChatGPT can help in this category.

In the second category, the run-time gameplay phase, we can use AI/ML and GenAI to adapt the gameplay and empower players to generate game content in real time. These capabilities include smart NPCs (bots), dynamic in-game content, and customized player experiences. GenAI during gameplay brings demanding requirements like low latency, high performance, fast scalability, and low cost. The runtime gameplay environment can use platforms like Google Kubernetes Engine (GKE) to host the gaming apps.
 
Based on user research that his team conducted across SMEs in the gaming industry, Sharma discussed user pain points for GenAI in games in three categories: platform, AI maturity, and gameplay. In the platform category, we need at-scale cost efficiency to ensure financial feasibility for popular (AAA) games. For a seamless player experience, low latency and minimal lag are essential to ensure smooth gameplay; lag can hurt the success of games where even sub-second latency is not acceptable. Finally, platforms that offer performance and access to state-of-the-art models without vendor lock-in will drive platform decisions.
 
For the pain points in the AI maturity category, LLM unpredictability is a big concern: we need coherent, relevant, and contextually appropriate inference that is repeatable. The models should not promote AI biases and stereotypes. Content filtering and moderation are needed to ensure a safe and inclusive gameplay environment for players.

In the third category, gameplay, we need to balance user-generated content with game lore and structure (creativity). Some games need content for gameplay that LLMs filter out, so we need to keep the GenAI constraints in mind. Also, procedural generation with GenAI will still require human supervision in the near future as GenAI and LLMs continue to evolve.

Sharma mentioned Kubernetes is a good computing solution for games as it solves the majority of IT operations problems, like scheduling, health-checking, deployment methods, autoscaling and rollbacks, centralized logging and monitoring, a declarative paradigm, and primitives for isolation. But the challenge is that Kubernetes, on its own, does not understand how game servers work. Game servers need additional capabilities like maintaining in-memory state, starting and shutting down game servers on demand, and protecting running servers from being shut down (even for upgrades!), since shutting them down would result in a poor player experience.

The Agones open-source framework can help with these game server scaling and orchestration requirements. It was developed in 2017 through a partnership between Google and Ubisoft. Agones brings the benefits of Kubernetes operations to game servers as well, including a better understanding of game matches and sessions, seamless scaling with player loads, multiple UDP/TCP ports per node, and hot spares with tunable warm-up parameters.
 
Sharma discussed the high-level architecture of a live service game, using a multiplayer game session as the use case. Core components of the solution, such as the Game Frontend, the Matchmaker Service that directs the player to a dedicated server where they can join other players in a shared environment and experience, and the Player Profile Service, can all be hosted on a Kubernetes cluster. Game servers also run on Kubernetes and are orchestrated by Agones.

When it comes to integrating GenAI inference with game servers, development teams have a few different options. Similar to the game development options, turnkey solutions like Vertex AI, SageMaker, and the Stable Diffusion API can be used for gameplay environments. A second approach is a DIY solution on Kubernetes, where dedicated GenAI inference servers run on Kubernetes nodes; these servers can leverage hardware options like GPUs or high-performance CPUs. Another approach is to run GenAI inference servers as sidecar containers within the same pod, with a dedicated inference server for each game server; in this case the underlying hardware has to suit both the Agones game server and the GenAI inference server. Teams should find the right balance between raw performance and cost when choosing any of these options.
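
To illustrate the difference between the sidecar and dedicated-node options from the game server's point of view, here is a rough Python sketch; the URLs, payload shape, and fallback are illustrative assumptions rather than anything shown in the talk.

```python
# Rough sketch of how a game server might call a GenAI inference endpoint.
# With the sidecar option the inference server sits in the same pod (localhost);
# with dedicated inference nodes it is reached through a cluster-internal Service.
# URLs, paths, and the payload shape are illustrative assumptions.
import os

import requests

# Sidecar default: loopback address in the same pod.
# Dedicated nodes: e.g. http://genai-inference.game.svc:8081/generate
INFERENCE_URL = os.environ.get("INFERENCE_URL", "http://localhost:8081/generate")

def generate_npc_dialog(player_prompt: str, timeout_s: float = 0.5) -> str:
    """Ask the inference server for NPC dialog; fall back to a canned line on failure."""
    try:
        resp = requests.post(INFERENCE_URL, json={"prompt": player_prompt}, timeout=timeout_s)
        resp.raise_for_status()
        return resp.json().get("text", "")
    except requests.RequestException:
        # Latency and availability matter during gameplay, so degrade gracefully.
        return "The innkeeper shrugs and says nothing."
```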
 
He discussed the advantages of the different options for integrating GenAI inference with game servers. Advantages of using a turnkey solution include out-of-the-box game development use cases and faster time-to-value; in addition, some models are only available through turnkey APIs and are not openly available for you to containerize.

Advantages of a DIY solution with Kubernetes for GenAI in games include openly available models that can run in containers, and Kubernetes can be more cost-effective than pay-per-use APIs in high-usage scenarios (such as game launches, where a large number of concurrent users arrive in a short amount of time). Also, dedicated inference Kubernetes nodes are easy to set up with features such as horizontal pod autoscaling (HPA) and scheduling with taints/tolerations.

They ran some tests using Stable Diffusion (for image generation) and Bloom (for text generation). Slightly better performance was observed when using sidecars. In general, inference latency dominates any difference between the Kubernetes deployment methods. Dedicated inference Kubernetes nodes provide the most versatility, ease of use, and flexibility.
 
In his conclusion, Sharma highlighted the advantages of using Kubernetes for GenAI in games in the areas of portability, flexibility, scalability and performance, and cost and efficiency. There is also a decent ecosystem of frameworks from which to choose, including Spark, Beam, Dask, Ray, RAPIDS, and XGBoost.
 
Sharma ended the presentation with a demo of GenAI integrated into a multiplayer game with real-time image generation. The demo app is hosted on a GenAI inference cluster on GKE in Google Cloud and uses the dedicated nodes option. A GenAI API component routes traffic to the different models. The logic layer consists of NPC logic for text pre- and post-processing for dialog, image generation logic that handles image pre- and post-processing, and Vertex AI services that handle LLM pre- and post-processing and talk to Vertex AI LLM endpoints. In terms of models, Llama 2 was used for text generation and Stable Diffusion for image generation.
 
For more information on KubeCon NA 2023, check out the conference website and the complete program schedule, as well as the Data and AI/ML-specific session catalog.
 

About the Author

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



MongoDB, Inc. (NASDAQ:MDB) Given Average Rating of “Moderate Buy” by Brokerages

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Shares of MongoDB, Inc. (NASDAQ:MDB) have been assigned an average recommendation of “Moderate Buy” from the twenty-seven brokerages that are currently covering the firm, Marketbeat.com reports. One research analyst has rated the stock with a sell recommendation, two have given a hold recommendation, and twenty-four have assigned a buy recommendation to the company. The average 12-month target price among brokers that have issued a report on the stock in the last year is $419.74.

A number of equities research analysts have issued reports on MDB shares. Oppenheimer increased their price target on MongoDB from $430.00 to $480.00 and gave the stock an “outperform” rating in a research note on Friday, September 1st. Piper Sandler increased their price objective on shares of MongoDB from $400.00 to $425.00 and gave the company an “overweight” rating in a report on Friday, September 1st. Mizuho increased their price target on shares of MongoDB from $240.00 to $260.00 in a report on Friday, September 1st. Scotiabank initiated coverage on shares of MongoDB in a report on Tuesday, October 10th. They issued a “sector perform” rating and a $335.00 price target for the company. Finally, JMP Securities increased their price target on shares of MongoDB from $425.00 to $440.00 and gave the company a “market outperform” rating in a report on Friday, September 1st.

View Our Latest Stock Analysis on MongoDB

MongoDB Stock Down 0.5%

NASDAQ MDB opened at $405.51 on Wednesday. The company has a debt-to-equity ratio of 1.29, a current ratio of 4.48 and a quick ratio of 4.48. The stock has a market capitalization of $28.93 billion, a PE ratio of -117.20 and a beta of 1.16. MongoDB has a 1 year low of $137.70 and a 1 year high of $439.00. The business’s 50-day moving average is $355.28 and its 200 day moving average is $361.23.

MongoDB (NASDAQ:MDB) last issued its quarterly earnings data on Thursday, August 31st. The company reported ($0.63) EPS for the quarter, topping the consensus estimate of ($0.70) by $0.07. MongoDB had a negative net margin of 16.21% and a negative return on equity of 29.69%. The business had revenue of $423.79 million during the quarter, compared to analyst estimates of $389.93 million. Sell-side analysts forecast that MongoDB will post -2.17 earnings per share for the current year.

Insiders Place Their Bets

In other MongoDB news, Director Dwight A. Merriman sold 6,000 shares of the business’s stock in a transaction on Tuesday, September 5th. The stock was sold at an average price of $389.50, for a total value of $2,337,000.00. Following the sale, the director now directly owns 1,201,159 shares in the company, valued at approximately $467,851,430.50. The transaction was disclosed in a filing with the Securities & Exchange Commission, which is available through the SEC website. Also, CAO Thomas Bull sold 518 shares of the stock in a transaction dated Monday, October 2nd. The shares were sold at an average price of $342.41, for a total value of $177,368.38. Following the sale, the chief accounting officer now owns 16,672 shares of the company’s stock, valued at $5,708,659.52. Over the last quarter, insiders have sold 289,484 shares of company stock valued at $101,547,167. Insiders own 4.80% of the company’s stock.

Hedge Funds Weigh In On MongoDB

Large investors have recently made changes to their positions in the business. Jacobs Levy Equity Management Inc. bought a new position in shares of MongoDB during the third quarter valued at approximately $2,453,000. Creative Planning boosted its position in MongoDB by 2.3% during the 3rd quarter. Creative Planning now owns 5,139 shares of the company’s stock valued at $1,777,000 after acquiring an additional 114 shares in the last quarter. Jag Capital Management LLC bought a new stake in MongoDB during the 3rd quarter valued at $424,000. Mercer Global Advisors Inc. ADV boosted its position in MongoDB by 10.1% during the 3rd quarter. Mercer Global Advisors Inc. ADV now owns 11,521 shares of the company’s stock valued at $3,985,000 after acquiring an additional 1,060 shares in the last quarter. Finally, Toroso Investments LLC lifted its holdings in shares of MongoDB by 10.3% during the 3rd quarter. Toroso Investments LLC now owns 3,010 shares of the company’s stock valued at $1,041,000 after purchasing an additional 281 shares during the last quarter. 88.89% of the stock is currently owned by hedge funds and other institutional investors.

MongoDB Company Profile


MongoDB, Inc. provides a general-purpose database platform worldwide. The company offers MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premises, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.

Further Reading

Analyst Recommendations for MongoDB (NASDAQ:MDB)

This instant news alert was generated by narrative science technology and financial data from MarketBeat in order to provide readers with the fastest and most accurate reporting. This story was reviewed by MarketBeat’s editorial team prior to publication. Please send any questions or comments about this story to contact@marketbeat.com.

Before you consider MongoDB, you’ll want to hear this.

MarketBeat keeps track of Wall Street’s top-rated and best performing research analysts and the stocks they recommend to their clients on a daily basis. MarketBeat has identified the five stocks that top analysts are quietly whispering to their clients to buy now before the broader market catches on… and MongoDB wasn’t on the list.

While MongoDB currently has a “Moderate Buy” rating among analysts, top-rated analysts believe these five stocks are better buys.

View The Five Stocks Here


MarketBeat has just released its list of 20 stocks that Wall Street analysts hate. These companies may appear to have good fundamentals, but top analysts smell something seriously rotten. Are any of these companies lurking around your portfolio? Find out by clicking the link below.

Get This Free Report

Article originally posted on mongodb google news. Visit mongodb google news

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



Google Announces New DeepMind Model, Lyria, to Generate High-Quality Music

MMS Founder
MMS Sergio De Simone

Article originally posted on InfoQ. Visit InfoQ

Google has introduced Google DeepMind’s Lyria, an AI music generation model able to generate vocals, lyrics, and background tracks mimicking the style of popular artists. The model is experimentally available on YouTube through two distinct AI experiments.

One of the experiments, Dream Track, aims to let creators generate short music clips using the voices and styles of collaborating artists such as Alec Benjamin, Charlie Puth, Charli XCX, Demi Lovato, John Legend, Sia, T-Pain, Troye Sivan, and Papoose.

Dream Track users can simply enter a topic and choose an artist from the carousel to generate a 30 second soundtrack for their Short. Using our Lyria model, Dream Track simultaneously generates the lyrics, backing track, and AI-generated voice in the style of the participating artist selected.

The second experiment, named Music AI tools, aims to expand the possibilities of using AI throughout the creative process.

With our music AI tools, users can create new music or instrumental sections from scratch, transform audio from one music style or instrument to another, and create instrumental and vocal accompaniments.

For example, says Google, Lyria can transform a melody you hum into a saxophone line, MIDI chords into a vocal choir, or add background instrumental music to a vocal track. Lyria also supports specifying the elements that should go into the generated piece, such as instruments, effects, and techniques, using a textual prompt.

Alongside Lyria’s music generation capabilities, Google is also exploring new ways to make the deployment of AI creation tools socially responsible. Specifically, Lyria will watermark any content it generates by embedding the watermark directly into the generated waveform using SynthID. Currently in beta, SynthID is a tool for watermarking and identifying AI-generated content.

With this tool, users can embed a digital watermark directly into AI-generated images or audio they create. This watermark is imperceptible to humans, but detectable for identification.

SynthID uses two deep learning models, one for watermarking and the other for identification. While SynthID is only a first step towards making AI artifacts traceable, in Google’s view it responds to a critical need: being able to identify AI-generated content to protect against misinformation.

Lyria is still experimental and very much a work in progress. Google says that early Music AI Incubator participants will be able to test it out later this year.

About the Author

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



Presentation: Providing a Personalized Experience to Millions of Users @BBC

MMS Founder
MMS Manisha Lopes

Article originally posted on InfoQ. Visit InfoQ

Transcript

Lopes: Let’s play out a scenario. Imagine yourself going to an online streaming platform of your choice, be it BBC iPlayer, Netflix, Disney, or any others. Here’s the catch, imagine that none of these platforms support user personalization. What does that mean? It means that when you log into any of these platforms, you need to remember the list of the programs that you’ve been watching. I don’t know about you, but I tend to have at least two or three programs in play at any given time on any of these platforms. Many of these programs are available only on a certain platform.

Even if you had a very photographic memory to remember the exact mapping of the program to the platform, what about the watching progress? Would you really remember that? Imagine how frustrating that would be. Let’s imagine there’s another online streaming platform in the market, which remembers the list of the programs that you’ve been watching, along with your watching progress.

Which platform would you prefer? It would be a no-brainer for me to go to the new one. In this simple scenario, we have seen how important personalization is to all of us. No industry has remained immune to it, be it groceries, drugstore websites, travel, leisure, you name it. The very fact that all of you are sitting here in this audience, again goes to emphasize how important personalization is to us.

Background, and Outline

I’m Manisha Lopes. I’m a Principal Software Engineer at the BBC. I’ll be talking about providing a personalized experience to millions of users at BBC. First, we’ll look at what is personalization. In here, I will introduce you to some of the BBC products, and that will help you to get the context for the talk.

Next, we will look at the architecture and the integration of the system, introducing you to different services that we make use of in our personalization product. We will then look at the different steps that we’ve taken to improve the performance and the capacity of personalization product. We will then look at the tooling that has enabled and helped us through this journey. Finally, we’ll talk about how to manage your stakeholder expectations.

What Is Personalization?

What is personalization? Personalization is a process that enables businesses to tailor customer journeys and experiences to the needs and preferences of each customer. Businesses and products that provide a personalized experience to the customer help in saving the time and the effort that the customer spends in finding what they want, thereby increasing the experience and the satisfaction of the customer with that product.

Here we have some of the BBC products listed. Most of you might be familiar with the BBC News which is quite popular. BBC does have other products like the iPlayer, which is our live streaming service for video where you can watch BBC programs on-demand as well as live events. Sounds is the live streaming service for radio, and you can also listen to podcasts. We have Sports, Weather, Bitesize.

We have other products like CBeebies, which is dedicated to children. BBC is a public service, and whatever we do has to benefit you, our end users, and to provide more of what you love we need to understand you better. To do that, we ask our users to sign in to some of our products and services.

The data that you share with us is never sold or shared with any third-party product. It is used to drive a more personal BBC for you, be it recommending programs, or showing you content that is relevant to you. As the BBC, and as a public service, we need to ensure that we make something for everyone, thus making a better BBC for everyone. If you’d like to read more about the BBC privacy promise, I have a link provided.

User activity service, also called UAS, is a highly performant, real-time service that remembers a person’s activity while they’re making use of the BBC. It allows for the collection of the user’s activities and the retrieval of those activities to drive or provide a personalized experience to that user. On the right-hand side, I have a screenshot of the iPlayer homepage. The second rail here, the continue watching, shows you the list of the programs that are being watched by this user along with their watching progress.

To put things in context, we receive around 15,000 to 30,000 transactions per second in UAS. Approximately 150 million activities are stored in a day. Approximately 75 user experiences are powered across different products by UAS. These are the results of a survey conducted by Ofcom that shows the top 10 online video services used by UK audiences. As you can see, at the very top we have iPlayer at 25%. In the second slot we have Prime Video at 18%. Third, we have Sky Sports at 11%. You can really see that we do have a lot of people coming to BBC iPlayer.

System Integration and Architecture

Let’s now look at the system integration and the architecture. It is a very simple integration, as you can see, between iPlayer and user activity service, where iPlayer web, TV, and mobile interact with UAS via the iPlayer backend. Diving a little further into the UAS architecture. On the right-hand side, we have clients such as iPlayer, Sounds, and the others. There is a common point of entry into UAS, the gateway.

Depending upon the HTTP verb, the requests are then directed to the appropriate path. UAS is a microservice-based architecture hosted on virtual machines. We make use of topics, queues, and a NoSQL database. As you might have noticed, we have a synchronous path and an asynchronous path. When the user lands on the iPlayer homepage, a request is sent to read the user's activities via the synchronous path. However, when a user, for example, watches something on iPlayer, the user's activities will be sent from iPlayer to UAS, which will eventually be persisted into DynamoDB, a NoSQL database.

Steps for Improving System Performance

When we talk about the steps for improving the performance and capacity of our system, it is basically a three-step process. You need to first identify what are the bottlenecks in the system in order to then go and fix them. It’s an iterative process. We first identify the bottlenecks. We then go and fix them. Then, again, you do some testing to see whether the fixing of those bottlenecks have actually made any impact.

Have they improved the performance of the system? This is something that we have been doing over the years. Finally, we’ll be looking at aligning your data model. As we know, data is central to drive personalization. It’s very important to ensure that your data model is in alignment with access patterns of the data. Here I’ll be sharing with you the different levers that we have used, and also, how did we go about aligning our data model with access patterns?

I did say initially that we have been iterating over making UAS better or improving the performance of UAS for quite some time. Always, you have some surprises. That surprise came to us in the form of lockdown. This is the time when people were confined to their homes with a lot of time on their hands, and with that extra time on their hands, we saw people coming to the iPlayer in record numbers. BBC iPlayer’s biggest ever day was May 10th, and this was the prime minister’s statement regarding the lockdown.

To, again, show you the data, when we looked at the data, we realized that it was almost 61% higher than the same 7-week period the previous year. What happened on 10th of May, when people came to us in record numbers? iPlayer crashed. We had a momentary incident on iPlayer. When we looked at the iPlayer backend external dependencies, this one shows you the interactions with user activity service. You can see around about 7 p.m., there were a lot of errors coming out of UAS.

Digging a little deeper into UAS, we saw there was a huge spike on our DynamoDB. This dashboard shows you the read capacity on one of our tables. The blue data points indicate the consumed capacity. The red line indicates the provisioned capacity or the max threshold that is available at that particular point in time. As you can see, there’s a big spike around 7 p.m., which resulted in UAS throwing errors, and those errors cascaded to iPlayer which resulted in the crashing of iPlayer.

The very first one I will talk about here is simulation exercises. We had an incident on 10th of May. What we did on the back of the incident is, iPlayer backend and UAS had a simulation game. These are nothing but exercises that are carried for the purpose of training, and they’re designed to imitate a real incident. What we did in this simulation exercise is we replicated the 10th May incident. We knew that the bottleneck or the problem for that incident was the capacity available on our DynamoDB.

Once we replicated the incident, we then increased the capacity on our DynamoDB, and we really saw the problems resolved and iPlayer was up and running. Performance testing is another tool that we extensively use for identifying bottlenecks. It is a testing practice wherein you push some load onto a system and then see how the system performs in terms of responsiveness and stability. It helps to investigate and measure stability, reliability, and resource usage. It’s a very effective tool to identify the bottlenecks.

Again, as I was saying, it's an iterative process. You identify the bottlenecks. You resolve them. You then run the performance test to check whether that bottleneck has indeed been resolved. Another thing to call out is, although you might see two or three bottlenecks, it's always better to make one change at a time rather than making more than one change and not knowing which change has boosted and which has hurt the performance.

We’re going to be talking about the different levers that have helped us and the levers that we have tuned on our application. The very first one is a circuit breaker. This is a very common design pattern in microservices. It is based on the concept of an electrical circuit. If you have a service 1 and a service 2, service 1 will keep sending requests to service 2 as long as service 2 responds with successful responses.

If service 2 starts failing and the failure rate reaches the failure threshold in service 1, service 1 will break the connection or open the connection from service 2. After some predefined period of time, service 1 then will again start sending a few requests to service 2, to check whether service 2 has been restored. If it responds with success, service 1 will then close the connection with service 2. I would like to call out that when we had the incident on 10th of May, we did have a circuit breaker in place in iPlayer.

However, there was a misconfiguration due to which it did not get fired. It's important to have the circuit breaker, but equally important to ensure that it's up and running. It's very important to test it and see that it's actually ok. Now that we have a circuit breaker properly configured, rather than iPlayer crashing at a time when UAS returns errors, we now provide a degraded service or a non-personalized service to our end users.
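
To illustrate the pattern being described, here is a minimal Python sketch of a circuit breaker; it is illustrative only, not the BBC implementation.

```python
# Minimal illustration of the circuit breaker pattern: after too many
# consecutive failures the circuit opens and callers get an immediate
# fallback; after a cool-off period one trial request is allowed through
# to test whether the downstream service has recovered.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout_s=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout_s:
                return fallback()        # open: fail fast with the degraded path
            self.opened_at = None        # half-open: allow one trial request
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0                # success closes the circuit again
        return result
```

In the iPlayer case, the fallback would be rendering a non-personalized rail rather than failing the whole page.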

Retry requests. In most of our services, we do have a retry with exponential backoff, and that means retrying with increasing waiting times between the retries. Retrying is suitable for those requests which would have failed due to, for example, a network glitch. It's important to remember to set a threshold for the number of retries because you do not want to retry indefinitely. It's also important to distinguish between retriable errors and non-retriable errors.
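
As a minimal Python sketch of this pattern (illustrative, not the BBC code), retrying only transient failures with capped, exponentially increasing waits:

```python
# Sketch of retry with exponential backoff: only transient errors are
# retried, the number of attempts is capped, and jitter is added so that
# clients don't all retry in lockstep.
import random
import time

import requests

RETRIABLE = (requests.ConnectionError, requests.Timeout)

def get_with_backoff(url, max_attempts=4, base_delay_s=0.1):
    """GET with exponential backoff; non-retriable errors propagate immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            resp = requests.get(url, timeout=0.2)
            resp.raise_for_status()   # HTTPError (e.g. a 4xx from a bad payload) is not retried here
            return resp
        except RETRIABLE:
            if attempt == max_attempts:
                raise                 # cap the retries: never retry indefinitely
            # Exponential backoff with jitter between attempts.
            time.sleep(base_delay_s * (2 ** (attempt - 1)) * (1 + random.random()))
```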

A retriable error, for example, would be a network glitch. A non-retriable error would be something like trying to access something for which a user does not have permissions, or sending in a request with an incorrect payload. Batch APIs. This is a nice figure to explain the concept of batch APIs.

Rather than sending single and separate requests to the service, and the service then processing them separately and sending you separate responses, we found that it is much more performant if you batch those requests together and send them to the service. The service will then process them as a batch and provide you a batch response. Again, it depends upon your use case, and whether it makes sense to have batch APIs for that particular use case.
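
As an illustration, here is a rough Python sketch contrasting separate requests with a batched request; the endpoint paths and payloads are hypothetical, not the real UAS API.

```python
# Sketch contrasting N single requests with one batched request, assuming a
# hypothetical activities endpoint that accepts a list of programme IDs.
import requests

BASE_URL = "https://uas.example.internal"

def fetch_progress_individually(user_token, programme_ids):
    # One round trip per programme: simple, but N network hops and N auth checks.
    headers = {"Authorization": f"Bearer {user_token}"}
    return [
        requests.get(f"{BASE_URL}/activities/{pid}", headers=headers, timeout=0.2).json()
        for pid in programme_ids
    ]

def fetch_progress_batched(user_token, programme_ids):
    # One round trip for the whole rail: the service processes the batch and
    # returns a single batched response.
    headers = {"Authorization": f"Bearer {user_token}"}
    resp = requests.post(
        f"{BASE_URL}/activities/batch-get",
        json={"programmeIds": programme_ids},
        headers=headers,
        timeout=0.2,
    )
    return resp.json()
```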

Web server tuning. Of course, what levers are available depends upon the web server that you’re using. In most cases, once you upgrade to the next stable version, it is bound to give you a performance boost. We make use of Apache Tomcat. When we did upgrade to the next stable version, it definitely gave us a performance boost. The other thing that we changed was we played around with the number of threads available in Tomcat. Once we had that optimal setting in place, it definitely boosted our performance. You might want to check what levers are available for the web server you’re making use of. Token validation, this might be a little subjective.

To put things in perspective, when iPlayer makes a request to UAS, we receive a token as part of that request. The token is used to validate whether it is a valid user request. Historically, UAS made use of online token validation. When we got a request from iPlayer, that token was then sent to another service that, say, is a part of another team, and let’s call it the authentication service. The token was then validated by the authentication service and then passed back to UAS, and then the flow carried out.

After a lot of deliberation, of course, doing threat modeling and checking the security implications, we decided to move this validation in-house into UAS, now called offline token validation, where the token is validated and the user authenticated inside UAS. This gives us decoupling and increased reliability, because we are no longer dependent on the other authentication service, and, of course, lower latency because there's one less hop. It's definitely worthwhile calling out that making this change would require you to do a threat model and consider the security implications of the change.
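
A minimal Python sketch of offline token validation follows; it is illustrative only, and the key handling, claim names, and algorithm are assumptions that would need the threat modeling mentioned above.

```python
# Sketch of offline token validation: instead of calling a separate
# authentication service per request, the service verifies the token's
# signature locally against the issuer's public key.
import jwt  # PyJWT

def validate_token(token: str, public_key_pem: str) -> str:
    """Return the user ID from a valid token, or raise jwt.InvalidTokenError."""
    claims = jwt.decode(
        token,
        public_key_pem,
        algorithms=["RS256"],                 # pin the expected algorithm
        options={"require": ["exp", "sub"]},  # expiry and subject must be present
    )
    return claims["sub"]
```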

Load balancer migration. Again, you need to check what levers or what configurations are available for you to tune a load balancer. UAS is hosted in AWS. When we started off, we started off with a classic load balancer, which was great, but it was good for handling traffic that grew gradually or organically, but it wasn’t great at handling spiky traffic. Once we migrated to the application load balancer and the network load balancer, it gave us a capacity to handle spiky traffic really well.

General compute, if your performance tests do highlight a bottleneck in your instances, then the decision would be whether you want to scale horizontally or vertically. By horizontal scaling, it means increasing the number of instances in your fleet. Vertical scaling implies going to the next bigger instance. We also migrated to new generation instances, and we did find that this migration to new generation instances was not only performant, but it also was cost effective.

Reviewing your autoscaling policies. Autoscaling is a process that constantly monitors your application, and will readjust or automatically readjust the capacity of your application to provide that stability required to be available and reliable. In autoscaling policies, you have a criteria, and you make use of a metric. You might want to review which metric would make sense in your case. Again, if you make use of a metric like CPU utilization, then you have different thresholds.

Like you have, what is the value of the CPU utilization threshold? If it’s at 60%, and if we want to have a more aggressive scaling, you might want to reduce it and say 50%. Or, what is the evaluation period? How long does that autoscaling criteria need to be satisfied? Or, you could have aggressive scaling in the sense of number of instances added when that criteria is satisfied. Do you just want one, or you want to it to be more aggressive and add two or three instances, depending upon the criticality of your application?

Another thing that we make use of is step scaling. In our application, we know what are the peak periods and what non-peak periods are. During the peak periods, we have a Cron job, or step scaling as we call it, which moves or increases the number of instances available for the period of that peak time. During non-peak periods such as the night time, we downscale the number of instances.
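
As an illustration of that kind of scheduled ("cron") scaling, here is a minimal boto3 sketch; the group name, sizes, and schedules are placeholders, not the BBC configuration.

```python
# Sketch of scheduled scale-up before a known peak and scale-down overnight
# for an EC2 Auto Scaling group, using boto3 scheduled actions.
import boto3

autoscaling = boto3.client("autoscaling")

# Scale up ahead of the evening peak...
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="uas-read-path",
    ScheduledActionName="evening-peak-scale-up",
    Recurrence="0 18 * * *",   # 18:00 UTC daily
    MinSize=8,
    DesiredCapacity=12,
    MaxSize=20,
)

# ...and back down during non-peak hours overnight.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="uas-read-path",
    ScheduledActionName="overnight-scale-down",
    Recurrence="0 1 * * *",    # 01:00 UTC daily
    MinSize=2,
    DesiredCapacity=4,
    MaxSize=20,
)
```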

Client timeouts. Client timeouts, again, if you have a service 1 and a service 2, service 1 would send a request to service 2, and a timeout is the amount of time that service 1 is allowed to wait until it receives a response from service 2. Let’s say if service 1 has a timeout of 200 milliseconds. Once it sends a request to service 2, if service 2 does not respond within that 200-millisecond period, service 1 will break its connection from service 2. This ensures that the core services work always even when the dependent services are not available. It prevents services from waiting on it indefinitely. Also, it prevents any blockage of threads.
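
A minimal Python sketch of a client-side timeout with graceful degradation (the URL and fallback are illustrative):

```python
# Sketch of a client-side timeout: the caller waits at most 200 ms for the
# dependency and then degrades rather than blocking threads indefinitely.
import requests

def get_watching_rail(user_token: str):
    try:
        resp = requests.get(
            "https://uas.example.internal/activities",
            headers={"Authorization": f"Bearer {user_token}"},
            timeout=0.2,   # 200 ms budget, applied to connect and read
        )
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException:
        return []          # degrade to a non-personalized rail
```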

With this, we have come to the end of the different levers, and now we're going to look at aligning your data model. Once we had all the levers that I've spoken about in place, and we again executed the performance tests on UAS, we identified that the bottleneck was now on our DynamoDB. What we realized is, when we had a thundering herd event on our DynamoDB, it was not fast enough to scale. Although we have autoscaling set up on our DynamoDB, it wasn't scaling up fast enough to handle the incoming traffic; the spikes we were getting were growing within a couple of seconds.

The very first step, what we did was we looked at the pricing model, or the different capacity modes that are available on DynamoDB. In UAS, we make use of provisioned mode, and that works very well for most of our use cases except with spiky traffic. Then we looked at something called on-demand mode, which is meant to be really good for spiky traffic. However, on-demand is supposed to be almost seven times the cost of provisioned mode.

In BBC, we have to operate within the limits of a restricted budget. We have to be cost effective. Hence, we decided to stay on provisioned mode and then check what our options might be with provisioned mode. That's when we started looking into indexing. We went back to the drawing board, we started to understand, what are the use cases of UAS?

What is the data being used for? Is it being retrieved efficiently? We started talking to our stakeholders. That's when we realized that the data model was not aligned with the access patterns of the data. The answer we found was in the form of indexing. Indexes, as in a relational database, basically give you better retrieval speed, or make your retrieval more efficient.

Before we went to the new index, the way we were retrieving information was based on the user ID and the last modified time. In NoSQL, you can retrieve information efficiently only based on the key attributes. You can, of course, retrieve based on non-key attributes, but you would have to scan the entire table in order to get the data that you want, which is a very expensive operation. The most efficient way of retrieving information is to base your retrieval or query on the key attributes.

Previously, we were querying the data based on the user ID and the last modified time, but when iPlayer wanted the user activities that were bound to iPlayer, what we were doing was retrieving the activities for all the products, whether iPlayer, Sounds, or any of the other BBC products that send user activities. Once the data was fetched, we were then filtering it down to iPlayer and returning that to iPlayer.

Now what we decided to do is, we decided to make the product domain a part of our key attribute. Now when iPlayer sends a request asking for the user activities, we retrieve the information only that is associated with iPlayer. Now we reduced the amount of data that is fetched and is fetched only for the product that has requested it. As I said, it reduces the amount of data fetched. It prevents a thundering herd event on our DynamoDB. Less data implies less capacity and hence less time taken, which improves the availability and reliability of our system. Of course, less capacity implies cost savings, which is a win-win.
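
As an illustration of the new access pattern, here is a rough boto3 sketch of a query keyed on the user and product domain; the table and attribute names are placeholders, not the real UAS schema.

```python
# Sketch of the access-pattern change: key activities by user and product
# domain so a product only reads its own rows, instead of fetching every
# product's activities and filtering afterwards.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("user-activities")

def get_activities(user_id: str, product: str, limit: int = 20):
    # A composite sort key such as "iplayer#<last_modified>" keeps the query
    # on key attributes only, so there is no table scan and no post-filtering.
    resp = table.query(
        KeyConditionExpression=Key("user_id").eq(user_id)
        & Key("product_modified").begins_with(f"{product}#"),
        ScanIndexForward=False,   # newest activities first
        Limit=limit,
    )
    return resp["Items"]
```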

Of course, there are always challenges. These challenges come in the form of high-profile events like Wimbledon, elections, the world cup, breaking news, and, of course, the state funeral last year. These are events that drive a lot of users to the BBC. Most often what we see is, for a high-profile event like this, you will get a huge spike in the traffic with people coming to the iPlayer, hence our requests coming to UAS to retrieve these users’ activities.

If you’re talking about an event like a World Cup, of course, we would get a spike right at the beginning or around the beginning of the match. Whenever there is a goal, then you’d see spikes at those times. Some of the levers for high-profile events would be pre-scaling your components, because of the type of spikes that you get, the spikes grow within really seconds, and your autoscaling criteria might not satisfy that, to autoscale your application.

You might want to pre-scale ahead of the event. Or you might want to set some scheduled actions on your DynamoDB to increase the capacity for the duration of the event. Of course, monitor your application and make the necessary changes. One of the biggest high-profile events last year was the state funeral following the Queen's death. We got around 28,000 requests per second at UAS around 11:57. If you look at the response time, it was around 20 milliseconds. We were really happy with that.

Tooling

Now, going on to the tooling that has enabled us and helped us through this journey. First of all, we have the CI/CD pipeline, this continuous integration and continuous deployment pipeline. It’s a very good practice to treat the infrastructure as code. Previously, we had a problem where people would manually make a copy or duplicate an existing pipeline, and then make the changes manually. However, there was always a problem that there could be some misconfiguration.

There was a lot of time lost there in order to get it working again. Now that we are treating the infrastructure as code, if there's any problem with the pipeline, we just redeploy it, and everything is as good as new. Resilient one-click deployments. You want to reduce the time between code build and deployment. It's an iterative process: you identify a bottleneck, you fix it, you deploy it, you then run the performance test. You want to be as quick as possible in that cycle. You do not want delays of days or weeks. You really want one-click deployments. Of course, more frequent deployments of smaller changes, one change at a time.

Performance testing. The different types of load tests that we perform are benchmarking load tests. First of all, identify what is the capacity of your application without making any changes. BAU load test, business as usual load test, that’s your normal traffic pattern. Identify how your system is performing with the normal traffic pattern. Spike load test, it’s good to know what is the level of a spike that you can handle on top of your BAU load.

Soak test, so running the load test for a couple of hours to see whether any of your services get stressed or strained over that period. Of course, it's really good to know the breaking limits of your system. That's the uppermost threshold that your system can handle. This information is really vital, especially for high-profile events, so that you know what the capacity is, and then you can make those pre-scaling decisions.
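
The talk doesn't name a load testing tool, but as one concrete way to script a BAU or spike profile, here is a minimal Locust sketch; the endpoints, weights, and pacing are placeholders rather than the real UAS traffic profile.

```python
# Minimal Locust sketch for the kinds of load tests listed above.
from locust import HttpUser, task, between

class IPlayerUser(HttpUser):
    wait_time = between(1, 3)   # think time between requests

    @task(8)
    def read_activities(self):
        # Synchronous read path: the "continue watching" rail.
        self.client.get("/activities", name="read activities")

    @task(2)
    def write_progress(self):
        # Asynchronous write path: report watching progress.
        self.client.post("/activities", json={"programme": "p123", "progress": 0.4})
```

Running the same script with different user counts and spawn rates covers the BAU, spike, and soak profiles.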

Another huge improvement we made was creating real-time load test reports. Previously, we used to run a load test, and then we had to wait for the entire load test to finish before the reports were generated. Now, we've managed to create real-time reports so that we can see the health and performance of our system as the load is being pushed.

Monitoring dashboards and notifications. I think this has been an absolute game changer for us. Of course, we did have monitoring dashboards previously, but now we have increased the granularity and the detailed dashboards that we have. If somebody does ask us how was UAS, or is UAS having any issues? We can go to these dashboards, and we can really drill down and see if there are issues, and where the issues are.

It really helps to understand the health of your system and to track your performance metrics. In addition to the dashboards, we also have alarms and notifications in the form of emails and Slack. Of course, if these alarms get fired, we have well-defined actions mentioned in our runbook. There’s no time loss in trying to understand what is the next step, we know exactly what needs to be done. Hence the fix is much quicker.

Managing Stakeholder Expectations

Let’s look at managing stakeholder expectations. You need to have a good understanding of your complete use case, user journeys from start to finish. In big organizations, there’s often so many teams, and these teams tend to operate in silos. You tend to just work within the bounds of your team. Hence, there is a lot of information and there’s a lot of context that’s lost. Always try to understand the user journey right from the start to the finish, even if there are lots of teams involved.

Have those open channels of communication between the teams. Understand the traffic profile. Of course, know what is the critical path and what is the non-critical path. This will help you in using patterns like retries and circuit breakers and the others. Some tools for collaboration and communication that we use are Slack, which is really real time. We do have product help channels. For example, we have help UAS, where we have all our stakeholders of UAS, and they talk to us. They ask us questions. They get in touch with us if they see any issues with our service. We’re trying to increase and have these communications with our stakeholders. Of course, we use Dropbox for asynchronous communication as well.

Foster a culture of experimentation, be it in the form of simulation exercises as we have seen before, or fire drills. Fire drills could be even a paper exercise. For example, if you want to list, ok, what will happen in the case your database goes down? What is the disaster recovery process? You can have different teams involved. You can have a well-defined process in place with well-defined actions, and the ownership defined. If the situation arises, there is no pressure, and you know you have the set of actions to be performed, and what sequence.

We also have good forums for knowledge transfer and collaboration. We have a monthly session between UAS and our stakeholders, called data and personalization, wherein we go and talk about UAS as a product. We talk about the different endpoints. We talk about our roadmap. We give a chance to our stakeholders to come and ask us questions, to talk to us about their grievances. Also, it’s a good place for stakeholders to talk to each other and share the best practices of what’s working, or what’s not working, or how do they get around this problem. It’s really important to have those channels to transfer knowledge and talk to each other.

Some other considerations that are worth pondering: because we have data, we have databases, so think about your data retention policies. How much value is there in retaining data older than 10 years? Forecast your future demand, not just short-term, but midterm and long-term as well. Review your architecture and infrastructure. Understand whether it is sustainable and whether it can handle that forecasted traffic.

Cultural change, incident washups, and root cause analysis are really important. Of course, take actions on the back of the root cause analysis to fix the problem so that it does not recur. Of course, a no-blame culture. You don't want to be pointing fingers, even if there's a failure. What you want to do is make it better. You want people to communicate. You want them to have that safe space for discussion. If you point fingers, then it becomes very restrictive, and people become reserved. Of course, have a well-defined disaster recovery process in place.

Recap

We looked at steps for improving system performance. We looked at how to identify system performance bottlenecks. We looked at some of the levers for tuning your application. Finally, an important thing is also to make sure your data model is aligned with access patterns of the data. We looked at some of the enablement tooling, CI/CD pipeline, performance tests, and monitoring dashboards. Finally, we looked at some ways to manage our stakeholder expectations.

Summary

Understand the complete user journey to know the implications of a change. Break the silos between teams. Open the channels of communication and collaboration. Know what your stakeholders want. Do not make any assumptions. Technology without communication will solve only one-half of the problem. It has to go hand-in-hand. Review, reform, reflect. Look at your architecture, look at your infrastructure, identify the bottlenecks, make the change. Then, again, see whether it has made any impact. This has to be an ongoing process, an iterative process.

See more presentations with transcripts

Subscribe for MMS Newsletter

By signing up, you will receive updates about our latest information.



Article: How to work with Your Auditors to Influence a Better Audit Experience

MMS Founder
MMS Clarissa Lucas

Article originally posted on InfoQ. Visit InfoQ

Key Takeaways

  • Today’s interaction model between auditors and audit clients, such as technology product owners and developers, is often ineffective and inefficient, resulting in a lack of clear value from audit work, wasted time, and fear of auditors (or a general dislike of auditors at a minimum).
  • Many organizations and technology product teams leverage evolved ways of working, such as making work visible, iterative delivery, feedback loops, and limiting work in process, but older auditing approaches don’t account for these newer operating models.
  • Auditing with agility consists of three core components: value-driven auditing, integrated auditing, and adaptable auditing, which result in more valuable audit outcomes, less unplanned work for you as product owner or developer, and the ability to change course during or stop an audit when it is no longer valuable to continue on the current course.
  • Investing time throughout an audit to help the auditors understand what’s valuable to you is a worthwhile investment, resulting in more valuable audit outcomes and less time wasted.
  • As a technology product owner or developer, you significantly influence a more valuable audit experience. You can work with your auditors to define what holds the most value, integrate audit planning into your daily work, and remain adaptable to changes as needed.

When you think of an internal audit or being audited, the thought might elicit emotions like fear, anxiety, or annoyance. With the unplanned work the audit adds to your plate, the unexpected findings, and the results that don't seem of much value to you, it's no wonder you're not necessarily thrilled to get a visit from your auditors.

But what if it could be different? What if instead of dreading the arrival of your auditors, you’d be as excited to see them as you would the Amazon delivery driver bringing your latest purchase?

It is possible to influence a better audit experience, transforming it from a check-the-box exercise with little perceived value to one of actual value that helps set you and your product or development team up for success with way less pain.

In my journey (as an auditor), one of my clients suggested we experiment with adding agility into our audit work while we audited them. They showed us how to incorporate some of these practices. My team and I were initially unsure, but we were willing to try it. Very soon into the experiment, we were convinced that the audit process needed agility and better ways of working. Going through that experiment and learning from our clients strengthened our partnership, and it is still one of the strongest partnerships I’ve experienced to date.

Why Are Audits So Painful?

To unlock a better audit experience, it’s important to understand why audits are so painful. For at least the past thirteen years (as long as I’ve been in the audit game), audits have been performed using a waterfall approach. That means audits leverage a staged, sequential methodology, where one stage must be completed before moving on to the next stage.

A typical audit comprises three main stages: Planning, Fieldwork, and Reporting. Auditors using the waterfall methodology must finish Planning before beginning Fieldwork and Fieldwork before starting Reporting. This approach should be familiar to those with experience in the history of technology program management or software development methodologies.

When applied to internal audits, the waterfall can be fairly effective in certain scenarios, primarily those in which the type of work is known and stable. This was quite common decades ago and even a few years ago in certain situations; however, that's not the environment we operate in today. Now, most of our work is either unknown or dynamic (or both). As a result of this mismatch between how an audit is run and the environment in which organizations operate, audits have become painful.

Some parts of organizations, like IT, have adapted to the current, dynamic environment by adopting better ways of working. Still, many internal audit teams stick to older methods of working that can’t keep up with today’s pace of change. The auditors’ older ways of working often don’t provide enough visibility for you as to what work is coming your way in the next few weeks or months. The auditors schedule what seems like surprise meetings with you that you need to attend. They also send you long lists of documentation they need you to provide. What does this mean for you as developers? Unplanned work piled onto your plate and time spent on an audit that doesn’t provide enough value.

The problem is exacerbated by strained or adversarial relationships between auditors and developers (those subjected to being audited). The interaction model often looks like this — auditors announce their arrival and ask many questions. Then, they go away for a little while. When they return, they present you with a plan. Then, the real fun begins. Over the next few weeks, you answer question after question and respond to endless requests for documentation.

At last, the end of the audit arrives, and the auditors unveil the results. Unfortunately, the audit report is full of findings that don’t matter to you. They could be gaps you were already aware of or gaps that aren’t important to your business.

This typically happens when the auditors don’t have the audit scope focused on the areas of most importance to your business or don’t quite understand how critical or non-critical the gaps are to your business. Or it could be that the findings represent critical gaps that aren’t articulated in a clear way.

Now, you’ve got to allocate time to fix these gaps — even those that don’t matter to you. It’s no wonder you don’t love your auditor! Today’s interaction model between auditors and developers, paired with the mismatch between how an audit is performed and how the organization works, is often ineffective and inefficient.

What Does a Better Audit Experience Look Like?

One of my favorite questions to ask my clients is this: “If you had a magic wand and could use it to make a better audit, what would that better audit look like?” The responses include things like:

  • Less time wasted
  • Results that mean something to me
  • Results that are timely and not stale (not communicated months after they’re identified)
  • Auditors who understand my business
  • An audit that happens with me, not to me
  • NO SURPRISES!

But that can’t happen in real life, can it? It absolutely can, through what’s called auditing with agility. Auditing with agility is a flexible, customizable approach to auditing that borrows concepts from Agile and DevOps.

Remember before DevOps how technology development teams and operations teams used to get in each other’s way? Those two teams weren’t incentivized to work together, which resulted in strained or even adversarial relationships. That sounds a lot like what’s happened with auditors and developers. With the introduction of a DevOps operating model, way of working, and culture, Dev teams and Ops teams became able to (and incentivized to) work together toward a common goal.

Auditing with agility does for auditors and developers what DevOps does for Dev and Ops teams. When applying these better ways of working through the audit process, you can experience the following:

  • Greater efficiency (less time wasted)
  • Better alignment between audit work and organizational value
  • Greater ability to respond to change during an audit (e.g., pivoting the audit’s focus to account for a changing business environment, or stopping the audit when the remaining work will no longer provide value)
  • More timely results
  • Greater buy-in from you and your team
  • Stronger working relationships between you and your auditors.

Your experience shifts from unplanned, non-value-added work to planned work aligned with value. You work with your auditors to help them understand your business. They provide timely, valuable, and actionable results that don’t surprise you since you’ve worked so closely with them throughout the entire audit process. You are an active participant in the audit rather than an innocent bystander. The audit happens with you, not to you.

That’s what a better audit experience looks like. Now, you just need to know how to get there.

How to Influence a Better Audit Experience

You might think, “I know I need a better audit experience, but what can I do about it as a technology product owner or developer? Isn’t that all within the auditors’ control?” While the auditors control the audit process, you can certainly influence a more valuable audit that happens with you, not to you. Auditing with agility helps you do just that.

Remember, auditing with agility is a flexible, customizable audit approach that leverages concepts from agile and DevOps to create a more value-added and efficient audit. There are three core components to auditing with agility:

  • Value-driven auditing, where the scope of audit work is driven by what’s most important to the organization
  • Integrated auditing, where audit work is integrated with your daily work
  • Adaptable auditing, where audits become nimble and can adapt to change

Each core component has practices associated with it. For example, practices associated with value-driven auditing include satisfying stakeholders through value delivery. In my book, Beyond Agile Auditing, I state that stakeholders “value audit work that is focused on the highest, most relevant risks and the areas that are important to achieving the organization’s objectives.[1]” As an auditor, I like to ask my clients questions like “What absolutely needs to go right for you (or your business) to be successful?” or “What can’t go wrong for you (or your business) to be successful?” I do this to help identify what matters and what is most valuable to my client’s business.

What can you, as a product owner, architect, or developer, do to ensure your efforts during the audit are in support of the most relevant risks? Help your auditors understand what value looks like to you. Help them understand what you and your team are trying to accomplish and what has to go right for that to happen. This will help the auditors make a well-informed decision about what to focus on during the audit. When the audit is focused on areas of value to you and your organization, the audit results (assurance that things are as you expect or need them to be, or awareness of critical gaps that could prevent you from achieving your objectives) are more valuable to you.

Here’s what that might look like in practice. Let’s say you heavily leverage a third-party Software as a Service (SaaS) solution for key aspects of your business, such as network security. You depend on this third party to keep the solution’s baseline configurations and patches current. You also depend on the third party to follow appropriate change management practices when changing the SaaS solution. If the third party fails to deliver as expected, you may run into some huge problems; in this instance, those problems could be vulnerabilities in your network’s security. As the auditors come in to audit your business or product, it would be very valuable if they could provide you with some assurance whether the risk of the third party failing to deliver is managed effectively. You’d want to know about it if it isn’t, right?

Here’s where you can help. You’ll want to invest some time explaining your business to the auditors and helping them understand how important it is that the dependence on that third party is effectively managed. You can even specifically ask your auditors to look at this during the audit—ask them to provide insights as to whether the safeguards in place are effective in managing the risk of the third party failing to deliver. Then, the audit results delivered to you will be of utmost value because they’ll be focused on what matters most to your business.

Let’s move on to the second core component of auditing with agility: integrated auditing. Integrated auditing is where audit work is integrated into your daily work. A key practice of integrated auditing is integrated planning.

Before exploring integrated planning, let’s reflect on the last time you were audited, which probably leveraged the waterfall audit approach. I would wager that it went something like this: the auditors had a couple of meetings with you to get a high-level understanding of your product. You explained to them that one risk is errors making it into production, which could disrupt your product or the business that depends on your product. The auditors left for a few days or weeks. When they returned, they told you what they would be auditing. They also provided you with a request list asking for the names of people who have both developer access to your product and access enabling them to promote code into production. They’re looking to test traditional segregation of duties (SOD) controls. But you stopped managing the risk of errors making it into production through segregating traditional access roles a while ago. Now, you manage that through automated tests in the development pipeline. Alas, the auditors aren’t auditing what matters to your organization, and the request list doesn’t seem to make sense. That’s frustrating, and unfortunately, it’s not uncommon. Luckily, integrated planning can solve that problem by helping the auditors focus on what matters and create a mutual understanding of the documentation needed to complete the audit.

So what is integrated planning? “Integrated planning includes the entire audit team assigned to the audit, as well as the audit clients (e.g., developers), in identifying key risks, key controls, and testing procedures.” Think about it — if you’re more involved in the audit planning process, there’s a better opportunity for you to educate your auditors about what’s important to you, help them understand your business or product, and understand why the auditors have defined the scope as they have. Instead of the auditors coming in with checklists, you’ll work together to create a plan for the audit that makes sense to you and your auditors. You’ll also work together to develop the documentation needed to complete the audit. Because you’re doing this with the auditors, you’ll both understand what is requested. You’ll know exactly what you need to provide, and the auditors will know exactly what they will receive. In the example above, you’ll help your auditors understand that testing traditional SOD controls won’t work because you changed the way you manage the risk that used to be controlled via SOD controls. You’ll explain to them how today’s automated tests in the development pipeline manage that risk. Then, you’ll work together to articulate what evidence they can review to determine whether those automated tests are effective — and it will likely not be access lists like they requested in the past. That saves a LOT of time and frustration.

There are a few key factors that drive successful integrated planning. These success factors apply across all three core components of auditing with agility (value-driven, integrated, and adaptable auditing) and their associated practices, so let’s take a quick detour to cover them now. The Three Ways of DevOps are principles that form the foundation of DevOps. They include flow/systems thinking, amplifying feedback loops, and a culture of continual experimentation and learning.

Borrowed from the Three Ways of DevOps, these better ways of working require a culture of organizational learning and safety. Auditors need to be open to new ways of auditing and listening to their clients and leveraging their clients’ knowledge to help them make well-informed decisions about the audit scope. As developers, you must be open to trying a different way of working during an audit and be willing to invest time to partner with your auditors and help them understand what’s valuable to you. Instead of spending as little time as possible with your auditors, particularly during planning, investing more time in activities like integrated planning yields worthwhile results. Finally, auditors and developers also need to give each other grace as they navigate these new ways of working together.

It’s important to note that auditors must maintain appropriate independence and objectivity. Because of this, they’ll retain final decision rights on the audit’s scope and other key decisions related to the audit. Now that you’re working closely with your auditors to develop the scope, if they include something that doesn’t make sense to you (“Why would they want to audit that?”), work with them to understand why they’re including it in the audit scope. Perhaps it is a regulatory requirement that they include it. Or maybe the CEO requested it. Perhaps they see value in it and can help you understand their perspective. Or maybe they misunderstood or didn’t realize it was valuable. Staying involved and integrating yourself into the planning process not only improves the relationship between you and your auditors, but also cultivates greater buy-in on the audit’s scope (and the results).

Finally, with adaptable auditing, you and your auditors continue to work together throughout the audit and intentionally watch for the need to change. If something happens while the audit is in process (e.g., the organization’s operating environment changes or you learn something during the audit that might cause you to modify the audit’s scope), you and your auditors re-evaluate whether to continue the current course, change course, or stop auditing. This is similar to the Agile Principle (from the Agile Manifesto, which was created by the Agile Alliance to bring agility to software development) of embracing changes, even if they occur later in the development process. When auditing with agility, we embrace changes, even if they occur later in the audit process, rather than blindly sticking to the original plan when the original plan no longer adds the most value.

Now that you’re working with your auditors to help them focus their audit on what’s most valuable, integrating yourself into the audit process — beginning with integrating into the planning activities and pivoting to adapt to change — you’re more likely to support the audit, experience efficiencies, and get more value from your investment of time. You’ll truly experience the benefits of having an independent partner bring a fresh perspective to your product and help set you up for success. Instead of hoping the auditors leave you alone, you’ll be proactively reaching out to them, asking them for their perspective.

Conclusion

While audits may have been painful in the past, you no longer have to sit by and endure those types of experiences. With the help of auditing with agility, you can cultivate a much better working relationship — even a partnership — with your auditors. You can influence a better audit experience by helping your auditors understand what’s most important to your product’s success, integrating their work into your daily work, and adapting to changes as needed.

As a developer or product owner, start today by calling your auditors and inviting them to join you for coffee (in-person or virtual). Start building that partnership by discussing your product and how it helps support the organization’s success. Ask them how they’re innovating in internal audit to stay current. Tell them about these better ways of working and offer to teach them how to apply them to an audit. You’ll be pleasantly surprised with the results when you do.

Your auditors may be a little apprehensive at first. After all, they may not even realize they can work differently for better outcomes. They also might not initially see how changing their ways of working can lead to better outcomes. With time and commitment (from both parties), your auditors should see the benefits of applying these better ways of working to the audit process. It’ll strengthen your partnership and unlock value neither you nor your auditors ever imagined.

References

  1. Lucas, Clarissa. Beyond Agile Auditing: Three Core Components to Revolutionize Your Internal Audit Practices. IT Revolution, 2023. p. 73.
  2. IT Revolution. The Three Ways: The Principles Underpinning DevOps | Gene Kim.
  3. History: The Agile Manifesto.

About the Author



MongoDB (MDB) Suffers a Larger Drop Than the General Market: Key Insights

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

In the latest market close, MongoDB (MDB) reached $405.51, with a -0.52% movement compared to the previous day. The stock’s daily loss was steeper than the S&P 500’s decline of 0.2%. Meanwhile, the Dow experienced a drop of 0.18%, and the technology-dominated Nasdaq saw a decrease of 0.59%.

Shares of the database platform have appreciated by 19.09% over the course of the past month, outperforming the Computer and Technology sector’s gain of 10.56% and the S&P 500’s gain of 7.87%.

Market participants will be closely following the financial results of MongoDB in its upcoming release. The company plans to announce its earnings on December 5, 2023. The company is expected to report EPS of $0.49, up 113.04% from the prior-year quarter. Meanwhile, the latest consensus estimate predicts the revenue to be $402.75 million, indicating a 20.72% increase compared to the same quarter of the previous year.

MDB’s full-year Zacks Consensus Estimates are calling for earnings of $2.34 per share and revenue of $1.61 billion. These results would represent year-over-year changes of +188.89% and +25.06%, respectively.

Any recent changes to analyst estimates for MongoDB should also be noted by investors. Recent revisions tend to reflect the latest near-term business trends. As such, positive estimate revisions reflect analyst optimism about the company’s business and profitability.

Our research suggests that these changes in estimates have a direct relationship with upcoming stock price performance. We developed the Zacks Rank to capitalize on this phenomenon. Our system takes these estimate changes into account and delivers a clear, actionable rating model.

Ranging from #1 (Strong Buy) to #5 (Strong Sell), the Zacks Rank system has a proven, outside-audited track record of outperformance, with #1 stocks returning an average of +25% annually since 1988. The Zacks Consensus EPS estimate remained stagnant within the past month. At present, MongoDB boasts a Zacks Rank of #3 (Hold).

In terms of valuation, MongoDB is currently trading at a Forward P/E ratio of 174.36. This expresses a premium compared to the average Forward P/E of 35.92 of its industry.

The Internet – Software industry is part of the Computer and Technology sector. This industry, currently bearing a Zacks Industry Rank of 43, sits in the top 18% of all 250+ industries.

The Zacks Industry Rank assesses the vigor of our specific industry groups by computing the average Zacks Rank of the individual stocks incorporated in the groups. Our research shows that the top 50% rated industries outperform the bottom half by a factor of 2 to 1.

Keep in mind to rely on Zacks.com to watch all these stock-impacting metrics, and more, in the succeeding trading sessions.

Want the latest recommendations from Zacks Investment Research? Today, you can download 7 Best Stocks for the Next 30 Days. Click to get this free report

MongoDB, Inc. (MDB) : Free Stock Analysis Report

To read this article on Zacks.com click here.

Zacks Investment Research

Article originally posted on mongodb google news. Visit mongodb google news



Combining AI with React for a Smarter Frontend – The New Stack

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Jesse Hall, senior developer advocate with MongoDB, explained the building blocks for integrating artificial intelligence into React apps.

Frontend development will have to incorporate artificial intelligence sooner rather than later. The burning questions, though, are what that even looks like and whether it must be a chatbot.

“Almost every application going forward is going to use AI in some capacity, AI is going to wait for no one,” said Jesse Hall, a senior developer advocate at MongoDB, during last week’s second virtual day of React Summit US. “In order to stay competitive, we need to build intelligence into our applications in order to gain rich insights from our data.”

A Tech Stack for React AI Apps

First, developers can take custom data — images, blogs, videos, articles, PDFs, whatever — and generate embeddings using an embedding model, then store those embeddings in a vector database. This doesn’t require LangChain, but LangChain can be helpful in facilitating that process, he added. Once the embeddings are created, it’s possible to accept natural language queries to find relevant information from that custom data, he explained.
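To make that first step concrete, here is a minimal sketch (not from the talk) of generating an embedding with OpenAI and storing it in MongoDB; the model name, database and collection names, and environment variables are illustrative assumptions.

```typescript
import OpenAI from "openai";
import { MongoClient } from "mongodb";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const mongo = new MongoClient(process.env.MONGODB_URI!);

export async function embedAndStore(text: string) {
  // Generate the embedding (an array of numbers) for the custom data.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-ada-002", // assumed embedding model
    input: text,
  });

  // Store the original text next to its vector so it can be searched semantically later.
  await mongo
    .db("pet_store") // assumed database and collection names
    .collection("documents")
    .insertOne({ text, embedding: data[0].embedding });
}
```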

MongoDB Senior Developer Advocate Jesse Hall explains the RAG workflow.

“We send the user’s natural language query to an LLM, which vectorizes the query, then we use vector search to find information that is closely related — semantically related — to the user’s query, and then we return those results,” Hall said.

For example, the results might provide a text summary or links to specific document pages, he added.
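The retrieval step he describes might look roughly like the following sketch, which vectorizes the query and runs a MongoDB Atlas Vector Search aggregation; the index name, field names, and result limits are assumptions.

```typescript
import OpenAI from "openai";
import { MongoClient } from "mongodb";

const openai = new OpenAI();
const mongo = new MongoClient(process.env.MONGODB_URI!);

export async function findRelated(query: string) {
  // Vectorize the user's natural language query with the same embedding model.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: query,
  });

  // Ask Atlas Vector Search for the documents whose stored vectors are closest.
  return mongo
    .db("pet_store")
    .collection("documents")
    .aggregate([
      {
        $vectorSearch: {
          index: "vector_index", // assumed Atlas Vector Search index name
          path: "embedding",
          queryVector: data[0].embedding,
          numCandidates: 100,
          limit: 5,
        },
      },
      { $project: { _id: 0, text: 1 } },
    ])
    .toArray();
}
```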

“Imagine your React app has an intelligent chatbot with RAG [Retrieval Augmented Generation] and vector embeddings. This chatbot could pull in real-time data, maybe the latest product inventory, and offer it during a customer service interaction, [using] RAG and vector embeddings,” he said. “Your React app isn’t just smart, it’s adaptable, real-time and incredibly context-aware.”

To put a tech stack around that, he suggested developers could use Next.js version 13.5 with its App Router, then connect with OpenAI’s GPT-3.5 Turbo and GPT-4 models. LangChain could be a crucial part of the stack because it helps with data pre-processing, routing data to the proper storage, and making the AI part of the app more efficient, he said. He also suggested using Vercel’s AI SDK, an open source library designed to build conversational, streaming user interfaces.

Then, not surprisingly for a MongoDB developer advocate, he suggested leveraging MongoDB to store the vector embeddings, paired with MongoDB Atlas Vector Search.

“It’s a game changer for AI applications, enabling us to provide a more contextual and meaningful user experience by storing our vector embeddings directly in our application database, instead of bolting on yet another external service,” he said. “And it’s not just vector search. MongoDB Atlas itself brings a new level of power to our generative AI capabilities.”

When combined, this technology stack would enable smarter, more powerful React applications, he said.

“Remember, the future is not just about smarter AI, but also about how well it’s integrated into user-centric platforms like your next React-based project,” Hall said.

How to Approach GPTs

Hall, who also creates the YouTube channel codeSTACKr, broke down the terms and technology that developers need in order to incorporate artificial intelligence into their React applications, starting with what to do with generative pre-trained transformers (GPTs).

“It’s not merely about leveraging the power of GPT in React. It’s about taking your React applications to the next level by making them intelligent and context-aware,” Hall said. “We’re not just integrating AI into React, we’re optimizing it to be as smart and context-aware as possible.”

There’s a huge demand for building intelligence into applications and for making faster, personalized experiences for users, he added. Smarter apps will use AI-powered models to take action autonomously for the user. That could look like a chatbot, but it could also look like personalized recommendations and fraud detection.

The results will be two-fold, Hall said.

“First, your apps drive competitive advantage by deepening user engagement and satisfaction as they interact with your application,” he explained. “Second, your apps unlock higher efficiency and profitability by making intelligent decisions faster on fresher, more accurate data.”

AI will be used to power the user-facing aspects of applications, but it will also lead to “fresh data and insights” from those interactions, which in turn will power a more efficient business decision model, he said.

GPTs, Meet React

Drilling down on GPTs, aka large language models, he noted that GPTs are not perfect.

“One of their key limitations is their static knowledge base,” he said. “They only know what they’ve been trained on. There are integrations with some models now that can search the internet for newer information. But how do we know that the information that they’re finding on the internet is accurate? They can hallucinate very confidently, I might add. So how can we minimize this?”

The models can be made to be real-time, adaptable and more aligned with specific needs by using React, large language models and RAG, he explained.

“We’re not just integrating AI into React, we’re optimizing it to be as smart and context-aware as possible,” he said.

He explained what’s involved with RAG, starting with vectors. Vectors are the building blocks that allow developers to represent complex multidimensional data in a format that’s easy to manipulate and understand. Sometimes, vectors are referred to as vector embeddings, or just embedding.

“Now, the simplest explanation is that a vector is a numerical representation of data: an array of numbers. And these numbers are coordinates in an n-dimensional space, where n is the array length. So however many numbers we have in the array is how many dimensions we have,” he explained.

For example, video games use 2-D and 3-D coordinates to know where objects are in the game’s world. But what makes vectors important in AI is that they enable semantic search, he said.

“In simpler terms, they let us find information that is contextually relevant, not just a keyword search,” Hall said. “And the data source is not just limited to text. It can also be images, video, or audio — these can all be converted to vectors.”
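As a toy illustration of why vectors enable semantic search, cosine similarity scores how closely two embedding vectors point in the same direction; real systems delegate this to a vector database, but the underlying idea is the same.

```typescript
// Cosine similarity between two vectors: values near 1 mean the underlying content is
// semantically similar, values near 0 mean it is unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Toy three-dimensional "embeddings"; real embedding models produce hundreds or
// thousands of dimensions, but the comparison works the same way.
console.log(cosineSimilarity([0.9, 0.1, 0.3], [0.85, 0.15, 0.28])); // close to 1
console.log(cosineSimilarity([0.9, 0.1, 0.3], [0.05, 0.92, 0.1])); // much lower
```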

So step one would be creating vectors, and the way to do that is through an encoder. Encoders define how the information is organized in the virtual space, and there are different types of encoders that can organize vectors in different ways, Hall explained. For example, there are encoders for text, audio, images, etc. Most of the popular encoders can be found on Hugging Face or OpenAI, he added.

Finally, RAG comes into play. RAG is “an AI framework for retrieving facts from an external knowledge base to ground large language models (LLMs) on the most accurate, up-to-date information and to give users insight into LLMs’ generative process,” according to IBM.

It does so by bringing together generative models with vector databases and LangChain.

“RAG leverages vectors to pull in real-time, context-relevant data and to augment the capabilities of an LLM,” Hall explained. “Vector search capabilities can augment the performance and accuracy of GPT models by providing a memory or a ground truth to reduce hallucinations, provide up-to-date information, and allow access to private data.”


Article originally posted on mongodb google news. Visit mongodb google news



William Blair Investment Management LLC Lowers Holdings in MongoDB, Inc. (NASDAQ:MDB)

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

William Blair Investment Management LLC trimmed its holdings in MongoDB, Inc. (NASDAQ:MDB) by 47.5% during the 2nd quarter, according to its most recent Form 13F filing with the Securities & Exchange Commission. The institutional investor owned 202,654 shares of the company’s stock after selling 183,289 shares during the period. William Blair Investment Management LLC owned about 0.29% of MongoDB worth $83,289,000 as of its most recent filing with the Securities & Exchange Commission.

Several other large investors have also made changes to their positions in the business. Simplicity Solutions LLC increased its holdings in shares of MongoDB by 2.2% in the second quarter. Simplicity Solutions LLC now owns 1,169 shares of the company’s stock worth $480,000 after purchasing an additional 25 shares during the last quarter. AJ Wealth Strategies LLC increased its stake in MongoDB by 1.2% during the second quarter. AJ Wealth Strategies LLC now owns 2,390 shares of the company’s stock worth $982,000 after acquiring an additional 28 shares during the last quarter. Assenagon Asset Management S.A. increased its stake in MongoDB by 1.4% during the second quarter. Assenagon Asset Management S.A. now owns 2,239 shares of the company’s stock worth $920,000 after acquiring an additional 32 shares during the last quarter. Veritable L.P. increased its stake in MongoDB by 1.4% during the second quarter. Veritable L.P. now owns 2,321 shares of the company’s stock worth $954,000 after acquiring an additional 33 shares during the last quarter. Finally, Choreo LLC increased its stake in MongoDB by 3.5% during the second quarter. Choreo LLC now owns 1,040 shares of the company’s stock worth $427,000 after acquiring an additional 35 shares during the last quarter. 88.89% of the stock is currently owned by hedge funds and other institutional investors.

Analyst Ratings Changes

Several equities research analysts have weighed in on MDB shares. JMP Securities boosted their price objective on MongoDB from $425.00 to $440.00 and gave the stock a “market outperform” rating in a research note on Friday, September 1st. Oppenheimer upped their target price on MongoDB from $430.00 to $480.00 and gave the company an “outperform” rating in a research note on Friday, September 1st. Mizuho upped their target price on MongoDB from $240.00 to $260.00 in a research note on Friday, September 1st. Argus upped their target price on MongoDB from $435.00 to $484.00 and gave the company a “buy” rating in a research note on Tuesday, September 5th. Finally, Guggenheim upped their target price on MongoDB from $220.00 to $250.00 and gave the company a “sell” rating in a research note on Friday, September 1st. One investment analyst has rated the stock with a sell rating, two have issued a hold rating and twenty-four have assigned a buy rating to the company’s stock. According to MarketBeat, MongoDB has a consensus rating of “Moderate Buy” and a consensus target price of $419.74.

Read Our Latest Research Report on MDB

MongoDB Stock Performance

Shares of MDB opened at $410.37 on Tuesday. MongoDB, Inc. has a 12 month low of $137.70 and a 12 month high of $439.00. The stock has a fifty day moving average price of $354.67 and a 200-day moving average price of $360.07. The company has a current ratio of 4.48, a quick ratio of 4.48 and a debt-to-equity ratio of 1.29. The firm has a market cap of $29.28 billion, a PE ratio of -117.81 and a beta of 1.16.

MongoDB (NASDAQ:MDB) last issued its quarterly earnings data on Thursday, August 31st. The company reported ($0.63) EPS for the quarter, beating the consensus estimate of ($0.70) by $0.07. MongoDB had a negative return on equity of 29.69% and a negative net margin of 16.21%. The firm had revenue of $423.79 million during the quarter, compared to analyst estimates of $389.93 million. Equities research analysts predict that MongoDB, Inc. will post -2.17 earnings per share for the current fiscal year.

Insider Activity at MongoDB

In related news, Director Dwight A. Merriman sold 1,000 shares of the company’s stock in a transaction on Friday, September 1st. The stock was sold at an average price of $395.01, for a total value of $395,010.00. Following the transaction, the director now directly owns 535,896 shares of the company’s stock, valued at approximately $211,684,278.96. The transaction was disclosed in a legal filing with the SEC, which is available through this hyperlink. In other MongoDB news, CRO Cedric Pech sold 16,143 shares of the stock in a transaction dated Thursday, September 7th. The stock was sold at an average price of $378.86, for a total transaction of $6,115,936.98. Following the sale, the executive now directly owns 34,418 shares of the company’s stock, valued at approximately $13,039,603.48. The sale was disclosed in a legal filing with the Securities & Exchange Commission, which is available through the SEC website. Also, Director Dwight A. Merriman sold 1,000 shares of the stock in a transaction dated Friday, September 1st. The stock was sold at an average price of $395.01, for a total value of $395,010.00. Following the sale, the director now directly owns 535,896 shares in the company, valued at $211,684,278.96. The disclosure for this sale can be found here. Insiders have sold a total of 289,484 shares of company stock valued at $101,547,167 in the last ninety days. Company insiders own 4.80% of the company’s stock.

MongoDB Company Profile


MongoDB, Inc provides general purpose database platform worldwide. The company offers MongoDB Atlas, a hosted multi-cloud database-as-a-service solution; MongoDB Enterprise Advanced, a commercial database server for enterprise customers to run in the cloud, on-premise, or in a hybrid environment; and Community Server, a free-to-download version of its database, which includes the functionality that developers need to get started with MongoDB.

Featured Stories

Want to see what other hedge funds are holding MDB? Visit HoldingsChannel.com to get the latest 13F filings and insider trades for MongoDB, Inc. (NASDAQ:MDB).

Institutional Ownership by Quarter for MongoDB (NASDAQ:MDB)

This instant news alert was generated by narrative science technology and financial data from MarketBeat in order to provide readers with the fastest and most accurate reporting. This story was reviewed by MarketBeat’s editorial team prior to publication. Please send any questions or comments about this story to contact@marketbeat.com.

Before you consider MongoDB, you’ll want to hear this.

MarketBeat keeps track of Wall Street’s top-rated and best performing research analysts and the stocks they recommend to their clients on a daily basis. MarketBeat has identified the five stocks that top analysts are quietly whispering to their clients to buy now before the broader market catches on… and MongoDB wasn’t on the list.

While MongoDB currently has a “Moderate Buy” rating among analysts, top-rated analysts believe these five stocks are better buys.

View The Five Stocks Here

7 AI Stocks to Invest In: An Introduction to AI Investing For Self-Directed Investors Cover

As the AI market heats up, investors who have a vision for artificial intelligence have the potential to see real returns. Learn about the industry as a whole as well as seven companies that are getting work done with the power of AI.

Get This Free Report

Article originally posted on mongodb google news. Visit mongodb google news



Presentation: Living on the Edge

MMS Founder
MMS Erica Pisani

Article originally posted on InfoQ. Visit InfoQ

Transcript

Pisani: Welcome to living on the edge, boosting your performance with edge computing. My name is Erica. I am a Senior Software Engineer at Netlify, working on the integrations team there.

We’re first going to talk about what the edge is. Some of you might already be familiar with it in a cloud computing context. Some of you might never have heard about it before this conference, that’s ok.

We’re going to start from ground zero and bring everyone up to the same level of understanding before switching to web, backend, and mobile application functionality on the edge. Then we’re going to talk about data on the edge. Then this fun section to wrap up called the edgiest part of the edge. I’ll leave that as a bit of a surprise.

What Is the Edge?

What is the edge? In order to understand that we need to take a quick step back and understand how cloud providers organize their data centers. On the broadest possible scope, they organize these things by region, so think your Canada Central, your U.S.-East 1. Within each of these regions, there are multiple availability zones. When we’re talking about origin servers, we’re typically talking about servers located in one of these availability zones.

The edge is made up of data centers that live outside of an availability zone. An Edge Function is a function that runs in one of these data centers. Data on the edge is data that is cached, stored, or accessed at one of these data centers. Depending on the provider that you use, whether that’s AWS, Google Cloud Platform, or Microsoft Azure, you might see different terminology: points of presence (or POPs) is one, edge locations is another. They all mean the same thing at the end of the day; it’s just that everyone wants to use different terminology.

To get a sense of just how many more edge locations there are in the world, relative to availability zones, this is a map that I took off of AWS’s documentation that shows all of their availability zones within their global network. This next slide shows all their edge locations. The blue dots are individual edge locations, purple show multiple, and then those yellowish-orangish circles are regional edge caches, which gives you a bit more caching capability closer to your users without needing to go all the way to the origin server.

Just to look at these two things side by side, you can see at a glance that if you’re able to handle a user request in its entirety, or even part of it through the edge network, you can improve your services’ performance significantly due to the lower latency incurred from receiving and fulfilling the request.

In particular, I want to call out the Australia, Africa, and South America regions, where they each have one availability zone, which you can see in the left-hand image, but multiple edge locations. If for some reason you have a business or regulatory requirement to host a user’s data in the same region they live in, and they live on the opposite end of that region relative to the origin server, the performance of what you’re building can be improved significantly if you can handle just a few of the more popular requests at the edge.

To see where the edge fits into the lifecycle of a request, it should be noted here that the user, in this case, it could be someone making a request through a website. Maybe they’re using a mobile application on their phone. Maybe it’s an Internet of Things device that’s making this request. Whichever one it is, it’ll make a request. The edge location will be the first place to pick that up. As mentioned before, best case scenario, the edge location is able to handle the request in its entirety and send the response back to the user.

Let’s say that it can’t for whatever reason, let’s say a deploy recently went out and the cache wasn’t validated. Maybe this is a request with a lot of personalization so it’s something that you don’t generally cache, what will happen is the edge location will send a request to the origin server. The origin server will handle it as per usual, and then it will send back a response to the edge location.

If this is something that’s more generalized, you can choose to cache that response at the edge before sending the response back to the user. If this is the more generalized response that you want to cache, all these other users in the area that likely want the exact same response to that request will benefit from significantly faster responses. The price that you’re paying is just that initial cold start of having to reach the origin server and come back.
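As an illustration of that request path, a sketch along these lines (written here in Cloudflare Worker syntax purely as an example; the talk is not specific to any one provider) checks the edge cache first, falls back to the origin, and caches generalized responses for nearby users.

```typescript
// Assumes the Cloudflare Workers runtime and its types (caches.default, ExecutionContext).
export default {
  async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    const cache = caches.default;

    // Best case: the edge location answers the request from its own cache.
    const cached = await cache.match(request);
    if (cached) return cached;

    // Otherwise, forward the request to the origin server as usual.
    const response = await fetch(request);

    // Cache generalized (non-personalized) responses at the edge so nearby users get
    // faster responses; only the first request pays the origin round trip.
    if (request.method === "GET" && response.ok && !request.headers.get("Authorization")) {
      ctx.waitUntil(cache.put(request, response.clone()));
    }

    return response;
  },
};
```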

Now that we’ve done an overview of what the edge is, and where it fits in the lifecycle of a request, let’s look at some functionality running at the edge using Edge Functions. I thought it’d be easiest to demonstrate the performance boosting capabilities of the edge by looking at some common performance challenges. Let’s take a look at the first one. To set the scene here, rather than working as a software developer, I instead run a thriving global e-commerce business, perhaps a pet store.

I want to show different banner messages to users based on where they’re located in the world. At the moment, the pages with the banner messages would be created using server-side rendering. I’m looking for opportunities to remove load from the server and return cache responses without any major rearchitecting of my site and how it functions.

The solution where the edge can come into play to help boost the performance here is transitioning that server-side rendered page into a static one, so that that can be cached at build time on the Content Delivery Network, or CDN for faster access going forward, and removes the load on the origin server that existed previously, where it would have had to render the page based on the data it received on every single request.

To go through a code example of what this looks like, I’m using a Next.js application for my site. You’ll see on the left-hand side to accomplish this, I’m using some Next.js middleware. On the right-hand side, I’ve got my server-side rendered page. To walk through the code here, we’re getting the geo object off the request, we’re getting the country off the geo object. We’re taking the value of the country and adding it to the URL before rewriting the request. Then on the right-hand side, we’re getting the country value off the query object.

Then this is where I can show different banner messages depending on what I want to promote, whether it’s maybe a new storefront that’s opening in someone’s city, doing a sale, or just a friendly Hello World. In this case, I am showing most of the world a Hello World message. To celebrate me being here at QCon NYC, I’m giving my fellow Canadians a 50% off their order promo code.
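The slide code isn’t reproduced in this transcript, but a rough reconstruction of the middleware and server-side rendered page described here might look like the following; the promo message, fallback country, and file layout are illustrative assumptions.

```typescript
// middleware.ts - read the geo data off the request, add the country to the URL, and
// rewrite the request before it reaches the server-side rendered page.
import { NextRequest, NextResponse } from "next/server";

export function middleware(request: NextRequest) {
  const country = request.geo?.country ?? "US"; // fallback country is an assumption
  const url = request.nextUrl;
  url.searchParams.set("country", country);
  return NextResponse.rewrite(url);
}
```

```tsx
// pages/index.tsx - the server renders the page on every request, picking the banner
// based on the country value from the query object.
import type { GetServerSideProps } from "next";

export const getServerSideProps: GetServerSideProps = async ({ query }) => {
  const country = query.country as string | undefined;
  const banner =
    country === "CA"
      ? "Welcome, Canada: 50% off your order!" // illustrative promo message
      : "Hello World!";
  return { props: { banner } };
};

export default function Home({ banner }: { banner: string }) {
  return <h1>{banner}</h1>;
}
```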

As a refresher, the downsides to this approach are that the server has to render it every single time. This is not cacheable on the CDN. If this is a small, little fun hobby site, you could argue that you could make different static pages to route to in the middleware in order to make those pages cacheable. I think we’ve all had experience that even trying to duplicate a little bit, things get out of sync really quickly. We want to avoid that at all cost.

That brings us to our static and Edge Functions example. You’ll notice that the middleware has a little bit more code in there now. One thing to note that isn’t obvious from the code itself is that instead of running on the origin server, like what was happening in the first example, the middleware function is now running on the edge. To go through this code, again, we’re getting the geo object off the request and the country off the geo object.

The const request = new MiddlewareRequest line, we’re going to put a pin in that for one moment; that functionality is going to come in at the end. The next line after that, const response = await request.next(), is making the outbound request to the origin server. Instead of actually hitting the origin server, it’s getting the cached asset of the static page from the CDN. At this point in time, the value of response is the HTML showing the Hello World message.

Now, because we need to inject our localized content, assuming that the user is in Canada, at this point we’ll create the message we want to show. Then this is where that new MiddlewareRequest variable comes in. The first thing we’re going to do is call this replaceText method, which will replace the text in the HTML with the message we want to show. Then that setPageProp method right below it updates the Next.js page props before sending the response back to the user.
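Again as a reconstruction rather than the exact slide, the edge middleware described here might look roughly like this, assuming Netlify’s @netlify/next advanced middleware helpers (MiddlewareRequest) that the talk appears to reference; the CSS selector, prop name, and message text are assumptions.

```typescript
// middleware.ts - the page itself is now static and cached on the CDN; this middleware
// runs at the edge and injects the localized banner into the cached HTML.
import { NextRequest } from "next/server";
import { MiddlewareRequest } from "@netlify/next";

export async function middleware(nextRequest: NextRequest) {
  const country = nextRequest.geo?.country ?? "US";

  const request = new MiddlewareRequest(nextRequest);
  // Fetches the statically generated page from the CDN instead of hitting the origin.
  const response = await request.next();

  if (country === "CA") {
    const message = "Welcome, Canada: 50% off your order!";
    // Replace the banner text in the HTML and keep the Next.js page props in sync.
    response.replaceText("#banner", message); // selector is an assumption
    response.setPageProp("banner", message);
  }

  return response;
}
```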

There’s a bunch of little performance boosts that are happening as a result of this change. The first is that the middleware being on the edge means the request starts getting handled sooner. The page on the CDN means that that asset is cached and returned more quickly as well when you do have to make a request for it. Even though I’m not doing it explicitly here in this middleware function, you could cache that response at the edge, which will mean that future requests are even faster, like we saw in that sample request path slide.

Let’s say that some of these high traffic pages on my website require a user to have an account. I happen to notice in my search for easy performance boosting wins, that the user session validation is taking more time than I’d like for some groups of users, because they’re physically further away from the origin server where my site is hosted. It’s often the case that those users are not initially signed in. How can I use the edge in this case to improve my site’s performance for those users?

The answer would be moving that user session validation into an Edge Function. Let’s take a look at what that looks like. Building on the middleware example that we just saw, I’ve added Auth0 as an import statement and then with these two lines of code that I’ve added that are surrounded by the white boxes there, I’m protecting my whole site with Auth0. At the moment, this code is just requiring that users would need to create an account to access it, but you could do far more sophisticated logic. You could be checking for a role’s value on something like a JSON Web Token or a session cookie, and check that the user has the correct role before giving them access to particular pages on the site.

The last example I want to talk about in this section is this problem of routing a third-party integrations request to the correct region. This was something that I ran into in a previous role of mine. To get a good understanding of this, I want to walk through the problem with you all before talking about how the edge could be used to address this. We had two instances of our site. We had one in North America, which was our original instance, and one in the EU. Users and their data would live in one of these instances, but not both, because we were looking to comply with certain privacy laws and customer requests that involved hosting data in a particular region.

We also had third-party OAuth integrations that we need to support that we want to access or modify our users’ data. Because these integrations were potentially not located in the same region as our users, honestly, it was most of the time, we couldn’t use geo detecting to make an assumption about where the user’s data lives, such as what we saw in our earlier middleware example. To go through what an authorization request would have looked like, and we’re going to use an integration Australia really just to hammer home the points of just how bad this could get.

When the integration was being enabled by a user on the third-party integrations website, the request would go to my company’s original origin server first in North America. We would check at that point to see if the user existed within that region. If it didn’t, we would have to send a request to the EU instance to see if the user existed there. Assuming that the user existed, we would then send a response back to the integration with the data needed as part of the authorization flow.

For subsequent requests, it was expected by the integrations that they can make a request to either instance, and we would redirect the request to the correct instance on our end to access and modify the user’s data on their behalf. That is a lot of back and forth across the world. As a junior dev, I remember hearing as a rule of thumb for myself that a highly performant request is under 300 milliseconds, and every across the ocean request was 150 milliseconds of latency added to a request.

You can imagine by going through that diagram, that the request just hung there on the authorization. While I think as users, we’re trained to expect that that initial authorization might take a little bit more time, it was completely unacceptable to have that happen for subsequent requests. We really focused mainly on how to improve those ones.

The first thing we considered was returning a URL corresponding to the instance where the user’s data was hosted as part of the authorization response, so that the integration could then make requests directly to the correct instance without needing to be redirected. This meant leaking implementation details, which obviously isn’t ideal, especially as we added more instances going forward. We also considered encoding the region on a JSON Web Token as part of the OAuth authorization response.

That meant we didn’t need to query the database for the user to determine if the user was hosted in that region. We could take some load off of our database that way and save a little bit of time on the overall request. The downside to this approach is we would have to run proxy servers in both the North American and EU regions to route to the correct region if a decoded token showed that the user’s data belonged in a different region than the one receiving the request. How could the edge have helped here?

That would be using the data that’s encoded in the JSON Web Token and read from an Edge Function rather than at an origin server. What this would look like is, with the Edge Function approach, we’re still responsible for routing. It would drastically reduce the latency from the initial request made by the integration because it’s not going to the EU and North America only to bounce somewhere else again.

It would just go to the edge location and then go directly to the EU, or directly to North America. If this request is something that’s made frequently and is a bit more general in nature, we can cache the response at the edge so that future requests are even faster to fulfill.
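A hypothetical Edge Function for this routing might look like the sketch below; the claim name, regional origin URLs, and the unverified decode are illustrative only, and a real implementation would verify the token’s signature.

```typescript
// An Edge Function that routes a third-party integration's request to the regional
// origin encoded in the user's JWT, without a detour through the wrong region.
const ORIGINS: Record<string, string> = {
  na: "https://na.api.example.com", // illustrative regional origins
  eu: "https://eu.api.example.com",
};

function regionFromToken(token: string): string | undefined {
  try {
    // Decode the JWT payload to read the region claim. A real implementation must
    // verify the token's signature; this sketch skips that for brevity.
    const payload = token.split(".")[1].replace(/-/g, "+").replace(/_/g, "/");
    return JSON.parse(atob(payload)).region;
  } catch {
    return undefined;
  }
}

export default async function handler(request: Request): Promise<Response> {
  const token = request.headers.get("Authorization")?.replace("Bearer ", "") ?? "";
  const origin = ORIGINS[regionFromToken(token) ?? "na"] ?? ORIGINS.na;

  // Forward the request straight to the correct regional instance.
  // Bodies are forwarded as text here for simplicity.
  const url = new URL(request.url);
  const body =
    request.method === "GET" || request.method === "HEAD" ? undefined : await request.text();
  return fetch(`${origin}${url.pathname}${url.search}`, {
    method: request.method,
    headers: request.headers,
    body,
  });
}
```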

Backend and Mobile Development and The Edge

While the problems that we just looked at are a little more web development centric, the edge isn’t just for addressing challenges that web developers face, the edge can bring similar benefits to backend and mobile services the way that it does for web developers. They both have some similar challenges, so we’re going to quickly talk about backend applications first before switching to mobile. Backend applications may be handling direct requests from users.

As an example, let’s say we’re building an API service. This might be exposed to external users to consume, or it might be an internal API that’s maybe powering a user facing website or frontend application. In both cases, maintaining a consistent backwards compatible API is a requirement because you want to still be able to evolve your services without breaking things for your users. This backend application might be making multiple requests to the database as part of fulfilling a request.

Continuing the example of an API, it could be for an e-commerce service. This endpoint is fetching user data from one table in the database and then doing a query for all the orders that were made at a different table. Then they might also be making requests to other backend services. Continuing with our example, based on those recent orders, maybe the next request after that is being made to a ChatGPT powered AI service to ask for items that should be recommended to the user to buy next.

To switch to mobile development, there’s a couple of concerns that I have that I think they have more top of mind that maybe web or backend developers do. One of these is intermittent connectivity. We’ve likely all experienced the dead zone or some loss of signal inconvenient moments. With this in mind, the speed at which requests are fulfilled is crucial to ensuring with higher degrees of certainty that the user gets response to an action that they’ve taken. The other thing is that people don’t update their apps.

I think a number of us out there are famous for when they’re on work calls, they have the red prompt on Chrome to update their browser. We’ve gotten a lot better as an industry by automating this, but there’s still some users out there that they can’t. As an example, my younger sister will hold on to her phone for as long as possible. In general, my family does this. I think my mom made it 8 years on a swivel Sony Ericsson phone, if people remember those, from the early 2000s.

My sister could not update her apps because she had no memory left on her phone. Then the one before that, she was afraid to update her operating system because she had no memory again. She was worried that the additional hardware requirements of the operating system was essentially going to break her phone, and she was going to be without a phone, which she needed for work as most of us do.

In order to account for user behavior like this, developers need to build things such that as much as possible lives on servers, so they can deliver changes without relying on a user to take action. When a hotfix is required for a zero-day security issue, this architecture is particularly helpful. Because these developers are incentivized to deliver code from servers they control, and building on the previous point around intermittent connectivity, using origin servers that are physically further away from the user could mean the extra time to fulfill a request causes that request to be dropped when the user loses connection.

It’s more than just performance in this context, it’s actually a question of service reliability. The problems facing backend and mobile developers can be summed up as they want to fulfill requests as close as possible to the user, and minimizing latency as much as possible from the potentially multiple requests being made to backend services. They need to maintain backwards compatibility to evolve new services without requiring a user to take action.

How can the edge help mobile and backend developers achieve these objectives? One of the possible solutions is using an API layer in an Edge Function. This pattern isn’t novel, it’s been around for a while. Moving it to the edge from a data center in an availability zone can mean those faster responses without major rearchitecting on the maintaining developer’s part. This is just a very trivial example. I just put it together with this fun little API framework that a coworker of mine built.

If you were to change line 6 and line 10 to return something other than just a raw JSON body, it could be making a request to other services, or databases, or whatever you need to fulfill requests to the homepage or the login service. You can extend this further where rather than returning a raw API response, if it’s a mobile application that’s making this request, you can return view models that are easily mapped into native UI on the mobile client in order to reduce the need to update the mobile application itself even further.
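The speaker’s slide isn’t reproduced here, but a rough, framework-agnostic equivalent of such an edge API layer, returning view models a mobile client could map straight into native UI, might look like this; the routes and payload shapes are assumptions.

```typescript
// A route-to-response map running in an Edge Function. Returning view models rather
// than raw data lets a mobile client map the payload straight into native UI.
export default async function handler(request: Request): Promise<Response> {
  const { pathname } = new URL(request.url);

  if (pathname === "/api/home") {
    return Response.json({
      screen: "home",
      sections: [{ type: "banner", title: "Welcome back!" }], // illustrative view model
    });
  }

  if (pathname === "/api/login") {
    return Response.json({ screen: "login", fields: ["email", "password"] });
  }

  return new Response("Not found", { status: 404 });
}
```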

There are some things to consider with this approach, though, and this is a common theme throughout this talk: you want to favor caching generalized requests over more personalized ones, especially because edge locations have smaller cache sizes compared to what’s available on an origin server. If you have the option to choose between REST and GraphQL, REST is a better approach here, because with how GraphQL queries work, where folks can go really far into their nested data objects, each query is a form of personalization. It means the odds are higher that you will need to make a request to an origin server.

Then, depending on how many services you need to make requests to when fulfilling the overall request, it may be more performant to have the Edge Functions placed closer to the backend services. This is a common enough problem that Cloudflare, about a month ago, released a new feature called Smart Placement, which is currently in beta. It’s intended to address this problem: when functionality on the edge is, by default, deployed closest to the client making the request, but the majority of its requests go to a backend service, a lot of latency is incurred from those round trips.

You can see that at a glance with this diagram that I pulled from their documentation. Cloudflare Workers are Edge Functions, and by default a Worker is deployed closest to the client. Let’s say this client is in Sydney, Australia, and it’s making a bunch of requests to a database in the EU; it would be better in this case to have the edge location closer to that database and that backend service.

How Smart Placement works is that it initially behaves like normal: it places the Worker closest to the user, but then it analyzes the requests the Worker makes to backend services when there’s more than one request round trip involved. Based on what it’s seeing, it makes a best effort to place the Worker in an optimal edge location relative to that backend. What you end up getting is something more like this, where the edge location is now based in Germany rather than in Australia.

Some limitations with this tool are that you have to opt in on a per-Worker basis, and it doesn’t work with globally distributed services like CDNs or distributed databases and APIs. Depending on your context, though, it can still give you a bit of a boost in terms of performance, and it’ll help reduce the impact of the latency incurred from the multiple requests to the backend service.

Data On the Edge

For those who have been developing in the web ecosystem for a while, you’re probably familiar with the debates that have played out over the years about how to boost website performance through how a website is architected, and when and where data is fetched. Single-page applications, islands architectures, and server-side rendering are all part of those discussions.

Regardless of which camp you tend to gravitate towards, it’s fair to say that being able to load data on a server physically closer to the user can help boost your performance. The historical challenges of hosting data on the edge can generally be summed up as the limited number of connections a database has, and data inconsistency. How do you ensure that you don’t run out of connections when you could suddenly be dealing with a spike in traffic that results in hundreds of thousands of serverless or Edge Functions trying to access your database at the same time, so that it just falls over?

How do you ensure that when data is updated in one region or edge location, the cached values in other areas are invalidated and updated in a timely manner? We’re going to take a look at the limited number of connections problem first. One of the ways to help mitigate the potential for running out of database connections is through the use of a connection pool. For those who aren’t familiar with it, a connection pool is a collection of open connections that are passed from operation to operation as needed. The benefit of doing this is that it reduces the cost of having to open and close a brand new connection every time you’re performing an operation on the database.
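As a rough illustration of the idea, not specific to any of the products mentioned next, here is a minimal sketch of connection pooling using the node-postgres client; the table, columns, and DATABASE_URL variable are made up for the example.

```ts
import { Pool } from "pg";

// One pool shared across requests, capped at 10 open connections, instead of
// opening and closing a fresh connection for every database operation.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 10,
});

export async function getOpenOrders(customerId: number) {
  // pool.query borrows a connection from the pool, runs the query,
  // and releases the connection back to the pool when it is done.
  const { rows } = await pool.query(
    "SELECT id, total FROM orders WHERE customer_id = $1 AND status = 'open'",
    [customerId],
  );
  return rows;
}
```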

An example of a tool that leverages this approach in the context of serverless and edge environments is Prisma Data Proxy. It creates an external connection pool, and requests for the database need to go through the proxy managing that pool before they reach the database. Another tool that uses the connection pooling approach, although alongside an internal network, is PlanetScale.

We’ll get to PlanetScale’s internal network in a moment. With respect to the connection pooling part, they use Vitess, an open source database clustering system for MySQL, under the hood, and they leverage its connection pooling at a pretty low level within Vitess, specifically the VTTablet level, for those who are interested. By doing this, they can scale the connection pooling with the database cluster. If you’re interested in the technical nitty-gritty details, I highly recommend reading a blog post they have where they talk about how they load tested this. They were able to open a million connections against their database without the database breaking a sweat, and they could easily have handled far more traffic.

Switching gears to the challenge of data consistency, Cloudflare’s Durable Objects are one approach taken to ensure that consistent data lives as close as possible to the user. Their approach involves having small logical units of data rather than a large monolithic database, much like serverless functions are to monolithic applications. When one of these objects is created, Cloudflare automatically determines the data center that the object will live in, which will be the one closest to the user.

That’s great, because as a developer, you don’t need to worry about which region to host the user’s data in for optimal performance. If you needed to, for similar reasons to why Smart Placement is now a feature, these objects can be migrated between locations at a later time quite easily. These objects are globally unique, and they can only see and modify their own data, to ensure strong data consistency.

The side effect of this, though, is that it can mean a little bit of work on the developer’s part, because if you need data from multiple Durable Objects, you’ll need to write those requests into your web app. You’d be accessing them through Cloudflare’s internal network via Cloudflare Workers, which are on the edge. That’ll be a far faster request to fulfill than a standard one made to a third party over the public internet.
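To make that concrete, here is a minimal sketch of what a Durable Object and the Worker that routes to it can look like; the CART binding, the object name, and the stored data are hypothetical, and the Durable Object types are the ambient ones provided in a Cloudflare Workers project.

```ts
// A Durable Object holding a single user's cart. Each object only ever sees
// its own storage, which is what gives you the strong consistency described above.
export class Cart {
  state: DurableObjectState;

  constructor(state: DurableObjectState) {
    this.state = state;
  }

  async fetch(request: Request): Promise<Response> {
    const items = (await this.state.storage.get<string[]>("items")) ?? [];
    if (request.method === "POST") {
      items.push(await request.text());
      await this.state.storage.put("items", items);
    }
    return Response.json(items);
  }
}

// The Worker looks up the object for this user and forwards the request to it
// over Cloudflare's internal network.
export default {
  async fetch(request: Request, env: { CART: DurableObjectNamespace }) {
    const id = env.CART.idFromName("user-123"); // hypothetical user identifier
    return env.CART.get(id).fetch(request);
  },
};
```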

Coming back to PlanetScale and their internal network, they also use one to speed up data requests from edge and serverless environments. They describe their network as similar to a CDN, where a client connects to the closest geographic edge in PlanetScale’s network, and then that request is backhauled over long-held connection pools in their internal network to reach the actual destination where the data lives.

We’re going to take a look at fetching and caching data in an Edge Function using PlanetScale in this example. I tested this on Netlify, because that’s where I work. You might see some interesting lines there: that first import statement doesn’t quite look like Node, because it’s actually using Deno. You could make this the Node equivalent by updating that import statement and replacing Deno.env with process.env.
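The code from the slide isn’t reproduced here, but a sketch in the same spirit, using the PlanetScale serverless driver from Deno and assuming a hypothetical products table, could look like this (the esm.sh URL is one way to pull the driver into Deno):

```ts
import { connect } from "https://esm.sh/@planetscale/database";

// Credentials come from environment variables; the names are illustrative.
const config = {
  host: Deno.env.get("DATABASE_HOST"),
  username: Deno.env.get("DATABASE_USERNAME"),
  password: Deno.env.get("DATABASE_PASSWORD"),
};

export default async (): Promise<Response> => {
  const conn = connect(config);
  // Top five most purchased items still in stock, queried on every invocation.
  const results = await conn.execute(
    "SELECT name, purchases FROM products WHERE in_stock = true ORDER BY purchases DESC LIMIT 5",
  );
  return new Response(JSON.stringify(results.rows), {
    headers: { "content-type": "application/json" },
  });
};
```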

At the moment, this function will query my database for the top five most purchased items that my pet store still has in stock, every single time the function is invoked. Given that something like this is likely going to appear on a high-traffic web page, maybe the homepage, caching would be very valuable here. This is that function with caching included. I have to call out that this caching is a little bit experimental on Netlify.

It’s not widely available yet. You can see that just by using web standards, with the Cache-Control header on line 16, we can serve the result of the Edge Function directly from the cache, so that’ll mean faster responses, and other requests can use the result as well. The other benefit of this, which is not insignificant, is that serving from the cache bypasses the function invocation altogether. If this is a high-traffic page, that’ll save you a ton of money through reduced Edge Function invocations.
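Again as a sketch rather than the exact slide code, the cached variant simply adds a standard Cache-Control header to the response; the five-minute TTL is illustrative, and the behavior depends on the platform’s (still experimental) handling of cached Edge Function responses.

```ts
import { connect } from "https://esm.sh/@planetscale/database";

const conn = connect({
  host: Deno.env.get("DATABASE_HOST"),
  username: Deno.env.get("DATABASE_USERNAME"),
  password: Deno.env.get("DATABASE_PASSWORD"),
});

export default async (): Promise<Response> => {
  const results = await conn.execute(
    "SELECT name, purchases FROM products WHERE in_stock = true ORDER BY purchases DESC LIMIT 5",
  );
  return new Response(JSON.stringify(results.rows), {
    headers: {
      "content-type": "application/json",
      // While a cached copy is fresh, repeat requests are served from the
      // edge cache without invoking the function at all.
      "cache-control": "public, max-age=300",
    },
  });
};
```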

The Edgiest of The Edge

I want to mention a product that is part of the edge, but not in the way that you might think. With the exception of mobile applications, the main assumption we’ve made in everything we’ve talked about so far is that there’s always reliable internet access. What if that isn’t the case in a more extreme sense? What if what we’re dealing with isn’t occasional intermittent network access, but potentially non-existent internet? That brings us to AWS Snowball Edge.

This one blew my mind when I first heard about it, because this is an edge offering from AWS as a way of making cloud computing available to places with potentially non-existent internet, or as a way of migrating data to the cloud if you’re limited on bandwidth. This is what the device looks like; to me, it’s reminiscent of a computer from the ’90s. You order it online, it gets mailed to you, and you can run Lambda functions on it once you’ve hooked it up to your servers.

When you’re trying to transfer a significant amount of data, you just load it onto the device, mail it back to AWS, and it gets uploaded to the cloud for you. Some of the locations listed as places to use this are ships, windmills, and remote factories, so places that, maybe even for security reasons in the case of ships or remote factories, may be completely cut off from the internet. Given that there are still a lot of places in the world that have unreliable or non-existent internet, we may need to consider delivering our products similarly to how Snowball Edge delivers edge and compute capabilities to these areas, and be so on the edge that we are literally on our users’ doorsteps.

What Are the Limits of The Edge?

We’ve talked a lot about some of the positive things about the edge. What are its limits? Obviously, nothing’s ever perfect. The first thing to call out is the lower CPU time available. This depends on the vendor, and it depends on how much money you’re willing to throw at the problem. CPU time is a little bit different from wall time; the lower limit just means that you can do fewer operations within that function compared to an origin serverless function.

It should be noted that time spent on network requests doesn’t count toward this. With that being said, the advantages of using an Edge Function might be lost when a network request is made, if that request is going to a very distant third-party origin server. There’s also limited integration with other cloud services. I’m just going to talk about AWS, because that’s what I’m more familiar with. In their marketing, they say that their serverless Lambda functions have really tight integrations with hundreds of their services, but their edge offering only has tight integrations with maybe a few dozen.

If having a tight integration between a cloud provider’s services and the serverless functions you’re using is a requirement for you, you may not be able to use the edge in your particular use case. Edge locations also have smaller caches than the origin. You can get around that a little bit by using regional edge caches to give yourself some more wiggle room.

This might mean that you have to be a little bit more aggressive about which responses you cache at the edge versus what you would do at an origin server. It’s also very expensive to run these relative to running functions or serving data on an origin server. If cost is a concern for you, before you go, “I’m going to put my whole application on the edge, it’s going to be so blazing fast,” choose what you’re moving to the edge wisely; otherwise, our chief financial officers will be very upset with engineering.

Boosting Performance with The Edge

We’ve covered a lot on what edge capabilities are out there that can help boost performance for our applications. We’ve looked at a simple use case of serving localized content that went from being served by the origin server to being served on the CDN and at the edge using Edge Functions. We’ve also looked at running an API layer at the edge in order to minimize latency for mobile users so that the increased performance also translates to increased service reliability.

This API layer coupled with something like Cloudflare Smart Placement can help us improve performance for backend services by striking a balance between being closer to the client making the request, while not so close that too much latency is incurred from the multiple requests made to backend services.

We’ve looked at various tooling that allows us to access and cache data closer to users distributed all over the globe, whether through something like Cloudflare Durable Objects, which mean we no longer have to worry about which region to host the data in, or AWS Snowball Edge, where storage and compute capabilities can literally be shipped to our users’ doorsteps and operate in areas with completely non-existent internet. While we’ve also looked at some of the limitations of the edge, we can see use cases where we should feel comfortable handling requests at the edge as the preferred default, rather than handling them at a potentially very distant origin server.

What You Can Do Today

Some things that you can do today, if you’re hearing about the edge and thinking, ok, I’m interested in seeing if I can incorporate this into my tech stack: one thing you can do is take a look at some high-traffic functions and see if they can become Edge Functions. Some really good candidates for this are validating user sessions, such as in the example we saw earlier, or setting cookies or custom request headers as part of fulfilling an overall request; see if they can live on the edge as a way of easily boosting your performance with minimal changes needed to your architecture.
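As one example of what that can look like, here is a minimal sketch of a session check at the edge; the session cookie name and /login path are hypothetical, and how you forward authenticated requests onward depends on your platform (some, like Netlify Edge Functions, treat an undefined return value as “not handled here” and pass the request through).

```ts
export default async (request: Request): Promise<Response | undefined> => {
  const cookies = request.headers.get("cookie") ?? "";
  const hasSession = cookies
    .split(";")
    .some((c) => c.trim().startsWith("session="));

  if (!hasSession) {
    // Unauthenticated users never reach the origin; they get bounced to the
    // login page straight from the edge location.
    return Response.redirect(new URL("/login", request.url).toString(), 302);
  }

  // Otherwise, let the request continue on to the origin.
  return undefined;
};
```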

Then, take a look at the groups of users that reside in locations furthest away from your origin server, and experiment with handling some of their more popular requests at the edge in order to improve performance for them. All this is to say, I think that we’re starting to enter an edge-first future. This is really exciting, because the more requests and data you can serve closer to your users, the better the experience of your services will be, regardless of where that user is located in the world.

