Transcript
Cui: I’m here to share some of the patterns I’ve noticed, things that people struggle with when it comes to developing serverless applications, and some of the things you can do to make that development experience a lot easier. My name is Yan. I’ve been working with AWS for a very long time, since 2010. I went through the whole EC2 to containers to serverless journey over the years, and nowadays I’m pretty much fully focused on serverless. I spend maybe half of my time working with Lumigo as a developer advocate. Lumigo are, in my mind, the best observability platform for serverless, and they support containers as well, though I haven’t done much container work recently. I also do a lot of consulting work as an independent consultant, working with companies, helping them upskill on serverless, and finding solutions for the problems they have.
Testing
As a consultant, I see a lot of recurring problems, both for my students and for my clients, and I’ve identified maybe three main areas that you need to get right in order to have a smooth software development process when it comes to building serverless architectures. You’ve probably encountered some of these problems yourself: how do you get a fast feedback loop when testing your serverless applications? How do you efficiently and easily deploy your application? How do you manage your application environments across your AWS real estate? I think by far the most important problem, and the one that most people struggle with, is, how do you get a good testing workflow?
Specifically, how do you achieve a fast feedback loop of making changes and testing them, without having to wait for a deployment every single time you make a small code change? How do you make sure that everything you’re doing is well tested, so that you can find bugs before your customers do? There’s a really common myth that still persists today, that there’s just no local development experience when it comes to serverless technologies. In fact, there are quite a lot of different ways you can get a good, fast feedback loop with local development when you’re working with serverless technologies. I’m going to be telling you about five of them.
Option 1: Hexagonal Architecture
Option one is using hexagonal architecture, a software pattern for creating loosely coupled components that can easily slot into different execution environments using abstraction layers called ports and adapters. That’s why it’s also often called the ports and adapters architecture. It’s often drawn in a diagram like this, where you have your core business domain in the middle. It’s encapsulated into different modules and objects, and it’s hosted inside your application layer. When your application layer needs to accept requests from callers and clients, and when it needs to call out to databases and other things to read or write data, it exposes ports.
Then, in order to connect your application layer to the outside world, say, a lambda function or a container, you create adapters. For example, you can allow different clients to call into the application layer by creating an adapter for each of them. For a lambda function, the adapter needs to adapt Lambda’s invocation signature of an event and a context into whatever domain objects your application layer requires in order to invoke the business logic captured inside the core domain modules.
Once you’ve got the lambda execution environment figured out with this adapter, you can also allow other hosts, maybe a Fargate container, to run your application by creating another adapter that adapts whatever web framework you want to run in the container to your application layer, for example, converting Express.js request objects into those same domain objects that your application layer requires. This approach gives you really nice portability in terms of what execution environment you run in. It’s also very useful when you’re just not sure whether your workload is going to run in lambda functions or in containers, or when you may need to change your mind later, because your architecture is going to have to evolve with your context.
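To make that concrete, here is a minimal sketch in JavaScript of what the two adapters might look like. The module, function, and file names are just illustrative, not from the talk; the repository adapter they inject is sketched a little further down when we get to the data ports.

```js
// app/create-restaurant.js -- application layer: no AWS or HTTP specifics.
// The repository is injected through a port (a plain object with save/findById).
exports.createRestaurant = async ({ name }, { restaurantRepo }) => {
  if (!name) {
    throw new Error('name is required');
  }
  const restaurant = { id: Date.now().toString(), name };
  await restaurantRepo.save(restaurant);
  return restaurant;
};

// adapters/lambda.js -- adapts Lambda's (event, context) invocation signature
const { createRestaurant } = require('../app/create-restaurant');
const { restaurantRepo } = require('./dynamodb-repo'); // real adapter, sketched below

exports.handler = async (event) => {
  const { name } = JSON.parse(event.body);
  const restaurant = await createRestaurant({ name }, { restaurantRepo });
  return { statusCode: 200, body: JSON.stringify(restaurant) };
};

// adapters/express.js -- the same core logic behind an Express.js route
const express = require('express');
const { createRestaurant: createRestaurantCore } = require('../app/create-restaurant');
const { restaurantRepo: dynamoRepo } = require('./dynamodb-repo');

const app = express();
app.use(express.json());
app.post('/restaurants', async (req, res) => {
  const restaurant = await createRestaurantCore({ name: req.body.name }, { restaurantRepo: dynamoRepo });
  res.json(restaurant);
});

module.exports = app;
```

Notice that neither adapter contains business logic; each only translates its own invocation format into the same call on the application layer.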
Today, maybe you’re building something that five people are going to use, but maybe in a year’s time it gets really popular, and now suddenly you have really high throughput. Your requirements have settled, so you no longer need to rapidly add capabilities; instead, you need to focus on efficiency and cost. So you move your workload from lambda functions into containers, similar to what the Prime Video team did at Amazon and wrote a blog post about. As your context changes, this architecture pattern gives you the flexibility to make changes without having to rewrite large parts of your application.
What about when your core domain needs to read data from databases or write data to other places? You have ports for those as well. You create adapters that adapt those data fetches and writes to DynamoDB. To support local development, you can also have adapters that talk to mocks instead, so that you’re able to write your application code and test it locally without having to talk to any third-party services. You can test your code, put breakpoints in, and step through it. That then gives you a really nice local development experience.
If you change your mind later, that DynamoDB was fine, but our access patterns have become more complex, and supporting them with DynamoDB means getting ourselves tangled up in knots, it doesn’t make sense anymore, let’s just go to a relational database. You can just change the adapter, and the application will continue to work. That adapter is the one part of your application you have to change. Hexagonal architecture is a really good way to structure and modularize your application so that you get a lot of portability, but also a really good local development experience.
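As a rough sketch of that swap, here is what the data port might look like with two interchangeable adapters: a real DynamoDB adapter (using the AWS SDK v3 for JavaScript) and an in-memory mock for local tests. Again, names and the table environment variable are illustrative assumptions, not from the talk.

```js
// adapters/dynamodb-repo.js -- real adapter, used in deployed environments
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, PutCommand, GetCommand } = require('@aws-sdk/lib-dynamodb');

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const tableName = process.env.RESTAURANTS_TABLE; // assumed env var

exports.restaurantRepo = {
  save: async (restaurant) =>
    docClient.send(new PutCommand({ TableName: tableName, Item: restaurant })),
  findById: async (id) => {
    const { Item } = await docClient.send(
      new GetCommand({ TableName: tableName, Key: { id } }));
    return Item;
  },
};

// adapters/in-memory-repo.js -- mock adapter with the same shape, for local tests
const store = new Map();

exports.restaurantRepo = {
  save: async (restaurant) => { store.set(restaurant.id, restaurant); },
  findById: async (id) => store.get(id),
};
```

Because the application layer only depends on the shape of the port (save, findById), switching from DynamoDB to a relational database, or to an in-memory mock in tests, is just a matter of wiring in a different adapter.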
It’s something you should consider especially when you have a more complex business domain or application. The nice thing about it, besides the portability and the local development experience, is that it’s pretty universally applicable: any programming language, any deployment framework. It doesn’t matter if you’re using CDK, the serverless framework, SAM, SST, or whatever the latest flavor of the day is, it always works, because ultimately it’s about how you structure, organize, and modularize your code.
There’s a problem, though, because when we talk about serverless, we’re not just talking about lambda functions. Of course, lambda is what we typically think of as serverless, but serverless as a whole is much bigger than that. When I talk about serverless, I’m thinking about any technology that fits a number of criteria. For one, we don’t have to think about managing, patching, and configuring the servers that run our application, and it has autoscaling built in. Importantly, it needs to be able to scale to zero. Ideally, it has usage-based pricing, so you only pay for your application when someone does something, rather than paying for it by the hour. This usage-based pricing is also going to be very important when we talk later about how you can use ephemeral environments, which is one of the most impactful practices that has evolved with serverless.
Of course, not every service we might consider serverless fits neatly and ticks every single one of these boxes, which is why many people consider serverless more as a spectrum. On one end you’ve got things like SNS and Lambda, which are definitely serverless. On the other end you’ve got things like EC2, which are definitely not. Somewhere in the middle you’ve got things like Fargate, which allows you to run containers without having to worry about managing the underlying servers. You’ve also got things like Kinesis, which requires no server management but doesn’t quite scale to zero and still has uptime-based pricing; that sits somewhere in the middle of the spectrum.
In the early days of serverless, we often talked about how you should use lambda functions to transform data, not to transport data. That is still very much true today: if you’ve got business logic to implement, put it in lambda functions, absolutely. If all you’re doing is shuffling data from one place to another, for example from a database into the response of an API call, and you don’t add any value with custom code, then all your lambda function is doing is making a call to some service using the AWS SDK. You’re not adding any value with that lambda function. Instead, it’s better to have the service do the work for you. If you’re building a REST API on API Gateway, you can have API Gateway talk to Cognito directly to do user authentication and authorization, rather than writing code yourself to call Cognito.
Similarly, if you’re just building a simple CRUD endpoint, and all it’s doing is taking whatever’s in DynamoDB and returning it as is, without any business logic in the middle, then you can have API Gateway talk to DynamoDB directly without putting a lambda function in between, which would introduce an additional moving part, something you need to manage, maintain over time, and also pay for. Every component that goes into the architecture has a cost. You should always be asking, does it make sense for me to add this other moving part? What value am I getting by having that thing in my architecture?
Keep going: if you’re making changes to your data in DynamoDB, and you want to capture those data change events and make them available on an event bus so that other services can subscribe to them, then instead of having a lambda function that, again, just shuffles data from a DynamoDB stream to an event bus, you can nowadays use EventBridge Pipes. You can still use lambda functions to do data transformation, but you don’t have to do the data shuffling yourself. EventBridge Pipes can also be used to call third-party APIs like Stripe, or other internal services and APIs you have.
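As a rough illustration only (the resource names are made up, the IAM role definition is omitted, and the exact properties may need adjusting for your setup), a pipe from a DynamoDB stream to a custom EventBridge bus can be declared in CloudFormation, for example in the resources section of a serverless.yml, along these lines:

```yaml
# Hypothetical sketch: forward DynamoDB stream records to an EventBridge bus,
# with no Lambda function in the middle.
RestaurantsPipe:
  Type: AWS::Pipes::Pipe
  Properties:
    RoleArn:                               # role allowing stream reads + PutEvents (not shown)
      Fn::GetAtt: [RestaurantsPipeRole, Arn]
    Source:                                # DynamoDB stream as the source
      Fn::GetAtt: [RestaurantsTable, StreamArn]
    SourceParameters:
      DynamoDBStreamParameters:
        StartingPosition: LATEST
        BatchSize: 1
    Target:                                # EventBridge bus as the target
      Fn::GetAtt: [RestaurantEventsBus, Arn]
    TargetParameters:
      EventBridgeEventBusParameters:
        DetailType: restaurant-updated
        Source: restaurants-service
```

The point is that all of this is configuration: there is no custom code to unit test, which is exactly the situation the next few options have to deal with.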
Again, there are lots of ways to build serverless architectures without actually writing custom code and shipping it in lambda functions. That creates a problem for our hexagonal architecture, because there’s no code, nothing for us to test, nothing for us to put behind ports and adapters. There’s a lot in such architectures that you won’t be able to test with hexagonal architecture alone. The other downside of hexagonal architecture is that there’s a bit of upfront work to do in designing those abstraction layers and creating the ports and adapters, even if you may not end up needing them later. It’s upfront work that makes it easier to port your application to other environments later, but it does come with some cost.
Option 2: Local Simulation
What if we’re going to use a lot of direct service integrations, how can we still get that local development experience? Another approach you can consider is local simulation, simulating those services locally. Probably the most widely adopted tool for this is LocalStack. I started using it about four or five years ago, when it was open source, before version 1, and it was really unstable at the time, so I pretty much stayed away from it for a while. In the last maybe 12 to 18 months, it has come a really long way since becoming a proper commercial product. I had a chat with Waldemar Hummer, the CTO of LocalStack, recently, about what’s happening in v3, and it’s now looking like a really good product and much more stable from what I can see.
Taking our example from earlier, we don’t have any lambda function in this architecture, but you can use LocalStack to simulate the entire application. v3 now has support for EventBridge Pipes as well, so you’re able to take your application, deploy it against LocalStack, and it should basically run end-to-end inside LocalStack. With v3, they’re also adding features that would be difficult to get otherwise, even when you run your application in the real AWS environment. When you have direct integrations like this, one thing that often trips people up, myself included, is the permissions.
For example, if I have this set up and I somehow misconfigure the permission between EventBridge Pipes and the EventBridge bus, those problems are really difficult to debug. One of the things LocalStack v3 can now do is enforce IAM permission checking, so that when you make an API call against your local endpoint that triggers events into EventBridge Pipes and on to EventBridge, and there’s a permission problem, you see it in your LocalStack logs: “There’s an IAM permission error.” With that, you can straight away identify, I’ve got a problem with my IAM setup for this part of my application. With the real thing, those problems are much more difficult to identify.
Of course, if I do have a lambda function I want to test, LocalStack can also simulate lambda functions running locally on your machine. I’m not sure how good the ability to put breakpoints in your code is; I think they have that support, I remember seeing a demo of it. The nice thing about local simulation is that it’s much broader in terms of what you can do and what it can cover. It’s not just about lambda functions. The downside is that any simulation is never going to be 100% the exact real thing. You always run the risk of hitting some service that’s not properly supported, or an API that’s not fully implemented, or just bugs, or worse, some subtle behavior difference that gives you false positives or false negatives when you run your tests against the local simulator, and then in the real thing, it suddenly breaks.
The problem is that, with these kinds of things, oftentimes you just need to run into one problem for a lot of your strategy to fall apart. Having said that, I think LocalStack is a really good product, you should definitely go check it out.
Option 3: Lambdalith
If you want to look at something else again, there’s also the option of writing your lambda functions as lambdaliths. A lambdalith is basically when you take an existing web application framework and run it inside a lambda function, with something that adapts Lambda’s invocation signature of an event and context into whatever your web framework requires. There are lots of tools that support that out of the box nowadays. AWS has the AWS Lambda Web Adapter that you can bundle with your lambda function, I think it’s available as a lambda layer, which does the translation for you before it invokes your code. There are also frameworks you can use to develop the application from the ground up using this approach, such as Serverless Express, or Bref for PHP.
Serverless Express is mostly for JavaScript, and Bref is for PHP. Then you’ve got Zappa for Python, which allows you to run Flask applications inside a lambda function and does that conversion for you. The nice thing about this approach is that you’re using familiar web frameworks, so developers know what they’re doing. They know how to test it. They know how to run an Express.js app on their local machine and test it. It also gives you that portability again, because it’s just an Express.js application. If tomorrow you want to move your workload from Lambda to containers, you only need a minimal amount of work to change how your Express.js app is invoked to handle those requests.
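To make that concrete, here is a minimal sketch using the community serverless-http package (Serverless Express and the Lambda Web Adapter follow the same idea); the routes and module names are just illustrative:

```js
// app.js -- a plain Express.js application, runnable and testable locally
const express = require('express');

const app = express();
app.get('/restaurants', (req, res) => {
  res.json([{ name: 'example restaurant' }]); // placeholder response
});

module.exports = app;

// handler.js -- wrap the whole app in a single Lambda handler (the "lambdalith").
// serverless-http converts the API Gateway event/context into an HTTP request
// for Express, and the Express response back into a Lambda proxy response.
const serverless = require('serverless-http');
const app = require('./app');

module.exports.handler = serverless(app);
```

Locally you can simply `require('./app')` and call `app.listen(3000)` to get the usual Express development loop; in the cloud, the single handler serves every route.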
On the flip side, again, it’s just about lambda functions, just about your code. You can’t test the things you’re not writing custom code for, where you’re using AWS services to do the actual work; those can’t be tested as part of the application this way. Also, one thing to consider with lambdaliths is that because you’re running a full web application framework, those can be quite large, and they add a lot of bloat to your lambda function’s deployment package. The size of your deployment package has an impact on cold start performance, so for applications that experience a lot of cold starts, this can really hurt your user experience.
Also, everything I’ve talked about so far with lambdaliths is about web applications and web APIs, and most of these frameworks are tailored for that particular workload. This approach doesn’t work out of the box for workloads that are not an API. A lot of lambda functions are used for data processing or for event-driven architectures, and there’s no built-in framework for doing lambdaliths for those. There’s also the security and operational point of view: instead of having one lambda function per endpoint, where you can have more fine-grained access control, you have just one lambda function for the entire API and do all the routing internally inside your code. That means less fine-grained access control for individual endpoints, because it’s one lambda function with one IAM role.
The same goes for monitoring and alerting. If I’ve got different lambda functions for different endpoints, when there’s an error and it triggers an alert, I can straight away see which function it is and therefore which endpoint is having a problem. With just one function, one alert, and one set of metrics, when there’s an alert in my notifications channel, I can’t tell whether the whole system is broken or just one endpoint is failing. You have less granularity in the telemetry you get from your system as well.
Option 4: Deployment Frameworks
Then there are also deployment frameworks with built-in support for some local development. AWS SAM has got sam local invoke, which allows you to invoke your function locally. It also has sam sync, which syncs up your code changes and updates the lambda function directly every time you change your local modules, so that the test loop is a bit quicker. The serverless framework has got serverless invoke local as well. SST has got the sst dev command, which allows you to live debug a lambda function.
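For reference, those commands look roughly like this; the exact flags, stack names, and function names are just examples and vary by project and framework version:

```sh
# AWS SAM: invoke a single function locally with a sample event,
# or continuously sync local changes to a deployed stack
sam local invoke AddRestaurantFunction --event events/add-restaurant.json
sam sync --stack-name my-service-dev --watch

# Serverless Framework: invoke a function locally with an event payload
serverless invoke local --function add-restaurant --path events/add-restaurant.json

# SST: start the live Lambda development environment
sst dev
```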
All of these tools make it easier for you to make some changes and have them updated in AWS, or run your function locally. It’s like a lightweight simulator, specific to your lambda functions. The nice thing about this approach is that it’s all done by the framework. You don’t have to write your code in a specific way; it all comes out of the box. The downside is that it’s very specific to the framework you’re using, and potentially also to the language. SST is mostly targeting TypeScript. SAM’s local invoke, I think, only supports JavaScript, Python, and maybe a couple of others. The same goes for the serverless framework’s support. It’s not universal across languages.
Also, again, it’s mostly just looking at your lambda functions, so you can’t use it to test the things you’re not writing custom code for: you should use Lambda to transform data, not to transport data. Running something like sam local invoke to test your code locally is all well and good when you’re developing and working on some changes, and you can even add breakpoints to your code. It’s great for exploring your changes, but not so good for automated testing to make sure you’re not introducing regressions. You’ve got lots of different functions and lots of different use cases; you don’t want to manually run every single input every time you make a small change. It’s just not feasible. Even if you use these tools to help you develop new things or make changes with a fast feedback loop, you still need a suite of automated tests that catches regressions and problems you haven’t identified earlier.
Option 5: Remocal Testing
Since you have to write those tests anyway, my favorite approach for writing tests for lambda functions is what I call remocal testing. Basically, you combine executing your code locally with calling remote services rather than mocks, hence remocal. When you think about local testing, you think about executing your code locally against mocks. Because you’re running code locally, you can put breakpoints in, which allows you to step through the code to debug more complicated problems, rather than just relying on checking your logs. You should definitely still have a good logging and observability setup, metrics and everything else you need to help you debug problems in production, but it’s also helpful to be able to step through the code, especially if your application is fairly heavy on lambda functions and custom code.
Importantly, you can test your changes without having to wait for a full deployment cycle every single time, which gives you a faster feedback loop. The problem with using mocks is that there’s a subtle difference between testing against the real thing, which asks, is my application actually working? Versus running tests against mocks, which asks, is my application behaving the way I expect it to, given that whatever I’m calling gives me this particular response? That given is based on your expectations and assumptions about how the other thing works. Whereas when you test against the real services, you’re testing against how the real thing actually behaves, the reality.
The problem is that, oftentimes, we get our expectations or assumptions wrong. I can’t tell you how many times I’ve written, or looked at, code where the automated tests passed: it calls DynamoDB, we use a mock for that, we write a test, everything passes. Then we run it in the real world, and it fails straight away because we have a typo in our DynamoDB query syntax, which is just a string, so our mock doesn’t catch that. We’re not really checking that. Our assumption was that our query was right, or we made a request based on some documentation that turned out to be wrong, or something like that.
You’re not able to test those assumptions with your mocks, which is why using mocks and local testing this way is prone to giving you false positives. You’re also not testing a lot of other things that are part of your application, things like IAM permissions. Again, I can’t tell you how many times my tests have passed, and then I run against the real thing, realize I’m missing IAM permissions, and the real thing fails.
Ultimately, your application is more than just your code. Your customer is not going to care that something broke because of a configuration error rather than your code. The customer doesn’t care. Your job is to make sure the application works, not just your code. With remote testing, you’re testing against the real thing. You’re testing your application in the cloud, where it’s going to be executed, using it just like your customer would. You’re going to get much more realistic test results, and you can derive higher confidence from those results than you can from local tests.
Because you’re executing your application as it is, you also cover everything along the code path, in terms of IAM permissions, in terms of the calls from your lambda functions to DynamoDB and whatnot. The problem is that if you have to do a deployment every single time you make a small change, that’s a really slow feedback loop, and really painful for people developing serverless applications. Now we’ve got two approaches with different tradeoffs, and if we put them in the melting pot and combine local testing and remote testing, we get remocal testing: you execute your code locally, but you talk to the real AWS services as much as possible, at least for the happy paths. Because you’re running your code locally, you can put breakpoints in and debug through it, and you can change your code without doing a full deployment every single time.
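Here is a minimal sketch of what a remocal test might look like with Jest, reusing the hypothetical handler and table from the earlier sketches. It assumes the stack (including the DynamoDB table) has been deployed once, and that the table name and AWS credentials are available in the environment.

```js
// tests/add-restaurant.test.js -- remocal: local code, real AWS services
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, GetCommand } = require('@aws-sdk/lib-dynamodb');
const { handler } = require('../adapters/lambda'); // hypothetical handler module

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

test('POST /restaurants stores the new restaurant in DynamoDB', async () => {
  const name = `test-${Date.now()}`;
  const event = { body: JSON.stringify({ name }) };

  // Invoke the handler locally: breakpoints work, no deployment needed.
  const response = await handler(event);
  expect(response.statusCode).toBe(200);

  // But verify against the real table, so typos in query syntax or wrong
  // assumptions about the service show up immediately.
  const { id } = JSON.parse(response.body);
  const { Item } = await docClient.send(new GetCommand({
    TableName: process.env.RESTAURANTS_TABLE, // assumed env var, set from the deployed stack
    Key: { id },
  }));
  expect(Item.name).toBe(name);
});
```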
Because it’s calling the real thing, any mistakes in your requests to DynamoDB will surface pretty quickly, without having to deploy to AWS first. This kind of testing is useful, but it still only covers your lambda functions and whatever services they integrate with. It doesn’t cover anything upstream of your function: the API Gateway endpoint that calls your function, or the EventBridge bus that triggers it. You can have mistakes in those too. You can have a bug in your EventBridge pattern, so an event gets sent in but doesn’t trigger your function because of a typo in the pattern. Or maybe some permission-related error in API Gateway, or a bug in your VTL code, or in your JavaScript resolver code in AppSync, and so on.
This approach is great in that it’s universally applicable to different languages and frameworks. I’ve personally used it for projects in JavaScript, TypeScript, Python, and Java. It’s, again, a code-level pattern that’s really easy to apply in different contexts, but you still can’t test direct integrations. So even if you’re writing remocal tests for your lambda functions, you probably still need a suite of tests that exercises your application end-to-end. If I were to build something with API Gateway and look at all the different things API Gateway can do for me, calling lambda functions is just one part of it: it can also validate my requests, transform my responses, or call other AWS services directly.
Also, there are different authentication and authorization options you can use out of the box. Anything I’m asking the AWS service to do for me, I want a test that covers it. If I’ve got lambda functions with some domain logic, I can write unit tests to test them in isolation. That’s where I’ll use mocks, or structure the code so that it only works with domain objects and has no external dependencies; that’s where some lightweight patterns from hexagonal architecture can be really useful. The IAM permissions of my function will be covered by my end-to-end tests. If I just want to focus on testing my function’s integration with other services, that’s where I’ll focus on writing remocal tests.
Anything I’m asking API Gateway to do for me, like transforming responses or validating a request against a schema, I will cover with my end-to-end tests. Similarly, if I’m using API Gateway to authenticate requests with Cognito, I’ll make sure that’s covered in my end-to-end tests as well. The Lambda authorizer is an interesting case, because it is still a lambda function, so I can test it in isolation using unit tests. If my Lambda authorizer is there so I can integrate with Auth0 or some other third-party identity provider, then I can also write a remocal test that calls Auth0 with some token to validate it. Different features that I’m using can be covered by different kinds of tests.
Demo
To give you a sense of how this all fits together, I prepared a short demo to show you what it may look like in practice. I deployed this earlier to a new stage, not the dev stage; you can see the stage name is qcon. We’ve got some lambda functions and a couple of endpoints: one to get a list of restaurants, one to create new restaurants with a POST, and one to search for restaurants. This project is using the serverless framework. You can see I’ve got my function definitions here. This is the Add Restaurant function’s endpoint, protected by AWS IAM, with its path and the POST method. Then I’ve got a direct integration to DynamoDB to do a get against the table, because I’m not really doing much with the response, I’m just returning it as it comes back from DynamoDB.
We’ve got a VTL response template to transform the response, including returning a 404 if the restaurant is not found. For the Add Restaurant function, I’ve got my DynamoDB table down here as well, plus Cognito and whatnot. I also want to make sure that when someone calls my POST endpoint to create a new restaurant, the request is validated against a schema: this Add Restaurant schema just says you have to provide a name, which is a string. So we’ve got a mix of different cases here. We’ve got lambda functions that I want to test. We’ve got things that are only implemented by API Gateway, which can only be tested with end-to-end tests. The schema checking sits on the direct integration as well, so that can also, at least in my setup, only be tested using end-to-end tests. I’ve prepared some test scripts: some run my integration (remocal) tests, and some run my end-to-end tests.
After I deploy this, I can show you my Add Restaurant test case here, which says: we generate a random restaurant name; when we invoke the Add Restaurant endpoint or handler, we expect the response to come back as a 200, and the new restaurant should match the data we’re sending; and after the handler runs, there should be a restaurant with that ID in the database. Notice that this test is written at a very high level. It doesn’t assume the code is invoked locally. What I’ve actually done is use a small library that allows me to tag the test as belonging to different test groups. If you look at this step, it toggles between invoking the function locally, using this viaHandler helper, and calling an HTTP endpoint that’s been deployed. So I can run this test case as a remocal test, but once I deploy my endpoint, I can run the same test again as an end-to-end test, and the helper signs the HTTP request and all of that.
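I don’t have the demo’s actual helper library to show here, but a hypothetical version of that toggle could look roughly like this (the environment variable, URL, and module names are all assumptions, and IAM request signing is omitted for brevity):

```js
// tests/helpers/when.js -- hypothetical helper that runs the same test case
// either against the local handler ("remocal") or the deployed HTTP API.
const { handler } = require('../../adapters/lambda'); // hypothetical handler module

const viaHandler = async (body) => {
  // invoke the Lambda handler in-process: breakpoints work here
  const response = await handler({ body: JSON.stringify(body) });
  return { statusCode: response.statusCode, body: JSON.parse(response.body) };
};

const viaHttp = async (body) => {
  // Node 18+ global fetch; a real end-to-end test would also sign the
  // request with IAM credentials before sending it
  const res = await fetch(`${process.env.API_URL}/restaurants`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(body),
  });
  return { statusCode: res.status, body: await res.json() };
};

// the test case itself stays the same; only the invocation mode changes
exports.we_invoke_add_restaurant = (body) =>
  process.env.TEST_MODE === 'http' ? viaHttp(body) : viaHandler(body);
```

The test file then calls `we_invoke_add_restaurant(...)` and asserts on the result, so the same assertions serve as both a remocal test and an end-to-end test.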
In this case, I can use the JavaScript debug terminal and put a breakpoint in my code. For the Add Restaurant function, I run test:int, which runs my test case, and with a breakpoint in my function code, if there are any problems, I can quickly step through it and figure out where the problem is. I think that run failed with a timeout because it took too long to respond; the default test timeout in Jest is 5 seconds. Say I made a mistake and, instead of returning the new restaurant data, I return something else; I can run the test again. Now the test actually fails. I forgot to raise the timeout, so it’s going to fail on that again.
Actually, let’s run this here, with less noise. This test is going to fail because of that. Ok, now I’ve figured out the problem, or I’m doing TDD: I start with a failing test, and then I implement the actual logic, which is what I should have done. I can now make the code do what it’s supposed to do, run the test, and once the test passes, I can promote my changes. I deploy to AWS, and then run the end-to-end version of these tests instead, in this case by just setting an environment variable for the test run. This way of writing tests means I have fewer tests to maintain over time, especially since I’m writing them at a fairly high level.
That’s an example of what I can do with remocal testing, combined with other tests that exercise my application end-to-end to cover the things that are outside of my code. With this style of testing, because your function code is talking to the real thing, you do have to do one initial deployment before you can use it. Once you’ve deployed your DynamoDB tables and so on, you can add new functions, talk to those resources, and test the functions before you deploy them or your API endpoint to AWS, but you do have to do that initial deployment first.
The question then becomes: what happens if you’ve got four people all working on the same system, the same API? Who gets to deploy their half-baked changes to dev so they can test their code changes? That’s where temporary, or ephemeral, environments come in and are really helpful. I’m jumping ahead a little bit here. Before we get to that, I want to reiterate that the myth that there’s no local development experience with serverless is just no longer true. There are so many different ways to do it now. These are just five ideas I come across often, and there are different ways you can combine them as well. For example, you can run remocal tests and end-to-end tests against a local simulator: you write the same end-to-end tests, but instead of waiting for a deployment to AWS, you deploy to LocalStack and run the tests against that instead.
Deployment
Next, we want to talk about deployment. How can you make sure your deployment is nice, simple, and smooth, so that you don’t get tangled up in it? I’ve seen a lot of clients and students trip themselves up by making their deployment process much more complicated than it needs to be. I really stress this: keep your deployment as simple as it can be, but no simpler. You don’t want to oversimplify, which causes other problems elsewhere. Part of the problem is that Lambda is just no longer a simple thing. Back in 2014, and maybe still in 2016 or ’17, Lambda was a fairly simple thing.
Nowadays there are just so many additional features that many people don’t need. You’ve got lambda layers for sharing code. You can package your lambda functions as a container image; that’s not the same as running a container, it’s just using the container image as a packaging format. You can ship your own custom runtime. You can use provisioned concurrency to reduce the likelihood of cold starts, and use that with aliases and whatnot. All of these things are useful in specific situations.
Just because they’re there doesn’t mean you have to use them, and you absolutely don’t have to use all of them. In fact, I’d go as far as to say that a lot of these features are not something you need for 90% of use cases; they’re for specific problems you run into. They’re more like medications than supplements: things you use when you have a problem, not something you take every day because it makes you feel good.
I’ve got a pet peeve about lambda layers: a lot of people get into trouble because they use lambda layers to share code between functions, instead of sharing code through package managers like npm, or crates.io, or whatever. With lambda layers, you’ve got a number of problems. Number one, there’s no support for semantic versioning. The versions just go 1, 2, 3, 4, 5 onwards; there’s no semantic versioning to communicate whether it’s a breaking change, a bug fix, a patch, or just new additions. And because a layer exists outside your language ecosystem, security tools that scan your code and dependencies don’t know about it and can’t scan it properly.
You’re also limited to only five lambda layers per function, and there are so many things distributed as lambda layers nowadays that you can hit that limit really quickly. Even when you put things into lambda layers, they still count towards the 250 MB size limit for your unzipped deployment package, so layers don’t help you work around that limit either. They also make it more difficult to test your application, because part of your code now exists outside your language ecosystem, and you need specific tooling support from SAM or SST or something else to make sure that code is also available locally when you run your tests. Layers were also invented with supporting Python in mind.
They practically don’t work for compiled languages like Java or .NET. I’ve worked with a client that used lambda layers to share code, and the process of updating the layers, publishing them, and making them available to other code was actually more work than just using npm to begin with. There’s also no tree shaking (this one only applies to JavaScript): if part of your dependencies only exists as a separate thing in a lambda layer, as opposed to being part of the language ecosystem you’re bundling, you can’t tree-shake it properly either. So there are a lot of reasons not to use lambda layers, and you definitely shouldn’t use them to share code; too many people get tripped up by this.
If you’re building a JavaScript application, and you’ve got different functions in the same project that need to share some business logic, just put the shared code into another folder and let the deployment framework handle referencing and bundling it for you. The serverless framework and SAM, by default, will just zip up the whole root directory of your project, so the shared code will be included and accessible at runtime anyway.
Or you can use a bundler like esbuild so that all of that shared code gets bundled and tree-shaken as well. If you share code between different projects, then just do what you’ve always done and publish that shared code to npm, so that the functions that need it can grab it from npm instead of going through lambda layers. That’s my take on lambda layers: something I definitely recommend you don’t use for sharing code. They are useful as a way to distribute code that’s already been packaged, but we can talk about that separately afterwards.
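As a simple illustration (the layout and module names are made up), sharing code through a plain folder might look like this; a bundler such as esbuild will include and tree-shake the shared module in each function’s deployment package.

```js
// Illustrative project layout -- shared code lives in a plain folder
// (or a private npm package if it's shared across projects):
//
//   lib/
//     restaurant-validation.js   <- shared business logic
//   functions/
//     add-restaurant.js
//     update-restaurant.js
//   serverless.yml

// functions/add-restaurant.js -- simply imports the shared module; no Lambda layer involved
const { validateRestaurant } = require('../lib/restaurant-validation');

exports.handler = async (event) => {
  const restaurant = JSON.parse(event.body);
  validateRestaurant(restaurant); // same shared code as the other functions
  return { statusCode: 200, body: JSON.stringify(restaurant) };
};
```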
I mentioned earlier that you can package your function as a container image. I still prefer zip files because they’re easier. Also, with zip files you’re using the managed runtimes. With a container image, you have to ship the runtime yourself, which, if you look at the split of responsibilities between you and AWS, means you’re on the hook for that runtime. AWS does provide container image versions of the runtimes that you can use, but it’s then up to you to update them and make sure you’re always on the latest version.
Whereas if you just use a zip file with the managed runtimes, you can be sure the runtime is constantly being updated behind the scenes without you having to take any action. So two takeaways about deployment: don’t use lambda layers to share code, and prefer zip files with managed runtimes. That’s not to say you should never use container images for your lambda functions. With container images, your deployment artifact can be up to 10 GB, and there are use cases where you need that. As your deployment package gets bigger, there are also some cold start performance improvements with container images, because of optimizations AWS has made to container image support specifically for the Lambda platform.
Environments
The last thing I want to talk about is the environment, starting with your AWS accounts. As a minimum, you should have one account per main stage, so dev, test, staging, and production should all be separate accounts. That gives you better isolation in the event of a security breach: someone who gets into your dev account can’t access production data. It also gives you a better way to monitor costs for different environments, and so on. So you’ve got different accounts for different main stages, and you use AWS Organizations to help you manage the overall AWS environment. For larger organizations with lots of different teams and workloads, it’s even better to go a step further and have one account per team per stage.
Then, if you’ve got applications, or parts of your application, that are more business critical than others, you may go further and put those services into their own set of accounts. Or if you’ve got services with outsized throughput requirements that may impact other services, put those into separate accounts as well, so they’re insulated from everything else in your ecosystem. On top of that, you can use ephemeral environments, temporary environments that are brought up on demand and torn down when you don’t need them anymore. The way it works can be very simple. Say I walk into the office, have my cup of tea, maybe a biscuit, and start to work on a feature. I pick up a ticket from Jira, and I create a new temporary environment using the serverless framework. That’s just a case of running serverless deploy.
Say the stage name is going to be dev-my-feature, assuming that’s the feature name. Once I’ve created my temporary environment for the service I’m working on, I can start to iterate on my code changes and run remocal tests against it. Then, when I’m ready, I create my commit and PR, and that runs my changes through the CI/CD pipeline. When my feature is done, I just delete the temporary environment I created at the start. The environment can be brought up and torn down with a single command with the serverless framework, and you can do the same thing with CDK or SAM. Again, this practice is not framework or language specific. The nice thing about doing this is that everyone is insulated from each other, so you can have multiple people working on the same API at the same time without stepping on each other’s toes.
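In practice, with the serverless framework the whole lifecycle can be as little as two commands (the stage name here is just an example):

```sh
# spin up a temporary environment for the feature I'm working on
npx serverless deploy --stage dev-my-feature

# ...iterate, run remocal tests against its resources, commit, open a PR...

# tear it down again when the feature is done
npx serverless remove --stage dev-my-feature
```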
Everyone’s got their own isolated environment to work in. You also avoid polluting your shared environments with lots of test data, especially when you’re working on something new that’s not fully formed yet and generates lots of test data you don’t want in your staging environment. And thanks to the usage-based pricing we mentioned earlier, one of the key criteria for what counts as serverless, it doesn’t matter how many of these environments you have: you only pay for usage. You can have 10 of them, and if you don’t do anything with them, you don’t pay for them, so there’s no cost overhead to having all of these environments in your account. If you do have to use services like RDS that are charged by uptime, then there are some special things you have to do to handle that.
Another place you can use ephemeral environments is in the CI/CD pipeline: you commit your code, it runs in CI, and CI can also just create a temporary environment, run the tests against it, and tear it down afterwards. It makes the pipeline slightly slower, but you avoid polluting your main stages with test data. When I say environment, it’s not a one-to-one mapping to an AWS account. Yes, you should have multiple accounts, and your main environments, dev, test, staging, production, and so on, each have their own account. Where your developers work is going to be the dev account.
Inside that account you can run multiple temporary environments on demand. I can have one environment for myself, and someone else working on a feature can create their own. What an environment consists of depends on the tools you’re working with. It may be represented as one CloudFormation stack, or one CDK app with multiple stacks, or maybe a combination of a stack or CDK app with other things that exist in the environment, like SSM parameters, references to other things, API keys, secrets, and all that.
A question people often ask me is: if you’re running multiple environments in the same account, how do you avoid name clashes on resources, which would give you deployment errors? There are a couple of things you want to do. One, don’t explicitly name resources unless you really have to. Most services don’t force you to name the resources yourself, and that way CloudFormation will append a random string to the generated name, so you know they’re not going to clash. If you do have to name a resource yourself, such as an EventBridge bus, then make sure you include the environment name, something like dev-yan, as part of the resource name.
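As an illustrative sketch only (the resource names are made up, and the `${sls:stage}` variable assumes serverless framework v3), that advice might look like this in a serverless.yml:

```yaml
# serverless.yml (sketch) -- only name resources when you have to,
# and include the stage name when you do
resources:
  Resources:
    # no explicit TableName: CloudFormation generates a unique name per stack
    RestaurantsTable:
      Type: AWS::DynamoDB::Table
      Properties:
        BillingMode: PAY_PER_REQUEST
        AttributeDefinitions:
          - AttributeName: id
            AttributeType: S
        KeySchema:
          - AttributeName: id
            KeyType: HASH

    # EventBridge buses must be named, so include the stage to avoid clashes
    RestaurantEventsBus:
      Type: AWS::Events::EventBus
      Properties:
        Name: restaurant-events-${sls:stage}
```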
Again, that helps you avoid clashing on resource names. Temporary environments work really well with remocal testing, because, again, you need to have some resources in an AWS account to run your code against. And when you finish a feature, you can just delete everything, again without polluting your environment.
Conclusion
There are three things to think about when it comes to creating a workflow that works well for serverless development. Make sure you’ve got a good, fast feedback loop for running tests. Make sure you’ve got a smooth deployment process. Manage your environments well, and complement that with temporary environments. And remember, there are five different ways you can get a good local development experience when it comes to serverless development.