MMS • Renato Losio
Article originally posted on InfoQ.
Forrest Brazeal, head of content at Google Cloud, recently argued that serverless functions are the cloud’s biggest billing risk for developers, because there is no simple way to protect against recursive calls and functions can scale out almost indefinitely on all the cloud providers.
Brazeal highlights articles by cloud developers describing mistakes that led to “run-away” serverless functions and huge bills. Among others, Aled Sage, VP engineering at Cloudsoft, reports an example of a spiralling Lambda bill on AWS, Tom Wright describes a serverless horror story on Azure, and Sudeep Chauhan, founder of Milkie Way, explains how he burnt 72K USD testing Firebase and Cloud Run on Google Cloud. Brazeal warns:
It can happen so fast. It is the flash flood of cloud disasters. This is not like forgetting about a GPU instance and incurring a few dollars per hour in linearly increasing cost. You can go to bed with a $5 monthly bill and wake up with a $50,000 bill – all before your budget alerts have a chance to fire.
Discussing the specific limitations and protections of Google Cloud, AWS and Azure, the author argues that there is no safe way to guard against the risk, as none of these providers yet offers mechanisms that fully protect developers. Brazeal adds:
It is straightforward to protect yourself from spending too much on continuously-priced services like VMs (…) but there is just no good way to guarantee you won’t get clobbered by a surprise bill from functions (…)
AWS has a page dedicated to the recursive anti-pattern that causes run-away Lambda functions and acknowledges:
While the potential for infinite loops exists in most programming languages, this anti-pattern has the potential to consume more resources in serverless applications.
The concurrency limit on functions might help but can give developers a false sense of security: it protects against a recursive fork-style scenario in which functions scale out indefinitely, but it cannot prevent a large bill from accumulating within a few hours. One example is using the same S3 bucket as both source and destination for a function, as Sudhir Jonathan, technical architect at Qube Cinema, reported last year. James Beswick, principal developer advocate at AWS, wrote an article on how to avoid recursive invocation with Amazon S3 and AWS Lambda and explains:
If you trigger a recursive invocation loop accidentally, you can press the “Throttle” button in the Lambda console to scale the function concurrency down to zero and break the recursion cycle.
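To make the same-bucket scenario concrete, the following is a minimal sketch of one common guard, not code from any of the articles cited here: the function only acts on objects under an input prefix and ignores anything under its own output prefix. The prefix names and the helper functions `should_process` and `output_key_for` are hypothetical, and the actual read and write against S3 is left out so the logic can be followed on its own.

```python
from urllib.parse import unquote_plus

# Hypothetical prefixes, chosen for illustration only.
INPUT_PREFIX = "incoming/"
OUTPUT_PREFIX = "processed/"


def should_process(key: str) -> bool:
    """Return True only for objects under the input prefix."""
    return key.startswith(INPUT_PREFIX) and not key.startswith(OUTPUT_PREFIX)


def output_key_for(key: str) -> str:
    """Map an input key to its destination under the output prefix."""
    return OUTPUT_PREFIX + key[len(INPUT_PREFIX):]


def handler(event, context=None):
    """Lambda-style entry point for S3 object-created notifications.

    Returns the (source, destination) pairs it would act on.
    """
    planned = []
    for record in event.get("Records", []):
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if not should_process(key):
            # Ignoring the function's own output is what breaks the loop.
            continue
        planned.append((key, output_key_for(key)))
        # A real function would read `key` and write the result here.
    return planned
```

A guard like this only reduces the risk. Using separate source and destination buckets avoids the loop more robustly, and neither approach replaces a billing cap.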
While there are many possible optimizations to save money on Lambda, as Yan Cui, cloud consultant and AWS Serverless Hero, recently demonstrated, there is no automatic circuit breaker on AWS when things go wrong. Among the possible mitigations that cloud providers could introduce, Brazeal suggests near real-time billing, hard caps on cloud billing and better automated anomaly detection and remediation for recursive workloads.
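Without a provider-side circuit breaker, developers sometimes build a crude one into the function itself. The sketch below is an illustration under stated assumptions, not a feature of AWS or any other provider: each invocation carries a hypothetical `hops` field in its payload, and the function refuses to trigger another invocation once a hypothetical `MAX_HOPS` limit is reached.

```python
MAX_HOPS = 3  # hypothetical limit, chosen for illustration


class RecursionLimitExceeded(RuntimeError):
    """Raised when a chain of self-triggered invocations is too long."""


def next_payload(payload: dict) -> dict:
    """Build the payload for a follow-up invocation, or refuse to."""
    hops = int(payload.get("hops", 0))
    if hops >= MAX_HOPS:
        raise RecursionLimitExceeded(f"stopped after {hops} hops")
    # Copy the payload and record that one more hop has been taken.
    return {**payload, "hops": hops + 1}


def simulate_runaway(payload: dict) -> int:
    """Simulate a function that always tries to re-invoke itself.

    Returns how many executions happen before the breaker trips.
    """
    executions = 0
    while True:
        executions += 1  # one function execution
        try:
            payload = next_payload(payload)  # about to re-invoke
        except RecursionLimitExceeded:
            return executions
```

In this simulation the chain stops after `MAX_HOPS + 1` executions. The counter only helps when the payload actually travels with each invocation. An S3-triggered loop carries no such field, which is part of why Brazeal asks for protection on the provider side.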
While the author focuses on the three main providers, Corey Quinn, cloud economist at The Duckbill Group, comments in his newsletter:
Oracle Cloud does in fact have a “we are bloody serious about the free tier and will not let you run up a charge until and unless you affirmatively upgrade.” It’s one of the best things about their platform.
Last year Brazeal, then director of content and community at A Cloud Guru, highlighted the lack of sandbox accounts and hard billing limits on AWS.