MMS • Steef-Jan Wiggers
Article originally posted on InfoQ.
AWS recently announced that Amazon Elastic MapReduce (EMR) Serverless is generally available (GA). The offering is a serverless deployment option for customers to run big data analytics applications using open-source frameworks like Apache Spark and Hive without configuring, managing, and scaling clusters or servers.
At re:Invent last year, the company introduced three new serverless options for its data analytics services: Amazon MSK Serverless, which has been generally available since last month; Amazon Redshift Serverless, which is still in public preview; and Amazon EMR Serverless, which is now GA.
Amazon EMR offers several deployment options to fit varied needs: EMR clusters on Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Kubernetes Service (Amazon EKS) clusters, AWS Outposts, or EMR Serverless. For instance, EMR on Amazon EC2 suits customers who need maximum control and flexibility over how their applications run, while EMR on Amazon EKS suits customers who want to standardize on EKS to manage clusters across applications or to use different versions of an open-source framework on the same cluster.
To start an EMR Serverless job, customers select the open-source framework they want to use and then trigger their application through the APIs, the CLI, the AWS Management Console, or Amazon EMR Studio.
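As a rough illustration of the API route, the sketch below builds the arguments for a Spark job run and hands them to the `start_job_run` call of the boto3 `emr-serverless` client. The application ID, role ARN, and S3 paths are hypothetical placeholders, not values from the announcement, and the actual submission requires boto3, AWS credentials, and an existing EMR Serverless application.

```python
from typing import Any, Dict, List, Optional


def build_spark_job_request(
    application_id: str,
    execution_role_arn: str,
    entry_point: str,
    entry_point_args: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build the keyword arguments for an EMR Serverless StartJobRun call."""
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                "entryPointArguments": list(entry_point_args or []),
            }
        },
    }


def submit_spark_job(request: Dict[str, Any], region: str = "us-east-1") -> str:
    """Submit the job run and return its ID.

    boto3 is imported lazily so the request builder above can be used
    (and inspected) without AWS dependencies or credentials.
    """
    import boto3

    client = boto3.client("emr-serverless", region_name=region)
    response = client.start_job_run(**request)
    return response["jobRunId"]


# All identifiers below are hypothetical placeholders.
request = build_spark_job_request(
    application_id="00example0application",
    execution_role_arn="arn:aws:iam::123456789012:role/ExampleEmrServerlessRole",
    entry_point="s3://example-bucket/scripts/wordcount.py",
    entry_point_args=["s3://example-bucket/output/"],
)

# Uncomment to submit against a real application:
# job_run_id = submit_spark_job(request)
```

Keeping the request construction separate from the network call makes the payload easy to review before anything is submitted. A Hive job would use a different `jobDriver` entry in place of `sparkSubmit`.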
With Amazon EMR, the company claims that customers can run analytics workloads at any scale, with automatic scaling that resizes resources in seconds to meet changing data volumes and processing requirements. Because capacity follows demand, customers can run their analytics workloads more cost-effectively.
Channy Yun, a principal developer advocate for AWS, wrote in an AWS News blog post on the GA release of Amazon EMR Serverless:
"During the preview, we heard from customers that EMR Serverless is cost-effective because they do not incur cost from having to overprovision resources to deal with demand spikes."
In addition, Marius Karma, a technology enthusiast, tweeted:
"With Amazon EMR Serverless, now GA, you can run & scale Apache Spark & Hive without managing clusters or servers. That’s simplicity at scale."
Yet a respondent on a Reddit thread was less enthusiastic:
"Unless this is substantially improved from the beta offering, this thing was nowhere NEAR ready for prime time: No SPOT pricing, no bootstrap scripts, and no VPC access. Not even going to give it a glance before those three are done."
Amazon EMR Serverless is currently available in the Northern Virginia, Oregon, Ireland, and Tokyo AWS regions. Customers pay for the vCPU, memory, and storage resources their applications consume. Pricing details are available on the pricing page.
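To make the consumption-based model concrete, here is a minimal sketch of how a job's cost could be estimated from the three billed dimensions. The rates are illustrative placeholders chosen for easy arithmetic, not actual AWS prices; real rates vary by region and are listed on the pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-unit rates in USD (NOT actual AWS prices)."""
    per_vcpu_hour: float
    per_memory_gb_hour: float
    per_storage_gb_hour: float


def estimate_job_cost(
    vcpu_hours: float,
    memory_gb_hours: float,
    storage_gb_hours: float,
    rates: Rates,
) -> float:
    """Sum the cost of each consumed resource dimension."""
    return (
        vcpu_hours * rates.per_vcpu_hour
        + memory_gb_hours * rates.per_memory_gb_hour
        + storage_gb_hours * rates.per_storage_gb_hour
    )


# Placeholder rates for illustration only.
example_rates = Rates(
    per_vcpu_hour=0.05,
    per_memory_gb_hour=0.005,
    per_storage_gb_hour=0.0001,
)

# Example: a job whose workers total 4 vCPUs and 16 GB of memory for 2 hours,
# with 20 GB of storage over the same period.
cost = estimate_job_cost(
    vcpu_hours=4 * 2,
    memory_gb_hours=16 * 2,
    storage_gb_hours=20 * 2,
    rates=example_rates,
)
print(f"Estimated cost: ${cost:.3f}")
```

Because billing follows resources actually consumed, a job that scales down during quiet periods accumulates fewer vCPU-hours and GB-hours than a cluster provisioned for peak demand.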