MMS • Steef-Jan Wiggers
Article originally posted on InfoQ.
AWS recently announced that Amazon RDS for MySQL zero-ETL integration with Amazon Redshift is generally available. This feature enables near real-time analytics and machine learning on transactional data. It allows multiple integrations from a single RDS database and provides data filtering for customized replication.
The GA release of Amazon RDS for MySQL zero-ETL integration with Amazon Redshift follows the earlier GA release of zero-ETL integration with Amazon Redshift for Amazon Aurora MySQL-Compatible Edition and the preview releases for Aurora PostgreSQL-Compatible Edition, Amazon DynamoDB, and RDS for MySQL. With the GA release, users can configure zero-ETL integrations with AWS CloudFormation, set up multiple integrations from one source database to up to five Amazon Redshift data warehouses, and apply data filtering.
Matheus Guimaraes, a senior developer advocate at AWS, writes regarding the data filtering:
Most companies, no matter the size, can benefit from adding filtering to their ETL jobs. A typical use case is to reduce data processing and storage costs by selecting only the subset of data needed to replicate from their production databases. Another is to exclude personally identifiable information (PII) from a report’s dataset.
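As an illustration of the filtering use case described above, the following minimal Python sketch assembles a data filter expression that includes one database and excludes a table holding PII. The `include: database.table` / `exclude: database.table` syntax reflects the format described in the AWS documentation as best understood here and should be verified against the current docs. The helper `build_data_filter` and the database and table names are hypothetical.

```python
from typing import Iterable


def build_data_filter(include: Iterable[str] = (), exclude: Iterable[str] = ()) -> str:
    """Join include/exclude patterns into one comma-separated filter expression.

    Assumption: each pattern has the form "database.table", where either part
    may be a wildcard such as "*". Check the AWS documentation for the exact
    grammar before relying on this.
    """

    def _check(pattern: str) -> str:
        # Require exactly one dot with a non-empty name on each side.
        parts = pattern.split(".")
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"expected 'database.table', got {pattern!r}")
        return pattern

    clauses = [f"include: {_check(p)}" for p in include]
    clauses += [f"exclude: {_check(p)}" for p in exclude]
    if not clauses:
        raise ValueError("at least one include or exclude pattern is required")
    return ", ".join(clauses)


if __name__ == "__main__":
    # Hypothetical names: replicate the whole "sales" database except a PII table.
    print(build_data_filter(include=["sales.*"], exclude=["sales.customers_pii"]))
```

Keeping the filter construction in one small function makes it easier to review which tables leave the production database before the integration is created.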
Users can create a zero-ETL integration to replicate data from an RDS database into Amazon Redshift, enabling near real-time analytics, ML, and AI workloads. These workloads can use Amazon Redshift's built-in capabilities, such as machine learning, materialized views, data sharing, and federated access to multiple data stores and data lakes, as well as integrations with Amazon SageMaker, Amazon QuickSight, and other AWS services.
To create a zero-ETL integration by using the AWS Management Console, AWS Command Line Interface (AWS CLI), or an AWS SDK, users specify an RDS database as the source and an Amazon Redshift data warehouse as the target. The integration replicates data from the source database into the target data warehouse.
(Source: AWS Documentation)
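For the SDK route, a rough sketch with the AWS SDK for Python (boto3) might look like the following. It assumes the RDS client's `create_integration` operation with the `IntegrationName`, `SourceArn`, `TargetArn`, and optional `DataFilter` parameters. The ARNs, integration name, and Region are placeholders, and the helper names are hypothetical. The sketch also assumes the source database and the target data warehouse already meet the prerequisites in the AWS documentation.

```python
from typing import Optional


def build_integration_request(
    name: str,
    source_arn: str,
    target_arn: str,
    data_filter: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for the RDS CreateIntegration call."""
    params = {
        "IntegrationName": name,
        "SourceArn": source_arn,   # ARN of the RDS for MySQL source database
        "TargetArn": target_arn,   # ARN of the Redshift namespace (target)
    }
    if data_filter:
        params["DataFilter"] = data_filter
    return params


def create_integration(params: dict, region: str) -> dict:
    """Send the request. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    rds = boto3.client("rds", region_name=region)
    return rds.create_integration(**params)


if __name__ == "__main__":
    # Placeholder values only; replace with real ARNs before calling AWS.
    request = build_integration_request(
        name="orders-to-redshift",
        source_arn="arn:aws:rds:us-east-1:123456789012:db:example-mysql",
        target_arn="arn:aws:redshift-serverless:us-east-1:123456789012:namespace/example",
        data_filter="include: sales.*",
    )
    print(request)
    # create_integration(request, region="us-east-1")  # uncomment to call AWS
```

Separating request construction from the network call keeps the parameters easy to inspect and test without touching an AWS account.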
In a Medium blog post on zero-ETL, Rajas Walavalkar, a technical architect at Quantiphi Analytics, explains why zero-ETL data pipelines can be beneficial to organizations:
- Real-Time Analytics: Businesses rely on real-time insights for timely decisions. Zero ETL enables near real-time analytics by transferring data directly from Aurora MySQL to Redshift, giving organizations a competitive edge.
- Data Freshness: Zero ETL keeps data fresh, which is crucial for accurate insights, by ingesting data into Redshift without delay.
- Capturing Data History: Analyzing trends requires maintaining data history for the constant CRUD operations in operational databases.
- Scalability and Flexibility: Zero ETL architectures facilitate seamless scalability, allowing organizations to adapt to changing business needs without traditional ETL constraints.
Lastly, the zero-ETL integration is available for RDS for MySQL versions 8.0.32 and later as the source, with Amazon Redshift Serverless and Amazon Redshift RA3 instance types as targets, in supported AWS Regions.
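The version requirement can be checked before attempting to create an integration. The short Python sketch below encodes only the "8.0.32 and later" statement above. It assumes engine versions are plain dotted integers such as `8.0.35`, and the function name is hypothetical. Whether a specific newer release is supported should be confirmed in the AWS documentation.

```python
MIN_MYSQL_VERSION = (8, 0, 32)  # minimum RDS for MySQL version named in the article


def meets_minimum_version(engine_version: str) -> bool:
    """Return True if a dotted version string is at or above 8.0.32."""
    # Compare the first three numeric components as a tuple.
    parts = tuple(int(p) for p in engine_version.split(".")[:3])
    return parts >= MIN_MYSQL_VERSION


if __name__ == "__main__":
    for version in ("5.7.44", "8.0.28", "8.0.32", "8.0.35"):
        print(version, meets_minimum_version(version))
```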