MMS • Daniel Dominguez
Article originally posted on InfoQ.
At AWS re:Invent, Amazon Web Services announced Amazon DataZone, a new data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on-premises, and third-party sources.
With Amazon DataZone, administrators and data stewards who oversee an organization’s data assets can manage and govern access to data using fine-grained controls to ensure it is accessed with the right level of privileges and in the right context.
Amazon DataZone makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization, so they can discover, use, and collaborate on data to derive insights. All data in DataZone is governed by access and usage policies that the organization defines, and data lineage is tracked.
Data producers use the Amazon DataZone web portal to set up their own business data catalog. They define their data taxonomy, configure governance policies, and connect to AWS services such as Amazon S3 and Amazon Redshift, partner solutions, and on-premises systems. Amazon DataZone uses machine learning to collect and recommend metadata, such as origin and data type, for each dataset, and it trains on a customer's taxonomy and preferences to improve over time. According to AWS, this removes the labor-intensive tasks associated with maintaining a catalog.
Once the catalog is in place, data users can search for data assets, look up metadata for context, and request access to datasets through the Amazon DataZone web interface. When a data consumer is ready to start analysis, they create an Amazon DataZone Data Project. This shared area in the web portal lets users access multiple datasets, share access with coworkers, and collaborate on analysis.
Because Amazon DataZone integrates with AWS analytics services such as Amazon Redshift, Amazon Athena, and Amazon QuickSight, data users can work with these services as part of their data project without keeping track of separate login credentials. The project's data is also automatically accessible in these services.
According to Swami Sivasubramanian, vice president of Databases, Analytics, and Machine Learning at AWS, good governance is the foundation that makes data accessible to the entire organization, but it is difficult to strike the right balance between making data discoverable and maintaining control. He said Amazon DataZone sets data free across the organization, so every employee can help drive new insights and maximize the data's value.