AWS Announces Amazon OpenSearch Ingestion for Streamlined Data Ingestion

MMS Founder
MMS Steef-Jan Wiggers

Article originally posted on InfoQ.

AWS recently announced Amazon OpenSearch Ingestion, a capability of Amazon OpenSearch Service that provides a serverless, auto-scaled, managed data collector that receives, transforms, and delivers data to Amazon OpenSearch Service domains or Amazon OpenSearch Serverless collections.

Muthu Pitchaimani, a Search Specialist with Amazon OpenSearch Service, explains in an AWS Big Data blog post:

OpenSearch Ingestion is powered by Data Prepper, an open-source, streaming ETL (extract, transform, and load) solution that’s part of the OpenSearch project. When you use OpenSearch Ingestion, you don’t need to maintain self-managed data pipelines to ingest logs, traces, metrics, and other data with OpenSearch Service. Amazon OpenSearch Ingestion responds to changing volumes of data, automatically scaling your ingest pipeline.

Source: https://aws.amazon.com/opensearch-service/features/ingestion/

The company claims that OpenSearch Ingestion can replace self-managed Logstash or other streaming data pipelines. It removes the work of running a multi-node ingestion cluster: choosing suitable instance types, applying security patches, and adding or removing nodes as data volumes fluctuate.

Besides scaling automatically and removing the need to manage ingestion infrastructure, the service is billed by usage: users pay for the compute capacity, measured in OpenSearch Compute Units (OCUs), that their pipelines consume. The service also supports encryption of data at rest and in transit. Finally, Amazon OpenSearch Ingestion integrates with other AWS services, such as Amazon S3 and AWS Lambda.

Developers can create a pipeline in the AWS Management Console and define its source, processors, and destination domain or collection. Alternatively, they can start from a blueprint for one of the most common ingestion use cases.
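To make the source, processor, and sink structure concrete, here is a minimal sketch of a log pipeline in the same Data Prepper YAML format as the trace sample later in this article. It assumes clients POST JSON events whose log field holds an Apache common-log line. The sub-pipeline name, path, index name, domain endpoint, and role ARN are hypothetical placeholders, and this is not an AWS-provided blueprint.

```yaml
version: "2"

# Hypothetical log pipeline: every name and endpoint below is a placeholder
log-pipeline:
  source:
    # Accepts JSON events sent over HTTP on this path
    http:
      path: "/${pipelineName}/logs"
  processor:
    # Parses the "log" field as an Apache common-log line into structured fields
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]
  sink:
    # Writes the parsed events to an index on an OpenSearch Service domain
    - opensearch:
        hosts: [ "https://search-mydomain-1a2a3a4a5a6a7a8a9a0a9a8a7a.us-east-1.es.amazonaws.com" ]
        aws:
          sts_role_arn: "arn:aws:iam::123456789012:role/Example-Role"
          region: "us-east-1"
        index: "apache-logs"
```

Unlike the trace example, this sink sets an explicit index name rather than an index_type, because these events are not trace analytics data.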

Source: https://aws.amazon.com/blogs/big-data/top-strategies-for-high-volume-tracing-with-amazon-opensearch-ingestion/

When creating the pipeline, a developer can set its minimum and maximum capacity in Ingestion OpenSearch Compute Units (Ingestion OCUs) and edit the hosts, aws.sts_role_arn, and region fields of the OpenSearch Service sink. A sample trace pipeline can look like this:

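```yaml
version: "2"

# Receives trace data and fans it out to two sub-pipelines
entry-pipeline:
  source:
    otel_trace_source:
      # Path on which the pipeline accepts OpenTelemetry trace data
      path: "/${pipelineName}/v1/traces"
  processor:
    # Routes spans with the same trace ID to the same node,
    # which stateful service-map processing relies on
    - trace_peer_forwarder:
  sink:
    # Every span is sent to both sub-pipelines
    - pipeline:
        name: "span-pipeline"
    - pipeline:
        name: "service-map-pipeline"

# Indexes individual spans for trace analytics
span-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    # Fills in trace-group fields on each span record
    - otel_trace_raw:
  sink:
    - opensearch:
        # Replace hosts, sts_role_arn, and region with your own values
        hosts: [ "https://search-mydomain-1a2a3a4a5a6a7a8a9a0a9a8a7a.us-east-1.es.amazonaws.com" ]
        aws:
          sts_role_arn: "arn:aws:iam::123456789012:role/Example-Role"
          region: "us-east-1"
        index_type: "trace-analytics-raw"

# Builds service-to-service relationships for the service map
service-map-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    - service_map_stateful:
  sink:
    - opensearch:
        # Replace hosts, sts_role_arn, and region with your own values
        hosts: [ "https://search-mydomain-1a2a3a4a5a6a7a8a9a0a9a8a7a.us-east-1.es.amazonaws.com" ]
        aws:
          sts_role_arn: "arn:aws:iam::123456789012:role/Example-Role"
          region: "us-east-1"
        index_type: "trace-analytics-service-map"
```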

Several competing services offer functionality similar to Amazon OpenSearch Ingestion, such as Google Cloud Dataflow, Microsoft Azure Stream Analytics, Confluent, and Elastic Cloud.

Amazon OpenSearch Ingestion is currently available in several AWS Regions, and pricing details are listed in the Ingestion section of the Amazon OpenSearch Service pricing page. Developers can find more guidance in the getting-started tutorials.
