AWS Glue

Introduction

AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores.

AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python or Scala code, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there’s no infrastructure to set up or manage.

AWS Glue is designed to work with semi-structured data. It introduces a component called a dynamic frame, which you can use in your ETL scripts. A dynamic frame is similar to an Apache Spark DataFrame, except that each record is self-describing, so no schema is required initially. Dynamic frames give you schema flexibility and a set of advanced transformations designed specifically for them.
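
To make "self-describing" concrete, the following is a minimal pure-Python sketch of the idea, not AWS Glue's implementation. Each record carries its own field names and value types, and a schema is computed only when it is needed. The function name `infer_schema` and the sample records are hypothetical.

```python
def infer_schema(records):
    """Collect, per field, the set of value types seen across all records."""
    schema = {}
    for record in records:
        for field, value in record.items():
            # A field may appear with different types in different records
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Hypothetical semi-structured input: "id" is an int in one record and a
# str in the other, and "tag" exists in only one record.
records = [
    {"id": 1, "price": 9.99},
    {"id": "2", "price": 5, "tag": "sale"},
]

schema = infer_schema(records)
# Fields that map to more than one type are ambiguous and need resolving
ambiguous = sorted(name for name, types in schema.items() if len(types) > 1)
```

Here `ambiguous` contains `id` and `price`, the two fields whose type differs between records. In AWS Glue, a field with more than one observed type is represented as a choice type that a later transformation can resolve.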

You can convert between dynamic frames and Spark DataFrames, so you can combine AWS Glue and Spark transformations in the same analysis.
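
The conversion described above can be sketched as follows, assuming the script runs inside an AWS Glue job where the `awsglue` package is available. The helper name `filter_with_spark`, the frame name `"filtered"`, and the filter condition are hypothetical. `toDF()` and `DynamicFrame.fromDF()` are the conversion calls in the AWS Glue PySpark library.

```python
def filter_with_spark(glue_context, dynamic_frame, condition):
    """Apply a Spark filter to a dynamic frame and return a dynamic frame."""
    # Imported lazily because awsglue exists only in the AWS Glue runtime
    from awsglue.dynamicframe import DynamicFrame

    spark_df = dynamic_frame.toDF()        # dynamic frame -> Spark DataFrame
    filtered = spark_df.filter(condition)  # any Spark transformation fits here
    # Spark DataFrame -> dynamic frame; "filtered" is a hypothetical frame name
    return DynamicFrame.fromDF(filtered, glue_context, "filtered")
```

In a job script, a call might look like `filter_with_spark(glueContext, orders, "amount > 100")`, where `glueContext` and `orders` are assumed to be an existing GlueContext and dynamic frame.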

Setup

To set up the OpsRamp AWS integration and discover the AWS service, go to AWS Integration Discovery Profile and select GLUE.
OpsRamp then discovers AWS Glue databases, tables, crawlers, jobs, DevEndpoints, and MLTransforms.

Metrics

| OpsRamp Metric | Metric Display Name | Unit | Aggregation Type | Description |
| --- | --- | --- | --- | --- |
| aws_glue_glue_jvm_heap_usage | glue jvm heap usage | None | Average | Fraction of memory used by the JVM heap (scale: 0-1) for the driver, the executor identified by executorId, or ALL executors. |
| aws_glue_glue_jvm_heap_used | glue jvm heap used | None | Average | Number of memory bytes used by the JVM heap for the driver, the executor identified by executorId, or ALL executors. |
| aws_glue_glue_s3_filesystem_read_bytes | glue s3 file system read bytes | Count | Average | Number of bytes read from Amazon S3 by the driver, an executor identified by executorId, or ALL executors since the previous report (aggregated by the AWS Glue Metrics Dashboard as the number of bytes read during the previous minute). |
| aws_glue_glue_s3_filesystem_write_bytes | glue s3 filesystem write bytes | Count | Average | Number of bytes written to Amazon S3 by the driver, an executor identified by executorId, or ALL executors since the previous report (aggregated by the AWS Glue Metrics Dashboard as the number of bytes written during the previous minute). |
| aws_glue_glue_system_cpuSystemLoad | glue system cpu System Load | None | Average | Fraction of CPU system load used (scale: 0-1) by the driver, an executor identified by executorId, or ALL executors. |
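
To cross-check a value shown in OpsRamp against the source, the underlying metric can be queried from Amazon CloudWatch. The sketch below only builds the request parameters for the CloudWatch `get_metric_statistics` call. It assumes that the OpsRamp metric `aws_glue_glue_jvm_heap_usage` corresponds to the CloudWatch metric `glue.driver.jvm.heap.usage` in the `Glue` namespace. The job name `nightly-etl` and the helper name `glue_metric_request` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def glue_metric_request(job_name, metric_name, metric_type="gauge",
                        minutes=60, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "Glue",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "JobName", "Value": job_name},
            {"Name": "JobRunId", "Value": "ALL"},
            # "gauge" for heap and CPU metrics, "count" for the S3 byte counters
            {"Name": "Type", "Value": metric_type},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,               # one-minute data points
        "Statistics": ["Average"],  # matches the aggregation type in the table
    }


# Hypothetical job name; the metric name mapping is an assumption
params = glue_metric_request("nightly-etl", "glue.driver.jvm.heap.usage")
```

With boto3 installed and AWS credentials configured, the parameters could be passed as `boto3.client("cloudwatch").get_metric_statistics(**params)`.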

Event support

CloudTrail event support

  • Supported
  • Configurable in the OpsRamp AWS Integration Discovery Profile.

CloudWatch alarm support

  • Supported
  • Configurable in the OpsRamp AWS Integration Discovery Profile.

External reference