Amazon EMR is a managed cluster platform that simplifies running big data frameworks (such as Apache Hadoop and Apache Spark) on AWS to process and analyze vast amounts of data.

By using these frameworks and related open-source projects (such as Apache Hive and Apache Pig), you can:

  • Process data for analytics purposes and business intelligence workloads.
  • Transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

Use the OpsRamp AWS public cloud integration to discover Amazon EMR resources and collect metrics for them.

Setup

To set up the OpsRamp AWS integration and discover the AWS service, go to AWS Integration Discovery Profile and select EMR.

Metrics

| OpsRamp Metric | Description | Metric Display Name | Unit | Aggregation Type |
|---|---|---|---|---|
| aws_elasticmapreduce_IsIdle | Indicates that a cluster is no longer performing work, but is still alive and accruing charges. Set to 1 if no tasks and jobs are running; set to 0 otherwise. | IsIdle | Count | Average |
| aws_elasticmapreduce_ContainerAllocated | Number of resource containers allocated by the ResourceManager. | ContainerAllocated | Count | Average |
| aws_elasticmapreduce_ContainerReserved | Number of containers reserved. | ContainerReserved | Count | Average |
| aws_elasticmapreduce_ContainerPending | Number of containers in the queue that have not yet been allocated. | ContainerPending | Count | Average |
| aws_elasticmapreduce_AppsCompleted | Number of applications submitted to YARN (Hadoop generation) that have completed. | AppsCompleted | Count | Average |
| aws_elasticmapreduce_AppsKilled | Number of applications submitted to YARN (Hadoop generation) that have been killed. | AppsKilled | Count | Average |
| aws_elasticmapreduce_AppsPending | Number of applications submitted to YARN (Hadoop generation) that are in a pending state. | AppsPending | Count | Average |
| aws_elasticmapreduce_AppsRunning | Number of applications submitted to YARN (Hadoop generation) that are running. | AppsRunning | Count | Average |
| aws_elasticmapreduce_AppsSubmitted | Number of applications submitted to YARN (Hadoop generation). | AppsSubmitted | Count | Average |
| aws_elasticmapreduce_CapacityRemainingGB | Amount of remaining HDFS disk capacity. | CapacityRemainingGB | Bytes | Average |
| aws_elasticmapreduce_CoreNodesRunning | Number of core nodes working. Data points for this metric are reported only when a corresponding instance group exists. | CoreNodesRunning | Count | Average |
| aws_elasticmapreduce_CoreNodesPending | Number of core nodes waiting to be assigned. All of the core nodes requested may not be immediately available; this metric reports the pending requests. | CoreNodesPending | Count | Average |
| aws_elasticmapreduce_CorruptBlocks | Number of blocks that HDFS reports as corrupted. | CorruptBlocks | Count | Average |
| aws_elasticmapreduce_HDFSUtilization | Percentage of HDFS storage currently used. | HDFSUtilization | Percent | Average |
| aws_elasticmapreduce_HDFSBytesRead | Number of bytes read from HDFS. | HDFSBytesRead | Bytes Read | Average |
| aws_elasticmapreduce_HDFSBytesWritten | Number of bytes written to HDFS. | HDFSBytesWritten | Bytes Written | Average |
| aws_elasticmapreduce_LiveDataNodes | Percentage of data nodes that are receiving work from Hadoop. | LiveDataNodes | Percent | Average |
| aws_elasticmapreduce_MRTotalNodes | Number of nodes presently available to MapReduce jobs. | MRTotalNodes | Count | Average |
| aws_elasticmapreduce_MRActiveNodes | Number of nodes presently running MapReduce tasks or jobs. | MRActiveNodes | Count | Average |
| aws_elasticmapreduce_MRLostNodes | Number of nodes allocated to MapReduce that have been marked in a LOST state. | MRLostNodes | Count | Average |
| aws_elasticmapreduce_MRUnhealthyNodes | Number of nodes available to MapReduce jobs marked in an UNHEALTHY state. | MRUnhealthyNodes | Count | Average |
| aws_elasticmapreduce_MRDecommissionedNodes | Number of nodes allocated to MapReduce applications that have been marked in a DECOMMISSIONED state. | MRDecommissionedNodes | Count | Average |
| aws_elasticmapreduce_MRRebootedNodes | Number of nodes available to MapReduce that have been rebooted and marked in a REBOOTED state. | MRRebootedNodes | Count | Average |
| aws_elasticmapreduce_S3BytesWritten | Number of bytes written to Amazon S3. | S3BytesWritten | Bytes Written | Average |
| aws_elasticmapreduce_S3BytesRead | Number of bytes read from Amazon S3. | S3BytesRead | Bytes Read | Average |
| aws_elasticmapreduce_MissingBlocks | Number of blocks in which HDFS has no replicas. These might be corrupt blocks. | MissingBlocks | Count | Average |
| aws_elasticmapreduce_TotalLoad | Total number of concurrent data transfers. | TotalLoad | Count | Average |
| aws_elasticmapreduce_MemoryTotalMB | Total amount of memory in the cluster. | MemoryTotalMB | Bytes | Average |
| aws_elasticmapreduce_MemoryReservedMB | Amount of memory reserved. | MemoryReservedMB | Bytes | Average |
| aws_elasticmapreduce_MemoryAvailableMB | Amount of memory available to be allocated. | MemoryAvailableMB | Bytes | Average |
| aws_elasticmapreduce_MemoryAllocatedMB | Amount of memory allocated to the cluster. | MemoryAllocatedMB | Bytes | Average |
| aws_elasticmapreduce_PendingDeletionBlocks | Number of blocks marked for deletion. | PendingDeletionBlocks | Count | Average |
| aws_elasticmapreduce_UnderReplicatedBlocks | Number of blocks that need to be replicated one or more times. | UnderReplicatedBlocks | Count | Average |
| aws_elasticmapreduce_dfs_FSNamesystem_PendingReplicationBlocks | Status of block replication: blocks being replicated, age of replication requests, and unsuccessful replication requests. | dfs.FSNamesystem.PendingReplicationBlocks | Count | Average |
| aws_elasticmapreduce_ContainerPendingRatio | Ratio of pending containers to containers allocated (ContainerPendingRatio = ContainerPending / ContainerAllocated). If ContainerAllocated = 0, then ContainerPendingRatio = ContainerPending. The value of ContainerPendingRatio represents a number, not a percentage. This value is useful for scaling cluster resources based on container allocation behavior. | Container Pending Ratio | Count | Average |
| aws_elasticmapreduce_AppsFailed | Number of applications submitted to YARN that have failed to complete. | Apps Failed | Count | Average |
| aws_elasticmapreduce_YARNMemoryAvailablePercentage | Percentage of remaining memory available to YARN (YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB). This value is useful for scaling cluster resources based on YARN memory usage. | YARN Memory Available Percentage | Percent | Average |
| cloud.instance.state | n/a | Status/State | n/a | n/a |
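
The two derived metrics described above, ContainerPendingRatio and YARNMemoryAvailablePercentage, can be recomputed from their inputs when you want to sanity-check a scaling threshold. The following is a minimal Python sketch of that arithmetic, not OpsRamp or AWS code. The function names are hypothetical, and two behaviors are assumptions rather than documented facts: the memory ratio is multiplied by 100 to express it as a percentage, and a cluster reporting zero total memory yields 0.0.

```python
def container_pending_ratio(container_pending: float, container_allocated: float) -> float:
    """Return ContainerPending / ContainerAllocated.

    When ContainerAllocated is 0, the ratio is defined as ContainerPending itself,
    as stated in the metric description.
    """
    if container_allocated == 0:
        return float(container_pending)
    return container_pending / container_allocated


def yarn_memory_available_percentage(memory_available_mb: float, memory_total_mb: float) -> float:
    """Return MemoryAvailableMB / MemoryTotalMB expressed as a percentage.

    Assumption: the ratio is scaled by 100, and a zero MemoryTotalMB yields 0.0
    instead of raising a division error.
    """
    if memory_total_mb == 0:
        return 0.0  # assumption: no memory reported means nothing is available
    return 100.0 * memory_available_mb / memory_total_mb
```

For example, a cluster with 6 pending and 3 allocated containers has a ratio of 2.0, and one with 2048 MB available out of 8192 MB has 25.0 percent of YARN memory available.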

Event support

CloudTrail event support

  • Supported
  • Configurable in OpsRamp AWS Integration Discovery Profile.

CloudWatch alarm support

  • Supported
  • Configurable in OpsRamp AWS Integration Discovery Profile.

External reference