Cloud Dataproc is a managed Apache Spark and Apache Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning.

Cloud Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you do not need them. With less time and money spent on administration, you can focus on your jobs and your data.

Setup

To set up the OpsRamp Google integration and discover the Google service, go to Google Integration Discovery Profile and select GOOGLE/Dataproc Cluster.

Metrics

| OpsRamp Metric | Description | Metric Display Name | Unit | Aggregation Type |
| --- | --- | --- | --- | --- |
| google_dataproc_cluster_hdfs_datanodes | Number of HDFS DataNodes that are running inside a cluster. | Cluster Hdfs Datanodes | Count | Average |
| google_dataproc_cluster_hdfs_storage_capacity | Capacity, in GB, of the HDFS system running on the cluster. | Cluster Hdfs Storage Capacity | Count | Average |
| google_dataproc_cluster_hdfs_storage_utilization | Percentage of HDFS storage currently used. | Cluster Hdfs Storage Utilization | Count | Average |
| google_dataproc_cluster_hdfs_unhealthy_blocks | Number of unhealthy blocks inside the cluster. | Cluster Hdfs Unhealthy Blocks | Count | Average |
| google_dataproc_cluster_job_completion_time | Amount of time that jobs took to complete, from the time the user submits a job to the time Dataproc reports it is completed. | Cluster Job Completion Time | Count | Average |
| google_dataproc_cluster_job_duration | Amount of time that jobs have spent in a given state. | Cluster Job Duration | Count | Average |
| google_dataproc_cluster_job_failed_count | Number of jobs that have failed on a cluster. | Cluster Job Failed Count | Count | Average |
| google_dataproc_cluster_job_running_count | Number of jobs that are running on a cluster. | Cluster Job Running Count | Count | Average |
| google_dataproc_cluster_job_submitted_count | Number of jobs that have been submitted to a cluster. | Cluster Job Submitted Count | Count | Average |
| google_dataproc_cluster_operation_completion_time | Amount of time that operations took to complete, from the time the user submits an operation to the time Dataproc reports it is completed. | Cluster Operation Completion Time | Count | Average |
| google_dataproc_cluster_operation_duration | Amount of time that operations have spent in a given state. | Cluster Operation Duration | Count | Average |
| google_dataproc_cluster_operation_failed_count | Number of operations that have failed on a cluster. | Cluster Operation Failed Count | Count | Average |
| google_dataproc_cluster_operation_running_count | Number of operations that are running on a cluster. | Cluster Operation Running Count | Count | Average |
| google_dataproc_cluster_operation_submitted_count | Number of operations that have been submitted to a cluster. | Cluster Operation Submitted Count | Count | Average |
| google_dataproc_cluster_yarn_allocated_memory_percentage | Percentage of YARN memory that is allocated. | Cluster Yarn Allocated Memory Percentage | Count | Average |
| google_dataproc_cluster_yarn_apps | Number of active YARN applications. | Cluster Yarn Apps | Count | Average |
| google_dataproc_cluster_yarn_containers | Number of YARN containers. | Cluster Yarn Containers | Count | Average |
| google_dataproc_cluster_yarn_memory_size | YARN memory size in GB. | Cluster Yarn Memory Size | Count | Average |
| google_dataproc_cluster_yarn_nodemanagers | Number of YARN NodeManagers running inside the cluster. | Cluster Yarn Nodemanagers | Count | Average |
| google_dataproc_cluster_yarn_pending_memory_size | Current memory request, in GB, that is pending to be fulfilled by the scheduler. | Cluster Yarn Pending Memory Size | Count | Average |
| google_dataproc_cluster_yarn_virtual_cores | Number of virtual cores in YARN. | Cluster Yarn Virtual Cores | Count | Average |
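
Several of the metrics above are easier to act on when combined. The following is a minimal sketch, not part of the OpsRamp integration, of how collected values might be turned into derived figures: used HDFS storage in GB (capacity multiplied by utilization) and a rough job failure ratio (failed jobs divided by submitted jobs). The helper names `hdfs_used_gb` and `job_failure_ratio` and all sample values are hypothetical, and the sketch assumes that utilization is reported on a 0 to 100 scale and that both job counts cover the same collection window.

```python
"""Sketch: derive secondary figures from Dataproc cluster metric samples.

Assumptions (not taken from the OpsRamp documentation):
- the sample values below are invented for illustration;
- utilization is reported as a percentage on a 0-100 scale;
- both job counts cover the same collection window.
"""


def hdfs_used_gb(capacity_gb: float, utilization_pct: float) -> float:
    """Return used HDFS storage in GB from capacity (GB) and utilization (%)."""
    if capacity_gb < 0:
        raise ValueError("capacity_gb must be non-negative")
    if not 0 <= utilization_pct <= 100:
        raise ValueError("utilization_pct must be between 0 and 100")
    return capacity_gb * utilization_pct / 100.0


def job_failure_ratio(failed: float, submitted: float) -> float:
    """Return failed jobs as a fraction of submitted jobs (0.0 if none submitted)."""
    if failed < 0 or submitted < 0:
        raise ValueError("counts must be non-negative")
    if submitted == 0:
        return 0.0
    return failed / submitted


# Hypothetical sample, keyed by the OpsRamp metric names from the table above.
sample = {
    "google_dataproc_cluster_hdfs_storage_capacity": 500.0,
    "google_dataproc_cluster_hdfs_storage_utilization": 40.0,
    "google_dataproc_cluster_job_failed_count": 3,
    "google_dataproc_cluster_job_submitted_count": 12,
}

used = hdfs_used_gb(
    sample["google_dataproc_cluster_hdfs_storage_capacity"],
    sample["google_dataproc_cluster_hdfs_storage_utilization"],
)
ratio = job_failure_ratio(
    sample["google_dataproc_cluster_job_failed_count"],
    sample["google_dataproc_cluster_job_submitted_count"],
)

print(f"HDFS used: {used:.1f} GB")
print(f"Job failure ratio: {ratio:.2%}")
```

Because the metrics are aggregated as averages, the counts may be fractional, so the helpers accept floats. The failure ratio is only a rough indicator: a job that fails in one window may have been submitted in an earlier one.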

Event support

  • Not supported

External reference