Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Prerequisites

  1. Configure the following endpoints to collect the respective metrics (see the connectivity sketch after this list):
     - stats-metrics: http://<ip_addr>:<port>/json/
     - app-url: http://<ip_addr>:<port>/app/?appId=<app-id>
     - job-metrics: http://<ip_addr>:<port>/api/v1/applications/<app-id>/jobs
     - stage-metrics: http://<ip_addr>:<port>/api/v1/applications/<app-id>/stages
     - storage-metrics: http://<ip_addr>:<port>/api/v1/applications/<app-id>/storage/rdd
     - executor-metrics: http://<ip_addr>:<port>/api/v1/applications/<app-id>/executors
     - streaming-metrics: http://<ip_addr>:<port>/api/v1/applications/<app-id>/streaming/statistics
  2. For virtual machines, install the Linux Agent.
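
As a quick pre-check that the stats-metrics endpoint is reachable, the following minimal sketch queries the master JSON endpoint. It is only an illustration: SPARK_HOST and SPARK_PORT are placeholders (not values from this integration), and the field names in the response can vary with the Spark version and deployment mode.

# Sketch: verify that http://<ip_addr>:<port>/json/ responds before onboarding.
# SPARK_HOST and SPARK_PORT are placeholders; replace them with your master address.
import json
import urllib.request

SPARK_HOST = "127.0.0.1"   # assumption: your Spark master IP
SPARK_PORT = 8080          # assumption: default standalone master web UI port

url = f"http://{SPARK_HOST}:{SPARK_PORT}/json/"
with urllib.request.urlopen(url, timeout=5) as response:
    stats = json.load(response)

# Typical standalone-master fields; names may differ across Spark versions.
print("status:", stats.get("status"))
print("workers:", len(stats.get("workers", [])))
print("cores:", stats.get("cores"), "cores used:", stats.get("coresused"))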

Configuring the credentials

Configure the credentials in the file /opt/opsramp/agent/conf/app.d/creds.yaml:

spark:
- name: spark
  user: <username>
  pwd: <password>
  encoding-type: plain
  labels:
    key1: val1
    key2: val2

Configuring the application

Virtual machine

Configure the application in the file /opt/opsramp/agent/conf/app/discovery/auto-detection.yaml:

- name: spark
  instance-checks:
    process-check:
      - spark
    port-check:
      - 8080

Docker environment

Configure the application in the file /opt/opsramp/agent/conf/app/discovery/auto-container-detection.yaml:

- name: spark
  container-checks:
    image-check:
      - spark
    port-check:
      - 8080

Kubernetes environment

Configure the application in config.yaml:

- name: spark
  container-checks:
    image-check:
      - spark
    port-check:
      - 8080

The port specified in the port-check is used to build the metric collection URLs for each application, as in the sketch below.
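
Each per-application endpoint listed in the Prerequisites requires an <app-id>. The sketch below shows one way those URLs can be assembled from the applications list; BASE_URL is a placeholder, and depending on the deployment the REST API may be served by the application UI or the history server rather than the master UI port.

# Sketch: list application IDs from the Spark REST API and fetch job metrics per app.
# BASE_URL is a placeholder; adjust the host and port for your deployment.
import json
import urllib.request

BASE_URL = "http://127.0.0.1:4040"

def get_json(path):
    with urllib.request.urlopen(BASE_URL + path, timeout=5) as response:
        return json.load(response)

for app in get_json("/api/v1/applications"):
    app_id = app["id"]
    for job in get_json(f"/api/v1/applications/{app_id}/jobs"):
        # Fields such as numTasks and numActiveTasks back metrics like
        # spark_job_num_tasks and spark_job_num_active_tasks.
        print(app_id, job.get("jobId"), job.get("status"), job.get("numTasks"))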

Validate

Go to Resources under the Infrastructure tab to verify that your resources are onboarded and metrics are being collected.

Metrics

| OpsRamp Metric | Description | Metric Display Name | Unit |
| --- | --- | --- | --- |
| spark_workers | Number of workers connected to the master | Workers | |
| spark_cores | Number of CPUs available for all workers | Cores | |
| spark_cores_used | Number of CPUs used for all applications | Cores Used | |
| spark_applications_active | Number of applications waiting or running | Applications Active | |
| spark_applications_completed | Number of applications completed | Applications Completed | |
| spark_drivers_active | Number of drivers available | Drivers Active | |
| spark_status | Availability status of the Spark master (for example, alive) | Status | |
| spark_memory | Calculates the total memory available on the Spark master | Memory | megabytes |
| spark_memory_used | Calculates the memory used by the applications on the Spark master | Memory Used | megabytes |
| spark_job_count | Number of jobs | Jobs | |
| spark_job_num_tasks | Number of tasks in the application (different instances are shown using AppID_jobID) | Tasks | |
| spark_job_num_active_tasks | Number of active tasks in the application (different instances are shown using AppID_jobID) | Active Tasks | |
| spark_job_num_skipped_tasks | Number of skipped tasks in the application (different instances are shown using AppID_jobID) | Skipped Tasks | |
| spark_job_num_failed_tasks | Number of failed tasks in the application (different instances are shown using AppID_jobID) | Failed Tasks | |
| spark_job_num_completed_tasks | Number of completed tasks in the application (different instances are shown using AppID_jobID) | Completed Tasks | |
| spark_job_num_active_stages | Number of active stages in the application (different instances are shown using AppID_jobID) | Active Stages | |
| spark_job_num_completed_stages | Number of completed stages in the application (different instances are shown using AppID_jobID) | Completed Stages | |
| spark_job_num_skipped_stages | Number of skipped stages in the application (different instances are shown using AppID_jobID) | Skipped Stages | |
| spark_job_num_failed_stages | Number of failed stages in the application (different instances are shown using AppID_jobID) | Failed Stages | |
| spark_stage_count | Number of stages (different instances are shown using AppID_stageID) | Stage Count | |
| spark_stage_num_active_tasks | Number of active tasks in the application's stages (different instances are shown using AppID_stageID) | Stage Num Active Tasks | |
| spark_stage_num_complete_tasks | Number of complete tasks in the application's stages (different instances are shown using AppID_stageID) | Stage Num Complete Tasks | |
| spark_stage_num_failed_tasks | Number of failed tasks in the application's stages (different instances are shown using AppID_stageID) | Stage Num Failed Tasks | |
| spark_stage_executor_run_time | Time spent by the executor in the application's stages (different instances are shown using AppID_stageID) | Stage Executor Run Time | |
| spark_stage_input_bytes | Input bytes in the application's stages (different instances are shown using AppID_stageID) | Stage Input Bytes | bytes |
| spark_stage_input_records | Input records in the application's stages (different instances are shown using AppID_stageID) | Stage Input Records | |
| spark_stage_output_bytes | Output bytes in the application's stages (different instances are shown using AppID_stageID) | Stage Output Bytes | bytes |
| spark_stage_output_records | Output records in the application's stages (different instances are shown using AppID_stageID) | Stage Output Records | |
| spark_stage_shuffle_read_bytes | Number of bytes read during a shuffle in the application's stages (different instances are shown using AppID_stageID) | Stage Shuffle Read Bytes | bytes |
| spark_stage_shuffle_read_records | Number of records read during a shuffle in the application's stages (different instances are shown using AppID_stageID) | Stage Shuffle Read Records | |
| spark_stage_shuffle_write_bytes | Number of shuffled bytes in the application's stages (different instances are shown using AppID_stageID) | Stage Shuffle Write Bytes | bytes |
| spark_stage_shuffle_write_records | Number of shuffled records in the application's stages (different instances are shown using AppID_stageID) | Stage Shuffle Write Records | |
| spark_stage_memory_bytes_spilled | Number of bytes spilled to disk in the application's stages (different instances are shown using AppID_stageID) | Stage Memory Bytes Spilled | bytes |
| spark_stage_disk_bytes_spilled | Maximum size on disk of the spilled bytes in the application's stages (different instances are shown using AppID_stageID) | Stage Disk Bytes Spilled | bytes |
| spark_driver_rdd_blocks | Number of RDD blocks in the driver | Driver Rdd Blocks | |
| spark_driver_memory_used | Amount of memory used in the driver | Driver Memory Used | |
| spark_driver_disk_used | Amount of disk used in the driver | Driver Disk Used | |
| spark_driver_active_tasks | Number of active tasks in the driver | Driver Active Tasks | |
| spark_driver_failed_tasks | Number of failed tasks in the driver | Driver Failed Tasks | |
| spark_driver_completed_tasks | Number of completed tasks in the driver | Driver Completed Tasks | |
| spark_driver_total_tasks | Number of total tasks in the driver | Driver Total Tasks | |
| spark_driver_total_duration | Time spent in the driver | Driver Total Duration | |
| spark_driver_total_input_bytes | Number of input bytes in the driver | Driver Total Input Bytes | bytes |
| spark_driver_total_shuffle_read | Number of bytes read during a shuffle in the driver | Driver Total Shuffle Read | |
| spark_driver_total_shuffle_write | Number of shuffled bytes in the driver | Driver Total Shuffle Write | |
| spark_driver_max_memory | Maximum memory used in the driver | Driver Max Memory | |
| spark_executor_count | Number of executors | Executor Count | |
| spark_executor_rdd_blocks | Number of persisted RDD blocks in the application's executors | Executor Rdd Blocks | |
| spark_executor_memory_used | Amount of memory used for cached RDDs in the application's executors | Executor Memory Used | |
| spark_executor_max_memory | Maximum memory across all executors working for a particular application | Executor Max Memory | |
| spark_executor_disk_used | Amount of disk space used by persisted RDDs in the application's executors | Executor Disk Used | |
| spark_executor_active_tasks | Number of active tasks in the application's executors | Executor Active Tasks | |
| spark_executor_failed_tasks | Number of failed tasks in the application's executors | Executor Failed Tasks | |
| spark_executor_completed_tasks | Number of completed tasks in the application's executors | Executor Completed Tasks | |
| spark_executor_total_tasks | Total number of tasks in the application's executors | Executor Total Tasks | |
| spark_executor_total_duration | Time spent by the application's executors executing tasks | Executor Total Duration | |
| spark_executor_total_input_bytes | Total number of input bytes in the application's executors | Executor Total Input Bytes | bytes |
| spark_executor_total_shuffle_read | Total number of bytes read during a shuffle in the application's executors | Executor Total Shuffle Read | |
| spark_executor_total_shuffle_write | Total number of shuffled bytes in the application's executors | Executor Total Shuffle Write | |
| spark_rdd_count | Number of RDDs | Rdd Count | |
| spark_rdd_num_partitions | Number of persisted RDD partitions in the application | Rdd Num Partitions | |
| spark_rdd_num_cached_partitions | Number of in-memory cached RDD partitions in the application | Rdd Num Cached Partitions | |
| spark_rdd_memory_used | Amount of memory used in the application's persisted RDDs | Rdd Memory Used | |
| spark_rdd_disk_used | Amount of disk space used by persisted RDDs in the application | Rdd Disk Used | |
| spark_streaming_statistics_avg_input_rate | Average streaming input data rate | Streaming Avg Input Rate | |
| spark_streaming_statistics_avg_processing_time | Average streaming batch processing time of the application | Streaming Avg Processing Time | |
| spark_streaming_statistics_avg_scheduling_delay | Average streaming batch scheduling delay of the application | Streaming Avg Scheduling Delay | |
| spark_streaming_statistics_avg_total_delay | Average streaming batch total delay of the application | Streaming Avg Total Delay | |
| spark_streaming_statistics_batch_duration | Streaming batch duration of the application | Streaming Batch Duration | |
| spark_streaming_statistics_num_active_batches | Number of active streaming batches | Streaming Num Active Batches | |
| spark_streaming_statistics_num_active_receivers | Number of active streaming receivers | Streaming Num Active Receivers | |
| spark_streaming_statistics_num_inactive_receivers | Number of inactive streaming receivers | Streaming Num Inactive Receivers | |
| spark_streaming_statistics_num_processed_records | Number of processed streaming records | Streaming Num Processed Records | |
| spark_streaming_statistics_num_received_records | Number of received streaming records | Streaming Num Received Records | |
| spark_streaming_statistics_num_receivers | Number of the streaming application's receivers | Streaming Num Receivers | |
| spark_streaming_statistics_num_retained_completed_batches | Number of retained completed streaming batches of the application | Streaming Num Retained Completed Batches | |
| spark_streaming_statistics_num_total_completed_batches | Total number of completed streaming batches of the application | Streaming Num Total Completed Batches | |