Apache Spark is an open-source, general-purpose distributed cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Prerequisites

  1. Configure the following endpoints to collect the respective metrics (a connectivity sketch is shown after this list):

     stats-metrics : http://<ip_addr>:<port>/json/

     app-url : http://<ip_addr>:<port>/app/?appId=<app-id>

     job-metrics : http://<ip_addr>:<port>/api/v1/applications/<app-id>/jobs

     stage-metrics : http://<ip_addr>:<port>/api/v1/applications/<app-id>/stages

     storage-metrics : http://<ip_addr>:<port>/api/v1/applications/<app-id>/storage/rdd

     executor-metrics : http://<ip_addr>:<port>/api/v1/applications/<app-id>/executors

     streaming-metrics : http://<ip_addr>:<port>/api/v1/applications/<app-id>/streaming/statistics

  2. For virtual machines, install the Linux Agent.
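
The sketch below is only an illustration of checking that these endpoints are reachable before enabling the integration; it is not part of the agent. The host, port, and application ID are placeholders that must be replaced with values from your environment.

import json
import urllib.request

HOST = "localhost"                      # assumption: Spark master/UI host
PORT = 8080                             # assumption: Spark master web UI port
APP_ID = "app-00000000000000-0000"      # hypothetical application ID

BASE = f"http://{HOST}:{PORT}"

def fetch(url):
    # Return the parsed JSON body of a Spark monitoring endpoint.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Master-level stats (stats-metrics endpoint)
print(fetch(f"{BASE}/json/"))

# Per-application endpoints (job, stage, storage, executor, and streaming metrics)
for path in ("jobs", "stages", "storage/rdd", "executors", "streaming/statistics"):
    print(fetch(f"{BASE}/api/v1/applications/{APP_ID}/{path}"))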

Configuring the credentials

Configure the credentials in the file /opt/opsramp/agent/conf/app.d/creds.yaml

spark:
- name: spark
  user: <username>
  pwd: <Password>
  encoding-type: plain
  labels:
    key1: val1
    key2: val2
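
If your Spark endpoints are protected by HTTP basic authentication, the user and pwd configured above must be able to reach them. The snippet below is a minimal, standalone check of that assumption (host, port, and credentials are placeholders); the agent performs its own collection.

import base64
import urllib.request

HOST, PORT = "localhost", 8080          # assumptions for this example
USER, PWD = "<username>", "<Password>"  # the values configured in creds.yaml

# Build a basic-auth request against the stats endpoint and report the status.
req = urllib.request.Request(f"http://{HOST}:{PORT}/json/")
token = base64.b64encode(f"{USER}:{PWD}".encode()).decode()
req.add_header("Authorization", f"Basic {token}")

with urllib.request.urlopen(req, timeout=10) as resp:
    print("Credential check HTTP status:", resp.status)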

Configuring the application

Virtual machine

Configure the application in the file /opt/opsramp/agent/conf/app/discovery/auto-detection.yaml

- name: spark
  instance-checks:
    process-check:
      - spark
    port-check:
      - 8080
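
The checks above match a host where a process whose command line contains "spark" is running and port 8080 is open. The standalone sketch below mimics the idea of those checks so you can confirm both conditions on the virtual machine; it is not the agent's own detection logic.

import socket
import subprocess

def port_open(port, host="127.0.0.1"):
    # True if a TCP connection to host:port succeeds.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(2)
        return s.connect_ex((host, port)) == 0

def process_running(keyword):
    # True if any running command line contains the keyword (Linux pgrep).
    result = subprocess.run(["pgrep", "-f", keyword], capture_output=True)
    return result.returncode == 0

print("process-check (spark):", process_running("spark"))
print("port-check (8080):", port_open(8080))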

Docker environment

Configure the application in the file /opt/opsramp/agent/conf/app/discovery/auto-container-detection.yaml

- name: spark
  container-checks:
    image-check:
      - spark
    port-check:
      - 8080

Kubernetes environment

Configure the application in config.yaml

- name: spark
  container-checks:
    image-check:
      - spark
    port-check:
      - 8080

The port specified in the checks above is used to build and fetch all the metric URLs for each application, as illustrated in the sketch below.
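
As a rough illustration, the snippet below takes the application IDs reported by the master's stats endpoint and expands them into per-application metric URLs on the same port. The "activeapps" and "id" field names follow the Spark standalone master's /json/ output and are assumptions about your deployment.

import json
import urllib.request

HOST, PORT = "localhost", 8080  # placeholders for the discovered host and port

# Read the master's stats endpoint to learn which applications are running.
with urllib.request.urlopen(f"http://{HOST}:{PORT}/json/", timeout=10) as resp:
    master = json.loads(resp.read().decode("utf-8"))

# Expand each application ID into its per-application metric URLs.
for app in master.get("activeapps", []):
    app_id = app["id"]
    for path in ("jobs", "stages", "storage/rdd", "executors"):
        print(f"http://{HOST}:{PORT}/api/v1/applications/{app_id}/{path}")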

Validate

Go to Resources under the Infrastructure tab to check if your resources are onboarded and the metrics are collected.

Metrics

| OpsRamp Metric | Metric Display Name | Unit | Description |
| --- | --- | --- | --- |
| spark_workers | Workers | | Number of workers connected to the master |
| spark_cores | Cores | | Number of CPUs available for all workers |
| spark_cores_used | Cores Used | | Number of CPUs used for all applications |
| spark_applications_active | Applications Active | | Number of applications waiting or running |
| spark_applications_completed | Applications Completed | | Number of applications completed |
| spark_drivers_active | Drivers Active | | Number of drivers available |
| spark_status | Status | | Availability status of the Spark master (for example, alive) |
| spark_memory | Memory | megabytes | Total memory available on the Spark master |
| spark_memory_used | Memory Used | megabytes | Memory used by applications on the Spark master |
| spark_job_count | Jobs | | Number of jobs |
| spark_job_num_tasks | Tasks | | Number of tasks in the application (different instances are shown using AppID_jobID) |
| spark_job_num_active_tasks | Active Tasks | | Number of active tasks in the application (different instances are shown using AppID_jobID) |
| spark_job_num_skipped_tasks | Skipped Tasks | | Number of skipped tasks in the application (different instances are shown using AppID_jobID) |
| spark_job_num_failed_tasks | Failed Tasks | | Number of failed tasks in the application (different instances are shown using AppID_jobID) |
| spark_job_num_completed_tasks | Completed Tasks | | Number of completed tasks in the application (different instances are shown using AppID_jobID) |
| spark_job_num_active_stages | Active Stages | | Number of active stages in the application (different instances are shown using AppID_jobID) |
| spark_job_num_completed_stages | Completed Stages | | Number of completed stages in the application (different instances are shown using AppID_jobID) |
| spark_job_num_skipped_stages | Skipped Stages | | Number of skipped stages in the application (different instances are shown using AppID_jobID) |
| spark_job_num_failed_stages | Failed Stages | | Number of failed stages in the application (different instances are shown using AppID_jobID) |
| spark_stage_count | Stage Count | | Number of stages (different instances are shown using AppID_stageID) |
| spark_stage_num_active_tasks | Stage Num Active Tasks | | Number of active tasks in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_num_complete_tasks | Stage Num Complete Tasks | | Number of complete tasks in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_num_failed_tasks | Stage Num Failed Tasks | | Number of failed tasks in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_executor_run_time | Stage Executor Run Time | | Time spent by the executor in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_input_bytes | Stage Input Bytes | bytes | Input bytes in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_input_records | Stage Input Records | | Input records in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_output_bytes | Stage Output Bytes | bytes | Output bytes in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_output_records | Stage Output Records | | Output records in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_shuffle_read_bytes | Stage Shuffle Read Bytes | bytes | Number of bytes read during a shuffle in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_shuffle_read_records | Stage Shuffle Read Records | | Number of records read during a shuffle in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_shuffle_write_bytes | Stage Shuffle Write Bytes | bytes | Number of shuffled bytes in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_shuffle_write_records | Stage Shuffle Write Records | | Number of shuffled records in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_memory_bytes_spilled | Stage Memory Bytes Spilled | bytes | Number of bytes spilled to disk in the application's stages (different instances are shown using AppID_stageID) |
| spark_stage_disk_bytes_spilled | Stage Disk Bytes Spilled | bytes | Maximum size on disk of the spilled bytes in the application's stages (different instances are shown using AppID_stageID) |
| spark_driver_rdd_blocks | Driver Rdd Blocks | | Number of RDD blocks in the driver |
| spark_driver_memory_used | Driver Memory Used | | Amount of memory used in the driver |
| spark_driver_disk_used | Driver Disk Used | | Amount of disk used in the driver |
| spark_driver_active_tasks | Driver Active Tasks | | Number of active tasks in the driver |
| spark_driver_failed_tasks | Driver Failed Tasks | | Number of failed tasks in the driver |
| spark_driver_completed_tasks | Driver Completed Tasks | | Number of completed tasks in the driver |
| spark_driver_total_tasks | Driver Total Tasks | | Total number of tasks in the driver |
| spark_driver_total_duration | Driver Total Duration | | Time spent in the driver |
| spark_driver_total_input_bytes | Driver Total Input Bytes | bytes | Number of input bytes in the driver |
| spark_driver_total_shuffle_read | Driver Total Shuffle Read | | Number of bytes read during a shuffle in the driver |
| spark_driver_total_shuffle_write | Driver Total Shuffle Write | | Number of shuffled bytes in the driver |
| spark_driver_max_memory | Driver Max Memory | | Maximum memory used in the driver |
| spark_executor_count | Executor Count | | Number of executors |
| spark_executor_rdd_blocks | Executor Rdd Blocks | | Number of persisted RDD blocks in the application's executors |
| spark_executor_memory_used | Executor Memory Used | | Amount of memory used for cached RDDs in the application's executors |
| spark_executor_max_memory | Executor Max Memory | | Maximum memory across all executors working for a particular application |
| spark_executor_disk_used | Executor Disk Used | | Amount of disk space used by persisted RDDs in the application's executors |
| spark_executor_active_tasks | Executor Active Tasks | | Number of active tasks in the application's executors |
| spark_executor_failed_tasks | Executor Failed Tasks | | Number of failed tasks in the application's executors |
| spark_executor_completed_tasks | Executor Completed Tasks | | Number of completed tasks in the application's executors |
| spark_executor_total_tasks | Executor Total Tasks | | Total number of tasks in the application's executors |
| spark_executor_total_duration | Executor Total Duration | | Time spent by the application's executors executing tasks |
| spark_executor_total_input_bytes | Executor Total Input Bytes | bytes | Total number of input bytes in the application's executors |
| spark_executor_total_shuffle_read | Executor Total Shuffle Read | | Total number of bytes read during a shuffle in the application's executors |
| spark_executor_total_shuffle_write | Executor Total Shuffle Write | | Total number of shuffled bytes in the application's executors |
| spark_rdd_count | Rdd Count | | Number of RDDs |
| spark_rdd_num_partitions | Rdd Num Partitions | | Number of persisted RDD partitions in the application |
| spark_rdd_num_cached_partitions | Rdd Num Cached Partitions | | Number of in-memory cached RDD partitions in the application |
| spark_rdd_memory_used | Rdd Memory Used | | Amount of memory used in the application's persisted RDDs |
| spark_rdd_disk_used | Rdd Disk Used | | Amount of disk space used by persisted RDDs in the application |
| spark_streaming_statistics_avg_input_rate | Streaming Avg Input Rate | | Average streaming input data rate |
| spark_streaming_statistics_avg_processing_time | Streaming Avg Processing Time | | Average processing time of the application's streaming batches |
| spark_streaming_statistics_avg_scheduling_delay | Streaming Avg Scheduling Delay | | Average scheduling delay of the application's streaming batches |
| spark_streaming_statistics_avg_total_delay | Streaming Avg Total Delay | | Average total delay of the application's streaming batches |
| spark_streaming_statistics_batch_duration | Streaming Batch Duration | | Duration of the application's streaming batches |
| spark_streaming_statistics_num_active_batches | Streaming Num Active Batches | | Number of active streaming batches |
| spark_streaming_statistics_num_active_receivers | Streaming Num Active Receivers | | Number of active streaming receivers |
| spark_streaming_statistics_num_inactive_receivers | Streaming Num Inactive Receivers | | Number of inactive streaming receivers |
| spark_streaming_statistics_num_processed_records | Streaming Num Processed Records | | Number of processed streaming records |
| spark_streaming_statistics_num_received_records | Streaming Num Received Records | | Number of received streaming records |
| spark_streaming_statistics_num_receivers | Streaming Num Receivers | | Number of receivers in the streaming application |
| spark_streaming_statistics_num_retained_completed_batches | Streaming Num Retained Completed Batches | | Number of retained completed streaming batches for the application |
| spark_streaming_statistics_num_total_completed_batches | Streaming Num Total Completed Batches | | Total number of completed streaming batches for the application |