Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Prerequisites
- Configure the following endpoints to collect the respective metrics (a quick way to spot-check them is sketched after this list):
  - stats-metrics: http://<ip_addr>:<port>/json/
  - app-url: http://<ip_addr>:<port>/app/?appId=<app-id>
  - job-metrics: http://<ip_addr>:<port>/api/v1/applications/<app-id>/jobs
  - stage-metrics: http://<ip_addr>:<port>/api/v1/applications/<app-id>/stages
  - storage-metrics: http://<ip_addr>:<port>/api/v1/applications/<app-id>/storage/rdd
  - executor-metrics: http://<ip_addr>:<port>/api/v1/applications/<app-id>/executors
  - streaming-metrics: http://<ip_addr>:<port>/api/v1/applications/<app-id>/streaming/statistics
- For Virtual Machines, install the Linux Agent.
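Before onboarding, the endpoints can be spot-checked from the host that will run the agent. The following is a minimal sketch using Python with the requests library; the host, port, and application ID are placeholders, not values taken from this document.

```python
import requests

SPARK_HOST = "localhost"                 # placeholder: Spark master address
SPARK_PORT = 8080                        # placeholder: master web UI port
APP_ID = "app-20240101000000-0000"       # placeholder: a running application ID

base = f"http://{SPARK_HOST}:{SPARK_PORT}"

# stats-metrics endpoint on the master
stats = requests.get(f"{base}/json/", timeout=5).json()
print("master status:", stats.get("status"))

# job-metrics endpoint for one application
jobs = requests.get(f"{base}/api/v1/applications/{APP_ID}/jobs", timeout=5).json()
print("jobs returned:", len(jobs))
```

If both calls return JSON, the agent will be able to reach the same URLs once discovery is configured.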
Configuring the credentials
Configure the credentials in the file /opt/opsramp/agent/conf/app.d/creds.yaml:
spark:
  - name: spark
    user: <username>
    pwd: <Password>
    encoding-type: plain
    labels:
      key1: val1
      key2: val2
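Whether authentication is actually enforced depends on how the Spark UI is exposed (for example, behind an authenticating reverse proxy). The sketch below only illustrates how the configured user and pwd could be applied as HTTP Basic Auth when querying an endpoint; it is not the agent's own logic, and the host and port are placeholders.

```python
import requests
import yaml  # PyYAML

# Read the credentials file referenced above.
with open("/opt/opsramp/agent/conf/app.d/creds.yaml") as fh:
    spark_creds = yaml.safe_load(fh)["spark"][0]

# With encoding-type: plain, user and pwd are stored as-is and can be
# passed directly as HTTP Basic Auth credentials.
resp = requests.get(
    "http://localhost:8080/json/",                   # placeholder endpoint
    auth=(spark_creds["user"], spark_creds["pwd"]),
    timeout=5,
)
resp.raise_for_status()
print("authenticated request succeeded")
```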
Configuring the application
Virtual machine
Configure the application in the file /opt/opsramp/agent/conf/app/discovery/auto-detection.yaml:
- name: spark
  instance-checks:
    process-check:
      - spark
    port-check:
      - 8080
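To confirm there is something for the agent to discover, the instance checks can be approximated manually. The sketch below looks for a running process whose command line contains "spark" and for a listener on port 8080; it is an illustrative approximation of process-check and port-check, not the agent's actual implementation.

```python
import socket
import subprocess

def process_running(pattern: str) -> bool:
    # pgrep -f matches the full command line, which is how Spark daemons
    # (for example, org.apache.spark.deploy.master.Master) usually appear.
    result = subprocess.run(["pgrep", "-f", pattern], stdout=subprocess.DEVNULL)
    return result.returncode == 0

def port_open(host: str, port: int) -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(2)
        return sock.connect_ex((host, port)) == 0

print("process-check (spark):", process_running("spark"))
print("port-check (8080):", port_open("127.0.0.1", 8080))
```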
Docker environment
Configure the application in the file /opt/opsramp/agent/conf/app/discovery/auto-container-detection.yaml:
- name: spark
  container-checks:
    image-check:
      - spark
    port-check:
      - 8080
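The container checks can be approximated in the same spirit. The sketch below lists running containers via the Docker CLI and flags those whose image name contains "spark" and that publish port 8080; it mirrors the image-check and port-check only loosely and is not how the agent performs container discovery.

```python
import json
import subprocess

# One JSON object per running container.
out = subprocess.run(
    ["docker", "ps", "--format", "{{json .}}"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    container = json.loads(line)
    if "spark" in container["Image"] and "8080" in container.get("Ports", ""):
        print("candidate:", container["Names"], container["Image"], container["Ports"])
```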
Kubernetes environment
Configure the application in config.yaml:
- name: spark
  container-checks:
    image-check:
      - spark
    port-check:
      - 8080
The port specified in the port-check is used to build all of the endpoint URLs for each application.
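As a sketch of how those URLs come together, the application IDs can be listed from the Spark REST API and combined with the configured port to produce the per-application endpoints from the Prerequisites section. The /api/v1/applications listing endpoint and the placeholder host below are assumptions, not values given in this document.

```python
import requests

base = "http://localhost:8080"   # placeholder for http://<ip_addr>:<port>

# Assumes the standard Spark REST endpoint for listing applications.
apps = requests.get(f"{base}/api/v1/applications", timeout=5).json()

# Per-application paths from the Prerequisites section.
paths = ["jobs", "stages", "storage/rdd", "executors", "streaming/statistics"]

for app in apps:
    for path in paths:
        print(f"{base}/api/v1/applications/{app['id']}/{path}")
```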
Validate
Go to Resources under the Infrastructure tab to verify that your resources are onboarded and metrics are being collected.
Supported metrics
OpsRamp Metric | Metric Display Name | Unit | Description |
---|---|---|---|
spark_workers | Workers | | Number of workers connected to the master |
spark_cores | Cores | | Number of CPUs available for all workers |
spark_cores_used | Cores Used | | Number of CPUs used by all applications |
spark_applications_active | Applications Active | | Number of applications waiting or running |
spark_applications_completed | Applications Completed | | Number of applications completed |
spark_drivers_active | Drivers Active | | Number of drivers available |
spark_status | Status | | Availability status of the Spark master. For example, alive |
spark_memory | Memory | megabytes | Total memory available on the Spark master |
spark_memory_used | Memory Used | megabytes | Memory used by the applications on the Spark master |
spark_job_count | Jobs | | Number of jobs |
spark_job_num_tasks | Tasks | | Number of tasks in the application (different instances are shown using AppID_jobID) |
spark_job_num_active_tasks | Active Tasks | | Number of active tasks in the application (different instances are shown using AppID_jobID) |
spark_job_num_skipped_tasks | Skipped Tasks | | Number of skipped tasks in the application (different instances are shown using AppID_jobID) |
spark_job_num_failed_tasks | Failed Tasks | | Number of failed tasks in the application (different instances are shown using AppID_jobID) |
spark_job_num_completed_tasks | Completed Tasks | | Number of completed tasks in the application (different instances are shown using AppID_jobID) |
spark_job_num_active_stages | Active Stages | | Number of active stages in the application (different instances are shown using AppID_jobID) |
spark_job_num_completed_stages | Completed Stages | | Number of completed stages in the application (different instances are shown using AppID_jobID) |
spark_job_num_skipped_stages | Skipped Stages | | Number of skipped stages in the application (different instances are shown using AppID_jobID) |
spark_job_num_failed_stages | Failed Stages | | Number of failed stages in the application (different instances are shown using AppID_jobID) |
spark_stage_count | Stage Count | | Number of stages (different instances are shown using AppID_stageID) |
spark_stage_num_active_tasks | Stage Num Active Tasks | | Number of active tasks in the application stages (different instances are shown using AppID_stageID) |
spark_stage_num_complete_tasks | Stage Num Complete Tasks | | Number of complete tasks in the application stages (different instances are shown using AppID_stageID) |
spark_stage_num_failed_tasks | Stage Num Failed Tasks | | Number of failed tasks in the application stages (different instances are shown using AppID_stageID) |
spark_stage_executor_run_time | Stage Executor Run Time | | Time spent by the executor in the application stages (different instances are shown using AppID_stageID) |
spark_stage_input_bytes | Stage Input Bytes | bytes | Input bytes in the application stages (different instances are shown using AppID_stageID) |
spark_stage_input_records | Stage Input Records | | Input records in the application stages (different instances are shown using AppID_stageID) |
spark_stage_output_bytes | Stage Output Bytes | bytes | Output bytes in the application stages (different instances are shown using AppID_stageID) |
spark_stage_output_records | Stage Output Records | | Output records in the application stages (different instances are shown using AppID_stageID) |
spark_stage_shuffle_read_bytes | Stage Shuffle Read Bytes | bytes | Number of bytes read during a shuffle in the application stages (different instances are shown using AppID_stageID) |
spark_stage_shuffle_read_records | Stage Shuffle Read Records | | Number of records read during a shuffle in the application stages (different instances are shown using AppID_stageID) |
spark_stage_shuffle_write_bytes | Stage Shuffle Write Bytes | bytes | Number of shuffled bytes in the application stages (different instances are shown using AppID_stageID) |
spark_stage_shuffle_write_records | Stage Shuffle Write Records | | Number of shuffled records in the application stages (different instances are shown using AppID_stageID) |
spark_stage_memory_bytes_spilled | Stage Memory Bytes Spilled | bytes | Number of bytes spilled to disk in the application stages (different instances are shown using AppID_stageID) |
spark_stage_disk_bytes_spilled | Stage Disk Bytes Spilled | bytes | Maximum size on disk of the spilled bytes in the application stages (different instances are shown using AppID_stageID) |
spark_driver_rdd_blocks | Driver Rdd Blocks | | Number of RDD blocks in the driver |
spark_driver_memory_used | Driver Memory Used | | Amount of memory used in the driver |
spark_driver_disk_used | Driver Disk Used | | Amount of disk space used in the driver |
spark_driver_active_tasks | Driver Active Tasks | | Number of active tasks in the driver |
spark_driver_failed_tasks | Driver Failed Tasks | | Number of failed tasks in the driver |
spark_driver_completed_tasks | Driver Completed Tasks | | Number of completed tasks in the driver |
spark_driver_total_tasks | Driver Total Tasks | | Total number of tasks in the driver |
spark_driver_total_duration | Driver Total Duration | | Time spent in the driver |
spark_driver_total_input_bytes | Driver Total Input Bytes | bytes | Number of input bytes in the driver |
spark_driver_total_shuffle_read | Driver Total Shuffle Read | | Number of bytes read during a shuffle in the driver |
spark_driver_total_shuffle_write | Driver Total Shuffle Write | | Number of shuffled bytes in the driver |
spark_driver_max_memory | Driver Max Memory | | Maximum memory used in the driver |
spark_executor_count | Executor Count | | Number of executors |
spark_executor_rdd_blocks | Executor Rdd Blocks | | Number of persisted RDD blocks in the application executors |
spark_executor_memory_used | Executor Memory Used | | Amount of memory used for cached RDDs in the application executors |
spark_executor_max_memory | Executor Max Memory | | Maximum memory across all executors working for an application |
spark_executor_disk_used | Executor Disk Used | | Amount of disk space used by persisted RDDs in the application executors |
spark_executor_active_tasks | Executor Active Tasks | | Number of active tasks in the application executors |
spark_executor_failed_tasks | Executor Failed Tasks | | Number of failed tasks in the application executors |
spark_executor_completed_tasks | Executor Completed Tasks | | Number of completed tasks in the application executors |
spark_executor_total_tasks | Executor Total Tasks | | Total number of tasks in the application executors |
spark_executor_total_duration | Executor Total Duration | | Time spent by the application executors executing tasks |
spark_executor_total_input_bytes | Executor Total Input Bytes | bytes | Total number of input bytes in the application executors |
spark_executor_total_shuffle_read | Executor Total Shuffle Read | | Total number of bytes read during a shuffle in the application executors |
spark_executor_total_shuffle_write | Executor Total Shuffle Write | | Total number of shuffled bytes in the application executors |
spark_rdd_count | Rdd Count | | Number of RDDs |
spark_rdd_num_partitions | Rdd Num Partitions | | Number of persisted RDD partitions in the application |
spark_rdd_num_cached_partitions | Rdd Num Cached Partitions | | Number of in-memory cached RDD partitions in the application |
spark_rdd_memory_used | Rdd Memory Used | | Amount of memory used by persisted RDDs in the application |
spark_rdd_disk_used | Rdd Disk Used | | Amount of disk space used by persisted RDDs in the application |
spark_streaming_statistics_avg_input_rate | Streaming Avg Input Rate | | Average streaming input data rate |
spark_streaming_statistics_avg_processing_time | Streaming Avg Processing Time | | Average application streaming batch processing time |
spark_streaming_statistics_avg_scheduling_delay | Streaming Avg Scheduling Delay | | Average application streaming batch scheduling delay |
spark_streaming_statistics_avg_total_delay | Streaming Avg Total Delay | | Average application streaming batch total delay |
spark_streaming_statistics_batch_duration | Streaming Batch Duration | | Application streaming batch duration |
spark_streaming_statistics_num_active_batches | Streaming Num Active Batches | | Number of active streaming batches |
spark_streaming_statistics_num_active_receivers | Streaming Num Active Receivers | | Number of active streaming receivers |
spark_streaming_statistics_num_inactive_receivers | Streaming Num Inactive Receivers | | Number of inactive streaming receivers |
spark_streaming_statistics_num_processed_records | Streaming Num Processed Records | | Number of processed streaming records |
spark_streaming_statistics_num_received_records | Streaming Num Received Records | | Number of received streaming records |
spark_streaming_statistics_num_receivers | Streaming Num Receivers | | Number of streaming application receivers |
spark_streaming_statistics_num_retained_completed_batches | Streaming Num Retained Completed Batches | | Number of retained completed application streaming batches |
spark_streaming_statistics_num_total_completed_batches | Streaming Num Total Completed Batches | | Total number of completed application streaming batches |
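For orientation, several of the master-level metrics above map to fields of the stats-metrics (/json/) response. The field names in the sketch below reflect typical Spark master output and are assumptions that can vary across Spark versions; verify them against your deployment before relying on them.

```python
import requests

base = "http://localhost:8080"   # placeholder master address
stats = requests.get(f"{base}/json/", timeout=5).json()

# Assumed field names from a typical Spark master /json/ response.
print("spark_workers             ->", len(stats.get("workers", [])))
print("spark_cores               ->", stats.get("cores"))
print("spark_cores_used          ->", stats.get("coresused"))
print("spark_memory (MB)         ->", stats.get("memory"))
print("spark_memory_used (MB)    ->", stats.get("memoryused"))
print("spark_applications_active ->", len(stats.get("activeapps", [])))
print("spark_status              ->", stats.get("status"))
```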