Overview

Amazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and directly deploy them into a production-ready hosted environment.

SageMaker provides:

  • An integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you do not have to manage servers.
  • Common machine learning algorithms that are optimized to run efficiently against extremely large datasets in a distributed environment.

With native support for bring-your-own-algorithms and frameworks, Amazon SageMaker offers flexible distributed training options that adjust to your specific workflows. Deploy a model into a secure and scalable environment by launching it with a single click from the Amazon SageMaker console. Training and hosting are billed by minutes of usage, with no minimum fees and no upfront commitments.

Amazon SageMaker Ground Truth

Amazon SageMaker Ground Truth helps you build high-quality training datasets by using workers, optionally assisted by machine learning, to create labeled datasets.

See GroundTruth metrics.

Amazon SageMaker Training

An Amazon SageMaker training job is an iterative process that teaches a model to make predictions by presenting examples from a training dataset. Typically, a training algorithm computes several metrics, such as training error and prediction accuracy. These metrics help diagnose whether the model is learning well and will generalize well for making predictions on unseen data. The training algorithm writes the values of these metrics to logs, which Amazon SageMaker monitors and sends to Amazon CloudWatch in real time.

See Training metrics.
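To make the CloudWatch side of this concrete, the sketch below builds the keyword arguments for a CloudWatch GetMetricStatistics request for one training job instance metric. It assumes the /aws/sagemaker/TrainingJobs namespace and a Host dimension of the form "<job-name>/algo-<n>"; the job name and the helper function are hypothetical placeholders, and the Average statistic mirrors the aggregation type listed in Training metrics. It is an illustration only, not how OpsRamp collects the data.

```python
from datetime import datetime, timedelta, timezone


def training_metric_request(job_name, metric_name="CPUUtilization",
                            host_index=1, minutes=60, period=60):
    """Build kwargs for CloudWatch GetMetricStatistics (hypothetical helper).

    Assumes training job instance metrics live in the
    /aws/sagemaker/TrainingJobs namespace with a Host dimension
    of the form "<job-name>/algo-<n>".
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    return {
        "Namespace": "/aws/sagemaker/TrainingJobs",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "Host", "Value": f"{job_name}/algo-{host_index}"}],
        "StartTime": start,
        "EndTime": end,
        "Period": period,            # seconds per datapoint
        "Statistics": ["Average"],   # matches the table's aggregation type
    }


if __name__ == "__main__":
    params = training_metric_request("my-training-job")
    print(params["Dimensions"])
```

With boto3 installed and credentials configured, the dictionary could be passed as boto3.client("cloudwatch").get_metric_statistics(**params).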

Amazon SageMaker Endpoint

An Amazon SageMaker endpoint is created from the endpoint configuration specified in the CreateEndpoint request. Amazon SageMaker uses the endpoint to provision resources and deploy models.
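The relationship between an endpoint and its endpoint configuration can be sketched as two request payloads: one for CreateEndpointConfig and one for CreateEndpoint that refers to it by name. The helper, the model and endpoint names, the "-config" suffix, and the default instance type are hypothetical choices for illustration.

```python
def endpoint_requests(model_name, endpoint_name,
                      instance_type="ml.m5.large", instance_count=1):
    """Return (CreateEndpointConfig kwargs, CreateEndpoint kwargs).

    Hypothetical helper: the config name is derived from the endpoint
    name purely for illustration.
    """
    config_name = f"{endpoint_name}-config"
    config_request = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,            # an existing SageMaker model
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
                "InitialVariantWeight": 1.0,
            }
        ],
    }
    # The endpoint only references the configuration by name.
    endpoint_request = {
        "EndpointName": endpoint_name,
        "EndpointConfigName": config_name,
    }
    return config_request, endpoint_request
```

With boto3, sm = boto3.client("sagemaker") followed by sm.create_endpoint_config(**config_request) and sm.create_endpoint(**endpoint_request) would submit them in order, because the configuration must exist before the endpoint that uses it.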

Amazon SageMaker Transform Job

Use batch transform when you need to do the following:

  • Preprocess datasets to remove noise or bias that interferes with training or inference.
  • Get inferences from large datasets.
  • Run inference when you do not need a persistent endpoint.
  • Associate input records with inferences to help interpret results.
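As an example of the last use case, a CreateTransformJob request can ask SageMaker to join each input record with its inference through the DataProcessing settings. The sketch below only builds the request payload; the helper name, S3 URIs, CSV content type, and instance type are assumptions for illustration.

```python
def transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri,
                          instance_type="ml.m5.large"):
    """Build CreateTransformJob kwargs (hypothetical helper).

    Each line of the input is treated as one record, and the output
    joins every input record with its prediction.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",          # one record per line
        },
        "TransformOutput": {
            "S3OutputPath": output_s3_uri,
            "AssembleWith": "Line",       # one result per line in the output
        },
        "TransformResources": {"InstanceType": instance_type, "InstanceCount": 1},
        # Join the whole input record ($) with the inference in the output.
        "DataProcessing": {"InputFilter": "$", "JoinSource": "Input", "OutputFilter": "$"},
    }
```

No persistent endpoint is involved: with boto3, boto3.client("sagemaker").create_transform_job(**request) would run the job, and its resources are released when it finishes.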

External reference

Amazon SageMaker

Setup

To set up the integration:

  1. Select SageMaker GroundTruth in the AWS Integration Discovery Profile to discover AWS SageMaker GroundTruth labeling jobs.
  2. Select SageMaker Training in the AWS Integration Discovery Profile to discover AWS SageMaker training jobs.
  3. Select SageMaker EndPoint in the AWS Integration Discovery Profile to discover AWS SageMaker endpoints.
  4. Select SageMaker Transform Job in the AWS Integration Discovery Profile to discover AWS SageMaker transform jobs.

Event support

CloudTrail event support

  • Supported (SageMaker GroundTruth, Training, Endpoint, Transform Job)
  • Configurable in the OpsRamp AWS Integration Discovery Profile.

CloudWatch alarm support

  • Not Supported

Supported metrics

GroundTruth metrics

OpsRamp Metric | Description | Metric Display Name | Unit | Aggregation Type
--- | --- | --- | --- | ---
aws_sagemaker_labelingjobs_ActiveWorkers | Number of workers on a private work team performing a labeling job. | ActiveWorkers | Count | Sum
aws_sagemaker_labelingjobs_JobsSucceeded | Number of labeling jobs that succeeded. Use the Sum aggregation to get the total number of labeling jobs that succeeded. | JobsSucceeded | None | Sum
aws_sagemaker_labelingjobs_DatasetObjectsAutoAnnotated | Number of dataset objects auto-annotated in a labeling job. | DatasetObjectsAutoAnnotated | Count | Max
aws_sagemaker_labelingjobs_DatasetObjectsHumanAnnotated | Number of dataset objects annotated by a human in a labeling job. | DatasetObjectsHumanAnnotated | Count | Max
aws_sagemaker_labelingjobs_DatasetObjectsLabelingFailed | Number of dataset objects that failed labeling in a labeling job. | DatasetObjectsLabelingFailed | Count | Max
aws_sagemaker_labelingjobs_TotalDatasetObjectsLabeled | Number of dataset objects labeled successfully in a labeling job. | TotalDatasetObjectsLabeled | Count | Max
aws_sagemaker_labelingjobs_JobsStopped | Number of labeling jobs that were stopped. | JobsStopped | Count | Sum
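The OpsRamp metric names in these tables appear to follow the pattern aws_sagemaker_<resource>_<CloudWatch metric name>, where the resource is labelingjobs or trainingjobs. The helpers below are a hypothetical sketch of that pattern, inferred from the tables rather than taken from any OpsRamp API, which can be handy when correlating OpsRamp metrics with their CloudWatch counterparts.

```python
# Hypothetical helpers: the naming pattern is inferred from the metric
# tables in this section and is not an OpsRamp API.
PREFIX = "aws_sagemaker"


def opsramp_metric_name(resource, cloudwatch_metric):
    """Build an OpsRamp metric name from a resource key and a CloudWatch metric."""
    return f"{PREFIX}_{resource}_{cloudwatch_metric}"


def split_opsramp_metric_name(name):
    """Return (resource, cloudwatch_metric) parsed from an OpsRamp metric name."""
    # maxsplit=3 keeps the CloudWatch metric name intact.
    vendor, service, resource, metric = name.split("_", 3)
    if f"{vendor}_{service}" != PREFIX:
        raise ValueError(f"not a SageMaker metric: {name}")
    return resource, metric
```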

Training metrics

OpsRamp Metric | Description | Metric Display Name | Unit | Aggregation Type
--- | --- | --- | --- | ---
aws_sagemaker_trainingjobs_CPUUtilization | Percentage of CPU units used by the containers on an instance. | CPUUtilization | Percent | Average
aws_sagemaker_trainingjobs_MemoryUtilization | Percentage of memory used by the containers on an instance. | MemoryUtilization | Percent | Average
aws_sagemaker_trainingjobs_GPUUtilization | Percentage of GPU units used by the containers on an instance. | GPUUtilization | Percent | Average
aws_sagemaker_trainingjobs_GPUMemoryUtilization | Percentage of GPU memory used by the containers on an instance. | GPUMemoryUtilization | Percent | Average
aws_sagemaker_trainingjobs_DiskUtilization | Percentage of disk space used by the containers on an instance. | DiskUtilization | Percent | Average
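The Aggregation Type column states how raw datapoints are combined for each metric. The sketch below applies those documented types (Sum, Max, Average) to a list of datapoint values. The lookup table is copied from the GroundTruth and Training metric tables, while the aggregate helper itself is hypothetical and does not describe how OpsRamp computes values internally.

```python
from statistics import mean

# Aggregation types as listed in the GroundTruth and Training metric tables.
_LABELING = {
    "ActiveWorkers": "Sum",
    "JobsSucceeded": "Sum",
    "JobsStopped": "Sum",
    "DatasetObjectsAutoAnnotated": "Max",
    "DatasetObjectsHumanAnnotated": "Max",
    "DatasetObjectsLabelingFailed": "Max",
    "TotalDatasetObjectsLabeled": "Max",
}
_TRAINING = ["CPUUtilization", "MemoryUtilization", "GPUUtilization",
             "GPUMemoryUtilization", "DiskUtilization"]

AGGREGATION_TYPE = {f"aws_sagemaker_labelingjobs_{name}": agg
                    for name, agg in _LABELING.items()}
AGGREGATION_TYPE.update({f"aws_sagemaker_trainingjobs_{name}": "Average"
                         for name in _TRAINING})

# Map each aggregation type to a reducer over a list of datapoint values.
REDUCERS = {"Sum": sum, "Max": max, "Average": mean}


def aggregate(opsramp_metric, values):
    """Reduce raw datapoint values using the metric's documented aggregation type."""
    if not values:
        raise ValueError("no datapoints to aggregate")
    return REDUCERS[AGGREGATION_TYPE[opsramp_metric]](values)


if __name__ == "__main__":
    print(aggregate("aws_sagemaker_trainingjobs_CPUUtilization", [40.0, 60.0]))  # prints 50.0
```

Averaging suits the percentage metrics, whereas summing them would produce values above 100 percent; the labeling job counters use Sum or Max as listed.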