AWS Batch Cheat Sheet - Complete Batch Computing Guide 2025

AWS Batch is a fully managed batch computing service that plans, schedules, and runs your containerized batch workloads.

Supported Workloads

AWS Batch supports workloads such as:

  • Machine Learning (ML)
  • Simulations
  • Analytics

Compute Options

Works across the full range of AWS compute options:

  • Amazon ECS
  • Amazon EKS
  • AWS Fargate
  • EC2 Spot Instances
  • EC2 On-Demand Instances

Key Features

  • Package and submit jobs using the Console, CLI, or SDK (see the sketch after this list)
  • Define dependencies and execution parameters for jobs
  • Integrates with popular workflow engines:
    • Pegasus WMS
    • Luigi
    • Nextflow
    • Metaflow
    • Apache Airflow
    • AWS Step Functions
  • Automatic provisioning & scaling of compute resources
    • Supports ECS, EKS, and AWS Fargate
  • Flexible compute choices:
    • Use On-Demand or Spot Instances depending on job needs
  • A default job queue and compute environment to help you get started quickly
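
For example, a job can be submitted with a few lines of the AWS SDK for Python (boto3). This is a minimal sketch; the queue and job definition names are hypothetical placeholders for resources you would create beforehand:

```python
import boto3

# Assumes AWS credentials and a default region are already configured.
batch = boto3.client("batch")

# Submit a job to an existing queue using an existing job definition.
# "my-queue" and "my-job-def" are placeholder names for illustration.
response = batch.submit_job(
    jobName="example-job",
    jobQueue="my-queue",
    jobDefinition="my-job-def",
)

print("Submitted job:", response["jobId"])
```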

Core Components

AWS Batch consists of four main components:

  1. Jobs
  2. Job Definitions
  3. Job Queues
  4. Compute Environments

Jobs

A Job is the basic unit of work in AWS Batch.

Job Types

A job can be:

  • A shell script
  • A Linux executable
  • A Docker container image

Job Execution

Jobs run as containerized applications on:

  • AWS Fargate
  • Amazon EC2 (in your compute environment)

Job Components

Each job includes:

  • A name
  • Parameters defined in a job definition (e.g., memory, vCPU, environment variables)

Job Dependencies

Jobs can:

  • Depend on other jobs (referenced by job ID)
  • Wait for other jobs to complete successfully
  • Wait for specific resources to be available
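
As a sketch of dependencies in code, `submit_job` accepts a `dependsOn` list of job IDs that must succeed before the new job starts (queue and definition names below are hypothetical):

```python
import boto3

batch = boto3.client("batch")

# Submit a first job; Batch returns its generated job ID.
first = batch.submit_job(
    jobName="prepare-data",
    jobQueue="my-queue",          # placeholder queue name
    jobDefinition="my-job-def",   # placeholder job definition
)

# This job stays PENDING until "prepare-data" completes successfully.
second = batch.submit_job(
    jobName="train-model",
    jobQueue="my-queue",
    jobDefinition="my-job-def",
    dependsOn=[{"jobId": first["jobId"]}],
)
```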

Job Definitions

A job definition is a blueprint that specifies how your AWS Batch jobs are run.

Job Definition Specifications

  • Required vCPU and memory
  • An IAM role for access to other AWS services
  • Container properties, including the Docker image and entry point commands
  • Environment variables to configure the job
  • Mount points for persistent storage (e.g., Amazon EFS)
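
A minimal sketch of registering such a blueprint with boto3; the image, role ARN, and account ID are placeholders:

```python
import boto3

batch = boto3.client("batch")

response = batch.register_job_definition(
    jobDefinitionName="my-job-def",  # hypothetical name
    type="container",
    containerProperties={
        # Placeholder ECR image and entry point command.
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
        "command": ["python", "process.py"],
        # Required vCPU and memory (MiB) for each job.
        "resourceRequirements": [
            {"type": "VCPU", "value": "2"},
            {"type": "MEMORY", "value": "4096"},
        ],
        # IAM role the container assumes to reach other AWS services.
        "jobRoleArn": "arn:aws:iam::123456789012:role/my-batch-job-role",
        "environment": [{"name": "STAGE", "value": "prod"}],
    },
)
print("Registered:", response["jobDefinitionArn"])
```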

Key Benefits

  • Allows you to override key parameters like resource requirements and environment variables when submitting individual jobs
  • A single job definition can be reused across multiple jobs
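
For instance, one definition can serve many jobs by overriding the command and environment at submission time (all names are placeholders):

```python
import boto3

batch = boto3.client("batch")

# Reuse "my-job-def" but run a different command with extra environment.
batch.submit_job(
    jobName="nightly-report",
    jobQueue="my-queue",
    jobDefinition="my-job-def",
    containerOverrides={
        "command": ["python", "report.py", "--date", "2025-01-01"],
        "environment": [{"name": "LOG_LEVEL", "value": "debug"}],
    },
)
```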

Job Queues

A job queue is where your AWS Batch jobs wait to be scheduled onto a compute environment.

How Job Queues Work

  • You submit jobs to a queue
  • AWS Batch schedules them onto an available compute environment
  • Each queue can be associated with one or more compute environments

Priority Management

You can control scheduling order and priority:

  • The order of compute environments associated with the same queue (tried in sequence)
  • Priority values between different job queues (higher values are scheduled first)

Example Use Cases

  • High-priority queue: For urgent or time-sensitive jobs
  • Low-priority queue: For flexible, cost-sensitive workloads that can wait for lower-cost compute (e.g., Spot Instances)
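
A hedged sketch of the two-queue setup above with boto3; the compute environment ARNs are placeholders for environments that already exist:

```python
import boto3

batch = boto3.client("batch")

# Higher "priority" values are scheduled first.
batch.create_job_queue(
    jobQueueName="high-priority",
    state="ENABLED",
    priority=100,
    computeEnvironmentOrder=[
        # Placeholder ARN of an On-Demand compute environment.
        {"order": 1, "computeEnvironment": "arn:aws:batch:us-east-1:123456789012:compute-environment/on-demand-env"},
    ],
)

batch.create_job_queue(
    jobQueueName="low-priority",
    state="ENABLED",
    priority=1,
    computeEnvironmentOrder=[
        # Placeholder ARN of a Spot compute environment.
        {"order": 1, "computeEnvironment": "arn:aws:batch:us-east-1:123456789012:compute-environment/spot-env"},
    ],
)
```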

Compute Environment

A compute environment is a collection of compute resources (managed or unmanaged) used to run AWS Batch jobs.

Managed Compute Environment

  • AWS Batch provisions, scales, and terminates resources automatically
  • Compute types you can choose:
    • Fargate (including Fargate Spot)
    • EC2 instances, specified as instance types (e.g., c5.2xlarge), instance families (e.g., m5), or optimal to let AWS Batch choose from current-generation instance families
  • Configurable resource parameters:
    • vCPU limits: minimum, desired, maximum
    • Spot pricing: set acceptable Spot price as a % of On-Demand price
    • VPC subnets for networking locations
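
Putting those parameters together, here is a sketch of a managed Spot compute environment via boto3; every ARN, subnet, and security group is a placeholder for a resource in your account:

```python
import boto3

batch = boto3.client("batch")

batch.create_compute_environment(
    computeEnvironmentName="spot-env",  # hypothetical name
    type="MANAGED",
    state="ENABLED",
    computeResources={
        "type": "SPOT",
        # Avoids needing a separate Spot Fleet role.
        "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
        # vCPU limits: minimum, desired, maximum.
        "minvCpus": 0,
        "desiredvCpus": 0,
        "maxvCpus": 256,
        # Instance families; Batch picks appropriate sizes.
        "instanceTypes": ["c5", "m5"],
        # Pay at most 60% of the On-Demand price.
        "bidPercentage": 60,
        "subnets": ["subnet-0abc123"],        # placeholder subnet
        "securityGroupIds": ["sg-0abc123"],   # placeholder security group
        "instanceRole": "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
    },
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",
)
```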

Unmanaged Compute Environment

  • You manage EC2 resources in an Amazon ECS cluster
  • Responsible for provisioning, scaling, and terminating instances manually

Security

IAM Integration

  • AWS Batch uses AWS Identity and Access Management (IAM) to control access to jobs, queues, and compute environments
  • You can attach IAM roles to jobs to allow them to securely access other AWS resources (e.g., S3, DynamoDB)

IAM Roles in AWS Batch

  • Job Role: Assigned in job definitions; grants jobs permission to access services
  • Service Role (AWSBatchServiceRole): Allows AWS Batch to interact with other AWS services on your behalf
  • Instance Role (ecsInstanceRole): Required when using EC2 resources so instances can communicate with ECS and Batch
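
As a sketch of creating a job role with boto3: Batch jobs run as ECS tasks, so the role's trust policy allows ecs-tasks.amazonaws.com to assume it (the role name and attached policy are illustrative):

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy letting Batch job containers (ECS tasks) assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ecs-tasks.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="my-batch-job-role",  # hypothetical name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Example: give jobs read access to S3 via an AWS managed policy.
iam.attach_role_policy(
    RoleName="my-batch-job-role",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)
```

You would then reference this role's ARN as the jobRoleArn in a job definition, as in the Job Definitions sketch above.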

VPC Configuration

  • Jobs can run in private subnets with no internet access, enhancing network-level security
  • Use security groups and network ACLs to control inbound and outbound traffic to EC2 instances or Fargate tasks

Encryption

  • Supports encryption of job data at rest using AWS-managed or customer-managed keys in AWS KMS
  • Use HTTPS/TLS to encrypt data in transit

Logging & Auditing

  • Logs are sent to Amazon CloudWatch Logs for visibility
  • Use AWS CloudTrail to audit API activity related to Batch operations

Monitoring

Amazon CloudWatch Metrics

AWS Batch automatically publishes metrics to Amazon CloudWatch, such as:

  • JobAttempts: Number of job attempts
  • JobsSubmitted: Number of jobs submitted
  • JobsRunning: Number of jobs currently running
  • JobsSucceeded / JobsFailed: Job completion statistics

You can set CloudWatch alarms on these metrics to trigger notifications or automated actions.
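
For example, an alarm on failed jobs might look like the boto3 sketch below. The namespace and dimension are assumptions; verify which Batch metrics and dimensions are available in your account before relying on them:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="batch-jobs-failed",
    Namespace="AWS/Batch",    # assumed namespace; confirm in your account
    MetricName="JobsFailed",
    Dimensions=[{"Name": "JobQueue", "Value": "my-queue"}],  # assumed dimension
    Statistic="Sum",
    Period=300,               # evaluate over 5-minute windows
    EvaluationPeriods=1,
    Threshold=5,              # alarm on more than 5 failures per window
    ComparisonOperator="GreaterThanThreshold",
    # Placeholder SNS topic to notify.
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:alerts"],
)
```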

Amazon CloudWatch Logs

  • You can configure your jobs to send stdout/stderr logs to CloudWatch Logs for each job container
  • Helps in debugging job failures, tracking performance, and auditing output
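
A sketch of routing logs explicitly via the awslogs driver in a job definition (the log group must already exist; names and region are placeholders):

```python
import boto3

batch = boto3.client("batch")

batch.register_job_definition(
    jobDefinitionName="my-logged-job-def",  # hypothetical name
    type="container",
    containerProperties={
        "image": "public.ecr.aws/amazonlinux/amazonlinux:latest",
        "command": ["echo", "hello"],
        "resourceRequirements": [
            {"type": "VCPU", "value": "1"},
            {"type": "MEMORY", "value": "2048"},
        ],
        # Send the container's stdout/stderr to a specific log group.
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/aws/batch/my-jobs",   # placeholder group
                "awslogs-region": "us-east-1",           # placeholder region
                "awslogs-stream-prefix": "demo",
            },
        },
    },
)
```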

AWS CloudTrail Integration

  • CloudTrail logs API activity related to AWS Batch, including job submission, job definition updates, and compute environment changes
  • Useful for auditing and compliance: track who did what, and when

Job Status Monitoring

Use the AWS Batch Console, AWS CLI, or SDKs to track job progress and states:

  • Job States: SUBMITTED, PENDING, RUNNABLE, STARTING, RUNNING, SUCCEEDED, FAILED
  • Supports tagging jobs for easier filtering and tracking
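
For instance, a small boto3 sketch that lists jobs in one state and prints their details (the queue name is a placeholder):

```python
import boto3

batch = boto3.client("batch")

# List jobs currently waiting for capacity in a queue.
waiting = batch.list_jobs(jobQueue="my-queue", jobStatus="RUNNABLE")
job_ids = [j["jobId"] for j in waiting["jobSummaryList"]]

# describe_jobs accepts up to 100 job IDs per call.
if job_ids:
    details = batch.describe_jobs(jobs=job_ids[:100])
    for job in details["jobs"]:
        print(job["jobName"], job["status"], job.get("statusReason", ""))
```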

Compute Environment Monitoring

Monitor the underlying EC2 instances or Fargate tasks via:

  • EC2 CloudWatch metrics (CPU, memory, disk)
  • ECS task metrics (for container-based insights)

Custom Monitoring Dashboards

Build custom CloudWatch dashboards to visualize AWS Batch activity across regions or workloads.

Pricing

  • AWS Batch service fee: Free
  • Billed resources: You pay only for the underlying EC2, Fargate, EBS, S3, etc. resources used
  • Compute options:
    • EC2 On-Demand: pay per second
    • EC2 Spot: save up to 90% versus On-Demand
    • Fargate: billed by vCPU-seconds and GB-seconds
  • Spot Instance cost-saving tips:
    • Set a maximum bid price
    • Use multiple instance types for better capacity
    • Use retries to handle interruptions
  • Data transfer: charged when data crosses Regions or goes out to the internet
  • Storage: charged for resources such as EBS volumes and S3 buckets used during job execution
  • Monitoring costs:
    • CloudWatch Logs/Metrics: billed for log storage and retrieval
    • CloudTrail: billed for log data
