AWS Batch Cheat Sheet - Complete Batch Computing Guide 2025
AWS Batch is a fully managed batch computing service that plans, schedules, and runs your containerized batch workloads.
Supported Workloads
AWS Batch supports workloads such as:
- Machine Learning (ML)
- Simulations
- Analytics
Compute Options
Works across the full range of AWS compute options:
- Amazon ECS
- Amazon EKS
- AWS Fargate
- EC2 Spot Instances
- EC2 On-Demand Instances
Key Features
- Package and submit jobs using the Console, CLI, or SDK (see the sketch after this list)
- Define dependencies and execution parameters for jobs
- Integrates with popular workflow engines:
- Pegasus WMS
- Luigi
- Nextflow
- Metaflow
- Apache Airflow
- AWS Step Functions
- Automatic provisioning & scaling of compute resources
- Supports ECS, EKS, and AWS Fargate
- Flexible compute choices:
- Use On-Demand or Spot Instances depending on job needs
- Default job queues and compute environments to help you get started quickly
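For example, a minimal job submission via the AWS SDK for Python (boto3) might look like the following sketch; the queue and job definition names are placeholders for resources you have already created.

```python
import boto3

batch = boto3.client("batch")

response = batch.submit_job(
    jobName="nightly-report",             # hypothetical job name
    jobQueue="my-job-queue",              # placeholder queue
    jobDefinition="my-job-definition:1",  # placeholder definition:revision
    containerOverrides={
        # Override job definition parameters at submission time
        "environment": [{"name": "RUN_DATE", "value": "2025-01-01"}],
    },
)
print(response["jobId"])
```

The equivalent call is available from the CLI as aws batch submit-job.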
Core Components
AWS Batch consists of four main components:
- Jobs
- Job Definitions
- Job Queues
- Compute Environments
Jobs
A Job is the basic unit of work in AWS Batch.
Job Types
A job can be:
- A shell script
- A Linux executable
- A Docker container image
Job Execution
Jobs run as containerized applications on:
- AWS Fargate
- Amazon EC2 (in your compute environment)
Job Components
Each job includes:
- A name
- Parameters defined in a job definition (e.g., memory, vCPU, environment variables)
Job Dependencies
Jobs can:
- Depend on other jobs (referenced by job ID)
- Wait for other jobs to successfully complete
- Wait for specific resources to be available
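As a sketch of chaining jobs, the dependsOn parameter of SubmitJob takes the ID of an earlier job; the names below are placeholders:

```python
import boto3

batch = boto3.client("batch")

# Submit a first job, then a second job that waits for it to succeed.
first = batch.submit_job(
    jobName="extract",
    jobQueue="my-job-queue",
    jobDefinition="my-job-definition:1",
)

second = batch.submit_job(
    jobName="transform",
    jobQueue="my-job-queue",
    jobDefinition="my-job-definition:1",
    # The second job stays PENDING until the first job succeeds.
    dependsOn=[{"jobId": first["jobId"]}],
)
```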
Job Definitions
A job definition is a blueprint that specifies how AWS Batch jobs are to be run.
Job Definition Specifications
- Required vCPU and memory
- An IAM role for access to other AWS services
- Container properties, including the Docker image and entry point commands
- Environment variables to configure the job
- Mount points for persistent storage (e.g., Amazon EFS)
Key Benefits
- Allows you to override key parameters like resource requirements and environment variables when submitting individual jobs
- A single job definition can be reused across multiple jobs
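A minimal registration sketch with boto3 follows; the image, role ARN, and file system ID are placeholders:

```python
import boto3

batch = boto3.client("batch")

batch.register_job_definition(
    jobDefinitionName="my-job-definition",  # placeholder name
    type="container",
    containerProperties={
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",  # placeholder image
        "command": ["python", "run.py"],    # entry point command
        "jobRoleArn": "arn:aws:iam::123456789012:role/my-batch-job-role",  # placeholder role
        "environment": [{"name": "STAGE", "value": "prod"}],
        "resourceRequirements": [
            {"type": "VCPU", "value": "2"},
            {"type": "MEMORY", "value": "4096"},  # MiB
        ],
        # Mount an EFS volume for persistent storage
        "volumes": [{
            "name": "shared",
            "efsVolumeConfiguration": {"fileSystemId": "fs-12345678"},  # placeholder
        }],
        "mountPoints": [{"sourceVolume": "shared", "containerPath": "/mnt/shared"}],
    },
)
```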
Job Queues
A job queue is where your AWS Batch jobs wait to be scheduled onto a compute environment.
How Job Queues Work
- You submit jobs to a queue
- AWS Batch manages placing them in an available compute environment
- Each queue can be associated with one or more compute environments
Priority Management
You can control scheduling order in two places:
- Among compute environments attached to the same queue, via their order
- Among job queues, via priority values (higher values are scheduled first)
Example Use Cases
- High-priority queue: For urgent or time-sensitive jobs
- Low-priority queue: For flexible, cost-sensitive workloads that can wait for lower-cost compute (e.g., Spot Instances)
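A sketch of that two-queue setup with boto3, assuming the named compute environments already exist:

```python
import boto3

batch = boto3.client("batch")

# Higher priority values are scheduled first.
batch.create_job_queue(
    jobQueueName="high-priority",
    priority=100,
    state="ENABLED",
    computeEnvironmentOrder=[
        {"order": 1, "computeEnvironment": "on-demand-env"},  # tried first
        {"order": 2, "computeEnvironment": "spot-env"},
    ],
)

batch.create_job_queue(
    jobQueueName="low-priority",
    priority=1,
    state="ENABLED",
    computeEnvironmentOrder=[
        {"order": 1, "computeEnvironment": "spot-env"},  # cheapest compute only
    ],
)
```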
Compute Environment
A compute environment is a collection of compute resources (managed or unmanaged) used to run AWS Batch jobs.
Managed Compute Environment
- AWS Batch provisions, scales, and terminates resources automatically
- Compute types you can choose:
- Fargate
- EC2 instances: specific instance types (e.g., c5.2xlarge, m5.12xlarge), instance families, or "optimal" to let AWS Batch choose current-generation types for you
- Configurable resource parameters:
- vCPU limits: minimum, desired, maximum
- Spot pricing: set the maximum Spot price you will pay as a percentage of the On-Demand price
- VPC subnets that control where compute resources launch
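As a sketch, a managed Spot compute environment could be created like this; all ARNs, subnets, and security groups are placeholders:

```python
import boto3

batch = boto3.client("batch")

batch.create_compute_environment(
    computeEnvironmentName="spot-env",  # placeholder name
    type="MANAGED",
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",  # placeholder ARN
    computeResources={
        "type": "SPOT",
        "minvCpus": 0,
        "desiredvCpus": 0,
        "maxvCpus": 256,
        "instanceTypes": ["optimal"],   # let Batch pick instance types
        "bidPercentage": 60,            # pay at most 60% of On-Demand
        "subnets": ["subnet-0abc", "subnet-0def"],  # placeholder subnets
        "securityGroupIds": ["sg-0123"],            # placeholder security group
        "instanceRole": "ecsInstanceRole",          # instance profile for ECS/Batch
        "spotIamFleetRole": "arn:aws:iam::123456789012:role/AmazonEC2SpotFleetRole",  # placeholder
    },
)
```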
Unmanaged Compute Environment
- You manage EC2 resources in an Amazon ECS cluster
- Responsible for provisioning, scaling, and terminating instances manually
Security
IAM Integration
- AWS Batch uses AWS Identity and Access Management (IAM) to control access to jobs, queues, and compute environments
- You can attach IAM roles to jobs to allow them to securely access other AWS resources (e.g., S3, DynamoDB)
IAM Roles in AWS Batch
- Job Role: Assigned in job definitions; grants jobs permission to access services
- Service Role (AWSBatchServiceRole): Allows AWS Batch to interact with other AWS services on your behalf
- Instance Role (ecsInstanceRole): Required when using EC2 resources so instances can communicate with ECS and Batch
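For illustration, a job role must trust ecs-tasks.amazonaws.com, since Batch containers run as ECS tasks; the role name and attached policy below are placeholder choices:

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy allowing ECS tasks (and therefore Batch job containers)
# to assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ecs-tasks.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="my-batch-job-role",  # placeholder name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Grant the job read access to S3 (managed policy used for brevity)
iam.attach_role_policy(
    RoleName="my-batch-job-role",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)
```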
VPC Configuration
- Jobs can run in private subnets with no internet access, enhancing network-level security
- Use security groups and network ACLs to control inbound and outbound traffic to EC2 instances or Fargate tasks
Encryption
- Supports encryption of job data at rest using AWS-managed or customer-managed keys in AWS KMS
- Use HTTPS/TLS to encrypt data in transit
Logging & Auditing
- Logs are sent to Amazon CloudWatch Logs for visibility
- Use AWS CloudTrail to audit API activity related to Batch operations
Monitoring
Amazon CloudWatch Metrics
AWS Batch automatically publishes metrics to Amazon CloudWatch, such as:
- JobAttempts: Number of job attempts
- JobsSubmitted: Number of jobs submitted
- JobsRunning: Number of jobs currently running
- JobsSucceeded / JobsFailed: Job completion statistics
You can set CloudWatch alarms on these metrics to trigger notifications or automated actions.
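A minimal sketch of such an alarm with boto3; the AWS/Batch namespace and JobsFailed metric name follow the list above, but verify them against the metrics actually published in your account:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when any job fails within a 5-minute window.
cloudwatch.put_metric_alarm(
    AlarmName="batch-jobs-failed",
    Namespace="AWS/Batch",   # assumed namespace, per the metric list above
    MetricName="JobsFailed",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:alerts"],  # placeholder SNS topic
)
```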
Amazon CloudWatch Logs
- You can configure your jobs to send stdout/stderr logs to CloudWatch Logs for each job container
- Helps in debugging job failures, tracking performance, and auditing output
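One way to route container logs explicitly is the awslogs driver in the job definition's containerProperties (by default, jobs log to the /aws/batch/job log group); the log group and image below are placeholders:

```python
import boto3

batch = boto3.client("batch")

# Send this job's stdout/stderr to a named CloudWatch Logs group.
batch.register_job_definition(
    jobDefinitionName="my-logged-job",  # placeholder name
    type="container",
    containerProperties={
        "image": "public.ecr.aws/amazonlinux/amazonlinux:latest",
        "command": ["echo", "hello"],
        "resourceRequirements": [
            {"type": "VCPU", "value": "1"},
            {"type": "MEMORY", "value": "2048"},
        ],
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/batch/my-app",  # pre-created log group
                "awslogs-region": "us-east-1",
                "awslogs-stream-prefix": "job",
            },
        },
    },
)
```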
AWS CloudTrail Integration
- CloudTrail logs API activity related to AWS Batch, including job submission, job definition updates, and compute environment changes
- Useful for auditing and compliance - track who did what, when
Job Status Monitoring
Use the AWS Batch Console, AWS CLI, or SDKs to track job progress and states:
- Job States: SUBMITTED, PENDING, RUNNABLE, STARTING, RUNNING, SUCCEEDED, FAILED
- Supports tagging jobs for easier filtering and tracking
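For example, polling job states from the SDK; the queue name is a placeholder:

```python
import boto3

batch = boto3.client("batch")

# List jobs in a queue by state, then inspect each one in detail.
runnable = batch.list_jobs(jobQueue="my-job-queue", jobStatus="RUNNABLE")

for summary in runnable["jobSummaryList"]:
    detail = batch.describe_jobs(jobs=[summary["jobId"]])["jobs"][0]
    print(detail["jobName"], detail["status"], detail.get("statusReason", ""))
```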
Compute Environment Monitoring
Monitor the underlying EC2 instances or Fargate tasks via:
- EC2 CloudWatch metrics (CPU, memory, disk)
- ECS task metrics (for container-based insights)
Custom Monitoring Dashboards
Build custom CloudWatch dashboards to visualize AWS Batch activity across regions or workloads.
Pricing
| Component | Details |
| --- | --- |
| AWS Batch Service Fee | Free; there is no additional charge for AWS Batch itself |
| Billed Resources | You pay only for the underlying resources your jobs use (EC2, Fargate, EBS, S3, etc.) |
| Compute Options | EC2 On-Demand: pay per second; EC2 Spot: save up to 90% vs. On-Demand; Fargate: billed per vCPU-second and GB-second |
| Spot Instances (Cost-saving) | Set a maximum Spot price; use multiple instance types for better capacity; use retries to handle interruptions |
| Data Transfer | Charged when data moves across Regions or out to the internet |
| Storage Charges | Charged for resources such as EBS volumes and S3 buckets used during job execution |
| Monitoring Costs | CloudWatch Logs/Metrics: billed for log storage and retrieval; CloudTrail: billed for log data |