Docker Best Practices: A Comprehensive Technical Guide

date: Nov 28, 2025
slug: docker-best-practices
status: Published
tags: Docker Compose, Docker, Linux, Ubuntu, DevOps, DevSecOps, EC2, GCP, Kubernetes
summary: This guide provides an in-depth, technically rigorous exploration of Docker best practices for developers with solid technical backgrounds. It covers everything from installation on Ubuntu 24.04 LTS through production deployment strategies on cloud platforms, Kubernetes, and bare metal servers.
type: Post
Welcome back to the roadmap again.
In our last post, we mapped out the grand journey of becoming a Cloud Engineer. We talked about Linux, Networking, and dipped our toes into the container waters.
Today, we are diving into the deep end.
Docker is deceptively simple. You type docker run, magic happens, and you feel like a wizard. But there is a massive difference between "I got the container running" and "This container is production-ready, secure, and won't eat my cloud bill for breakfast."

The "Docker Run" Rabbit Hole

We have all been there. You start with a simple need: run a database.
You: docker run postgres
Docker: Runs.
You: "Wait, I need a password."
You: docker run -e POSTGRES_PASSWORD=secret postgres
You: "I need it to persist data."
You: docker run -e POSTGRES_PASSWORD=secret -v ./data:/var/lib/postgresql/data postgres
You: "I need it on a specific network, exposed on port 5432, and named 'my-db'..."
Suddenly, your terminal looks like this:
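Something like this (a representative sketch; the names and paths are illustrative):

```bash
docker run -d \
  --name my-db \
  --network my-app-net \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=secret \
  -e POSTGRES_USER=appuser \
  -v ./data:/var/lib/postgresql/data \
  --restart unless-stopped \
  postgres
```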
If you are pasting this into a sticky note on your desktop, stop. You have entered the "Docker Run Rabbit Hole."
The solution? Infrastructure as Code. In Docker land, that starts with mastering the Dockerfile.
We are going to move from being Container Cowboys (yee-haw, sudo docker run --privileged!) to Docker Craftsmen.
Grab your coffee. Let’s optimize some layers.
From Installation to Production-Ready Deployments
This guide provides an in-depth, technically rigorous exploration of Docker best practices for developers with solid technical backgrounds. It covers everything from installation on Ubuntu 24.04 LTS through production deployment strategies on cloud platforms, Kubernetes, and bare metal servers. No prior Docker knowledge is assumed, but the goal is production-level mastery.

1. Introduction

Docker is a containerization platform that packages applications with their dependencies into isolated, portable containers. Unlike virtual machines that virtualize hardware, containers virtualize the operating system, sharing the host kernel while maintaining process isolation. This makes containers lightweight, fast to start, and resource-efficient.
Why Docker matters for production deployments:
  • Consistency: The same container runs identically across development, staging, and production environments, eliminating "works on my machine" issues.
  • Isolation: Applications run in isolated environments with defined resource limits and security boundaries.
  • Efficiency: Containers start in milliseconds and use fewer resources than VMs because they share the host OS kernel.
  • Scalability: Container orchestration platforms can scale applications horizontally with ease.
  • DevOps Integration: Docker integrates seamlessly with CI/CD pipelines for automated testing and deployment.
Container vs Virtual Machine: A virtual machine includes a full operating system with its own kernel, while a container shares the host's kernel and only packages the application and its dependencies. This architectural difference means containers are 10-100x smaller and start 10-100x faster than VMs.

2. Installation

2.1 Installing Docker Engine on Ubuntu 24.04 LTS

Docker Engine is the core runtime that builds and runs containers. On Ubuntu 24.04, you'll install Docker from the official Docker repository to ensure you get the latest stable version.
Prerequisites verification:
Before installing Docker, verify your system meets the requirements:
Understanding technical terms:
  • Kernel: The core of the operating system that manages hardware and system resources
  • 64-bit system: Computer architecture that can address more than 4GB of RAM and process data in 64-bit chunks
  • Repository: A storage location from which software packages are retrieved and installed
Step 1: Update system packages
Update your package index to ensure you're installing the latest versions:
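```bash
sudo apt update
sudo apt upgrade -y
```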
What this does: apt update refreshes the list of available packages from configured repositories, while apt upgrade installs newer versions of installed packages.
Step 2: Install required dependencies
Install prerequisite packages that Docker needs:
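```bash
sudo apt install -y apt-transport-https ca-certificates curl \
  software-properties-common lsb-release
```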
What each package does:
  • apt-transport-https: Allows apt to retrieve packages over HTTPS (secure HTTP)
  • ca-certificates: Contains trusted Certificate Authority certificates for verifying SSL connections
  • curl: Command-line tool for transferring data using URLs
  • software-properties-common: Provides scripts for managing software repositories
  • lsb-release: Provides Linux Standard Base information about your distribution
Step 3: Add Docker's GPG key
Add Docker's official GPG key to verify package authenticity:
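```bash
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
```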
Technical explanation:
  • GPG (GNU Privacy Guard): Encryption software used to verify that packages haven't been tampered with
  • Dearmor: Converts ASCII-armored GPG key to binary format that apt can use
  • chmod a+r: Makes the key file readable by all users (required for apt to access it)
Step 4: Add Docker repository
For Ubuntu 24.04 (codename "noble"), Docker packages are available. Add the repository:
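```bash
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
```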
What this command does:
  • dpkg --print-architecture: Detects your CPU architecture (amd64, arm64, etc.)
  • lsb_release -cs: Gets your Ubuntu codename (noble for 24.04)
  • tee: Writes the repository configuration to a file while showing no output (> /dev/null)
Note: If Docker doesn't yet support the "noble" codename, substitute "jammy" (Ubuntu 22.04) for $(lsb_release -cs) in the repository line above as a workaround.
Step 5: Install Docker Engine
Install Docker Engine, CLI, and required plugins:
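```bash
sudo apt install -y docker-ce docker-ce-cli containerd.io \
  docker-buildx-plugin docker-compose-plugin
```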
Package breakdown:
  • docker-ce: Docker Community Edition engine (the core daemon that runs containers)
  • docker-ce-cli: Command-line interface for interacting with Docker
  • containerd.io: Container runtime that manages the complete container lifecycle
  • docker-buildx-plugin: Extended build capabilities with BuildKit for advanced features
  • docker-compose-plugin: Tool for defining and running multi-container applications
Technical terms:
  • Daemon: A background process that runs continuously and handles requests
  • Runtime: Software that executes and manages running containers
  • BuildKit: Docker's next-generation build system with improved performance and caching
Step 6: Enable and start Docker service
Configure Docker to start automatically on boot:
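```bash
sudo systemctl enable docker
sudo systemctl start docker
sudo systemctl status docker
```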
What these commands do:
  • enable: Creates symbolic links so Docker starts automatically at boot
  • start: Starts the Docker daemon immediately
  • status: Shows whether Docker is running and displays recent log entries
Press Ctrl+C to exit the status view.

2.2 Installing Docker Compose v2

Docker Compose is a tool for defining and running multi-container applications using YAML configuration files. Compose v2 was rewritten in Go and is now a Docker CLI plugin (accessed via docker compose rather than the old docker-compose command).
Installation via apt (recommended)
Docker Compose v2 is included when you install docker-compose-plugin:
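Verify it with:

```bash
docker compose version
```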
Manual installation (if needed)
To install a specific version or the latest release manually:
System-wide installation (for all users):
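A sketch of a system-wide manual install (substitute the release you actually need for v2.24.6):

```bash
sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo curl -SL \
  https://github.com/docker/compose/releases/download/v2.24.6/docker-compose-linux-x86_64 \
  -o /usr/local/lib/docker/cli-plugins/docker-compose
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
docker compose version
```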
Technical note: Compose v2 uses a space (docker compose) instead of a hyphen (docker-compose). The old syntax is deprecated but still works if you install the legacy version.

2.3 Verifying Installation

Verify Docker is working correctly:
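```bash
sudo docker run hello-world
```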
What the test container does:
  1. Docker searches for the hello-world image locally
  2. Since it's not found, Docker pulls it from Docker Hub (the default registry)
  3. Docker creates a container from the image
  4. The container runs, prints a success message, and exits
Expected output:
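Abridged:

```
Hello from Docker!
This message shows that your installation appears to be working correctly.
...
```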
Verify system information:
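```bash
sudo docker info
```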
This displays comprehensive information including:
  • Server version
  • Storage driver (typically overlay2 on Ubuntu)
  • Number of containers and images
  • Docker root directory (/var/lib/docker)
  • Logging driver (default is json-file)
  • Operating system details
  • CPU and memory information

2.4 Post-Installation Setup (Non-Root User)

By default, Docker requires sudo for all commands because the Docker daemon runs as root and owns the Unix socket /var/run/docker.sock. For convenience and security, add your user to the docker group.
Add user to docker group:
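```bash
sudo usermod -aG docker $USER
newgrp docker   # or log out and back in for the change to take effect
```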
Technical explanation:
  • usermod: Command to modify user account properties
  • -aG: Append to group (without removing from other groups)
  • $USER: Environment variable containing your username
  • Unix socket: Inter-process communication mechanism (file-based)
Verify non-root access:
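```bash
docker run hello-world   # note: no sudo
```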
Security consideration: Users in the docker group have effective root access to the host system because they can mount host directories and run privileged containers. Only add trusted users to this group.

3. Low-Level Docker Understanding

Understanding Docker CLI flags deeply is crucial for production deployments. This section explains important flags with practical examples and the reasoning behind their use.

3.1 Docker Run Flags

The docker run command creates and starts containers. Its general syntax is:
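```
docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
```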
Basic execution flags:
  • -d, --detach: Run container in background (detached mode)
When to use: Production services, long-running processes, anything not requiring interactive terminal.
  • -i, -t (combined as -it): Interactive terminal
When to use: Debugging, running shell commands, development workflows.
Technical explanation:
  • STDIN: Standard input stream (keyboard input)
  • TTY: TeleTypewriter, a terminal interface that allows line editing and signal handling
  • Pseudo-TTY: Software emulation of a terminal
  • --rm: Automatically remove container when it exits
When to use: One-off tasks, CI/CD jobs, testing. Prevents accumulation of stopped containers.
  • --name: Assign custom name to container
When to use: Production deployments, containers you reference frequently. Without --name, Docker assigns random names like zealous_darwin.
Environment and configuration flags:
  • -e, --env: Set environment variables
  • --env-file: Load variables from file
Security warning: Environment variables are visible in docker inspect and process listings. Never store secrets in environment variables in production. Use Docker secrets or external secret managers instead.
  • -w, --workdir: Set working directory inside container
  • --hostname: Set container hostname
  • -u, --user: Run as specific user
When to use: Security best practice. Never run production containers as root unless absolutely necessary.
Resource flags (covered in detail in section 3.5):
  • --memory, -m: Memory limit
  • --cpus: CPU limit
Exit behavior flags:
  • --restart: Restart policy
Policy explanations:
  • no: Never restart automatically
  • always: Always restart, even after daemon restart
  • unless-stopped: Restart unless manually stopped (recommended for production)
  • on-failure:N: Restart only if container exits with non-zero code, max N times
When to use: Production deployments require restart policies to recover from crashes. Use unless-stopped for most services.
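Putting these flags together, a production-style invocation might look like this (image name, variables, and port are placeholders):

```bash
docker run -d \
  --name my-api \
  --restart unless-stopped \
  -e NODE_ENV=production \
  -p 8080:8080 \
  myapp:1.4.2
```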

3.2 Docker Build Flags

The docker build command creates images from Dockerfiles.
Basic build flags:
  • -t, --tag: Name and tag image
Tag naming convention: [registry/][namespace/]name[:tag][@digest]
  • -f, --file: Specify Dockerfile location
. (build context): Directory containing files needed for build
Technical explanation: The build context is sent to the Docker daemon. All COPY and ADD commands in the Dockerfile are relative to this context. Large contexts slow builds.
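For example, with a Dockerfile kept under docker/ and the current directory as context:

```bash
docker build -t myapp:1.0 -f docker/Dockerfile .
```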
Build argument flags:
  • --build-arg: Pass build-time variables
  • --target: Build specific stage in multi-stage Dockerfile
Cache management flags:
  • --no-cache: Build without using cache
When to use: When cache is causing issues, testing clean builds, or deploying security updates that must propagate through all layers.
  • --cache-from: Use external cache source
BuildKit cache flags (requires DOCKER_BUILDKIT=1):
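A sketch using buildx with a registry-backed cache (the registry URL is a placeholder, and the builder must support registry caches):

```bash
docker buildx build \
  --cache-from type=registry,ref=registry.example.com/myapp:buildcache \
  --cache-to type=registry,ref=registry.example.com/myapp:buildcache,mode=max \
  -t myapp:1.0 .
```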
Cache modes:
  • mode=min: Export only final stage layers (default)
  • mode=max: Export all layers including intermediate stages
When to use mode=max: Multi-stage builds with expensive intermediate stages (installing dependencies, compiling code).
  • --no-cache-filter: Ignore cache for specific stages
Platform flags:
  • --platform: Build for specific platform
Technical explanation: Docker uses QEMU to emulate different architectures when building cross-platform images.

3.3 Networking Flags

Docker networking enables container communication.
Port publishing flags:
  • -p, --publish: Publish container port to host
  • -P, --publish-all: Publish all exposed ports to random host ports
Network mode flags:
  • --network: Connect to specific network
Network modes explained:
  1. bridge (default): Containers get private IP addresses and can communicate through Docker's virtual network. Port mapping required for external access.
  2. host: Container uses host's network stack directly. No port mapping needed. Container's port 80 is accessible at host's port 80.
When to use host mode: High-performance networking applications, applications requiring specific network interfaces, when port mapping overhead is unacceptable. Security warning: Host mode removes network isolation.
  3. none: No networking. Container is completely isolated.
  4. Custom networks: Bridge networks you create. Provides automatic DNS resolution between containers.
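A minimal sketch of a custom network (the app image and port are placeholders):

```bash
docker network create app-net
docker run -d --name db --network app-net postgres:16
docker run -d --name api --network app-net -p 8080:8080 myapp:1.0
# "api" can now reach the database simply at the hostname "db"
```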
DNS and hostname flags:
  • --dns: Set custom DNS server
  • --add-host: Add entry to /etc/hosts
  • --link (deprecated): Link to another container

3.4 Volume and Storage Flags

Volumes persist data beyond container lifecycle.
Volume types:
  1. Named volumes: Managed by Docker, stored in /var/lib/docker/volumes/
  2. Bind mounts: Mount host directory into container
  3. tmpfs mounts: Store in host memory (Linux only)
  • -v, --volume: Mount volume (older syntax)
  • --mount: Mount volume (preferred syntax)
Why --mount is preferred: More explicit, supports all options, better error messages.
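The same mount expressed both ways (names and paths are illustrative):

```bash
# Older -v syntax
docker run -d -v mydata:/app/data nginx:1.27

# Preferred --mount syntax
docker run -d --mount type=volume,source=mydata,target=/app/data nginx:1.27

# Bind mount with --mount
docker run -d --mount type=bind,source="$(pwd)"/src,target=/app/src,readonly nginx:1.27
```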
tmpfs mount flags:
  • --tmpfs: Create tmpfs mount
tmpfs options:
  • size: Maximum size (default: 50% of host RAM)
  • mode: File permissions in octal (default: 1777)
When to use tmpfs: Storing sensitive temporary data (passwords, session tokens), temporary caches, high-performance temporary storage. Data is lost when container stops.
Volume management commands:
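```bash
docker volume create mydata
docker volume ls
docker volume inspect mydata
docker volume rm mydata
docker volume prune   # remove all unused volumes
```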
Best practices:
  1. Use named volumes for data persistence: Easier to manage than bind mounts
  2. Use bind mounts for development: Live code updates without rebuilding
  3. Use tmpfs for secrets and temporary data: Never persisted to disk
  4. Never store data in container's writable layer: Lost when container is removed

3.5 Resource Constraint Flags

Resource limits prevent containers from consuming excessive host resources.
Memory limits:
  • --memory, -m: Maximum memory
Memory units: b (bytes), k (kilobytes), m (megabytes), g (gigabytes).
  • --memory-reservation: Soft limit (memory reservation)
How it works: The container should stay below the reservation; Docker reclaims memory toward the reservation when host memory runs low. The hard limit (-m) is always enforced.
  • --memory-swap: Total memory + swap
When to use memory limits: Always set memory limits in production to prevent OOM (Out of Memory) crashes affecting host.
CPU limits:
  • --cpus: Maximum CPU usage
How it works: If container has 100% CPU load and --cpus 0.5, Docker throttles it to use only 50% of one CPU over time.
  • --cpu-shares: Relative CPU priority
How it works: Only matters when CPU is contested. High-priority (2048) gets 4x the CPU time of low-priority (512) when both are CPU-bound.
When to use: Multi-tenant environments, background tasks vs. user-facing services.
  • --cpuset-cpus: Pin to specific CPU cores
When to use: NUMA optimization, CPU-intensive applications that benefit from cache locality, isolating workloads.
Technical explanation:
  • NUMA (Non-Uniform Memory Access): Multi-processor systems where memory access speed depends on CPU location
  • Cache locality: Keeping process on same CPU improves performance due to CPU cache
  • --cpu-period and --cpu-quota: Precise CPU throttling
How it works: Within each cpu-period microseconds, container can use cpu-quota microseconds of CPU time.
Monitoring resource usage:
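```bash
docker stats                      # live view of all running containers
docker stats --no-stream my-api   # one-shot snapshot of a single container
```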
Output columns:
  • CPU %: CPU usage percentage
  • MEM USAGE / LIMIT: Current memory / Maximum memory
  • MEM %: Memory usage percentage
  • NET I/O: Network bytes in/out
  • BLOCK I/O: Disk bytes read/written
  • PIDS: Number of processes
Production resource limits strategy:
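A sketch for a typical service (the values are placeholders to be tuned against real usage):

```bash
docker run -d \
  --memory 512m \
  --memory-reservation 256m \
  --cpus 1.5 \
  myapp:1.0
```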
Guidelines:
  1. Set memory limits 20-30% above normal usage to handle spikes
  2. Set CPU limits to prevent noisy neighbor problems
  3. Monitor with docker stats and adjust based on real usage
  4. Use resource limits in docker-compose for consistency

3.6 Security Flags

Security flags reduce attack surface and enforce least privilege.
Read-only filesystem:
  • --read-only: Mount root filesystem as read-only
Why this matters: Attackers can't modify system files, install malware, or persist changes. Significantly reduces damage from compromised containers.
When to use: Production containers that don't need to write to disk (most stateless applications). Identify write locations during development and mount as tmpfs.
Capability flags:
Linux capabilities split root privileges into granular permissions. Docker drops many capabilities by default.
  • --cap-drop: Drop capabilities
  • --cap-add: Add capabilities
Common capabilities:
  • NET_BIND_SERVICE: Bind ports below 1024
  • NET_RAW: Use raw sockets (ping, packet capture)
  • SYS_ADMIN: Mount filesystems, various admin tasks
  • CHOWN: Change file ownership
  • DAC_OVERRIDE: Bypass file permission checks
Secure container example:
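(The image name and port are placeholders.)

```bash
docker run -d \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  --security-opt no-new-privileges \
  --user 1000:1000 \
  -p 80:80 \
  myapp:1.0
```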
Breakdown:
  1. --read-only: Immutable filesystem
  2. --tmpfs /tmp: Writable temp with security flags
  3. --cap-drop ALL: Drop all capabilities
  4. --cap-add NET_BIND_SERVICE: Allow port 80 binding
  5. --security-opt no-new-privileges: Prevent privilege escalation
  6. --user 1000:1000: Run as non-root user
Security profiles:
  • --security-opt: Apply security profiles
Security profiles explained:
  1. Seccomp: Filters system calls container can make. Docker's default profile blocks ~44 dangerous syscalls.
  2. AppArmor: Mandatory Access Control (MAC) system. Restricts file access, network access, capabilities.
  3. SELinux: Another MAC system common on Red Hat/CentOS. Labels resources and enforces policies.
Never disable security profiles in production.
Privileged mode (dangerous):
  • --privileged: Give container all host capabilities
What it does: Disables all security features, gives access to all devices, allows mounting filesystems. Container has nearly root access to host.
When to use: Docker-in-Docker, hardware access (GPU, USB devices), very specific admin tasks. Never use in production unless absolutely necessary.

4. Dockerfile Best Practices

A Dockerfile is a text document containing instructions to build Docker images. Well-written Dockerfiles create small, secure, cacheable images.

4.1 Multi-Stage Builds

Multi-stage builds use multiple FROM instructions in a single Dockerfile, allowing you to separate build environment from runtime environment.
Problem multi-stage builds solve:
Traditional Dockerfile includes build tools, source code, dependencies, and compiled artifacts—all in final image. This creates large images with unnecessary attack surface.
Single-stage Dockerfile (problematic):
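A sketch of the problem for a Node.js app:

```dockerfile
FROM node:20
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "server.js"]
# Final image carries the full node:20 toolchain (roughly 1GB)
```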
Multi-stage Dockerfile (optimized):
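And a multi-stage version of the same app (assumes a build script that outputs to dist/):

```dockerfile
# Build stage: full toolchain, dev dependencies
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: only what's needed to run
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]
```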
Key concepts:
  1. FROM ... AS name: Names a build stage for later reference
  2. COPY --from=stage: Copies files from another stage
  3. Final stage determines image size: Only last stage content is in final image
Benefits:
  • Smaller images: 5-10x size reduction by excluding build tools
  • Better security: Fewer packages = smaller attack surface
  • Single Dockerfile: One file for all environments
  • Better caching: Build stages cached independently
Go application multi-stage example:
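A sketch (the package path ./cmd/server is a placeholder for your module layout):

```dockerfile
FROM golang:1.21-alpine AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

FROM alpine:3.19
COPY --from=builder /app /app
ENTRYPOINT ["/app"]
```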
Why Alpine base image: Alpine Linux is minimal (~5MB) but includes package manager. Perfect for production.
Python application multi-stage example:
Advanced: Multiple build stages for testing:
Build specific stages:
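Assuming stages named test and production:

```bash
docker build --target test -t myapp:test .
docker build --target production -t myapp:1.0 .
```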

4.2 Layer Ordering and Caching

Docker caches each layer. When a layer changes, all subsequent layers rebuild.
How layer caching works:
Each Dockerfile instruction creates a layer. Docker reuses cached layers if instruction and context haven't changed.
Bad layer ordering (rebuilds frequently):
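```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . .            # any source change invalidates everything below
RUN npm install
CMD ["node", "server.js"]
```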
Problem: Changing any source file invalidates COPY . . layer, forcing npm install to re-run even though dependencies didn't change.
Optimized layer ordering:
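```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./   # changes rarely
RUN npm ci              # cached until package files change
COPY . .                # changes often, but invalidates nothing above
CMD ["node", "server.js"]
```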
Caching strategy: Order instructions from least-frequently-changed to most-frequently-changed:
  1. Base image (rarely changes)
  2. System dependencies (changes occasionally)
  3. Application dependencies (changes sometimes)
  4. Application code (changes frequently)
Python example:
Combining commands to reduce layers:
Bad (many layers):
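```dockerfile
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*   # doesn't shrink the earlier layers
```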
Good (single layer):
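```dockerfile
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
```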
Why this matters: Each layer adds to image size. Combining commands and cleaning up in same layer prevents intermediate files from bloating image.
BuildKit cache mounts (advanced):
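A sketch for a Go build:

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.21-alpine AS builder
WORKDIR /src
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
    go build -o /app ./...
```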
What --mount=type=cache does: Mounts persistent cache directory that survives across builds. Go modules stay cached even when other layers change.

4.3 Base Image Selection

Base image choice affects security, size, and build times.
Base image options:
  1. Official language images (e.g., node:20, python:3.11, golang:1.21)
      • Pros: Easy to use, well-maintained, include language toolchain
      • Cons: Large size (300MB-1GB), includes unnecessary tools
      • Use case: Build stages in multi-stage builds
  2. Slim variants (e.g., node:20-slim, python:3.11-slim)
      • Pros: Smaller (~150-300MB), fewer vulnerabilities
      • Cons: Missing some tools, may need manual package installation
      • Use case: Production stages when you need glibc
  3. Alpine variants (e.g., node:20-alpine, python:3.11-alpine)
      • Pros: Very small (~50MB), excellent security record
      • Cons: Uses musl libc (not glibc), some packages missing, occasional compatibility issues
      • Use case: Production images, microservices, when size matters
  4. Distroless images (Google's distroless)
      • Pros: Minimal attack surface (no shell, no package manager)
      • Cons: Hard to debug, requires multi-stage builds
      • Use case: Maximum security production deployments
  5. Scratch (empty image)
      • Pros: Smallest possible (just your binary)
      • Cons: No shell, no debugging tools, no CA certificates
      • Use case: Static binaries (Go, Rust), ultra-minimal images
Choosing base images:
Security consideration: Use specific versions, not latest:
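```dockerfile
# Bad: mutable, changes without warning
FROM node:latest

# Better: pinned version tag
FROM node:20.11.1-alpine

# Best: immutable digest (use the real digest from your registry)
FROM node:20.11.1-alpine@sha256:<digest>
```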
Why pin versions: latest tag is mutable and can break builds or introduce vulnerabilities. Digests are immutable and guarantee exact image.

4.4 Minimizing Image Size

Smaller images = faster deployments, less storage, smaller attack surface.
Techniques for size reduction:
1. Use multi-stage builds (covered in 4.1)
2. Use Alpine base images (50-150MB vs 300-1000MB)
3. Clean up in same layer:
Technical explanation: Each RUN creates a layer with filesystem changes. Deleting files in later layer doesn't remove them from earlier layer.
4. Use --no-install-recommends:
5. Remove build dependencies after use:
6. Don't install unnecessary packages:
7. Copy only necessary files:
8. Use .dockerignore (covered in 4.5)
Image size comparison example:
Checking image sizes:

4.5 Using .dockerignore

.dockerignore excludes files from build context, reducing build time and image size.
Why .dockerignore matters:
The build context (all files sent to Docker daemon) affects:
  1. Build speed: Larger context = slower transfer to daemon
  2. Cache invalidation: Irrelevant file changes invalidate cache
  3. Accidental file inclusion: Secrets, logs, temp files shouldn't be in images
How .dockerignore works:
Create .dockerignore in same directory as Dockerfile. It uses gitignore-like patterns:
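A starting point for a Node.js project (trim to your needs):

```
.git
.gitignore
node_modules
npm-debug.log
dist
coverage
.env
.vscode
```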
Pattern syntax:
Common patterns for different languages:
Node.js:
Python:
Go:
Java:
Dockerfile-specific .dockerignore:
You can create Dockerfile-specific ignore files:
Verifying .dockerignore works:
Best practices:
  1. Exclude .git: Git history doesn't belong in images
  2. Exclude dependencies: node_modules, vendor will be rebuilt
  3. Exclude build artifacts: Rebuilt during image build
  4. Exclude secrets: Never include .env, keys, certificates
  5. Include README: Often useful for documentation
  6. Keep it simple: Start with common patterns, add as needed
Security note: .dockerignore is your last defense against accidentally including secrets. Always use it.

4.6 Non-Root Users

Running containers as root is dangerous. If container is compromised, attacker has root access.
Why non-root users matter:
  1. Security: Limits damage from compromised containers
  2. Compliance: Many security policies require non-root
  3. Kubernetes: Some clusters block root containers by default
Creating non-root users:
Debian/Ubuntu syntax:
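```dockerfile
RUN groupadd --gid 1000 appuser && \
    useradd --uid 1000 --gid appuser --create-home --shell /usr/sbin/nologin appuser
USER appuser
```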
Alpine syntax (simpler):
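```dockerfile
RUN addgroup -g 1000 appuser && \
    adduser -u 1000 -G appuser -D appuser   # -D: no password
USER appuser
```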
Node.js example (built-in node user):
Python example:
Go example (nobody user):
Handling file permissions:
Common pitfalls:
Problem: User can't write to volumes
Solution: Set ownership in volume mount or use numeric UID
Problem: Port below 1024 requires root
Solution: Use port above 1024 or add NET_BIND_SERVICE capability
Best practices:
  1. Always create explicit user: Don't rely on default
  2. Use numeric UID/GID: More portable across systems
  3. Common UID: Use 1000 or app-specific UID (e.g., 3000)
  4. Set ownership: Use chown or COPY --chown
  5. Switch early: Run USER before copying sensitive files
  6. Test as non-root: Ensure app works without root
Verification:
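```bash
docker exec my-api whoami   # should print the non-root user
docker exec my-api id       # uid/gid should be non-zero
```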

4.7 Example Dockerfiles

Production-ready Dockerfile examples for common languages.
Node.js/Express Application:
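A sketch with the features listed below (the port, dist/ build output, and /health endpoint are assumptions about your app):

```dockerfile
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine
RUN apk add --no-cache dumb-init
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
RUN addgroup -g 1001 nodejs && adduser -u 1001 -G nodejs -D nodejs
USER nodejs
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["dumb-init", "node", "dist/server.js"]
```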
Key features:
  • Multi-stage build separates build and runtime
  • Alpine base for small size (~150MB vs ~1GB)
  • Non-root user (nodejs)
  • dumb-init for proper signal handling (graceful shutdown)
  • Health check for orchestration
  • Production-only dependencies
Python/Flask Application:
Key features:
  • Multi-stage build for smaller image
  • Only runtime dependencies in final stage
  • Non-root user
  • Gunicorn production server (not Flask dev server)
  • Environment variables for Python optimization
  • Health check using Python
Go Application (Minimal):
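A sketch (the module path and port are placeholders):

```dockerfile
FROM golang:1.21-alpine AS builder
RUN apk add --no-cache ca-certificates
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /server .

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /server /server
USER 65534:65534
EXPOSE 8080
ENTRYPOINT ["/server"]
```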
Key features:
  • Scratch base (minimal possible image ~10MB)
  • Static binary (no dependencies)
  • CA certificates for HTTPS requests
  • Non-root user (numeric UID)
  • Extremely small and secure
Java/Spring Boot Application:
Key features:
  • Multi-stage build (Maven in builder only)
  • JRE instead of JDK in production (smaller)
  • Container-aware JVM settings
  • Non-root user
  • Health check via Spring Actuator
  • Environment variable for JVM tuning
React/Nginx Application:
nginx.conf for non-root:
Key features:
  • Multi-stage build (Node for build, Nginx for serving)
  • Non-root nginx configuration
  • Port 8080 (non-privileged)
  • Security headers
  • Static asset optimization
Common patterns across all examples:
  1. Multi-stage builds: Separate build and runtime
  2. Non-root users: Security best practice
  3. Health checks: Enable orchestration monitoring
  4. Small base images: Alpine or slim variants
  5. Layer caching: Dependencies before source code
  6. Security: Drop privileges, minimal packages
  7. Production servers: Gunicorn, not Flask dev server; no nodemon

5. Image Versioning & Registry Strategy

Proper image versioning is critical for reliable deployments, rollbacks, and reproducibility.

5.1 Tags vs Digests

Image references have three forms:
  1. Tag: Human-readable label (e.g., myapp:v1.2.3)
  2. Digest: Immutable SHA256 hash (e.g., myapp@sha256:abc123...)
  3. Tag + Digest: Both for clarity and immutability
Tags are mutable - same tag can point to different images:
Problem with mutable tags:
  • Builds become non-reproducible
  • Security patches may reintroduce vulnerabilities
  • Hard to know exactly what's running in production
Digests are immutable - always reference exact image:
Getting image digest:
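```bash
docker images --digests
docker inspect --format='{{index .RepoDigests 0}}' myapp:v1.2.3
```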
Using digests in Dockerfile:
Using digests in docker-compose:
When to use digests:
  • Production deployments: Always use digests for reproducibility
  • Security scanning: Scan specific digest, not floating tag
  • Compliance: Prove exactly what's running
  • Rollbacks: Reference exact previous version
When tags are acceptable:
  • Development: Convenient to pull latest changes
  • CI builds: Build tagged, then extract digest for deployment

5.2 Why 'latest' is Bad

The latest tag is Docker's default but dangerous for production.
Problems with latest:
1. Not actually "latest"
latest is just a default tag name. It's only updated when explicitly pushed:
2. Mutable and unpredictable
Result: Production has mixed versions, causing inconsistent behavior.
3. Impossible to rollback
4. Breaks caching and reproducibility
Rebuild tomorrow = different image, possibly breaking changes.
5. Security vulnerabilities
Real-world incident: Node.js official images broke yarn support when latest was updated, breaking thousands of builds.
What to use instead:
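```yaml
# Bad
image: myapp:latest
# Good: pinned tag
image: myapp:v1.2.3
# Best: immutable digest
image: myapp@sha256:<digest>
```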
Exceptions where latest is acceptable:
  • Production: never ❌
  • Local development experiments: Convenient for testing new versions ✅
  • Automated daily builds: If you explicitly want latest ✅
Kubernetes example showing the problem:
Problem: Three pods might run three different versions as they're created at different times.
Solution:

5.3 Production Pinning Strategy

Production-ready tagging strategy ensures reliability and traceability.
Semantic Versioning (SemVer)
Use SemVer format: MAJOR.MINOR.PATCH
  • MAJOR: Breaking changes (1.x.x → 2.0.0)
  • MINOR: New features, backward compatible (1.1.x → 1.2.0)
  • PATCH: Bug fixes, backward compatible (1.1.1 → 1.1.2)
Tagging strategy:
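For example, after a CI build (the local image name is a placeholder):

```bash
docker tag myapp:build-123 myapp:2.3.5
docker tag myapp:build-123 myapp:2.3
docker tag myapp:build-123 myapp:2
docker tag myapp:build-123 myapp:latest
docker push --all-tags myapp
```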
Tag hierarchy:
  • myapp:2.3.5 - Immutable, specific version
  • myapp:2.3 - Tracks latest patch in 2.3.x
  • myapp:2 - Tracks latest minor in 2.x.x
  • myapp:latest - Tracks latest release
Production deployment uses specific version:
Additional metadata tags:
Example complete tagging:
Benefits:
  1. Specific version: Exactly know what's deployed
  2. Git SHA: Trace back to source code
  3. Build number: Track CI/CD build
  4. Rolling tags: Convenience for development
Image labels (metadata in image):
Inspect labels:
Registry security:
Enable tag immutability in registry (Harbor, ECR, ACR):
Benefit: Prevents accidentally overwriting tags.

5.4 CI/CD Build and Push Workflow

Automated image building and publishing in CI/CD pipelines.
Typical workflow:
  1. Developer pushes code to Git
  2. CI/CD triggers on push/merge
  3. Build Docker image with cache
  4. Run tests in container
  5. Tag image with version
  6. Push to registry
  7. Deploy to staging
  8. Run integration tests
  9. Manual approval (optional)
  10. Deploy to production
GitHub Actions example:
What this does:
  1. Triggers on push to main or version tags
  2. Logs into GitHub Container Registry
  3. Extracts version from Git tag
  4. Builds with BuildKit cache from registry
  5. Tags with multiple strategies (version, branch, SHA)
  6. Pushes image and cache
GitLab CI example:
Advanced: Multi-platform builds
Automated tagging patterns:
Build caching strategies:
Security scanning in CI/CD:
Best practices:
  1. Cache layers: Use BuildKit registry cache for faster builds
  2. Multi-stage builds: Keep CI build times low
  3. Scan images: Integrate security scanning
  4. Semantic versioning: Auto-tag from Git tags
  5. Immutable tags: Never overwrite version tags
  6. Prune old images: Cleanup unused images in registry

6. Docker Compose (Latest Spec)

Docker Compose defines multi-container applications in YAML files. Compose v2 uses the Docker CLI plugin architecture.

6.1 Core Compose Fields

Compose file structure (version 3.8+):
Service definition anatomy:
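A minimal sketch (service names, image, and ports are placeholders; under the modern Compose Specification the top-level version field is optional):

```yaml
services:
  web:
    build: .
    image: myapp:1.0
    ports:
      - "8080:8080"
    environment:
      NODE_ENV: production
    depends_on:
      - db
  db:
    image: postgres:16.2
    volumes:
      - db-data:/var/lib/postgresql/data

volumes:
  db-data:
```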

6.2 Services Configuration

Build configuration:
Image and container naming:
Port mapping:
Environment variables:
.env file example:
Variable substitution in compose file:
Command and entrypoint:
User specification:
Working directory:

6.3 Networks and Volumes

Networks:
Network isolation example:
Volumes:
Volume permissions:

6.4 Secrets and Configs

Secrets (sensitive data like passwords, keys):
Secrets are mounted at /run/secrets/<secret_name>:
Configs (non-sensitive configuration files):
Environment-based secrets (for non-Swarm):
Better secrets management (production):

6.5 Healthchecks and Dependencies

Healthchecks:
Healthcheck alternatives:
Dependencies:
Dependency conditions:
  • service_started: Wait for container to start (default)
  • service_healthy: Wait for healthcheck to pass
  • service_completed_successfully: Wait for container to exit with code 0
Restart policies:

6.6 Production-Ready Compose Example

Complete production example with all best practices:
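A condensed sketch illustrating those features (image names, registry, ports, and paths are placeholders; deploy.resources limits are honored by recent Compose v2 releases):

```yaml
x-logging: &default-logging
  driver: "local"
  options:
    max-size: "10m"
    max-file: "3"

services:
  api:
    image: registry.example.com/myapp:2.3.5
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      DB_HOST: db
      DB_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    networks:
      - frontend
      - backend
    depends_on:
      db:
        condition: service_healthy
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    logging: *default-logging

  db:
    image: postgres:16.2-alpine
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - backend
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    logging: *default-logging

networks:
  frontend:
  backend:
    internal: true

volumes:
  db-data:

secrets:
  db_password:
    file: ./secrets/db_password.txt
```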
Key production features:
  1. Resource limits: CPU and memory constraints
  2. Health checks: All services monitored
  3. Dependency order: Services wait for dependencies to be healthy
  4. Restart policies: Automatic recovery from crashes
  5. Log rotation: Prevents disk exhaustion
  6. Network isolation: Backend services not exposed
  7. Secrets management: Passwords not in environment
  8. Version pinning: Specific image versions, no latest
  9. YAML anchors: Reusable configuration blocks
Running production compose:
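```bash
docker compose -f docker-compose.prod.yml up -d
docker compose -f docker-compose.prod.yml ps
docker compose -f docker-compose.prod.yml logs -f api
```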

7. Security & Hardening

Securing Docker containers prevents attacks and limits damage from compromised containers.

7.1 Capabilities (cap-drop, cap-add)

Linux capabilities split root privileges into granular permissions.
Default Docker capabilities:
Docker containers start with these capabilities:
  • CHOWN: Change file ownership
  • DAC_OVERRIDE: Bypass file permission checks
  • FOWNER: Bypass permission checks on operations
  • FSETID: Don't clear setuid/setgid bits when a file is modified
  • KILL: Send signals to processes
  • SETGID: Set GID
  • SETUID: Set UID
  • SETPCAP: Modify capabilities
  • NET_BIND_SERVICE: Bind ports below 1024
  • NET_RAW: Use raw sockets
  • SYS_CHROOT: Use chroot
  • MKNOD: Create device nodes
  • AUDIT_WRITE: Write to audit log
  • SETFCAP: Set file capabilities
Dangerous capabilities Docker blocks:
  • SYS_ADMIN: Mount filesystems, load kernel modules
  • SYS_MODULE: Load/unload kernel modules
  • SYS_RAWIO: Raw I/O operations
  • SYS_PTRACE: Trace processes
  • SYS_BOOT: Reboot system
Drop all capabilities (most secure):
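```bash
docker run -d \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  myapp:1.0
```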
When to drop ALL capabilities:
Most applications don't need any special privileges. Start with dropping all, then add only what's needed.
Common capability needs:
Testing capabilities:

7.2 Seccomp and AppArmor

Seccomp filters system calls containers can make.
What Seccomp does:
Docker's default seccomp profile blocks ~44 dangerous syscalls including:
  • reboot
  • mount
  • swapon
  • kexec_load
  • init_module (load kernel modules)
Using default seccomp profile:
Custom seccomp profile:
Example custom profile (block execve):
Never disable seccomp in production:
AppArmor (Mandatory Access Control):
AppArmor restricts:
  • File access
  • Network access
  • Capabilities
  • IPC
Docker's default AppArmor profile (docker-default):
Custom AppArmor profile:
Example custom AppArmor profile:
Security options in production:
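A compose-level sketch (the image name is a placeholder):

```yaml
services:
  api:
    image: myapp:2.3.5
    security_opt:
      - no-new-privileges:true
      - apparmor:docker-default
    read_only: true
    tmpfs:
      - /tmp
```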
What each security option does:
  • no-new-privileges: Prevents privilege escalation via setuid binaries
  • seccomp:default: Blocks dangerous system calls
  • apparmor:docker-default: Restricts file/network access
  • read_only: Immutable filesystem
  • tmpfs with noexec,nosuid: Temp dir can't execute binaries or use setuid

7.3 Read-Only Filesystem

Why read-only matters:
If attacker compromises container, they can't:
  • Modify system files
  • Install malware
  • Persist backdoors
  • Replace binaries
  • Change configurations
Basic read-only:
Problem: Most apps need some writable locations:
Solution: Mount writable tmpfs (in-memory):
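```bash
docker run -d \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  --tmpfs /var/run:rw,noexec,nosuid,size=16m \
  myapp:1.0
```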
tmpfs options explained:
  • rw: Read-write
  • noexec: Can't execute binaries (prevents code injection)
  • nosuid: Ignores setuid bits (prevents privilege escalation)
  • size: Maximum size
Identifying writable locations:
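One practical approach: run the container writable once, exercise it, and use docker diff to list every path it touched:

```bash
docker run -d --name probe myapp:1.0
# ...generate some traffic, then:
docker diff probe    # A = added, C = changed, D = deleted paths
```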
Production example:
Benefits:
  • Prevents malware: Can't write executables
  • Prevents persistence: Changes lost on restart
  • Reduces attack surface: No writable system files
  • Compliance: Meets immutable infrastructure requirements
Performance note: tmpfs is RAM-based, so very fast. But uses memory from container's limit.

7.4 Resource Limits

Why resource limits are critical:
Without limits, a container can:
  • Consume all host memory (crash host)
  • Use 100% CPU (starve other containers)
  • Fill disk with logs
  • Cause OOM (Out of Memory) kills
Memory limits:
Memory limit behavior:
  1. Container tries to stay below reservation (256M)
  2. Can use up to limit (512M) if needed
  3. Exceeding limit → OOM kill
Setting appropriate memory limits:
CPU limits:
CPU limit behavior:
  • cpus: 1.5 → Maximum 150% CPU (1.5 cores)
  • Container throttled if it tries to exceed
  • Doesn't block other containers
CPU shares (priority):
How CPU shares work:
Only matter when CPU is contested:
  • Both containers running → high-priority gets 4x CPU
  • Only one running → uses full CPU regardless
Pinning to specific CPUs:
Production resource limits example:
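```yaml
services:
  api:
    image: myapp:2.3.5
    deploy:
      resources:
        limits:
          cpus: "1.5"
          memory: 512M
        reservations:
          cpus: "0.5"
          memory: 256M
```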
Monitoring resources:
PID limits (prevent fork bombs):
Storage limits (rootfs size):

7.5 Secrets Management

Never store secrets in:
  1. ❌ Environment variables (visible in docker inspect, process lists)
  2. ❌ Dockerfiles (baked into image layers)
  3. ❌ Git repositories (version history)
  4. ❌ Plain text files in images
Best practices:
1. Docker Secrets (Swarm only):
Mounted at /run/secrets/db_password (in-memory, read-only).
2. Docker Compose secrets (file-based):
3. BuildKit secret mounts (build-time secrets):
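A sketch (the secret id and source file are placeholders):

```dockerfile
# syntax=docker/dockerfile:1
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) npm ci
```

Built with:

```bash
docker build --secret id=npm_token,src=./npm_token.txt .
```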
4. External secret managers:
AWS Secrets Manager:
HashiCorp Vault:
5. Kubernetes Secrets:
Handling environment variables securely:
Secret rotation:
Security checklist:
  1. ✅ Store secrets in secret manager
  2. ✅ Mount as read-only files
  3. ✅ Use minimal permissions
  4. ✅ Rotate secrets regularly
  5. ✅ Audit secret access
  6. ✅ Never log secrets
  7. ✅ Use separate secrets per environment

7.6 Security Scanning

Scan images for vulnerabilities before deploying.
Tools:
  1. Trivy (recommended - free, comprehensive)
  2. Snyk
  3. Anchore
  4. Clair
  5. Docker Scout
Trivy scanning:
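```bash
trivy image myapp:2.3.5
trivy image --severity HIGH,CRITICAL myapp:2.3.5
```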
CI/CD integration:
Fail build on vulnerabilities:
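```bash
trivy image --exit-code 1 --severity CRITICAL myapp:2.3.5
```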
Docker Scout:
Remediation workflow:
  1. Scan image
  2. Identify vulnerabilities
  3. Update base image / dependencies
  4. Rebuild image
  5. Re-scan
  6. Deploy if clean
Example vulnerabilities and fixes:
Automated scanning schedule:

8. Performance & Observability

8.1 Log Drivers and Rotation

Docker's default logging driver (json-file) doesn't rotate logs, causing disk exhaustion.
Problem:
Solution: Configure log rotation:
Global daemon configuration (/etc/docker/daemon.json):
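```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3",
    "compress": "true"
  }
}
```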
What this does:
  • max-size: Rotate when log reaches 10MB
  • max-file: Keep 3 rotated files (30MB total)
  • compress: Compress rotated logs
Restart Docker daemon:
Per-container logging (docker-compose):
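```yaml
services:
  api:
    image: myapp:2.3.5
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
```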
Recommended logging drivers:
1. local (recommended for production):
Benefits:
  • Automatic rotation (unlike json-file)
  • More efficient storage format
  • Better performance
2. journald (systemd integration):
View logs:
3. syslog (remote logging):
4. fluentd (centralized logging):
5. awslogs (CloudWatch):
Production logging stack example:
Viewing logs:
Best practices:
  1. Always configure rotation: Prevent disk exhaustion
  2. Use local driver: Better than default json-file
  3. Centralize logs: Send to log aggregation system
  4. Add metadata: Use labels and tags
  5. Monitor disk usage: Alert on log partition filling

8.2 Healthchecks

Healthchecks tell orchestration systems if container is functioning.
Dockerfile healthcheck:
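A sketch (assumes curl in the image and a /health endpoint on port 8080):

```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --retries=3 --start-period=40s \
  CMD curl -f http://localhost:8080/health || exit 1
```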
Docker Compose healthcheck:
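```yaml
services:
  api:
    image: myapp:2.3.5
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
```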
Parameters explained:
  • interval: Time between checks (30s = check every 30 seconds)
  • timeout: Maximum time for check to complete (10s)
  • retries: Consecutive failures before marking unhealthy (3)
  • start-period: Grace period during startup (40s - don't check)
Health check endpoint (/health):
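A minimal sketch in Flask (the dependency checks are left as comments):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # Verify critical dependencies here (database ping, cache ping).
    # Keep it fast and side-effect free; return 500 if anything critical fails.
    return jsonify(status="ok"), 200
```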
What to check:
Check:
  • Database connectivity
  • Cache connectivity
  • Essential external services
  • Application-specific critical resources
Don't check:
  • Disk space (infrastructure concern)
  • CPU/memory (handled by resource limits)
  • Optional services (should degrade gracefully)
Complex healthcheck:
Healthcheck without curl:
Viewing health status:
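```bash
docker inspect --format='{{.State.Health.Status}}' my-api
docker ps --filter health=unhealthy
```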
Restart unhealthy containers:
Docker doesn't restart unhealthy containers by default. Two solutions:
1. Custom healthcheck that kills container:
2. External monitoring (autoheal):
Kubernetes healthchecks (liveness/readiness):
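A container-spec fragment (paths and port are placeholders):

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
```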
Difference:
  • Liveness: Is container alive? (restart if fails)
  • Readiness: Is container ready for traffic? (remove from load balancer if fails)

8.3 Metrics and Monitoring

Container metrics:
Metrics to monitor:
  1. CPU usage (docker stats)
  2. Memory usage (docker stats)
  3. Network I/O (docker stats)
  4. Disk I/O (docker stats)
  5. Container health (docker inspect)
  6. Image vulnerabilities (Trivy scans)
  7. Log volume (disk usage)
Prometheus + cAdvisor:
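A sketch of a cAdvisor service that Prometheus can scrape (pin the release you actually test):

```yaml
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    ports:
      - "8081:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
```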

9. Local vs Production Deployment Workflows

9.1 Local Development with Compose

For local development, optimize for speed and convenience (hot reloading, debugging).
docker-compose.dev.yml:
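A sketch (the dev script and paths are assumptions about your project):

```yaml
services:
  app:
    build: .
    command: npm run dev        # hot-reload dev server
    volumes:
      - ./src:/app/src          # live code updates without rebuilding
    environment:
      NODE_ENV: development
    ports:
      - "3000:3000"
```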
Run local dev:
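```bash
docker compose -f docker-compose.dev.yml up --build
```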

9.2 Production Deployment using Pinned Images

For production, optimize for stability and reproducibility (immutable images).
docker-compose.prod.yml:
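```yaml
services:
  app:
    image: registry.example.com/myapp:${APP_VERSION}   # pinned, built in CI
    restart: unless-stopped
    ports:
      - "8080:8080"
```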
Deploy script:
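A sketch (assumes the compose file reads APP_VERSION as above):

```bash
#!/usr/bin/env bash
set -euo pipefail
export APP_VERSION="${1:?usage: deploy.sh <version>}"
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d
docker compose -f docker-compose.prod.yml ps
```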

9.3 Rollback Strategy

Manual Rollback:
  1. Identify previous version tag (e.g., v1.2.2)
  2. Update compose file or environment variable
  3. Redeploy
Automated Rollback:
Keep backup compose files:

10. Production Checklist

Do:
Pin image versions (use digests or specific tags)
Run as non-root user
Set resource limits (CPU/Memory)
Use read-only filesystem where possible
Configure log rotation
Use multi-stage builds
Implement healthchecks
Scan images for vulnerabilities
Use secrets management (not ENV vars)
Keep images small (Alpine/Slim)
Don't:
Don't use latest tag
Don't run as root
Don't expose unnecessary ports
Don't include build tools in production image
Don't store secrets in image
Don't use --privileged flag
Don't mount Docker socket (/var/run/docker.sock) unless absolutely necessary
Don't ignore .dockerignore

11. Deploying to Cloud and Kubernetes

11.1 Deploying to AWS (ECS/EC2)

ECS (Elastic Container Service):
  • Push image to ECR (Elastic Container Registry)
  • Define Task Definition (equivalent to Docker Compose)
  • Create Service to run tasks
  • Use Fargate for serverless containers (no EC2 management)
EC2 (Bare Metal Docker):
  • Install Docker on EC2 instance (User Data script)
  • Use Docker Compose for simple stacks
  • Use AWS Systems Manager for secrets

11.2 Deploying to GCP (Cloud Run/GKE)

Cloud Run:
  • Fully managed, serverless
  • Scale to zero capability
  • Deploy directly from image:
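A sketch (project ID, region, and image tag are placeholders):

```bash
gcloud run deploy myapp \
  --image gcr.io/PROJECT_ID/myapp:2.3.5 \
  --region us-central1 \
  --allow-unauthenticated
```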
GKE (Google Kubernetes Engine):
  • Managed Kubernetes
  • Best for complex orchestrations

11.3 Deploying to Kubernetes

Migration from Compose:
  • Use tools like Kompose to convert docker-compose.yml to K8s manifests
  • Or write Helm charts for better management
Key differences:
  • docker-compose.yml -> Deployment/Service/Ingress manifests
  • depends_on -> InitContainers or readiness probes
  • volumes -> PersistentVolumeClaims (PVC)
  • secrets -> Kubernetes Secrets
CI/CD for K8s:
  1. Build & Push Image
  2. Update Manifest (GitOps) or kubectl apply
  3. Rollout status check

© Satyendra Bongi 2025