What Happens First in a CI/CD Pipeline: Checkout and Environment Setup
You push a commit. Your CI/CD tool detects the change and starts a new pipeline run. But before any build, test, or deployment happens, the pipeline needs to answer three basic questions: What code am I working with? What tools do I have? And is my workspace clean?
This first stage is easy to overlook, but it's where many pipeline failures actually start. A dirty workspace, a mismatched tool version, or a missing dependency can derail an entire pipeline before it even begins meaningful work. Let's walk through what happens during checkout and initial preparation, and why getting this right matters for every pipeline that follows.
The Checkout Step: Getting the Right Code
When a pipeline triggers, it carries information from the triggering event: a commit hash, a branch name, or a tag. The checkout step uses that information to pull the exact version of code from the repository into the pipeline's workspace.
The workspace is a temporary folder on the machine running the pipeline. That machine is usually called a runner or agent, depending on the CI/CD tool you use. The pipeline downloads the code into this folder, and everything that follows (builds, tests, and deployments) happens inside that space.
Think of it like showing up to a new desk at work. You need to know which project you're working on, which version of the project you should use, and what tools are available at your desk. Without that, you can't start.
The following flowchart shows the sequence of actions that happen in the first stage of a CI/CD pipeline:
Here is a practical example using GitHub Actions:
```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Node.js 18
        uses: actions/setup-node@v4
        with:
          node-version: '18'
          cache: 'npm'
```
Why a Clean Workspace Matters
Every pipeline run should start with a clean workspace. If files from a previous run remain, they can contaminate the current build. A leftover configuration file, a stale compiled binary, or an old dependency can cause the pipeline to produce an artifact that doesn't match the current code. Debugging that kind of issue is painful because the problem isn't in your code; it's in leftover files from a previous run.
Most CI/CD tools offer a clean workspace option. Some enable it by default; others require manual configuration. If you're setting up a pipeline, make sure this is enabled. It's a small setting that prevents a large class of hard-to-find bugs.
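For example, in GitHub Actions the checkout action exposes a clean input. It is enabled by default, but setting it explicitly documents the intent; a minimal sketch:

```yaml
steps:
  - name: Checkout into a clean workspace
    uses: actions/checkout@v4
    with:
      # Runs git clean and git reset before fetching, so leftovers
      # from a previous run on a reused runner are removed.
      clean: true
```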
Identifying What You're Building
After checkout, the pipeline needs to know what branch or tag it's processing. This information determines how the resulting artifact gets labeled and where it goes next.
For example, a commit to the main branch might produce an artifact labeled latest or stable. A commit to a feature branch might produce an artifact with the branch name and a build number. A tag like v1.2.3 should produce an artifact labeled with that exact version.
This labeling matters because it helps teams trace artifacts back to their source. When someone asks "which version of the code produced this artifact?", the label should give a clear answer. Without consistent labeling, artifact management becomes guesswork.
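One way to implement this in GitHub Actions is to derive the label from the ref that triggered the run; a sketch, where ARTIFACT_LABEL is just an illustrative variable name:

```yaml
steps:
  - name: Derive artifact label
    run: |
      # Tags (e.g. v1.2.3) become the label as-is; branches get the
      # branch name plus the run number (e.g. main-42).
      if [ "${{ github.ref_type }}" = "tag" ]; then
        LABEL="${{ github.ref_name }}"
      else
        LABEL="${{ github.ref_name }}-${{ github.run_number }}"
      fi
      echo "ARTIFACT_LABEL=$LABEL" >> "$GITHUB_ENV"
```

Later steps can then read ARTIFACT_LABEL when naming the build output.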
Setting Up the Environment
Once the code is in place, the pipeline needs to prepare its environment. Environment here means more than just the folder where code lives. It includes all the tools and dependencies required for building, testing, and deploying the application.
A Java application needs a specific JDK version. A Node.js application needs the right Node.js runtime and npm. A database migration needs a migration tool like Flyway or Liquibase. Each of these tools must be available in the pipeline environment, at the correct version.
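Continuing the GitHub Actions example, pinning a JDK might look like the following sketch (the Temurin distribution and version 17 are just examples):

```yaml
steps:
  - name: Set up JDK 17
    uses: actions/setup-java@v4
    with:
      # Pin the exact JDK the build expects instead of relying on
      # whatever happens to be preinstalled on the runner.
      distribution: 'temurin'
      java-version: '17'
```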
The "It Works on My Machine" Problem
One of the most common frustrations in CI/CD is the mismatch between a developer's local environment and the pipeline environment. A developer runs the build on their laptop, everything passes, and they push the code. The pipeline picks it up, runs the same build, and it fails.
The cause is almost always an environment difference. The developer has JDK 17 installed locally, but the pipeline uses JDK 11. The developer has a global npm package that the pipeline doesn't. The developer's laptop has a different operating system or architecture.
This is the classic "it works on my machine" problem, and it's a sign that the pipeline environment isn't defined explicitly enough.
Making Environments Reproducible
The solution is to define the pipeline environment explicitly, so it can be reproduced anywhere. There are several common approaches:
- Docker images: Package all required tools into a Docker image. The pipeline runs inside a container based on that image, guaranteeing the same environment every time.
- Tool version files: Use files like `.tool-versions` (for asdf), `.nvmrc` (for Node.js), or `.ruby-version` to declare exact tool versions. The pipeline reads these files and installs the specified versions.
- Environment managers: Tools like Conda for Python or SDKMAN for Java can manage tool versions declaratively.
The goal is the same: the pipeline environment should be reproducible wherever the pipeline runs. If you can run the pipeline on your laptop, on a CI server, and on a colleague's machine, and get the same result, you've solved the environment consistency problem.
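As an illustration of the Docker approach in GitHub Actions, a job can run every step inside a pinned image (the public node:18 image here is just an example):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    # All steps in this job run inside the node:18 container, so the
    # toolchain is identical wherever the pipeline executes.
    container:
      image: node:18
    steps:
      - uses: actions/checkout@v4
      - run: node --version && npm --version
```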
Caching: Speed vs. Freshness
Some pipelines add caching at this stage to speed up subsequent runs. Dependencies downloaded in a previous run can be stored and reused. This makes sense for large dependency sets, such as a Node.js node_modules directory or a Python virtual environment.
But caching introduces a trade-off. A stale cache can cause the pipeline to use old dependencies that should have been replaced. If a dependency was updated in the repository, but the cache still holds the old version, the pipeline might build against outdated code.
If you use caching, make sure the cache key includes enough information to invalidate it when dependencies change. A common approach is to hash the dependency file (like package-lock.json or requirements.txt) and use that hash as part of the cache key. When the dependency file changes, the cache key changes, and a fresh download happens.
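With GitHub Actions' cache action, for instance, the lock-file hash can be built into the key; a sketch assuming a root-level package-lock.json:

```yaml
steps:
  - name: Cache npm dependencies
    uses: actions/cache@v4
    with:
      path: ~/.npm
      # Any change to package-lock.json changes the hash, which
      # changes the key and forces a fresh download.
      key: ${{ runner.os }}-npm-${{ hashFiles('package-lock.json') }}
      restore-keys: |
        ${{ runner.os }}-npm-
```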
What the Pipeline Has After This Stage
By the time checkout and environment setup are complete, the pipeline has:
- The exact code from the triggering commit, branch, or tag
- A clean workspace with no leftover files
- A known label for the artifact it will produce
- All the tools and dependencies needed for the next stages
This is the foundation. Without it, every subsequent stage (build, test, deploy) is built on uncertainty. With it, the pipeline can move forward with confidence.
A Quick Checklist for Your Pipeline's First Stage
- Clean workspace is enabled or configured
- Checkout uses the exact commit hash from the trigger
- Artifact labeling matches branch or tag conventions
- Tool versions are declared explicitly (Docker, tool version files, or environment manager)
- Cache keys include dependency file hashes if caching is used
The Takeaway
The first few seconds of a pipeline run determine whether the rest of the pipeline runs on solid ground. A clean workspace, the right code, and a reproducible environment are not optional details. They are the foundation that makes every subsequent stage predictable and debuggable. Invest time in getting this stage right, and you will save far more time troubleshooting failures that turn out to be environment issues rather than code issues.