CI/CD Deployment Pipeline
Automated build, test, deploy workflow with health checks and rollback
Components Used
- logger - structured log lines at pipeline milestones
- http-request - GitHub Actions, Kubernetes, test runner, and Slack calls
- conditional - branching on build and test results
- trycatch - rollback protection around the production deploy
- generator - AI analysis of failures
The Problem
Deployment pipelines are fragile chains of scripts, webhooks, and manual checks. When a deploy fails at 2am:
- No visibility - Which step failed? What was the error?
- Manual rollback - Someone SSHs in and runs commands
- Scattered logs - Check GitHub, then Kubernetes, then Datadog
- No analysis - Why did it fail? Is it a flaky test or real bug?
Pain Points We’re Solving
- Script sprawl - Dozens of shell scripts that nobody maintains
- Silent failures - Deploys that “succeed” but break production
- Slow recovery - 30+ minutes to rollback and understand what happened
- Alert fatigue - Too many notifications, not enough context
Thinking Process
We need a pipeline that runs these steps in order, stopping (or rolling back) at the first failure:
flowchart TB
subgraph Strategy["Design Strategy"]
S1["1. Trigger build via CI API"]
S2["2. Wait for build completion"]
S3["3. Deploy to staging + verify"]
S4["4. Run tests"]
S5["5. Deploy to prod OR rollback"]
S6["6. Notify with context"]
end
S1 --> S2 --> S3 --> S4 --> S5 --> S6
Key Insight: Orchestration Across Systems
FlowMason calls the GitHub API, then the Kubernetes API, then the Slack API, with error handling between each step. If the staging health check fails, we never touch production.
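Conceptually, that gate is just another stage. Here is a minimal sketch of how it could be written with the same conditional component used in Stages 4 and 8 below; note the actual pipeline gates production on the integration-test results instead, so the stage ids in if_true/if_false here are purely illustrative:

{
  "id": "gate-production",
  "component": "conditional",
  "depends_on": ["staging-health-check"],
  "config": {
    "condition": "{{stages.staging-health-check.output.status_code == 200}}",
    "if_true": ["deploy-production"],
    "if_false": ["notify-staging-failure"]
  }
}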
Solution Architecture
flowchart TB
subgraph Input["Trigger"]
I1["commit_sha: abc123"]
I2["service: api-gateway"]
I3["environment: production"]
end
subgraph Build["Build Phase"]
B1["Trigger GitHub Actions"]
B2["Poll for completion"]
B3["Validate build success"]
end
subgraph Staging["Staging Phase"]
S1["Deploy to staging"]
S2["Health check (12 retries)"]
S3["Run smoke tests"]
end
subgraph Production["Production Phase"]
P1["Deploy to production"]
P2["Health check"]
P3["Notify success"]
end
subgraph Failure["Failure Path"]
F1["AI analyze failure"]
F2["Rollback if needed"]
F3["Notify with analysis"]
end
Input --> Build
Build --> Staging
Staging -->|tests pass| Production
Staging -->|tests fail| Failure
Production -->|healthy| P3
Production -->|unhealthy| F2
Pipeline Stages
Stage 1: Log Deployment Start
{
"id": "log-start",
"component": "logger",
"config": {
"level": "info",
"message": "Starting deployment of {{input.service}} ({{input.commit_sha}}) to {{input.environment}}"
}
}
Stage 2: Trigger Build
Initiate the CI build via the GitHub Actions workflow_dispatch API:
{
"id": "trigger-build",
"component": "http-request",
"depends_on": ["log-start"],
"config": {
"url": "https://api.github.com/repos/{{input.org}}/{{input.repo}}/actions/workflows/build.yml/dispatches",
"method": "POST",
"headers": {
"Authorization": "Bearer {{secrets.GITHUB_TOKEN}}",
"Accept": "application/vnd.github.v3+json"
},
"body": {
"ref": "{{input.branch}}",
"inputs": {
"commit_sha": "{{input.commit_sha}}",
"service": "{{input.service}}"
}
},
"timeout": 30000
}
}
Stage 3: Poll Build Status
The workflow_dispatch call returns no run details, so we poll the runs API (filtered by head_sha) until the run reports completed:
{
"id": "poll-build-status",
"component": "http-request",
"depends_on": ["trigger-build"],
"config": {
"url": "https://api.github.com/repos/{{input.org}}/{{input.repo}}/actions/runs?head_sha={{input.commit_sha}}",
"method": "GET",
"headers": {
"Authorization": "Bearer {{secrets.GITHUB_TOKEN}}"
},
"timeout": 30000,
"retry": {
"max_attempts": 30,
"delay_seconds": 10,
"condition": "{{output.body.workflow_runs[0].status != 'completed'}}"
}
}
}
Stage 4: Validate Build
Branch based on build result:
{
"id": "validate-build",
"component": "conditional",
"depends_on": ["poll-build-status"],
"config": {
"condition": "{{stages.poll-build-status.output.body.workflow_runs[0].conclusion == 'success'}}",
"if_true": ["deploy-staging"],
"if_false": ["notify-build-failure"]
}
}
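The notify-build-failure stage isn't shown in this walkthrough. A minimal sketch, reusing the Slack webhook pattern from Stage 11 (the message text is illustrative):

{
  "id": "notify-build-failure",
  "component": "http-request",
  "depends_on": ["validate-build"],
  "config": {
    "url": "{{secrets.SLACK_WEBHOOK}}",
    "method": "POST",
    "body": {
      "text": ":x: Build failed for {{input.service}} ({{input.commit_sha}}) - conclusion: {{stages.poll-build-status.output.body.workflow_runs[0].conclusion}}"
    },
    "timeout": 30000
  }
}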
Stage 5: Deploy to Staging
{
"id": "deploy-staging",
"component": "http-request",
"depends_on": ["validate-build"],
"config": {
"url": "{{input.k8s_api_url}}/apis/apps/v1/namespaces/staging/deployments/{{input.service}}",
"method": "PATCH",
"headers": {
"Authorization": "Bearer {{secrets.K8S_TOKEN}}",
"Content-Type": "application/strategic-merge-patch+json"
},
"body": {
"spec": {
"template": {
"spec": {
"containers": [{
"name": "{{input.service}}",
"image": "{{input.registry}}/{{input.service}}:{{input.commit_sha}}"
}]
}
}
}
},
"timeout": 60000
}
}
Stage 6: Staging Health Check
{
"id": "staging-health-check",
"component": "http-request",
"depends_on": ["deploy-staging"],
"config": {
"url": "https://staging.{{input.domain}}/health",
"method": "GET",
"timeout": 10000,
"retry": {
"max_attempts": 12,
"delay_seconds": 5,
"condition": "{{output.status_code != 200}}"
}
}
}
Stage 7: Run Integration Tests
{
"id": "run-tests",
"component": "http-request",
"depends_on": ["staging-health-check"],
"config": {
"url": "{{input.test_runner_url}}/run",
"method": "POST",
"body": {
"suite": "integration",
"environment": "staging",
"service": "{{input.service}}"
},
"timeout": 300000
}
}
Stage 8: Production Decision
{
"id": "check-tests",
"component": "conditional",
"depends_on": ["run-tests"],
"config": {
"condition": "{{stages.run-tests.output.body.passed == true}}",
"if_true": ["deploy-production"],
"if_false": ["analyze-test-failure"]
}
}
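The analyze-test-failure branch is also not shown in full. A plausible sketch that mirrors the generator stage in Stage 10, pointed at the test runner's response instead of production health data (the prompt wording is illustrative):

{
  "id": "analyze-test-failure",
  "component": "generator",
  "depends_on": ["check-tests"],
  "config": {
    "model": "gpt-4",
    "temperature": 0.3,
    "system_prompt": "You are a DevOps engineer triaging failed integration tests. Classify each failure as flaky or real and recommend next steps.",
    "prompt": "Integration tests failed for {{input.service}} ({{input.commit_sha}}) on staging.\n\nTest runner response: {{stages.run-tests.output.body}}"
  }
}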
Stage 9: Deploy to Production (with rollback protection)
{
"id": "deploy-production",
"component": "trycatch",
"depends_on": ["check-tests"],
"config": {
"try": ["prod-deploy", "prod-health-check"],
"catch": ["rollback-production"],
"finally": ["log-production-result"]
}
}
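The three stages inside the trycatch aren't listed separately: prod-deploy is Stage 5's PATCH with the staging namespace swapped for production, and prod-health-check is Stage 6 pointed at https://{{input.domain}}/health. The rollback is the interesting one; here is a sketch that re-patches the previous image, assuming the previously deployed tag is supplied as input.previous_sha (not part of the sample input below):

{
  "id": "rollback-production",
  "component": "http-request",
  "config": {
    "url": "{{input.k8s_api_url}}/apis/apps/v1/namespaces/production/deployments/{{input.service}}",
    "method": "PATCH",
    "headers": {
      "Authorization": "Bearer {{secrets.K8S_TOKEN}}",
      "Content-Type": "application/strategic-merge-patch+json"
    },
    "body": {
      "spec": {
        "template": {
          "spec": {
            "containers": [{
              "name": "{{input.service}}",
              "image": "{{input.registry}}/{{input.service}}:{{input.previous_sha}}"
            }]
          }
        }
      }
    },
    "timeout": 60000
  }
}

An alternative is kubectl rollout undo, but re-patching a known-good tag keeps the pipeline explicit about exactly which image ends up running.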
Stage 10: AI Failure Analysis
When things go wrong, get actionable insights:
{
"id": "analyze-failure",
"component": "generator",
"depends_on": ["rollback-production"],
"config": {
"model": "gpt-4",
"temperature": 0.3,
"system_prompt": "You are a DevOps engineer analyzing deployment failures. Provide: 1) Root cause, 2) Severity, 3) Recommended fix, 4) Was rollback correct?",
"prompt": "Analyze this deployment failure:\n\nService: {{input.service}}\nCommit: {{input.commit_sha}}\nHealth check response: {{stages.prod-health-check.output}}\nRecent logs: {{stages.fetch-logs.output}}\n\nWhat went wrong and what should we do?"
}
}
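The prompt above references a fetch-logs stage that isn't defined in this excerpt. One plausible shape for it, assuming logs are reachable over HTTP via a log_api_url input and a LOG_API_TOKEN secret (neither appears in the sample input or earlier stages):

{
  "id": "fetch-logs",
  "component": "http-request",
  "depends_on": ["rollback-production"],
  "config": {
    "url": "{{input.log_api_url}}/query?service={{input.service}}&window=15m",
    "method": "GET",
    "headers": {
      "Authorization": "Bearer {{secrets.LOG_API_TOKEN}}"
    },
    "timeout": 30000
  }
}

For the analysis prompt to see this output, analyze-failure would also need fetch-logs in its depends_on list.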
Stage 11: Notify Result
{
"id": "notify-success",
"component": "http-request",
"depends_on": ["prod-health-check"],
"config": {
"url": "{{secrets.SLACK_WEBHOOK}}",
"method": "POST",
"body": {
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": ":white_check_mark: *{{input.service}}* deployed to production\n*Commit:* `{{input.commit_sha}}`\n*Deploy time:* {{execution.duration}}ms"
}
}
]
}
}
}
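The failure path ends with "Notify with analysis" (F3 in the architecture diagram). A sketch of that counterpart, folding the generator's output into the same Slack block layout (exact formatting is illustrative):

{
  "id": "notify-failure",
  "component": "http-request",
  "depends_on": ["analyze-failure"],
  "config": {
    "url": "{{secrets.SLACK_WEBHOOK}}",
    "method": "POST",
    "body": {
      "blocks": [
        {
          "type": "section",
          "text": {
            "type": "mrkdwn",
            "text": ":rotating_light: *{{input.service}}* deploy failed and was rolled back\n*Commit:* `{{input.commit_sha}}`\n\n*AI analysis:*\n{{stages.analyze-failure.output}}"
          }
        }
      ]
    },
    "timeout": 30000
  }
}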
Execution Timeline
gantt
title Deployment Timeline
dateFormat x
axisFormat %M:%S
section Build
trigger-build :0, 2000
poll-build-status :2000, 180000
section Staging
deploy-staging :180000, 185000
health-check :185000, 195000
run-tests :195000, 295000
section Production
deploy-prod :295000, 300000
health-check :300000, 310000
notify :310000, 311000
Sample Input
{
"service": "api-gateway",
"commit_sha": "abc123def456",
"branch": "main",
"environment": "production",
"org": "mycompany",
"repo": "api-gateway",
"domain": "api.mycompany.com",
"registry": "gcr.io/myproject",
"k8s_api_url": "https://kubernetes.default.svc",
"test_runner_url": "https://tests.internal.mycompany.com"
}
Expected Output
{
"deployment": {
"status": "success",
"service": "api-gateway",
"commit_sha": "abc123def456",
"environment": "production",
"duration_ms": 311000,
"stages_completed": 11
},
"health_check": {
"staging": "healthy",
"production": "healthy"
},
"tests": {
"passed": true,
"total": 47,
"failed": 0
},
"notifications": {
"slack": "sent"
}
}
Key Learnings
1. Staged Deployment Pattern
flowchart LR
Build --> Staging
Staging -->|verify| Tests
Tests -->|pass| Production
Tests -->|fail| Stop
Production -->|verify| Done
Production -->|fail| Rollback
Never skip staging verification. Every environment gets health checked.
2. Error Handling Strategy
| Failure Point | Action |
|---|---|
| Build fails | Notify, stop |
| Staging unhealthy | Notify, stop |
| Tests fail | Analyze, notify, stop |
| Production unhealthy | Rollback, analyze, notify |
3. AI Enhancement Value
The AI analysis stage isn’t magic - it correlates:
- Recent code changes
- Log patterns
- Health check responses
- Historical failure patterns
This reduces mean time to recovery (MTTR) by giving the on-call engineer a head start.
Try It Yourself
fm run pipelines/devops-cicd-deployment.pipeline.json \
--input inputs/deploy-api-gateway.json