CI/CD Deployment Pipeline

Automated build, test, deploy workflow with health checks and rollback

Intermediate | DevOps & CI/CD

Components Used

  • http_request
  • conditional
  • trycatch
  • logger
  • generator

The Problem

Deployment pipelines are fragile chains of scripts, webhooks, and manual checks. When a deploy fails at 2am:

  • No visibility - Which step failed? What was the error?
  • Manual rollback - Someone SSHs in and runs commands
  • Scattered logs - Check GitHub, then Kubernetes, then Datadog
  • No analysis - Why did it fail? Is it a flaky test or a real bug?

Pain Points We’re Solving

  • Script sprawl - Dozens of shell scripts that nobody maintains
  • Silent failures - Deploys that “succeed” but break production
  • Slow recovery - 30+ minutes to roll back and understand what happened
  • Alert fatigue - Too many notifications, not enough context

Thinking Process

We need a pipeline that works through these steps in order:

flowchart TB
    subgraph Strategy["Design Strategy"]
        S1["1. Trigger build via CI API"]
        S2["2. Wait for build completion"]
        S3["3. Deploy to staging + verify"]
        S4["4. Run tests"]
        S5["5. Deploy to prod OR rollback"]
        S6["6. Notify with context"]
    end

    S1 --> S2 --> S3 --> S4 --> S5 --> S6

Key Insight: Orchestration Across Systems

FlowMason calls the GitHub API, then the Kubernetes API, then the Slack API, with error handling between each call. If the staging health check fails, the pipeline never touches production.
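
The stop-on-failure behavior described above can be illustrated outside FlowMason. The following is a minimal Python sketch, not FlowMason's implementation: `run_in_order`, `Stage`, and the stage callables are hypothetical names, and each callable simply returns True or False to simulate a stage result.

```python
from typing import Callable

# Hypothetical stage representation: (name, callable returning True on success).
# None of these names come from FlowMason itself.
Stage = tuple[str, Callable[[], bool]]


def run_in_order(stages: list[Stage]) -> list[str]:
    """Run stages sequentially and stop at the first one that reports failure.

    Returns the names of the stages that were actually executed.
    """
    executed: list[str] = []
    for name, action in stages:
        executed.append(name)
        if not action():
            break  # later stages (for example production) never run
    return executed


if __name__ == "__main__":
    pipeline: list[Stage] = [
        ("trigger-build", lambda: True),
        ("deploy-staging", lambda: True),
        ("staging-health-check", lambda: False),  # simulated failure
        ("deploy-production", lambda: True),
    ]
    print(run_in_order(pipeline))
```

In the simulated run, `deploy-production` is never reached because the staging health check fails first.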

Solution Architecture

flowchart TB
    subgraph Input["Trigger"]
        I1["commit_sha: abc123"]
        I2["service: api-gateway"]
        I3["environment: production"]
    end

    subgraph Build["Build Phase"]
        B1["Trigger GitHub Actions"]
        B2["Poll for completion"]
        B3["Validate build success"]
    end

    subgraph Staging["Staging Phase"]
        S1["Deploy to staging"]
        S2["Health check (12 retries)"]
        S3["Run smoke tests"]
    end

    subgraph Production["Production Phase"]
        P1["Deploy to production"]
        P2["Health check"]
        P3["Notify success"]
    end

    subgraph Failure["Failure Path"]
        F1["AI analyze failure"]
        F2["Rollback if needed"]
        F3["Notify with analysis"]
    end

    Input --> Build
    Build --> Staging
    Staging -->|tests pass| Production
    Staging -->|tests fail| Failure
    Production -->|healthy| P3
    Production -->|unhealthy| F2

Pipeline Stages

Stage 1: Log Deployment Start

{
  "id": "log-start",
  "component": "logger",
  "config": {
    "level": "info",
    "message": "Starting deployment of {{input.service}} ({{input.commit_sha}}) to {{input.environment}}"
  }
}
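
The `{{input.service}}` style placeholders are filled from the pipeline input before the stage runs. As a rough illustration of that substitution (an assumption about the behavior, not FlowMason's template engine), a dotted-path lookup in Python might look like this; `render` and `PLACEHOLDER` are hypothetical names.

```python
import re

# Matches placeholders such as {{input.service}}. The dotted-path lookup is an
# assumption about how templates resolve, not FlowMason's actual engine.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.\-]+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace each {{a.b.c}} placeholder with the value at context[a][b][c]."""

    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError if an input is missing
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


if __name__ == "__main__":
    message = render(
        "Starting deployment of {{input.service}} ({{input.commit_sha}}) "
        "to {{input.environment}}",
        {
            "input": {
                "service": "api-gateway",
                "commit_sha": "abc123def456",
                "environment": "production",
            }
        },
    )
    print(message)
```

Letting a missing key raise an error, rather than rendering an empty string, is a deliberate choice in this sketch: a deployment log line with a blank service name is harder to debug than an early failure.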

Stage 2: Trigger Build

Initiate the CI build via the GitHub Actions API:

{
  "id": "trigger-build",
  "component": "http-request",
  "depends_on": ["log-start"],
  "config": {
    "url": "https://api.github.com/repos/{{input.org}}/{{input.repo}}/actions/workflows/build.yml/dispatches",
    "method": "POST",
    "headers": {
      "Authorization": "Bearer {{secrets.GITHUB_TOKEN}}",
      "Accept": "application/vnd.github.v3+json"
    },
    "body": {
      "ref": "{{input.branch}}",
      "inputs": {
        "commit_sha": "{{input.commit_sha}}",
        "service": "{{input.service}}"
      }
    },
    "timeout": 30000
  }
}
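
For readers who want to see the same call outside the pipeline, here is a hedged Python sketch that builds the equivalent request with the standard library. `build_dispatch_request` is a hypothetical helper, the explicit `Content-Type` header is an addition of this sketch, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_dispatch_request(org, repo, branch, commit_sha, service, token):
    """Build (but do not send) the workflow dispatch request from Stage 2."""
    url = (
        f"https://api.github.com/repos/{org}/{repo}"
        "/actions/workflows/build.yml/dispatches"
    )
    payload = {
        "ref": branch,
        "inputs": {"commit_sha": commit_sha, "service": service},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github.v3+json",
            "Content-Type": "application/json",  # added by this sketch
        },
    )


if __name__ == "__main__":
    req = build_dispatch_request(
        "mycompany", "api-gateway", "main", "abc123def456", "api-gateway", "TOKEN"
    )
    print(req.get_method(), req.full_url)
```

Sending it would be a matter of `urllib.request.urlopen(req, timeout=30)`, which mirrors the 30000 ms timeout in the stage config.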

Stage 3: Poll Build Status

Wait for the build to complete, polling every 10 seconds for up to 30 attempts (about 5 minutes):

{
  "id": "poll-build-status",
  "component": "http-request",
  "depends_on": ["trigger-build"],
  "config": {
    "url": "https://api.github.com/repos/{{input.org}}/{{input.repo}}/actions/runs?head_sha={{input.commit_sha}}",
    "method": "GET",
    "headers": {
      "Authorization": "Bearer {{secrets.GITHUB_TOKEN}}"
    },
    "timeout": 30000,
    "retry": {
      "max_attempts": 30,
      "delay_seconds": 10,
      "condition": "{{output.body.workflow_runs[0].status != 'completed'}}"
    }
  }
}
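
The retry block above amounts to a bounded polling loop. The sketch below shows one plausible reading of it in Python, assuming the condition means "retry while the newest run is not completed". `poll_until_complete` and `fetch_runs` are hypothetical names; `fetch_runs` stands in for the GET request and returns the parsed response body.

```python
import time
from typing import Callable, Optional


def poll_until_complete(
    fetch_runs: Callable[[], dict],
    max_attempts: int = 30,
    delay_seconds: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch_runs until the newest run is completed or attempts run out.

    Returns the completed run, or None if it never completed in time.
    """
    for attempt in range(1, max_attempts + 1):
        runs = fetch_runs().get("workflow_runs", [])
        # The list can be empty right after the dispatch, before the run is
        # registered, so an empty list is treated as "not completed yet".
        if runs and runs[0].get("status") == "completed":
            return runs[0]
        if attempt < max_attempts:
            sleep(delay_seconds)  # wait between attempts, not after the last
    return None


if __name__ == "__main__":
    fake_responses = iter([
        {"workflow_runs": []},
        {"workflow_runs": [{"status": "in_progress", "conclusion": None}]},
        {"workflow_runs": [{"status": "completed", "conclusion": "success"}]},
    ])
    run = poll_until_complete(lambda: next(fake_responses), sleep=lambda s: None)
    print(run)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. The guard for an empty `workflow_runs` list is worth noting because the expression `workflow_runs[0]` in the stage config has no such guard.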

Stage 4: Validate Build

Branch based on build result:

{
  "id": "validate-build",
  "component": "conditional",
  "depends_on": ["poll-build-status"],
  "config": {
    "condition": "{{stages.poll-build-status.output.body.workflow_runs[0].conclusion == 'success'}}",
    "if_true": ["deploy-staging"],
    "if_false": ["notify-build-failure"]
  }
}
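
The branch taken here depends on a single field of the workflow run. As an illustrative sketch (the function name `next_stages` is hypothetical, and treating a missing run as a failure is an assumption of this example), the decision can be written as:

```python
from typing import Optional


def next_stages(run: Optional[dict]) -> list[str]:
    """Pick the branch that Stage 4 would take for a given workflow run."""
    conclusion = (run or {}).get("conclusion")
    if conclusion == "success":
        return ["deploy-staging"]
    # Covers "failure", "cancelled", "timed_out", and a run that never
    # completed within the polling window (run is None).
    return ["notify-build-failure"]


if __name__ == "__main__":
    print(next_stages({"status": "completed", "conclusion": "success"}))
    print(next_stages({"status": "completed", "conclusion": "cancelled"}))
    print(next_stages(None))
```

Anything other than an explicit `success` falls through to the failure branch, so a cancelled or timed-out build is never deployed.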

Stage 5: Deploy to Staging

{
  "id": "deploy-staging",
  "component": "http-request",
  "depends_on": ["validate-build"],
  "config": {
    "url": "{{input.k8s_api_url}}/apis/apps/v1/namespaces/staging/deployments/{{input.service}}",
    "method": "PATCH",
    "headers": {
      "Authorization": "Bearer {{secrets.K8S_TOKEN}}",
      "Content-Type": "application/strategic-merge-patch+json"
    },
    "body": {
      "spec": {
        "template": {
          "spec": {
            "containers": [{
              "name": "{{input.service}}",
              "image": "{{input.registry}}/{{input.service}}:{{input.commit_sha}}"
            }]
          }
        }
      }
    },
    "timeout": 60000
  }
}

Stage 6: Staging Health Check

{
  "id": "staging-health-check",
  "component": "http-request",
  "depends_on": ["deploy-staging"],
  "config": {
    "url": "https://staging.{{input.domain}}/health",
    "method": "GET",
    "timeout": 10000,
    "retry": {
      "max_attempts": 12,
      "delay_seconds": 5,
      "condition": "{{output.status_code != 200}}"
    }
  }
}
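
It can help to know how long this stage may block before it gives up. The following worked calculation is a sketch under two assumptions that the config does not state: every attempt runs into the full timeout, and the delay is applied between attempts only. `worst_case_wait_seconds` is a hypothetical helper.

```python
def worst_case_wait_seconds(
    max_attempts: int, delay_seconds: float, timeout_ms: int
) -> float:
    """Upper bound on time spent if every attempt times out.

    Assumes the delay is applied between attempts, not after the last one.
    """
    per_attempt = timeout_ms / 1000
    return max_attempts * per_attempt + (max_attempts - 1) * delay_seconds


if __name__ == "__main__":
    # Stage 6 values: 12 attempts, 5 s delay, 10000 ms timeout
    print(worst_case_wait_seconds(12, 5, 10000))
```

With the Stage 6 values this comes to 12 × 10 + 11 × 5 = 175 seconds, so the 10 seconds shown for this stage in the execution timeline reflects the happy path rather than the worst case.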

Stage 7: Run Integration Tests

{
  "id": "run-tests",
  "component": "http-request",
  "depends_on": ["staging-health-check"],
  "config": {
    "url": "{{input.test_runner_url}}/run",
    "method": "POST",
    "body": {
      "suite": "integration",
      "environment": "staging",
      "service": "{{input.service}}"
    },
    "timeout": 300000
  }
}

Stage 8: Production Decision

{
  "id": "check-tests",
  "component": "conditional",
  "depends_on": ["run-tests"],
  "config": {
    "condition": "{{stages.run-tests.output.body.passed == true}}",
    "if_true": ["deploy-production"],
    "if_false": ["analyze-test-failure"]
  }
}

Stage 9: Deploy to Production (with rollback protection)

{
  "id": "deploy-production",
  "component": "trycatch",
  "depends_on": ["check-tests"],
  "config": {
    "try": ["prod-deploy", "prod-health-check"],
    "catch": ["rollback-production"],
    "finally": ["log-production-result"]
  }
}
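
The `try`, `catch`, and `finally` lists map naturally onto exception handling in a general-purpose language. The sketch below shows the semantics this stage appears to rely on, assuming a step signals failure by raising an exception; `run_trycatch` and `Step` are hypothetical names, not part of FlowMason.

```python
from typing import Callable

Step = Callable[[], None]  # a step signals failure by raising an exception


def run_trycatch(
    try_steps: list[Step], catch_steps: list[Step], finally_steps: list[Step]
) -> bool:
    """Run try_steps in order; on the first exception run catch_steps.

    finally_steps always run. Returns True if every try step succeeded.
    """
    try:
        for step in try_steps:
            step()
        return True
    except Exception:
        for step in catch_steps:
            step()
        return False
    finally:
        for step in finally_steps:
            step()


if __name__ == "__main__":
    events: list[str] = []

    def prod_deploy() -> None:
        events.append("prod-deploy")

    def prod_health_check() -> None:
        events.append("prod-health-check")
        raise RuntimeError("health endpoint returned 503")  # simulated failure

    ok = run_trycatch(
        [prod_deploy, prod_health_check],
        [lambda: events.append("rollback-production")],
        [lambda: events.append("log-production-result")],
    )
    print(ok, events)
```

The point of the pattern is that the rollback runs only when the deploy or its health check fails, while the result is logged in both cases.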

Stage 10: AI Failure Analysis

When things go wrong, get actionable insights:

{
  "id": "analyze-failure",
  "component": "generator",
  "depends_on": ["rollback-production"],
  "config": {
    "model": "gpt-4",
    "temperature": 0.3,
    "system_prompt": "You are a DevOps engineer analyzing deployment failures. Provide: 1) Root cause, 2) Severity, 3) Recommended fix, 4) Was rollback correct?",
    "prompt": "Analyze this deployment failure:\n\nService: {{input.service}}\nCommit: {{input.commit_sha}}\nHealth check response: {{stages.prod-health-check.output}}\nRecent logs: {{stages.fetch-logs.output}}\n\nWhat went wrong and what should we do?"
  }
}

Stage 11: Notify Result

{
  "id": "notify-success",
  "component": "http-request",
  "depends_on": ["prod-health-check"],
  "config": {
    "url": "{{secrets.SLACK_WEBHOOK}}",
    "method": "POST",
    "body": {
      "blocks": [
        {
          "type": "section",
          "text": {
            "type": "mrkdwn",
            "text": ":white_check_mark: *{{input.service}}* deployed to production\n*Commit:* `{{input.commit_sha}}`\n*Deploy time:* {{execution.duration}}ms"
          }
        }
      ]
    }
  }
}

Execution Timeline

gantt
    title Deployment Timeline
    dateFormat x
    axisFormat %s

    section Build
    trigger-build     :0, 2000
    poll-build-status :2000, 180000

    section Staging
    deploy-staging    :180000, 185000
    health-check      :185000, 195000
    run-tests         :195000, 295000

    section Production
    deploy-prod       :295000, 300000
    health-check      :300000, 310000
    notify            :310000, 311000
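
To see where the time goes, the start and end values in the timeline can be turned into per-stage durations. This is a small worked calculation in Python using only the numbers from the chart; the two `health-check` rows are given prefixed names here so they stay distinct.

```python
# (stage, start_ms, end_ms), copied from the timeline above. The two
# "health-check" rows are prefixed so that the names stay unique.
TIMELINE = [
    ("trigger-build", 0, 2000),
    ("poll-build-status", 2000, 180000),
    ("deploy-staging", 180000, 185000),
    ("staging-health-check", 185000, 195000),
    ("run-tests", 195000, 295000),
    ("deploy-prod", 295000, 300000),
    ("prod-health-check", 300000, 310000),
    ("notify", 310000, 311000),
]

durations_ms = {name: end - start for name, start, end in TIMELINE}
total_ms = sum(durations_ms.values())
slowest = max(durations_ms, key=durations_ms.get)

if __name__ == "__main__":
    print(f"total: {total_ms} ms, slowest stage: {slowest}")
    for name, ms in durations_ms.items():
        print(f"{name:22s} {ms:>7d} ms  {ms / total_ms:6.1%}")
```

Under these numbers the build poll (178 s) and the integration tests (100 s) together account for almost 90% of the 311-second run, which suggests those are the stages to optimize first.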

Sample Input

{
  "service": "api-gateway",
  "commit_sha": "abc123def456",
  "branch": "main",
  "environment": "production",
  "org": "mycompany",
  "repo": "api-gateway",
  "domain": "api.mycompany.com",
  "registry": "gcr.io/myproject",
  "k8s_api_url": "https://kubernetes.default.svc",
  "test_runner_url": "https://tests.internal.mycompany.com"
}
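
Every key in this input is referenced by at least one stage template, so a missing key would only surface partway through a run. A pre-flight check is one way to catch that earlier. The sketch below is an illustration, not a FlowMason feature; `REQUIRED_INPUTS` and `missing_inputs` are hypothetical names.

```python
# Keys taken from the sample input; every one is used by a stage template.
REQUIRED_INPUTS = (
    "service",
    "commit_sha",
    "branch",
    "environment",
    "org",
    "repo",
    "domain",
    "registry",
    "k8s_api_url",
    "test_runner_url",
)


def missing_inputs(payload: dict) -> list[str]:
    """Return the required keys that are absent or empty in the payload."""
    return [key for key in REQUIRED_INPUTS if not payload.get(key)]


if __name__ == "__main__":
    incomplete = {"service": "api-gateway", "commit_sha": "abc123def456"}
    print(missing_inputs(incomplete))
```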

Expected Output

{
  "deployment": {
    "status": "success",
    "service": "api-gateway",
    "commit_sha": "abc123def456",
    "environment": "production",
    "duration_ms": 311000,
    "stages_completed": 11
  },
  "health_check": {
    "staging": "healthy",
    "production": "healthy"
  },
  "tests": {
    "passed": true,
    "total": 47,
    "failed": 0
  },
  "notifications": {
    "slack": "sent"
  }
}

Key Learnings

1. Staged Deployment Pattern

flowchart LR
    Build --> Staging
    Staging -->|verify| Tests
    Tests -->|pass| Production
    Tests -->|fail| Stop
    Production -->|verify| Done
    Production -->|fail| Rollback

Never skip staging verification. Every environment gets a health check before the pipeline moves on.

2. Error Handling Strategy

Failure Point          Action
Build fails            Notify, stop
Staging unhealthy      Notify, stop
Tests fail             Analyze, notify, stop
Production unhealthy   Rollback, analyze, notify
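
The same policy can be kept as data, which makes it easy to review and to test. This is a minimal Python sketch of that idea; the key names and the helpers `actions_for` and `requires_rollback` are hypothetical, chosen for this example only.

```python
# Failure point -> ordered actions, transcribed from the error handling
# strategy above. The key names are illustrative, not FlowMason identifiers.
FAILURE_ACTIONS = {
    "build_failed": ["notify", "stop"],
    "staging_unhealthy": ["notify", "stop"],
    "tests_failed": ["analyze", "notify", "stop"],
    "production_unhealthy": ["rollback", "analyze", "notify"],
}


def actions_for(failure_point: str) -> list[str]:
    """Return the ordered actions for a failure point, or raise if unknown."""
    try:
        return FAILURE_ACTIONS[failure_point]
    except KeyError:
        raise ValueError(f"unknown failure point: {failure_point}") from None


def requires_rollback(failure_point: str) -> bool:
    """Only failures after production was modified need a rollback."""
    return "rollback" in actions_for(failure_point)


if __name__ == "__main__":
    for point in FAILURE_ACTIONS:
        print(point, actions_for(point), requires_rollback(point))
```

Only the production failure includes a rollback, because it is the only point at which production has already been changed.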

3. AI Enhancement Value

The AI analysis stage isn’t magic - it correlates:

  • Recent code changes
  • Log patterns
  • Health check responses
  • Historical failure patterns

This reduces mean time to recovery (MTTR) by giving the on-call engineer a head start.

Try It Yourself

fm run pipelines/devops-cicd-deployment.pipeline.json \
  --input inputs/deploy-api-gateway.json