CI/CD Deployment Pipeline
Automated build, test, deploy workflow with health checks and rollback
Components Used
- logger - structured log lines at pipeline milestones
- http-request - GitHub Actions, Kubernetes, test runner, and Slack calls
- conditional - branching on build and test results
- trycatch - rollback protection around the production deploy
- generator - AI analysis of failures
The Problem
Deployment pipelines are fragile chains of scripts, webhooks, and manual checks. When a deploy fails at 2am:
- No visibility - Which step failed? What was the error?
- Manual rollback - Someone SSHs in and runs commands
- Scattered logs - Check GitHub, then Kubernetes, then Datadog
- No analysis - Why did it fail? Is it a flaky test or real bug?
Pain Points We’re Solving
- Script sprawl - Dozens of shell scripts that nobody maintains
- Silent failures - Deploys that “succeed” but break production
- Slow recovery - 30+ minutes to rollback and understand what happened
- Alert fatigue - Too many notifications, not enough context
Thinking Process
We need a pipeline that runs these steps in order, stopping (or rolling back) at the first failure:
flowchart TB
subgraph Strategy["Design Strategy"]
S1["1. Trigger build via CI API"]
S2["2. Wait for build completion"]
S3["3. Deploy to staging + verify"]
S4["4. Run tests"]
S5["5. Deploy to prod OR rollback"]
S6["6. Notify with context"]
end
S1 --> S2 --> S3 --> S4 --> S5 --> S6
Key Insight: Orchestration Across Systems
FlowMason calls the GitHub API, then the Kubernetes API, then the Slack API, with error handling between each step. If the staging health check fails, we never touch production.
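Conceptually, that gate is just another stage. Here is a minimal sketch of how it could be written with the same conditional component used in Stages 4 and 8 below; note the actual pipeline gates production on the integration-test results instead, so the stage ids in if_true/if_false here are purely illustrative:

{
  "id": "gate-production",
  "component": "conditional",
  "depends_on": ["staging-health-check"],
  "config": {
    "condition": "{{stages.staging-health-check.output.status_code == 200}}",
    "if_true": ["deploy-production"],
    "if_false": ["notify-staging-failure"]
  }
}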
Solution Architecture
flowchart TB
subgraph Input["Trigger"]
I1["commit_sha: abc123"]
I2["service: api-gateway"]
I3["environment: production"]
end
subgraph Build["Build Phase"]
B1["Trigger GitHub Actions"]
B2["Poll for completion"]
B3["Validate build success"]
end
subgraph Staging["Staging Phase"]
S1["Deploy to staging"]
S2["Health check (12 retries)"]
S3["Run smoke tests"]
end
subgraph Production["Production Phase"]
P1["Deploy to production"]
P2["Health check"]
P3["Notify success"]
end
subgraph Failure["Failure Path"]
F1["AI analyze failure"]
F2["Rollback if needed"]
F3["Notify with analysis"]
end
Input --> Build
Build --> Staging
Staging -->|tests pass| Production
Staging -->|tests fail| Failure
Production -->|healthy| P3
Production -->|unhealthy| F2
Pipeline Stages
Stage 1: Log Deployment Start
{
"id": "log-start",
"component": "logger",
"config": {
"level": "info",
"message": "Starting deployment of {{input.service}} ({{input.commit_sha}}) to {{input.environment}}"
}
}
Stage 2: Trigger Build
Initiate the CI build via the GitHub Actions workflow_dispatch API:
{
"id": "trigger-build",
"component": "http-request",
"depends_on": ["log-start"],
"config": {
"url": "https://api.github.com/repos/{{input.org}}/{{input.repo}}/actions/workflows/build.yml/dispatches",
"method": "POST",
"headers": {
"Authorization": "Bearer {{secrets.GITHUB_TOKEN}}",
"Accept": "application/vnd.github.v3+json"
},
"body": {
"ref": "{{input.branch}}",
"inputs": {
"commit_sha": "{{input.commit_sha}}",
"service": "{{input.service}}"
}
},
"timeout": 30000
}
}
Stage 3: Poll Build Status
The workflow_dispatch call returns no run details, so we poll the runs API (filtered by head_sha) until the run reports completed:
{
"id": "poll-build-status",
"component": "http-request",
"depends_on": ["trigger-build"],
"config": {
"url": "https://api.github.com/repos/{{input.org}}/{{input.repo}}/actions/runs?head_sha={{input.commit_sha}}",
"method": "GET",
"headers": {
"Authorization": "Bearer {{secrets.GITHUB_TOKEN}}"
},
"timeout": 30000,
"retry": {
"max_attempts": 30,
"delay_seconds": 10,
"condition": "{{output.body.workflow_runs[0].status != 'completed'}}"
}
}
}
Stage 4: Validate Build
Branch based on build result:
{
"id": "validate-build",
"component": "conditional",
"depends_on": ["poll-build-status"],
"config": {
"condition": "{{stages.poll-build-status.output.body.workflow_runs[0].conclusion == 'success'}}",
"if_true": ["deploy-staging"],
"if_false": ["notify-build-failure"]
}
}
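The notify-build-failure stage isn't shown in this walkthrough. A minimal sketch, reusing the Slack webhook pattern from Stage 11 (the message text is illustrative):

{
  "id": "notify-build-failure",
  "component": "http-request",
  "depends_on": ["validate-build"],
  "config": {
    "url": "{{secrets.SLACK_WEBHOOK}}",
    "method": "POST",
    "body": {
      "text": ":x: Build failed for {{input.service}} ({{input.commit_sha}}) - conclusion: {{stages.poll-build-status.output.body.workflow_runs[0].conclusion}}"
    },
    "timeout": 30000
  }
}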
Stage 5: Deploy to Staging
{
"id": "deploy-staging",
"component": "http-request",
"depends_on": ["validate-build"],
"config": {
"url": "{{input.k8s_api_url}}/apis/apps/v1/namespaces/staging/deployments/{{input.service}}",
"method": "PATCH",
"headers": {
"Authorization": "Bearer {{secrets.K8S_TOKEN}}",
"Content-Type": "application/strategic-merge-patch+json"
},
"body": {
"spec": {
"template": {
"spec": {
"containers": [{
"name": "{{input.service}}",
"image": "{{input.registry}}/{{input.service}}:{{input.commit_sha}}"
}]
}
}
}
},
"timeout": 60000
}
}
Stage 6: Staging Health Check
{
"id": "staging-health-check",
"component": "http-request",
"depends_on": ["deploy-staging"],
"config": {
"url": "https://staging.{{input.domain}}/health",
"method": "GET",
"timeout": 10000,
"retry": {
"max_attempts": 12,
"delay_seconds": 5,
"condition": "{{output.status_code != 200}}"
}
}
}
Stage 7: Run Integration Tests
{
"id": "run-tests",
"component": "http-request",
"depends_on": ["staging-health-check"],
"config": {
"url": "{{input.test_runner_url}}/run",
"method": "POST",
"body": {
"suite": "integration",
"environment": "staging",
"service": "{{input.service}}"
},
"timeout": 300000
}
}
Stage 8: Production Decision
{
"id": "check-tests",
"component": "conditional",
"depends_on": ["run-tests"],
"config": {
"condition": "{{stages.run-tests.output.body.passed == true}}",
"if_true": ["deploy-production"],
"if_false": ["analyze-test-failure"]
}
}
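The analyze-test-failure branch is also not shown in full. A plausible sketch that mirrors the generator stage in Stage 10, pointed at the test runner's response instead of production health data (the prompt wording is illustrative):

{
  "id": "analyze-test-failure",
  "component": "generator",
  "depends_on": ["check-tests"],
  "config": {
    "model": "gpt-4",
    "temperature": 0.3,
    "system_prompt": "You are a DevOps engineer triaging failed integration tests. Classify each failure as flaky or real and recommend next steps.",
    "prompt": "Integration tests failed for {{input.service}} ({{input.commit_sha}}) on staging.\n\nTest runner response: {{stages.run-tests.output.body}}"
  }
}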
Stage 9: Deploy to Production (with rollback protection)
{
"id": "deploy-production",
"component": "trycatch",
"depends_on": ["check-tests"],
"config": {
"try": ["prod-deploy", "prod-health-check"],
"catch": ["rollback-production"],
"finally": ["log-production-result"]
}
}
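The three stages inside the trycatch aren't listed separately: prod-deploy is Stage 5's PATCH with the staging namespace swapped for production, and prod-health-check is Stage 6 pointed at https://{{input.domain}}/health. The rollback is the interesting one; here is a sketch that re-patches the previous image, assuming the previously deployed tag is supplied as input.previous_sha (not part of the sample input below):

{
  "id": "rollback-production",
  "component": "http-request",
  "config": {
    "url": "{{input.k8s_api_url}}/apis/apps/v1/namespaces/production/deployments/{{input.service}}",
    "method": "PATCH",
    "headers": {
      "Authorization": "Bearer {{secrets.K8S_TOKEN}}",
      "Content-Type": "application/strategic-merge-patch+json"
    },
    "body": {
      "spec": {
        "template": {
          "spec": {
            "containers": [{
              "name": "{{input.service}}",
              "image": "{{input.registry}}/{{input.service}}:{{input.previous_sha}}"
            }]
          }
        }
      }
    },
    "timeout": 60000
  }
}

An alternative is kubectl rollout undo, but re-patching a known-good tag keeps the pipeline explicit about exactly which image ends up running.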
Stage 10: AI Failure Analysis
When things go wrong, get actionable insights:
{
"id": "analyze-failure",
"component": "generator",
"depends_on": ["rollback-production"],
"config": {
"model": "gpt-4",
"temperature": 0.3,
"system_prompt": "You are a DevOps engineer analyzing deployment failures. Provide: 1) Root cause, 2) Severity, 3) Recommended fix, 4) Was rollback correct?",
"prompt": "Analyze this deployment failure:\n\nService: {{input.service}}\nCommit: {{input.commit_sha}}\nHealth check response: {{stages.prod-health-check.output}}\nRecent logs: {{stages.fetch-logs.output}}\n\nWhat went wrong and what should we do?"
}
}
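The prompt above references a fetch-logs stage that isn't defined in this excerpt. One plausible shape for it, assuming logs are reachable over HTTP via a log_api_url input and a LOG_API_TOKEN secret (neither appears in the sample input or earlier stages):

{
  "id": "fetch-logs",
  "component": "http-request",
  "depends_on": ["rollback-production"],
  "config": {
    "url": "{{input.log_api_url}}/query?service={{input.service}}&window=15m",
    "method": "GET",
    "headers": {
      "Authorization": "Bearer {{secrets.LOG_API_TOKEN}}"
    },
    "timeout": 30000
  }
}

For the analysis prompt to see this output, analyze-failure would also need fetch-logs in its depends_on list.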
Stage 11: Notify Result
{
"id": "notify-success",
"component": "http-request",
"depends_on": ["prod-health-check"],
"config": {
"url": "{{secrets.SLACK_WEBHOOK}}",
"method": "POST",
"body": {
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": ":white_check_mark: *{{input.service}}* deployed to production\n*Commit:* `{{input.commit_sha}}`\n*Deploy time:* {{execution.duration}}ms"
}
}
]
}
}
}
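The failure path ends with "Notify with analysis" (F3 in the architecture diagram). A sketch of that counterpart, folding the generator's output into the same Slack block layout (exact formatting is illustrative):

{
  "id": "notify-failure",
  "component": "http-request",
  "depends_on": ["analyze-failure"],
  "config": {
    "url": "{{secrets.SLACK_WEBHOOK}}",
    "method": "POST",
    "body": {
      "blocks": [
        {
          "type": "section",
          "text": {
            "type": "mrkdwn",
            "text": ":rotating_light: *{{input.service}}* deploy failed and was rolled back\n*Commit:* `{{input.commit_sha}}`\n\n*AI analysis:*\n{{stages.analyze-failure.output}}"
          }
        }
      ]
    },
    "timeout": 30000
  }
}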
Execution Timeline
gantt
title Deployment Timeline
dateFormat x
axisFormat %M:%S
section Build
trigger-build :0, 2000
poll-build-status :2000, 180000
section Staging
deploy-staging :180000, 185000
health-check :185000, 195000
run-tests :195000, 295000
section Production
deploy-prod :295000, 300000
health-check :300000, 310000
notify :310000, 311000
Sample Input
{
"service": "api-gateway",
"commit_sha": "abc123def456",
"branch": "main",
"environment": "production",
"org": "mycompany",
"repo": "api-gateway",
"domain": "api.mycompany.com",
"registry": "gcr.io/myproject",
"k8s_api_url": "https://kubernetes.default.svc",
"test_runner_url": "https://tests.internal.mycompany.com"
}
Expected Output
{
"deployment": {
"status": "success",
"service": "api-gateway",
"commit_sha": "abc123def456",
"environment": "production",
"duration_ms": 311000,
"stages_completed": 11
},
"health_check": {
"staging": "healthy",
"production": "healthy"
},
"tests": {
"passed": true,
"total": 47,
"failed": 0
},
"notifications": {
"slack": "sent"
}
}
Key Learnings
1. Staged Deployment Pattern
flowchart LR
Build --> Staging
Staging -->|verify| Tests
Tests -->|pass| Production
Tests -->|fail| Stop
Production -->|verify| Done
Production -->|fail| Rollback
Never skip staging verification. Every environment gets health checked.
2. Error Handling Strategy
| Failure Point | Action |
|---|---|
| Build fails | Notify, stop |
| Staging unhealthy | Notify, stop |
| Tests fail | Analyze, notify, stop |
| Production unhealthy | Rollback, analyze, notify |
3. AI Enhancement Value
The AI analysis stage isn’t magic - it correlates:
- Recent code changes
- Log patterns
- Health check responses
- Historical failure patterns
This reduces mean time to recovery (MTTR) by giving the on-call engineer a head start.
Try It Yourself
fm run pipelines/devops-cicd-deployment.pipeline.json \
--input inputs/deploy-api-gateway.json