Performance Benchmarks
FlowMason is built for production workloads. See how it performs under various conditions.
These benchmarks were measured under controlled test conditions on specific hardware. Your actual performance will vary based on hardware, workload characteristics, provider latency, and other factors. Use these numbers as a reference point, not a guarantee.
Sequential Pipeline Depth
Tests long chains of dependent stages executed one after another.
| Depth | Avg Time | Stages/sec |
|---|---|---|
| 10 stages | 1.90ms | 5,258 |
| 25 stages | 4.37ms | 5,722 |
| 50 stages | 9.00ms | 5,557 |
| 100 stages | 19.90ms | 5,025 |
| 200 stages | 36.82ms | 5,432 |
Key insight: Throughput held steady between roughly 5,000 and 5,700 stages/second regardless of chain length. Execution time grew linearly with depth, suggesting no memory buildup or per-stage degradation in long pipelines.
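The shape of this benchmark is easy to reproduce. Below is a minimal sketch of a sequential-chain harness in plain Python; `make_chain` and `run_chain` are hypothetical helpers invented for illustration, not FlowMason APIs.

```python
import time

# Illustrative micro-benchmark only: plain Python callables stand in
# for pipeline stages. This is not FlowMason's API.
def make_chain(depth):
    """Build `depth` trivial dependent stages; each consumes the
    previous stage's output."""
    return [lambda x, i=i: x + i for i in range(depth)]

def run_chain(stages, value=0):
    # Execute the stages one after another, threading the value through.
    for stage in stages:
        value = stage(value)
    return value

depth = 100
stages = make_chain(depth)
start = time.perf_counter()
run_chain(stages)
elapsed = time.perf_counter() - start
print(f"{depth} stages in {elapsed * 1e3:.2f}ms "
      f"({depth / elapsed:,.0f} stages/sec)")
```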
Parallel Scaling
Tests fan-out/fan-in pattern with N parallel workers executing simultaneously.
| Width | Avg Time | Runs/sec |
|---|---|---|
| 5 parallel | 1.08ms | 922 |
| 10 parallel | 1.23ms | 815 |
| 25 parallel | 2.46ms | 406 |
| 50 parallel | 4.46ms | 224 |
| 100 parallel | 8.61ms | 116 |
Key insight: Execution time grew roughly linearly with parallel width (doubling the stage count roughly doubled the time). With trivial stage bodies, that slope reflects a scheduling overhead of only ~0.08ms per parallel stage.
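For intuition, here is a minimal sketch of the fan-out/fan-in shape using plain asyncio; the worker body and function names are placeholders, not FlowMason's scheduler or API.

```python
import asyncio
import time

# Illustrative fan-out/fan-in pattern: N workers launched together,
# results gathered at the join point.
async def worker(i):
    return i * 2  # a real stage would do I/O or call a provider

async def fan_out(width):
    start = time.perf_counter()
    # Launch all workers concurrently and wait for every result.
    results = await asyncio.gather(*(worker(i) for i in range(width)))
    elapsed = time.perf_counter() - start
    print(f"{width} parallel in {elapsed * 1e3:.2f}ms")
    return results

asyncio.run(fan_out(100))
```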
Nested Control Flow
Tests deeply nested conditional logic (decision trees).
| Nesting Depth | Avg Time | Conditionals/sec |
|---|---|---|
| 5 levels | 1.07ms | 4,673 |
| 10 levels | 1.89ms | 5,291 |
| 20 levels | 3.48ms | 5,747 |
| 30 levels | 5.35ms | 5,607 |
| 50 levels | 8.92ms | 5,605 |
Key insight: Conditional evaluation added only ~0.17ms of overhead per nesting level. A 30-level decision tree executed in under 6ms, and a 50-level tree in under 9ms.
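For reference, the benchmark's shape looks like the sketch below: one predicate evaluated per nesting level. The `descend` function is a plain-Python stand-in, not FlowMason's conditional directive.

```python
import time

# Illustrative decision-tree walk: each level evaluates one
# conditional, then recurses into the chosen branch.
def descend(value, depth):
    if depth == 0:
        return value
    if value % 2 == 0:  # one predicate per nesting level
        return descend(value + 1, depth - 1)
    return descend(value * 2, depth - 1)

start = time.perf_counter()
descend(1, 50)  # 50 nesting levels
elapsed = time.perf_counter() - start
print(f"50 levels in {elapsed * 1e3:.3f}ms")
```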
ForEach Scaling
Tests iteration over collections of varying sizes.
| Collection Size | Avg Time | Items/sec |
|---|---|---|
| 10 items | 0.52ms | 19,393 |
| 50 items | 0.49ms | 101,836 |
| 100 items | 0.45ms | 220,184 |
| 250 items | 0.56ms | 449,748 |
| 500 items | 0.72ms | 691,858 |
Key insight: ForEach cost was nearly flat across collection sizes. The ~0.5ms baseline is directive processing overhead; iterating 500 items took only ~0.72ms in total, barely above that baseline.
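A minimal sketch of the iteration shape, assuming a plain comprehension stands in for the directive; `for_each` is a hypothetical helper, not FlowMason's API.

```python
import time

# Illustrative iteration harness: apply one stage to every item in
# a collection and time the whole pass.
def for_each(items, stage):
    return [stage(item) for item in items]

items = list(range(500))
start = time.perf_counter()
for_each(items, lambda x: x + 1)
elapsed = time.perf_counter() - start
print(f"{len(items)} items in {elapsed * 1e3:.2f}ms "
      f"({len(items) / elapsed:,.0f} items/sec)")
```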
Design Considerations
These guidelines reflect patterns observed in our testing; your results may differ.
Observed Latencies (in testing)
- Simple API handler (5-10 stages): 1-2ms
- Data transformation (20-30 stages): 4-6ms
- Complex workflow (50-100 stages): 10-20ms
- Large batch processing (100+ stages): 20-40ms
Design Guidelines
- Use parallel fan-out for independent stages (see the sketch after this list)
- Up to 100 parallel stages performed efficiently in tests
- Sequential chains scaled linearly with no per-stage penalty
- Conditionals added minimal overhead (~0.17ms each)
- ForEach handled large collections efficiently
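As a sketch of the first guideline, the example below fans out three independent stages with asyncio.gather instead of chaining them sequentially; every function name in it is invented for illustration and is not FlowMason code.

```python
import asyncio

# Hypothetical example: three stages with no data dependencies on
# each other run concurrently instead of one after another.
async def fetch_profile(user_id):
    return {"id": user_id}

async def fetch_orders(user_id):
    return []

async def fetch_recommendations(user_id):
    return []

async def handle_request(user_id):
    # None of the three stages consumes another's output, so fan out.
    profile, orders, recs = await asyncio.gather(
        fetch_profile(user_id),
        fetch_orders(user_id),
        fetch_recommendations(user_id),
    )
    return {"profile": profile, "orders": orders, "recommendations": recs}

print(asyncio.run(handle_request(42)))
```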
Test Environment
Similar performance is expected on Apple M1/M2/M3 series chips, modern Intel/AMD processors (may vary ±20%), and cloud instances (c5/c6 class or equivalent).
Ready to build high-performance AI pipelines?