FLOW MASON

Performance Benchmarks

FlowMason is built for production workloads. See how it performs under various conditions.

These benchmarks were measured under controlled test conditions on specific hardware. Your actual performance will vary based on hardware, workload characteristics, provider latency, and other factors. Use these numbers as a reference point, not a guarantee.

  • ~5,500 sequential stages/sec: consistent throughput regardless of chain length
  • 0.18ms per-stage overhead: includes component lookup, input mapping, execution
  • 100+ max parallel stages: near-linear scaling with parallel width
  • <1ms ForEach overhead: near-constant regardless of collection size

Sequential Pipeline Depth

Tests long chains of dependent stages executed one after another.
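The actual benchmark harness is not shown here; the sketch below illustrates how a depth benchmark like this is typically timed in plain Python. `run_chain` is a stand-in for executing a FlowMason pipeline of the given depth, not FlowMason's real API:

```python
import time

def run_chain(depth):
    """Stand-in for executing a chain of `depth` dependent stages."""
    acc = 0
    for _ in range(depth):
        acc += 1  # placeholder for per-stage work
    return acc

def benchmark(depth, repeats=50):
    """Time `repeats` runs; return (avg time in ms, stages/sec)."""
    start = time.perf_counter()
    for _ in range(repeats):
        run_chain(depth)
    avg_s = (time.perf_counter() - start) / repeats
    return avg_s * 1000, depth / avg_s

avg_ms, stages_per_sec = benchmark(100)
```

Averaging over many repeats smooths out scheduler noise, which matters when per-run times are in the low milliseconds.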

Depth        Avg Time   Stages/sec
10 stages    1.9ms      5,258
25 stages    4.37ms     5,722
50 stages    9.0ms      5,557
100 stages   19.9ms     5,025
200 stages   36.82ms    5,432

Key insight: We observed consistent throughput of ~5,500 stages/second regardless of chain length. Execution time grows linearly with depth, with no sign of per-stage degradation or accumulating overhead in long pipelines.
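The stages/sec column follows directly from depth divided by average time; recomputing it from the table confirms the ~5,500/sec figure:

```python
# (depth, avg time in ms) pairs from the table above
results = [(10, 1.9), (25, 4.37), (50, 9.0), (100, 19.9), (200, 36.82)]

# stages/sec = depth / avg time in seconds
throughputs = [depth / (avg_ms / 1000) for depth, avg_ms in results]
# all values fall in the ~5,000-5,700 stages/sec band
```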

Parallel Scaling

Tests fan-out/fan-in pattern with N parallel workers executing simultaneously.

Width         Avg Time   Runs/sec
5 parallel    1.08ms     922
10 parallel   1.23ms     815
25 parallel   2.46ms     406
50 parallel   4.46ms     224
100 parallel  8.61ms     116

Key insight: Execution time scales linearly with parallel width (2x stages = ~2x time). Overhead per parallel stage is only ~0.08ms.
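The ~0.08ms per-stage figure can be recovered from the table as the marginal cost of adding parallel width, e.g. the slope between the narrowest and widest runs:

```python
widths = [5, 10, 25, 50, 100]
times_ms = [1.08, 1.23, 2.46, 4.46, 8.61]

# marginal time per extra parallel stage, from the endpoints
per_stage_ms = (times_ms[-1] - times_ms[0]) / (widths[-1] - widths[0])
# (8.61 - 1.08) / 95 ≈ 0.079 ms per parallel stage
```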

Nested Control Flow

Tests deeply nested conditional logic (decision trees).

Nesting Depth   Avg Time   Conditionals/sec
5 levels        1.07ms     4,673
10 levels       1.89ms     5,291
20 levels       3.48ms     5,747
30 levels       5.35ms     5,607
50 levels       8.92ms     5,605

Key insight: Conditional evaluation overhead is only ~0.17ms per level. Complex decision trees (30+ levels) execute in under 6ms.

ForEach Scaling

Tests iteration over collections of varying sizes.

Collection Size   Avg Time   Items/sec
10 items          0.52ms     19,393
50 items          0.49ms     101,836
100 items         0.45ms     220,184
250 items         0.56ms     449,748
500 items         0.72ms     691,858

Key insight: ForEach overhead is near-constant regardless of collection size. The ~0.5ms baseline is directive-processing overhead; even a 500-item collection completes in ~0.72ms total.
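The table shows why the overhead is effectively fixed: timings barely move across a 50x range of collection sizes, and the marginal cost per item is well under a microsecond:

```python
sizes = [10, 50, 100, 250, 500]
times_ms = [0.52, 0.49, 0.45, 0.56, 0.72]

# total spread across a 50x size range
spread_ms = max(times_ms) - min(times_ms)  # 0.27ms

# marginal cost per item, from the endpoints, in microseconds
marginal_us = (times_ms[-1] - times_ms[0]) / (sizes[-1] - sizes[0]) * 1000
# ≈ 0.4µs per item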

Design Considerations

Based on patterns observed in our testing. Your results may differ.

Observed Latencies (in testing)

  • Simple API handler (5-10 stages): 1-2ms
  • Data transformation (20-30 stages): 4-6ms
  • Complex workflow (50-100 stages): 10-20ms
  • Large batch processing (100+ stages): 20-40ms
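These bands are consistent with the ~0.18ms per-stage overhead measured above. A rough back-of-envelope estimator (illustrative only, not an official formula; it covers framework overhead and excludes actual stage work and provider latency):

```python
PER_STAGE_MS = 0.18  # measured per-stage overhead from the benchmarks above

def estimate_overhead_ms(stages):
    """Rough framework-overhead estimate for a pipeline of `stages` stages."""
    return stages * PER_STAGE_MS

# e.g. a 30-stage transformation: ~5.4ms, within the observed 4-6ms band
```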

Design Guidelines

  • Use parallel fan-out for independent stages
  • Up to 100 parallel stages performed efficiently in tests
  • Sequential chains scaled linearly with no penalties
  • Conditionals added minimal overhead (~0.17ms each)
  • ForEach handled large collections efficiently
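The fan-out/fan-in guideline above is the standard concurrent pattern; a generic sketch in plain asyncio (illustrative only, not FlowMason's API, with made-up stage names):

```python
import asyncio

async def stage(name, payload):
    """Stand-in for an independent pipeline stage."""
    await asyncio.sleep(0)  # placeholder for real async work
    return f"{name}:{payload}"

async def fan_out(payload):
    # Independent stages run concurrently (fan-out),
    # and gather() joins their results in order (fan-in).
    return await asyncio.gather(
        stage("enrich", payload),
        stage("classify", payload),
        stage("summarize", payload),
    )

out = asyncio.run(fan_out("doc-1"))
```

Keeping fan-out stages free of shared mutable state is what makes this scaling near-linear: each branch only depends on the fan-out input.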

Test Environment

  • Machine: MacBook Air (Mac15,12)
  • Chip: Apple M3
  • Cores: 8 (4 performance + 4 efficiency)
  • Memory: 16 GB
  • OS: macOS
  • Python: 3.11

Similar performance expected on Apple M1/M2/M3 series, modern Intel/AMD processors (may vary ±20%), and cloud instances (c5/c6 class or equivalent).

Ready to build high-performance AI pipelines?