Performance Benchmarks
FlowMason is built for production workloads. See how it performs under various conditions.
These benchmarks were measured under controlled test conditions on specific hardware. Your actual performance will vary based on hardware, workload characteristics, provider latency, and other factors. Use these numbers as a reference point, not a guarantee.
Sequential Pipeline Depth
Tests long chains of dependent stages executed one after another.
| Depth | Avg Time | Stages/sec |
|---|---|---|
| 10 stages | 1.90ms | 5,258 |
| 25 stages | 4.37ms | 5,722 |
| 50 stages | 9.00ms | 5,557 |
| 100 stages | 19.90ms | 5,025 |
| 200 stages | 36.82ms | 5,432 |
Key insight: Throughput held steady between roughly 5,000 and 5,700 stages/second regardless of chain length. Execution time grew linearly with depth, suggesting no memory buildup or per-stage degradation in long pipelines.
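The shape of this benchmark is easy to reproduce. Below is a minimal sketch of a sequential-chain harness in plain Python; `make_chain` and `run_chain` are hypothetical helpers invented for illustration, not FlowMason APIs.

```python
import time

# Illustrative micro-benchmark only: plain Python callables stand in
# for pipeline stages. This is not FlowMason's API.
def make_chain(depth):
    """Build `depth` trivial dependent stages; each consumes the
    previous stage's output."""
    return [lambda x, i=i: x + i for i in range(depth)]

def run_chain(stages, value=0):
    # Execute the stages one after another, threading the value through.
    for stage in stages:
        value = stage(value)
    return value

depth = 100
stages = make_chain(depth)
start = time.perf_counter()
run_chain(stages)
elapsed = time.perf_counter() - start
print(f"{depth} stages in {elapsed * 1e3:.2f}ms "
      f"({depth / elapsed:,.0f} stages/sec)")
```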
Parallel Scaling
Tests fan-out/fan-in pattern with N parallel workers executing simultaneously.
| Width | Avg Time | Runs/sec |
|---|---|---|
| 5 parallel | 1.08ms | 922 |
| 10 parallel | 1.23ms | 815 |
| 25 parallel | 2.46ms | 406 |
| 50 parallel | 4.46ms | 224 |
| 100 parallel | 8.61ms | 116 |
Key insight: Execution time grew roughly linearly with parallel width (doubling the stage count roughly doubled the time). With trivial stage bodies, that slope reflects a scheduling overhead of only ~0.08ms per parallel stage.
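For intuition, here is a minimal sketch of the fan-out/fan-in shape using plain asyncio; the worker body and function names are placeholders, not FlowMason's scheduler or API.

```python
import asyncio
import time

# Illustrative fan-out/fan-in pattern: N workers launched together,
# results gathered at the join point.
async def worker(i):
    return i * 2  # a real stage would do I/O or call a provider

async def fan_out(width):
    start = time.perf_counter()
    # Launch all workers concurrently and wait for every result.
    results = await asyncio.gather(*(worker(i) for i in range(width)))
    elapsed = time.perf_counter() - start
    print(f"{width} parallel in {elapsed * 1e3:.2f}ms")
    return results

asyncio.run(fan_out(100))
```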
Nested Control Flow
Tests deeply nested conditional logic (decision trees).
| Nesting Depth | Avg Time | Conditionals/sec |
|---|---|---|
| 5 levels | 1.07ms | 4,673 |
| 10 levels | 1.89ms | 5,291 |
| 20 levels | 3.48ms | 5,747 |
| 30 levels | 5.35ms | 5,607 |
| 50 levels | 8.92ms | 5,605 |
Key insight: Conditional evaluation added only ~0.17ms of overhead per nesting level. A 30-level decision tree executed in under 6ms, and a 50-level tree in under 9ms.
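For reference, the benchmark's shape looks like the sketch below: one predicate evaluated per nesting level. The `descend` function is a plain-Python stand-in, not FlowMason's conditional directive.

```python
import time

# Illustrative decision-tree walk: each level evaluates one
# conditional, then recurses into the chosen branch.
def descend(value, depth):
    if depth == 0:
        return value
    if value % 2 == 0:  # one predicate per nesting level
        return descend(value + 1, depth - 1)
    return descend(value * 2, depth - 1)

start = time.perf_counter()
descend(1, 50)  # 50 nesting levels
elapsed = time.perf_counter() - start
print(f"50 levels in {elapsed * 1e3:.3f}ms")
```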
ForEach Scaling
Tests iteration over collections of varying sizes.
| Collection Size | Avg Time | Items/sec |
|---|---|---|
| 10 items | 0.52ms | 19,393 |
| 50 items | 0.49ms | 101,836 |
| 100 items | 0.45ms | 220,184 |
| 250 items | 0.56ms | 449,748 |
| 500 items | 0.72ms | 691,858 |
Key insight: ForEach cost was nearly flat across collection sizes. The ~0.5ms baseline is directive processing overhead; iterating 500 items took only ~0.72ms in total, barely above that baseline.
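A minimal sketch of the iteration shape, assuming a plain comprehension stands in for the directive; `for_each` is a hypothetical helper, not FlowMason's API.

```python
import time

# Illustrative iteration harness: apply one stage to every item in
# a collection and time the whole pass.
def for_each(items, stage):
    return [stage(item) for item in items]

items = list(range(500))
start = time.perf_counter()
for_each(items, lambda x: x + 1)
elapsed = time.perf_counter() - start
print(f"{len(items)} items in {elapsed * 1e3:.2f}ms "
      f"({len(items) / elapsed:,.0f} items/sec)")
```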
Design Considerations
These guidelines reflect patterns observed in our testing; your results may differ.
Observed Latencies (in testing)
- Simple API handler (5-10 stages): 1-2ms
- Data transformation (20-30 stages): 4-6ms
- Complex workflow (50-100 stages): 10-20ms
- Large batch processing (100+ stages): 20-40ms
Design Guidelines
- Use parallel fan-out for independent stages (see the sketch after this list)
- Up to 100 parallel stages performed efficiently in tests
- Sequential chains scaled linearly with no per-stage penalty
- Conditionals added minimal overhead (~0.17ms each)
- ForEach handled large collections efficiently
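As a sketch of the first guideline, the example below fans out three independent stages with asyncio.gather instead of chaining them sequentially; every function name in it is invented for illustration and is not FlowMason code.

```python
import asyncio

# Hypothetical example: three stages with no data dependencies on
# each other run concurrently instead of one after another.
async def fetch_profile(user_id):
    return {"id": user_id}

async def fetch_orders(user_id):
    return []

async def fetch_recommendations(user_id):
    return []

async def handle_request(user_id):
    # None of the three stages consumes another's output, so fan out.
    profile, orders, recs = await asyncio.gather(
        fetch_profile(user_id),
        fetch_orders(user_id),
        fetch_recommendations(user_id),
    )
    return {"profile": profile, "orders": orders, "recommendations": recs}

print(asyncio.run(handle_request(42)))
```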
Test Environment
Similar performance is expected on Apple M1/M2/M3 series chips, modern Intel/AMD processors (may vary ±20%), and cloud instances (c5/c6 class or equivalent).
Ready to build high-performance AI pipelines?