Experimentation Platform Directory
Explore in-house and vendor experimentation platforms. Discover their architectures, methodologies, and the people behind them.
14 platforms

Booking.com
In-House: Experiment Tool
One of the most mature self-serve experimentation platforms, in operation since 2005 and running 1,000+ concurrent experiments across 75 countries. It has invested deeply in variance reduction (CUPED), sequential testing, interleaving for ranking, and causal inference for two-sided marketplace dynamics, work that has influenced industry-wide practice.
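
For readers unfamiliar with CUPED, the sketch below shows the core idea on synthetic data: subtract the part of the outcome that a pre-experiment covariate already explains, which lowers variance without shifting the mean. This is a generic illustration and not Booking.com's implementation. The function name `cuped_adjust` and the simulated `pre`/`post` metrics are assumptions made for the example.

```python
import random
import statistics

def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes: y - theta * (x - mean(x)).

    y: in-experiment metric per user; x: the same user's pre-experiment metric.
    theta = cov(x, y) / var(x) is the variance-minimizing coefficient.
    """
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / statistics.variance(x)
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]

# Synthetic data (assumption): the in-experiment metric correlates with the pre-period metric.
rng = random.Random(0)
pre = [rng.gauss(10.0, 2.0) for _ in range(5000)]
post = [0.8 * p + rng.gauss(0.0, 1.0) for p in pre]

adjusted = cuped_adjust(post, pre)
raw_var = statistics.variance(post)
cuped_var = statistics.variance(adjusted)
print(f"raw variance: {raw_var:.2f}, CUPED variance: {cuped_var:.2f}")
```

In an actual experiment, theta would typically be estimated once on data pooled across control and treatment, then applied to both arms before comparing means.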

Booking.com
In-House: Experiment Tool
One of the most mature experimentation cultures in tech, running thousands of concurrent A/B tests with experimentation deeply embedded in product development at every level.

Eppo
Vendor: Eppo
Warehouse-native experimentation platform that connects directly to your data warehouse, enabling rigorous statistical analysis without data duplication.

Netflix
In-House: ABLaze
Netflix's centralized UI for defining, scheduling, and monitoring A/B tests across hundreds of millions of members. Teams configure batch or real-time allocation, manage concurrent experiment isolation, and review results analyzed with bootstrapping, Bayesian, and sequential testing methods. Title artwork experiments alone produced 20–30% viewing increases for optimized images.

Skyscanner
In-House: Dr Jekyll
Skyscanner's dual-component platform (Dr Jekyll UI + Mr Hyde API) combines experimentation and feature configuration in one system, enabling crawler-aware segmentation, multi-paradigm testing, and configuration rollouts across 300 Java microservices handling 35M daily searches.

Spotify
Vendor: Confidence
Born as Spotify's internal experimentation platform powering thousands of tests across the music streaming service, now available as a standalone product for external teams.

Spotify
In-House: Experimentation Platform
Spotify's platform stands out for solving 'peeking problem 2.0' in longitudinal data, running 10,000+ experiments/year across 300 teams, and shifting its success metric from 'wins' to 'experiments with learning', a cultural reframe that treats clear negative results as equally valuable.

Statsig
Vendor: Statsig
Modern experimentation and feature management platform built by former Facebook experimentation team members, offering feature flags, A/B testing, and product analytics.

Tinder
In-House: Phoenix
Tinder's Phoenix platform supports ~400 concurrent experiments on a two-sided marketplace where user-level randomization breaks down due to matching-graph interference, combining CUPED variance reduction, switchback experiments for network effects, and full lifecycle tooling across ideation, configuration, and analysis. Built on Redshift, Redis, and a custom API gateway, it computes ~100 metrics per experiment while keeping assignment latency under 100ms.

Twitter
In-House: Duck Duck Goose
Built in 2010 to handle experimentation across hundreds of millions of users, Duck Duck Goose pioneered a three-stage Scalding pipeline that separates real-time health monitoring from deep offline analysis. Its team also published unusually candid research on statistical pitfalls, such as bucket imbalance and multiple control groups, that influenced the broader industry.
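
Bucket imbalance (often called sample ratio mismatch) is commonly detected with a chi-square goodness-of-fit test on the observed bucket sizes. The sketch below is one generic way to run that check using only the Python standard library. It is not taken from the Duck Duck Goose codebase, and the function name `srm_p_value` and its parameters are hypothetical.

```python
import math

def srm_p_value(control_n, treatment_n, expected_treatment_share=0.5):
    """P-value of a chi-square test (1 degree of freedom) for bucket imbalance.

    A very small p-value suggests the observed split differs from the
    configured split by more than chance would explain.
    """
    total = control_n + treatment_n
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

# A 50,000 vs 51,000 split under an intended 50/50 allocation.
print(srm_p_value(50_000, 51_000))
```

A low p-value here points to a problem in assignment or logging rather than a treatment effect, so results from such an experiment are usually investigated before being trusted.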

Uber
In-House: Citrus
Uber's Project Citrus achieved a 100x reduction in experiment evaluation latency (10ms → 100µs) by inverting the traditional client-server model: rules engines are pushed to host agents, so assignment logic runs locally with no RPC calls. It also unifies A/B testing and feature flag infrastructure under one system running 1,000+ concurrent experiments.
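
Local evaluation works because assignment can be a pure function of the experiment and the unit being randomized, so no network call is needed. A minimal sketch of that general idea, using deterministic hashing, follows. It is an illustration only and does not describe Citrus's actual rules engine. The function `assign_variant`, the key format, and the variant names are assumptions.

```python
import hashlib

def assign_variant(experiment_key, unit_id, variants=("control", "treatment")):
    """Deterministically map a unit to a variant with no remote call.

    The same (experiment_key, unit_id) pair always yields the same variant,
    and different experiment keys produce independent splits.
    """
    digest = hashlib.sha256(f"{experiment_key}:{unit_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]

# Hypothetical experiment key and user id.
print(assign_variant("new_checkout_flow", "user_12345"))
```

Because the function has no state, any host holding the experiment configuration can evaluate it in-process, which is what removes the round trip.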

Walmart
In-House: Expo
Built to survive Black Friday at hyperscale, Expo processes up to 60,000 events/second across 8 tenants using a two-phase Spark Structured Streaming pipeline, which replaced a Lambda architecture that broke precisely when real-time experiment metrics mattered most.

Wayfair
In-House: Gemini
Gemini flips the standard experimentation workflow on its head: instead of analyzing results post-hoc, it runs large-scale Monte Carlo simulations against historical data to validate and optimize test designs *before* launch, catching flawed designs in the planning phase where iteration is cheap.
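
Simulation-based design validation can be illustrated with a small Monte Carlo power estimate: simulate many experiments under an assumed effect and count how often the planned test would detect it. The sketch below draws from a normal distribution as a stand-in for historical data, which is an assumption of this example. It is not Gemini's method, and `simulated_power` and its parameters are hypothetical.

```python
import math
import random
import statistics

def simulated_power(n_per_arm, effect, sd, n_sims=500, seed=0, z_crit=1.96):
    """Estimate the power of a two-sided z-test on a difference in means.

    Each simulated experiment draws both arms, computes the z statistic,
    and counts as a detection when |z| exceeds z_crit (1.96 ~ alpha 0.05).
    """
    rng = random.Random(seed)
    detections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        treatment = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
        diff = statistics.fmean(treatment) - statistics.fmean(control)
        se = math.sqrt(statistics.variance(control) / n_per_arm
                       + statistics.variance(treatment) / n_per_arm)
        if abs(diff / se) > z_crit:
            detections += 1
    return detections / n_sims

# Compare two candidate designs before launching either one.
for n in (50, 200):
    print(n, simulated_power(n_per_arm=n, effect=0.3, sd=1.0))
```

Running the same function with `effect=0.0` estimates the false positive rate of the design, which is a useful sanity check before trusting the power numbers.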

Yelp
In-House: Bunsen
Bunsen achieved a 120% improvement in decision accuracy over 18 months by combining automated power analysis, CUPED variance reduction, and a distributed 'deputies' model that embedded experimentation expertise across every team, scaling to 1,000+ concurrent experiments without centralizing control.