
Overview
Eppo is a warehouse-native experimentation platform built on the premise that experiment metrics should be computed where the data already lives—inside the organization's cloud warehouse—rather than in a parallel analytics store maintained by the vendor. The platform connects directly to Snowflake, BigQuery, Databricks, and Amazon Redshift, orchestrating experiment analysis on top of existing tables so that the metric a product manager sees in an experiment report is the same metric an analyst queries in Looker or dbt. That architectural choice eliminates a class of trust problems that older platforms created: when experimentation tools calculate metrics through proprietary JavaScript instrumentation and warehouse-based BI tools calculate the same metrics from event tables, the numbers diverge, and organizations lose confidence in both.
Founded in February 2021 by Chetan Sharma, a data scientist whose experience at Airbnb showed him how much coordination overhead even a sophisticated internal platform demands, Eppo set out to make Airbnb-caliber experimentation accessible to companies that lack a dedicated experimentation infrastructure team. The platform ships as two components: a lightweight SDK that handles feature flag evaluation and experiment assignment locally after pulling configuration from a global CDN—no further network requests after initialization—and an analytics engine that runs statistical analysis against warehouse data. Client SDKs cover JavaScript, Android, iOS, React Native, and Dart/Flutter; server SDKs cover Node.js, Python, Java, Go, Ruby, PHP, Rust, .NET, and Elixir.
In 2024, Eppo was acquired by Datadog, folding into Datadog's broader observability and product analytics ecosystem. The acquisition pairs Eppo's experimentation and feature management capabilities with Datadog's infrastructure monitoring, giving teams a path from observing system behavior to measuring the business-metric impact of a change within a single vendor relationship. Eppo continues operating under its own brand as part of Datadog, maintaining its warehouse-native architecture and existing customer relationships—including organizations such as Twitch, DraftKings, Perplexity, and Coinbase. Prior to the acquisition, Eppo had raised $19.5 million in Series A funding.
Key Features
Warehouse-native metrics and analysis. Metrics are defined through SQL-based specifications—either in Eppo's graphical metric builder or programmatically via YAML definitions that can sync from dbt projects using the dbt-eppo-sync utility. Eppo's metric architecture employs a two-stage aggregation: raw events are first aggregated to the entity level (sum, count, etc.), then entity-level values are averaged across variants for comparison. Because the queries run in the customer's warehouse, every metric is consistent with the organization's existing analytics layer by construction, not by reconciliation.
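The two-stage aggregation described above can be sketched in plain Python. This is an illustrative toy, not Eppo's SQL engine: `events` stands in for rows of a warehouse fact table, and the entity-level aggregation here is a simple sum.

```python
from collections import defaultdict

def two_stage_aggregate(events):
    """Stage 1: aggregate raw events to one value per entity.
    Stage 2: average entity-level values within each variant.

    `events` is a list of (entity_id, variant, value) tuples --
    a simplified stand-in for rows in a warehouse fact table.
    """
    # Stage 1: entity-level aggregation (here: sum of event values).
    per_entity = defaultdict(float)
    entity_variant = {}
    for entity_id, variant, value in events:
        per_entity[entity_id] += value
        entity_variant[entity_id] = variant

    # Stage 2: average entity-level totals across each variant.
    totals, counts = defaultdict(float), defaultdict(int)
    for entity_id, total in per_entity.items():
        v = entity_variant[entity_id]
        totals[v] += total
        counts[v] += 1
    return {v: totals[v] / counts[v] for v in totals}

events = [
    ("u1", "control", 10.0), ("u1", "control", 5.0),    # u1 totals 15
    ("u2", "control", 3.0),                             # u2 totals 3
    ("u3", "treatment", 8.0), ("u3", "treatment", 8.0), # u3 totals 16
    ("u4", "treatment", 2.0),                           # u4 totals 2
]
print(two_stage_aggregate(events))  # control mean 9.0, treatment mean 9.0
```

Averaging entity-level totals, rather than raw events, keeps heavy users from dominating the comparison: each randomized unit contributes exactly one observation per metric.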
CUPED and CUPED++ variance reduction. Eppo uses pre-experiment covariates to reduce variance in lift estimates—analogous to noise-cancelling headphones applied to noisy metric data. The extended CUPED++ method adds assignment-time properties (country, browser, device type) as additional regression covariates, providing variance reduction even for newly acquired users with limited baseline history. In practice, CUPED can cut experiment runtimes by roughly 50 percent while preserving statistical guarantees on error rates.
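The core CUPED adjustment is compact enough to show directly. The sketch below applies the standard textbook form of the method, with a single pre-experiment covariate; it illustrates the statistical idea, not Eppo's production implementation.

```python
import random

def cuped_adjust(post, pre):
    """CUPED adjustment: Y_adj = Y - theta * (X - mean(X)),
    where theta = cov(X, Y) / var(X). The adjusted values have the
    same mean as the originals but (when X predicts Y) lower variance.
    """
    n = len(post)
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    cov = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post)) / n
    var = sum((x - mean_pre) ** 2 for x in pre) / n
    theta = cov / var
    return [y - theta * (x - mean_pre) for x, y in zip(pre, post)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

random.seed(0)
# Simulated users: in-experiment metric correlated with a pre-period baseline.
pre = [random.gauss(100, 15) for _ in range(2000)]
post = [0.8 * x + random.gauss(0, 5) for x in pre]
adjusted = cuped_adjust(post, pre)
print(variance(post), variance(adjusted))  # adjusted variance is far smaller
```

Because the adjustment subtracts a mean-zero quantity, lift estimates are unchanged in expectation while their confidence intervals tighten, which is where the runtime savings come from.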
Multiple statistical frameworks. The platform supports frequentist, Bayesian, and sequential analysis methods. The default sequential frequentist approach allows teams to monitor accumulating evidence and make decisions when results are conclusive, without the rigid pre-commitment to fixed sample sizes that classical designs require. A hybrid sequential mode balances flexibility with the power advantages of fixed-sample designs. Bayesian analysis is available for contexts where prior beliefs and probabilistic reasoning about effect magnitude matter more than binary significance calls.
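The Bayesian mode's style of output can be illustrated with a generic conversion-rate comparison. This sketch uses independent Beta(1, 1) priors and Monte Carlo sampling; it shows the kind of probabilistic statement Bayesian analysis produces, and is not a description of Eppo's engine.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=1):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent
    Beta(1, 1) priors on each variant's conversion rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for a binomial rate under a Beta(1, 1) prior.
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += b > a
    return wins / draws

# 120/1000 conversions for A vs 150/1000 for B: B is very likely ahead.
p = prob_b_beats_a(120, 1000, 150, 1000)
print(round(p, 3))
```

A statement like "there is a ~97% chance B beats A" is the kind of probabilistic reasoning about effect magnitude the Bayesian framework supports, in contrast to a binary significance call.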
Feature flags and progressive delivery. Feature flags serve as kill switches, gradual-rollout gates, and the randomization layer for experiments, tying releases directly to measurement. The SDK evaluates flag assignments locally after initialization, so flag checks add no network latency to application code.
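The zero-latency evaluation pattern works because assignment is a deterministic function of the flag configuration and the subject. The sketch below shows the general hash-to-bucket technique such SDKs use; the function name and hashing scheme are illustrative, not Eppo's actual implementation.

```python
import hashlib

def assign_variant(flag_key, subject_key, variants, traffic=1.0):
    """Deterministic local assignment: hash flag+subject to a point in
    [0, 1) and map it to a variant bucket. Returns None if the subject
    falls outside the current traffic allocation."""
    digest = hashlib.sha256(f"{flag_key}:{subject_key}".encode()).hexdigest()
    point = int(digest[:8], 16) / 2**32   # uniform in [0, 1)
    if point >= traffic:                  # not in the rollout yet
        return None
    width = traffic / len(variants)
    return variants[min(int(point / width), len(variants) - 1)]

# Same inputs always yield the same variant -- no network call needed,
# and the subject keeps their assignment across sessions and devices.
v1 = assign_variant("new-checkout", "user-42", ["control", "treatment"])
v2 = assign_variant("new-checkout", "user-42", ["control", "treatment"])
print(v1, v1 == v2)
```

Because the hash is stable, ramping `traffic` from 0.1 to 0.5 only adds subjects to the rollout; no one already assigned flips variants mid-experiment.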
Advanced experiment types. Beyond standard A/B tests, Eppo supports contextual bandits for exploration–exploitation allocation, Switchback tests for marketplace experiments where user-level randomization is impractical, and Geolift tests for measuring the causal impact of interventions at the geographic level. The data model is built around entities, facts, and dimensions, which generalize beyond simple user-level experiments to accommodate these designs.
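The switchback idea can be made concrete with a small scheduling sketch: the whole marketplace switches arms on a time-window cadence instead of randomizing individual users. For simplicity this toy alternates arms deterministically, whereas real switchback designs randomize the window order; the helper names are hypothetical, not Eppo's API.

```python
from datetime import datetime, timedelta

def switchback_schedule(start, hours, window_hours=1,
                        arms=("control", "treatment")):
    """Build an alternating schedule of treatment windows."""
    schedule = []
    for i in range(hours // window_hours):
        t = start + timedelta(hours=i * window_hours)
        schedule.append((t, arms[i % len(arms)]))
    return schedule

def arm_for(ts, schedule, window_hours=1):
    """Look up which arm was live at a given event timestamp."""
    idx = int((ts - schedule[0][0]).total_seconds() // (3600 * window_hours))
    return schedule[idx][1]

sched = switchback_schedule(datetime(2024, 1, 1), hours=4)
print([arm for _, arm in sched])  # ['control', 'treatment', 'control', 'treatment']
print(arm_for(datetime(2024, 1, 1, 2, 30), sched))  # 'control'
```

Analysis then compares metrics across windows rather than across users, which is why the entities/facts/dimensions data model matters: the randomized entity here is a time window, not a user.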
Experiment workflow and result exploration. Setting up an experiment follows a structured flow: define the entity being randomized, identify the assignment source, specify date ranges and traffic allocations, select the statistical method, and articulate the hypothesis—all before analysis runs. Results present treatment effects with confidence intervals, time-series evolution plots, and segmented breakdowns by user characteristics, so teams can detect heterogeneous effects rather than relying solely on aggregate comparisons.
What Makes It Notable
Eppo's core differentiator is that experiment metrics are not siloed. In earlier-generation platforms—Optimizely being the canonical example—metrics are captured through vendor-side JavaScript instrumentation that can only observe surface-level interactions, not downstream business outcomes like revenue, profit margin, or retention stored in the warehouse. Organizations running those tools frequently discover that experimentation numbers disagree with their BI numbers, eroding trust in both. Eppo sidesteps this entirely: the warehouse is the single source of truth, and the experimentation layer queries it rather than competing with it.
The platform also reflects a deliberate emphasis on statistical methodology as product, not afterthought. The availability of CUPED++, sequential testing, Bayesian inference, switchback designs, and geolift within a single interface—backed by published technical work from team members like Evan Miller on portable SQL-based regression engines—means teams can match the method to the metric and the decision context rather than defaulting to whichever single approach the vendor hardcoded. Combined with Eppo's investment in organizational guidance—structured experiment planning, metric certification practices, stakeholder education on interpreting confidence intervals—the platform treats building an experimentation habit as seriously as building the statistical engine behind it.
Key Facts
Founded: ~2021
Category: Product
Last updated: 2026-03-28
