> ## Documentation Index
> Fetch the complete documentation index at: https://statsig-4b2ff144-serverless-cloudflare.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Confidence Intervals

Confidence intervals are an intuitive way to quantify the uncertainty in the observed metric deltas. A 95% confidence interval should contain the true effect 95% of the time. This means that if we ran an experiment 100 times, the true value of the metric delta should be inside the confidence intervals 95 times.

<Frame>
  <img src="https://user-images.githubusercontent.com/90343952/168382034-73afed14-d9f5-42cb-ade1-034526002a0b.png" alt="Confidence interval visualization showing statistical significance" />
</Frame>

In practical terms, a 95% confidence interval that doesn't contain zero (the green bar above) represents a statistically significant result (with *α = 0.05*). Only 5% of the time would we expect to see the confidence interval exclude zero if the true effect was zero (a.k.a. a false positive). Larger confidence intervals imply less certainty in the exact size of the effect with a larger range of likely values.

## Computing Confidence Intervals

Confidence intervals in Statsig are calculated using a two-sample z-test. This test requires knowledge of the variance in the metric delta we're measuring, which is derived differently depending on the type of metric (details [here](/stats-engine/variance)).

Once we've established the variance of the delta, it's straightforward to compute the confidence intervals.

### Two-Sided Tests

For the **absolute metric delta**, the confidence interval is given by:

$$
CI(\Delta \overline{X}) = \Delta \overline{X} \pm Z_{\alpha/2} \cdot \sqrt{{var(\Delta \overline{X})}}
$$

where:

* $Z_{\alpha/2}$ is the z-critical value for the desired significance level (1.96 for the standard $\alpha=0.05$ and 95% confidence interval) and we run a two-sided test
* $var(\Delta \overline{X})$ is the variance of the absolute delta (details [here](/stats-engine/variance))

The confidence interval for the **relative metric delta** can use one of two methods, [Fieller Intervals](/stats-engine/methodologies/fieller-intervals) and the [Delta Method](/stats-engine/methodologies/delta-method). While customers can opt for either method to be used in their Statsig console, Statsig recommends and automatically opts-in all new customers to use Fieller Intervals by default.

When using Fieller Intervals, the relative metric delta CI can be computed using:

$$
CI(\% \Delta \overline{X} ) = \frac{1}{1-g} ( \frac{\overline{X_T}}{\overline{X_C}} - 1 \pm \frac{Z_{\alpha/2}}{\sqrt{n_C} \cdot \overline{X_C}} \sqrt{(1-g) \cdot \frac{var(X_T)}{n_T(n_T-1)} + \frac{\overline{X_T} var(X_C)}{\overline{X_C} n_C (n_C-1)}})
$$

When using the Delta Method, the confidence interval is:

$$
\begin{split}
CI(\Delta \overline X\%)
&= \Delta \overline X\% \pm Z_{\alpha/2} \cdot\sqrt{{var(\Delta \overline X\%)}}\\
&= \frac{\Delta \overline X}{\overline X_c} \pm Z_{\alpha/2} \cdot\sqrt{(\frac{\overline X_t}{\overline X_c})^{2} \cdot (\frac{var(X_c)}{n_c \cdot \overline X_c^2} + \frac{var(X_t)}{n_t \cdot \overline X_t^2})} \cdot 100\%
\end{split}
$$

If using the Delta Method and the the control mean is not significantly away from zero, then it's simplified to:

$$
\begin{split}
CI(\Delta \overline X\%)
&= \Delta \overline X\% \pm Z_{\alpha/2} \cdot\sqrt{{var(\Delta \overline X\%)}} \\
&= \frac{\Delta \overline X}{\overline X_c} \pm Z_{\alpha/2} \cdot \frac{\sqrt{{var\left(\Delta \overline X\right)}}}{\overline X_c} \cdot 100\%
\end{split}
$$

### One-Sided Tests

When running one-sided tests, the form of the confidence interval calculation changes slightly to account for a redistribution of desired false positive rate when looking for increases or decreases in the metric:

$$
CI(\Delta \overline{X}) = \begin{cases}
\left[\Delta \overline{X} - Z_{\alpha} \cdot \sqrt{{var(\Delta \overline{X})}}, \quad +\infty  \right) & \text{if right-hand test}\\
\\
\left(-\infty, \quad \Delta \overline{X} + Z_{\alpha} \cdot \sqrt{{var(\Delta \overline{X})}} \: \right] & \text{if left-hand test}
\end{cases}
$$

where:

* $Z_{\alpha}$ is the z-critical value for the desired significance level (1.645 for the standard $\alpha=0.05$ and 95% confidence interval) and we run a one-sided test
* $var(\Delta \overline{X})$ is the same as for two-sided tests
* the choice of confidence interval depends on if the one-sided test is looking for increases or decreases in the metric

## Welch's T-test for Small Sample Sizes

For small sample sizes, we use Welch's t-test instead of a standard z-test. This statistical test is a better choice for handling samples of unequal size or variance without increasing the false positive rate. The structure of the confidence interval calculation remains the same as above (depending on 1- or 2-sided test), replacing the z-critical value with the t-critical value with degrees of freedom $\nu$.

For a two-sided test, the confidence interval is therefore:

$$
CI(\Delta \overline{X}) = \Delta \overline{X} \pm t_{\alpha/2} \cdot \sqrt{{var(\Delta \overline{X})}}
$$

$$
\nu = \frac{\left(var(\overline X_t) + var(\overline X_c)\right)^2}{\frac{var(\overline X_t)^2}{N_t - 1}+\frac{var(\overline X_c)^2}{N_c - 1}}\
= \frac{var(\Delta\overline{X})^2}{\frac{var(\overline X_t)^2}{N_t - 1}+\frac{var(\overline X_c)^2}{N_c - 1}}
$$

Where $N_t$ and $N_c$ are the number of users in the test and control groups, respectively. Note that for large number of degrees of freedom, the t-statistic converges with the z-statistic. Therefore, Welch's t-test is used only when $\nu < 100$.

## Comparing Experiment Data to a Fixed Baseline: One-sample T-test

Sometimes we want to answer questions like "Does my test variant lead to a click through rate higher than 0.5?". You can define a fixed-baseline comparison when adding metrics to the experiment.

The confidence interval is calculated by

$$
CI(\Delta \overline X) = (\overline X_{group} - fixed \ value) \pm Z \cdot\sqrt{{var( \overline X_{group})}}
$$
