How Many Samples are Needed?
When we conduct A/B tests, it is important to know the minimum number of
samples required to reach sufficient statistical power; in other words, to be
able to reliably detect a difference between the treatment and the control
group when one truly exists.
Let us first make some definitions clear:
Type I Error ($\alpha$). A Type I error
happens when we reject the null hypothesis even though it is true (analogous
to a false positive). Its probability is also known as the significance
level, and is usually set to 5%, meaning that when there is truly no
difference, 95 out of 100 times we will correctly fail to reject.
Type II Error ($\beta$). A Type II
error happens when we fail to reject the null hypothesis even though it is
false (analogous to a false negative). Its probability equals one minus the
statistical power, and is usually set to 20% (i.e., 80% power).
This is why people commonly use a 5% significance level and 80% power; since
20% is four times 5%, in a business setting this means we would rather miss
four good products or features than launch one bad one.
Again, the reason to determine the sample size up front is to make sure we
have enough statistical power, which is the probability of detecting a
meaningful difference when there really is one. The larger the sample size,
the more power we have.
The general formula is as follows,
$n = \frac{\sigma^2}{\Delta^2}(Z_{\alpha/2}+Z_{\beta})^2$
1. Sample Size for Comparing Two Means
Sample size needed for comparing the means of two normally distributed
samples (~$N(\mu_i, \sigma_i)$) of equal size using a two-sided test with
significance level $\alpha$ and power $1-\beta$:
$n=\frac{(\sigma_1^2+\sigma_2^2)(Z_{\alpha/2}+Z_{\beta})^2}{\Delta^2}$
for each group, where $\Delta = |\mu_1-\mu_2|$.
Example. Suppose our OEC (overall
evaluation criterion) is the mean daily conversion rate; then the sample size
for each group is a number of days. Each data point is the conversion rate of
a specific day, and we compare the mean daily conversion rates between the
two groups. In this case, we assume the two groups' daily conversion rates
follow normal distributions and use the formula above to calculate the
required sample sizes.
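As a sketch, the two-means formula can be computed with Python's standard library (`statistics.NormalDist` supplies the z-quantiles); the function name and default values here are illustrative:

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_means(sigma1, sigma2, delta, alpha=0.05, beta=0.20):
    """Per-group sample size for comparing two normal means with a
    two-sided test at significance alpha and power 1 - beta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2} ~ 1.96
    z_beta = NormalDist().inv_cdf(1 - beta)        # Z_{beta} ~ 0.84
    n = (sigma1**2 + sigma2**2) * (z_alpha + z_beta)**2 / delta**2
    return ceil(n)  # round up to a whole number of samples

print(sample_size_two_means(1, 1, 0.5))  # 63 per group
```

With the default $\alpha = 5\%$ and $\beta = 20\%$, detecting a difference of half a standard deviation requires 63 samples per group.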
2. Sample Size for Comparing Two Proportions
Sample size needed to compare two binomial proportions (~$B(p_i)$) using a
two-sided test with significance level $\alpha$ and power $1-\beta$, where one
sample $n_2$ is $k$ times as large as the other sample $n_1$
(independent-sample case):
$n_1=\frac{[\sqrt{\bar{p}\bar{q}(1+\frac{1}{k})}Z_{\alpha/2}+\sqrt{p_1q_1+\frac{p_2q_2}{k}}Z_{\beta}]^2}{\Delta^2}$
where $\Delta = |p_1-p_2|$, $\bar{p}=\frac{p_1+kp_2}{1+k}$, and
$\bar{q}=1-\bar{p}$.
Example. Suppose our OEC is conversion
rate and we want to know the number of visits/sessions needed for each group,
then each data point would be a visit/session, which has only two outcomes
(purchase vs not purchase). Since each session is a Bernoulli trial, we assume
each group follows a binomial distribution, and the formula above should be
used for calculating the required sample sizes.
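The two-proportion formula can likewise be sketched in Python; the function name and example rates ($p_1 = 10\%$, $p_2 = 12\%$) are illustrative, not from the original text:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, k=1, alpha=0.05, beta=0.20):
    """Sample size n1 for comparing two proportions, where the second
    group has n2 = k * n1 samples (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    delta = abs(p1 - p2)
    p_bar = (p1 + k * p2) / (1 + k)  # weighted pooled proportion
    q_bar = 1 - p_bar
    n1 = (sqrt(p_bar * q_bar * (1 + 1 / k)) * z_alpha
          + sqrt(p1 * (1 - p1) + p2 * (1 - p2) / k) * z_beta) ** 2 / delta ** 2
    return ceil(n1)

print(sample_size_two_proportions(0.10, 0.12))  # 3841 per group when k = 1
```

Note how a small absolute lift (2 percentage points) on a 10% baseline already demands thousands of sessions per group.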
Other Formula Variants
1. Rule of Thumb
This is the simplest way of approximating the sample size. It is derived by plugging $Z_{0.025} \approx 1.96$ and $Z_{0.2} \approx 0.84$ into the formula for comparing two proportions (with equal group sizes and $p_1 \approx p_2$), since $2(Z_{\alpha/2}+Z_{\beta})^2 \approx 16$:
$16p_1(1-p_1)/\Delta^2$
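A quick sanity check of the approximation (baseline rate and effect size below are illustrative):

```python
# Rule of thumb: with Z_{0.025} ~ 1.96 and Z_{0.2} ~ 0.84,
# 2 * (1.96 + 0.84)^2 ~ 15.7, which rounds up to 16.
p1, delta = 0.10, 0.02      # baseline rate and minimum detectable effect
n = 16 * p1 * (1 - p1) / delta ** 2
print(round(n))  # 3600 samples per group
```

This lands close to the exact two-proportion answer, which is why the shortcut is popular for back-of-the-envelope planning.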
2. Evan Miller
This formula is used in Evan Miller's online sample size calculator, which is also used in Udacity's A/B testing class offered by Google:
$[Z_{\alpha/2}\sqrt{2p_1(1-p_1)}+Z_{\beta}\sqrt{p_1(1-p_1)+p_2(1-p_2)}]^2/\Delta^2$
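A sketch of this variant in Python, using the same illustrative rates as before (the function name is an assumption, not from the calculator):

```python
from math import ceil, sqrt
from statistics import NormalDist

def evan_miller_sample_size(p1, p2, alpha=0.05, beta=0.20):
    """Per-group sample size following the formula quoted above."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    delta = abs(p1 - p2)
    n = (z_alpha * sqrt(2 * p1 * (1 - p1))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    return ceil(n)

print(evan_miller_sample_size(0.10, 0.12))  # 3623 per group
```

The result differs slightly from the pooled two-proportion formula because the first term uses $2p_1(1-p_1)$ rather than the pooled variance $\bar{p}\bar{q}$.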
How Long Should the A/B Test Last?
Now that we know how to calculate the required sample sizes, we can estimate
the duration needed for running the A/B test.
This depends on the traffic of the app/website being tested, i.e., how many
data points we can collect each day. Suppose we have 2,000 visitors per day
on our tested pages, and our total number of samples needed (for the two
groups combined) is 6,000; then the minimum number of days we need is:
$d=6000/2000=3$ (days)
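The same arithmetic as a tiny script (the numbers restate the example above):

```python
from math import ceil

daily_visitors = 2000   # traffic reaching the tested pages each day
total_samples = 6000    # required sample size for both groups combined
days = ceil(total_samples / daily_visitors)  # round up partial days
print(days)  # 3
```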
However, there are usually other considerations when running A/B tests. For
example, we want the testing period to span both weekdays and weekends, when
users might behave differently. We usually recommend running an A/B test for
at least 1 week and no more than 4 weeks.