
2. Probability and statistics

2.1 Combinations

The number of possible combinations of k elements from n elements is given by {n\choose k}=\frac{n!}{k!(n-k)!}. The number of permutations of p from n is given by \frac{n!}{(n-p)!}=p!{n\choose p}. The number of different ways to classify n_i elements in i groups, when the total number of elements is N, is \frac{N!}{\prod\limits_i n_i!}.
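As a quick check, the three counting formulas can be evaluated with Python's standard library (`math.comb` and `math.perm`); the `multinomial` helper is not a library function but a small illustration of N!/\prod n_i! written for this sketch:

```python
from math import comb, perm, factorial
from functools import reduce

# C(n, k) = n! / (k!(n-k)!)
assert comb(5, 2) == 10

# P(n, p) = n! / (n-p)! = p! * C(n, p)
assert perm(5, 2) == 20
assert perm(5, 2) == factorial(2) * comb(5, 2)

def multinomial(groups):
    """Number of ways to classify N = sum(groups) elements into
    groups of the given sizes: N! / prod(n_i!)."""
    n = sum(groups)
    return factorial(n) // reduce(lambda acc, g: acc * factorial(g), groups, 1)

# 10 elements split into groups of sizes 5, 3 and 2
assert multinomial([5, 3, 2]) == 2520
```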

2.2 Probability theory

The probability P(A) that an event A occurs is defined by: P(A)=\frac{n(A)}{n(U)} where n(A) is the number of outcomes for which A occurs and n(U) the total number of possible outcomes.

The probability P(\neg A) that A does not occur is: P(\neg A)=1-P(A). The probability P(A\cup B) that A or B (or both) occurs is given by: P(A\cup B)=P(A)+P(B)-P(A\cap B). If A and B are independent, then: P(A\cap B)=P(A)\cdot P(B).

The probability P(A|B) that A occurs, given the fact that B occurs, is: P(A|B)=\frac{P(A\cap B)}{P(B)}
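A minimal sketch of these rules on a concrete sample space (one fair die; the events A "roll is even" and B "roll is greater than 3" are made up for illustration):

```python
from fractions import Fraction

U = {1, 2, 3, 4, 5, 6}             # sample space: one fair die
A = {v for v in U if v % 2 == 0}   # event A: roll is even
B = {v for v in U if v > 3}        # event B: roll is greater than 3

def P(event):
    # P(A) = n(A) / n(U), exact rational arithmetic
    return Fraction(len(event), len(U))

# conditional probability: P(A|B) = P(A n B) / P(B)
P_A_given_B = P(A & B) / P(B)
assert P_A_given_B == Fraction(2, 3)

# complement and union rules
assert P(U - A) == 1 - P(A)
assert P(A | B) == P(A) + P(B) - P(A & B)
```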

2.3 Statistics

2.3.1 General

The average or mean value \langle x\rangle of a collection of values is: \langle x\rangle=\sum_i x_i/n. The standard deviation \sigma_x in the distribution of x is given by: \sigma_x=\sqrt{\frac{\sum\limits_{i=1}^n(x_i-\langle x \rangle)^2}{n}}. When working with samples, the sample variance s^2 is given by \displaystyle s^2=\frac{n}{n-1}\sigma^2.

The covariance \sigma_{xy} of x and y is given by: \sigma_{xy}=\frac{\sum\limits_{i=1}^n(x_i-\langle x \rangle)(y_i-\langle y\rangle)}{n-1}. The correlation coefficient r_{xy} of x and y then becomes: r_{xy}=\sigma_{xy}/\sigma_x\sigma_y.
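The definitions above can be sketched in plain Python on a hypothetical data set. One caveat: for r_{xy} to land between -1 and 1, the covariance and the standard deviations must share the same normalization, so the sketch uses the sample (n-1) quantities throughout when forming r:

```python
from math import sqrt

# hypothetical measured values; y is roughly 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
n = len(x)

mean_x = sum(x) / n                      # <x> = sum(x_i) / n
mean_y = sum(y) / n

# population standard deviation: sigma_x = sqrt(sum (x_i - <x>)^2 / n)
sigma_x = sqrt(sum((v - mean_x) ** 2 for v in x) / n)

# sample variance: s^2 = n/(n-1) * sigma^2
s2_x = n / (n - 1) * sigma_x ** 2

# covariance with the n-1 normalization used in the text
cov_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y)) / (n - 1)

# correlation coefficient, with matching (n-1) standard deviations
s_x = sqrt(s2_x)
s_y = sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
r_xy = cov_xy / (s_x * s_y)              # close to 1 for near-linear data
```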

The standard deviation in a variable f(x,y) resulting from errors in x and y is: \sigma^2_{f(x,y)}=\left(\frac{\partial f}{\partial x}\sigma_x\right)^2+\left(\frac{\partial f}{\partial y}\sigma_y\right)^2+2\frac{\partial f}{\partial x}\frac{\partial f}{\partial y}\sigma_{xy}.

2.3.2 Distributions

  1. The Binomial distribution is the distribution describing a sampling with replacement. The probability for success is p. The probability P for k successes in n trials is then given by: P(x=k)={n\choose k}p^k(1-p)^{n-k} The standard deviation is given by \sigma_x=\sqrt{np(1-p)} and the expectation value is \varepsilon=np.
  2. The Hypergeometric distribution is the distribution describing a sampling without replacement in which the order is irrelevant. The probability for k successes in a trial with A possible successes and B possible failures is then given by: P(x=k)=\frac{\displaystyle{A\choose k}{B\choose n-k}}{\displaystyle{A+B\choose n}} The expectation value is given by \varepsilon=nA/(A+B).
  3. The Poisson distribution is a limiting case of the binomial distribution when p\rightarrow0, n\rightarrow\infty and also np=\lambda is constant. P(x)=\frac{\lambda^x {\rm e}^{-\lambda}}{x!} This distribution is normalized to \displaystyle\sum\limits_{x=0}^\infty P(x)=1.
  4. The Normal distribution is a limiting case of the binomial distribution for continuous variables: P(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{x-\langle x\rangle}{\sigma}\right)^2\right)
  5. The Uniform distribution occurs when a random number x is taken from the set a\leq x\leq b and is given by: \left\{\begin{array}{l}\displaystyle P(x)=\frac{1}{b-a}~~~\mbox{if}~~~a\leq x\leq b\\ \\ P(x)=0~~~\mbox{in all other cases} \end{array}\right. Here \langle x\rangle=\frac{1}{2}(a+b) and \displaystyle\sigma^2=\frac{(b-a)^2}{12}.
  6. The Gamma distribution is given by: \left\{\begin{array}{l}\displaystyle P(x)=\frac{x^{\alpha-1}{\rm e}^{-x/\beta}}{\beta^\alpha\Gamma(\alpha)}~~~\mbox{if}~~~0\leq x<\infty \end{array}\right. with \alpha>0 and \beta>0. The distribution has the following properties: \langle x\rangle=\alpha\beta, \sigma^2=\alpha\beta^2.
  7. The Beta distribution is given by: \left\{\begin{array}{l}\displaystyle P(x)=\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}~~~\mbox{if}~~~0\leq x\leq1\\ \\ P(x)=0~~~\mbox{everywhere else} \end{array}\right. where B(\alpha,\beta) is the beta function. It has the following properties: \displaystyle\langle x\rangle=\frac{\alpha}{\alpha+\beta}, \displaystyle\sigma^2=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.

    The \chi^2 distribution P(\chi^2) is the special case of the gamma distribution with \alpha=V/2 and \beta=2, where V is the number of degrees of freedom.

  8. The Weibull distribution is given by: \left\{\begin{array}{l}\displaystyle P(x)=\frac{\alpha}{\beta}x^{\alpha-1}{\rm e}^{-x^\alpha/\beta}~~~\mbox{if}~~~0\leq x<\infty~\mbox{and}~\alpha,\beta>0\\ \\ P(x)=0~~~\mbox{in all other cases} \end{array}\right. The average is \langle x\rangle=\beta^{1/\alpha}\Gamma((\alpha+1)/\alpha).
  9. For a two-dimensional distribution holds: P_1(x_1)=\int P(x_1,x_2)dx_2~~,~~~P_2(x_2)=\int P(x_1,x_2)dx_1 with \varepsilon(g(x_1,x_2))=\iint g(x_1,x_2)P(x_1,x_2)dx_1dx_2=\sum_{x_1}\sum_{x_2}g\cdot P
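Two of the listed properties can be checked numerically with the standard library: the binomial expectation \varepsilon=np together with \sigma=\sqrt{np(1-p)}, and the Poisson limit of the binomial for small p and large n (the parameter values below are arbitrary):

```python
from math import comb, exp, factorial, sqrt

def binomial_pmf(k, n, p):
    # P(x=k) = C(n,k) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(x, lam):
    # P(x) = lambda^x e^(-lambda) / x!
    return lam**x * exp(-lam) / factorial(x)

# binomial expectation eps = np and standard deviation sqrt(np(1-p))
n, p = 20, 0.3
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
var = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
assert abs(mean - n * p) < 1e-9
assert abs(sqrt(var) - sqrt(n * p * (1 - p))) < 1e-9

# Poisson limit: p -> 0, n -> infinity with np = lambda held fixed
n, p = 10000, 0.0003
lam = n * p
for k in range(10):
    assert abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) < 1e-4
```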

2.4 Regression analysis

When there exists a relation between the quantities x and y of the form y=ax+b and there is a measured set x_i with related y_i, the following holds for a and b, with \vec{x}=(x_1,x_2,...,x_n) and \vec{e}=(1,1,...,1): \vec{y}-a\vec{x}-b\vec{e}\in<\vec{x},\vec{e}>^\perp. It follows that the inner products vanish: \left\{ \begin{array}{l} (\vec{y},\vec{x})-a(\vec{x},\vec{x})-b(\vec{e},\vec{x})=0\\ (\vec{y},\vec{e})-a(\vec{x},\vec{e})-b(\vec{e},\vec{e})=0 \end{array}\right. with (\vec{x},\vec{x})=\sum\limits_i x_i^2, (\vec{x},\vec{y})=\sum\limits_ix_iy_i, (\vec{x},\vec{e})=\sum\limits_ix_i and (\vec{e},\vec{e})=n. The coefficients a and b follow from these two equations.
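A minimal sketch, assuming a made-up data set: the two normal equations form a 2x2 linear system in a and b, solved here by Cramer's rule:

```python
# Normal equations for the least-squares line y = a x + b:
#   (y,x) - a(x,x) - b(e,x) = 0
#   (y,e) - a(x,e) - b(e,e) = 0
# Hypothetical data, close to y = 2x + 1:
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 7.1, 8.9]
n = len(x)                                   # (e, e)

Sxx = sum(v * v for v in x)                  # (x, x)
Sxy = sum(u * v for u, v in zip(x, y))       # (x, y)
Sx = sum(x)                                  # (x, e)
Sy = sum(y)                                  # (y, e)

# Cramer's rule on the 2x2 system
det = Sxx * n - Sx * Sx
a = (Sxy * n - Sx * Sy) / det
b = (Sxx * Sy - Sx * Sxy) / det
```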

A similar method works for higher-order polynomial fits: for a second-order fit one has: \vec{y}-a\vec{x^2}-b\vec{x}-c\vec{e}\in<\vec{x^2},\vec{x},\vec{e}>^\perp with \vec{x^2}=(x_1^2,...,x_n^2).

The correlation coefficient r is a measure for the quality of a fit. In case of linear regression it is given by: r=\frac{n\sum xy-\sum x\sum y}{\sqrt{(n\sum x^2-(\sum x)^2)(n\sum y^2-(\sum y)^2)}}
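A hypothetical near-linear data set can be pushed through this formula for r directly:

```python
from math import sqrt

# made-up data, close to the line y = 2x + 1
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 7.1, 8.9]
n = len(x)

Sx, Sy = sum(x), sum(y)
Sxx = sum(v * v for v in x)
Syy = sum(v * v for v in y)
Sxy = sum(u * v for u, v in zip(x, y))

# r = (n sum xy - sum x sum y) /
#     sqrt((n sum x^2 - (sum x)^2)(n sum y^2 - (sum y)^2))
r = (n * Sxy - Sx * Sy) / sqrt((n * Sxx - Sx**2) * (n * Syy - Sy**2))
# r is close to 1 for this nearly linear data
```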