
8. Numerical mathematics

8.1 Errors

There will be an error in the solution if a problem has a number of parameters which are not exactly known. The dependency between errors in input data and errors in the solution can be expressed in the condition number c. If the problem is given by x=\varphi(a) the first-order approximation for an error \delta a in a is: \frac{\delta x}{x}=\frac{a\varphi'(a)}{\varphi(a)}\cdot\frac{\delta a}{a} The number c(a)=|a\varphi'(a)|/|\varphi(a)| is the condition number; c\approx1 if the problem is well-conditioned.
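
As a simple illustration: for x=\varphi(a)=\sqrt{a} one finds c(a)=\frac{|a\varphi'(a)|}{|\varphi(a)|}=\frac{a/(2\sqrt{a})}{\sqrt{a}}=\frac{1}{2}, so taking a square root is well-conditioned; a relative error in a is approximately halved in the result.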

8.2 Floating point representations

The floating point representation depends on 4 natural numbers:

  1. The basis of the number system β,
  2. The length of the mantissa t,
  3. The length of the exponent q,
  4. The sign s.

Then the representation of machine numbers becomes: \mbox{rd}(x)=s\cdot m\cdot\beta^e where mantissa m is a number with t \beta-based digits for which holds 1/\beta\leq|m|<1, and e is a number with q \beta-based digits for which holds |e|\leq\beta^q-1. The number 0 is added to this set, for example with m=e=0. The largest machine number is a_{\max}=(1-\beta^{-t})\beta^{\beta^q-1} and the smallest positive machine number is a_{\min}=\beta^{-\beta^q}. The distance between two successive machine numbers in the interval [\beta^{p-1},\beta^p] is \beta^{p-t}. If x is a real number and the closest machine number is \mbox{rd}(x), then holds: \mbox{rd}(x)=x(1+\varepsilon)~~~\mbox{with}~~~|\varepsilon|\leq\frac{1}{2}\beta^{1-t} and x=\mbox{rd}(x)(1+\varepsilon')~~~\mbox{with}~~~|\varepsilon'|\leq\frac{1}{2}\beta^{1-t}. The number \eta:=\frac{1}{2}\beta^{1-t} is called the machine accuracy, and \varepsilon,\varepsilon'\leq\eta, \left|\frac{x-\mbox{rd}(x)}{x}\right|\leq\eta. An often used 32 bits float format is: 1 bit for s, 8 for the exponent and 23 for the mantissa. The base here is 2.
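
The machine accuracy can be estimated at run time. The following sketch (the routine name is chosen here for illustration) halves a number until adding it to 1 no longer changes the result; the last value that did change 1 is the spacing \beta^{1-t} between 1 and the next machine number, and \eta is half of that:

#include <stdio.h>

/* Illustrative sketch: estimate the spacing between 1 and the next larger
   machine number for the float type; eta = (1/2)*beta^(1-t) is half of it. */
float FloatSpacing(void)
{
  volatile float one_plus;
  float          eps = 1.0f;

  do
  {
    eps     *= 0.5f;
    one_plus = 1.0f + eps;
  } while (one_plus > 1.0f);

  return 2.0f * eps;                 /* last eps that still changed 1.0f */
}

int main(void)
{
  float gap = FloatSpacing();
  printf("spacing = %g, eta = %g\n", gap, 0.5f * gap);
  return 0;
}

With IEEE 32-bit floats this prints a spacing of about 1.2e-7, so \eta\approx6\cdot10^{-8}.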

8.3 Systems of equations

We want to solve the matrix equation Ax=b for a non-singular A, which is equivalent to finding the inverse matrix A^{-1}. Inverting an n\times n matrix via Cramer's rule requires too many multiplications f(n) with n!\leq f(n)\leq(\mbox{e}-1)\cdot n!, so other methods are preferable.

8.3.1 Triangular matrices

Consider the equation Ux=c where U is a right-upper triangular matrix, i.e. a matrix in which U_{ij}=0 for all j<i, and all U_{ii}\neq0. Then: x_n=c_n/U_{nn}, x_{n-1}=(c_{n-1}-U_{n-1,n}x_n)/U_{n-1,n-1}, \ldots, x_1=\left(c_1-\sum\limits_{j=2}^n U_{1j}x_j\right)/U_{11}

In code:

/* Back substitution for Ux = c (1-based indices): compute x[n] first */
for (k = n; k > 0; k--)
{
  S = c[k];
  for (j = k + 1; j <= n; j++)   /* subtract the already known terms */
  {
    S -= U[k][j] * x[j];
  }
  x[k] = S / U[k][k];
}
This algorithm requires \frac{1}{2}n(n+1) floating point calculations.

8.3.2 Gauss elimination

Consider a general set Ax=b. This can be reduced by Gauss elimination to a triangular form by multiplying the first equation with A_{i1}/A_{11} and then subtracting it from all others; now the first column contains all 0's except A_{11}. Then the 2nd equation is subtracted in such a way from the others that the second column contains all 0's below A_{22}, etc. In code:
for (k = 1; k <= n; k++)                /* eliminate column k */
{
  /* copy the current pivot row into U and c */
  for (j = k; j <= n; j++) U[k][j] = A[k][j];
  c[k] = b[k];

  for (i = k + 1; i <= n; i++)          /* update the remaining equations */
  {
    L = A[i][k] / U[k][k];
    for (j = k + 1; j <= n; j++)
    {
      A[i][j] -= L * U[k][j];
    }
    b[i] -= L * c[k];
  }
}
This algorithm requires \frac{1}{3}n(n^2-1) floating point multiplications and divisions for operations on the coefficient matrix and \frac{1}{2}n(n-1) multiplications for operations on the right-hand terms, whereafter the triangular set has to be solved with \frac{1}{2}n(n+1) operations.

8.3.3 Pivot strategy

Some equations have to be interchanged if the corner elements A_{11}, A^{(1)}_{22}, \ldots are not all \neq0 to allow Gauss elimination to work. In the following, A^{(n)} is the element after the nth iteration. One method is: if A^{(k-1)}_{kk}=0, then search for an element A^{(k-1)}_{pk} with p>k that is \neq0 and interchange the pth and the kth equation. This strategy fails only if the set is singular and has no solution at all.
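
A sketch of such a pivot step in C, written to fit into the Gauss elimination code above (the variables p and T are extra; in practice one usually selects the element of largest absolute value rather than the first non-zero one, which also improves numerical stability):

/* Pivot search for step k: find the row p >= k with the largest |A[p][k]|
   and interchange equations p and k before the elimination of column k. */
p = k;
for (i = k + 1; i <= n; i++)
  if (fabs(A[i][k]) > fabs(A[p][k])) p = i;

if (p != k)
{
  for (j = 1; j <= n; j++) { T = A[k][j]; A[k][j] = A[p][j]; A[p][j] = T; }
  T = b[k]; b[k] = b[p]; b[p] = T;
}

fabs is declared in <math.h>.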

8.4 Roots of functions

8.4.1 Successive substitution

We want to solve the equation F(x)=0, so we want to find the root α with F(α)=0.

Many solutions are essentially the following:

  1. Rewrite the equation in the form x=f(x) so that a solution of this equation is also a solution of F(x)=0. Further, f(x) may not vary too much with respect to x near α.
  2. Assume an initial estimation x_0 for \alpha and obtain the series x_n with x_n=f(x_{n-1}), in the hope that \lim\limits_{n\rightarrow\infty}x_n=\alpha.

Example: choose f(x)=\beta-\varepsilon\frac{h(x)}{g(x)}=x-\frac{F(x)}{G(x)}, then we can expect that the series x_n with \begin{eqnarray*} x_0&=&\beta\\ x_n&=&x_{n-1}-\varepsilon\frac{h(x_{n-1})}{g(x_{n-1})} \end{eqnarray*} converges to \alpha.
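
A minimal sketch of successive substitution in C (the routine and parameter names are chosen here for illustration; f, x0 and eps are supplied by the caller):

#include <math.h>

/* Successive substitution: iterate x = f(x) until two successive
   approximations differ by less than eps, or max_iter steps are done. */
float SolveFixedPoint(float (*f)(float), float x0, float eps, int max_iter)
{
  float x = x0, x_new = x0;
  int   i;

  for (i = 0; i < max_iter; i++)
  {
    x_new = (*f)(x);
    if (fabs(x_new - x) < eps) return x_new;   /* converged */
    x = x_new;
  }
  return x_new;                                /* no convergence */
}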

8.4.2 Local convergence

Let \alpha be a solution of x=f(x) and let x_n=f(x_{n-1}) for a given x_0. Let f'(x) be continuous in a neighbourhood of \alpha. Let f'(\alpha)=A with |A|<1. Then there exists a \delta>0 so that for each x_0 with |x_0-\alpha|\leq\delta holds:

  1. \lim\limits_{n\rightarrow\infty}x_n=\alpha,
  2. If for a particular k holds: x_k=\alpha, then for each n\geq k holds that x_n=\alpha. If x_n\neq\alpha for all n then holds \lim_{n\rightarrow\infty}\frac{\alpha-x_n}{\alpha-x_{n-1}}=A~~~,~~~ \lim_{n\rightarrow\infty}\frac{x_n-x_{n-1}}{x_{n-1}-x_{n-2}}=A~~~,~~~ \lim_{n\rightarrow\infty}\frac{\alpha-x_n}{x_n-x_{n-1}}=\frac{A}{1-A}

The quantity A is called the {\it asymptotic convergence factor}, the quantity B=-{}^{10}\log|A| is called the {\it asymptotic convergence speed}.

8.4.3 Aitken extrapolation

We define A_n=\frac{x_n-x_{n-1}}{x_{n-1}-x_{n-2}}; A_n converges to f'(\alpha). Then the series \alpha_n=x_n+\frac{A_n}{1-A_n}(x_n-x_{n-1}) will converge to \alpha.
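
A single Aitken step can be written directly from these formulas (an illustrative sketch, not part of the original listing):

/* One step of Aitken extrapolation: from three successive iterates
   x_{n-2}, x_{n-1}, x_n estimate A_n and the accelerated value alpha_n. */
float AitkenStep(float x_nm2, float x_nm1, float x_n)
{
  float A = (x_n - x_nm1) / (x_nm1 - x_nm2);
  return x_n + (A / (1.0f - A)) * (x_n - x_nm1);
}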

8.4.4 Newton iteration

There are many ways to transform F(x)=0 into x=f(x). One essential condition for them all is that in a neighbourhood of a root \alpha holds that |f'(x)|<1, and the smaller f'(x), the faster the series converges. A general method to construct f(x) is: f(x)=x-\Phi(x)F(x) with \Phi(x)\neq0 in a neighbourhood of \alpha. If one chooses: \Phi(x)=\frac{1}{F'(x)} then this becomes Newton's method. The iteration formula then becomes: x_n=x_{n-1}-\frac{F(x_{n-1})}{F'(x_{n-1})} Some remarks:

For F(x)=x^k-a the series becomes: x_n=\frac{1}{k}\left((k-1)x_{n-1}+\frac{a}{x_{n-1}^{k-1}}\right) This is a well-known way to compute roots.
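
For k=2 this is the square root iteration x_n=\frac{1}{2}\left(x_{n-1}+\frac{a}{x_{n-1}}\right); a minimal sketch in C (the routine name and stopping criterion are chosen here for illustration):

#include <math.h>

/* Newton iteration for sqrt(a), a > 0: x is replaced by (x + a/x)/2
   until the step size drops below eps. */
float NewtonSqrt(float a, float eps)
{
  float x = (a > 1.0f) ? a : 1.0f;   /* any positive starting value works */
  float dx;

  do
  {
    dx = 0.5f * (a / x - x);         /* x_new - x */
    x += dx;
  } while (fabs(dx) > eps);

  return x;
}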

The following code finds the root of a function by means of Newton's method. The root lies within the interval {\tt [x1, x2]}. The value is adapted until the accuracy is better than $\pm${\tt eps}. The function {\tt funcd} is a routine that returns both the function value and its first derivative at the point {\tt x} in the passed pointers.

float SolveNewton(void (*funcd)(float, float*, float*), float x1, float x2, float eps)
{
  int   j, max_iter = 25;
  float df, dx, f, root;

  root = 0.5 * (x1 + x2);
  for (j = 1; j <= max_iter; j++)
  {
    (*funcd)(root, &f, &df);
    dx   = f / df;
    root -= dx;                        /* Newton step: x_n = x_{n-1} - F/F' */
    if ( (x1 - root)*(root - x2) < 0.0 )
    {
      perror("Jumped out of brackets in SolveNewton.");
      exit(1);
    }
    if ( fabs(dx) < eps ) return root; /* Convergence */
  }
  perror("Maximum number of iterations exceeded in SolveNewton.");
  exit(1);
  return 0.0;
}
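
A possible way to call this routine (the example function, F(x)=x^2-2 with root \sqrt{2} in [1,2], is only an illustration):

#include <stdio.h>

/* Example for SolveNewton: F(x) = x^2 - 2, F'(x) = 2x. */
void SquareRootOfTwo(float x, float *f, float *df)
{
  *f  = x * x - 2.0f;
  *df = 2.0f * x;
}

int main(void)
{
  float root = SolveNewton(SquareRootOfTwo, 1.0, 2.0, 1.0e-5);
  printf("root = %f\n", root);         /* approximately 1.414214 */
  return 0;
}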

8.4.5 The secant method

This is, in contrast to the two methods discussed previously, a two-step method. If two approximations x_n and x_{n-1} exist for a root, then one can find the next approximation with x_{n+1}=x_n-F(x_n)\frac{x_n-x_{n-1}}{F(x_n)-F(x_{n-1})} If F(x_n) and F(x_{n-1}) have a different sign one is interpolating, otherwise extrapolating.
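
A sketch of the secant method in C (the routine name and the stopping criterion are chosen here for illustration):

#include <math.h>

/* Secant method: start from two approximations x0 and x1 of the root and
   iterate until two successive approximations differ by less than eps. */
float SolveSecant(float (*F)(float), float x0, float x1, float eps, int max_iter)
{
  float f0 = (*F)(x0), f1 = (*F)(x1), x2 = x1;
  int   i;

  for (i = 0; i < max_iter; i++)
  {
    x2 = x1 - f1 * (x1 - x0) / (f1 - f0);
    if (fabs(x2 - x1) < eps) return x2;        /* converged */
    x0 = x1; f0 = f1;
    x1 = x2; f1 = (*F)(x2);
  }
  return x2;                                   /* no convergence */
}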

8.5 Polynomial interpolation

A basis for polynomials of order n is given by {\it Lagrange's interpolation polynomials}: L_j(x)=\prod_{{l=0}\atop{l\neq j}}^n\frac{x-x_l}{x_j-x_l} The following holds:

  1. Each L_j(x) has order n,
  2. L_j(x_i)=\delta_{ij} for i,j=0,1,...,n,
  3. Each polynomial p(x) can be written uniquely as p(x)=\sum_{j=0}^n c_jL_j(x)~~~\mbox{with}~~~c_j=p(x_j)

This is not a suitable method to calculate the value of a polynomial at a given point x=a. To do this, the Horner algorithm is better suited: the value s=\sum_k c_kx^k at x=a can be calculated as follows:

float GetPolyValue(float c[], int n, float a)
{
  int i; float s = c[n];
  for (i = n - 1; i >= 0; i--)
  {
    s = s * a + c[i];
  }
  return s;
}
After it has finished, s has the value p(a).

8.6 Definite integrals

Almost all numerical methods are based on a formula of the type: \int\limits_a^bf(x)dx=\sum_{i=0}^nc_if(x_i)+R(f) with n, c_i and x_i independent of f(x) and R(f) the error which has the form R(f)=Cf^{(q)}(\xi) for all common methods. Here, \xi\in(a,b) and q\geq n+1. Often the points x_i are chosen equidistant. Some common formulas are:

  1. The trapezoid rule: n=1, x_0=a, x_1=b, h=b-a: \int\limits_a^bf(x)dx=\frac{h}{2}[f(x_0)+f(x_1)]-\frac{h^3}{12}f''(\xi)
  2. Simpson's rule: n=2, x_0=a, x_1=\frac{1}{2}(a+b), x_2=b, h=\frac{1}{2}(b-a): \int\limits_a^bf(x)dx=\frac{h}{3}[f(x_0)+4f(x_1)+f(x_2)]-\frac{h^5}{90}f^{(4)}(\xi)
  3. The midpoint rule: n=0, x_0=\frac{1}{2}(a+b), h=b-a: \int\limits_a^bf(x)dx=hf(x_0)+\frac{h^3}{24}f''(\xi)

The interval will usually be split up and the integration formulas applied to the partial intervals if f varies much within the interval.

A Gaussian integration formula is obtained when one chooses both the coefficients c_j and the points x_j in an integral formula so that the formula gives exact results for polynomials of an order as high as possible. Two examples are:

  1. Gaussian formula with 2 points: \int\limits_{-h}^hf(x)dx=h\left[f\left(\frac{-h}{\sqrt{3}}\right)+f\left(\frac{h}{\sqrt{3}}\right)\right]+\frac{h^5}{135}f^{(4)}(\xi)
  2. Gaussian formula with 3 points: \int\limits_{-h}^hf(x)dx=\frac{h}{9}\left[5f\left(-h\sqrt{\mbox{$\frac{3}{5}$}}\right)+8f(0)+5f\left(h\sqrt{\mbox{$\frac{3}{5}$}}\right)\right]+\frac{h^7}{15750}f^{(6)}(\xi)
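
A sketch of the 2 point formula applied to N subintervals of [a,b] (the routine name and the equidistant splitting are assumptions made here for illustration):

#include <math.h>

/* Composite 2-point Gaussian integration: split [a, b] into N equal
   subintervals of half-width h and apply the 2-point formula to each. */
float GaussIntegrate2(float (*f)(float), float a, float b, int N)
{
  float h = 0.5f * (b - a) / N;      /* half-width of one subinterval */
  float d = h / sqrtf(3.0f);         /* offset of the two Gauss points */
  float sum = 0.0f, mid;
  int   i;

  for (i = 0; i < N; i++)
  {
    mid  = a + (2 * i + 1) * h;      /* midpoint of subinterval i */
    sum += h * ((*f)(mid - d) + (*f)(mid + d));
  }
  return sum;
}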

8.7 Derivatives

There are several formulas for the numerical calculation of f'(x):

  1. Forward difference: f'(x)=\frac{f(x+h)-f(x)}{h}-\frac{h}{2}f''(\xi)
  2. Backward difference: f'(x)=\frac{f(x)-f(x-h)}{h}+\frac{h}{2}f''(\xi)
  3. Central difference: f'(x)=\frac{f(x+h)-f(x-h)}{2h}-\frac{h^2}{6}f'''(\xi)

There are also formulas for higher derivatives: f''(x)=\frac{-f(x+2h)+16f(x+h)-30f(x)+16f(x-h)-f(x-2h)}{12h^2}+\frac{h^4}{90}f^{(6)}(\xi)
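
A minimal sketch of the central difference formula in C (illustrative; h must be chosen as a compromise between truncation and rounding errors):

/* Central difference approximation of f'(x) with truncation error O(h^2). */
float Derivative(float (*f)(float), float x, float h)
{
  return ((*f)(x + h) - (*f)(x - h)) / (2.0f * h);
}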

8.8 Differential equations

We start with the first order DE y'(x)=f(x,y) for x>x_0 and initial condition y(x_0)=y_0. Suppose we find approximations z_1,z_2,...,z_n for y(x_1), y(x_2),..., y(x_n). Then we can derive some formulas to obtain z_{n+1} as approximation for y(x_{n+1}):

  1. Euler's method (forward): z_{n+1}=z_n+hf(x_n,z_n) with local error \frac{1}{2}h^2y''(\xi)
  2. The midpoint rule: z_{n+1}=z_{n-1}+2hf(x_n,z_n) with local error \frac{1}{3}h^3y'''(\xi)

Runge-Kutta methods are an important class of single-step methods. They work so well because the solution y(x) can be written as: y_{n+1}=y_n+hf(\xi_n,y(\xi_n))~~~\mbox{with}~~~\xi_n\in(x_n,x_{n+1}) Because \xi_n is unknown some ``measurements'' are done on the increment function k=hf(x,y) in well chosen points near the solution. Then one takes for z_{n+1}-z_n a weighted average of the measured values. One of the possible 3rd order Runge-Kutta methods is given by: \begin{eqnarray*} k_1&=&hf(x_n,z_n)\\ k_2&=&hf(x_n+\frac{1}{2} h,z_n+\frac{1}{2} k_1)\\ k_3&=&hf(x_n+\mbox{$\frac{3}{4}$}h,z_n+\mbox{$\frac{3}{4}$}k_2)\\ z_{n+1}&=&z_n+\mbox{$\frac{1}{9}$}(2k_1+3k_2+4k_3) \end{eqnarray*} and the classical 4th order method is: \begin{eqnarray*} k_1&=&hf(x_n,z_n)\\ k_2&=&hf(x_n+\frac{1}{2} h,z_n+\frac{1}{2} k_1)\\ k_3&=&hf(x_n+\frac{1}{2} h,z_n+\frac{1}{2} k_2)\\ k_4&=&hf(x_n+h,z_n+k_3)\\ z_{n+1}&=&z_n+\mbox{$\frac{1}{6}$}(k_1+2k_2+2k_3+k_4) \end{eqnarray*} Often the accuracy is increased by adjusting the stepsize for each step with the estimated error. Step doubling is most often used for 4th order Runge-Kutta.
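
A sketch of one step of the classical 4th order method in C (the routine name is chosen here for illustration):

/* One step of the classical 4th order Runge-Kutta method: advance the
   approximation z of y at x to x + h for the DE y' = f(x, y). */
float RungeKutta4Step(float (*f)(float, float), float x, float z, float h)
{
  float k1 = h * (*f)(x,            z);
  float k2 = h * (*f)(x + 0.5f * h, z + 0.5f * k1);
  float k3 = h * (*f)(x + 0.5f * h, z + 0.5f * k2);
  float k4 = h * (*f)(x + h,        z + k3);

  return z + (k1 + 2.0f * k2 + 2.0f * k3 + k4) / 6.0f;
}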

8.9 The fast Fourier transform

The Fourier transform of a function can be approximated when some discrete points are known. Suppose we have N successive samples h_k=h(t_k) with t_k=k\Delta, k=0,1,2,...,N-1. Then the discrete Fourier transform is given by: H_n=\sum_{k=0}^{N-1}h_k{\rm e}^{2\pi ikn/N} and the inverse Fourier transform by h_k=\frac{1}{N}\sum_{n=0}^{N-1}H_n{\rm e}^{-2\pi ikn/N} This operation is order N^2. It can be faster, order N\cdot{}^2\log(N), with the fast Fourier transform. The basic idea is that a Fourier transform of length N can be rewritten as the sum of two discrete Fourier transforms, each of length N/2. One is formed from the even-numbered points of the original N, the other from the odd-numbered points.

This can be implemented as follows. The array data[1..2*nn] contains on the odd positions the real and on the even positions the imaginary parts of the input data: data[1] is the real part and data[2] the imaginary part of f_0, etc. The next routine replaces the values in data by their discrete Fourier transformed values if {\tt isign} = 1, and by nn times their inverse transformed values if {\tt isign} = -1 (so the result must still be divided by nn to obtain the inverse transform as defined above). nn must be a power of 2.

#include <math.h>
#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr

void FourierTransform(float data[], unsigned long nn, int isign)
{
  unsigned long n, mmax, m, j, istep, i;
  double        wtemp, wr, wpr, wpi, wi, theta;
  float         tempr, tempi;

  n = nn << 1;
  j = 1;
  for (i = 1; i < n; i += 2)
  {
    if ( j > i )
    {
      SWAP(data[j], data[i]);
      SWAP(data[j+1], data[i+1]);
    }
    m = n >> 1;
    while ( m >= 2 && j > m )
    {
      j -= m;
      m >>= 1;
    }
    j += m;
  }
  mmax = 2;
  while ( n > mmax ) /* Outermost loop, is executed log2(nn) times */
  {
    istep = mmax << 1;
    theta = isign * (6.28318530717959/mmax);
    wtemp = sin(0.5 * theta);
    wpr   = -2.0 * wtemp * wtemp;
    wpi   = sin(theta);
    wr    = 1.0;
    wi    = 0.0;
    for (m = 1; m < mmax; m += 2)
    {
      for (i = m; i <= n; i += istep) /* Danielson-Lanczos equation */
      {
        j          = i + mmax;
        tempr      = wr * data[j]   - wi * data[j+1];
        tempi      = wr * data[j+1] + wi * data[j];
        data[j]    = data[i]   - tempr;
        data[j+1]  = data[i+1] - tempi;
        data[i]   += tempr;
        data[i+1] += tempi;
      }
      wr = (wtemp = wr) * wpr - wi * wpi + wr;
      wi = wi * wpr + wtemp * wpi + wi;
    }
    mmax = istep;
  }
}