If a problem has parameters that are not known exactly, there will be an error in
the solution. The dependency between errors in the input data and errors in the
solution can be expressed by the condition number c. If the problem is given by
x=\phi(a), the first-order approximation for an error \delta a in a is:
\frac{\delta x}{x}=\frac{a\phi'(a)}{\phi(a)}\cdot\frac{\delta a}{a}
The number c(a)=|a\phi'(a)|/|\phi(a)| is called the condition number; the problem is
well-conditioned if c\ll1.
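For example, for the problem x=\phi(a)=\sqrt{a} one finds
c(a)=\left|\frac{a\cdot\frac{1}{2}a^{-1/2}}{\sqrt{a}}\right|=\frac{1}{2}
so a relative error in a is halved in the solution: this problem is well-conditioned
for every a>0.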
The floating point representation depends on 4 natural numbers:
- The base of the number system \beta,
- The length of the mantissa t,
- The length of the exponent q,
- The sign s.
Then the representation of machine numbers becomes: rd(x)=s\cdot m\cdot\beta^e
where the mantissa m is a number with t \beta-based digits for which holds
1/\beta\leq|m|<1, and e is a number with q \beta-based digits for which
holds |e|\leq\beta^q-1. The number 0 is added to this set, for example with
m=e=0. The largest machine number is
a_{\max}=(1-\beta^{-t})\beta^{\beta^q-1}
and the smallest positive machine number is
a_{\min}=\beta^{-\beta^q}
The distance between two successive machine numbers in the interval
[\beta^{p-1},\beta^p] is \beta^{p-t}. If x is a real number and the
closest machine number is rd(x), then holds:
rd(x)=x(1+\varepsilon)~~~\mbox{with}~~~|\varepsilon|\leq\frac{1}{2}\beta^{1-t}
x=rd(x)(1+\varepsilon')~~~\mbox{with}~~~|\varepsilon'|\leq\frac{1}{2}\beta^{1-t}
The number \eta:=\frac{1}{2}\beta^{1-t} is called the machine accuracy, and
\varepsilon,\varepsilon'\leq\eta~~~,~~~\left|\frac{x-rd(x)}{x}\right|\leq\eta
A frequently used 32-bit float format is: 1 bit for s, 8 bits for the exponent
and 23 bits for the mantissa. The base here is 2.
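The machine accuracy can also be estimated experimentally. The following is a minimal
sketch, assuming an IEEE-like binary {\tt float}; the function name {\tt MachineEpsilon}
is illustrative only:
#include <stdio.h>

/* Estimate the machine accuracy for the float type: halve eps as long as
   1 + eps/2 is still distinguishable from 1. For the 32-bit format above
   (beta = 2, t = 23) this yields eta = 2^-23, approximately 1.2e-7.       */
float MachineEpsilon(void)
{
    volatile float one_plus;        /* volatile avoids extended precision  */
    float eps = 1.0f;
    do
    {
        eps *= 0.5f;
        one_plus = 1.0f + 0.5f * eps;
    } while (one_plus > 1.0f);
    return eps;
}

int main(void)
{
    printf("machine accuracy: %g\n", MachineEpsilon());
    return 0;
}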
We want to solve the matrix equation A\vec{x}=\vec{b} for a non-singular
A, which is equivalent to finding the inverse matrix A^{-1}. Inverting
an n\times n matrix via Cramer's rule requires too many multiplications
f(n), with n!\leq f(n)\leq(e-1)n!, so other methods are preferable.
Consider the equation U\vec{x}=\vec{c} where U is right-upper triangular,
that is, a matrix in which U_{ij}=0 for all j<i, and all U_{ii}\neq0. Then:
\begin{eqnarray*}
x_n&=&c_n/U_{nn}\\
x_{n-1}&=&(c_{n-1}-U_{n-1,n}x_n)/U_{n-1,n-1}\\
&\vdots&\\
x_1&=&\Big(c_1-\sum_{j=2}^nU_{1j}x_j\Big)/U_{11}
\end{eqnarray*}
In code:
for (k = n; k > 0; k--)
{
    S = c[k];
    for (j = k + 1; j <= n; j++)    /* subtract the already computed x_j */
    {
        S -= U[k][j] * x[j];
    }
    x[k] = S / U[k][k];             /* back substitution, bottom row first */
}
This algorithm requires \frac{1}{2}n(n+1) floating point calculations.
Consider a general set A\vec{x}=\vec{b}. This can be reduced by Gauss
elimination to a triangular form by multiplying the first equation with
A_{i1}/A_{11} and then subtracting it from equation i for all i>1; now the first
column contains all 0's except A_{11}. Then the 2nd equation is subtracted in
such a way from the others that all elements in the second column below
A_{22} are 0, etc. In code:
for (k = 1; k <= n; k++)
{
    /* copy row k of A into the triangular matrix U and b into c */
    for (j = k; j <= n; j++) U[k][j] = A[k][j];
    c[k] = b[k];
    for (i = k + 1; i <= n; i++)
    {
        L = A[i][k] / U[k][k];      /* elimination factor for row i */
        for (j = k + 1; j <= n; j++)
        {
            A[i][j] -= L * U[k][j];
        }
        b[i] -= L * c[k];
    }
}
This algorithm requires \frac{1}{3}n(n^2-1) floating point multiplications
and divisions for operations on the coefficient matrix and \frac{1}{2}n(n-1)
multiplications for operations on the right-hand terms, whereafter the
triangular set has to be solved with \frac{1}{2}n(n+1) operations.
Some equations have to be interchanged if the corner elements
A_{11},A^{(1)}_{22},... are not all \neq0, to allow Gauss elimination to
work. In the following, A^{(n)} is the element after the nth iteration.
One method is: if A^{(k-1)}_{kk}=0, then search for an element A^{(k-1)}_{pk}
with p>k that is \neq0 and interchange the pth and the kth equation. This
strategy fails only if the set is singular and has no solution at all.
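A minimal sketch of this interchange step, assuming the 1-based arrays of the
elimination code above; the function name {\tt PivotColumn} and the fixed dimension
{\tt NMAX} are illustrative assumptions. In practice one usually picks the element of
largest absolute value in the column rather than the first non-zero one, which is
better for numerical stability.
#define NMAX 64

/* Search column k (rows k..n) for a non-zero pivot and interchange that
   row with row k. Returns 0 if the whole column is zero, i.e. the set
   is singular; returns 1 otherwise.                                      */
int PivotColumn(float A[NMAX+1][NMAX+1], float b[NMAX+1], int k, int n)
{
    int p, j;
    float t;
    for (p = k; p <= n && A[p][k] == 0.0f; p++)
        ;
    if (p > n) return 0;            /* column is zero: singular set       */
    if (p != k)                     /* interchange equations p and k      */
    {
        for (j = 1; j <= n; j++) { t = A[k][j]; A[k][j] = A[p][j]; A[p][j] = t; }
        t = b[k]; b[k] = b[p]; b[p] = t;
    }
    return 1;
}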
We want to solve the equation F(x)=0, so we want to find the root \alpha with F(\alpha)=0.
Many solution methods are essentially the following:
- Rewrite the equation in the form x=f(x) so that a solution of this
equation is also a solution of F(x)=0. Further, f(x) may not vary
too much with respect to x near \alpha.
- Assume an initial estimate x_0 for \alpha and obtain the sequence
x_n with x_n=f(x_{n-1}), in the hope that \lim\limits_{n\rightarrow\infty}x_n=\alpha.
Example: choose
f(x)=\beta-\varepsilon\frac{h(x)}{g(x)}=x-\frac{F(x)}{G(x)}
then we can expect that the sequence x_n with
\begin{eqnarray*}
x_0&=&\beta\\
x_n&=&x_{n-1}-\varepsilon\frac{h(x_{n-1})}{g(x_{n-1})}
\end{eqnarray*}
converges to \alpha.
Let \alpha be a solution of x=f(x) and let x_n=f(x_{n-1}) for a given
x_0. Let f'(x) be continuous in a neighbourhood of \alpha. Let
f'(\alpha)=A with |A|<1. Then there exists a \delta>0 so that for each
x_0 with |x_0-\alpha|\leq\delta the following holds:
- \lim\limits_{n\rightarrow\infty}x_n=\alpha,
- If for a particular k it holds that x_k=\alpha, then x_n=\alpha for each n\geq k.
If x_n\neq\alpha for all n, then
\lim_{n\rightarrow\infty}\frac{\alpha-x_n}{\alpha-x_{n-1}}=A~~~,~~~
\lim_{n\rightarrow\infty}\frac{x_n-x_{n-1}}{x_{n-1}-x_{n-2}}=A~~~,~~~
\lim_{n\rightarrow\infty}\frac{\alpha-x_n}{x_n-x_{n-1}}=\frac{A}{1-A}
The quantity A is called the {\it asymptotic convergence factor}, the
quantity B=-{}^{10}\log|A| is called the {\it asymptotic convergence speed}.
We define
A_n=\frac{x_n-x_{n-1}}{x_{n-1}-x_{n-2}}
which converges to f'(\alpha) as n\rightarrow\infty. Then the sequence
\alpha_n=x_n+\frac{A_n}{1-A_n}(x_n-x_{n-1})
will converge to \alpha.
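A minimal sketch of one such extrapolation step (the function name {\tt AitkenStep}
is illustrative); algebraically this is equivalent to Aitken's \Delta^2 extrapolation:
/* One extrapolation step: given three successive iterates
   x2 = x_n, x1 = x_{n-1}, x0 = x_{n-2}, compute A_n and return
   alpha_n = x_n + A_n/(1 - A_n) * (x_n - x_{n-1}).              */
float AitkenStep(float x0, float x1, float x2)
{
    float A = (x2 - x1) / (x1 - x0);
    return x2 + A / (1.0f - A) * (x2 - x1);
}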
There are more ways to transform F(x)=0 into x=f(x). One essential
condition for them all is that |f'(x)|<1 in a neighbourhood of the root \alpha:
the smaller |f'(x)|, the faster the sequence converges.
A general method to construct f(x) is:
f(x)=x-\Phi(x)F(x)
with \Phi(x)\neq0 in a neighbourhood of \alpha. If one chooses:
\Phi(x)=\frac{1}{F'(x)}
then this becomes Newton's method. The iteration formula then becomes:
x_n=x_{n-1}-\frac{F(x_{n-1})}{F'(x_{n-1})}
Some remarks:
- This same result can also be derived with Taylor series.
- Local convergence is often difficult to determine.
- If x_n is far from \alpha, the convergence can sometimes be very slow.
- The assumption F'(\alpha)\neq0 means that \alpha is a simple root.
For F(x)=x^k-a the series becomes:
x_n=\frac{1}{k}\left((k-1)x_{n-1}+\frac{a}{x_{n-1}^{k-1}}\right)
This is a well-known way to compute k-th roots.
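As a small illustration (the function name {\tt KthRoot} and its arguments are
assumptions, not from the text), the iteration for general k with a simple
convergence test:
#include <math.h>

/* Newton iteration x_n = ((k-1)*x + a/x^(k-1))/k for the k-th root of a.
   Requires a starting value x0 > 0; eps is the required accuracy.        */
float KthRoot(float a, int k, float x0, float eps)
{
    int j;
    float x = x0, xnew = x0;
    for (j = 0; j < 50; j++)
    {
        xnew = ((k - 1) * x + a / powf(x, (float)(k - 1))) / k;
        if (fabsf(xnew - x) < eps) break;
        x = xnew;
    }
    return xnew;
}
For example, {\tt KthRoot(2.0f, 2, 1.0f, 1e-6f)} returns approximately 1.41421.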
The following code finds the root of a function by means of Newton's method. The
root must lie within the interval {\tt [x1, x2]}. The value is adapted until the
accuracy is better than {\tt eps}. The function {\tt funcd} is a routine
that returns both the function value and its first derivative at the point {\tt x}
through the passed pointers.
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

float SolveNewton(void (*funcd)(float, float*, float*), float x1, float x2, float eps)
{
    int j, max_iter = 25;
    float df, dx, f, root;
    root = 0.5 * (x1 + x2);                /* initial estimate            */
    for (j = 1; j <= max_iter; j++)
    {
        (*funcd)(root, &f, &df);
        dx = f / df;                       /* Newton correction           */
        root -= dx;
        if ( (x1 - root) * (root - x2) < 0.0 )
        {
            perror("Jumped out of brackets in SolveNewton.");
            exit(1);
        }
        if ( fabs(dx) < eps ) return root; /* Convergence */
    }
    perror("Maximum number of iterations exceeded in SolveNewton.");
    exit(1);
    return 0.0;
}
This is, in contrast to the two methods discussed previously, a two-step
method. If two approximations x_n and x_{n-1} exist for a root, then
one can find the next approximation with
x_{n+1}=x_n-F(x_n)\frac{x_n-x_{n-1}}{F(x_n)-F(x_{n-1})}
If F(x_n) and F(x_{n-1}) have a different sign one is interpolating,
otherwise extrapolating.
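A minimal sketch of this iteration (the function name {\tt SolveSecant} and its
parameters are assumptions; {\tt F} is the function whose root is sought):
#include <math.h>

/* Secant method: start from two approximations x0 and x1 and iterate
   x_{n+1} = x_n - F(x_n)*(x_n - x_{n-1})/(F(x_n) - F(x_{n-1})).
   Assumes F(x_n) != F(x_{n-1}) during the iteration.                   */
float SolveSecant(float (*F)(float), float x0, float x1, float eps, int max_iter)
{
    int j;
    float f0 = F(x0), f1 = F(x1), x2;
    for (j = 0; j < max_iter; j++)
    {
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0);
        if (fabsf(x2 - x1) < eps) return x2;
        x0 = x1; f0 = f1;
        x1 = x2; f1 = F(x2);
    }
    return x1;                 /* best approximation after max_iter steps */
}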
A basis for polynomials of order n is given by {\it Lagrange's interpolation
polynomials}:
L_j(x)=\prod_{{l=0}\atop{l\neq j}}^n\frac{x-x_l}{x_j-x_l}
The following holds:
- Each L_j(x) has order n,
- L_j(x_i)=\delta_{ij} for i,j=0,1,...,n,
- Each polynomial p(x) can be written uniquely as
p(x)=\sum_{j=0}^n c_jL_j(x)~~~\mbox{with}~~~c_j=p(x_j)
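A direct evaluation of such a sum in a point x could look as follows (the name
{\tt LagrangeValue} and the array names are illustrative); as noted directly below,
this is not the most efficient way to evaluate a polynomial:
/* Evaluate the interpolating polynomial through the points
   (xs[0], cs[0]), ..., (xs[n], cs[n]) at x, using the Lagrange form
   p(x) = sum_j cs[j] * L_j(x).                                        */
float LagrangeValue(float xs[], float cs[], int n, float x)
{
    int j, l;
    float sum = 0.0f, L;
    for (j = 0; j <= n; j++)
    {
        L = 1.0f;
        for (l = 0; l <= n; l++)
        {
            if (l != j) L *= (x - xs[l]) / (xs[j] - xs[l]);
        }
        sum += cs[j] * L;
    }
    return sum;
}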
This is not a suitable method to calculate the value of a polynomial at a
given point x=a. For this, Horner's algorithm is more suitable: the
value s=\sum_k c_kx^k at x=a can be calculated as follows:
float GetPolyValue(float c[], int n, float a)
{
    int i;
    float s = c[n];
    for (i = n - 1; i >= 0; i--)
    {
        s = s * a + c[i];    /* s = (...((c_n*a + c_{n-1})*a + ...) + c_i */
    }
    return s;
}
After it has finished, s has the value p(a).
Almost all numerical methods are based on a formula of the type:
\int\limits_a^bf(x)dx=\sum_{i=0}^nc_if(x_i)+R(f)
with n, c_i and x_i independent of f(x) and R(f) the error which
has the form R(f)=Cf^{(q)}(\xi) for all common methods. Here, \xi\in(a,b)
and q\geq n+1. Often the points x_i are chosen equidistant.
Some common formulas are:
- The trapezoid rule: n=1, x_0=a, x_1=b, h=b-a:
\int\limits_a^bf(x)dx=\frac{h}{2}[f(x_0)+f(x_1)]-\frac{h^3}{12}f''(\xi)
- Simpson's rule: n=2, x_0=a, x_1=\frac{1}{2}(a+b), x_2=b, h=\frac{1}{2}(b-a):
\int\limits_a^bf(x)dx=\frac{h}{3}[f(x_0)+4f(x_1)+f(x_2)]-\frac{h^5}{90}f^{(4)}(\xi)
- The midpoint rule: n=0, x_0=\frac{1}{2}(a+b), h=b-a:
\int\limits_a^bf(x)dx=hf(x_0)+\frac{h^3}{24}f''(\xi)
If f varies strongly within the interval, it will usually be split up and the
integration formulas applied to the partial intervals.
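A minimal sketch of such a composite rule, here the trapezoid rule on m equal
subintervals (the name {\tt TrapezoidRule} is illustrative):
/* Composite trapezoid rule: split [a,b] into m equal parts of width
   h = (b-a)/m and apply the trapezoid rule to each part.             */
float TrapezoidRule(float (*f)(float), float a, float b, int m)
{
    int i;
    float h = (b - a) / m;
    float sum = 0.5f * (f(a) + f(b));
    for (i = 1; i < m; i++)
    {
        sum += f(a + i * h);
    }
    return h * sum;
}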
A Gaussian integration formula is obtained when one wants to get both the
coefficients c_j and the points x_j in an integral formula so that the
integral formula gives exact results for polynomials of an order as high as
possible. Two examples are:
- Gaussian formula with 2 points:
\int\limits_{-h}^hf(x)dx=h\left[f\left(\frac{-h}{\sqrt{3}}\right)+f\left(\frac{h}{\sqrt{3}}\right)\right]+\frac{h^5}{135}f^{(4)}(\xi)
- Gaussian formula with 3 points:
\int\limits_{-h}^hf(x)dx=\frac{h}{9}\left[5f\left(-h\sqrt{\mbox{$\frac{3}{5}$}}\right)+8f(0)+5f\left(h\sqrt{\mbox{$\frac{3}{5}$}}\right)\right]+\frac{h^7}{15750}f^{(6)}(\xi)
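As an illustration, the 2-point formula above, shifted from [-h,h] to a general
interval [a,b] with midpoint x_m=\frac{1}{2}(a+b) and half-width h=\frac{1}{2}(b-a)
(the function name {\tt Gauss2} is an assumption):
#include <math.h>

/* Two-point Gaussian formula on [a,b]: with midpoint xm and half-width h,
   integral ~ h * ( f(xm - h/sqrt(3)) + f(xm + h/sqrt(3)) ),
   exact for polynomials up to order 3.                                    */
float Gauss2(float (*f)(float), float a, float b)
{
    float xm = 0.5f * (a + b);
    float h  = 0.5f * (b - a);
    float d  = h / sqrtf(3.0f);
    return h * (f(xm - d) + f(xm + d));
}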
There are several formulas for the numerical calculation of f'(x):
- Forward differentiation:
f'(x)=\frac{f(x+h)-f(x)}{h}-\frac{1}{2} hf''(\xi)
- Backward differentiation:
f'(x)=\frac{f(x)-f(x-h)}{h}+\frac{1}{2} hf''(\xi)
- Central differentiation:
f'(x)=\frac{f(x+h)-f(x-h)}{2h}-\frac{h^2}{6}f'''(\xi)
- The approximation is better if more function values are used:
f'(x)=\frac{-f(x+2h)+8f(x+h)-8f(x-h)+f(x-2h)}{12h}+\frac{h^4}{30}f^{(5)}(\xi)
There are also formulas for higher derivatives:
f''(x)=\frac{-f(x+2h)+16f(x+h)-30f(x)+16f(x-h)-f(x-2h)}{12h^2}+\frac{h^4}{90}f^{(6)}(\xi)
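A minimal sketch of the central differentiation formula above (the name
{\tt CentralDerivative} is illustrative):
/* Central difference approximation of f'(x); the truncation error
   behaves as -(h*h/6)*f'''(xi) for small h.                        */
float CentralDerivative(float (*f)(float), float x, float h)
{
    return (f(x + h) - f(x - h)) / (2.0f * h);
}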
We start with the first order DE y'(x)=f(x,y) for x>x_0 and initial condition
y(x_0)=y_0. Suppose we find approximations z_1,z_2,...,z_n for y(x_1),
y(x_2),..., y(x_n). Then we can derive some formulas to obtain
z_{n+1} as approximation for y(x_{n+1}):
- Euler (single step, explicit):
z_{n+1}=z_n+hf(x_n,z_n)+\frac{h^2}{2}y''(\xi)
- Midpoint rule (two steps, explicit):
z_{n+1}=z_{n-1}+2hf(x_n,z_n)+\frac{h^3}{3}y'''(\xi)
- Trapezoid rule (single step, implicit):
z_{n+1}=z_n+\frac{1}{2} h(f(x_n,z_n)+f(x_{n+1},z_{n+1}))-\frac{h^3}{12}y'''(\xi)
Runge-Kutta methods are an important class of single-step methods. They
work so well because the solution y(x) can be written as:
y_{n+1}=y_n+hf(\xi_n,y(\xi_n))~~~\mbox{with}~~~\xi_n\in(x_n,x_{n+1})
Because \xi_n is unknown, some ``measurements'' are made of the
increment function k=hf(x,y) in well-chosen points near the solution.
Then one takes for z_{n+1}-z_n a weighted average of the measured values.
One of the possible 3rd order Runge-Kutta methods is given by:
\begin{eqnarray*}
k_1&=&hf(x_n,z_n)\\
k_2&=&hf(x_n+\frac{1}{2} h,z_n+\frac{1}{2} k_1)\\
k_3&=&hf(x_n+\mbox{$\frac{3}{4}$}h,z_n+\mbox{$\frac{3}{4}$}k_2)\\
z_{n+1}&=&z_n+\mbox{$\frac{1}{9}$}(2k_1+3k_2+4k_3)
\end{eqnarray*}
and the classical 4th order method is:
\begin{eqnarray*}
k_1&=&hf(x_n,z_n)\\
k_2&=&hf(x_n+\frac{1}{2} h,z_n+\frac{1}{2} k_1)\\
k_3&=&hf(x_n+\frac{1}{2} h,z_n+\frac{1}{2} k_2)\\
k_4&=&hf(x_n+h,z_n+k_3)\\
z_{n+1}&=&z_n+\mbox{$\frac{1}{6}$}(k_1+2k_2+2k_3+k_4)
\end{eqnarray*}
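One step of this classical 4th order method translates directly into code
(a sketch; the name {\tt RungeKutta4Step} is illustrative):
/* One classical 4th order Runge-Kutta step: advance the approximation
   z of y(x) from x to x + h for the equation y' = f(x,y).             */
float RungeKutta4Step(float (*f)(float, float), float x, float z, float h)
{
    float k1 = h * f(x, z);
    float k2 = h * f(x + 0.5f * h, z + 0.5f * k1);
    float k3 = h * f(x + 0.5f * h, z + 0.5f * k2);
    float k4 = h * f(x + h, z + k3);
    return z + (k1 + 2.0f * k2 + 2.0f * k3 + k4) / 6.0f;
}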
Often the accuracy is increased by adjusting the stepsize for each step with
the estimated error. Step doubling is most often used for 4th order Runge-Kutta.
The Fourier transform of a function can be approximated when some discrete
points are known. Suppose we have N successive samples h_k=h(t_k) with
t_k=k\Delta, k=0,1,2,...,N-1. Then the discrete Fourier transform is
given by:
H_n=\sum_{k=0}^{N-1}h_k{\rm e}^{2\pi ikn/N}
and the inverse Fourier transform by
h_k=\frac{1}{N}\sum_{n=0}^{N-1}H_n{\rm e}^{-2\pi ikn/N}
This operation is of order N^2. It can be done faster, in order N\cdot{}^2\log(N),
with the fast Fourier transform. The basic idea is that a Fourier transform
of length N can be rewritten as the sum of two discrete Fourier transforms,
each of length N/2. One is formed from the even-numbered points of
the original N, the other from the odd-numbered points.
This can be implemented as follows.
The array {\tt data[1..2*nn]} contains on the odd positions the real and on the
even positions the imaginary parts of the input data: {\tt data[1]}
is the real part and {\tt data[2]} the imaginary part of h_0, etc. The
next routine replaces the values in {\tt data} by their discrete Fourier
transformed values if {\tt isign}=1, and by {\tt nn} times their inverse
transformed values if {\tt isign}=-1 (the factor 1/N still has to be applied
separately). {\tt nn} must be a power of 2.
#include <math.h>

#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr

void FourierTransform(float data[], unsigned long nn, int isign)
{
    unsigned long n, mmax, m, j, istep, i;
    double wtemp, wr, wpr, wpi, wi, theta;
    float tempr, tempi;

    n = nn << 1;
    j = 1;
    for (i = 1; i < n; i += 2)        /* Bit-reversal reordering of the data */
    {
        if ( j > i )
        {
            SWAP(data[j], data[i]);
            SWAP(data[j+1], data[i+1]);
        }
        m = n >> 1;
        while ( m >= 2 && j > m )
        {
            j -= m;
            m >>= 1;
        }
        j += m;
    }
    mmax = 2;
    while ( n > mmax )                /* Outermost loop, executed log2(nn) times */
    {
        istep = mmax << 1;
        theta = isign * (6.28318530717959 / mmax);
        wtemp = sin(0.5 * theta);
        wpr = -2.0 * wtemp * wtemp;
        wpi = sin(theta);
        wr = 1.0;
        wi = 0.0;
        for (m = 1; m < mmax; m += 2)
        {
            for (i = m; i <= n; i += istep)   /* Danielson-Lanczos formula */
            {
                j = i + mmax;
                tempr = wr * data[j] - wi * data[j+1];
                tempi = wr * data[j+1] + wi * data[j];
                data[j] = data[i] - tempr;
                data[j+1] = data[i+1] - tempi;
                data[i] += tempr;
                data[i+1] += tempi;
            }
            wr = (wtemp = wr) * wpr - wi * wpi + wr;   /* Trigonometric recurrence */
            wi = wi * wpr + wtemp * wpi + wi;
        }
        mmax = istep;
    }
}
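A small usage sketch (the sample values and the length {\tt N} are illustrative,
not from the text): N real samples h_k are stored interleaved with zero imaginary
parts, and after the call the array holds the complex values H_n in the same layout.
#define N 8                            /* must be a power of 2              */

int main(void)
{
    float data[2 * N + 1];             /* 1-based, interleaved re/im parts  */
    int k;
    for (k = 0; k < N; k++)
    {
        data[2*k + 1] = (float)k;      /* real part of sample h_k           */
        data[2*k + 2] = 0.0f;          /* imaginary part                    */
    }
    FourierTransform(data, N, 1);      /* forward transform                 */
    /* data[2n+1] and data[2n+2] now hold Re(H_n) and Im(H_n)               */
    return 0;
}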