Then the representation of machine numbers becomes: ${\rm rd}(x)=s\cdot m\cdot\beta^e$ where the mantissa $m$ is a number with $t$ base-$\beta$ digits satisfying $1/\beta\leq|m|<1$, and $e$ is a number with $q$ base-$\beta$ digits for which $|e|\leq\beta^q-1$ holds. The number 0 is added to this set, for example with $m=e=0$. The largest machine number is \[ a_{\rm max}=(1-\beta^{-t})\beta^{\beta^q-1} \] and the smallest positive machine number is \[ a_{\rm min}=\beta^{-\beta^q} \] The distance between two successive machine numbers in the interval $[\beta^{p-1},\beta^p]$ is $\beta^{p-t}$. If $x$ is a real number and the closest machine number is ${\rm rd}(x)$, then: \begin{eqnarray*} {\rm rd}(x)=x(1+\varepsilon) &~~~\mbox{with}~~~&|\varepsilon|\leq\frac{1}{2}\beta^{1-t}\\ x={\rm rd}(x)(1+\varepsilon')&~~~\mbox{with}~~~&|\varepsilon'|\leq\frac{1}{2}\beta^{1-t} \end{eqnarray*} The number $\eta:=\frac{1}{2}\beta^{1-t}$ is called the machine accuracy, and \[ \varepsilon,\varepsilon'\leq\eta~~~\mbox{so}~~~\left|\frac{x-{\rm rd}(x)}{x}\right|\leq\eta \] An often used 32-bit float format is: 1 bit for $s$, 8 for the exponent and 23 for the mantissa. The base here is 2.
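As an illustration, the machine accuracy of the {\tt float} type can be estimated empirically; a minimal sketch (the variable names are arbitrary):
#include <stdio.h>

int main(void)
{
    float eps = 1.0f;
    /* Halve eps until 1 + eps/2 is rounded back to 1 (base beta = 2).
       The final eps is approximately beta^(1-t), so eta is about eps/2. */
    while ((float)(1.0f + 0.5f * eps) > 1.0f)
    {
        eps *= 0.5f;
    }
    printf("beta^(1-t) approx %g, machine accuracy eta approx %g\n", eps, 0.5f * eps);
    return 0;
}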
In code (back substitution for the upper triangular system $Ux=c$, with arrays indexed from 1):
for (k = n; k > 0; k--)
{
    S = c[k];
    for (j = k + 1; j <= n; j++)
    {
        S -= U[k][j] * x[j];      /* subtract the contributions of the already known unknowns */
    }
    x[k] = S / U[k][k];
}
This algorithm requires $\frac{1}{2}n(n+1)$ floating-point operations.
The Gaussian elimination that transforms the system $Ax=b$ into the triangular system $Ux=c$ can be coded as follows (again with 1-based indexing, and without row pivoting):
for (k = 1; k <= n; k++)
{
    for (j = k; j <= n; j++) U[k][j] = A[k][j];   /* copy the pivot row into U */
    c[k] = b[k];
    for (i = k + 1; i <= n; i++)
    {
        L = A[i][k] / U[k][k];                    /* elimination factor for row i */
        for (j = k + 1; j <= n; j++)
        {
            A[i][j] -= L * U[k][j];
        }
        b[i] -= L * c[k];
    }
}
This algorithm requires $\frac{1}{3}n(n^2-1)$ floating-point multiplications
and divisions for the operations on the coefficient matrix and $\frac{1}{2}n(n-1)$
multiplications for the operations on the right-hand side, after which the
triangular system has to be solved with $\frac{1}{2}n(n+1)$ operations.
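For example, for $n=10$ this amounts to $\frac{1}{3}\cdot10\cdot99=330$ multiplications and divisions for the matrix, $\frac{1}{2}\cdot10\cdot9=45$ multiplications for the right-hand side and $\frac{1}{2}\cdot10\cdot11=55$ operations for the triangular system, about 430 operations in total; for large $n$ the elimination dominates with roughly $\frac{1}{3}n^3$ operations.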
Many solution methods are essentially the following:
Example: choose \[ f(x)=\beta-\varepsilon\frac{h(x)}{g(x)}=x-\frac{F(x)}{G(x)} \] then we can expect that the sequence $x_n$ with \begin{eqnarray*} x_0&=&\beta\\ x_n&=&x_{n-1}-\varepsilon\frac{h(x_{n-1})}{g(x_{n-1})} \end{eqnarray*} converges to $\alpha$.
The quantity $A$ is called the {\it asymptotic convergence factor}; the quantity $B=-\log_{10}|A|$ is called the {\it asymptotic convergence speed}.
For $F(x)=x^k-a$ the iteration becomes: \[ x_n=\frac{1}{k}\left((k-1)x_{n-1}+\frac{a}{x_{n-1}^{k-1}}\right) \] This is a well-known way to compute $k$-th roots.
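A minimal sketch of this iteration in code (the function name, starting value and tolerance are illustrative choices):
#include <math.h>

/* k-th root of a > 0 via the iteration x_n = ((k-1)*x_{n-1} + a/x_{n-1}^(k-1))/k */
double KthRoot(double a, int k)
{
    double x = (a > 1.0) ? a : 1.0;   /* crude starting value */
    double x_old;
    do
    {
        x_old = x;
        x = ((k - 1) * x + a / pow(x, k - 1)) / k;
    } while (fabs(x - x_old) > 1e-12 * x);
    return x;
}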
The following code finds a root of a function by means of Newton's method. The root must lie within the interval {\tt [x1, x2]}. The value is refined until the accuracy is better than {\tt$\pm$eps}. The function {\tt funcd} is a routine that returns both the function value and its first derivative at the point {\tt x} through the passed pointers.
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

float SolveNewton(void (*funcd)(float, float*, float*), float x1, float x2, float eps)
{
    int j, max_iter = 25;
    float df, dx, f, root;
    root = 0.5 * (x1 + x2);                  /* Initial guess: midpoint of the interval */
    for (j = 1; j <= max_iter; j++)
    {
        (*funcd)(root, &f, &df);
        dx = f / df;
        root -= dx;                          /* Newton step: x_n = x_{n-1} - F/F' */
        if ( (x1 - root) * (root - x2) < 0.0 )
        {
            perror("Jumped out of brackets in SolveNewton.");
            exit(1);
        }
        if ( fabs(dx) < eps ) return root;   /* Convergence */
    }
    perror("Maximum number of iterations exceeded in SolveNewton.");
    exit(1);
    return 0.0;                              /* Never reached */
}
This is not a suitable method to calculate the value of a polynomial at a given point $x=a$. For that, Horner's algorithm is better suited: the value $s=\sum_k c_kx^k$ at $x=a$ can be calculated as follows:
float GetPolyValue(float c[], int n, float a)
{
    int i;
    float s = c[n];
    for (i = n - 1; i >= 0; i--)
    {
        s = s * a + c[i];     /* Horner step: multiply by a and add the next coefficient */
    }
    return s;
}
After the loop finishes, {\tt s} has the value $p(a)$, which is returned.
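For example, for $p(x)=2x^2+3x+1$ at $a=2$ one fills {\tt c[0]=1}, {\tt c[1]=3}, {\tt c[2]=2} and calls {\tt GetPolyValue(c, 2, 2.0f)}, which returns 15.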
If $f$ varies strongly within the interval, the interval is usually split into subintervals and the integration formula is applied to each subinterval separately.
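A minimal sketch of such a composite rule, here with the trapezoidal rule applied to each of {\tt N} equal subintervals (any of the integration formulas from this section can be substituted for the per-interval rule; the function name is an illustrative choice):
/* Split [a, b] into N equal subintervals and apply the trapezoidal rule on each */
double IntegrateComposite(double (*f)(double), double a, double b, int N)
{
    double h = (b - a) / N;
    double sum = 0.5 * (f(a) + f(b));
    int i;
    for (i = 1; i < N; i++)
    {
        sum += f(a + i * h);
    }
    return h * sum;
}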
A Gaussian integration formula is obtained when one chooses both the coefficients $c_j$ and the points $x_j$ in an integration formula so that it gives exact results for polynomials of a degree as high as possible. Two examples are:
There are also formulas for higher derivatives: \[ f''(x)=\frac{-f(x+2h)+16f(x+h)-30f(x)+16f(x-h)-f(x-2h)}{12h^2}+\frac{h^4}{90}f^{(6)}(\xi) \]
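In code, this second-derivative formula reads, as a minimal sketch:
/* Five-point approximation of f''(x); the O(h^4) error term is dropped */
double SecondDerivative(double (*f)(double), double x, double h)
{
    return (-f(x + 2*h) + 16*f(x + h) - 30*f(x)
            + 16*f(x - h) - f(x - 2*h)) / (12*h*h);
}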
Runge-Kutta methods are an important class of single-step methods. They work so well because the solution $y(x)$ can be written as: \[ y_{n+1}=y_n+hf(\xi_n,y(\xi_n))~~~\mbox{with}~~~\xi_n\in(x_n,x_{n+1}) \] Because $\xi_n$ is unknown some ``measurements'' are done on the increment function $k=hf(x,y)$ in well chosen points near the solution. Then one takes for $z_{n+1}-z_n$ a weighted average of the measured values. One of the possible 3rd order Runge-Kutta methods is given by: \begin{eqnarray*} k_1&=&hf(x_n,z_n)\\ k_2&=&hf(x_n+\frac{1}{2} h,z_n+\frac{1}{2} k_1)\\ k_3&=&hf(x_n+\mbox{$\frac{3}{4}$}h,z_n+\mbox{$\frac{3}{4}$}k_2)\\ z_{n+1}&=&z_n+\mbox{$\frac{1}{9}$}(2k_1+3k_2+4k_3) \end{eqnarray*} and the classical 4th order method is: \begin{eqnarray*} k_1&=&hf(x_n,z_n)\\ k_2&=&hf(x_n+\frac{1}{2} h,z_n+\frac{1}{2} k_1)\\ k_3&=&hf(x_n+\frac{1}{2} h,z_n+\frac{1}{2} k_2)\\ k_4&=&hf(x_n+h,z_n+k_3)\\ z_{n+1}&=&z_n+\mbox{$\frac{1}{6}$}(k_1+2k_2+2k_3+k_4) \end{eqnarray*} Often the accuracy is increased by adjusting the step size for each step by means of the estimated error. Step doubling is most often used for 4th order Runge-Kutta.
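A minimal sketch of a single step of the classical 4th order method above (the function name is an illustrative choice):
/* One classical 4th order Runge-Kutta step for y' = f(x, y):
   advances the approximation z_n at x_n over a step h and returns z_{n+1} */
double RungeKutta4Step(double (*f)(double, double), double x, double z, double h)
{
    double k1 = h * f(x, z);
    double k2 = h * f(x + 0.5 * h, z + 0.5 * k1);
    double k3 = h * f(x + 0.5 * h, z + 0.5 * k2);
    double k4 = h * f(x + h, z + k3);
    return z + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0;
}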
This can be implemented as follows. The array {\tt data[1..2*nn]} contains on the odd positions the real and on the even positions the imaginary parts of the input data: {\tt data[1]} is the real part and {\tt data[2]} the imaginary part of $f_0$, etc. The next routine replaces the values in {\tt data} by their discrete Fourier transformed values if {\tt isign} $=1$, and by their inverse transformed values if {\tt isign} $=-1$. {\tt nn} must be a power of 2.
#include <math.h>

#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr

void FourierTransform(float data[], unsigned long nn, int isign)
{
    unsigned long n, mmax, m, j, istep, i;
    double wtemp, wr, wpr, wpi, wi, theta;
    float tempr, tempi;

    n = nn << 1;
    j = 1;
    for (i = 1; i < n; i += 2)          /* Bit-reversal reordering of the input */
    {
        if ( j > i )
        {
            SWAP(data[j], data[i]);     /* Swap the two complex numbers */
            SWAP(data[j+1], data[i+1]);
        }
        m = n >> 1;
        while ( m >= 2 && j > m )
        {
            j -= m;
            m >>= 1;
        }
        j += m;
    }
    mmax = 2;
    while ( n > mmax )                  /* Outermost loop, is executed log2(nn) times */
    {
        istep = mmax << 1;
        theta = isign * (6.28318530717959 / mmax);   /* Initialize the trigonometric recurrence */
        wtemp = sin(0.5 * theta);
        wpr = -2.0 * wtemp * wtemp;
        wpi = sin(theta);
        wr = 1.0;
        wi = 0.0;
        for (m = 1; m < mmax; m += 2)
        {
            for (i = m; i <= n; i += istep)           /* Danielson-Lanczos equation */
            {
                j = i + mmax;
                tempr = wr * data[j] - wi * data[j+1];
                tempi = wr * data[j+1] + wi * data[j];
                data[j] = data[i] - tempr;
                data[j+1] = data[i+1] - tempi;
                data[i] += tempr;
                data[i+1] += tempi;
            }
            wr = (wtemp = wr) * wpr - wi * wpi + wr;  /* Trigonometric recurrence */
            wi = wi * wpr + wtemp * wpi + wi;
        }
        mmax = istep;
    }
}
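A brief usage sketch (the sample values are arbitrary); note that, as coded, a forward transform followed by the {\tt isign} $=-1$ call returns {\tt nn} times the original data, so a division by {\tt nn} is still needed for a true inverse:
#include <stdio.h>

int main(void)
{
    unsigned long nn = 8, i;
    float data[2 * 8 + 1];               /* element 0 is unused: the routine is 1-based */
    for (i = 1; i <= 2 * nn; i += 2)
    {
        data[i] = (float)(i / 2);        /* real parts 0, 1, ..., nn-1 */
        data[i + 1] = 0.0f;              /* imaginary parts */
    }
    FourierTransform(data, nn, 1);       /* forward transform */
    FourierTransform(data, nn, -1);      /* inverse transform, scaled by nn */
    for (i = 1; i <= 2 * nn; i += 2)
        printf("%g %g\n", data[i] / nn, data[i + 1] / nn);
    return 0;
}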