Arbitrary precision numerical algorithms

This chapter documents some numerical algorithms used in Yacas for exact integer calculations as well as for multiple precision floating-point calculations, gives brief descriptions of the non-trivial algorithms and estimates of the computational cost. Most of the algorithms are taken from referenced literature; the remaining algorithms were developed by us.


Basic arithmetic

Currently, Yacas uses either internal math (the yacasnumbers library) or the GNU multiple precision library gmp. The algorithms for basic arithmetic in the internal math mode are currently rather slow compared with gmp. If P is the number of digits of precision, then multiplication and division take M(P)=O(P^2) operations in the internal math. (Of course, multiplication and division by a short integer takes time linear in P.) Much faster algorithms for long multiplication (Karatsuba / Toom-Cook / FFT, Newton-Raphson division etc.) are implemented in gmp where at large precision M(P)=O(P*Ln(P)). In the computation cost estimations of this chapter we shall assume that M(P) is at least linear in P.

Warning: calculations with internal math with precision exceeding 10,000 digits are currently impractically slow.

In some algorithms it is necessary to compute the integer parts of expressions such as a*Ln(b)/Ln(10) or a*Ln(10)/Ln(2) where a, b are short integers of order O(P). Such expressions are frequently needed to estimate the number of terms in the Taylor series or similar parameters of the algorithms. In these cases, it is important that the result is not underestimated but it would be wasteful to compute Ln(10)/Ln(2) in floating point only to discard most of that information by taking the integer part of say 1000*Ln(10)/Ln(2). It is more efficient to approximate such constants from above by short rational numbers, for example, Ln(10)/Ln(2)<28738/8651 and Ln(2)<7050/10171. The error of such an approximation will be small enough for practical purposes. The function NearRational can be used to find optimal rational approximations. The function IntLog (see below) efficiently computes the integer part of a logarithm in integer base. If more precision is desired in calculating Ln(a)/Ln(b) for integer a, b, one can compute IntLog(a^k,b) for some integer k and then divide by k.
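
For illustration, here is a short Python sketch (not part of Yacas) that uses the rational bound Ln(10)/Ln(2)<28738/8651 quoted above to obtain, with integer arithmetic only, a safe upper bound on the number of binary digits needed to hold P decimal digits.

def decimal_digits_to_bits(p):
    # Upper bound on p*Ln(10)/Ln(2), using the rational bound Ln(10)/Ln(2) < 28738/8651
    # quoted above; the result is never an underestimate.
    return p * 28738 // 8651 + 1

# Example: decimal_digits_to_bits(1000) returns 3322, while 1000*Ln(10)/Ln(2) = 3321.93...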


Prime numbers

Prime numbers are tested using the Miller-Rabin algorithm.

There is also a function NextPrime(n) that returns the smallest prime number larger than n. This function uses the sequence 5,7,11,13,... generated by the function NextPseudoPrime, which contains all numbers not divisible by 2 or 3 (but possibly divisible by 5, 7, ...). NextPseudoPrime is very fast because it does not perform any primality tests.
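
As an illustration, the following Python sketch (not the Yacas code; a simple trial-division test stands in for the Miller-Rabin test) shows the structure of such a NextPrime routine built on a cheap candidate generator that skips multiples of 2 and 3.

def next_pseudo_prime(n):
    # smallest number > n that is not divisible by 2 or 3 (it may still be composite)
    k = n + 1
    while k % 2 == 0 or k % 3 == 0:
        k += 1
    return k

def is_prime(n):
    # stand-in for the Miller-Rabin test: plain trial division, for illustration only
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def next_prime(n):
    # generate cheap candidates and apply the (expensive) primality test only to them
    if n < 2:
        return 2
    if n < 3:
        return 3
    k = n
    while True:
        k = next_pseudo_prime(k)
        if is_prime(k):
            return k

# next_prime(23) returns 29: the candidate 25 is generated but rejected by the primality test.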


Factorization of integers

Factorization of integers is implemented by functions Factor and Factors. Both functions try to find all prime factors of a given integer n. (Before doing this, the primality checking algorithm is used to detect whether n is a prime number.) Factorization consists of repeatedly finding a factor, i.e. an integer f such that Mod(n,f)=0, and dividing n by f.

For small prime factors the trial division algorithm is used: n is divided by all prime numbers p<=257 until a factor is found. NextPseudoPrime is used to generate the sequence of candidate divisors p.

After separating small prime factors, we test whether the number n is an integer power of a prime number, i.e. whether n=p^s for some prime number p and an integer s>=1. This is tested by the following algorithm. We already know that n is not prime and that n does not contain any small prime factors up to 257. Therefore if n=p^s, then p>257 and 2<=s<s[0]=Ln(n)/Ln(257). In other words, we only need to look for powers not greater than s[0]. This number can be approximated by the "integer logarithm" of n in base 257 (routine IntLog (n, 257)).

Now we need to check whether n is of the form p^s for s=2, 3, ..., s[0]. Note that if for example n=p^24 for some p, then the square root of n will already be an integer, n^(1/2)=p^12. Therefore it is enough to test whether n^(1/s) is an integer for all prime values of s up to s[0]; this will definitely discover whether n is a power of some other integer. The test is performed using the integer root function IntNthRoot, which quickly computes the integer part of the s-th root of an integer number. If we discover that n has an integer root p of order s, we have to check whether p itself is a prime power (we use the same algorithm recursively). The number n is a prime power if and only if p is itself a prime power. If we find no integer roots of orders s<=s[0], then n is not a prime power.

If the number n is not a prime power, the Pollard "rho" algorithm is applied (J. Pollard, Monte Carlo methods for index computation mod p, Mathematics of Computation, volume 32, pages 918-924, 1978). The Pollard "rho" algorithm takes an irreducible polynomial, e.g. p(x)=x^2+1, and builds a sequence of integers x[k+1]:=Mod(p(x[k]),n), starting from x[0]=2. For each k, the GCD of x[2*k]-x[k] with n is computed; if Gcd(x[2*k]-x[k],n)>1, this GCD is a factor of n (possibly n itself, a case discussed below).

The Pollard "rho" algorithm may enter an infinite loop when the sequence x[k] repeats itself without giving any factors of n. For example, the unmodified "rho" algorithm loops on the number 703. The loop is detected by comparing x[2*k] and x[k]. When these two quantities become equal to each other for the first time, the loop may not yet have occurred so the value of GCD is set to 1 and the sequence is continued. But when the equality of x[2*k] and x[k] occurs many times, it indicates that the algorithm has entered a loop. A solution is to randomly choose a different starting number x[0] when a loop occurs and try factoring again, and keep trying new random starting numbers between 1 and n until a non-looping sequence is found. The current implementation stops after 100 restart attempts and prints an error message, "failed to factorize number".

A better (and faster) integer factoring algorithm is needed.


Adaptive plotting

The adaptive plotting routine Plot2D'adaptive uses a simple algorithm to select the optimal grid to approximate a function f(x). The same algorithm for adaptive grid refinement could be used for numerical integration. The idea is that plotting and numerical integration require the same kind of detailed knowledge about the behavior of the function.

The algorithm first splits the interval into a specified initial number of equal subintervals, and then repeatedly splits each subinterval in half until the function is well enough approximated by the resulting grid. The integer parameter depth gives the maximum number of binary splittings for a given initial interval; thus, at most 2^depth additional grid points will be generated. The function Plot2D'adaptive should return a list of pairs of points {{x1,y1}, {x2,y2}, ...} to be used directly for plotting.

The recursive bisection algorithm goes like this:

This algorithm works well if the initial number of points and the depth parameter are large enough.

Singularities in the function are handled by step 3 of the algorithm. Namely, the algorithm checks whether the function returns a non-number (e.g. Infinity) and, if so, the sign change is always considered to be "too rapid". Thus, the intervals immediately adjacent to the singularity will be plotted at the highest allowed refinement level. When plotting the resulting data, the singular points are simply not printed to the data file, so the plotting programs do not have any problems.

The meaning of Newton-Cotes quadrature coefficients is that an integral is approximated as

Integrate(x,a[0],a[n])f(x)<=>h*Sum(k,0,n,c[k]*f(a[k])),

where h:=a[1]-a[0] is the grid step, a[k] are the grid points, and c[k] are the quadrature coefficients. These coefficients are independent of the function f(x) and can be precomputed in advance for any grid a[k] (not necessarily with constant step h=a[k]-a[k-1]). The Newton-Cotes coefficients c[k] for grids with a constant step h can be found, for example, by solving a system of equations,

Sum(k,0,n,c[k]*k^p)=n^(p+1)/(p+1)

for p=0, 1, ..., n. This system of equations means that the coefficients c[k] correctly approximate the integrals of functions f(x)=x^p over the interval (0, n).

The solution of this system always exists and gives quadrature coefficients as rational numbers. For example, with n=2 we obtain the coefficients of Simpson's rule, c[0]=1/3, c[1]=4/3, c[2]=1/3, so that the approximation reads h/3*(f(a[0])+4*f(a[1])+f(a[2])).
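
The system can be solved exactly with rational arithmetic; here is a small Python sketch (using the fractions module, not the Yacas code) that computes the coefficients c[k] for a given n by Gaussian elimination.

from fractions import Fraction

def newton_cotes_coefficients(n):
    # augmented matrix of the system Sum(k,0,n, c[k]*k^p) = n^(p+1)/(p+1), p = 0..n
    rows = [[Fraction(k) ** p for k in range(n + 1)] + [Fraction(n ** (p + 1), p + 1)]
            for p in range(n + 1)]
    for i in range(n + 1):
        # exact Gauss-Jordan elimination: pick a nonzero pivot in column i
        pivot = next(r for r in range(i, n + 1) if rows[r][i] != 0)
        rows[i], rows[pivot] = rows[pivot], rows[i]
        piv = rows[i][i]
        rows[i] = [v / piv for v in rows[i]]
        for r in range(n + 1):
            if r != i and rows[r][i] != 0:
                factor = rows[r][i]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[i])]
    return [rows[k][-1] for k in range(n + 1)]

# newton_cotes_coefficients(2) gives [1/3, 4/3, 1/3] (Simpson's rule h*(f0+4*f1+f2)/3);
# newton_cotes_coefficients(3) gives [3/8, 9/8, 9/8, 3/8] (the "3/8" rule).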

In the same way it is possible to find quadratures for the integral over a subinterval rather than over the whole interval of x. In the current implementation of the adaptive plotting algorithm, two quadratures are used: the 3-point quadrature (n=2) and the 4-point quadrature (n=3) for the integral over the first subinterval, Integrate(x,a[0],a[1])f(x). Their coefficients are (5/12, 2/3, -1/12) and (3/8, 19/24, -5/24, 1/24).


Newton's method and its improvements

The Newton-Raphson method of numerical solution of algebraic equations can be used to obtain multiple-precision values of several elementary functions.

The basic formula is widely known: If f(x)=0 must be solved, one starts with a value of x that is close to some root and iterates

x'=x-f(x)/(D(x)f(x)).

This formula is based on the approximation of the function f(x) by a tangent line at some point x. A Taylor expansion in the neighborhood of the root shows that (for an initial value x[0] sufficiently close to the root) each iteration gives at least twice as many correct digits of the root as the previous one ("quadratic convergence"). Therefore the complexity of this algorithm is proportional to a logarithm of the required precision and to the time it takes to evaluate the function and its derivative. Generalizations of this method require computation of higher derivatives of the function f(x) but successive approximations to the root converge several times faster (the complexity is still logarithmic).

Newton's method is particularly convenient for multiple precision calculations because of its insensitivity to accumulated errors: if x[k] at some iteration is found with a small error, the error will be corrected at the next iteration. Therefore it is not necessary to compute all iterations with the full required precision; each iteration needs to be performed at the precision of the root expected from that iteration. For example, if we know that the initial approximation is accurate to 3 digits, then (assuming quadratic convergence) it is enough to perform the first iteration to 6 digits, the second iteration to 12 digits and so on. In this way, multiple precision calculations are enormously speeded up.
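
The following Python sketch (using the standard decimal module, not Yacas internals) illustrates this precision schedule for the Newton iteration x'=(x+a/x)/2 that computes Sqrt(a): the working precision is roughly doubled at every step, and only the last step runs at full precision.

from decimal import Decimal, localcontext

def newton_sqrt(a, digits):
    a = Decimal(a)
    # build the list of working precisions backwards from the target:
    # each iteration roughly doubles the number of correct digits
    precisions = [digits + 2]
    while precisions[-1] > 15:
        precisions.append(precisions[-1] // 2 + 1)
    x = Decimal(float(a) ** 0.5)          # initial guess, about 15 correct digits
    for prec in reversed(precisions):
        with localcontext() as ctx:
            ctx.prec = prec
            x = (x + a / x) / 2           # Newton step for f(x) = x^2 - a
    return x

# newton_sqrt(2, 60) agrees with Sqrt(2) = 1.41421356237309504880... to about 60 digits.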

However, Newton's method suffers from sensitivity to the initial guess. If the initial value x[0] is not chosen sufficiently close to the root, the iterations may converge very slowly or not converge at all. To remedy this, one can combine Newton's iteration with simple bisection. Once the root is bracketed inside an interval (a, b), one checks whether (a+b)/2 is a better approximation for the root than that obtained from Newton's iteration. This guarantees at least linear convergence in the worst case.

For some equations f(x)=0, Newton's method converges faster; for example, solving Sin(x)=0 in the neighborhood of x=3.14159 gives "cubic" convergence, i.e. the number of correct digits is tripled at each step. This happens because Sin(x) near its root x=Pi has vanishing second derivative and thus the function is particularly well approximated by a straight line.

Halley's method is an improvement over Newton's method in which the equation is transformed so that, near the root, it is particularly well approximated by a straight line. Edmund Halley computed fractional powers, x=a^(1/n), by the iteration

x'=x*(n*(a+x^n)+a-x^n)/(n*(a+x^n)-(a-x^n)).

This formula is equivalent to Newton's method applied to the equation x^(n-q)=a*x^(-q) with q=(n-1)/2. This iteration has a cubic convergence rate. This is the fastest method to compute n-th roots with multiple precision. Iterations with higher order of convergence, for example, the method with quintic convergence rate

x'=x*((n-1)/(n+1)*(2*n-1)/(2*n+1)*x^(2*n)+2*(2*n-1)/(n+1)*x^n*a+a^2)/(x^(2*n)+2*(2*n-1)/(n+1)*x^n*a+(n-1)/(n+1)*(2*n-1)/(2*n+1)*a^2),

require more arithmetic operations per step and are in fact less efficient at high precision.

Halley's method can be generalized to any function f(x). A cubically convergent iteration is always obtained if we replace the equation f(x)=0 by an equivalent equation

g(x):=f(x)/Sqrt(Abs(D(x)f(x)))=0.

Here the function g(x) was chosen so that its second derivative vanishes (D(x,2)g(x)=0) at the root of the equation f(x)=0, independently of where this root is. (There is no unique choice of the function g(x) and sometimes another choice is needed to make the iteration more easily computable.) The Newton iteration for the equation g(x)=0 can be written as

x'=x-(2*f(x)*D(x)f(x))/(2*D(x)f(x)^2-f(x)*Deriv(x,2)f(x)).

For example, the equation Exp(x)=a is transformed into g(x):=Exp(x/2)-a*Exp(-x/2)=0.

Halley's iteration, despite its faster convergence rate, may be more cumbersome to evaluate than Newton's iteration and so it may not provide a more efficient numerical method for some functions. Only in some special cases is Halley's iteration just as simple to compute as Newton's iteration. But Halley's method has another advantage: it is generally less sensitive to the choice of the initial point x[0]. An extreme example of sensitivity to the initial point is the equation x^(-2)=12 for which Newton's iteration x'=3*x/2-6*x^3 converges to the root only from initial points 0<x[0]<0.5 and wildly diverges otherwise, while Halley's iteration converges to the root from any x[0]>0.

It is at any rate not true that Halley's method always converges better than Newton's method. For instance, it diverges on the equation 2*Cos(x)=x unless started at x[0] within the interval (-1/6*Pi, 7/6*Pi). Another example is the equation Ln(x)=a. This equation allows one to compute x=Exp(a) if a fast method for computing Ln(x) is available (e.g. the AGM-based method). For this equation, Newton's iteration

x'=x*(1+a-Ln(x))

converges for any 0<x<Exp(a+1), while Halley's iteration converges only for Exp(a-2)<x<Exp(a+2).

When it converges, Halley's iteration can still converge very slowly for certain functions f(x), for example, for f(x)=x^n-a if n^n>a. For such functions that have very large and rapidly changing derivatives, no general method can converge faster than linearly. In other words, a simple bisection will generally do just as well as any sophisticated iteration, until the root is approximated relatively precisely. Halley's iteration combined with bisection seems to be a good choice for such problems.

For practical evaluation, iterations must be supplemented with error control. For example, if x0 and x1 are two consecutive approximations that are already very close, we can quickly compute the achieved (relative) precision by finding the number of leading zeros in the number Abs(x0-x1)/Max(x0,x1). This is easily done using the integer logarithm. After performing a small number of initial iterations at low precision, we can make sure that x1 has at least a certain number of correct digits of the root. Then we know which precision to use for the next iteration (e.g. triple precision if we are using a cubically convergent scheme). It is important to perform each iteration at the precision of the root which it will give and not at a higher precision; this saves a great deal of time since multiple-precision calculations quickly become very slow at high precision.


Fast evaluation of Taylor series

Taylor series for elementary functions can be used for evaluating the functions when no faster method is available. For example, to straightforwardly evaluate

Exp(x)<=>Sum(k,0,N-1,x^k/k!)

with P decimal digits of precision and x<2, one would need about N<=>P*Ln(10)/Ln(P) terms of the series. To evaluate the truncated series term by term, one needs N-1 long multiplications. (Divisions by large integers k! can be replaced by a short division of the previous term by k.) In addition, about Ln(N)/Ln(10) decimal digits will be lost due to accumulated roundoff errors; therefore the working precision must be increased by this many digits.

If we do not know in advance how many terms of the Taylor series we need, we cannot do any better than just evaluate each term and check if it is already small enough. So in this case we will have to do O(N) long multiplications. However, we can organize the calculation much more efficiently if we can estimate the necessary number of terms and if we can afford some storage. A "rectangular" algorithm uses 2*Sqrt(N) long multiplications (assuming that the coefficients of the series are short rational numbers) and Sqrt(N) units of storage. (See paper: D. M. Smith, Efficient multiple-precision evaluation of elementary functions, 1985.)

Suppose we need to evaluate Sum(k,0,N,a[k]*x^k) and we know that N terms are enough. Suppose also that the coefficients a[k] are rational numbers with small numerators and denominators, so a multiplication a[k]*x is not a long multiplication (usually, either a[k] or the ratio a[k]/a[k-1] is a short rational number). Then we can organize the calculation in a rectangular array with c columns and r rows like this,

a[0]+a[r]*x^r+...+a[(c-1)*r]*x^((c-1)*r)+

x*(a[1]+a[r+1]*x^r+...+a[(c-1)*r+1]*x^((c-1)*r))+

...+

x^(r-1)*(a[r-1]+a[2*r-1]*x^r+...).

To evaluate this rectangle, we first compute x^r (which, if done by the fast binary algorithm, requires O(Ln(r)) long multiplications). Then we compute the c-1 successive powers of x^r, namely x^(2*r), x^(3*r), ..., x^((c-1)*r) in c-1 long multiplications. The partial sums in the r rows are evaluated column by column as more powers of x^r become available. This requires storage of r intermediate results but no more long multiplications by x. If a simple formula relating the coefficients a[k] and a[k-1] is available, then a whole column can be computed and added to the accumulated row values using only short operations, e.g. a[r+1]*x^r can be computed from a[r]*x^r (note that each column contains some consecutive terms of the series). Otherwise, we would need to multiply each coefficient a[k] separately by the power of x; if the coefficients a[k] are short numbers, this is also a short operation. After this, we need r-1 more multiplications for the vertical summation of rows (using the Horner scheme). We have potentially saved time because we do not need to evaluate powers such as x^(r+1) separately, so we do not have to multiply x by itself quite so many times.

The total required number of long multiplications is r+c+Ln(r)-2. The minimum number of multiplications, given that r*c>=N, is around 2*Sqrt(N) at r<=>Sqrt(N)-1/2 (the formula r<=>Sqrt(N-Sqrt(N)) can be used with an integer square root algorithm). Therefore, by arranging the Taylor series in a rectangle of sides r and c, we have obtained an algorithm which costs O(Sqrt(N)) instead of O(N) long multiplications and requires Sqrt(N) units of storage.
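
A Python sketch of the rectangular method (using the decimal module; an illustration, not the Yacas code), applied to the truncated series Sum(k,0,N-1,x^k/k!):

import math
from decimal import Decimal, getcontext

def exp_rectangular(x, digits):
    getcontext().prec = digits + 10                           # a few guard digits
    x = Decimal(x)
    N = int(digits * math.log(10) / math.log(digits)) + 10    # rough estimate of the number of terms
    r = math.isqrt(N) + 1                                     # rows; columns c = Ceil(N/r)
    c = -(-N // r)
    rows = [Decimal(0)] * r
    xr = x ** r                                               # x^r (binary powering inside Decimal)
    power = Decimal(1)                                        # successively holds (x^r)^i
    for i in range(c):
        for j in range(r):
            k = i * r + j
            if k < N:
                rows[j] += power / math.factorial(k)          # the coefficient 1/k! is a "short" number
        power *= xr                                           # one long multiplication per column
    total = rows[-1]                                          # Horner summation of the rows in powers of x
    for row in reversed(rows[:-1]):
        total = total * x + row
    return total

# exp_rectangular("0.5", 50) agrees with Decimal("0.5").exp() to 50 digits.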

One might wonder if we should not try to arrange the Taylor series in a cube or another multidimensional matrix instead of a rectangle. However, calculations show that this does not save time: the optimal arrangement is the two-dimensional rectangle.

An additional speed-up is possible if the elementary function allows a transformation that reduces x and makes the Taylor series converge faster. For example, Ln(x)=2*Ln(Sqrt(x)), Cos(2*x)=2*Cos(x)^2-1, and Sin(3*x)=3*Sin(x)-4*Sin(x)^3 are such transformations. It may be faster to perform a number of such transformations before evaluating the Taylor series, if the time saved by its quicker convergence is more than the time needed to perform the transformations. The optimal number of transformations can be estimated. Using this technique in principle reduces the cost of Taylor series from O(Sqrt(N)) to O(N^(1/3)) long multiplications. However, additional roundoff error may be introduced by this procedure for some x.


The AGM sequence algorithms

Several algorithms are based on the arithmetic-geometric mean (AGM) sequence. If one takes two numbers and computes their arithmetic mean and their geometric mean, the two means are generally much closer to each other than the original numbers. Repeating this process creates a rapidly converging sequence.

More formally, one can define the (complex) function of two (complex) numbers AGM(x,y) as the limit of the sequence a[k] where a[k+1]=1/2*(a[k]+b[k]), b[k+1]=Sqrt(a[k]*b[k]), and the initial values are a[0]=x, b[0]=y. This function is homogeneous of degree 1, AGM(k*x,k*y)=k*AGM(x,y), so in principle it is enough to compute AGM(1,x), or one may select k arbitrarily for convenience.

The limit of the AGM sequence is related to the complete elliptic integral by

Pi/2*1/AGM(a,Sqrt(a^2-b^2))=Integrate(x,0,Pi/2)1/Sqrt(a^2-b^2*Sin(x)^2).

The definition of the AGM sequence for complex values requires taking a square root Sqrt(a*b), which needs a branch cut to be well-defined. Selecting the natural cut along the negative real semiaxis (Re(x)<0, Im(x)=0), we obtain an AGM sequence that converges for any initial values x, y with positive real part. If the numbers x and y are very different (one is much larger than the other), then the numbers a[k], b[k] become approximately equal after about k=1/Ln(2)*Ln(Abs(Ln(x/y))) iterations (note: Brent's paper mistypes this as 1/Ln(2)*Abs(Ln(x/y))). Then one needs about Ln(n)/Ln(2) more iterations to make the first n decimal digits of a[k] and b[k] coincide, because the relative error epsilon=1-b/a decays approximately as epsilon[k]=1/8*Exp(-2^k).

Unlike the Newton iteration, the AGM sequence does not correct errors, so all elements need to be computed with full precision. Actually, slightly more precision is needed to compensate for accumulated roundoff error. Brent (paper rpb028, see below) says that O(Ln(Ln(n))) bits of accuracy are lost to roundoff error if there is a total of n iterations.
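
A minimal Python sketch of the AGM sequence using the decimal module (all iterations at full precision, as required):

from decimal import Decimal, getcontext

def agm(x, y, digits=50):
    getcontext().prec = digits + 5                    # a little extra precision for roundoff
    a, b = Decimal(x), Decimal(y)
    eps = Decimal(10) ** (-digits)
    while abs(a - b) > eps * abs(a):
        a, b = (a + b) / 2, (a * b).sqrt()            # arithmetic and geometric means
    return (a + b) / 2

# agm(1, 2) is approximately 1.45679103...; the number of iterations grows only
# logarithmically with the requested precision.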

The AGM sequence can be used for fast computations of Pi, Ln(x) and ArcTan(x). However, currently the limitations of Yacas internal math make these methods less efficient than simpler methods based on Taylor series and Newton iterations.


Elementary functions


Powers

Integer powers are computed by a fast algorithm with repeated squarings, as described in Knuth's book.

The square root is computed by the Newton iteration.

A separate function IntNthRoot is provided to compute the integer part of n^(1/s) for integer n and s. For a given s, it evaluates the integer part of n^(1/s) using only integer arithmetic with integers of size n^(1+1/s). This can be done by Halley's iteration method, solving the equation x^s=n. For this function, the Halley iteration sequence is monotonic. The initial guess is obtained by bit counting using the integer logarithm function, x[0]=2^(b(n)/s), where b(n) is the number of bits in n. It is clear that the initial guess is accurate to within a factor of 2. Since the relative error is squared at every iteration, the number of correct bits at least doubles at each step, so the number of iterations grows only as the logarithm of the number of bits in n^(1/s).

Since we only need the integer part of the root, it is enough to use integer division in the Halley iteration. The sequence x[k] will monotonically approximate the number n^(1/s) from below if we start from an initial guess that is less than the exact value. (We start from below so that we have to deal with smaller integers rather than with larger integers.) If n=p^s, then after enough iterations the floating-point value of x[k] would be slightly less than p; our value is the integer part of x[k]. Therefore, at each step we check whether 1+x[k] is a solution of x^s=n, in which case we are done; and we also check whether (1+x[k])^s>n, in which case the integer part of the root is x[k]. To speed up the Halley iteration in the worst case when s^s>n, it is combined with bisection. The root bracket interval x1<x<x2 is maintained and the next iteration x[k+1] is assigned to the midpoint of the interval if the Newton formula does not give sufficiently rapid convergence. The initial root bracket interval can be taken as x[0], 2*x[0].
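
For comparison, here is a Python sketch of an integer root routine. It uses a plain Newton iteration that approaches the root from above and stops when the iterate no longer decreases; this is simpler than (but in the same spirit as) the Halley-with-bisection scheme described above.

def int_nth_root(n, s):
    # integer part of n^(1/s), for integers n >= 0, s >= 1, using only integer arithmetic
    if n < 2 or s == 1:
        return n
    # initial guess 2^Ceil(b(n)/s) is always >= n^(1/s) and accurate to within a factor of 2
    x = 1 << -(-n.bit_length() // s)
    while True:
        y = ((s - 1) * x + n // x ** (s - 1)) // s    # Newton step for x^s = n, rounded down
        if y >= x:
            return x                                  # the sequence stopped decreasing: x is the answer
        x = y

# int_nth_root(1000000, 3) returns 100; int_nth_root(1000001, 3) also returns 100.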

Real powers (as opposed to integer powers and roots) are computed by using the exponential and logarithm functions, a^b=Exp(b*Ln(a)).


Logarithm

There are two functions for the logarithm: one for the integer argument and one for the real argument.

The "integer logarithm", defined as the integer part of Ln(x)/Ln(b), where x and b are integers, is computed using a special routine IntLog(x, b) with purely integer math. This is much faster than evaluating the full logarithm when both arguments are integers and only the integer part of the logarithm is needed. The algorithm consists of (integer) dividing x by b repeatedly until x becomes 0 and counting the number of divisions. A speed-up for large x is achieved by first comparing x with b, then with b^2, b^4, etc., until the factor b^2^n is larger than x. At this point, x is divided by that power of b and the remaining value is iteratively compared with and divided by successively smaller powers of b.

The logarithm function Ln(x) for general (complex) x can be computed using its Taylor series,

Ln(1+x)=x-x^2/2+x^3/3-...

This series converges only for Abs(x)<1, so for all other values of x one first needs to bring the argument into this range by taking several square roots and then using the identity Ln(x)=2^k*Ln(x^2^(-k)). This is implemented in the Yacas core (for real x).

Currently the routine LnNum uses the Halley method for the equation Exp(x)=a to find x=Ln(a). This is currently much faster than other methods.

A much faster algorithm based on the AGM sequence was given by Salamin (see R. P. Brent: Multiple-precision zero-finding methods and the complexity of elementary function evaluation, in Analytic Computational Complexity, ed. by J. F. Traub, Academic Press, 1975, p. 151; also available online from Oxford Computing Laboratory, as the paper rpb028). The formula is based on an asymptotic relation,

Ln(x)=Pi*x*(1+4*x^(-2)*(1-1/Ln(x))+O(x^(-4)))/(2*AGM(x,4)).

If x is large enough, the numerator is very close to 1 and can be disregarded. "Large enough" for a desired precision of P decimal digits means that 4*x^(-2)<10^(-P). The AGM algorithm gives P digits only for such large values of x, unlike the Taylor series which is only good for x close to 1.
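
The following double-precision Python check (not an arbitrary-precision routine) illustrates the formula: for x around 10^12 the neglected correction 4*x^(-2) is far below machine precision, and the AGM value reproduces math.log.

import math

def agm_float(a, b):
    while abs(a - b) > 1e-15 * max(a, b):
        a, b = (a + b) / 2, math.sqrt(a * b)
    return (a + b) / 2

def ln_via_agm(x):
    # Ln(x) ~ Pi*x/(2*AGM(x,4)) for large x, neglecting the 4*x^(-2) correction
    return math.pi * x / (2 * agm_float(x, 4.0))

# ln_via_agm(1.0e12) matches math.log(1.0e12) = 27.631... essentially to machine precision.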

The required number of AGM iterations is approximately 2*Ln(P)/Ln(2). For smaller values of x (but x>1), one can either raise x to a large integer power s (this is quick only if x is an integer or a rational) and compute 1/r*Ln(x^r), or multiply x by a large integer power of 2 (this is better for floating-point x) and compute Ln(2^s*x)-s*Ln(2). Here the required powers are

r=Ln(10^P*4)/(2*Ln(x)),

s=P*Ln(10)/(2*Ln(2))+1-Ln(x)/Ln(2).

These parameters can be found quickly by using the integer logarithm procedure IntLog, while constant values such as Ln(10)/Ln(2) can be simply approximated by rational numbers because r and s do not need to be very precise (but they do need to be large enough). For the second calculation, Ln(2^s*x)-s*Ln(2), we must precompute Ln(2) to the same precision. Also, the subtraction of a large number s*Ln(2) leads to a certain loss of precision, namely, about Ln(s)/Ln(10) decimal digits are lost, therefore the operating precision must be increased by this number of digits. (The quantity Ln(s)/Ln(10) is computed, of course, by the integer logarithm procedure.)

If x<1, then (-Ln(1/x)) is computed. Finally, there is a special case when x is very close to 1, where the Taylor series converges quickly but the AGM algorithm requires multiplying x by a large power of 2 and then subtracting two almost equal numbers, leading to a great loss of precision. Suppose 1<x<1+10^(-M), where M is large (say of order P). The Taylor series for Ln(1+epsilon) needs about N= -P*Ln(10)/Ln(epsilon)=P/M terms. If we evaluate the Taylor series using the rectangular scheme, we need 2*Sqrt(N) multiplications and Sqrt(N) units of storage. On the other hand, the main slow operation for the AGM sequence is the geometric mean Sqrt(a*b). If Sqrt(a*b) takes an equivalent of c multiplications (Brent's estimate would be c=13/2 but it may be more in practice), then the AGM sequence requires 2*c*Ln(P)/Ln(2) multiplications. Therefore the Taylor series method is more efficient for

M>1/c^2*P*(Ln(2)/Ln(P))^2.

In this case it requires at most c*Ln(P)/Ln(2) units of storage and 2*c*Ln(P)/Ln(2) multiplications.

For larger x>1+10^(-M), the AGM method is more efficient. It is necessary to increase the working precision to P+M*Ln(2)/Ln(10) but this does not decrease the asymptotic speed of the algorithm. To compute Ln(x) with P digits of precision for any x, only O(Ln(P)) long multiplications are required.


Exponential

The exponential function is computed using its Taylor series,

Exp(x)=1+x+x^2/2! +...

This series converges for all (complex) x, but if Abs(x) is large, it converges slowly. A speed-up trick used for large x is to divide the argument by some power of 2 and then square the result several times, i.e.

Exp(x)=Exp(2^(-k)*x)^2^k,

where k is chosen sufficiently large so that the Taylor series converges quickly at 2^(-k)*x. The threshold value for x is in the variable MathExpThreshold in stdfuncs. If x is large and negative, then it is easier to compute 1/Exp(-x). If x is sufficiently small, e.g. Abs(x)<10^(-M) and M>Ln(P)/Ln(10), then it is enough to take about P/M terms in the Taylor series. If x is of order 1, one needs about P*Ln(10)/Ln(P) terms.
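
A Python sketch (decimal module) of this argument-halving scheme, for illustration:

from decimal import Decimal, getcontext

def exp_by_squaring(x, digits):
    getcontext().prec = digits + 10
    x = Decimal(x)
    k = 0
    while abs(x) > Decimal("0.1"):          # an arbitrary threshold, playing the role of MathExpThreshold
        x /= 2
        k += 1
    total = term = Decimal(1)               # Taylor series: each term is the previous one times x/i
    i = 1
    eps = Decimal(10) ** (-(digits + 5))
    while abs(term) > eps:
        term = term * x / i
        total += term
        i += 1
    for _ in range(k):                      # undo the halvings: Exp(x) = Exp(x*2^(-k))^(2^k)
        total *= total
    return total

# exp_by_squaring(3, 50) agrees with Decimal(3).exp() to about 50 digits.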

An alternative way to compute x=Exp(a) at large precision would be to solve the equation Ln(x)=a using a fast logarithm routine. A cubically convergent formula is obtained if we replace Ln(x)=a by an equivalent equation

(Ln(x)-a)/(Ln(x)-a-2)=0.

For this equation, Newton's method gives the iteration

x'=x*(1+(a+1-Ln(x))^2)/2.

This iteration converges for initial values 0<x<Exp(a+2) with a cubic convergence rate and requires only one more multiplication, compared with Newton's method for Ln(x)=a. A good initial guess can be found by raising 2 to the integer part of a/Ln(2) (the value Ln(2) can be approximated from above by a suitable rational number, e.g. 7050/10171).
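
A double-precision Python demonstration of this iteration (with math.log standing in for the fast logarithm routine):

import math

def exp_by_log(a, iterations=5):
    x = 2.0 ** int(a / math.log(2))                        # initial guess 2^IntPart(a/Ln(2))
    for _ in range(iterations):
        x = x * (1 + (a + 1 - math.log(x)) ** 2) / 2       # cubically convergent step for Ln(x) = a
    return x

# exp_by_log(10.0) reproduces math.exp(10.0) = 22026.4657... after two or three iterations.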


Trigonometric

Trigonometric functions Sin(x), Cos(x) are computed by reducing the argument modulo 2*Pi into the range 0<x<2*Pi and then using the Taylor series. The tangent is computed as the ratio Sin(x)/Cos(x).

Inverse trigonometric functions are computed by Newton's method (for ArcSin) or by continued fraction expansion (for ArcTan),

ArcTan(x)=x/(1+x^2/(3+(2*x)^2/(5+(3*x)^2/(7+...)))).

The convergence of this expansion for large Abs(x) is improved by using the identities

ArcTan(x)=Pi/2*Sign(x)-ArcTan(1/x),

ArcTan(x)=2*ArcTan(x/(1+Sqrt(1+x^2))).

Thus, any value of x is reduced to Abs(x)<0.42. This is implemented in the standard library scripts.
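
A short Python sketch that evaluates this continued fraction from the bottom up (double precision; it assumes the argument has already been reduced to small Abs(x) by the identities above):

import math

def arctan_cf(x, terms=20):
    # bottom-up evaluation of x/(1 + x^2/(3 + (2*x)^2/(5 + ...)));
    # for Abs(x) <= 0.42 about (3/4)*P terms suffice for P decimal digits, so 20 terms are ample here
    t = 2.0 * terms + 1.0                    # the deepest partial denominator
    for k in range(terms, 0, -1):
        t = (2.0 * k - 1.0) + (k * x) ** 2 / t
    return x / t

# arctan_cf(0.3) agrees with math.atan(0.3) = 0.2914567... to machine precision.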

By the identity ArcCos(x):=Pi/2-ArcSin(x), the inverse cosine is reduced to the inverse sine. Newton's method for ArcSin(x) consists of solving the equation Sin(y)=x for y. Implementation is similar to the calculation of pi in PiMethod0().

For x close to 1, Newton's method for ArcSin(x) converges very slowly. An identity

ArcSin(x)=Sign(x)*(Pi/2-ArcSin(Sqrt(1-x^2)))

can be used in this case. Another potentially useful identity is

ArcSin(x)=2*ArcSin(x/(Sqrt(2)*Sqrt(1+Sqrt(1-x^2)))).

Inverse tangent can also be related to inverse sine by

ArcTan(x)=ArcSin(x/Sqrt(1+x^2)),

ArcTan(1/x)=ArcSin(1/Sqrt(1+x^2)).

Hyperbolic and inverse hyperbolic functions are reduced to exponentials and logarithms: Cosh(x)=1/2*(Exp(x)+Exp(-x)), Sinh(x)=1/2*(Exp(x)-Exp(-x)), Tanh(x)=Sinh(x)/Cosh(x),

ArcCosh(x)=Ln(x+Sqrt(x^2-1)),

ArcSinh(x)=Ln(x+Sqrt(x^2+1)),

ArcTanh(x)=1/2*Ln((1+x)/(1-x)).

The idea to use continued fraction expansions for ArcTan comes from the book by Jack W. Crenshaw, MATH Toolkit for REAL-TIME Programming (CMP Media Inc., 2000). The author explains how, knowing that the Taylor series for ArcTan(x) converges slowly, he had a hunch that the continued fraction expansion would converge rapidly instead, and then showed that for ArcTan(x) this is indeed the case, in a big way. This need not be true for every slowly converging series; we have not found articles or books that prove such a general statement, and the above book demonstrates it only empirically.

One disadvantage of continued fraction expansions, and of approximations by rational functions in general, compared to a simple series is that it is not easy to extend the calculation by one more step of precision: when one more term is included, the coefficients of the polynomials in the numerator and the denominator of the rational function all change. This contrasts with a Taylor series expansion, where each additional term improves the accuracy of the result and the calculation can simply be terminated when sufficient accuracy is achieved.

The convergence of the continued fraction expansion of ArcTan(x) is indeed better than convergence of the Taylor series. Namely, the Taylor series converges only for Abs(x)<1 while the continued fraction converges for all x. However, the speed of its convergence is not uniform in x; the larger the value of x, the slower the convergence. The necessary number of terms of the continued fraction is in any case proportional to the required number of digits of precision, but the constant of proportionality depends on x.

This can be understood by the following elementary argument. The difference between two partial continued fractions that differ only by one extra last term can be estimated by

Abs(delta):=Abs(b[0]/(a[1]+b[1]/(...+b[n-1]/a[n]))-b[0]/(a[1]+b[1]/(...+b[n]/a[n+1])))<(b[0]*b[1]*...*b[n])/((a[1]*...*a[n])^2*a[n+1]).

(This is a conservative estimate that could be slightly improved with more careful analysis. See also the section on numerical continued fractions.) For the above continued fraction for ArcTan(x), this directly gives the following estimate,

Abs(delta)<(x^(2*n+1)*n! ^2)/((2*n+1)*(2*n-1)!! ^2)<>x*(x/2)^(2*n).

This formula only gives a meaningful bound if x<2, but it is clear that the precision generally becomes worse when x grows. If we need P digits of precision, then, for a given x, the number of terms n has to be large enough so that the relative precision is sufficient, i.e.

delta/ArcTan(x)<10^(-P).

This gives n>P*Ln(10)/(Ln(4)-2*Ln(x)); for x=1 this amounts to approximately n>5/3*P. This estimate is very close for small x and only slightly suboptimal for larger x: numerical experimentation shows that for x<=1, the required number of terms for P decimal digits is only about 4/3*P, and for x<=0.42, n must be 3/4*P. If x<1 is very small then one needs a much smaller number of terms n>P*Ln(10)/(Ln(4)-2*Ln(x)). The number of terms is set equal to this number (computed at low precision) in the routine ContArcTan. Roundoff errors may actually make the result less precise if a larger number of terms is used.


Calculation of Pi

In Yacas, the constant pi is computed by the library routine Pi() which uses the internal routine MathPi() to compute the value to current precision Precision(). The result is stored in the global variable PiCache which is a list of the form {precision, value} where precision is the number of digits of pi that have already been found and value is the multiple-precision value. This is done to avoid recalculating pi if a precise enough value for it has already been found.

Efficient iterative algorithms for computing pi with arbitrary precision have been recently developed by Brent, Salamin, Borwein and others. However, limitations of the current multiple-precision implementation in Yacas (compiled with the "internal" math option) make these advanced algorithms run slower because they require many more arbitrary-precision multiplications at each iteration.

The file examples/pi.ys implements five different algorithms that duplicate the functionality of Pi(). See http://numbers.computation.free.fr/Constants/ for details of computations of pi and generalizations of Newton-Raphson iteration.

PiMethod0(), PiMethod1(), PiMethod2() are all based on a generalized Newton-Raphson method of solving equations.

Since pi is a solution of Sin(x)=0, one may start sufficiently close, e.g. at x0=3.14159265 and iterate x'=x-Tan(x). In fact it is faster to iterate x'=x+Sin(x) which solves a different equation for pi. PiMethod0() is the straightforward implementation of the latter iteration. A significant speed improvement is achieved by doing calculations at each iteration only with the precision of the root that we expect to get from that iteration. Any imprecision introduced by round-off will be automatically corrected at the next iteration.

If at some iteration x=pi+epsilon for small epsilon, then from the Taylor expansion of Sin(x) it follows that the value x' at the next iteration will differ from pi by O(epsilon^3). Therefore, the number of correct digits triples at each iteration. If we know the number of correct digits of pi in the initial approximation, we can decide in advance how many iterations to compute and what precision to use at each iteration.

The final speed-up in PiMethod0() is to avoid computing at unnecessarily high precision. This may happen if, for example, we need to evaluate 200 digits of pi starting with 20 correct digits. After 2 iterations we would be calculating with 180 digits; the next iteration would have given us 540 digits but we only need 200, so the third iteration would be wasteful. This can be avoided by first computing pi to just over 1/3 of the required precision, i.e. to 67 digits, and then executing the last iteration at full 200 digits. There is still a wasteful step when we would go from 60 digits to 67, but much less time would be wasted than in the calculation with 200 digits of precision.
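
The bookkeeping can be sketched as follows in Python (a hypothetical helper, independent of Yacas): the list of working precisions is built backwards from the target by repeatedly dividing by the order of convergence.

def precision_schedule(target_digits, initial_digits, order=3):
    # precisions to use at successive iterations of an order-`order` (here cubically convergent) scheme
    schedule = [target_digits]
    while schedule[-1] > initial_digits * order:
        schedule.append(schedule[-1] // order + 1)
    return list(reversed(schedule))

# precision_schedule(200, 20) returns [23, 67, 200]: two cheap preliminary iterations at 23 and
# 67 digits and a final iteration at the full 200 digits (compare the example discussed above).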

Newton's method is based on approximating the function f(x) by a straight line. One can achieve better approximation and therefore faster convergence to the root if one approximates the function with a polynomial curve of higher order. The routine PiMethod1() uses the iteration

x'=x+Sin(x)+1/6*Sin(x)^3+3/40*Sin(x)^5+5/112*Sin(x)^7

which has a faster convergence, giving 9 times as many digits at every iteration. (The series is the Taylor series for ArcSin(y) cut at O(y^9).) The same speed-up tricks are used as in PiMethod0(). In addition, the last iteration, which must be done at full precision, is performed with the simpler iteration x'=x+Sin(x) to reduce the number of high-precision multiplications.

Both PiMethod0() and PiMethod1() require a computation of Sin(x) at every iteration. An industrial-strength arbitrary precision library such as gmp can multiply numbers much faster than it can evaluate a trigonometric function. Therefore, it would be good to have a method which does not require trigonometrics. PiMethod2() is a simple attempt to remedy the problem. It computes the Taylor series for ArcTan(x),

ArcTan(x)=x-x^3/3+x^5/5-x^7/7+...,

for the value of x obtained as the tangent of the initial guess for pi; in other words, if x=pi+epsilon where epsilon is small, then Tan(x)=Tan(epsilon), therefore epsilon=ArcTan(Tan(x)) and pi is found as pi=x-epsilon. If the initial guess is good (i.e. epsilon is very small), then the Taylor series for ArcTan(x) converges very quickly (although linearly, i.e. it gives a fixed number of digits of pi per term). Only a single full-precision evaluation of Tan(x) is necessary at the beginning of the algorithm. The complexity of this algorithm is proportional to the number of digits and to the time of a long multiplication.

The routines PiBrentSalamin() and PiBorwein() are based on much more advanced mathematics. (See papers of P. Borwein for review and explanations of the methods.) They do not require evaluations of trigonometric functions, but they do require taking a few square roots at each iteration, and all calculations must be done using full precision. Using modern algorithms, one can compute a square root roughly in the same time as a division; but Yacas's internal math is not yet up to it. Therefore, these two routines perform poorly compared to the more simple-minded PiMethod0().


Continued fractions with numeric terms

The function ContFracList converts a (rational) number r into a regular continued fraction,

r=n[0]+1/(n[1]+1/(n[2]+...)).

Here all numbers n[i] ("terms" of a continued fraction) are integers and all except n[0] must be positive. (Continued fractions may not converge unless their terms are positive and bounded from below.)

The algorithm for converting a rational number r=n/m into a continued fraction is simple. First, we determine the integer part of r, which is Div(n,m). If it is negative, we need to subtract one, so that r=n[0]+x and the remainder x is nonnegative and less than 1. The remainder x=Mod(n,m)/m is then inverted, r[1]:=1/x=m/Mod(n,m) and so we have completed the first step in the decomposition, r=n[0]+1/r[1]; now n[0] is integer but r[1] is perhaps not integer. We repeat the same procedure on r[1], obtain the next integer term n[1] and the remainder r[2] and so on, until such n that r[n] is an integer and there is no more work to do. This process will always terminate because all floating-point values are actually rationals in disguise.
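
In Python, the same decomposition can be sketched with the fractions module (an illustration, not the Yacas routine):

from fractions import Fraction

def contfrac_list(r):
    # regular continued fraction terms of a rational (or floating-point) number
    r = Fraction(r)
    terms = []
    while True:
        n0 = r.numerator // r.denominator     # floor of r, so the remainder is in [0, 1)
        terms.append(n0)
        r -= n0
        if r == 0:
            return terms
        r = 1 / r

# contfrac_list(Fraction(17, 3)) returns [5, 1, 2]
# contfrac_list(Fraction(130, 83)) returns [1, 1, 1, 3, 3, 1, 2]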

Continued fractions are useful in many ways. For example, if we know that a certain number x is rational but have only a floating-point representation of x with a limited precision, say, 1.5662650602409638, we can try to guess its rational form (in this example x=130/83). The function GuessRational uses continued fractions to find a rational number with "optimal" (small) numerator and denominator that is approximately equal to a given floating-point number.

Consider the following example. The number 17/3 has a continued fraction expansion {5,1,2}. Evaluated as a floating point number with limited precision, it may become something like 17/3+0.00001, where the small number represents a roundoff error. The continued fraction expansion of this number is {5, 1, 2, 11110, 1, 5, 1, 3, 2777, 2}. The presence of an unnaturally large term 11110 clearly signifies the place where the floating-point error was introduced; all terms following it should be discarded to recover the continued fraction {5,1,2} and from it the initial number 17/3.

If a continued fraction for a number x is cut right before an unusually large term, and evaluated, the resulting rational number is very close to x but has an unusually small denominator. This works because partial continued fractions provide "optimal" rational approximations for the final (irrational) number, and because the magnitude of the terms of the partial fraction is related to the magnitude of the denominator of the resulting rational approximation.

GuessRational(x, prec) needs to choose the place where it should cut the continued fraction. The algorithm for this is somewhat imprecise but works well enough. We can try to find an upper bound for the difference of continued fractions that differ only by an additional last term,

Abs(delta):=Abs(1/(a[1]+1/(...+1/a[n]))-1/(a[1]+1/(...+1/a[n+1])))<1/((a[1]*...*a[n])^2*a[n+1]).

Thus we should compute the product of successive terms a[i] of the continued fraction and stop at a[n] at which this product exceeds the maximum number of digits. The routine GuessRational has a second parameter prec which is by default 1/2 times the number of decimal digits of current precision; it stops at a[n] at which the product a[1]*...*a[n] exceeds 10^prec.

The above estimate for delta hinges on the inequality

1/(a+1/(b+...))<1/a

and is suboptimal if some terms a[i]=1, because the product of a[i] does not increase when one of the terms is equal to 1, whereas in fact these terms do make delta smaller. A somewhat better estimate would be obtained if we use the inequality

1/(a+1/(b+1/(c+...)))<1/(a+1/(b+1/c)).

This does not lead to a significant improvement if a>1 but makes a difference when a=1. In the product a[1]*...*a[n], the terms a[i] which are equal to 1 should be replaced by

a[i]+1/(a[i+1]+1/a[i+2]).

Since the comparison of a[1]*...*a[n] with 10^prec is qualitative, it is enough to do the calculations for it with only limited precision.

This algorithm works well if x is computed with enough precision; namely, it must be computed to at least as many digits as there are in the numerator and the denominator of the fraction combined. Also, the parameter prec should not be too large (or else the algorithm will find a rational number with a larger denominator that approximates x even better).
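
A Python sketch of a GuessRational-style routine along these lines (this illustrates the cutting criterion only, not the Yacas code; in particular the refinement for terms equal to 1 is omitted):

from fractions import Fraction

def guess_rational(x, prec=8):
    r = Fraction(x)                      # the floating-point number as an exact binary rational
    a0 = r.numerator // r.denominator
    terms = [a0]
    r -= a0
    product, limit = 1, 10 ** prec
    while r != 0:
        r = 1 / r
        a = r.numerator // r.denominator
        if product * a > limit:          # an unnaturally large term (or too many digits): cut here
            break
        terms.append(a)
        product *= a
        r -= a
    value = Fraction(terms[-1])          # rebuild the rational from the kept terms
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value

# guess_rational(1.5662650602409638) returns Fraction(130, 83)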

The related function NearRational(x, prec) works somewhat differently. The goal is to find an "optimal" rational number, i.e. with smallest numerator and denominator, that is within the distance 10^(-prec) of a given value x. The algorithm for this comes from the 1972 HAKMEM document, Item 101C. Their description is terse but clear:

Problem: Given an interval, find in it the
rational number with the smallest numerator and
denominator.
Solution: Express the endpoints as continued
fractions.  Find the first term where they differ
and add 1 to the lesser term, unless it's last. 
Discard the terms to the right.  What's left is
the continued fraction for the "smallest"
rational in the interval.  (If one fraction
terminates but matches the other as far as it
goes, append an infinity and proceed as above.)

The HAKMEM text (M. Beeler, R. W. Gosper, and R. Schroeppel: Memo No. 239, MIT AI Lab, 1972, available as HTML online from various places) contains several interesting insights relevant to continued fractions and other numerical algorithms.


Factorials and binomial coefficients

The factorial is defined by n! :=n*(n-1)*...*1 for integer n and the binomial coefficient is defined as

Bin(n,m):=n! /(m! *(n-m)!).

A "staggered factorial" n!! :=n*(n-2)*(n-4)*... is also useful for some calculations.

There are two tasks related to the factorial: the exact integer calculation and an approximate calculation to some floating-point precision. The factorial of n has approximately n*Ln(n)/Ln(10) decimal digits, so an exact calculation is practical only for relatively small n. In the current implementation, exact factorials for n>65535 are not computed; instead, an error message is printed advising the user to avoid exact computations of such large factorials. For example, LnGammaNum(n+1) can compute Ln(n!) for very large n to the desired floating-point precision.


Exact factorials

To compute factorials exactly, we use two direct methods. The first method is to multiply the numbers 1, 2, ..., n in a loop. This method requires n multiplications of short numbers with P-digit numbers, where P=O(n*Ln(n)) is the number of digits in n!. Therefore its complexity is O(n^2*Ln(n)). This factorial routine is implemented in the Yacas core with a small speedup: consecutive pairs of integers are first multiplied together using platform math and then multiplied by the accumulator product.

A second method uses a binary tree arrangement of the numbers 1, 2, ..., n similar to the recursive sorting routine ("merge-sort"). If we denote by a *** b the "partial factorial" product a*(a+1)*...*(b-1)*b, then the tree-factorial algorithm consists of replacing n! by 1***n and recursively evaluating (1***m)*((m+1)***n) for some integer m near n/2. The partial factorials of nearby numbers such as m***(m+2) are evaluated explicitly. The binary tree algorithm requires one multiplication of P/2 digit integers at the last step, two P/4 digit multiplications at the last-but-one step and so on. There are O(Ln(n)) total steps of the recursion. If the cost of multiplication is M(P)=P^(1+g)*Ln(P)^d, then one can show that the total cost of the binary tree algorithm is O(M(P)) if g>0 and O(M(P)*Ln(n)) if g=0 (the case of the asymptotically fastest multiplication algorithms).

Therefore, the tree method wins over the simple method if the cost of multiplication is lower than quadratic.
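
A compact Python sketch of the binary tree ("binary splitting") factorial:

def partial_factorial(a, b):
    # the product a*(a+1)*...*b (that is, "a *** b"), computed by recursive halving
    if b - a < 8:                        # short ranges: plain multiplication loop
        result = 1
        for i in range(a, b + 1):
            result *= i
        return result
    m = (a + b) // 2
    return partial_factorial(a, m) * partial_factorial(m + 1, b)

def tree_factorial(n):
    return partial_factorial(2, n) if n > 1 else 1

# tree_factorial(10) returns 3628800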

The tree method can also be used to compute "staggered factorials" (n!!). This is faster than using the identities (2*n)!! =2^n*n! and

(2*n-1)!! =(2*n)! /(2^n*n!).

Staggered factorials are used in the exact calculation of the Gamma function of half-integer argument.

Binomial coefficients Bin(n,m) are found by first selecting the smaller of m, n-m and using the identity Bin(n,m)=Bin(n,n-m). Then a partial factorial is used to compute Bin(n,m)=((n-m+1)***n)/m!. This is always much faster than computing the three factorials in the definition of Bin(n,m).
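
For example (a Python sketch, with math.prod computing the partial factorial):

import math

def binomial(n, m):
    m = min(m, n - m)                             # use Bin(n,m) = Bin(n,n-m)
    if m < 0:
        return 0
    # ((n-m+1) *** n) / m! : one partial factorial and one small factorial
    return math.prod(range(n - m + 1, n + 1)) // math.factorial(m)

# binomial(10, 3) returns 120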


Approximate factorials

A floating-point computation of the factorial may proceed either via Euler's Gamma function or by a direct method (multiplying the integers). If the required precision is much less than the number of digits in the exact factorial, then almost all multiplications will be truncated to the precision P and the tree method O(n*M(P)) is always slower than the simple method O(n*P).


Euler's Gamma function

Euler's Gamma function Gamma(z) is defined for complex z such that Re(z)>0 by the integral

Gamma(z):=Integrate(t,0,Infinity)Exp(-t)*t^(z-1).

The Gamma function satisfies several identities that can be proved by rearranging this integral; for example, Gamma(z+1)=z*Gamma(z). This identity defines Gamma(z) for all complex z. The Gamma function is regular everywhere except nonpositive integers (0, -1, -2, ...) where it diverges.

For real integers n>0, the Gamma function is the same as the factorial,

Gamma(n+1):=n!,

so the factorial notation can be used for the Gamma function too. Some formulae become a little simpler when written in factorials.

The Gamma function is implemented as Gamma(x). At integer values n of the argument, Gamma(n) is computed exactly. For half-integer values it is also computed exactly, using the following identities (here n is a nonnegative integer):

(+(2*n+1)/2)! =Sqrt(Pi)*(2*n+1)! /(2^(2*n+1)*n!),

(-(2*n+1)/2)! =(-1)^n*Sqrt(Pi)*(2^(2*n)*n!)/(2*n)!.

For efficiency, "staggered factorials" are used in this calculation.

For arbitrary complex arguments with nonnegative real part, the library function GammaNum(x) computes a uniform approximation due to Lanczos and Spouge (with the so-called "less precise" coefficients of Spouge). See: C. Lanczos, J. SIAM Numer. Anal. Ser. B, vol. 1, 86 (1964); J. L. Spouge, SIAM J. Numer. Anal., vol. 31, 931 (1994). See also the unpublished note by Paul Godfrey (2001), http://winnie.fit.edu/~gabdo/gamma.txt, for some explanations of the method. The method gives the Gamma function only for arguments with positive real part; for arguments with negative real part, the Gamma function is computed via the identity

Gamma(x)*Gamma(1-x)=Pi/Sin(Pi*x).

The approximation formula used in Yacas depends on a parameter a,

Gamma(z)=(Sqrt(2*Pi)*(z+a)^(z+1/2))/(z*e^(z+a))*(1+e^(a-1)/Sqrt(2*Pi)*Sum(k,1,N,c[k]/(z+k))),

with N:=Ceil(a)-1. The coefficients c[k] are defined by

c[k]=(-1)^(k-1)*(a-k)^(k-1/2)/(e^(k-1)*(k-1)!).

The parameter a is a free parameter of the approximation that determines also the number of terms in the sum. Some choices of a may lead to a slightly more precise approximation, but larger a is always better. The number of terms N must be large enough to produce the required precision. The estimate of the relative error for this formula is valid for all z such that Re(z)>0 and is

error<(2*Pi)^(-a)/Sqrt(2*Pi*a)*a/(a+z).

The lowest value of a to produce P correct digits is estimated as

a=(P-Ln(P)/Ln(10))*Ln(10)/Ln(2*Pi)-1/2.

In practical calculations, the integer logarithm routine IntLog is used and the constant Ln(10)/Ln(2*Pi) is approximated from above by 659/526, so that a is not underestimated.

The coefficients c[k] and the parameter a can be chosen to achieve a greater precision of the approximation formula. However, the recipe for the coefficients c[k] given in the paper by Lanczos is too complicated for practical calculations in arbitrary precision: the time it would take to compute the array of N coefficients c[k] grows as N^3. Therefore it is better to use less precise but much simpler formulae derived by Spouge.
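
For reference, here is a double-precision Python implementation of the approximation formula quoted above (an illustration, not the Yacas code; the choice a=12.5 is an assumption that gives roughly ten correct digits for real z>0).

import math

def gamma_spouge(z, a=12.5):
    # the Spouge-type formula quoted above, with N = Ceil(a) - 1 terms
    N = math.ceil(a) - 1
    pref = math.exp(a - 1) / math.sqrt(2 * math.pi)
    total = 1.0
    for k in range(1, N + 1):
        c_k = (-1) ** (k - 1) * (a - k) ** (k - 0.5) / (math.exp(k - 1) * math.factorial(k - 1))
        total += pref * c_k / (z + k)
    return math.sqrt(2 * math.pi) * (z + a) ** (z + 0.5) / (z * math.exp(z + a)) * total

# gamma_spouge(5.0) returns 24.0000000... (Gamma(5) = 4! = 24); the error decreases as a is increased.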

In the calculation of the sum Sum(k,1,N,c[k]*(z+k)^(-1)), round-off error can lead to a serious loss of precision. At version 1.0.49, Yacas is limited in its internal arbitrary precision facility in that it does not support floating-point numbers with mantissa; this hinders precise calculations with floating-point numbers. (This concern does not apply to Yacas linked with gmp.) In the current version of the GammaNum() function, two workarounds are implemented. First, a Horner scheme is used to compute the sum; this is somewhat faster and leads to smaller roundoff errors. Second, intermediate calculations are performed at 40% higher precision than requested. This is much slower but allows one to obtain results at the desired precision.

If the factorial of a large integer or half-integer n needs to be computed not exactly but only with a certain floating-point precision, it is faster (for large enough Abs(n)) not to evaluate an exact integer product, but to use the GammaNum approximation or Stirling's asymptotic formula,

Ln(n!)<=>Ln(2*Pi*n)/2+n*Ln(n/e)+1/(12*n)-1/(360*n^3)+...

(The coefficients of the series expansion are related to Bernoulli numbers: B[k+1]/(k*(k+1)*n^k) for k>=1.) This method is currently not implemented in Yacas.


Riemann's Zeta function

Riemann's Zeta function zeta(s) is defined for complex s such that Re(s)>1 as a sum of inverse powers of integers:

zeta(s):=Sum(n,1,Infinity,1/n^s).

This function can be analytically continued to the entire complex plane except the point s=1 where it diverges. It satisfies several identities, for example, a formula useful for negative Re(s),

zeta(1-s)=(2*Gamma(s))/(2*Pi)^s*Cos((Pi*s)/2)*zeta(s),

and a formula for even integers that helps in numerical testing,

zeta(2*n)=((-1)^(n+1)*(2*Pi)^(2*n))/(2*(2*n)!)*B[2*n],

where B[n] are Bernoulli numbers.

The classic book of Bateman and Erdelyi, Higher Transcendental Functions, vol. 1, describes many results concerning analytic properties of zeta(s).

For the numerical evaluation of Riemann's Zeta function with arbitrary precision to become feasible, one needs special algorithms. Recently P. Borwein gave a simple and quick approximation algorithm for positive Re(s) (P. Borwein, An efficient algorithm for Riemann Zeta function (1995), published online and in Canadian Math. Soc. Conf. Proc., 27 (2000), 29-34.) See also: J. M. Borwein, D. M. Bradley, R. E. Crandall: Computation strategies for the Riemann Zeta function, preprint CECM-98-118, 1999, for a review of methods.

It is the "third" algorithm (the simplest one) from P. Borwein's paper which is implemented in Yacas. The approximation formula valid for Re(s)> -(n-1) is

zeta(s)=1/(2^n*(1-2^(1-s)))*Sum(j,0,2*n-1,e[j]/(j+1)^s),

where the coefficients e[j] for j=0, ..., 2*n-1 are defined by

e[j]:=(-1)^(j-1)*(Sum(k,0,j-n,n! /(k! *(n-k)!))-2^n),

and the empty sum (for j<n) is taken to be zero. The parameter n must be chosen high enough to achieve the desired precision. The error estimate for this formula is approximately

error<8^(-n)

for the relative precision, which means that to achieve P correct digits we must have n>P*Ln(10)/Ln(8).
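
A double-precision Python sketch of this ("third") approximation formula, with the coefficients e[j] accumulated on the fly:

import math

def zeta_borwein(s, n=30):
    # P. Borwein's approximation quoted above; the relative error is roughly 8^(-n)
    total = 0.0
    binom_sum = 0                       # running value of Sum(k,0,j-n, Bin(n,k)); zero while j < n
    for j in range(2 * n):
        if j >= n:
            binom_sum += math.comb(n, j - n)
        e_j = (-1) ** (j + 1) * (binom_sum - 2 ** n)
        total += e_j / (j + 1) ** s
    return total / (2 ** n * (1 - 2.0 ** (1 - s)))

# zeta_borwein(2.0) reproduces Pi^2/6 = 1.6449340668... essentially to machine precision for n = 30.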

The function Zeta(s) calls ZetaNum(s) to compute this approximation formula for Re(s)>1/2 and uses the identity above to get the value for other s.

For very large values of s, it is faster to use another method implemented in the routine ZetaNum1(s, N). If the required precision is P digits and s>1+Ln(10)/Ln(P)*P, then it is enough to compute the defining series for zeta(n),

zeta(n)<=>Sum(k,1,N,1/k^n),

up to a certain number of terms N. The required number of terms N is given by

N=10^(P/(s-1)).

For example, at 100 digits of precision it is advisable to use ZetaNum1(s) only for s>50, since it would require N<110 terms in the series, whereas the expression used in ZetaNum(s) uses n=Ln(10)/Ln(8)*P terms (of a different series).


Bessel functions

Bessel functions are a family of special functions solving the equation

Deriv(x,2)w(x)+1/x*D(x)w(x)+(1-n^2/x^2)*w(x)=0.

There are two linearly independent solutions which can be taken as the pair of Hankel functions H1(n,x), H2(n,x), or as the pair of Bessel-Weber functions J[n], Y[n]. These pairs are linearly related, J[n]=1/2*(H1(n,x)+H2(n,x)), Y[n]=1/(2*I)*(H1(n,x)-H2(n,x)). The function H2(n,x) is the complex conjugate of H1(n,x). This arrangement of four functions is very similar to the relation between Sin(x), Cos(x) and Exp(I*x), Exp(-I*x), which are all solutions of Deriv(x,2)f(x)+f(x)=0.

For large values of Abs(x), there is the following asymptotic series:

H1(n,x)<>Sqrt(2/(Pi*x))*Exp(I*zeta)*Sum(k,0,Infinity,I^k*A(k,n)/x^k),

where zeta:=x-1/2*n*Pi-1/4*Pi and

A(k,n):=((4*n^2-1^2)*(4*n^2-3^2)*...*(4*n^2-(2*k-1)^2))/(k! *8^k).

From this one can find the asymptotic series for J[n] and Y[n]:

J[n]<>Sqrt(2/(Pi*x))*Cos(zeta)*Sum(k,0,Infinity,(-1)^k*A(2*k,n)*x^(-2*k))-Sqrt(2/(Pi*x))*Sin(zeta)*Sum(k,0,Infinity,(-1)^k*A(2*k+1,n)*x^(-2*k-1)),

Y[n]<>Sqrt(2/(Pi*x))*Sin(zeta)*Sum(k,0,Infinity,(-1)^k*A(2*k,n)*x^(-2*k))+Sqrt(2/(Pi*x))*Cos(zeta)*Sum(k,0,Infinity,(-1)^k*A(2*k+1,n)*x^(-2*k-1)).

The error of a truncated asymptotic series is not larger than the first discarded term if the number of terms is larger than n-1/2. (See the book: F. W. J. Olver, Asymptotic and special functions, Academic Press, 1974.)
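
The following Python sketch sums the truncated asymptotic series for J[n](x); the function name bessel_j_asymptotic and the cutoff parameter kmax are ours. Since the series is asymptotic, the sketch is useful only when Abs(x) is large compared with n, and its error is roughly the first discarded term.

    from math import sqrt, pi, sin, cos, factorial

    def bessel_j_asymptotic(n, x, kmax=10):
        # Truncated asymptotic series for J[n](x); valid for large Abs(x) (sketch).
        z = x - 0.5 * n * pi - 0.25 * pi           # zeta = x - n*Pi/2 - Pi/4
        def A(k):
            # A(k,n) = (4*n^2-1^2)*(4*n^2-3^2)*...*(4*n^2-(2*k-1)^2)/(k!*8^k)
            prod = 1.0
            for m in range(1, k + 1):
                prod *= 4 * n * n - (2 * m - 1)**2
            return prod / (factorial(k) * 8.0**k)
        P = sum((-1)**k * A(2 * k) * x**(-2 * k) for k in range(kmax))
        Q = sum((-1)**k * A(2 * k + 1) * x**(-2 * k - 1) for k in range(kmax))
        return sqrt(2 / (pi * x)) * (cos(z) * P - sin(z) * Q)

For example, bessel_j_asymptotic(0, 10.0) gives approximately -0.2459, in agreement with BesselJ(0,10).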

Currently Yacas can compute BesselJ(n,x) for all x where n is an integer and for Abs(x)<=2*Gamma(n) when n is a real number. Yacas currently uses the Taylor series when Abs(x)<=2*Gamma(n) to compute the numerical value:

BesselJ(n,x):=Sum(k,0,Infinity,(-1)^k*x^(2*k+n)/(2^(2*k+n)*k! *Gamma(k+n+1))).

If Abs(x)>2*Gamma(n) and n is an integer, then Yacas uses the forward recursion identity:

BesselJ(n,x):=2*(n+1)/x*BesselJ(n+1,x)-BesselJ(n+2,x)

until the given BesselJ function is represented in terms of higher order terms which all satisfy Abs(x)<=2*Gamma(n). Note that when n is much smaller than x, this algorithm is quite slow because the number of Bessel function evaluations grows like 2^i, where i is the number of times the recurrence identity is used.

We see from the definition that when Abs(x)<=2*Gamma(n), the terms of the series decrease monotonically in absolute value (the series is absolutely monotonically decreasing). Therefore, if we stop after i terms, the error is bounded by the absolute value of the first discarded term. Given a target precision, we convert it into a value epsilon and check whether the current term still contributes to the sum at that precision. Before doing this, Yacas currently increases the working precision by 20% for the intermediate calculations; this is a heuristic that works in practice but is not backed by theory. The value epsilon is given by epsilon:=5*10^(-prec), where prec is the original precision. This comes directly from the definition of a floating-point number correct to prec digits: such a number has a rounding error no greater than 5*10^(-prec). Beware that some books incorrectly have .5 instead of 5.
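
A rough Python sketch of this summation follows (the name bessel_j_taylor and the fixed parameter prec are ours; the 20% increase of the working precision is omitted here, and the real routine works with Yacas multiple-precision numbers rather than doubles).

    from math import factorial, gamma

    def bessel_j_taylor(n, x, prec=16):
        # Taylor series for BesselJ(n,x), stopping when a term falls below epsilon.
        # Assumes Abs(x) <= 2*Gamma(n) so that the terms decrease monotonically.
        eps = 5 * 10**(-prec)                      # epsilon = 5*10^(-prec)
        total = 0.0
        k = 0
        while True:
            term = (-1)**k * x**(2 * k + n) / (2**(2 * k + n) * factorial(k) * gamma(k + n + 1))
            if abs(term) < eps:                    # first discarded term bounds the error
                break
            total += term
            k += 1
        return total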

Bug: something is not right with complex arguments, but purely imaginary arguments are OK.


Bernoulli numbers and polynomials

The Bernoulli numbers B[n] come from a sequence of rational numbers defined by the series expansion of the following generating function,

z/(e^z-1)=Sum(n,0,Infinity,B[n]*z^n/n!).

The Bernoulli polynomials B(x)[n] are defined similarly by

(z*Exp(z*x))/(e^z-1)=Sum(n,0,Infinity,B(x)[n]*z^n/n!).

The Bernoulli polynomials are related to Bernoulli numbers by

B(x)[n]=Sum(k,0,n,x^k*B[n-k]*Bin(n,k)),

where Bin(n,k) are binomial coefficients.

Bernoulli numbers and polynomials are used in various Taylor series expansions, in the Euler-Maclaurin series resummation formula, in Riemann's Zeta function and so on. For example, the sum of (integer) p-th powers of consecutive integers is given by

Sum(k,0,n-1,k^p)=(B(n)[p+1]-B[p+1])/(p+1).

The Bernoulli polynomials B(x)[n] can be found by first computing an array of Bernoulli numbers up to B[n] and then applying the above formula for the coefficients.

In this definition, the first Bernoulli numbers are B[0]=1, B[1]= -1/2, B[2]=1/6, B[3]=0, B[4]= -1/30.
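
As a small illustration of the binomial formula above, here is a Python sketch that evaluates B(x)[n] from a table of Bernoulli numbers (just the values listed above) and checks the sum-of-powers identity; the names B and bernoulli_poly are ours.

    from fractions import Fraction
    from math import comb

    # the first Bernoulli numbers B[0]..B[4], as exact rationals
    B = [Fraction(1), Fraction(-1, 2), Fraction(1, 6), Fraction(0), Fraction(-1, 30)]

    def bernoulli_poly(n, x):
        # B(x)[n] = Sum(k,0,n, x^k*B[n-k]*Bin(n,k))
        return sum(x**k * B[n - k] * comb(n, k) for k in range(n + 1))

    # check: Sum(k,0,n-1,k^p) = (B(n)[p+1]-B[p+1])/(p+1) for n=10, p=3
    assert sum(Fraction(k)**3 for k in range(10)) == (bernoulli_poly(4, Fraction(10)) - B[4]) / 4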

We consider two distinct computational tasks: evaluate a Bernoulli number exactly as a rational, or find it approximately to a specified floating-point precision. There are also two possible problem settings: either we need to evaluate all Bernoulli numbers B[n] up to some n, or we only need one isolated value B[n] for some n. Depending on how large n is, different algorithms need to be used in these cases.


Exact evaluation of Bernoulli numbers

In the Bernoulli() routine, Bernoulli numbers are evaluated exactly (as rational numbers) via one of two algorithms. The first, simpler algorithm (BernoulliArray()) uses the recursion relation,

B[n]= -1/(n+1)*Sum(k,0,n-1,B[k]*Bin(n+1,k)).

This formula requires knowing the entire set of B[k] for all k up to a given n in order to compute B[n]. Therefore at large n this algorithm becomes very slow.
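
Here is a minimal Python sketch of this recursion using exact rationals (the name bernoulli_array mirrors, but is not, the Yacas routine).

    from fractions import Fraction
    from math import comb

    def bernoulli_array(n):
        # exact B[0]..B[n] via B[m] = -1/(m+1) * Sum(k,0,m-1, B[k]*Bin(m+1,k))
        B = [Fraction(1)]
        for m in range(1, n + 1):
            s = sum(B[k] * comb(m + 1, k) for k in range(m))
            B.append(-s / (m + 1))
        return B

    # bernoulli_array(4) == [1, -1/2, 1/6, 0, -1/30]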

Here is an estimate of the cost of BernoulliArray. Suppose P is the required number of digits and M(P) is the time to multiply P-digit integers. The required number of digits P to store the numerator of B[n] is asymptotically P<>n*Ln(n). At each of the n iterations we need to multiply O(n) large rational numbers by large coefficients and take a GCD to simplify the resulting fractions. The GCD computation adds a factor of order Ln(P) on top of the multiplication time. So the complexity of this algorithm is O(n^2*M(P)*Ln(P)) with P<>n*Ln(n).

For large (even) values of the index n, the Bernoulli numbers B[n] are computed by a more efficient procedure: the integer part and the fractional part of B[n] are found separately.

First, by the theorem of Clausen -- von Staudt, the fractional part of (-B[n]) is the same as the fractional part of the sum of all inverse prime numbers p such that n is divisible by p-1. To illustrate the theorem, take n=10 with B[10]=5/66. The number n=10 is divisible only by 1, 2, 5, and 10; this corresponds to p=2, 3, 6 and 11. Of these, 6 is not a prime. Therefore, we exclude 6 and take the sum 1/2+1/3+1/11=61/66. The theorem now says that 61/66 has the same fractional part as -B[10]; in other words, -B[10]=i+f where i is some unknown integer and the fractional part f is a nonnegative rational number, 0<=f<1, which is now known to be 61/66. Indeed -B[10]= -1+61/66. So one can find the fractional part of the Bernoulli number relatively quickly by just checking the numbers that might divide n.
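
The following Python sketch computes this fractional part (the function names are ours; a simple trial-division primality test stands in for a proper prime test).

    from fractions import Fraction

    def is_prime(m):
        # trial division; adequate for this sketch
        if m < 2:
            return False
        d = 2
        while d * d <= m:
            if m % d == 0:
                return False
            d += 1
        return True

    def bernoulli_fractional_part(n):
        # fractional part of -B[n] for even n:
        # frac( Sum of 1/p over primes p such that (p-1) divides n )
        s = Fraction(0)
        for d in range(1, n + 1):
            if n % d == 0 and is_prime(d + 1):
                s += Fraction(1, d + 1)
        return s - int(s)

    # bernoulli_fractional_part(10) == Fraction(61, 66), as in the example above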

Now one needs to obtain the integer part of B[n]. The number B[n] is positive if Mod(n,4)=2 and negative if Mod(n,4)=0. One can use Riemann's Zeta function identity for even integer values of the argument and compute the value zeta(n) precisely enough so that the integer part of the Bernoulli number is determined. The required precision is found by estimating the Bernoulli number using the same identity in which one approximates zeta(n)=1 and uses Stirling's asymptotic formula for the factorial of large numbers,

Ln(n!)<=>Ln(2*Pi*n)/2+n*Ln(n/e).

At such large values of the argument n, it is feasible to compute the defining series for zeta(n),

zeta(n)<=>Sum(k,1,N,1/k^n).

This sum is calculated by the routine ZetaNum1(n, N). The remainder of the sum is of order N^(-n). By simple algebra one obtains a lower bound on N,

N>n/(2*Pi*e),

for this sum to give enough precision to compute the integer part of the Bernoulli number B[n].

Alternatively, one can use the infinite product over prime numbers p[i]

1/zeta(n)<=>Factorize(i,1,M,1-1/p[i]^n),

where M must be chosen such that the M-th prime number p[M]>N, and N is chosen as above, i.e. N>n/(2*Pi*e). This formula is implemented by the routine ZetaNum2(n, N). Since only prime numbers p[i] are used, this formula is asymptotically faster than ZetaNum1(n, N). The number of primes up to N is asymptotically pi(N)<>N/Ln(N) and therefore this procedure is faster by a factor O(Ln(N))<>O(Ln(n)). However, for n<250 it is faster (with Yacas internal math) to use ZetaNum1(n, N) because it involves fewer multiplications.
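
A Python sketch of this truncated Euler product (with a small Eratosthenes sieve standing in for the prime generator; the name inv_zeta_euler is ours) is:

    def inv_zeta_euler(n, N):
        # 1/zeta(n) ~= product over primes p up to about N of (1 - p^(-n))  (sketch)
        sieve = [True] * (N + 1)
        sieve[0:2] = [False, False]
        for i in range(2, int(N**0.5) + 1):
            if sieve[i]:
                sieve[i * i::i] = [False] * len(sieve[i * i::i])
        result = 1.0
        for p in range(2, N + 1):
            if sieve[p]:
                result *= 1.0 - p**(-n)
        return result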

For example, let us compute B[20] using this method. The divisors of n=20 are 1, 2, 4, 5, 10 and 20, which correspond to the candidates p=2, 3, 5, 6, 11 and 21; of these, 6 and 21 are not prime. The theorem of Clausen -- von Staudt then gives the fractional part of -B[20] as the fractional part of 1/2+1/3+1/5+1/11=371/330, which is 41/330. Since Mod(20,4)=0, the number B[20] is negative. Estimating Abs(B[20])=2*(20)! *zeta(20)/(2*Pi)^20<=>529.124 (only a few digits of zeta(20) are needed here), we find that the integer part of -B[20] is 529. Therefore B[20]= -(529+41/330)= -174611/330.
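
Reusing the bernoulli_fractional_part sketch above, the whole procedure can be outlined in Python as follows (double precision only, so it reproduces small cases such as B[20] but is not the arbitrary-precision routine; the name bernoulli_via_zeta is ours).

    from fractions import Fraction
    from math import factorial, pi

    def bernoulli_via_zeta(n):
        # even n only: fractional part from Clausen -- von Staudt,
        # integer part from Abs(B[n]) = 2*n!*zeta(n)/(2*Pi)^n  (sketch)
        f = bernoulli_fractional_part(n)                      # fractional part of -B[n]
        zeta_n = sum(k**(-float(n)) for k in range(1, 20))    # crude zeta(n), fine for this sketch
        absB = 2 * factorial(n) * zeta_n / (2 * pi)**n
        frac = f if n % 4 == 0 else 1 - f                     # Abs(B[n]) = integer + frac
        i = round(absB - float(frac))
        return -(i + frac) if n % 4 == 0 else i + frac

    # bernoulli_via_zeta(20) == Fraction(-174611, 330)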

All these steps are implemented in the routine Bernoulli1. The variable Bernoulli1Threshold determines the smallest n for which B[n] is to be computed via this routine instead of the recursion relation. Its current value is 20.

The complexity of Bernoulli1 is estimated as the complexity of finding all primes up to n plus the complexity of computing the factorial, the power and the Zeta function. Finding the prime numbers up to n by checking all potential divisors up to Sqrt(n) requires O(n^(3/2)*M(Ln(n))) operations with precision O(Ln(n)) digits. In the second step we need to evaluate n!, Pi^n and zeta(n) with precision of P=O(n*Ln(n)) digits. The factorial is found in n short multiplications with P-digit numbers (giving O(n*P)), the power of pi in Ln(n) long multiplications (giving O(Ln(n)*M(P))), and ZetaNum2(n) (the asymptotically faster algorithm) requires O(n*M(P)) operations. The Zeta function calculation dominates the total cost because M(P) is more costly than O(P). So the total complexity of Bernoulli1 is O(n*M(P)) with P<>n*Ln(n).

Note that this is the cost of finding just one Bernoulli number, as opposed to the O(n^2*M(P)*Ln(P)) cost of finding all Bernoulli numbers up to B[n] using the first algorithm BernoulliArray. If we need a complete table of Bernoulli numbers, then BernoulliArray is only marginally (logarithmically) slower. So for finding complete Bernoulli tables, Bernoulli1 is better only for very large n.


Approximate calculation of Bernoulli numbers

If Bernoulli numbers do not have to be found exactly but only to a certain floating-point precision P (this is usually the case for most numerical applications), then the situation is rather different. First, all calculations can be performed using floating-point numbers instead of exact rationals. This significantly speeds up the recurrence-based algorithms.

However, the recurrence relation used in BernoulliArray turns out to be numerically unstable and needs to be replaced by another one (R. P. Brent, "A FORTRAN multiple-precision arithmetic package", ACM TOMS vol. 4, no. 1 (1978), p. 57). Brent's algorithm computes the Bernoulli numbers divided by factorials, C[n]:=B[2*n]/(2*n)!, using a (numerically stable) recurrence relation

2*C[k]*(1-4^(-k))=(2*k-1)/(4^k*(2*k)!)-Sum(j,1,k-1,C[k-j]/(4^j*(2*j)!)).
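
In floating point, this recurrence can be sketched in Python as follows (double precision here, with the factorials computed exactly for simplicity; the real computation uses P-digit floats and approximate inverse factorials, and the name bernoulli_brent is ours).

    from math import factorial

    def bernoulli_brent(n_max):
        # C[k] = B[2k]/(2k)! from Brent's stable recurrence; returns B[2], B[4], ..., B[2*n_max]
        C = [0.0] * (n_max + 1)
        for k in range(1, n_max + 1):
            s = (2 * k - 1) / (4**k * factorial(2 * k))
            for j in range(1, k):
                s -= C[k - j] / (4**j * factorial(2 * j))
            C[k] = s / (2 * (1 - 4.0**(-k)))
        # recover B[2k] = C[k]*(2k)!  (doubles overflow for 2k beyond about 170)
        return [C[k] * factorial(2 * k) for k in range(1, n_max + 1)]

    # bernoulli_brent(3) gives approximately [0.16667, -0.03333, 0.02381], i.e. 1/6, -1/30, 1/42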

The numerical instability of the usual recurrence relation

Sum(j,0,k-1,C[k-j]/(2*j+1)!)=(k-1/2)/(2*k+1)!

and the numerical stability of Brent's recurrence are not obvious. Here is one way to demonstrate them. Consider the usual recurrence (above). For large k, the number C[k] is approximately C[k]<=>2*(-1)^(k+1)*(2*Pi)^(-2*k). Suppose we use this recurrence to compute C[k] from previously found values C[k-1], C[k-2], etc., and suppose that we have small relative errors e[k] of finding C[k]. Then instead of the correct C[k] we use C[k]*(1+e[k]) in the recurrence. Now we can derive a relation for the error sequence e[k] using the approximate values of C[k]. It will be a linear recurrence of the form

Sum(j,0,k-1,(-1)^(k-j)*e[k-j]*(2*Pi)^(2*j)/(2*j+1)!)=(k-1/2)/(2*k+1)! *(2*Pi)^(-2*k).

Note that the coefficients for j>5 are very small but the coefficients for 0<=j<=5 are of order 1. This means that we have a cancellation in the first 5 or so terms that produces a very small number C[k] and this may lead to a loss of numerical precision. To investigate this loss, we find eigenvalues of the sequence e[k], i.e. we assume that e[k]=lambda^k and find lambda. If Abs(lambda)>1, then a small initial error e[1] will grow by a power of lambda on each iteration and it would indicate a numerical instability.

The eigenvalue of the sequence e[k] can be found approximately for large k if we notice that the recurrence relation for e[k] is similar to the truncated Taylor series for Sin(x). Substituting e[k]=lambda^k into it and disregarding a very small number (2*Pi)^(-2*k) on the right hand side, we find

Sum(j,0,k-1,(-lambda)^(k-j)*(2*Pi)^(2*j)/(2*j+1)!)<=>lambda^k*Sin((2*Pi)/Sqrt(lambda))<=>0,

Since Sin((2*Pi)/Sqrt(lambda)) vanishes when (2*Pi)/Sqrt(lambda)=Pi, i.e. at lambda=4, we find that lambda=4 is a solution. Because Abs(lambda)>1, the recurrence is unstable.

By a very similar calculation one finds that the inverse powers of 4 in Brent's recurrence make the largest eigenvalue of the error sequence e[k] almost equal to 1 and therefore the recurrence is stable. Brent gives the relative error in the computed C[k] as O(k^2) times the roundoff error in the last digit of precision.

The complexity of Brent's method is given as O(n^2*P+n*M(P)) for finding all Bernoulli numbers up to B[n] with precision P digits. This computation time can be achieved if we compute the inverse factorials and powers of 4 approximately by floating-point routines that know how much precision is needed for each term in the recurrence relation. The final long multiplication by (2*k)! computed to precision P adds M(P) to each Bernoulli number.

The non-iterative method using the Zeta function does not perform much better if a Bernoulli number B[n] has to be computed with significantly fewer digits P than the full O(n*Ln(n)) digits needed to represent the integer part of B[n]. (The fractional part of B[n] can always be computed relatively quickly.) The Zeta function needs 10^(P/n) terms, so its complexity is O(10^(P/n)*M(P)) (here by assumption P is not very large so 10^(P/n)<n/(2*Pi*e); if n>P we can disregard the power of 10 in the complexity formula). We should also add O(Ln(n)*M(P)) needed to compute the power of 2*Pi. The total complexity of Bernoulli1 is therefore O(Ln(n)*M(P)+10^(P/n)*M(P)).

If only one Bernoulli number is required, then Bernoulli1 is always faster. If all Bernoulli numbers up to a given n are required, then Brent's recurrence is faster for certain (small enough) n.

Currently Brent's recurrence is implemented as BernoulliArray1() but it is not used by Bernoulli because the internal arithmetic cannot yet carry out these floating-point calculations correctly at the required precision.