11
2- \chapter {Floating point }
2+ \chapter {Floating point arithmetic }
33\label {floatingpoint }
44
5- Starting with version 5.0 \FORM \ is equiped with arbitrary floating point
6- capability. The low level routines are part of the GMP and mpfr libraries
7- which should be available on most systems. If not they can be picked up
8- easily from the internet. The main commands involving the floating point
9- system are
5+ Starting with version 5.0, \FORM {} is equipped with arbitrary precision floating point
6+ arithmetic. The low level routines are handled by the GMP and MPFR libraries,
7+ which are available on most systems and, if missing, can easily be obtained
8+ from the internet. This chapter describes the commands, functions, and behaviour
9+ of \FORM 's floating point system.
10+
11+ \section {Initializing and closing the floating point system }
12+ Before any floating-point operations can be performed, \FORM {} must activate the
13+ floating point system and set the working precision. This initialization allocates
14+ the internal data structures used by the GMP and MPFR libraries. The system remains
15+ active until the end of the program, or until it is explicitly closed.
16+ The two statements that control these operations are:
1017\begin {description }
11- \item [\# startfloat ] This instruction is needed to startup the floating
12- point system. Invoking it will allocate a number of arrays. The instruction
13- has either one or two arguments:
18+ \item [\# StartFloat ] This instruction initializes the floating
19+ point system and allocates the necessary internal arrays.
20+ It takes either one or two arguments:
1421\begin {verbatim }
15- #startfloat <precision> [,MZV=<maximumweight>]
22+ #StartFloat <precision> [,MZV=<maximumweight>]
1623\end {verbatim }
1724The first argument is mandatory and specifies the desired precision. It must
18- be a positive integer followed by either \texttt {b } (for precision in bits)
25+ be a positive integer followed by either a \texttt {b } (for precision in bits)
1926or \texttt {d } (for precision in decimal digits).
20- \FORM {} will round to at least this precision. Because the internal
21- routines work with WORDs, the precision (in bits) will internally be rounded up to the nearest
22- integer number of WORDs. The second argument is optional for when one wants
23- to work with multiple zeta values (MZVs) or Euler sums. It specifies the
24- maximum weight that will be used. The evaluation of the sums requires a
25- number of auxiliary arrays. The default value is zero. If one would like to
26- change the precision during a run, this is possible. The effect would be
27- that the existing arrays are released and new arrays will be allocated.
28- \item [\# endfloat] This instruction releases all arrays allocated for the
29- floating point system.
27+ \FORM {} will round to at least this precision.
28+ The second argument is optional and only needed when working with multiple
29+ zeta values (MZVs) or Euler sums. It specifies the maximum weight
30+ that will be used. The evaluation of the sums requires a
31+ number of auxiliary arrays that depend on this weight. The default weight is zero.
32+ \item [\# EndFloat] This instruction releases all arrays allocated for the
33+ floating point system. Note that if one would like to change the precision during a run,
34+ this is now possible with a new \texttt {\# StartFloat } instruction.
3035\end {description }
36+ Example programs that illustrate the use of these statements and the
37+ functionality of \FORM 's floating point system are given below.
38+
39+
40+ \section {Conversion between rational and floating point coefficients }
41+ A term in an expression can have a rational or floating point coefficient.
42+ The following statements convert between the two.
3143\begin {description }
32- \item [tofloat] Converts the rational coefficients at the ground level to
33- floating point numbers in the precision specified in the \# startfloat
34- instruction. From this point on the coefficient at this level will be
35- floating point. If one needs to convert numbers inside a function argument
36- one should use the argument environment. This can be nested.
37- \item [torational] Tries to convert the floating point coefficients to
38- rational numbers. To this end it uses repeated fractions as in
44+ \item [ToFloat] Converts rational coefficients to
45+ floating point numbers in the precision specified by \texttt {\# StartFloat }.
46+ From this point on, the coefficient will be floating point.
47+ \item [ToRational] Attempts to convert floating point coefficients to
48+ rational numbers. To this end it uses continued fractions as in
3949\begin {eqnarray }
40- x & \rightarrow & n_0 + 1/(n_1+1/(n_2+1/(n_3+\cdots ))) \nonumber
50+ x \; \rightarrow \; n_0 + \frac {1}{\, n_1 + \frac {1}{\, n_2 + \frac {1}{\, n_3 + \cdots }}}\; ,
51+ \nonumber
4152\end {eqnarray }
4253with $ x$ a floating point number. The algorithm keeps track of the
4354remaining precision and if $ 1 /n_i$ is close to this precision it truncates
44- the sequence at $ n_{i-1}$ . After that it works out the fraction. It could
45- be that $ x$ cannot be expressed as a fraction within the given precision.
55+ the sequence at $ n_{i-1}$ . After that it works out the corresponding fraction.
56+ It could be that $ x$ cannot be expressed as a fraction within the given precision.
4657This can usually be recognized because the fractions are `rather wild', or because
4758the result changes when the precision is increased. This statement can also
48- be abbreviated to `torat'.
49- \item [evaluate] If this command has no arguments all floating point
50- functions that \FORM {} knows about will be evaluated. The currently allowed
51- arguments are the functions mzv\_ , euler\_ , sqrt\_ and mzvhalf\_ . If any
52- (or more than one) of these are specified only those functions will be
53- evaluated.
54- \item [strictrounding] This statement rounds floating point numbers to a
55- given precision. The syntax is
59+ be abbreviated as \texttt {ToRat }.
60+ \end {description }
61+
62+ The above statements operate on ground level coefficients only. To convert numbers
63+ inside a function argument, one must use the \texttt {Argument } environment.
64+ For example:
65+ \begin {verbatim }
66+ CFunction f;
67+ #StartFloat 10d
68+ Local F = 0.1666666666*f(0.1428571429);
69+ ToRat;
70+ Print "<1> %t";
71+ Argument f;
72+ ToRat;
73+ EndArgument;
74+ Print "<2> %t";
75+ .end
76+ <1> + 1/6*f(1.428571429e-01)
77+ <2> + 1/6*f(1/7)
78+ \end {verbatim }
79+ The argument environment may be nested.
80+ Similarly, the statements \texttt {Evaluate }, \texttt {StrictRounding } and \texttt {Chop } act at
81+ the ground level. To have them act on function arguments, one uses the \texttt {Argument } environment.
82+ These statements are explained further below.
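
The continued fraction truncation described for \texttt{ToRational} can be sketched outside of \FORM{}. The Python fragment below is only an illustration under stated assumptions: the name \texttt{to\_rational} and the stopping rule (stop as soon as a convergent reproduces the input to within $10^{-d}$ for $d$ trusted digits) are hypothetical simplifications, not the algorithm as implemented in \FORM{}.

```python
from fractions import Fraction


def to_rational(x, digits):
    """Rebuild a simple fraction from x via continued fractions.

    x      -- the floating point value, given as an exact Fraction
    digits -- number of decimal digits of x that are trusted (assumption:
              an absolute tolerance of 10**-digits is used as stopping rule)
    """
    tol = Fraction(1, 10**digits)
    h_prev, h = 0, 1  # numerators of the two previous convergents
    k_prev, k = 1, 0  # denominators of the two previous convergents
    rest = x
    while True:
        n = rest.numerator // rest.denominator  # next partial quotient n_i
        h_prev, h = h, n * h + h_prev
        k_prev, k = k, n * k + k_prev
        approx = Fraction(h, k)
        # Stop when the convergent agrees with x within the tolerance,
        # or when the expansion terminates exactly.
        if abs(approx - x) <= tol or rest == n:
            return approx
        rest = 1 / (rest - n)


# The two 10-digit numbers from the example above.
print(to_rational(Fraction("0.1666666666"), 10))  # 1/6
print(to_rational(Fraction("0.1428571429"), 10))  # 1/7
```

Repeating such a reconstruction with a different number of trusted digits gives a quick stability check: a result that changes with the tolerance is a sign that the number is not a simple fraction at the given precision.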
83+
84+ \section {Evaluation of functions and symbols }
85+ Before version 5.0, \FORM {} already reserved function names for many common mathematical
86+ functions. These functions can now be evaluated numerically using:
87+
88+ \begin {description }
89+ \item [Evaluate] This statement evaluates mathematical functions and/or symbols numerically:
90+ \begin {verbatim }
91+ Evaluate [function(s)],[symbol(s)];
92+ \end {verbatim }
93+ where the argument specifies the function(s) and/or symbol(s) to evaluate.
94+ More than one function and/or symbol may be listed.
95+ If this statement is used without arguments, all floating point functions and symbols that \FORM {}
96+ knows will be evaluated. Currently, the full list of functions that can be evaluated numerically reads
97+ \begin {verbatim }
98+ sqrt_, ln_, eexp_, li2_, gamma_, agm_,
99+ sin_, cos_, tan_, asin_, acos_, atan_, atan2_,
100+ sinh_, cosh_, tanh_, asinh_, acosh_, atanh_,
101+ mzv_, euler_, mzvhalf_,
102+ \end {verbatim }
103+ where the functions on the last line denote the multiple zeta values, Euler sums and
104+ harmonic polylogarithms of argument $ 1 /2 $ respectively.
105+ The list of symbols/constants that can be evaluated is
56106\begin {verbatim }
57- strictrounding [precision];
107+ pi_, ee_, em_,
58108\end {verbatim }
59- where precision is an optional argument that specifies the rounding
109+ where \texttt {ee\_ }\index {ee\_ } denotes the base of the natural logarithm
110+ and \texttt {em\_ }\index {em\_ } the Euler-Mascheroni constant.
111+
112+ In addition, the functions \texttt {lin\_ }, \texttt {hpl\_ } and \texttt {mpl\_ } are reserved function names,
113+ but currently have no numerical evaluation.
114+ \end {description }
115+
116+
117+ \section {Rounding behaviour }
118+ \begin {description }
119+ \item [StrictRounding] This statement rounds floating point numbers to a
120+ given precision:
121+ \begin {verbatim }
122+ StrictRounding [<precision>];
123+ \end {verbatim }
124+ where \texttt {<precision> } is an optional argument that specifies the rounding
60125precision in either digits or bits, using the same syntax as
61- \texttt {\# startfloat }. If no argument is given, this statement rounds
62- the floating point coefficients to the default precision. Internally,
63- the GMP and mpfr libraries may use extra precision beyond that set by
64- \texttt {\# startfloat }. As a result, terms may not merge due to this
65- extra precision. For example:
126+ \texttt {\# startfloat }. If omitted, the default precision is used.
127+
128+ Internally, the GMP and MPFR libraries may use extra precision beyond that set by
129+ \texttt {\# startfloat }. As a result, terms that print the same may still differ slightly
130+ due to this extra precision and therefore fail to merge. For example:
66131\begin {verbatim }
67132 #startfloat 6d
68133 CFunction f;
@@ -89,13 +154,13 @@ \chapter{Floating point}
89154$ 1.1100110101011111101 *2 ^{-14}$ . When rounded to 5 bits, this becomes
90155$ 1.1101 *2 ^{-14}$ , which in decimal digits appears as
911561.10626220703125e-04.
92- \item [Chop] This statement removes floating point numbers that are smaller
93- in absolute magnitude than a specified threshold. It takes one argument delta :
157+ \item [Chop] This statement removes floating point numbers that are { \em smaller}
158+ in absolute magnitude than a specified threshold. It takes one argument:
94159\begin {verbatim }
95160 Chop <delta>;
96161\end {verbatim }
97- All floating point numbers with absolute value less than delta are replaced by 0.
98- Terms with no floating point coefficient are left untouched. The threshold delta
162+ All floating point numbers with absolute value { \em less} than \texttt { < delta> } are replaced by 0.
163+ Terms with no floating point coefficient are left untouched. The threshold \texttt { < delta> }
99164can be a floating point number, integer, rational number, or power. Because
100165statements in \FORM {} act term by term, it is often important to sort before invoking the
101166chop statement. Otherwise, terms might be removed individually, while after
@@ -109,33 +174,21 @@ \chapter{Floating point}
109174 Format floatprecision;
110175\end {verbatim }
111176\FORM {} prints floats with the number of digits specified by the current
112- \# startfloat instruction. With
177+ \texttt {\# startfloat } instruction. With
113178\begin {verbatim }
114179 Format floatprecision <precision>;
115180\end {verbatim }
116181\FORM {} prints the number of digits specified by \texttt {<precision> }.
117- The syntax is the same as for the precision in \# startfloat: a positive
118- integer followed by either \texttt {b } (for bits) or \texttt {d } (for decimal
119- digits). If the requested precision exceeds the precision specified by
120- \# startfloat, only the available digits are printed. Finally, with
182+ The syntax is the same as for the precision in \texttt {\# startfloat }.
183+ If the requested precision exceeds the precision specified by
184+ \texttt {\# startfloat }, only the available digits are printed. Finally, with
121185\begin {verbatim }
122186 Format floatprecision off;
123187\end {verbatim }
124- the floating point numbers are printed in raw internal format.
188+ the floating point numbers are printed in raw internal format; see also Section~\ref{sec:float_raw}.
125189\end {description }
126- In addition to the above commands there are the following functions that
127- can be evaluated sqrt\_ , ln\_ , eexp\_ , li2\_ , gamma\_ , agm\_ , sin\_ , cos\_ , tan\_ ,
128- asin\_ , acos\_ , atan\_ , atan2\_ , sinh\_ , cosh\_ , tanh\_ , asinh\_ , acosh\_ , atanh\_ .
129- For the function lin\_ there is currently no code.
130- The agm\_ function is the arithmetic geometric mean of its two input
131- values.
132-
133- In addition to the above functions there are also the constant
134- pi\_ \index {pi\_ }, the basis of the natural logarithm ee\_ \index {ee\_ } and the
135- Euler-Mascheroni constant em\_ \index {em\_ }. These constants will also be
136- expanded with the evaluate command. When given as an argument to evaluate,
137- only the specified constants will be evaluated.
138190
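
The binary rounding quoted in the \texttt{StrictRounding} discussion can be checked by hand. The Python fragment below is a minimal sketch and not \FORM{} code: the helper \texttt{round\_to\_bits} is hypothetical, and rounding to nearest is an assumption about the rounding mode. It rounds the mantissa $1.1100110101011111101 \cdot 2^{-14}$ to 5 significant bits using exact rational arithmetic.

```python
from fractions import Fraction


def round_to_bits(x, bits):
    """Round a positive Fraction to `bits` significant binary digits.

    Assumption: round to nearest, with ties rounded upwards.
    """
    # Find the binary exponent e with 2**e <= x < 2**(e+1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    # Scale x so that exactly `bits` binary digits sit left of the point.
    scale = Fraction(2) ** (bits - 1 - e)
    scaled = x * scale
    # floor(scaled + 1/2), computed with integers only.
    m = (2 * scaled.numerator + scaled.denominator) // (2 * scaled.denominator)
    return Fraction(m) / scale


# Mantissa 1.1100110101011111101 (binary), exponent -14.
mantissa = Fraction(int("11100110101011111101", 2), 2**19)
x = mantissa * Fraction(1, 2**14)

r = round_to_bits(x, 5)
print(r == Fraction(0b11101, 2**4) * Fraction(1, 2**14))  # mantissa 1.1101
print(f"{float(r):.14e}")
```

The rounded value is exactly $29/2^{18}$, which is why its decimal expansion terminates.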
191+ \section {Examples }
139192The following example shows some work with Multiple Zeta Values (MZVs):
140193\begin {verbatim }
141194 #StartFloat 500b, MZV=15
@@ -190,10 +243,10 @@ \chapter{Floating point}
190243
191244 0.08 sec out of 0.09 sec
192245\end {verbatim }
193- The \ # startfloat initializes the floating point system and allocates arrays
194- for 500 bits of precision. If there is a second number it indicates the
195- maximum weight for MZVs and Euler sums. The functions are only evaluated
196- when the proper command is given . In the second module we divide the
246+ In the first module, \texttt {\# startfloat } initializes the floating point system with
247+ 500 bits of precision and a maximum weight of 15 for the MZVs and Euler sums.
248+ The \texttt {mzv\_ } functions are then evaluated with the \texttt {Evaluate }
249+ statement. In the second module we divide the
197250numbers and convert the result to a rational. It is a good idea to try this
198251with various precisions to see whether this is stable. With 60 bits the
199252final answer would be
@@ -202,5 +255,51 @@ \chapter{Floating point}
202255\end {verbatim }
203256while at 150 bits we already have the same answer as with 500 bits. The
204257fraction that is obtained by this program can be proven to be correct.
205- \vspace {3mm}
206258
259+
260+ \section {Raw form }
261+ \label {sec:float_raw }
262+ Internally, floating point numbers are represented by the function \texttt {float\_ },
263+ i.e. \texttt {float\_ (prec, size, exp, limbs) }. The integer arguments encode the
264+ internal representation of the floating point number as in the GMP library:
265+ \begin {description }
266+ \item [prec] The precision of the mantissa in limbs.
267+ \item [size] The number of limbs currently in use.
268+ \item [exp] The exponent, determining the location of the implied radix point.
269+ \item [limbs] The limbs packed as the numerator of a \FORM {} rational.
270+ \end {description }
271+ In a normalized term containing \texttt {float\_ }, the rational coefficient must
272+ be either $ 1 /1 $ or $ -1 /1 $ , where the sign of the term is absorbed into the rational
273+ coefficient.
274+ Furthermore, the \texttt {float\_ } function is protected from the pattern matcher and from
275+ statements that act on functions -- such as \texttt {Transform }, \texttt {Argument },
276+ \texttt {Normalize } etc.
277+ The following program illustrates this:
278+ %
279+ \begin {verbatim }
280+ CFunction f;
281+ #StartFloat 10d
282+ Local F = 1.23456789 + f(1,2);
283+ Identify f?(?a) = f(10);
284+ Print "<1> %t";
285+ .sort
286+ <1> + 1.23456789e+00
287+ <1> + f(10)
288+ #EndFloat
289+ Normalize;
290+ Print "<2> %t";
291+ .sort
292+ <2> + float_(2,3,1,420101683733788795657820481376616399786)
293+ <2> + 10*f(1)
294+ #StartFloat 5d
295+ Print "<3> %t";
296+ .end
297+ <3> + 1.2346e+00
298+ <3> + 10*f(1)
299+ \end {verbatim }
300+ %
301+ As shown, the \texttt {id }-statement does not affect the \texttt {float\_ } function.
302+ Here we also see the use of the preprocessor statement \texttt {\# EndFloat } which closes
303+ the floating point system. After this statement, the \texttt {float\_ } function becomes a
304+ regular function. Its protected status, however, persists so that \texttt {id }-statements
305+ or statements like \texttt {Normalize } still do not modify it.
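
To make the meaning of the four arguments more concrete, the raw value printed in the program above can be decoded by hand. The Python fragment below is a sketch under explicit assumptions that are not stated in this chapter: limbs are taken to be 64 bits wide, and the packed integer is read as the limb array with the most significant limb first. The helper \texttt{decode\_float} is hypothetical and ignores \texttt{prec}, which only records the allocated precision.

```python
from fractions import Fraction

LIMB_BITS = 64  # assumption: 64-bit limbs


def decode_float(prec, size, exp, limbs):
    """Turn the arguments of float_ into an exact rational value.

    Assumptions: `limbs` is the whole limb array read as one integer
    (most significant limb first), and `exp` counts how many limbs lie
    to the left of the implied radix point.
    """
    fractional_limbs = size - exp
    return Fraction(limbs) / Fraction(2) ** (LIMB_BITS * fractional_limbs)


value = decode_float(2, 3, 1, 420101683733788795657820481376616399786)
print(float(value))
```

Under these assumptions the decoded number agrees with the input \texttt{1.23456789} to well beyond the requested 10 digits, which is consistent with the extra internal precision mentioned in the discussion of rounding.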