A Tour of NTL: Some Performance Data

A Tour of NTL: Some Performance Data

Here are some timing figures from using NTL. The figures were obtained using a 700MHz Pentium III running Linux, and using gcc version 2.95.2. Also, GMP version 3.1.1 was used as the primary long integer package.

Basic arithmetic

For various values of n, we give the running times for the following tasks:

IMUL: multiply two n-bit numbers.
IREM: compute the remainder upon dividing a 2 n-bit number by an n-bit number.
IINV: compute an inverse modulo an n-bit prime number.
PMUL: multiply two (n-1)-degree polynomials modulo an n-bit prime number.

The time for IMUL, IREM, IINV is stated in units of 10^{-6}s, and that of PMUL in units of seconds.

n IMUL IREM IINV PMUL
512 4 5 90 0.11
1024 11 13 260 0.74
2048 33 53 580 2.71
4096 110 168 1460 16.64

The IMUL, IREM, and IINV times essentially reflect the performance of the underlying GMP routines (NTL's interface to these routines is very "light weight").

The PMUL time reflects the performance of NTL's FFT implementation of polynomial arithmetic.

Factoring polynomials over finite fields

We next consider the problem of factoriing of univariate polynomials modulo a prime p. As test polynomials, we take the family of polynomials defined in [V. Shoup, J. Symb. Comp. 20:363-397, 1995]. For every n, we define p to be the first prime greater than 2^{n-2}*PI, and the polynomial is

sum(a[n-i]*X^i, i = 0..n),

where a[0] = 1, and a[i+1] = a[i]^2 + 1.

Here are some running times, in seconds, for various values of n:

n 64 128 256 512 1024
0.4 2.8 25.3 238.8 2255.4

Also of interest is space usage. The n = 512 case used about 5MB main memory, and the n= 1024 case used 18MB main memory.

Factoring polynomials over the integers

The second problem considered is factoring univariate polynomials over the integers. In NTL 5.2, we have implemented Mark van Hoeij's new method for polynomial factorization. Prior to this, NTL offered an implementation of the classical ``Zassenhaus method,'' which can take exponential time in the worst case, although the NTL algorithm implements several novel ``pruning techniques'' that can subsantially speed up the brute-force search stage of the Zassenhaus method. However, van Hoeij's algorithm seems to have made all of these techniques more or less obsolete.

Note that van Hoeij's method is more a general algorithmic technique than a specific algorithm. To obtain a specific algorithm, several parameters and strategies must be chosen. We have attempted to make these choices so as to obtain a good, general-purpose algorithm that works fairly well on a wide variety of input polynomials. All of these choices are a bit heuristic, and any feedback refarding possible improvements is most welcome.

We consider a number of test polynomials, collected and/or suggested by Paul Zimmermann and Mark van Hoeij.

P1 has degree 156, coefficients up to 424 digits, and 36 factors (12 of degree 2, 15 of degree 4, 9 of degree 8).
P2 has degree 196, coefficients up to 419 digits and 12 factors (2 of degree 2, 4 of degree 12 and 6 of degree 24).
P3 has degree 336, coefficients up to 597 digits and 16 factors (4 of degree 12 and 12 of degree 24).
P4 has degree 462, coefficients up to 756 digits, and two factors (degrees 66 and 396).
P5 has degree 64, coefficients up to 40 digits, and is irreductible.
P6 has degree 144, coefficients up to 165 digits and has 6 factors (4 of degree 12 and 2 of degree 48).
P7 has degree 384 and coefficients up to 57 digits, and is irreducible.
P8 has degree 972, coefficients up to 213 digits, and is irreducible.
M12_5 has degree 792, coefficients up to 2813 digits, and is irreducible.
M12_6 has degree 924, coefficients up to 3937 digits, and two factors (degrees 132 and 792).

P1 comes from the Rational Univariate Representation (RUR) of the Cyclic 6 system, and P2 and P3 come from the RUR of Cyclic 7.

P4 was contributed by A. Hulpke and H. Matzat: it is the 5-set resolvent of the polynomial f = x^11 + 101*x^10 + 4151*x^9 + 87851*x^8 + 976826*x^7 + 4621826*x^6 - 5948674*x^5 - 113111674*x^4 - 12236299*x^3 + 1119536201*x^2 - 1660753125*x - 332150625 and its factorization would prove that f has Galois group M11.

P5 is the Swinnerton-Dyer polynomial for 2,3,5,7,11,13, i.e. the product of all terms of the form x+/-sqrt(2)+/-...+/-sqrt(13). It is a "bad" polynomial for the method of factoring in Fp and combining the factors in Z, because all its factors in Fp are of degree at most 2.

P6 is the resultant with respect to x of the polynomial p = -2566974800*x^2 +134697056*x^4 -3312297*x^6 +43109*x^8 -308*x^10 +x^12 +1142440000 with p(y-2*x); it was contributed by Frederic Lehobey and Nicolas Rennert.

P7 (contributed by John Abbott) is the norm of x^6 + (a1+a2+a3+a4+a5+a6)*(x^4-x^2) + 1 with a6^2-2, a5^2-3, a4^2-5, a3^2-7, a2^2-11 and a1^2-13, i.e. the resultant of that polynomial with a6^2-2 with respect to a6, then with a5^2-3 wrt a5, and so on.

P8 (contributed by Jean-Charles Faugere) comes from the Groebner basis of Cyclic 9.

M12_5 and M12_6 are the 5th and 6th resolvents of the polynomial f with Galois group M_{12} from the example in van Hoeij's paper. These polynomials are non-monic (unlike the others).

We also used the polynomials S6, S7, S8, S9, and S10, where Si is the Swinnerton-Dyer polynomial of degree 2^i corresponding to the first i primes. These are all irreducible polynomials, and are particularly bad for the Zassenhaus method.

The following two polynomials, H1 and H2, were suggested by Mark van Hoeij as examples that may be particularly challenging to factor using his algorithm:

H1 is the polynomial (x-1)^960-1.
H2 is a degree 4096 polynomial that is the product of a number of cyclotomic polynomials (with factors of degrees 128, 128, 256, 512, 1024, and 2048).

The polynomial H1 is defined as it is, rather than as x^960-1, simply to "hide" the specail structure of the polynomial -- many factorizers would apply special techniques to factor x^960-1, but will most likely not apply these techniques to (x-1)^960-1. To factor this polynomial with van Hoeij's method, one has to consider very high traces, and thus it is essential that a good implementation at some point crosses over to a "dense" strategy (or alternatively, implements a brute-force factor search in combination with van Hoeij's method).

You can download all of the above polynomials (except H1, which is easy trivial to generate from scratch).

We collected timing data for both the Zassenhaus and van Hoeij algorithms, both with and without the so-called "power hack". By setting a run-time switch, the user can choose to employ a "power hack", which attempts to speed up the factorization if the polynomial f(x) is of the form g(x^m) for some m > 1. However, while this "power hack" may sometimes speed things up substantially, it can also slow things down -- for the Zassenhaus method, this slowdown is usually negligible, but for our implementation of van Hoeij's algorithm, this slowdown can be quite substantial. Because of this, in our implementation of van Hoeij, we have modified the power hack so that if it "seems" to be going slowly, it is abandoned. This "early abort" power hack strategy is "on" by default, but can be turned off by the user by setting a run-time switch.

In the table below, running time is presented in seconds for each of the 4 methods:

Z-: Zassenhaus without power hack.
Z+: Zassenhaus with power hack.
H-: van Hoeij without power hack.
H+: van Hoeij with power hack (NTL's default strategy).

We also state for each polynomial the number r of modular factors of the polynomial (this is not necessarily relevant for the power-hack running times),

For comparison, we also state some running times for Maple version 5.1, running on a 500MHz Alpha 21264/EV6, as reported by Paul Zimmermann.

r Z- Z+ H- H+ maple
P1 60 2.8 0.3 2.8 0.3 0.6
P2 20 4.3 1.5 4.3 1.5 1.8
P3 28 13.4 3.0 13.4 3.0 2.4
P4 42 40.2 40.2 27.2 27.2 > 5000.0
P5 32 39.2 37.5 0.5 0.5 > 5000.0
P6 48 24.0 0.9 2.9 1.0 48.0
P7 76 15.5 13.2
P8 54 3930.0 3940.0 44.3 52.9 > 5000.0
P4*rev(P4) 84 826.0 826.0
H1 131 108.0 108.0
H2 256 379.0
S7 64 3.4 3.8
S8 128 53.4 64.6
S9 256 1200.0 1360.0
S10 512 31300.0 31700.0
S6*S7 96 18.8 7.2
S7*S9 320 3550.0 3630.0
S8*S9 384 8410.0 8600.0
M12_5 72 129.0 129.0
M12_6 84 410.0 396.0

Some remarks:

A blank entry indicates that the corresponding experiment was not performed.
For several of the Maple experiments, the experiment was terminated after 5000 seconds, as indicated in the table.
It would be futile to attempt to factor any of the other polynomials in the table using the Zassenhaus method, as the running time of that method is exponential in r.
We attempted to factor H2 using the H- algorithm, but after several hours of computation, the algorithm had not yet terminated. Using the H+ algorithm, things were much easier: the hardest part was to prove the irreducibility of a degree 2048 polynomial, for which the number of modular factors was 128.
Note that the power hack strategy can slow things down a bit, but can also speed things up, sometimes dramatically.
In the above table, rev(f) denotes the "reverse" of the polynomial f, i.e., x^{deg(f)}f(x^{-1}).

`n`	IMUL	IREM	IINV	PMUL
512	4	5	90	0.11
1024	11	13	260	0.74
2048	33	53	580	2.71
4096	110	168	1460	16.64

	`r`	Z-	Z+	H-	H+	maple
P1	60	2.8	0.3	2.8	0.3	0.6
P2	20	4.3	1.5	4.3	1.5	1.8
P3	28	13.4	3.0	13.4	3.0	2.4
P4	42	40.2	40.2	27.2	27.2	> 5000.0
P5	32	39.2	37.5	0.5	0.5	> 5000.0
P6	48	24.0	0.9	2.9	1.0	48.0
P7	76			15.5	13.2
P8	54	3930.0	3940.0	44.3	52.9	> 5000.0
P4*rev(P4)	84			826.0	826.0
H1	131			108.0	108.0
H2	256				379.0
S7	64			3.4	3.8
S8	128			53.4	64.6
S9	256			1200.0	1360.0
S10	512			31300.0	31700.0
S6*S7	96			18.8	7.2
S7*S9	320			3550.0	3630.0
S8*S9	384			8410.0	8600.0
M12_5	72			129.0	129.0
M12_6	84			410.0	396.0