Signed integers with unlimited range

            (class cBigNumber, version 2.2)

                 The programmer manual

                                      Devoted to my mom.

Contents
========

1. Introduction

1.1. Requirements
1.2. Technical parameters
1.3. Software license
1.4. Disclaimer
1.5. Third party components
1.6. Technical support
1.7. Contact information
1.8. Acknowledgements

2. Contents of the program folder

2.1. Test programs

3. Instructions

3.1. Using of unlimited numbers
3.2. Additional operations, functions and methods
3.3. Function and methods for manual optimization of programs
3.4. Testing numbers for primality and factoring

4. Guidelines

4.1. Possible areas of application
4.2. Performance
4.3. Consumption of memory
4.4. Interaction with operation system
4.5. Prevention of bugs
4.6. Built-in bug prevention tools
4.7. Basic CBNL type and functions

5. Technical information

5.1. Information on implementation
5.2. Peculiarities of implementation of regular operations

Appendix 1. An explanatory slip to work for contest SofTool'99

Appendix 2. Known issues

Appendix 3. What's new


1. Introduction
===============

A C++ class cBigNumber implements integers with unlimited range.
The class provides for all regular operations of language C++,
including arithmetic, logic and bit-wise operations, operations of
comparison, shift operations, and also stream input-output with
all of integer modifiers. Extra functions - power, power by
module and Miller strong probable prime test.

The class uses fast algorithms, such as the binary exponentiation,
which are optimized for numbers containing up to 100,000 and more
bits. More large numbers can also be used; the exponentiation tests
were carried out for numbers containing up to 12,000,000 bits.


1.1. Requirements
-----------------

The class is designed to conform to the standard C++
ISO/IEC 14882:1998(E) and C++ 11 ISO/IEC 14882:2011.

Implementation of class does not depend on CPU digit capacity,
but it is assumed that numbers are internally represented in
the complementary binary code. The class uses integer machine
arithmetic and do not use operations on floating-point numbers.

All mathematical operations of class may be performed using only
additive, shift and logical operations of processor. Hardware
operations for multiplication and division are used for improving
of performance.

The class have been tested under:
- Microsoft Visual C++ 6.0, 7.0, SDK 2003 R2, 2005, 2010, 2015-2022.
- Borland C++ 3.1 (16 bit).
- Borland C++ 4.5 (32 bit).
- Borland C++ Builder 1.0.
- GNU g++ 2.9.6 (32 bit, Red Hat Linux 7.1).
- GNU g++ 3.3.3 (ARM, Pocket GCC, Windows Mobile 5, 6.1).
- GNU g++ 4.1.2 (32 and 64 bit, SuSE Linux 10).
- GNU g++ 4.2.3 (32 bit, Workbench).
- GNU g++ 4.2.1 (64 bit, FreeBSD 9.3).
- Clang++ 3.4.1 (64 bit, FreeBSD 9.3).
- GNU g++ 6.4.0 (64 bit, FreeBSD 11).
- Clang++ 4.0.0 (64 bit, FreeBSD 11).
- GNU g++ 9.4.0 (64 bit, Ubuntu 20.04).
- Clang++10.0.0 (64 bit, Ubuntu 20.04).

On Microsoft Visual C++ 2005 and above the class uses 32/64 and
64/128 bit intrinsics for hardware multiplication, shifts etc.
The same optimization for wider scope of compilers and also
for 64/32 and 128/64 bit hardware division is provided by
optional assembler add-on package (ref. below).

On Microsoft Visual C++ 2013 and above the class uses intrinsics
for subtraction with borrow when dividing to short divider (two
machine words). The same acceleration for larger dividers is
provided by optional assembler add-on package (ref. below).

On Microsoft Visual C++ 2019 and above the class uses hardware
multiplication and intrinsics for dividing of double 64/128
bits words to accelerate division and module operations
(except for programs, complied for Windows XP/2003).

Optional 32 bit code with assembler optimization is available
as separate add-on package for:
- Microsoft Visual C++ 6.0, 7.0, 2003-2022 and above.
- Borland C++ Builder 1.0 and above.
- Borland C++ 4.5 (except for multiplication with accumulation).

Optional 64 bit code with assembler optimization is available
as separate add-on package for Microsoft Visual C++ 2005-2022
and above. It have been tested under:
- Microsoft Visual C++ 2010, SDK 7.1 x64
- Microsoft Visual C++ 2015-2022

Optional code with 32/64/128 bits assembler multiplication and
division is available as separate add-on package for GNU g++,
have been tested under:
- GNU g++ 4.1.2 (32 and 64 bit, SuSE Linux 10).

The class supports multithreading, ref. Sections 3 and 4.2.4
for more information.


1.2. Technical parameters
-------------------------

Technical parameters are given in assumption that byte contains
8 bit, processor word contains 16, 32 or 64 bits of type size_t.
Some estimations depends of size of CBNL word that contains either
32 or 64 bits integer of either C type long, C++ 11 type long long
or compiler-specific type defined in the file Cbnl.h.

- Size of descriptor of number, in processor words:      2

- Representation of number: normalized complementary binary code
                            with variable number of CBNL words
                            in the order from low to high word.

- Initial value of number by default:                    0

- Initial amount of dynamic memory allocated
  for number 0, in bytes:                                0

- Size of service information allocated for
  a number, in CBNL words:      number 0            0 or 2
                                non-0 number:            2

- Minimal amount of dynamic memory required for
  non-0 number, in CBNL words (*):                       3

- Initial amount of dynamic memory allocated for
  non-0 number, in bytes (*):   16 bits:                44
                                32 bits:               104
                                64 bits:               224
                                      sizeof(size_t)*30-16

- Maximal amount of memory to allocate
  for a number, in bytes (*):   16 bits:            34,988
                                32 bits:     1,607,077,576
                                64 bits:            system

- Maximal number of operations of memory
  allocation for a number depending on
  its size n, in bytes (*):     16 bits: 2*log3((n+12)/ 96)
                                32 bits: 2*log3((n+16)/224)
                                64 bits: 2*log3((n+32)/480)
                2*log3((n+sizeof(size_t)*2+sizeof(CBNL)*2)/
                                    (sizeof(size_t)*64-32))

- Size of automatically reserved memory
  in relation to the actual size of a code
  of number (**)                         In average: 37.5%
                                            Maximum:  100%

- Maximal amount of memory required for
  automatic optimization of multiplication,
  in bytes (***,****)           16 bits:              1.5K
                                32 bits and more:      90K

- Amount of memory required for automatic
  optimization of division, in relation to
  size of divider (***,*****):  32 bits and less     3200%
                                64 bits              6400%
                                         sizeof(CBNL)*800%

- Maximal amount of memory required for
  automatic optimization of division,
  in bytes (***,*****):         16 bits:                2K
                                32 bits:               64M
                                64 bits:              128M
                                          sizeof(CBNL)*16M

(*)     Includes service information.

(**)    It is supposed that size of number is above minimal.

(***)   Except for operations that use table of shifts
        explicitly. Maximal amount of memory depends on
        settings in the file Cbignumf.h.

(****)  Not including amount of memory required for copying
        of operands.

(*****) Optimization is not necessary for new algorithm based
        on hardware multilpication in version 2.2 and above.
        In the previous algorithm it is provided for dividers
        up to 60 bytes in 16 bit mode and up to 2M bytes in
        32 and 64 bit modes.


1.3. Software license
---------------------

The following copyright statement describes authorship of
the software.

(C) Copyright 1998-2024 by Dr. Raul N. Shakirov, IMach of RAS (UB).

Source code written entirely on C++ without explicit use of
machine assembler instructions is distributed as "Public
domain", that is permission has been granted to copy,
distribute and modify software in any context without fee,
including a commercial application, provided that the
aforesaid copyright statement is present in user-accessible
supporting documentation, as well as exhaustive description
of changes.

This does not affect your ownership of the derived work
itself, and the intent is to assure proper credit for the
authors of this software, not to interfere with your
productive use of this software. "Derived works" includes
all programs that utilize this software. Credit must be
given in user-accessible documentation.


1.4. Disclaimer
---------------

Author ensures that he has secured all necessary consents
and approvals to use third party intellectual property
rights for this software.

Author confirms that this software does not contain parts,
which are intended for the purposes, others than central
purpose announced in the software documentation.

THIS SOFTWARE IS PROVIDED "AS IS". NO WARRANTY OF ANY KIND
IS EXPRESSED OR IMPLIED. YOU USE AT YOUR OWN RISK. THE
AUTHOR WILL NOT BE LIABLE FOR DATA LOSS, DAMAGES, LOSS OF
PROFITS OR ANY OTHER KIND OF LOSS WHILE USING OR MISUSING
THIS SOFTWARE.


1.5. Third party components
---------------------------

1. Strong Probable Primarily test for base b
   by Olivier Langlois <olanglois@sympatico.ca>
   Public available source.

2. Tausworth based random number generator.
   Ref. Glenn Rhoads' home page at
   http://remus.rutgers.edu/~rhoads/Code/code.html
   Public available source.

3. Microsoft(R) Visual C++ 6.0 Compiler (Enterprise Edition).
   Licensed to Institute of Engineering Science of RAS (UB).

4. MSDN Subscriptions library, April 2001.
   Licensed to Institute of Engineering Science of RAS (UB).

5. Microsoft(R) Visual C++ 2010 Express Compiler.
   Public available software.

6. Microsoft(R) Windows SDK v7.1 x64.
   Public available software.

7. Microsoft(R) Visual C++ 2015 Community Compiler.
   Public available software.

8. GNU C++ compiler g++ 2.9.6
   Public available software.

9. GNU C++ compiler g++ 4.1.2
   Public available software.


1.6. Technical support
----------------------

Technical support is carried out at a web cite
http://www.imach.uran.ru/cbignum

Please, send your comments and bug reports to the author.


1.7. Contact information
------------------------

Author: Dr. Raul N. Shakirov,
Institute of Engineering Science, Russian Ac. of Sci. (UB),
34 Komsomolskaya str., Ekaterinburg, 620219, Russia.

Phone:  +7 (343) 375-35-65
Email:  raul@imach.uran.ru


1.8. Acknowledgements
---------------------

My gratitude to Dr. Nickolai Nevesenko for live discussing
of various ideas about numeric algorithms and DemiS for testing
the class under modern gcc and clang.


2. Contents of the program folder
=================================

The class cBigNumber is implemented as the following files:

   Cbignum.cpp  - implementation of class cBigNumber
   Cbignum.h    - declaration of class cBigNumber
   Cbignum.txt  - the programmer manual (eng)
   Cbignumr.txt - the programmer manual (rus)
   Cbignumf.cpp - implementation of base arithmetic functions for
                  unlimited numbers with machine-level instructions
                  (in add-on package)
   Cbignumf.inl - implementation of base arithmetic functions for
                  unlimited numbers on standard C++
   Cbignumf.h   - macro parameters for base arithmetic functions
   Cbignums.cpp - implementation of stream input-output operators
   Cbignums.h   - declarations of stream input-output operators
   Cbnl.h       - declarations for compiler detection and CBNL functions
   Cbnl.inl     - declarations for compiler detection and CBNL functions
   Cbnl64.inl     (in add-on package)
   Cios.h       - declarations for standard stream input-output
   Cthr.h       - declarations for multithreading support
   Ctty.cpp     - handler functions for independent stream output
   Ctty.h       - class for compiler independent stream output
   Ctty.txt     - the programmer manual for independent output (eng)
   Cttyr.txt    - the programmer manual for independent output (rus)
   Exarray.cpp  - memory allocation functions
   Exarray.h    - template of dynamic arrays
   Exarray.txt  - the manual for Exarray templates (eng)
   Exarrayr.txt - the manual for Exarray templates (rus)
   Exdebug.h    - debugging macro definitions
   Exstring.h   - redefinition of functions from string.h
   Exthread.h   - macro for thread local storage
   Prime.cpp    - functions for determining of prime numbers
   ENG\         - sources with comments (eng)
   RUS\         - sources with comments (rus)

Add-on package also contains:
   Cbnl32.obj   - 32 bit assembler object file
   Cbnl64.obj   - 64 bit assembler object file
   Cbnl64x.obj  - 64 bit assembler object file for BMI2

File Cbignum.txt describes documented functions of the class
which will be kept in the subsequent versions of the class if
other is not stipulated in the appropriate section.

For convenience of studying explanatory comment are built into
the headers and source files. These comments can be considered
as the additional documentation. If programmers will experience
difficulties at acquaintance with texts the author will be glad
to help.

WARNING:
All definitions is headers and source files, which are not
providing for regular C operations and not listed in file
Cbignum.txt, are considered to be not documented. They can be
excluded or modified in the subsequent versions with no notices.


2.1. Test programs
------------------

Test programs, included in the distribution, are compiled
with assembler add-in package under 32 and 64 bit modes for
operational systems Windows and Linux. Also are available
special 64 bit programs that use BMI2 instruction set of
modern processors (Intel Haswell/AMD Excavator/AMD Zen).

NOTE: The class determines support for BMI2 by presence of
      macro __AVX2__ under Microsoft Visual C++.

      To compile special programs under Visual C++ set in
      Project Properties C/C++  Code Generation - Enable
      Enhanced Instruction Set - AVX2
      You can compile them with or without assembler add-on
      package, but in the latter case AVX2 is not
      significantly effective.

On large numbers programs for Windows work several times faster
than appropriate programs for Linux, because programs for
Windows were compiled using built-in assembler.

To compile all the programs, use the following files:

   Cbignum      - commands for GNU g++
   Cbignum.dsw  - workspace for Microsoft Visual C++ 6.0
   Cbignumc     - commands for clang++
   Cbignumd     - commands for GNU g++ debug mode

Test programs and compiling instructions for PocketPC are in
separate add-on package.


2.1.1. Integer calculator Arifexp
---------------------------------

Program Arifexp carries out arithmetic calculations, which are
set as the following expressions:
a=      (print number)
a+b=    (addition)
a-b=    (subtraction)
a*b=    (multiplication)
a/b=    (division and module)
a%b=    (module)
a\b=    (power)
a<<b=   (left shift)
a>>b=   (right shift)
a|b=    (disjunction)
a&b=    (conjunction)
a^b=    (excluding or)
a?b=    (comparing, returns -1,0,1)
a+b*c=  (multiplication with addition)
a-b*c=  (multiplication with subtraction)
a\b%c=  (power by module)
2Va=    (square root with remainder)
Each number may have suffix R, which provides for substitution
of random number with given sign and number of meaning bits.

One can insert blanks, tabulation and translations of lines
between operands and marks of operations. Number of expressions
is not restricted; all expressions are calculated independently
from each other.

The program recognizes the following options:
-idle   idle priority
-high   high priority
-par n  run up to n concurrent threads (actually 1 or 2)
-hex    hexadecimal input/output
-hexi   hexadecimal input
-hexo   hexadecimal output
-div0   allow division by 0
-rand   generate random numbers instead of computing
    all including degree of power and shift
      0 generate -8,-4,-2,-1,0,1,2,4,8 instead of 0
-exp n  append up to n trailing zeroes to random numbers
-dn  n  add/remove extra sign words to fit code to n words
-size   show high word and size of numbers
-time   show estimation of computing time in Athlon cycles
-mhz n  CPU rate for estimation of computing time in ms
-rep n  repeat operation n times
-v      write expression before the result
-check  check results of operations, if possible
-help   information on the program

Priority and multithreading options are supported under Windows
depending on compiler. If supported, program tells "Idle priority",
"High priority" and/or "Run concurrent thread for output".

After options there may be name of file with expressions.

   NOTE: To start random test, apply the following commands:
   Arifexp -rand >input
   Arifexp input >output

Program returns the following codes:
    0 - success
    1 - detected error in result of calculation (-check option)
  255 - detected error in either options or input data

Files of the program:

   Arif          - short test
   Arif.bat      - start of test
   Arif.res      - sample for comparison with result
   Arif.sh       - start of test under Linux
   Arif.wrk      - result

   Arif1         - short test of multiplication with accumulation
   Arif1.bat     - start of test
   Arif1.res     - sample for comparison with result
   Arif1.sh      - start of test under Linux
   Arif1.wrk     - result

   Arif2         - test of power by module
   Arif2.bat     - start of test
   Arif2.res     - sample for comparison with result
   Arif2.sh      - start of test under Linux
   Arif2.wrk     - result

   Arif3         - test of multiplication
   Arif3.bat     - start of test
   Arif3.res     - sample for comparison with result
   Arif3.sh      - start of test under Linux
   Arif3.wrk     - result

   Arif4         - test of division with the rest
   Arif4.bat     - start of test
   Arif4.res     - sample for comparison with result
   Arif4.sh      - start of test under Linux
   Arif4.wrk     - result

   Arif5         - test of square root with the remainder
   Arif5.bat     - start of test
   Arif5.res     - sample for comparison with result
   Arif5.sh      - start of test under Linux
   Arif5.wrk     - result

   Arifr         - pattern of test
   Arifr.bat     - start of the random generator test
   Arifr.sh      - start of the random generator test under Linux
   Arifr.wrk     - last received result

   Arifrand      - pattern of cyclic test
   Arifrand.1    - test with short random numbers
   Arifrand.2    - test with large random numbers
   Arifrand.b1   - Arifrand.1 for last incorrect result
   Arifrand.b2   - Arifrand.2 for last incorrect result
   Arifrand.bad  - last incorrect result (look for NO MATCH)
   Arifrand.bat  - start of cyclic test under Windows
   Arifrand.res  - accumulated return codes for test cycles
   Arifrand.sh   - start of cyclic test under Linux
   Arifrand.wrk  - last result obtained

   Arifexp       - executable file of the program for Linux
   Arifexp.cpp   - source of the program
   Arifexp.dsp   - project file for Microsoft Visual C++ 6.0
   Arifexp.exe   - executable file of the program for Win32
   Arifexp.prj   - project file for Borland C++ 3.1
   Arifexp64     - executable file of the program for Linux x64
   Arifexp64.exe - executable file of the program for Win64
   Arifexp64x.exe- executable file of the program for Win64 BMI2

   Gettimer.c    - functions for measuring of performance
   Gettimer.h      and setting of priority

   Random3.c     - generator of random numbers
   Random3.h
   Random3.txt   - documentation (eng)
   Random3r.txt  - documentation (rus)

Tests Arif1-Arif5 estimate number of CPU cycles to complete
to test; if you need to get estimation in ms, set environment
variable MHZ by actual value of CPU rate, for example:
   SET MHZ=2000

NOTE: Tests automatically choice 32 or 64 bit programs depending
      on operation system, but do not select automatically
      programs for Win64 BMI2; in order to run the programs
      for Win64 BMI2 you must edit the tests manually.


2.1.2. Program for multiplication of square matrixes Matrix
-----------------------------------------------------------

Program for multiplication of square matrixes is implemented
thereby class cBigNumber and template of dynamic arrays exarray.

The program demonstrates modern technique of programming, at
which there is no necessity for a programmer to manage allocation
of memory, since the dynamic arrays have the unlimited sizes on
all dimensions. On a sight of the author, the idea of a unlimited
array successfully supplements the concept of unlimited numbers.

Source of the program contains macro definitions which allows
at compilation time to refuse from dynamic arrays for the benefit
of ordinary static memory allocation and also to change a format
of numbers.

Files of the program:

   10            - input data for multiplication of matrixes 10*10
   10.bat        - test of multiplication of matrixes 10*10
   10.res        - sample for comparison with result of multiplication
   10.sh         - test of multiplication of matrixes under Linux
   10.wrk        - result of multiplication

   100           - input data for multiplication of matrixes 100*100
   100.bat       - test of multiplication of matrixes 100*100
   100.res       - sample for comparison with result of multiplication
   100.sh        - test of multiplication of matrixes under Linux
   100.wrk       - result of multiplication

   Gettimer.c    - functions for measuring of performance
   Gettimer.h      and installation of a priority

   Matrix        - executable file of the program for Linux
   Matrix.cpp    - source of the program
   Matrix.dsp    - project file for Microsoft Visual C++ 6.0
   Matrix.exe    - executable file of the program for Win32
   Matrix.prj    - project file for Borland C++ 3.1
   Matrix.txt    - instruction on preparation of the input data (eng).
   Matrix64      - executable file of the program for Linux x64
   Matrix64.exe  - executable file of the program for Win64
   Matrix64x.exe - executable file of the program for Win64 BMI2
   Matrixr.txt   - instruction on preparation of the input data (rus).


2.1.3. Program for primality testing Miller
-------------------------------------------

Program Miller asks for a number and estimates its
primality using a technique stated on page
   http://www.utm.edu/research/primes/prove/prove3.html

The program recognizes the following options:
-idle   low priority
-high   high priority
-hex    hexadecimal input/output
-hexi   hexadecimal input
-hexo   hexadecimal output
-factor "brute force" full factor primality test (the slowest)
-proved fast SPRP and full factor primality test (much better)
-miller fast factor and full SPRP primality test (much faster)
-strong fast factor and fast SPRP primality test (the fastest, but probable)
-scan n number of increments by 2
After options there may be name of file with number to check.

If no test is selected program carry out full independent SPRP
and factorization tests. If the programs works correctly results
of these tests should not contradict to each other.

Program returns the following codes:
    0 - composite
    1 - prime by factoring
    2 - prime by fast SPRP
    3 - probable prime by Miller
    7 - probable prime by fast SPRP
   11 - prime by factoring, but has missed SPRP  (error)
   12 - prime by fast SPRP, but has factor       (error)
   13 - probable prime by Miller, but has factor (error)
  255 - detected error in either options or input data
If a row of numbers is tested with option -scan, maximal code
is returned.

Files of the program:

   Gettimer.c    - functions for measuring of performance
   Gettimer.h      and installation of a priority

   Mill.bat      - start of tests for numbers Miller.9 - Miller.1
   Mill.sh       - start of tests for numbers Miller under Linux
   Miller        - executable file of the program for Linux
   Miller.1      - large simple number:
                   tests do not reach the end, but helps
                   to estimate a degree of probability...
   Miller.2-9    - the numbers, which are smaller.
   Miller.cpp    - source of the program
   Miller.dsp    - project file for Microsoft Visual C++ 6.0
   Miller.exe    - executable file of the program for Win32
   Miller.prj    - project file for Borland C++ 3.1
   Miller64      - executable file of the program for Linux x64
   Miller64.exe  - executable file of the program for Win64
   Miller64.exe  - executable file of the program for Win64 BMI2

   Millrand      - pattern of cyclic test
   Millrand.1    - random number
   Millrand.2    - odd random number to test
   Millrand.b1   - Millrand.1 for last incorrect result
   Millrand.b2   - Millrand.2 for last incorrect result
   Millrand.bad  - last incorrect result (look for NO MATCH)
   Millrand.bat  - start of cyclic test under Windows
   Millrand.res  - accumulated return codes for test cycles
   Millrand.sh   - start of cyclic test under Linux
   Millrand.wrk  - last result obtained


3. Instructions
===============

To use class cBigNumber in your program do the following:

1.  Copy to the project folder the following files:
        Cbignum.cpp
        Cbignum.h
        Cbignumf.cpp
        Cbignumf.h
        Cbignumf.inl
        Cbignums.cpp (option for stream input-output)
        Cbignums.h   (option for stream input-output)
        Cbnl.h
        Cbnl.inl
        Cbnl64.inl
        Cios.h       (option for input-output)
        Ctty.cpp     (option for console output)
        Ctty.h
        Exarray.cpp
        Exarray.h
        Exdebug.h
        Exthread.h
        Prime.cpp    (option for check of numbers for primality)
    If add-on package is in use add to the folder the following file:
        Cbnl32.obj   (for 32 bit compiling mode, if the macro _CBNL_ML
                      is set in Cbnl.inl)
        Cbnl64.obj   (for 64 bit compiling mode, if the macro _CBNL_ML
                      is set in Cbnl64.inl)

2.  Include to the project the following files:
        Cbignum.cpp
        Cbignumf.cpp
        Cbignums.cpp (option for stream input-output)
        Ctty.cpp     (option for console output)
        Exarray.cpp
        Prime.cpp    (option for check of numbers for primality)
    If add-on package is in use add to the project the following file:
        Cbnl32.obj   (for 32 bit compiling mode, if the macro _CBNL_ML
                      is set in Cbnl.inl)
        Cbnl64.obj   (for 64 bit compiling mode, if the macro _CBNL_ML
                      is set in Cbnl64.inl)

    NOTE: Optional files for stream and console input-output use
          C++ stream library which actually exists in two variants:
          old (iostream.h) and standard (iostream). Old compilers
          may not include standard library, whereas new compilers
          produce obsolete warnings for old library or ever do not
          include it at all, starting from Microsoft Visual C++ 2005.
          If Microsoft compiler supports both libraries, you must
          select one of them for the entire project, because this
          libraries can not be mixed.

          The class uses standard library for the following compilers:
          - Microsoft Visual C++ .NET, 2003-2022 and higher,
          - GNU C++ 3.x and higher compatible,
          - Conformed to C++ 11 standard,
          otherwise it uses old library. To change library, add to
          the compiler options appropriate macro:
            _CIOS_STDSTREAM (use standard library)
            _CIOS_OLDSTREAM (use old library)

3.  Insert into the source text the following directives:
        #include "Cbignum.h"
        #include "Cbignums.h" (option for stream input-output)

When compiling in the release mode for maximal speed set macro NDEBUG
to turn off debug asserts and check of indexes. For more details ref.
Sections 4.6, 5.1. Macro NDEBUG is set by default at compilation under
IDE Visual C++ in Release mode. On command line compilation macro can
be set in options.

For example, to compile program Arifexp (Section 2.1.1) under Linux
you can use the following command lines:

Debug mode:
   g++ Arifexp.cpp Cbignum.cpp Cbignumf.cpp Cbignums.cpp
       Ctty.cpp Exarray.cpp Gettimer.c Random3.c -o Arifexp

Release mode:
   g++ -O5 -DNDEBUG Arifexp.cpp Cbignum.cpp Cbignumf.cpp Cbignums.cpp
       Ctty.cpp Exarray.cpp Gettimer.c Random3.c -o Arifexp

NOTE: Current version of class is adapted for compiling under
      Borland C++, Microsoft Visual C++ and GNU C++ compatible
      (version 3.3 or higher for multithreaded applications).
      If you use another compiler, for better performance it is
      recommended to define macro EXTHREAD_LOCAL as described below.

      For compiling of single-treaded application macro EXTHREAD_LOCAL
      should contain either no value or compiler-dependant prefix of
      thread local storage (commonly __thread).

      For compiling of multithreaded application macro EXTHREAD_LOCAL
      should contain compiler-dependant prefix of thread local storage
      (commonly __thread). Look for compiler documentation if it
      supports thread local storage. If your compiler does not support
      thread local storage, you MUST NOT define macro EXTHREAD_LOCAL
      to compile multithreaded application in compatibility mode.

      Note that program compiled in compatibility mode without macro
      EXTHREAD_LOCAL may work significantly slower on small numbers,
      for example Matrix in Section 2.1.2. Look also Section 4.2.4.


3.1. Using of unlimited numbers
-------------------------------

1.  Unlimited number is declared as:
        cBigNumber num

2.  Initial values to cBigNumber variables can be set as expressions
    and also as either numeric or string assigning constructors.

    Numeric assigning constructor is limited to signed long integer
    range, for example:
        cBigNumber num = 2147483647;
    For 64 bit compilers it is also possible assign constants i64
    if they are supported:
        cBigNumber num = 2147483648i64;
    For assignment of i64 type CBNL must be defined in the file
    Cbnl.h as 64 bit type - in the original file this is true for
    all compilers with 64 or more bit long type, for C++ 11
    compilers with 64 or more bit long long type and additionally
    for Visual C++ in 64 bit mode.

    As against, string assigning constructor has no limiting
    range. It requires to indicate radix in range 2..16 or 0;
    the latter means either decimal, hexadecimal or octal
    constant formatted by rules of C, for example:
        cBigNumber num ("0x80000000", 0):

    Expressions and constructors are suitable for definition
    of unlimited constants,for example:
        const cBigNumber big_const ("2147483648", 10);

3.  The class provides for all regular operations of language C++,
    including arithmetic, logic and bitwise operations, operations
    of comparison, shift operations, and also stream input-output
    with all of integer modifiers. The class supports combined
    operations on unlimited and signed integer numbers. Therefore
    rules for processing of unlimited numbers basically coincide
    with rules of built-in signed integer arithmetic.

* Ref. Section 5.2 for essential peculiarities, including operations
  with unsigned and floating-point numbers.


3.2. Additional operations, functions and methods
-------------------------------------------------

Operations with result in stack.
   cBigNumber (s,radix) Conversion of char string s to unlimited number
                        for radix 2..16. Radix 0 means either decimal,
                        hexadecimal or octal constant formatted by rules
                        of C, that us hexadecimal after prefix 0x/0X,
                        octal after prefix 0 and decimal otherwise.
   cBigAbs    (a)       Absolute value of a.
   cBigUnsign (a)       Unsigned value of a (sign bits become meaning bits).
   cBigPow    (a,b)     Power a to b.
   cBigPowMod (a,b,mod) Power a to b by module mod.
   cBigSqrt   (a)       Integer part of square root of a.
   cBigBits   (a)       Number of meaning bits in binary complement code of a.
        Meaning bits are most senior bit, which is distinct from sign bit
        and all younger bits. For example, 0 and -1 contain no meaning bits,
        1 and -2 contain 1 meaning bit, 127 and -128 contain 7 meaning bits.
        Absolute value of number containing n meaning bits is not greater
        than 2 in power n.
   cBigExBits (a)        Number of meaning low 0-bits in binary code of a.
        Low 0-bits are all junior meaning zero bits.
        For example, 0 and odd numbers contain no low 0-bits,
        2 and -2 contain 1 low 0-bit, 128 and -128 contain 7 low 0-bits.
        Absolute value of number containing n low 0-bits is not less
        than 2 in power n.
   cBigRandom (rand,n)  Random number of no more than n meaning bits.
        Result is obtained thereby use of external random generator rand
        getting no parameters and returning unsigned long integers with
        uniform distribution within the range 0..ULONG_MAX. Please,
        initialize this random generator properly before using.
        Ref. file Random3.txt for an example of generator.

Operations with result in stack and remainder in 1st argument.
   cBigDivMod (a,b)     Division a/b with module stored in a.
   cBigSqrtRm (a)       Square root of a with remainder stored in a.

Console streams and methods.
   cBigNumberMessages   Stream for console messages.
   cBigNumberProgress   Stream for console progress indicators.
        Here streams are objects of class cTTY used by various
        methods and functions to provide for their output. Stream
        accept insertion operator << for chars, strings and signed
        numbers, including unlimited numbers cBigNumber. Additional
        methods for format output are described in file Ctty.txt.

   ATTENTION:   By default, console output is turned off. To view
                it, include to project file Ctty.cpp and assign
                to streams cTTY handlers:
                        cBigNumberMessages = cTTY_StandardOutput;
                        cBigNumberProgress = cTTY_ProgressOutput;

   NOTE: As these streams are global, assign them handlers
         in main thread of program only.

   a.dump     ()        Full dump of hexadecimal internal code
                        in order from high to low words.
   a.info     ()        Short dump of hexadecimal internal code
                        containing the high word followed by
                        total number of words.
   a.erange   ()        Show message "cBigNumber value out of range"
                        and short dump then call abort().

   NOTE: These methods yields no output if handler
         cBigNumberMessages is not set.

Machine-dependent access to number.
   a.length   ()        Number of CBNL words in the internal code.
   a.code     ()        Array of CBNL words containing the internal
                        code, most young word have index 0.
        Method code() returns const pointer, which is valid until no
        arithmetic or memory allocation operations affect the number.
        Convert it to CBNL* if you need for write access to the code.

        To change length of number, write it to element with index -1.
        Number of words can be decreased down to 1 or increased within
        memory allocated for a number (ref. Section 3.3). For better
        performance the number should be normalized, that is it must
        contain minimal number of words. The most convenient way to
        normalize number is to call method fit(). Ref. Section 5.2
        for an example of code.

        NOTE: Number of value 0 may contain either 1 word of value 0
              or 0 words.

   a.fit      ()        Normalization.
        Normalization consists in deleting of redundant senior sign
        words. It is not required for numbers generated exclusively
        by methods of class, because they are normalized.
        Non normalized numbers can be correctly processed by
        all methods of class, but normalization or result thus
        obtained generally is not guaranteed.

        NOTE: 0 has two normalized forms - standard of 1 word
              and compact of 0 words (compact form allows do
              no allocate dynamic memory for 0 numbers),
              Method fit() converts compact 0 to standard 0,

   a.loword   ()        Most young CBNL word of the internal code.
   a.hiword   ()        Most senior CBNL word of the internal code.
        Both methods work for number 0 containing 0 words returning 0.
        Method hiword() returns senior word of code taking no account
        for normalization of number, that is if the internal code
        contains redundant high either 0 or ~0 words, the method
        returns accordingly either 0 or ~0.
   a.words    ()        Number of meaning words in the internal code.
        Meaning words are most senior word, which does not contain
        only sign and sign extension bits and all younger words.
   a.exwords  ()        Number of meaning low 0-words in the internal code.
        Low 0-words are all junior meaning zero words.
        If all words are 0, there are no low 0-words.

Information on number with check of range.
   a.bits     ()        Number of meaning bits in binary complement code.
   a.exbits   ()        Number of meaning low 0-bits in binary code.
        Methods returns CBNL number, if number of bits > CBNL_MAX
        diagnostic is typed and abort() is called.

Transformation of number with check of range.
   a.toCBNL   ()        Transformation to CBNL integer.
   a.tolong   ()        Transformation to long integer.
   a.toint    ()        Transformation to integer.
   a.toshort  ()        Transformation to short integer.
        If number does no fit diagnostic is typed and abort() is called.

Transformation of number to string.
   a.toa (str)          Formatting of decimal representation of number a.
   a.toa (str,radix)    Formatting of char representation of number a
                        for radix 2..16.
   Here str is object of type cBigString that can be automatically
   converted to const char*, for example:
        cBigString str;
        a.toa (str);
        cout << str;
   Method toa() returns char* to content of cBigString that is consistent
   until cBigString is not changed, for example:
        cBigString str;
        cout << a.toa (str);

Excluded non-reenterable method (do not use).
   a.toatmp   ()        Formatting of decimal representation of number a
                        in the static buffer (*).
   a.toatmp   (radix)   Formatting of char representation of number a
                        in the static buffer (*) for radix 2..16.
(*) The content of static buffer is replaced by each call of toatmp().

   NOTE: This method are not available if macro _CBIGNUM_MT is set in
         file Cbignum.h and macro _CBIGNUM_TOATMP is not set that is
         the case by default beginning from version 1.2c. Use
         reenterable method toa() instead.


3.3. Function and methods for manual optimization of programs
-------------------------------------------------------------

Comparison.
   a.comp     (b)       0 if a == b; -1 if a < b; 1 if a > b.
   a.comp     ()        0 if a == 0; -1 if a < 0; 1 if a > 0.

Operations with accumulation of result.
   a.neg      ()        Inversion of sign.
   a.abs      ()        Absolute value.
   a.unsign   ()        Unsigned value.
   a.add      (b)       Fast addition    (*).
   a.sub      (b)       Fast subtraction (*).
   a.mul2     ()        Multiplication by 2.
   a.div2     ()        Division by 2 (integer part).
   a.pow2     ()        Square.
   a.pow      (b)       Power.
   a.powmod   (b,mod)   Power by module mod.
   a.sqrt     ()        Integer part of square root.
(*) Optimized for operand containing 3 or more words and size of
    accumulator not less than size of operand. Same as += and -=.

Combined operations with accumulation of result.
   c.addmul   (a,b)     Multiplication with accumulation c += a * b.
   c.submul   (a,b)     Multiplication with accumulation c -= a * b.

Operations with assigning of result to variable c.
   c.set      (a)       Copying.
   c.set      (s)       Conversion of C-formatted decimal string s.
   c.set      (s,radix) Conversion of char string s to unlimited number
                        for radix 2..16. Radix 0 means either decimal,
                        hexadecimal or octal constant formatted by rules
                        of C, that us hexadecimal after prefix 0x/0X,
                        octal after prefix 0 and decimal otherwise.
   c.setneg   (a)       Inversion of sign.
   c.setabs   (a)       Absolute value.
   c.setunsign(a)       Unsigned value.
   c.setcompl (a)       Bit-by-bit inversion.
   c.setxor   (a,b)     Bit-by-bit exclusive or.
   c.setand   (a,b)     Bit-by-bit conjunction.
   c.setor    (a,b)     Bit-by-bit disjunction.
   c.setadd   (a,b)     Addition.
   c.setsub   (a,b)     Subtraction.
   c.setmul   (a,b)     Multiplication.
   c.setdiv   (a,b)     Division (integer part).
   c.setmod   (a,b)     Module (remainder of division).
   c.setshl   (a,b)     Left shift.
   c.setshr   (a,b)     Right shift.
   c.setpow   (a,b)     Power.
   c.setpowmod(a,b,mod) Power by module mod.
   c.setsqrt  (a)       Integer part of square root.
   c.setbits  (a)       Number of meaning bits in binary complement code.
   c.setexbits(a)       Number of meaning low 0-bits bits in binary code.
   c.setrandom(rand,n)  Random number of no more than n meaning bits.

Combined operations storing the result to variables c and a.
   c.setdivmod(a,b)     Division with module stored in a.
   c.setsqrtrm(a)       Square root with remainder stored in a.

   NOTE: Variables a and c must not overlap.

Combined operations on preliminary prepared operands.
   a.tab      ()        Normalization and preparing of table of shifts.
   a.smp      ()        Normalization and preparing of table of shifts,
                        if hardware multiplication is not in effect.

   NOTE: Table of shifts does not prevent from conventional use of
         the number. Table of shifts remains be consistent until
         the number accepts result of some operation.

   c.addmultab    (a,b) Multiplication with accumulation c += a * b.
   c.submultab    (a,b) Multiplication with accumulation c -= a * b.
                        Multiplicand a must contain table of shifts.
                        Multiplier b must be non-negative.
                        Operands must not overlap with buffer of result c.

   NOTE: Methods addmultab() and submultab() do not use block
         multiplication and fast hardware multiplication. These methods
         are outdated, it is recommended to replace them by methods
         addmulsmp() and submulsmp().

   c.addmulsmp    (a,b) Multiplication with accumulation c += a * b.
   c.submulsmp    (a,b) Multiplication with accumulation c -= a * b.
                        Multiplicand a must contain table of shifts
                        that is be prepared by either smp() or tab().
                        Multiplier b must be non-negative.
                        Operands must not overlap with buffer of result c.

   NOTE: Methods addmulsmp() and submulsmp() use extra fast hardware
         multiplication if it is in effect and table of shifts if hardware
         multiplication is not available. In both cases these methods do
         not use block multiplication targeted to large numbers and so
         have less overhead for small numbers containing no more than
         3000-6000 bits. Providing that at least one operand is small,
         they can be several percents faster than addmul() and submul().

   a.divtab         (b) Division a /= b (integer part).
   a.modtab         (b) Module   a %= b.
                        Operands must have identical signs.
                        Divider b must contain table of shifts and
                        must not overlap with dividend a.

   c.setdivtab    (a,b) Division c = a / b (integer part).
   c.setmodtab    (a,b) Module   c = a % b.
                        Operands must have identical signs.
                        Divider b must contain table of shifts and
                        must not overlap with buffer of result c.

   c.setdivmodtab (a,b) Division with module c = a / b, a %= b.
                        Operands must have identical signs, must not
                        overlap with each other and buffer of result c.
                        Divider b must contain table of shifts.

   NOTE: These division methods have less overhead than general methods
         setdiv(), setmod() and setdivmod() if divider contains more
         than _CBIGNUM_SMALL_DIV words, but for smaller dividers they
         are much less effective.

Machine-dependent operations (depends on size of word).
   c.set       (a,n)    Copying c = a shifted left by n words.
   c.setr      (a,n)    Copying c = a shifted right by n words.
   c.add       (a,n)    Fast addition    c += a shifted left by n words.
   c.sub       (a,n)    Fast subtraction c -= a shifted left by n words.

   c.addmultab (a,b,n)  Multiplication c += a * b shifted left by n words.
   c.submultab (a,b,n)  Multiplication c -= a * b shifted left by n words.
                        Multiplicand a must contain table of shifts.
                        Multiplier b must be non-negative.
                        Operands must not overlap with buffer of result c.

   NOTE: Ref. details above in description of combined operations.

   c.addmulsmp (a,b,n)  Multiplication c += a * b shifted left by n words.
   c.submulsmp (a,b,n)  Multiplication c -= a * b shifted left by n words.
                        Multiplicand a must contain table of shifts
                        that is be prepared by either smp() or tab().
                        Multiplier b must be non-negative.
                        Operands must not overlap with buffer of result c.

   NOTE: Ref. details above in description of combined operations.

   a.divtab       (b,n) Division a /= b after b shifted left by n words.
   a.modtab       (b,n) Module   b %= b after b shifted left by n words.
                        Operands must have identical signs.
                        Divider b must contain table of shifts and
                        must not overlap with dividend a.

   c.setdivtab  (a,b,n) Division c = a / b after b shifted left by n words.
   c.setmodtab  (a,b,n) Module   c = a % b after b shifted left by n words.
                        Operands must have identical signs.
                        Divider b must contain table of shifts and
                        must not overlap with buffer of result c.

   c.setdivmodtab       Division with module c = a / b, a %= b
                (a,b,n) after b shifted left by n words.
                        Operands must have identical signs, must not
                        overlap with each other and buffer of result c.
                        Divider b must contain table of shifts.

   NOTE: Ref. details above in description of combined operations.

Memory allocation.
   a.expand    (n)      Increase number of available words
                        to some optimal value the range n..2n.

   NOTE: This operation does not change number of words in the
         internal code, it only changes number of allocated words.
         The operation does not decreases number of allocated words
         if it is greater than n, it only can increase it.

Optimization of use of memory.
   a.gc        ()       Release superfluous memory.
   a.pack      ()       Release all memory over minimally necessary.
   a.clear     ()       Set number to 0 and release memory.

   NOTE: These operations delete the table of shifts, for other
         technical details ref. Section 4.3.

Control of division.
   cBigNumber::maskdiv0 (0)     Forbid division by 0 (default).
   cBigNumber::maskdiv0 (1)     Permit division by 0 assuming that
                                quotient = 0 and module = dividend.
                                Also permit power by module 0, which
                                will return 1.
   cBigNumber::testdiv0 ()      Have been occurred division by 0?

   NOTE: As this methods are static, use them in main thread of program.

Excluded non-reenterable methods (do not use).
   cBigNumber::lastdivmod ()    Last module generated by methods
                                /, /=, setdiv, setdivtab.
   cBigNumber::lastrootrm ()    Last remainder generated by methods
                                cBigSqrt, sqrt, setsqrt.

   NOTE: These static methods are not available if macro _CBIGNUM_MT
         is set in file Cbignum.h that is the case by default beginning
         from version 1.2c. Use reenterable operations cBigDivMod() and
         cBigSqrtRm() or methods setdivmod() and setsqrtrm() instead.


3.4. Testing numbers for primality and factoring
------------------------------------------------

Basic test.

   b_SPRP         (a,b) Strong Probable Primality Test for base b.
                        Result: 0 = composite,
                                2 = strong probable prime for base b.

   NOTE: Strong probable primes for base 2 are also jokingly called
         "industrial grade primes" because the are approximately
         99.9999% prime. This is a joke, no more. It is recommended
         to combine SPRP tests with Small Factor Primality Test.

Simple tests (prime.cpp).

   SPRP           (a)   Strong Probable Primality Test.
   SPRP           (a,b)         b = either missed or first untested SPRP base.
                        Result: 0 = composite (missed SPRP base found),
                                2 = prime by fast SPRP for initial bases,
                                3 = probable prime by Miller, i.e. prime if
                                    Riemann generalized gypothesis is true.

   FastSPRP       (a)   Fast Strong Probable Primality Test for initial bases.
   FastSPRP       (a,b)         b = either missed or first untested SPRP base.
                        Result: 0 = composite (missed SPRP base found),
                                2 = prime by fast SPRP for initial bases,
                                7 = probable prime, but may be not prime.

   LastSPRP       (a)   Miller Strong Probable Primality Test for upper bases.
   LastSPRP       (a,b)         b = either missed or first untested SPRP base.
                        Result: 0 = composite (missed SPRP base found),
                                3 = probable prime by Miller, i.e. prime if
                                    Riemann generalized gypothesis is true,
                                    providing that FastSPRP() is also passed.

   NOTE: Current implementation prove prime for a < 341,550,071,728,321
         thereby test SPRP 2..17 and probable prime for greater a
         thereby test SPRP 19..2297 and more depending of value of a.
         Bases up to 1,373,639 are prime, larger are 2-3-SPRP prime.

   TestFactor     (a)   Test square, then other possible factors.
   TestFactor     (a,b)         b = either found or first untested factor.
                        Result: 0 = composite (factor found),
                                1 = prime.

   TestSmallFactor(a)   Test square, then small prime factors.
   TestSmallFactor(a,b)         b = either found or first untested factor.
                        Result: 0 = composite (factor found),
                                1 = prime,
                               10 = no small factor found.

   TestLargeFactor(a)   Test square, then large factors from Large table.
   TestLargeFactor(a,b)         b = either found or first untested factor.
                        Result: 0 = composite (factor found),
                                1 = no large factor found.

   NOTE: Current implementation use table of small prime factors
         in the range 2..7919 for a <= 62,837,329, and wheel
         factorization table for module 210 for greater a.
         Square factor and factors above 7927 may not be prime.

Combined tests, in order from fastest to slowest.

   IsStrongPrime  (a)   Fast factor and fast SPRP primality test.
                        Result: 0 = composite,
                                1 = prime by factoring,
                                2 = prime by fast SPRP for initial bases,
                                7 = probable prime, but may be not prime,

   IsMillerPrime  (a)   Fast factor and full SPRP primality test.
                        Result: 0 = composite,
                                1 = prime by factoring,
                                2 = prime by fast SPRP for initial bases,
                                3 = probable prime by Miller, i.e. prime if
                                    Riemann generalized gypothesis is true,

   IsProvedPrime  (a)   Fast SPRP and full factor primality test.
                        Result: 0 = composite,
                                1 = prime by factoring,
                                2 = prime by fast SPRP for initial bases.

   IsPrime        (a)   Full SPRP and factor double-checked primality test.
                        Result: 0 = composite (factor found),
                                1 = prime by both SPRP and factoring,
                         error 11 = prime by factoring, but has missed SPRP,
                         error 12 = prime by fast SPRP, but has factor,
                         error 13 = probable prime by Miller, but has factor.


4. Guidelines
=============

The initial purpose of the class and explanatory slip are cited
in the Appendix 1.


4.1. Possible areas of application
----------------------------------

The class was developed under a principles of maximal simplification
of its external interface according to the standard of C++ and also
maintenance of compatibility with various hardware and software
platforms. The class is characterized by a low overhead of internal
memory allocation routines and call of computing methods, and also
absence of any restrictions related with size of numbers.

The class can be recommended if determining criterion consists in
minimization of time for development of a program. Performance of
class methods and consumption of memory will be close to optimum
for large numbers containing from 500 to 100,000 bits and machine
32/64 bit numbers.


4.2. Performance
----------------

The best results on performance are achieved on 64 bit compilers
under 64 bit version of Windows. 64 bit code works up to 5-10
times faster than 32 bit code.

To obtain best performance for 32/64 bit compilers under Windows,
use assembler optimization for x386/x64, which is available as
separate package.

Assembler code for Visual C++ uses loop unrolling for additive
operations and multiplication, hardware multiplication to double
word and hardware division of double word. It is especially
effective for Intel Core and AMD processors, where it yields
multiple growth of performance on most operations on unlimited
numbers: accumulation +=, -=, <<=, >>=, multiplication, division,
module, power, power by module and square root. Effect of
optimization is great if either addend, subtracter, one of
multipliers, divider, result of power or base of square root
consist of 3 or more machine words. Optimization effect for
outdated Intel Pentium 4 is relatively low.

In the following tables assembler optimized programs Arifexp
and Arifexp64, included into this distribution are compared
with:
- C++ portable versions of Arifexp, which do not use
  assembler optimization;
- Special version of Arifexp compiled with NTL library,
  which uses 53 bit IEEE 754 floating point computations
  (http://www.shoup.net/ntl, version 5.4);
- Results of 64 bit program Arifexp64 are also compared with
  results of the fastest program Arifexp64x with MULX compiled
  for processors supporting AVX2 instructions - this results
  contain in columns asm64 and a64 after / when different.

All programs of version 2.2 are compiled under Microsoft
Visual C++ 2015 Community for generic x86 and x64 processors
and operation systems of generation Windows XP and after,
look for theirs results in columns hardmul + div of tests
for division and power by module.

Other results were obtained for the previous versions of
the class:

32 bit assembler code is compiled under Microsoft Visual C++
6.0 for version 1.2b of the class, which performance on
multiplication and square root is similar to 32 bit
performance of subsequent versions of the class.

32 bit C++ code is compiled under Microsoft Visual C++ 6.0
for version 1.2b of the class without hardware multiplication
(softmul) and on Microsoft Visual C++ 2010 Express for
version 2.1c, which performance is greatly improved due
to use of hardware multiplication (hardmul).

64 bit code, both assembler and C++ is compiled under Microsoft
Visual C++ 2010 Express with SDK 7.1 x64. For assembler code we
use version 2.0 of the class and for C++ code - version 2.0
without hardware multiplication (softmul) and version 2.1
with fast hardware multiplication (hardmul).

NOTE: Unlike test programs for old version of the class, all
      programs for Windows in the distribution of the class
      are compiled under Microsoft Visual C++ 2015 with
      assembler optimization. Using of more recent compiler
      do not affect significantly results of tests Arif1-5
      until the new ones in columns hardmul + div.

                    Time of computing in milliseconds
                                                          v2.1c
                    Test: Arif1 (+*)         softmul    hardmul
CPU                 MHz asm32 asm64   NTL   C32   C64  C32  C64
---------------------------------------------------------------
ARM S3C2440A        400                     130      |
ARM920T PXA312      624                      80      |
Atom N270          1600     2           6    22      |   8
Pentium III/933     933     3          12    30      |   9
Pentium 4C/2400    2400     3           4    16      |   5
Athlon 900          900     2           8    28      |   7
Athlon XP 2500+    1826     1           4    14      |
Athlon 64 X2 3600+ 1900     1     0     4    11     8|   3    1
Athlon 64 X2 3800+ 2000     1     0     3    10     8|   3    1
Athlon 64 X2 4600+ 2400     0           3     9      |
Phenom II X3 710   2600     0     0     2     8     6|
Phenom II X6 1055T 2800     0     0     2     7     5|   1    0
Phenom II X6 1090T 3200     0     0     2     6     5|   2    0
FX-8150            3600     1     0     1     8     5|   2    1
FX-8320            3500     0     0     1     8     5|   2    0
Core Duo T2500     2000     2           4    13      |
Core 2 Duo E6420   2130     1           4    11      |
Core 2 Quad Q8200  2330     1     0     4    10     6|   3    1
Core i7-950        3200     1     0     1     6     4|   2    0
Core i7-6800K      3600     0     0     1     5     3|   1    0
Xeon E3-1230       3200     1     0     1     7     6|
Xeon E3-1240v3     3400     0     0     1     5     3|   1    0
Athlon 200GE       3200     0     0     1     6     4|   1    0
Ryzen 5 2600       3400     0     0     1     6     4|   1    0
                    Accuracy of measurement ~1 ms
                                                          v2.1c        v2.2
                    Test: Arif2 (powmod)     softmul    hardmul  hardmul + div
CPU                 MHz asm32 asm64   NTL   C32   C64  C32  C64 C32 C64 a32 a64
--------------------------------------------------------------------------------
ARM S3C2440A        400                   62000      |         |
ARM920T PXA312      624                   40000      |         |
Atom N270          1600  2235        1859  5969      |5516     |
Pentium III/933     933  2906        3494  8578      |6750     |
Pentium 4C/2400    2400  2343         906  4297      |3922     |
Athlon 900          900  1892        2123  8051      |5477     |
Athlon XP 2500+    1826   906        1047  4094      |         |
Athlon 64 X2 3600+ 1900   890   562  1031  3406  1888|2512 1373|
Athlon 64 X2 3800+ 2000   813   500   969  3172  1797|2344 1297|
Athlon 64 X2 4600+ 2400   672         828  2640      |         |
Phenom II X3 710   2600   603   344   834  2380  1188|         |
Phenom II X6 1055T 2800   563   313   703  2203  1094|1610  750|409 108 108  29
Phenom II X6 1090T 3200   484   281   625  1938   953|1406  656|358  95  95  25
FX-8150            3600   477   290   463  2289  1100|1575  767|420 106  95  30
FX-8320            3500   453   266   287  2350  1047|1578  719|419  94  91  25
Core Duo T2500     2000  1437         844  3765      |         |
Core 2 Duo E6420   2130  1234         735  2937      |         |
Core 2 Quad Q8200  2330  1125   578   656  2688  1329|1891  922|515 142 358 109
Core i7-950        3200   848   443   307  1686   844|1210  581|346  95 293  84
Core i7-6800K      3600   285   177   289  1342   724|1026  499|
Xeon E3-1230       3200   561   374   453  1747   873|         |
Xeon E3-1240v3     3400   297   172   286  1281   703|1000  485|281  72 104  29
Athlon 200GE       3200   360   188   250  1516   829|1188  563|            /22
Ryzen 5 2600       3400   343   172   187  1422   797|1109  547|310  80  85  23
                      Accuracy of measurement ~10 ms             ~1 ms      /12

NOTE: Program Arifexp64x enhances result of test Arif2 in column a64
      ~33% faster for Intel Haswell (Core i7-6800K and Xeon E3-1240v3)
      ~85% faster for AMD Zen (Athlon 200GE) and Zen+ (Ryzen 5 2600)

                    Test: Arif3 (*)          softmul    hardmul
CPU                 MHz asm32 asm64   NTL   C32   C64  C32  C64
---------------------------------------------------------------
ARM S3C2440A        400                    2200      |
ARM920T PXA312      624                    1400      |
Atom N270          1600    33         109   366      | 130
Pentium III/933     933    55         214   494      | 161
Pentium 4C/2400    2400    45          64   245      |  77
Athlon 900          900    29         135   469      | 125
Athlon XP 2500+    1826    14          66   230      |
Athlon 64 X2 3600+ 1900    12     3    61   189   133|  56   14
Athlon 64 X2 3800+ 2000    11     3    56   175   127|  53   14
Athlon 64 X2 4600+ 2400     9          48   147      |
Phenom II X3 710   2600     8     2    44   133    95|
Phenom II X6 1055T 2800     7     2    42   122    89|  38    9
Phenom II X6 1090T 3200     6     2    36   106    78|  33    8
FX-8150            3600     7     2    27   137    89|  37    9
FX-8320            3500     6     2    23   131    81|  31    8
Core Duo T2500     2000    28          70   217      |
Core 2 Duo E6420   2130    23          70   175      |
Core 2 Quad Q8200  2330    21     8    64   161   105|  42   11
Core i7-950        3200    17     6    21   104    67|  27    7
Core i7-6800K      3600     6    2/1   15    87    57|  19    5
Xeon E3-1230       3200    10     4    23   103    81|
Xeon E3-1240v3     3400     6    2/1   16    88    55|  19    5
Athlon 200GE       3200     5    1/1   19   105    67|  24    6
Ryzen 5 2600       3400     5    1/1   13    98    63|  22    5
                    Accuracy of measurement ~1 ms

NOTE: Program Arifexp64x enhances result of test Arif3 in column asm64:
      ~25% faster for Intel Haswell (Core i7-6800K and Xeon E3-1240v3)
      ~40% faster for AMD Zen (Athlon 200GE) and Zen+ (Ryzen 5 2600)
                                                                        v2.2
                    Test: Arif4 (/)                               hardmul + div
CPU                 MHz asm32 asm64   NTL   C32   C64            C32 C64 a32 a64
--------------------------------------------------------------------------------
ARM S3C2440A        400                   17000                 |
ARM920T PXA312      624                    9000                 |
Atom N270          1600   570         561  1242                 |
Pentium III/933     933   970        1056  2005                 |
Pentium 4C/2400    2400   611         278   981                 |
Athlon 900          900   543         705  1729                 |
Athlon XP 2500+    1826   263         345   900                 |
Athlon 64 X2 3600+ 1900   264   153   305   758   374           |
Athlon 64 X2 3800+ 2000   236   141   298   697   358           |
Athlon 64 X2 4600+ 2400   195         256   581                 |
Phenom II X3 710   2600   161    91   233   479   211           |
Phenom II X6 1055T 2800   149    84   216   442   197           | 89  19  24   5
Phenom II X6 1090T 3200   128    74   189   386   170           | 78  16  21   4
FX-8150            3600   124    79   142   479   206           | 90  22  22   6
FX-8320            3500   122    75    83   478   203           | 90  20  21   6
Core Duo T2500     2000   359         241   786                 |
Core 2 Duo E6420   2130   313         205   606                 |
Core 2 Quad Q8200  2330   286   149   186   553   242           |114  27  79  20
Core i7-950        3200   218   121    89   347   155           | 76  17  64  16
Core i7-6800K      3600    70    48    88   262   126           |
Xeon E3-1230       3200   111    78    97   323   150           |
Xeon E3-1240v3     3400    72    45    88   252   128           | 63  14  23 6/4
Athlon 200GE       3200    91    47    55   296   152           |
Ryzen 5 2600       3400    86    44    52   277   142           | 69  15  19 4/2
                    Accuracy of measurement ~10 ms                ~1 ms

NOTE: Program Arifexp64x enhances result of test Arif4 in column a64:
      ~25% faster for Intel Haswell (Core i7-6800K and Xeon E3-1240v3)
       ~2x faster for AMD Zen (Athlon 200GE) and Zen+ (Ryzen 5 2600)

                    Test: Arif5 (sqrt)
CPU                 MHz asm32 asm64   NTL   C32   C64
-----------------------------------------------------
ARM S3C2440A        400                   32000
ARM920T PXA312      624                   12000
Atom N270          1600  2922       23562  3125
Pentium III/933     933  2360       44453  4453
Pentium 4C/2400    2400  1531       11312  1891
Athlon 900          900  1312       27380  4376
Athlon XP 2500+    1826   656       13391  2141
Athlon 64 X2 3600+ 1900   484   297 13088  1594   780
Athlon 64 X2 3800+ 2000   453   234 12046  1469   750
Athlon 64 X2 4600+ 2400   391       10031  1234
Phenom II X3 710   2600   364   203 10278  1140   563
Phenom II X6 1055T 2800   344   172  9031  1063   531
Phenom II X6 1090T 3200   297   156  7875   921   453
FX-8150            3600   396   321  5898  1094   512
FX-8320            3500   344   219  3474  1002   515
Core Duo T2500     2000  1062        9826  1844
Core 2 Duo E6420   2130  1141        8548  1422
Core 2 Quad Q8200  2330  1047   469  7767  1297   594
Core i7-950        3200   745   390  3658   960   408
Core i7-6800K      3600   331   162  3610   624   295
Xeon E3-1230       3200   530   328  3978   749   406
Xeon E3-1240v3     3400   328   156  3656   562   281
Athlon 200GE       3200   266   131  2283   656   344
Ryzen 5 2600       3400   250   125  2156   609   313
                    Accuracy of measurement ~10 ms

Test programs for Windows are compiled in Release mode using
"Maximize Speed" optimization settings. NTL library is compiled
similarly with macro NTL_STD_CXX disabled. Test code is
single-threaded, so it does not use effect of multiple cores.

Testing on processors Intel Xeon, AMD Phenom, FX and Ryzen has
been performed with disabled technologies Intel Turbo Boost,
AMD Turbo Core and Core Performance Boost, that is, without
increasing of CPU frequency in single-threading mode. Energy
saving mode of operation system has been set to Max Performance.

32 bit ARM code is compiled under Pocket GCC 3.3.3 -O5 -DNDEBUG.

The test programs are available from:
http://www.imach.uran.ru/cbignum/arifrun.htm

Programs for Linux have not been included in this test, because
their results must be close to results of test programs for
Windows with portable C++ code. Assembler add-on package has
limited support for Linux compilers because of differences in
assembler support. By now, it implements 32/64/128 bit hardware
MUL and DIV instructions.

4.2.1. Optimal length of number

In 32 bit mode all operations except square root can will be
carried out for numbers containing up to 12,800,000,000 bits if
enough amount of RAM memory is available for storing of input,
output and (for some operations) internal numbers. Limit for
square root is CBNL_MAX = 2,147,483,647 bits in 32 bit mode.
Also, degree of shift must fit into the range -CBNL_MAX..CBNL_MAX.

The operations of addition, subtraction, multiplication, power
in a power, shift, input-output, and also bit-by-bit and logic
operations are optimized for operands of unlimited size. The
overhead charge for multiplication is minimized for a case,
when size even of one of operands is not less than 500 bits.
For optimal performance of other operations size of result
must be greater than 200-500 bits.

The operations of division, module and power by module are
optimized for single-word, double-word and multi-word dividers
of size from ~500 up to ~16,000,000 bits.

For new algorithm of division the best parameters are provided
if size of divider is no more than half of processor cache,
that is ~2,000,000 bits for cache of 512 Kbytes.
For processors with larger cache optimal size of divider will
be proportionally larger (i.e. for Intel Xeon with
8 Mbytes cache the limit will be ~32,000,000 bits).

For old algorithm of division:

In 32 bit mode the best parameters are provided if size of divider
is no more than 1/32 of size of on-chip processor cache, that is
is ~120,000 bits for cache of 512 Kbytes.

In 64 bit mode the best parameters are provided if size of divider
is no more than 1/64 of processor cache, that is ~60,000 bits for
cache of 512 Kbytes.

The operation of square root is optimized for operands of size
no greater than twice of on-chip processor cache.


4.2.2. Estimation of performance
--------------------------------

Performance estimation is given for case when x386/x64 assembler
optimization is in use under processors AMD Phenom/Athlon/Sempron,
which in this task provides for better performance, than old Intel
Pentium and Core processors.

These estimations and also corrected estimations for portable
C++ programs are used in the program Arifexp to calculate number
of cycles and time, required for operations by option -time.

For processors of other types recalculation is required, based
on tables in this Section.

NOTE: Assembler add-on package is not included into the public
      distribution of class.

Estimations for 32 bit mode
---------------------------

The operations of addition, subtraction, shift, and also bit-by-bit
and logic operations require from 1/2 up to 1/5 processor cycles
per bit of result, if size of result is more than 200 bits.
Optimized operations += and -= take about 1/15 cycle per bit
of result if size of result is more than 500 bits.

The number of processor cycles for multiplication of numbers with
at least one size from 1,000 bits and more can be roughly estimated
as m * n / 200, where m and n are quantity of bits allocated for
operands, rounded up to mod 32. Overhead is further reduced by
Karatsuba method: if both operands contain 1,600 or more bits
the estimation must be divided to 4/3 in power log2 (n / 2000),
where n is quantity of bits in shorter multiplier.

New algorithm:
The number of processor cycles for division with size of divider
from 2,000 bits to 1/2 of on-chip processor cache can be roughly
estimated as (m - n) * n / 200, where m and n are quantity of bits
allocated for dividend and divider.

Old algorithm:
The number of processor cycles for division with size of divider
from 2,000 bits to 1/32 of on-chip processor cache can be roughly
estimated as (m - n) * n / 30, where m and n are quantity of bits
allocated for dividend and divider.

The similar rough estimation for calculation of square root is
n * n / 120, where n is quantity of bits allocated for the number.

Estimation for input and conversion from string is n * n / 400
for numbers up to 100,000 bits. For larger numbers it must be
divided to 4/3 in power log2 (n / 20000).

Estimation for output and conversion to string is
n * n / 250 for new division algorithm and
n * n / 75  for old division algorithm.

Estimations for 64 bit mode
---------------------------

The operations of addition, subtraction, shift, and also bit-by-bit
and logic operations require from 1/4 up to 1/10 processor cycles
per bit of result, if size of result is more than 400 bits.
Optimized operations += and -= take about 1/25 cycle per bit
of result if size of result is more than 1000 bits.

The number of processor cycles for multiplication of numbers with
at least one size from 2,000 bits and more can be roughly estimated
as m * n / 800, where m and n are quantity of bits allocated for
operands, rounded up to mod 64. Overhead is further reduced by
Karatsuba method: if both operands contain 3,200 or more bits
the estimation must be divided to 4/3 in power log2 (n / 4000),
where n is quantity of bits in shorter multiplier.

New algorithm:
The number of processor cycles for division with size of divider
from 4,000 bits to 1/2 of on-chip processor cache can be roughly
estimated as (m - n) * n / 800, where m and n are quantity of bits
allocated for dividend and divider.

Old algorithm:
The number of processor cycles for division with size of divider
from 4,000 bits to 1/64 of on-chip processor cache can be roughly
estimated as (m - n) * n / 50, where m and n are quantity of bits
allocated for dividend and divider.

The similar rough estimation for calculation of square root is
n * n / 200, where n is quantity of bits allocated for the number.

Estimation for input and conversion from string is n * n / 1200
for numbers up to 200,000 bits. For larger numbers it must be
divided to 4/3 in power log2 (n / 40000).

Estimation for output and conversion to string is
n * n / 1000 for new division algorithm and
n * n / 125 for old division algorithm.

Other estimations
-----------------

Number of big numeric operations for calculation of power and
power by module is proportional to size of degree in bits.

Rough estimations for power by module for new division
algorithm are
n * m * m / 120 in 32 bit mode and
n * m * m / 400 in 64 bit mode, where
where n is quantity of bits allocated for degree and
      m is quantity of bits allocated for module.

Rough estimations for power by module for old division
algorithm are
n * m * m / 19 in 32 bit mode and
n * m * m / 32 in 64 bit mode,
where n is quantity of bits allocated for degree and
      m is quantity of bits allocated for module.

NOTE: Power is based on multiplications whereas power
      by module is based on multiplication-division
      pairs. The slowest operation is division, so it
      has the greatest impact on overall performance.
      Arifexp uses more accurate estimation for time of
      operation, based on sum of estimations for time
      of multiplication and division.

Algorithm for primality checking IsStrongPrime() proves primality
of numbers less than 341,550,071,728,321 for time less than 1 ms.
For greater numbers it detects probable prime with certainty
approximately 99,99999999%. Time of checking is proportional
n in power 3, when n is quantity of bits in a number.

More strict algorithm, IsMillerPrime() proves primality for
numbers of any size providing that Riemann generalized hypothesis
is true. It works approximately 100 times slower for closer
numbers and for numbers or greater size time is proportional
to n in power 5, when n is quantity of bits in a number.


4.2.3. Technical features of implementation
-------------------------------------------

For best optimization of programs it is necessary to take into
account the following differences between this implementation
of unlimited numbers and built-in long arithmetic of C ++:

1.  Assigning operations +=, -=, *=, /=, %= etc. are more effective
    (up to 20% on short numbers) than operations, which create
    temporary stack objects: +, -, *, /, % etc. For the same
    reason prefix increment and decrement work faster, than
    postfix variants of these operations.

2.  Compiler do not carries out automatic optimization of
    operations with unlimited numbers and numeric constants.
    For example, no automatic replacement of addition/subtraction
    of 1 to increment/decrement and multiplication/division by
    power of 2 to left/right shift is provided. Nevertheless,
    multiplication by power of 2 is implemented near optimality
    (may be ever better than left shift), but division by power
    of 2 is much slower than right shift.

3.  If necessary, you can turn off some Cbignumf optimizations
    by undefining the following macro in Cbignumf.h:

    _CBIGNUM_HARDWARE_CBR   Use hardware add/subtract/shift
                            operations with carry/borrow.

    NOTE: Operations with carry are used from early versions of
          the class and are completely implemented in version 1.2.
          On Athlon/Core processors they are about 5 times faster
          than alternative C++ portable code, but hardware and
          compiler dependent (look for Section 1.1). Consequently,
          they are included in separate add-on package instead of
          public distribution of portable class.

          Version 2.1c documents portable CBNL functions, available
          in public distribution of class independently of macro
          _CBIGNUM_HARDWARE_CBR (Section 4.7).

    _CBIGNUM_HARDWARE_MUL   Use hardware multiplication of
                            single-word operands to double-word.

    NOTE: Hardware multiplication is first implemented in assembler
          add-in package for version 1.2a. On Athlon/Core and more
          recent processors it can be dozens times faster than
          alternative multiplication based on bit shifts (depending
          of using in the latter either hardware operations with
          carry/borrow or replacing C++ code).

          Starting from version 2.1, hardware multiplication is
          available in public distribution of class for Microsoft
          Visual C++ 64 bit compilers, and starting from version
          2.1c - for all 32 and 64 bit compilers. But its much
          effective implementation is still contained in separate
          add-on assembler package, which is available for common
          32 and 64 bit compilers (look for Section 1.1).

    _CBIGNUM_HARDWARE_DIV   Use hardware division.

    NOTE: Hardware division is turned off by default in the code
          of class, except for conversion of unlimited number to
          string and factorization of single-word numbers in
          Prime.cpp when compiling in 64 bit mode. The reason
          why is limited effect on most microprocessors.

          Hardware division can be used since version 1.2c for
          division and module of single-word numbers, since
          version 2.1 - to optimize power by single-word module,
          since version 2.1a - to optimize division/module
          of unlimited number by single-word divider. Maximal
          effect is about 3 times and can be reached for limited
          double-word dividends and single-word dividers.

          Starting from version 2.1c all 64 bit test programs
          are being compiled with _CBIGNUM_HARDWARE_DIV turned on
          (this is set by options of compilers, not in the code
          of class). 32 bit programs are still being compiled
          without _CBIGNUM_HARDWARE_DIV.

    _CBIGNUM_KARATSUBA_MUL  Use Karatsuba multiplication
                            if both numbers have 100 or
                            more words, in case of hardware
                            multiplication 50 or more words.

    NOTE: Karatsuba method is first implemented in version 1.2a.
          It is relatively complex in coding and required
          excessive testing, which have been completed in
          version 1.2b public. It is effective for large
          numbers, otherwise one can consider to turn it off
          for more reliable computations.

    _CBIGNUM_BLOCK_MUL      Use block multiplication if both
                            numbers have 715 or more words.

    NOTE: Block method is implemented since version 1.0 to fit
          multiplication into processor cache L1. Now it is
          superceded by much more effective Karatsuba method,
          except for case when one number is shorter than 100
          words in case of bit mutiplication and shorter than
          50 words in case of hardware multiplication.

    _CBIGNUM_TERNARY_MUL    Use 30% faster ternary method for
                            multiplication with table of shifts
                            instead of simpler binary method.

    NOTE: Ternary method is first implemented in version 1.2.
          Now both binary and ternary methods are superceded by
          much more effective hardware multiplication, if the
          latter can be used.

    _CBIGNUM_SHIFTTAB_MUL   Build temporary tables of shifts for
                            accelerating of multiplication, if
                            both numbers have 3 or more words

    NOTE: Tables of shifts are implemented since version 1.0.
          Now they are is superceded by much more effective
          hardware multiplication, if the latter can be used.

    _CBIGNUM_SHIFTTAB_DIV   Build temporary table of shifts for
                            accelerating of division and module,
                            if divider have at least 3 words less
                            than dividend but no more than about
                            512K words in 32 bit mode and 256K
                            words in 64 bit mode.

    NOTE: Division with table of shifts is implemented since
          version 1.0.

    _CBIGNUM_SMALL_DIV      Use special algorithms for small
                            divider and module, currently
                            single-word and double-word.

    NOTE: These algorithms are implementing since version 2.1.

    _CBIGNUM_SMALL_POWMOD   Use special algorithm for power by
                            small, currently single-word module.

    NOTE: The algorithm is implemented in version 2.1.

    _CBIGNUM_REVERSE_MOD    Calculate single-word module thereby
                            reverse multiplication after hardware
                            division.

    NOTE: Option is effective if _CBIGNUM_HARDWARE_DIV is set.
          It is turned off by default because compiler's
          optimization is much effective.

    _CBIGNUM_REDUCE_JUMPS   Use code with extra operations to
                            reduce number of conditional jumps.

    NOTE: Implemented for division of single-word numbers in
          version 1.2c and turned off by default. Ignored since
          version 2.1 because compiler's optimization is much
          effective.

4.  Combined methods with table of shifts may speed up multiple
    operations with the same multiplicand or divider, because
    these operations allow to build table of shifts only once
    a time. Also, this methods do not use internal copying of
    operands assuming that operands do no overlap.

    Combined multiplication addmultab() and submultab() with table
    of shifts is effective only if hardware multiplication is not
    available. Size of first multiplicand should fit into the range
    from 500 bits to 6,000 bits; it can smaller than 500 bits if it
    is larger than multiplier, the latter must not be shorter the
    100 bits. If size of operands goes far out of these restrictions,
    general multiplication will show better performance due to
    automatic block optimization.

    Combined multiplication addmulsmp() and submulsmp() is effective
    also in case when if hardware multiplication is used. Other
    conditions for affectivity are the same as for multiplication
    with table of shifts. Performance gain is several percents due
    to elimination of internal copy operations.

    Combined division and module with table of shifts are effective
    in general for dividers of any size.

5.  Implementation of multiplication, division, module, power and
    power by module contains effective internal optimization for
    single-word numbers and numbers divisible by large power of 2
    (~100 and more).

6.  Transfer of unlimited number to function by reference is faster
    than transfer by value, since in the latter case copy of number
    must be made. If the function does not modify the number passed,
    it is recommended to replace transfer by value with transfer by
    constant reference:
        const cBigNumber&

7.  Construction and destruction if local unlimited numbers,
    determined inside functions and blocks, occupies approximately
    as much time, as 2-3 short arithmetic operations. To avoid this
    overhead, take out local definitions of unlimited numbers from
    critical cycles.

    NOTE: If you do not need to write reenterable code you also can
          prepend "static" keyword and initialize numbers in separate
          assignment operator:
            cBigNumber a = 1;           // Local definition
            static cBigNumber a; a = 1; // Optimized definition

8.  Conversion of very long integer (> 100,000 digits) to
    external decimal format can occupy a lot of time. Conversion
    to external hexadecimal format is 5 times faster, but also
    may be very slow. The fastest output of very long integer
    in hexadecimal format can be carried out by method dump().


4.2.4. Support for multithreaded applications
---------------------------------------------

Complete support of multithreading is introduced in version 1.2c,
where macro _CBIGNUM_MT is set in file Cbignum.h by default.

The macro _CBIGNUM_MT excludes non-reenterable static methods
lastdivmod(), lastrootrm(). Instead, use cBigDivMod(),
cBigSqrtRm() or methods setdivmod(), setsqrtrm().
Also, it excludes non-reenterable method toatmp() unless
macro _CBIGNUM_TOATMP is set, use method toa() instead.

Starting from version 2.1b the macro _CBIGNUM_MT prevents
using of methods fit(), tab(), smp(), gc() and pack() for
const numbers unless macro _CBIGNUM_CONSTCAST is set.

NOTE: If you compile program in compatibility mode on compilers
      that do not support thread local storage (look Section 3),
      you must take into account that macro _CBIGNUM_MT slow down
      binary operations, which create temporary stack objects:
      +, -, *, /, % etc.

      If you compile program do not require for multithreading
      support you can undefine this macro to return to non-reenterable
      implementation. This is especially useful for operations
      on relatively small numbers which becomes 3-4 times faster.
      Also, you can use assigning operations +=, -=, *=, /=, %= etc.
      that are not affected by _CBIGNUM_MT.


4.3. Consumption of memory
--------------------------

From the point of view of consumption of memory, the class is
optimized for numbers, which size is more than 500 bits in
32 bit mode and 1000 bits in 64 bit mode. In this case amount
of allocated memory will be on the average on 37.5% more than
minimal necessary.

The additional optimization can be achieved at observance of the
following rules:

1.  Use common long or special CBNL variables for limited numbers,
    whenever possible, as long variables and constants can be freely
    combined with cBigNumber variables.

2.  Version 2.0 of class and higher versions do not allocate memory
    for numbers on their declaration and when assigning zero value
    in order to save memory allocated for sparse arrays (in which
    most values are 0). Memory will be allocating when assigning
    non-zero value and permodifying operations.

    If necessary, you can turn off this optimization by defining
    the following macro in Cbignum.h:

    _CBIGNUM_DEF_ALLOC      allocate memory in default constructor
                            as in versions 1.x of class.

3.  For duly destruction of the unlimited numbers place them in
    local variables. At use for this purpose global and static
    variable it is necessary to take into account, that due to
    reasons of performance optimization class does not implement
    automatic freeing of superfluous memory allocated for unlimited
    numbers.

4.  To free superfluous memory use method gc().

    * The method gc() is optimized to minimize fragmentation of
      dynamic memory. It leaves zone of expansion, which size is
      average 37.5% of size of number, except for relatively small
      numbers, because the final size of memory allocated for
      cBigNumber variable will be not less than 104 bytes in 32 bit
      mode and 224 bytes in 64 bit mode. Exception is normalized 0
      that is converted to compact zero-length form with releasing
      of all allocated dynamic memory if macro _CBIGNUM_DEF_ALLOC
      is not set.

    * The method gc() does not normalize numbers.

5.  Maximal clearing of memory is achieved at application of method
    pack().

    * The method pack() reduces size of number down to minimal,
      determined exclusively by actual length of stored number.
      The zone of expansion no longer persists, and the class keeps
      in memory only indispensable service information (which size is
      two CBNL words) and binary code of stored number. Exception is
      normalized 0 that is converted to compact zero-length form
      with releasing of all allocated dynamic memory if macro
      _CBIGNUM_DEF_ALLOC is not set.

    * The method pack() does not normalize numbers.

    * Packed condition are not preserved at assignment of any kind,
      including transfer to function by value, because these
      operations are performed by copying binary code of number
      to new object.

    * Packed variable is automatically unpacked at applying of
      any modifying operation except if it is necessary to allocate
      additional memory.

    * The method pack() increases fragmentation of dynamic memory
      that can result in fall of performance.

6.  To set value 0 and free all allocated dynamic memory one can
    use method clear(), which works as gc() after assigning 0.


4.4. Interaction with operation system
--------------------------------------

The class does not limit size of numbers, for what the automatic
allocation of memory is implemented. Depending on size of numbers,
formed by algorithm, amount of allocated memory can result in
exhaustion of system resources and sharp delay of work of operational
system. Therefore in crucial cases it is necessary to restrict maximal
volume of operative memory selected for program.


4.5. Prevention of bugs
-----------------------

The external interface of a class is constructed in maximum exact
conformity with the agreements of language C++ (ref. details in
section "Technical information"), therefore programmers should
pay attention basically to observance of common rules of safe
programming in language C++:

1.  At application of the operators of assignment inside expressions
    it is necessary to take into account, that the order of execution
    is not fixed.

2.  At input of numbers it is recommended to set explicit radix
    thereby modifier dec, oct or hex. Otherwise, radix will be
    determined automatically depending on presence of 0 before
    number.

3.  At usage of the optimized methods it is necessary to observe
    all restrictions, stipulated in the documentation. In particular,
    some optimized methods do not permit their operands to overlap.


4.6. Built-in bug prevention tools
----------------------------------

All methods of a class include tools for the prevention of internal
bugs, which are divided into two classes:

1.  Tools for diagnostics of errors are included at compilation
    in a debugging mode. The given tools are intended for
    revealing the special situations, at which the probability
    of occurrence of so-called irregular errors is increased. In
    particular, the situations of escaping of index out of array
    bounds, missed fulfillment of internal invariants etc. are
    traced. When diagnostic situation is triggered, program
    shows assert message and stops its execution.

    NOTE: If it is necessary to switch off control of indexes at
          the debugging stage, compile program with macro NCHECKPTR.

          Also, you can switch off control of indexes only for
          input (const) arrays by compiling program with macro
          _CBIGNUM_NCHECKPTR.

2.  Tools for correction of errors are included at compilation
    of the program in a release mode with macro NDEBUG. These
    tools are effective enough meaning that they will neutralize
    consequences of some widespread latent errors, including
    some buffer overflow errors.

Due to presence of tools of correction, triggering of tools of
diagnostics at debug mode does not mean that in release mode
methods of class cBigNumber will work incorrectly. Nevertheless,
some probability of incorrect work exists, therefore it is
necessary to inform author of a class about all such cases.

NOTE: Bug prevention tools prevent about 70% of internal bugs,
      but some bugs may remain undiscovered, in particular is
      recently implemented algorithms. It is possible to turn
      recent algorithms off and revert to the previous version
      of code to provide for more reliable computations. Look
      for specific information in Section 4.2.3.

4.7. Basic CBNL type and functions
----------------------------------

File Cbnl.h declares basic type CBNL and some additional functions with
high-efficient intrinsic implementation for Visual C++ 2005 and above
(Visual C++ 2013 and above for carry/borrow operations) and portable
implementation for generic compiler:

    _addCBNL    (l1,l2,*p)  add with returning of carry
    _adcCBNL  (c,l1,l2,*p)  add with carry, returning next carry
    _subCBNL    (l1,l2,*p)  sub with returning of borrow
    _sbbCBNL  (c,l1,l2,*p)  sub with borrow, returning next borrow
    _muldCBNL   (l1,l2,*p)  signed multiplication to double-word *p,ret
    _umuldCBNL  (l1,l2,*p)  unsigned multiplication to double-word *p,ret
    _ushldCBNL  (ll,lh,sh)  high word of unsigned left shift of double-word
    _ushrdCBNL  (ll,lh,sh)  low word of unsigned right shift of double-word
    _ushld1CBNL (ll,lh)     unsigned 1-bit left shift of double-word
    _ushrd1CBNL (ll,lh)     unsigned 1-bit right shift of double-word
    _ushlCBNL   (l,sh)      unsigned left shift  (for code readability)
    _ushrCBNL   (l,sh)      unsigned right shift (for code readability)
    _btCBNL     (num,sh)    extract bit by number
    _ulzcntCBNL (num)       count high zero bits
    _ubsfCBNL   (*p,num)    find lowest meaning bit  (Visual C++ only)
    _ubsrCBNL   (*p,num)    find highest meaning bit (Visual C++ only)

Multiplication finctions _muldCBNL and _umuldCBNL can be implemented
without hardware multiplication if macro _CBNL_MUL is undefined in Cbnl.h.


5. Technical information
========================

This section contains technical information about current
implementation of class without granting guarantees that all
information will be true for the subsequent versions of a class.


5.1. Information on implementation
----------------------------------

The template of dynamic arrays Exarray.h, which is carrying out
allocation of operative memory is put in a basis of a class
cBigNumber. Also, this file contains template of restricted
pointers exptr, which is used in a file Cbignumf.cpp for
organization of check of indexes at a stage of debugging.

To turn off check of indexes set either macro NDEBUG or NCHECKPTR.
Macro NDEBUG is set by default at compilation under IDE Visual C++
in Release mode.

Check of indexes, if turned on, slows down performance of methods
(approximately 150-400 percents), but all incorrect references to
arrays trapped, just as in Java or C#. Thus it is possible to use
safe index arithmetic, which is present neither in Java, nor in
C#.


5.2. Peculiarities of implementation of regular operations
----------------------------------------------------------

The operations on unlimited numbers are carried out by regular
rules of language C++ with use of all regular operators. Some
distinctive features of class are caused by restrictions of
language C++:

1.  Operation sizeof provides for size of descriptor of unlimited
    number (object of class cBigNumber), instead of size of
    internal representation of number in bytes. Size of internal
    representation of unlimited number in CBNL words is provide
    by method length(). Size of unlimited number in bytes is equal
    to production length() * sizeof (CBNL).

2.  Operation & provides for the pointer to descriptor of
    unlimited number (object of a class cBigNumber), instead of
    pointer to internal representation of number in memory.
    The pointer to internal representation of unlimited number
    as array of CBNL words is provided by method base().

3.  No operation on unlimited numbers can cause integer overflow.
    If not enough virtual memory is available for storing of
    intermediate data and result of the operation, the class
    invokes abort() to terminate execution of program.

4.  Before assignment of unsigned C number to unlimited number
    or operation on these numbers the unsigned number is converted
    to signed CBNL number. If unsigned number is greater than
    maximal CBNL number, it silently transmutes to negative number.

    To assign exact unsigned number, use method code():
        cBigNumber a;                   // Unlimited number.
        unsigned CBNL n = 0xFFFFFFFFUL; // Large unsigned number to assign,
        {                               // value is set as an example.
          a.expand (2);                 // Allocate 2 words.
          CBNL *pa = (CBNL*) a.code();  // Get pointer to modify code.
          pa [0] = n;                   // Assign number.
          pa [1] = 0;                   // Set zero sign word.
          pa [-1] = 2;                  // Set length.
          a.fit();                      // Normalize.
        }                               // Delete the pointer.
    Instead of calling fit(), here it is possible to use the following
    assignment:
          pa [-1] = 1 + ((CBNL)n < 0);  // Set normalized length.

5.  Before assignment of floating-point number to unlimited number
    or operation on these numbers the floating-point number is
    converted to signed CBNL number. On conversion, fraction bits
    are discarded; if resulting number does not fit to the CBNL
    range, result of conversion is undefined.

6.  As against usual numbers, for unlimited numbers the operations
    of reduction of a type are not stipulated.
    Conversion of unlimited number to an CBNL integer is carried out
    explicitly by method loword(), which returns CBNL integer.
    Conversion is correct if unlimited number fits into CBNL integer
    range. To ensure that CBNL range is enough, check that size
    size of internal representation of unlimited number is not
    greater than 1:
        if (bignum.length() <= 1) num = bignum.loword();
        else bignum.erange()    /* Error */

    To convert unlimited number with check of range use methods
    toCBNL(), tolong(), toint() and toshort().

7.  To prevent naming conflict with library math.h the function
    of power is named cBigPow().

8.  The power to negative degree is interpreted, as the power of
    reciprocal value to positive degree, where fractional part of
    reciprocal value is discarded. That is, result of operation
    will be as the following: divide error for base 0, 1 for base 1
    and 0 for any base > 1.

9.  The negative shift index is interpreted, as change of a
    direction of shift. In standard C result of negative shift
    is not defined.

10. The square root of negative number is equal to 0.

11. Beginning with version 1.2 output of numbers with hex and oct
    modifier is signed. If you need for unsigned output, use function
    cBigUnsign() or compile with macro _CBIGNUM_UNSIGN_OCT_HEX.


Appendix 1. An explanatory slip to work for contest SofTool'99
==============================================================

                 Task 1 (company Aladdin)

Write a C++ class cBigNumber, allowing to work with integers of
any length. To realize the overloaded operators of addition,
subtraction, multiplication, integer division, reception of the
rest from division and power. Arguments of the operators can be
objects cBigNumber and usual integers (int).

Implement functions of input of number from the keyboard and
output to the screen in decimal and hexadecimal format.

                     Explanatory slip.

The task is formulated briefly, therefore I have directed to a
competitive commission the letter with the request to specify
criteria of an estimation of competitive works. As the answer was
not received, the conclusion was made that all necessary items of
information contain in conditions of a task in either explicit
or implicit kind.

On the basis of task conditions there were formulated the following
requirements:

1) Size of numbers and choice of algorithms of multiplication and
   division.

The optimal algorithms of multiplication and division significantly
depend of size numbers; it is not obviously possible to implement
all algorithms in competitive work

Analyzing conditions of a task, it is possible to assume, that the
algorithms should br oriented to numbers of any size and be optimal:
- for operations on large and short number, since these
  operations are especially stipulated in a task 1;
- for operations on small large numbers (up to 1000 bits),
  since successive task 2 devoted to check for simplicity
  of number.
Because of this:
1. The numbers are represented in an binary complement code.
2. For multiplication and division there were choused "school"
   algorithms in binary variant.
3. At implementation of the class the special attention was
   given to reduction of overheads of call of computing methods.

The performance of a class at work with numbers located within the
limits of a 32 bit numbers, makes about 500 thousand arithmetic
operations per one second on Pentium-166, that is approximately 20
times slowly, than built-in arithmetic.

On increase of size of both operands of addition and subtraction
and one of operands of of multiplication and division performance
falls proportionally to size of operand; on increase of size of
both operands of multiplication and division performance falls
proportional to production of sizes.

The algorithms were tested for numbers of size up to 100,000 bits.

2) Portability.

As program platform hardware is not stipulated in conditions of the
task, and the compiler Visual C++ is mentioned only, it means,
that the class should be portable at least between those
platforms, for which work brave fellows from Microsoft. That is,
Intel, Alpha, Strong ARM etc.

Therefore class cBigNumber is written on standard C++ without
application of machine-sensitive assembler instructions and
non-standard types of the data, such, as _int64. I believe,
that this decision will create the good preconditions for
portability of the class.

3) Completeness of realization of regular operations of language C.

The programmer using class cBigNumber, should have an opportunity
to use all regular operations of language C without any
restrictions.


Appendix 2. Known issues
========================

1. Some compilers reject operator ? if parts contains expressions
   with different operations, for example: (a < 0? -a: a).

2. Class can not get square root for numbers of size greater
   than CBNL_MAX bits and provide for shift for degree outside the range
   of -CBNL_MAX..CBNL_MAX bits (reports "cBigNumber value out of range").

3. On assignment of C number to unlimited number and on any operation
   on these numbers the C number undergo preliminary conversion
   to signed long type. This conversion will not be correct if C
   number is of either unsigned or floating-point type and value
   of number is greater than CBNL_MAX.

4. For some unclear reason GNU g++ 2.9.6 builds strange programs with
   malfunctioned streams cout and cerr if global cBigNumber is
   initialized by string values, for example:
        static const cBigNumber big_const ("2147483648", 10);

   If you need write Unix portable code, please, either declare
   such objects locally inside {} or initialize them separately.

5. Compiler C++ of Microsoft SDK 2003 R2 (64 bit) generates messy
   warnings when working with new iostream library. To complete
   with compilation, one must either force use of old library by
   macro _CIOS_OLDSTREAM or turn off option "treat warning as
   errors" (/WX-).

6. Since version 1.2c test programs may not compile under
   16 bit Borland C++ 3.1 due to memory issues. To use this
   compiler you should set macro NCHECKPTR and/or turn off
   redundant block of code in the file Cbignumf.h.

7. Since version 2.0 it may be possible to catch INDEX_RANGE_ERROR
   alert in debug mode for zero values in some methods of class.
   These alert is not significant and do no affect accuracy of
   computations. To turn off these alerts in debug mode, use macro
   _CBIGNUM_DEF_ALLOC. Alternatively, it it possible to turn off
   check of indexes for input arrays by macro _CBIGNUM_NCHECKPTR.


Appendix 3. What's new
======================

Sep 15, 1999 - Work for contest SofTool'99

- Basic long integer arithmetic, power and tests for primality.


Oct  8, 1999 - The class is uploaded to Internet.

- Accelerated division (2 times) using table of shifts.

- Accelerated output (5 times).


Oct 10, 1999

- Accelerated division (2 times) using x386 instructions on
  Borland C++ 4.5.


Apr 12, 2000

- Support for Visual C++ 6.0, except for x386 instructions.


Sep  5, 2001

- Methods tolong(), toint() and toshort() for obtaining of common
  C integers with check of range.


Mar 14, 2003 - Version 1.0 beta public

- Methods for calculation of square root.

- Program Arifexp and command file Arifrand.bat for testing
  of class on random numbers.

- Testing on 20,000,000+ random numbers.

- Accelerated (3-4 times) multiplication using table of shifts,
  block method and optional x386 instructions on Borland C++ 4.5.
- Accelerated input (5 times).

- Optimized shift for Intel x386 and Borland C++ 4.5.

- Methods setdivmod() and setdivmodtab() for division with remainder,
  which does not affect cBigNumber::lastdivmod().

- Output and conversion to string now keeps cBigNumber::lastdivmod()
  intact.

- Methods gc() and pack() for memory release purposes.

- Method set() now suggests decimal number by default.

- Removed method compl(), which name is reserved in C++.

- FIXED: some arithmetical errors in division and power by module.

- FIXED: a number of memory allocation bugs.

- FIXED: some inconsistencies of output operator with C++ standard.

- Compiling under GNU g++ 2.9.6, except for x386 instructions.

- Refined license statement and augmented documentation.

- Documentation is translated to English.


Mar 31, 2003 - Version 1.0 beta public update

- Compatibility with Borland C++ Builder 6.0.


Jul  5, 2003 - Version 1.1 beta public

- Compatibility with Visual C++ 7.0.

- Source code with comments in English.

- Some changes in the template Exarray.h.


Sep 12, 2005 - Version 1.1a beta public

- FIXED: Sign of result for power by module now does not depend
         on sign of module.
  ATTENTION: Check if your programs does not depend on this
             specification bug!

- FIXED: Power by module now work correctly for negative base.

- FIXED: Memory allocation bug on power by negative module with
         high word 0x80000000.

- FIXED: Memory allocation bug on power by module 0 if division
         by 0 is allowed.

- FIXED: Minor memory allocation bug in debug mode when
         multiplying two large (more 480 words) negative
         numbers with high words 0x80000000.

- FIXED: Assert error in debug mode of setdivtab(), setmodtab(),
         setdivmodtab() when dividing negative number by 0.

- FIXED: Shift operations overwrite cBigNumber::lastdivmod
         if number of bits is given by unlimited number.

- FIXED: Overflow bug in square root if size of numbers greater
         INT_MAX bits.

- FIXED: Description of argument n for method addmultab (a,b,n).

- Corrections in Section 1.2 and additions to Section 4.2.3
  of documentation.

- Recently discovered issues, including reenterability issue
  are described in Appendix 2 of documentation.


Sep 28, 2005 - Version 1.1a beta public update

- FIXED: Power by module returns 1 if the high word of module is 0
         (thanks to Alexander).

- Example of compiling for Linux in Section 3.


Oct 31, 2005 - Documentation update

- Accent on issues related with unsigned and floating point numbers.

- Documented method expand() for memory allocation.

- Guidelines for using of method code().


Nov 12, 2005 - Version 1.1a beta public update

- Assembler code now is distributed as separate add-on package.

- Test programs now are built by Microsoft Visual C++ 6.0
  with assembler optimization.

- Performance tests for various processors in Section 4.2.

- Updated estimations of performance in Section 4.2.2.

- FIXED: Method setneg() does not invert sign of number LONG_MIN.

- FIXED: Memory allocation bug in exarray.h.


Nov 26, 2005 - Version 1.1a beta public update

- Files Random.h and Random.c renamed to Random3.h and Random3.c.

- FIXED: Wrong Wheel Factorization Table in Prime.cpp (thanks to Nicolas).

- FIXED: Method b_SPRP() returns probable prime on even numbers (Nicolas).


Sep  5, 2006 - Version 1.2 beta internal

- Compatibility with Visual C++ Express 2005 and other compilers
  that does no support old library iostream.h.

- Link of iostream library now is necessary only for appropriate
  stream input-output operators. In particular, basic conversions
  of number to string and string to number now does not require
  for class iostream.

- Improved performance of multiplication (50%) for numbers
  containing 3 or more words.

- Optimized operations += and -=. In addition to more effective
  C code these operation now take advantage of assembler
  optimization in add-on package.

- More flexible functions for primality proving with optimized
  algorithm. Now SPRP() and trial division works approximately
  two times faster for modules > 7919.

- CHANGED: Function HasFactor() renamed to TestFactor().
           To get it back define macro _CBIGNUM_HASFACTOR.

- CHANGED: Non-0 return codes of SPRP() and IsPrime().

- CHANGED: To use operators of console input-output << and >>
           include new header file Cbignums.h. Header file
           Cbignum.h now does not link iostream library.

- CHANGED: Output of numbers with hex and oct modifier is now signed.
           If you need for unsigned output, use function cBigUnsign()
           or compile with macro _CBIGNUM_UNSIGN_OCT_HEX.

- CHANGED: Methods and functions of class now do console output
           via special streams cTTY, ref. Section 3.2.
           By default, output is TURNED OFF.

- CHANGED: Static methods lastdivmod() and lastrootrm() are deprecated
           and will be excluded in version 2.0. Instead, use
           setdivmod() and setsqrtrm().

- CHANGED: Method erange() now types short dump of number.

- CHANGED: Method bits() now returns long value instead of int,
           and calls erange() on overflow.

- New methods: setbits(), exbits(), setexbits(), words(), exwords().

- New constructors and methods cBigAbs(), abs(), setabs(),
  cBigUnsign(), unsign(), setunsign().

- Random generator: constructor cBigRandom(), method setrandom(),
  long functions in Random3.h and Random3.cpp.

- New methods submul(), submultab().

- Methods setdivtab(), setmodtab(), setdivmodtab() now do not
  require for dividend to be normalized.

- Optimized multiplication of large numbers (more 480 words)
  by small numbers (1 or 2 words).

- Minor optimization of division (for very large numbers).

- Power is allowed for numbers of size greater than INT_MAX bits.

- FIXED: Method dump() is dependant on CPU digit capacity.

- FIXED: Minor bugs in operations with non-normalized 0 numbers.

- FIXED: Minor bugs in diagnostic of insufficient resources
         for left shift and power.

- New version of program Arifexp:
    Keys -div0, -size, -v, -rep n, -rand 0, -rand all.
    New operations ++a, --a, a-b*c, aR, removed operations a+b\c, a*b%c.
    Suffix R for substitution of random number with given number of bits.
    Key -check now handles all operations, except for power with
    degree other than 2,4,8.

- Extra tests in Arifrand.bat.

- Random generator test Arifr.bat.


Nov 29, 2006 - Version 1.2 beta public

- Functions for factor and SPRP test get optional output parameter for
  last tested either factor or SPRP base. If number proved composite,
  this argument contains either obtained factor or missed SPRP base.

- CHANGED: Renamed some functions introduced in internal version:
           FastExactSPRP() to FastSPRP(), PostExactSPRP() to PostSPRP().

- Options to select algorithm in program Miller.

- Extended testing of program Miller on 1,000,000,000+ examples.

- FIXED: Program crushes when compiling in Release mode under compilers
         Visual C++ .NET, SDK 2003 and 2005 with option /Ob2 (auto inline).


Apr 30, 2007 - Version 1.2a beta internal

- Fast multiplication of large numbers by Karatsuba method. Performance
  is greatly improved if both operands contain more than 6,000 bits.

  NOTE: Karatsuba method is relatively complex in implementation
        and testing. It is included in version 1.2a as a part
        of beta test process. If you need for more reliable
        computations, turn this method off thereby undefining
        macro _CBIGNUM_KARATSUBA_MUL in file Cbignumf.inl.

- Due to Karatsuba optimization, recommended upper limit for
  addmultab() and submultab() methods is now 6,000 bits.

- Experimental limited support for multithreading applications,
  under macro _CBIGNUM_MT.

- New operations cBigDivMod() and cBigSqrtRm().

- New machine-dependant methods set() and setr().

- FIXED: Stack overflow for shift with negative degree LONG_MIN.

- FIXED: Power and power by module overwrites cBigNumber::lastdivmod
         if degree is negative.

- FIXED: Mistyping in English introduction to internal functions
         cBigNumberMAddM(), cBigNumberMSubM(), _cBigNumberMSubD()
         (not a bug in the code).

- Class uses standard iostream library as a default under compilers
  Microsoft Visual C++ .NET, GNU C++ 3.x and higher compatible.

- Compiling under GNU g++ 4.1, except for x386 instructions.

- Test of power by module Arif2.bat.

- 32 bit assembler for Microsoft Visual C++ and Borland C++ Builder
  (as add-on package).


Sep 22, 2007 - Version 1.2a beta public

- Hardware multiplication using assembler MUL command
  (as add-on package), approximately 10 times faster.

  NOTE: Assembler add-on package is not included in the public
        distribution of class.

- Two implementations of Karatsuba method, optimized accordingly
  for public C code and for assembler add-on package.

  NOTE: Karatsuba method is relatively complex in implementation
        and testing. It is included in version 1.2a as a part
        of beta test process. If you need for more reliable
        computations, turn this method off thereby undefining
        macro _CBIGNUM_KARATSUBA_MUL in file Cbignumf.inl.

- Fast copying in debug mode if add-on package is used.

- Auxiliary algorithmic acceleration of input of numbers.

- Performance comparison with NTL library.

- FIXED: Multiplication of negative and positive number
         may be incorrect in version 1.2a beta internal

- FIXED: Macro _CBIGNUM_BLOCK_MUL can cause for buffer overflow.
         in version 1.2a beta internal

- FIXED: Stream operator << can ignore ios::uppercase and ios::showbase
         flags (thanks to Nicolas).

- FIXED: Ignored width() modifier of output stream.

- FIXED: Internal function cBigNumberCopyShr() do not work
         for non-normalized 0.

- FIXED: Increased size of buffers for internal functions
         cBigNumberPow(), cBigNumberPowMod().

- FIXED: Size of buffer in specification to internal functions
         cBigNumberMAddMulShl(), cBigNumberMSubMulShl(),
         cBigNumberMAddMulShlKar(), cBigNumberMSubMulShlKar()
         (not a bug in the code).

- FIXED: Program Arifexp do not test remainder during reverse check
         of multiplication.

- FIXED: Option -size of program Arifexp and method dump() do not
         work properly under Visual C++ .NET.

- Improvements in program Arifexp:
  - Average time under option -rep.
  - Much effective multiplication test under option -check.
  - Option -time.

- Tested compatibility with Microsoft Visual C++ Express 2008.


Jun 12, 2009 - Version 1.2b beta public

- Reenterable method toa() instead of deprecated toatmp().

- Operations cBigBits() and cBigExBits().

- New files Cbignumf.h and Cbnl.h provides compiler-time
  information for more accurate time estimation in program Arifexp.

- Program Arifexp now checks for correctness of output conversion, if
  option -check is set.

- Operator ? for comparing in program Arifexp and test of comparing in
  command file Arifrand.bat.

- FIXED: Program Arifexp do not shows error codes for wrong expressions.

- FIXED: Estimation of multiplication time in program Arifexp.

- FIXED: Estimation of division time in program Arifexp and documentation.

- FIXED: Estimations for Linux in program Arifexp.


Jul 28, 2009 - Version 1.2b public

- Optimization of block multiplication in add-on assembler package
  when Karatsuba method is not applicable (performance gain up to 5%
  when one operand is less than 50 long words and second operand is
  10 times and more greater).

- New methods addmulsmp() and submulsmp() use hardware multiplication
  if it is in effect instead of table of shifts. Methods addmultab()
  and submultab() always use table of shifts.

- New method smp() builds table of shifts if hardware multiplication
  is not in effect.

- New version of template Exarray.h for 64 bit mode.

- Generator Random3.c updated to support 64 bit mode.

- Compatibility with GNU g++ 4.1.2 in 64 bit mode.

- Compatibility with GNU g++ 4.2.3.

- Gettimer.c can be compiled under DPMI in Borland C++ 4.5.

- Gettimer.c use long operations instead of double if possible.

- FIXED: Wrong ++ and -- in program Arifexp under option -v.

- FIXED: Programs do not return error code 255.

- Command file Millrand.bat for testing for primality.

- 32 bit executable files for Pocket PC in add-on package.

- 64 bit executable files for Linux and .sh test files.

- Performance in 64 bit mode under Linux in Section 4.2.

- Performance under ARM Pocket PC in Section 4.2.

- FIXED: Performance of assembler code for processors Pentium III/933
         and Pentium 4C/2400 by tests Arif1-3 in Section 4.2.

- Express testing in 64 bit mode.

- Beta test is finished.


Aug 27, 2009 - Version 1.2b public documentation update

- Performance tests for ARM, Intel Atom and AMD Phenom.


Nov 19, 2009 - Version 1.2b public addendum

- Voting by addmultab()/submultab() multiplication in program Arifexp.

- Fast factorization of 64 bit numbers thereby 64 bit machine divide
  in Prime.cpp, actual for 64 bit g++ compiler.

- Program Miller64 for fast factorization of 64 bit numbers
  under 64 bit Linux.


Dec 15, 2009 - Version 1.2c beta internal

- Class now has reenterable code and supports multithreading.

- CHANGED: Macro _CBIGNUM_MT is set by default.

- CHANGED: Static non-reenterable methods lastdivmod(), lastrootrm()
           are excluded unless macro _CBIGNUM_MT is unset, Instead,
           use operations cBigDivMod(), cBigSqrtRm() or methods
           setdivmod(), setsqrtrm().

- CHANGED: Non-reenterable method toatmp() is excluded unless either
           macro _CBIGNUM_MT is unset or macro _CBIGNUM_TOATMP
           is set. Use method toa() instead.

- String<>number conversions and square root are optimized for use
  with macro _CBIGNUM_MT.

- Multiplication, division, module, power and power by module are
  optimized for numbers divisible by large power of 2 (~100 and more),
  when macro _CBIGNUM_MT is set.

- Power by module is optimized for module divisible by large power
  of 2 (~100 and more).

- Special code for ~10 times faster division/module of numbers
  with single meaning word.

- Several times faster code for for bits() and exbits() methods.


Jan 10, 2010 - Version 1.2c beta internal update

- Header file Cthr.h for multithreading support in applications.

- FIXED: Racing condition in class cBigTemp in multithreading mode
         (but class now works slower - will improve performance
         in beta public version).

- FIXED: Incorrect implementation of method set(a,n) for type long.


Jul 28, 2010 - Version 1.2c beta pubic

- Implemented support for thread local storage in class cBigTemp
  make version 1.2c as fast in multithreaded mode as the previous
  version 1.2b in single-threaded mode. Look description of macro
  EXTHREAD_LOCAL in Section 3.

- Header file Exthread.h automatically defines macro EXTHREAD_LOCAL
  for Borland/Microsoft compilers and GNU g++.

- New option -par n in program Arifexp for output of data in parallel
  thread under Windows.

- Program Arifexp now do not output of check data if they conforms
  to output data.

- Command file Arifrand.bat works 2-3 times faster due to output of
  data in parallel thread and regretting of output of check data
  if they conforms to output data.

- FIXED: Several racing conditions in input-output methods in
         multithreading mode.


Sep 27, 2010 - Version 1.2b public update

- FIXED: Incorrect implementation of cBigAbs() and setabs().

- FIXED: Incorrect implementation of cBigExBits() and setexbits(),
         wrong example in description of cBigExBits().


Sep 28, 2010 - Version 1.2c beta public update

- Performance test for Intel Core i7.

- Program Arifexp and test file Arifrand.bat implement operations
  ~, @(abs), U(unsign), M(bits), U(exbits).

- Enhanced time estimation method in program Arifexp.

- FIXED: Incorrect implementation of cBigAbs() and setabs().

- FIXED: Incorrect implementation of cBigExBits() and setexbits(),
         wrong example in description of cBigExBits().

- FIXED: Memory allocation bug in string to number and number to
         string conversion.

- FIXED: Incorrect result of cBigCompl() and setcompl()
         for 0-word numbers.

- FIXED: Incorrect result of +, -, ^, &, | for 0-word and
         machine number.

- FIXED: Possible incorrect result of method loword()
         for 0-word numbers.

- FIXED: Heap damage caused by tab() and smp() for 0-word numbers.

- FIXED: Assert error in debug mode when calculating power by
         module for 0-word base.

- FIXED: Time estimation for add/sub in program Arifexp.

- Updated time estimation for division.


Oct  1, 2010 - Version 2.0 beta internal

- CHANGE: Use type CBNL defined in Cbnl.h instead of long to handle
          64 bit numbers under Visual C++ in 64 bit mode.

- CHANGE: Methods code(), loword(), hiword(), bits(), exbits() now
          returns CBNL value, which may be long, __int64, __int128
          etc. depending of compiler.

- CHANGE: Number of value 0 may contain 0 words in the code.

- All constructors, assign operations and method set now do not
  allocates memory for number 0, if is was not allocated earlier
  (memory is allocated when assigning non-0 value and after
  modifying operations ever if result is 0).

- Methods gc() and pack() frees all memory allocated for number 0.

- 64 bit assembler for Microsoft Visual C++ (as add-on package),
  performance gain 3-40x for AMD processors.

- Compiler detection files Cbnl.inl and Cbnl64.inl (in add-on package).

- Programs Arifexp64, Matrix64, Miller64 for 64 bit Windows.

- Test files .bat call 64 bit programs under 64 bit Windows.

- New conversion method toCBNL().

- Documented and corrected method clear().

- Performance tests and estimation for 64 bit mode in Section 4.2,
  including Intel Core i7 and AMD Phenom.


Nov 20, 2010 - Version 2.0 beta internal update

- Fastcall calling convention for time-critical functions under
  Visual C++ in 32 bit mode as compilation option in Cbnl.h.
  Not used by default because not effective.

- Alternative 32 bit assembler code using fastcall calling
  convention under Visual C++ as compilation option in Cbnl.h
  Not used by default because not effective.

- Remove redundant operations from 64 bit assembler code.

- Documented macro _CBIGNUM_DEF_ALLOC and _CBIGNUM_NCHECKPTR.

- Reviewed Sections 1.1, 1.2 and 4.3.


Nov 22, 2010 - Version 2.0 beta internal addendum

- Hardware 32/64 bit multiplication using assembler MUL instruction
  for GNU g++ (as add-on package), approximately 8 times faster.

- Executable files for Linux with accelerated 32/64 bit hardware
  multiplication.


Mar 30, 2011 - Version 1.2c public

- Documented and corrected method clear().

- Reviewed Sections 1.1, 1.2 and 4.3 of documentation.

- Beta test is finished.


Mar 30, 2011 - Version 2.0 beta public

- Internal test is finished.


Mar  5, 2013 - Version 2.0 public

- Performance test for AMD FX.

- Beta test is finished.


Mar 12, 2013 - Version 2.0 public documentation update

- Removed issue concerning restricted reenterablilty.

- Information about sparse arrays in Section 4.3.


May 05, 2013 - Version 2.0 public documentation update

- Performance test for Intel Xeon E3-1230.


Jan 16, 2015 - Version 2.0 public documentation update

- Performance test for Intel Xeon E3-1240v3.

- Performance test for Core 2 Quad Q8200.


Jul 29, 2016 - Version 2.0a public

- Compatibility with Microsoft Visual C++ 2015.

- 25% faster assembler multiplication code for Intel Haswell
  using MULX instruction.

- Programs Arifexp64x, Matrix64x and Miller64x for processors
  with BMI2 instruction set under 64 bit Windows.


Dec 28, 2016 - Version 2.1 beta internal update to beta public

- Use of Microsoft Visual C++ intrinsics as an option for
  portable 64 bit C code.

- Fast hardware multiplication without add-on assembler package
  under 64 bit Microsoft Visual C++, approximately 10 times
  improve in performance.

- Optimized division of 2-3 words number to 32/64 bit divider
  with option to use hardware division if macro
  _CBIGNUM_HARDWARE_DIV is set.

- Optimized 32/64 bit module of 2-3 words number with option to
  use hardware division if macro _CBIGNUM_HARDWARE_DIV is set.

- Optimized power by 32/64 bit module, 10 times faster under
  Microsoft Visual C++ 2015 (for other compilers effect may be
  smaller). It is possible to use to use hardware division if
  macro _CBIGNUM_HARDWARE_DIV is set.

- Option _CBIGNUM_REDUCE_JUMPS is ignored because optimization
  of compiler Microsoft Visual C++ 2015is more effective.

- Restarting beta test process due to development of new code.

- FIXED: Mistype in test of division within Arifrand test file.


Jun  9, 2017 - Version 2.1a beta internal

ONLY FOR EVALUATION AND TESTING, NOT FOR COMPUTATIONS

- New simplified and faster algorithm for division of unlimited
  number to 32/64 bit divider with option to use hardware division
  if macro _CBIGNUM_HARDWARE_DIV is set. This algorithm replaces
  algorithms previously developed for versions 1.2c and 2.1.

- New simplified and faster algorithm for 32/64 bit module of
  unlimited with option to use hardware division if macro
  _CBIGNUM_HARDWARE_DIV is set. Replaces algorithms previously
  developed for versions 1.2c and 2.1. Accelerates Miller strong
  probable prime test to run approximately twice faster for
  single-word numbers and 20% faster for double-word numbers.

- Updated Millrand test file for larger numbers.


Jun 27, 2017 - Version 2.1a beta internal update to beta public

FOR EVALUATION AND TESTING

- Use of intrinsics for 32 bit Microsoft Visual C++.

- Macro _CBIGNUM_SMALL_DIV and _CBIGNUM_SMALL_POWMOD for
  algorithms implemented since version 2.1 for small divider/module.

  NOTES: Unsetting macro _CBIGNUM_SMALL_DIV allows using this beta
         internal test version for computational purposes.

         Unsetting macro _CBIGNUM_SMALL_DIV and _CBIGNUM_SMALL_POWMOD
         limits this test version to algorithms of version 2.0 public.

- FIXED: Wrong result of division CBNL_MIN/CNBL_MIN.


Jul 27, 2017 - Version 2.1b beta internal

FOR EVALUATION AND TESTING

Unset macro _CBIGNUM_SMALL_DIV and (possibly) _CBIGNUM_SMALL_POWMOD
when using this beta internal test version for computational purposes.

- Fast new algorithm for division of unlimited number to 64/128 bit
  double-word divider, 2-3 times improve in performance.

- Fast new algorithm for 64/128 bit double-word module of unlimited
  number, 2-3 times improve in performance.

- Faster power by 64/128 bit module (2-3 times).

- Using LZCNT instruction instead of BSR if compiling for AVX2
  under Microsoft Visual C++.

- Check of division/module to small divider by alternative
  shift table algorithm in program Arifexp.

- FIXED: Methods toCBNL(), tolong(), toint() and toshort()
         do not work for non-normalized numbers.

- FIXED: Shift to non-normalized number does not work.

- FIXED: Incorrect check of indexes in power of non-normalized
         base.

- FIXED: Memory allocation bug in power by non-normalized degree.

- FIXED: Possibly incorrect clearing of numbers in methods
         clear(), gc() and pack() (bug in version 2.1a beta).


Sep 10, 2017 - Version 2.1b beta public

IMPORTANT NOTE:
Unsetting macro _CBIGNUM_SMALL_DIV and _CBIGNUM_SMALL_POWMOD
limits this beta version to algorithms that have been tested
in version 2.0 public. The class uses new, recently developed
division algorithms only if these macros are set (by default).
The new algorithms now undergo public beta test process and
are not recommended for purposes of reliable computations.

- Optimization of algorithms for division of unlimited number
  to 32/64/128 bit (single-word or double-word) divider
  in order to avoid conditional branches. This optimization
  gains about 2 times grow of performance in comparison with
  initial implementation in versions 2.1a and 2.1b internal.

- Optimization of 32/64/128 bit (single-word or double-word)
  module of unlimited number in order to avoid conditional
  branches. This optimization gains about 2 times grow of
  performance in comparison with initial implementation in
  versions 2.1a and 2.1b internal and also accelerates power
  by double-word module up to 60-80%.

- Optimization of algorithm for power by single-word module
  in order to avoid conditional branches when compiling under
  Microsoft Visual C++ 2010 (gains 2-3 times grow of performance,
  later we can see it only under Visual C++ 2015).

- Minor optimization of hardware division with module. Old
  method is available if macro _CBIGNUM_REVERSE_MOD is set.

- Optimization of output of unlimited number (about 10% or more
  percents of performance for relatively small numbers).

- Optimization of methods of division and module with table
  of shifts in purpose to reduce class overheads.

- CHANGED: Methods fit(), tab(), smp(), gc() and pack() now are
           not available for const numbers because of possible
           multithreading issues. Set macro _CBIGNUM_CONSTCAST
           in file Cbignum.h if you need to use these methods
           as before.

- New methods divtab() and modtab(), extra documentation for
  methods setdivtab(), setmodtab() and setdivmodtab().

- Removed register qualifier from Cbignum.cpp and Cbignumf.inl
  for conformance with C++ 11 standard.

- Tested compatibility with Microsoft Visual C++ 2017.

- All programs for Windows now are compiled under Microsoft
  Visual C++ 2015 Community because of implemented support
  for carry/borrow intrinsic functions. Programs compiled
  under Visual C++ 2012 and below will work remarkably slower
  for double-word divider/module.

- Test programs Miller64 and Miller64x for Windows compiled
  with macro CBIGNUM_HARDWARE_DIV turned on.

- New executable files for Linux with improved code of 32/64
  bit hardware multiplication and accelerated operations with
  small divider/module.

- FIXED: Prime.cpp do not turn on hardware division for
         factorization of single-word numbers when compiling
         under Microsoft Visual C++ in 64 bit mode.

- FIXED: Compiler warning in Prime.cpp related with Ctty.h.

- FIXED: Test programs Arifexp64x, Matrix64x and Miller64x
         for processors with BMI2 do not start under 64 bit
         Windows XP.

- FIXED: Overvalued estimation of division time in program
         Arifexp for large dividers below ~16,000,000 bits.

- FIXED: Random generator may not work properly if compiling
         in more then 64 bit mode (not actual for modern
         compilers).


Nov 10, 2017 - Version 2.1b beta public documentation update

- Minor corrections in documentation and comments.


Nov 12, 2017 - Version 2.1c beta public

FOR EVALUATION AND TESTING
When used for computational purposes it is recommended to unset
macro _CBIGNUM_HARDWARE_MUL for all translators, except for
64 bit Microsoft Visual C++. Look also important note concerning
_CBIGNUM_SMALL_DIV and _CBIGNUM_SMALL_POWMOD in version 2.1b.

- Being prepared to be future high-performance stable version
  to replace the current 2.0 public.

- Operations on machine numbers now are located in file Cbnl.h.

- Using of type long long for CBNL if compiler is compatible
  with C++ 11.

- Fast hardware multiplication without add-on assembler package
  under 32 bit Microsoft Visual C++, approximately 10 times
  improve in performance.

- Using of probably faster hardware multiplication instead
  of binary method for generic C++ compiler (binary method
  is also available if macro _CBIGNUM_HARDWARE_MUL is unset).

- Support for 128/64 and 64/32 bit hardware division in add-on
  assembler package under Visual C++ and GNU C++.

- All 64 bit test programs for Windows and Linux now are being
  compiled with macro _CBIGNUM_HARDWARE_DIV that enables
  use of hardware division.

- Programs Arifexp and Miller show information on bit count
  and methods used to accelerate computations.

- Updated time estimation for multiplication in program Arifexp.

- Documented option -mhz in program Arifexp.

- Test results for 32 bit C++ code with hardware multiplication.


Nov 30, 2017 - Version 2.1c beta public documentation update

- Performance test for Core i7-6800K and Phenom II X6.


Dec 19, 2020 - Version 2.1c public

- 40% faster assembler multiplication code for AMD Zen and Zen+
  processors using MULX instruction.

- Performance test for Athlon 200GE and Ryzen 5 2600.

- Beta test is finished.


Oct 23, 2022 - Version 2.1c public documentation update

- Corrected characteristics of Phenom II X6 in test data.

- Documented inline functions of Cbnl.h (Section 4.7).


Oct 12, 2023 - Version 2.1c public update

- FIXED: Including malloc.h under C++ 98 compatible compiler.


Oct 29, 2023 - Version 2.1c public update

- FIXED: Ambiguous stream operators << and <<= for CBNL long long.

- FIXED: Clang warnings dangling-else and undefined-bool-conversion.

- Cbignumc command file for clang++.

- Class uses standard iostream library for input-output if the
  compiler conforms to C++ 11 standard.

- More information in Section 1.2 of documentation.


Dec 31, 2023 - Version 2.2 beta internal

FOR EVALUATION AND TESTING, NOT FOR COMPUTATIONS

- Fast division by unlimited divider using hardware multiplication
  and division. Average 10-20 times faster on modern processors when
  compiling with add-on assembler package. It is possible to compile
  without add-on package under Microsoft Visual C++ 2019 and higher
  (except for programs, compatible with Windows XP/2003). New
  algorithm is selected by macro _CBIGNUM_SUBMUL_DIV, turned
  on automatically depending of compiling conditions.

- Average 7-14 times faster power by unlimited module on modern
  processors (with macro _CBIGNUM_SUBMUL_DIV).

- Average 5-10 times faster conversion of unlimited number to decimal
  string on modern processors (with macro _CBIGNUM_SUBMUL_DIV).

- Twice less expand of memory when dividing with table of shifts
  in 64 bit mode (experimental for Visual C++ with macro
  _CBNL_TAB_FULL, slower and not actual now as new algorithm does
  not require table of shifts at all, turned off by default).

- Test of division in Arifexp by alternative operation
  (with macro _CBIGNUM_SUBMUL_DIV).

- Additional checks in Arifexp for sign and greatness of remainder
  and module.

- Additional check in Arifexp for sign of remainder of square root.

- Tested compatibility with Microsoft Visual C++ 2019/2022.

- Support for double word dividing intrinsics of Visual C++ 2019
  and higher (except for programs, compatible with Windows XP/2003).

- Option to exclude all hardware multiplication operations by
  undefining macro CBNL_MUL in Cbnl.h.

- Updated Section 4.2 of documentation.

- FIXED: Index range error in debug mode when comparing zero
         denormalized number.

- FIXED: Programs Arifexp and Miller do not compile if
         _CBIGNUM_SMALL_DIV macro is not set.

- FIXED: Possibly incorrect work of random generator on some UNIX
         compilers with different long and long long types.


Jan 03, 2024 - Version 2.2 beta internal update

FOR EVALUATION AND TESTING, NOT FOR COMPUTATIONS

- FIXED: Dividing of numbers with equal length, CBNL_MAX high words
         and different signs.

- FIXED: Unavailable single-word hardware division in programs
         compiled for Windows XP.


Jan 13, 2024 - Version 2.2 beta internal update

FOR EVALUATION AND TESTING, NOT FOR COMPUTATIONS

- Updated estimation of computational time for division and power
  by module in program Arifexp.


Jan 17, 2024 - Version 2.2 beta internal update

FOR EVALUATION AND TESTING, NOT FOR COMPUTATIONS

- Increased size of buffer for remainder for internal functions
  cBigNumberMModDiv() and cBigNumberMMod() in _CBIGNUM_SUBMUL_DIV
  mode.

- Increased size of buffer for base for internal function
  cBigNumberPowMod() in _CBIGNUM_SUBMUL_DIV mode.

- FIXED: Disabled overlapping of module with base and result in
         internal function cBigNumberPowMod().

- FIXED: Requirements for size of buffer for result in internal
         functions cBigNumberMAddMulM() and cBigNumberMSubMulM().

- FIXED: Assert error in debug mode of new algorithm of division.


Jan 26, 2024 - Version 2.2 beta public candidate

FOR EVALUATION AND TESTING, NOT FOR COMPUTATIONS

Unset macro _CBIGNUM_SUBMUL_DIV when using this beta internal
test version for computational purposes.

- Test programs for Linux.

- FIXED: Assert error in debug mode for negative divider with
         high word -1.


Expected changes in version 2.2 beta public:

- Compiler independent support for new division algorithm.


Jan 26, 2024