Beyond Knuth’s notation for “Unimaginable Numbers” within computational number theory

Literature considers under the name unimaginable numbers any positive integer going beyond any physical application; this is more a vague description of what we are talking about than an actual mathematical definition (the term is indeed used in many sources without a proper definition). It simply means that research in this topic must always rely on shortened representations, usually involving recursion, to be able to describe such numbers at all. One of the best-known methodologies to conceive such numbers is using hyper-operations, that is, a sequence of binary functions defined recursively starting from the usual chain: addition - multiplication - exponentiation. The most important notations to represent such hyper-operations have been considered by Knuth, Goodstein, Ackermann and Conway, as described in this work's introduction. Within this work we will give an axiomatic setup for this topic, and then try to find, on the one hand, other ways to represent unimaginable numbers and, on the other hand, applications to computer science, where the algorithmic nature of representations and the increased computation capabilities of modern computers play a central role.


Introduction
Several methods and notations have been developed in the last century to work with, or better to try to grasp, very large numbers, for which in this paper we propose the name of unimaginable numbers. One of the best-known methodologies is the so-called Knuth up-arrow notation, introduced by D.E. Knuth in 1976 (see [11]) and strictly linked to the concepts of hyper-operation and Ackermann function (see [1], [16]). The idea of hyper-operation dates back to the early 1900s with A.A. Bennett (see [3]), and we subsequently find it again in a group of Hilbert's students such as W. Ackermann and G. Sudan. But the widespread contemporary names like tetration, pentation, hexation, or in general hyper-n operation, were introduced by R.L. Goodstein in 1947 (see [9]) and gained popularity through Rudy Rucker's book Infinity and the Mind [15], published in 1982. Knuth's up-arrow is not the only notation used today for very large numbers; there are in fact many other ways to write hyper-operators, among which we may recall:
• square bracket notation, box notation, and superscripts and subscripts notation (see [13] and [14]);
• Nambiar's notation (see [16]).
Moreover, we point out that there are numbers so enormous that even Knuth's notation and the previous ones are not sufficient to represent them.
For this purpose J.H. Conway introduced a more powerful notation based on recursivity, to write extremely large numbers. It is known as Conway's chained arrow notation (see for example [7]) and can be viewed as a generalization of Knuth's arrow notation: in fact, for a chain a → b → n, it is equivalent to a ↑^n b in Knuth's notation. Similarly, the Bowers' operator, also called the Bowers' exploding array function (see [4]), is a more powerful numeral system proposed by J. Bowers and published on the web in 2002, which generalizes hyper-operators. The Steinhaus-Moser notation (see [18]) is another way to express very big numbers by recursion. Its definition is in fact more intuitive than that of the hyper-operations (thus fitting well educational purposes), and thanks to its recursive properties it will be applied below to find an effective bound for certain couples in Goodstein's theorem.
A relevant link between unimaginable numbers and computer science is related to the so-called arbitrary-precision arithmetic and blockchain tools, as one can use such huge numbers to handle machine-computed big data. This work arose indeed from a discussion between the authors (during the preparation of "The First Symposium of the International Pythagorean School - da Pitagora a Schützenberger: numeri inimmaginabili"¹) about the use of gross-one, a recent definition of an arithmetical infinity (see [17], [5], [6] and the references therein), introduced in order to compute limits in a fashion similar to nonstandard analysis; this "infinite number" still has a slightly poor axiomatic definition behind it, so that in most applications it becomes more convenient to just consider a very big number, in fact an "unimaginable" one (more precisely its factorial, so that all "imaginable" numbers are its divisors, respecting one of gross-one's fundamental properties).
We will start the paper by giving a complete axiomatic definition of hyperoperators, linking it to Knuth's and Goodstein's notations. We will define the notion of meta-algorithm in order to define precisely the idea behind "repeating" an operation. After that, we will define a graph-theory representation of numbers linked to Goodstein's theorem (see [10]), which also has a simple set-theory interpretation when considering base 2, called the rooted tree representation, and we will determine in some cases an explicit recursive algorithm for the number of steps required to reach zero by the so-called "Goodstein sequences", as well as an effective bound for this number using Knuth's notation. We will conclude this work by applying various methods, among others from continued fractions (see [8]), to compare unimaginable numbers.

Historical notes
The basic arithmetical operations are defined recursively starting from the successor operation. The exponentiation, for instance, is a repeated multiplication. Knuth and Goodstein (see [11] and [9]) have further extended this definition, so that for example the tetration is a repeated exponentiation.

Arrow function definition
The work of Knuth and Goodstein can be formalized by the following general arrow function, defined for integers A, B ≥ 1 and k ≥ 1 by:
1. A ↑^1 B = A^B;
2. A ↑^k 1 = A;
3. A ↑^k B = A ↑^(k−1) (A ↑^k (B − 1)).
We can also add to the mix the following cases (satisfying recurrence law 3 as well): A ↑^0 B = A · B and A ↑^(−1) B = A + B. This is a slightly modified version of the original notation from Goodstein, to which it is related by the simple equality A ↑^k B = G(k + 2, A, B), G being Goodstein's hyper-n operation; Knuth's notation is very similar as well, writing A ↑^k B = A ↑…↑ B. The last one is a compact expression for A ↑ … ↑ B where A and B are separated by exactly k arrows.
One could also use the symbol ^ instead of each up-arrow, recovering the usual notation for exponentiation.
Important remark: beyond ordinary multiplication, none of the operations we have defined is commutative or associative, and by convention they are all evaluated in order from right to left (right associativity).
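As a sanity check, the recursive definition can be transcribed directly into code. The following sketch (the function name is ours, not part of the paper's notation) implements A ↑^k B by the recurrence above; it is feasible only for very small arguments, since the values explode almost immediately.

```python
def arrow(a, b, k):
    """Knuth arrow a ↑^k b for k >= 1, via the recurrence
    a ↑^1 b = a**b,  a ↑^k 1 = a,  a ↑^k b = a ↑^(k-1) (a ↑^k (b-1)).
    Only tiny inputs are computable: the growth is explosive."""
    if k == 1:
        return a ** b
    if b == 1:
        return a
    return arrow(a, arrow(a, b - 1, k), k - 1)

# 2 ↑↑ 4 = 2^(2^(2^2)) and the pentation 2 ↑↑↑ 3 both evaluate to 65536.
```

For instance, arrow(3, 3, 2) returns 7625597484987 = 3^27, while arrow(3, 4, 2), the tetration of Example 2.1 below, is already far beyond any feasible computation.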
Example 2.1. Let's compute the following tetration: 3 ↑↑ 4 = 3 ↑ (3 ↑ (3 ↑ 3)) = 3^7625597484987, which is a number with exactly 3638334640025 digits.

Example 2.2.
Let's compute the following pentation: 2 ↑↑↑ 3 = 2 ↑↑ (2 ↑↑ 2) = 2 ↑↑ 4 = 65536, which is for instance the number of characters which can be stored in a 2-byte system on a computer.

Remark 2.3 (Trivial towers).
The following equalities hold for any k ≥ 1: A ↑^k 1 = A, 1 ↑^k B = 1, 2 ↑^k 2 = 4, and A ↑^k 2 = A ↑^(k−1) A.

Steinhaus-Moser notation
See [18] for the original definition.
Definition 2.4. Steinhaus-Moser notation uses geometrical shapes to express big numbers: a number n surrounded by a triangle, a square, or a circle has the meanings below. Using a more functional notation, writing triangle, square and circle as functions (f^n means we compose f with itself n times), we define:
• triangle(n) := n^n;
• square(n) := triangle^n(n);
• circle(n) := square^n(n).
One could also use a regular pentagon instead of the circle and continue the sequence for any regular k-gon; we will denote the generalized Steinhaus-Moser notation by the recursive function SM_k, with SM_3(n) := n^n and SM_(k+1)(n) := SM_k^n(n).
Example 2.5. The number Mega is defined as circle(2), that is: circle(2) = square(square(2)) = square(256) = triangle^256(256), where the last expression contains already too many triangles to be computed explicitly.
Example 2.6. Another important number expressed with this notation is the Megiston, defined as circle(10).
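Under the reading SM_3(n) = n^n, SM_(k+1)(n) = SM_k^n(n) given above, the definition can be checked on the few values that are still computable. The helper below is a sketch of ours, not the paper's notation.

```python
def sm(k, n):
    """Generalized Steinhaus-Moser: SM_3 is the triangle (n^n); the (k+1)-gon
    applies the k-gon n times starting from n. Explodes immediately:
    already sm(5, 2), the Mega, is far beyond reach."""
    if k == 3:
        return n ** n
    result = n
    for _ in range(n):
        result = sm(k - 1, result)
    return result

# square(2) = triangle(triangle(2)) = triangle(4) = 256
```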

Meta-algorithms
All the operations we have considered give an "algorithm" to compute a natural number; we may construct a "meta-algorithm" by considering a string where an instance of [k: … ] means that the dotted part should be repeated k times; for instance, [3: 2 ↑] 5 means to construct the algorithm 2 ↑ 2 ↑ 2 ↑ 5, that is 2^(2^(2^5)). We write the meta-function "EXPAND" meaning that the bracketed string should be expanded with the rule just mentioned. We can now define a "generalized arrow function" as ↑(A, B, k, C) := EXPAND([B: A ↑^(k−1)] C), so for instance we have the previous "generalized tetration" ↑(2, 3, 2, 5) = 2^(2^(2^5)). In general ↑(A, B, k) = ↑(A, B, k, 1), so it is indeed a generalization of the previous definition.
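Under the reading ↑(A, B, k, C) = EXPAND([B: A ↑^(k−1)] C), the meta-algorithm amounts to folding A over C exactly B times with (k−1)-arrows; the sketch below (function names are ours) makes this explicit, assuming k ≥ 2 so that k − 1 indexes a genuine arrow.

```python
def arrow(a, b, k):
    """a ↑^k b, with k = 1 the ordinary exponentiation."""
    if k == 1:
        return a ** b
    if b == 1:
        return a
    return arrow(a, arrow(a, b - 1, k), k - 1)

def gen_arrow(a, b, k, c=1):
    """Generalized arrow ↑(A, B, k, C): expand the string [B: A ↑^(k-1)] C,
    i.e. apply x -> a ↑^(k-1) x to c, b times in a row. Assumes k >= 2."""
    result = c
    for _ in range(b):
        result = arrow(a, result, k - 1)
    return result

# gen_arrow(2, 3, 2, 2) expands to 2 ↑ (2 ↑ (2 ↑ 2)) = 2^16 = 65536,
# and gen_arrow(a, b, k, 1) recovers a ↑^k b.
```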

Binary case
We consider the set T whose elements are the hereditarily finite sets: the empty set ∅ belongs to T, every finite collection of elements of T is again an element of T and, vice-versa, any element of T contains only elements of T, without infinite descending membership chains.
This set has the following properties: • Any element t ∈ T can be associated to a rooted tree: one recursively builds the tree for each element of t, and then connects their roots to a new root for t itself. This tree is also unredundant, in the sense that different branches of the same node are distinct (from the fact that elements in a set are all different from each other). By this definition, the tree associated to the emptyset will be a root with no branches.
• A height function H : T → N is defined as H(∅) := 0 and H(t) := 1 + max{H(s) : s ∈ t}, which is well defined from the assumption on descending chains.
• There is an "algorithmic" bijection f : T → N defined recursively by f(A) := Σ_{a∈A} 2^(f(a)), so that in particular f(∅) = 0. Before going further we briefly prove bijectivity. Indeed, we must prove that f(A) = f(B) implies A = B, by induction on max(H(A), H(B)). By the uniqueness of the binary expansion for natural numbers, f(A) and f(B) have the same non-zero digits, which correspond to elements a ∈ A, b ∈ B, where f(a) and f(b) give the positions of the digits. For each such couple we must have f(a) = f(b), and by the inductive assumption we deduce a = b, so that A and B must have the same elements. QED
Using this bijection we are authorized, from now on, to not distinguish between A and f(A). We define:
M_k := max{ f(A) : H(A) ≤ k },   m_k := min{ f(A) : H(A) = k }.
The first one is obtained when A contains all possible elements t of height < k; thus M_k = 2^(M_(k−1)+1) − 1. The second one is instead obtained by the recursion m_k = 2^(m_(k−1)). Considering the recursive sequence a_1 := 1, a_(k+1) := 2^(a_k), one can immediately prove by induction that M_k = a_(k+1) − 1 and m_k = a_k, so that M_k + 1 = m_(k+1) and the height is proven to be a non-decreasing function of the associated natural number. Using Knuth's up-arrow notation, we have m_k = 2 ↑↑ (k − 1), so that every element of T is found in a specific interval depending on its height: 2 ↑↑ (k − 1) ≤ f(A) < 2 ↑↑ k whenever H(A) = k. The associated rooted tree can be drawn by writing on each node the integer corresponding to its branch.
Remark 3.2. With the usual notation P(A) := {X ⊆ A}, we notice that for any k ≥ 1 the following facts hold: #P(A) = 2^(#A) and hence #P^k(A) = 2^(#P^(k−1)(A)). Summing up those results, we have that the tetration 2 ↑↑ (k − 1) = m_k represents exactly the cardinality of the set P^k(∅). More generally, the "generalized tetration" gives the cardinality of the nested power set: #P^k(A) = ↑(2, k, 2, #(A)).
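The bijection and the height function are directly implementable with frozensets; the sketch below (helper names are ours) lets one verify, for instance, that m_4 = 16 = 2 ↑↑ 3 and m_5 = 65536 = 2 ↑↑ 4 are indeed the minimal numbers of heights 4 and 5.

```python
def f(s):
    """The bijection T -> N: f(A) = sum over a in A of 2^f(a), so f(emptyset) = 0."""
    return sum(2 ** f(a) for a in s)

def decode(n):
    """Inverse of f: the set bits of n name the (decoded) elements."""
    return frozenset(decode(i) for i in range(n.bit_length()) if n >> i & 1)

def height(s):
    """H(emptyset) = 0, H(t) = 1 + maximum height of the elements of t."""
    return 0 if not s else 1 + max(height(t) for t in s)

# decode(16) = {{{{Ø}}}}, the minimal set of height 4, since m_4 = 2 ↑↑ 3 = 16.
```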

Comparison
Comparing two elements A, B ∈ T is performed with the following rule: one recursively compares the elements of the symmetric difference A Δ B and puts them in order; if its biggest element comes from A, then A is the bigger number, otherwise B is.

Remark 3.3.
For this purpose, and for later purposes as well, we remind that (since we are talking about sets) the order of the elements in theory doesn't matter; in practice, however, one should consider every set as already ordered, so that finding the biggest element becomes an easy task.
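The comparison rule can be executed without ever converting back to integers; below is a sketch (names are ours) where the recursive order itself is used to pick the maximum of the symmetric difference, and then checked against the usual order on N.

```python
from functools import cmp_to_key

def decode(n):
    """Tree of n under the binary-expansion bijection (hypothetical helper)."""
    return frozenset(decode(i) for i in range(n.bit_length()) if n >> i & 1)

def cmp_tree(a, b):
    """Compare A, B in T: the largest element of the symmetric difference
    A ^ B decides; that maximum is found using this very comparison,
    recursively, on strictly smaller trees."""
    if a == b:
        return 0
    top = max(a ^ b, key=cmp_to_key(cmp_tree))
    return 1 if top in a else -1
```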

Successor
We want to compute s(A) for some A ∈ T. If A = M_k for some k, then one directly has s(A) = m_(k+1). Otherwise, let n_A be the unique natural number such that every h < n_A belongs to A while n_A itself does not; its insertion does not increase the height precisely because A is not an M_k. Then one just has to remove every h smaller than n_A from A and insert the element n_A instead.

Addition
The sum of A and B is obtained by joining their elements; if an element t appears twice, one performs a carry, inserting the element s(t) instead of the two copies, which could in turn require another carry.
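On the coded sets, successor and addition become the familiar binary increment and carry propagation; the sketch below (helper names are ours) validates both against ordinary integer arithmetic through the bijection f.

```python
def f(s):
    """The bijection T -> N (hypothetical helper)."""
    return sum(2 ** f(a) for a in s)

def decode(n):
    """Inverse of f."""
    return frozenset(decode(i) for i in range(n.bit_length()) if n >> i & 1)

def succ(a):
    """s(A): remove every tree below the least absent one, insert that tree."""
    n = 0
    while decode(n) in a:
        n += 1
    return (a - {decode(i) for i in range(n)}) | {decode(n)}

def add(a, b):
    """Join the elements; each duplicated element t becomes a carry s(t),
    and the loop repeats until no carries remain."""
    while b:
        a, b = a ^ b, frozenset(succ(t) for t in a & b)
    return a
```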

Multiplication
To multiply A and B one considers all the pairwise sums of exponents, A · B = Σ_{a∈A, b∈B} 2^(f(a)+f(b)) (with carries handled as above), which in the usual representation would mean multiplying the two binary expansions digit by digit.
More formally, after fixing the base b, one considers the following types of strings:
• EMPTY: an empty string, representing 0;
• SUM: any number of DIGIT strings (see below) separated by the usual "+" symbol and having different exponents, representing the sum of the values of the DIGIT components;
• MISC: an EMPTY or SUM string;
• DIGIT: a digit 0 ≤ d < b followed by a MISC string in brackets representing some number s (called the "exponent"), the whole having value d · b^s.
The final string m has the MISC form, and is associated to a uniquely determined value in N (precisely the number represented by m). This kind of approach is typical of computer science definitions for metadata (see for example [2]).
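A minimal evaluator for the MISC/SUM/DIGIT grammar can be sketched as follows; instead of parsing actual strings, this encoding of ours models a MISC term as a list of (digit, exponent-term) pairs, EMPTY being the empty list.

```python
def value(misc, b):
    """Value of a MISC term in base b: EMPTY -> 0, while a SUM of DIGITs d(s)
    contributes d * b**value(s) for each component."""
    return sum(d * b ** value(s, b) for d, s in misc)

# In base 3, the term below plays the role of the string "2(1()) + 1()",
# i.e. 2 * 3^1 + 1 * 3^0 = 7.
seven = [(2, [(1, [])]), (1, [])]
```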

Remark 3.4. We recall that, again, order doesn't matter in SUM strings, which is the reason we keep using the plus symbol as a separator; for computational purposes, however, one should always consider sums ordered by the digits' exponents.
We may also consider rooted trees again, where now the connections between nodes are labeled with a digit from 1 to b − 1.

Example 3.5. Using as "labels" the colors blue = 1 and red = 2, every number has a representation as a labeled rooted tree together with its bracketed algorithm. We notice that also in this case we can define the height of a graph, and that the sequences of minimum/maximum elements with a given height can be found as well.

Goodstein's theorem
Goodstein's theorem (see [10]) has an interesting interpretation within the topic of the rooted tree notation. We recall that Goodstein's theorem involves the function which, given a couple (b, A) formed by a base b ∈ N and a rooted tree in that base, returns the couple (b + 1, A′), where the tree A′ is obtained by rereading A in the new base b + 1 and then decreasing it by 1.
Goodstein's theorem says that, iterating this function, one eventually stops at the value 0, whatever the first element to which it is applied, and even though the function increases dramatically for almost every element. The proof relies on substituting every base with the ordinal ω, so that the values obtained by this iteration form a strictly decreasing sequence of ordinals, which we know must stop somewhere, and the only possibility is 0. The rooted tree representation makes clear why the function is decreasing, as any natural number involved in the representation is less than ω in the theory of ordinals.
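The iteration can be watched directly on small seeds (anything beyond the seed 3 quickly becomes astronomically long). The sketch below (names are ours) rewrites the hereditary base-b representation in base b + 1, subtracts 1, and counts the steps until 0.

```python
def hereditary(n, b):
    """Hereditary base-b representation: list of (digit, exponent-term) pairs,
    where each exponent is itself represented hereditarily."""
    terms, e = [], 0
    while n:
        n, d = divmod(n, b)
        if d:
            terms.append((d, hereditary(e, b)))
        e += 1
    return terms

def value(terms, b):
    """Evaluate a hereditary representation in base b."""
    return sum(d * b ** value(e, b) for d, e in terms)

def goodstein_steps(n, b=2):
    """Steps until the Goodstein sequence of n (starting base b) reaches 0.
    Do not try n = 4: the answer is already unimaginably large."""
    steps = 0
    while n:
        n = value(hereditary(n, b), b + 1) - 1
        b += 1
        steps += 1
    return steps
```

Starting from 3 in base 2, the values are 3, 3, 3, 2, 1, 0 over the bases 2 to 7, i.e. five steps.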
We also point out that reinterpreting the proof using rooted trees doesn't actually require ordinal theory: geometrical properties of rooted trees should be enough to prove the assertion without even involving the base, and this could indeed be studied in a more detailed future work on the topic.
We conclude this section by calculating an effective bound for some Goodstein sequences.
Theorem 3.6. Let b > 1 and b̄ := b − 1. We denote by B_k(b) (k < b) the base at which the couple (b̄(k) + … + b̄(1) + b̄(), b) reaches the stopping value −1 (since every step increases the base by 1, the number of steps equals B_k(b) − b). Then we have an explicit recursion describing this function: B_0(b) = 2b and B_k(b) = B_(k−1)^b (b), where the latter exponent means one should repeatedly apply b times the function B_(k−1). For example, B_1(b) = B_0^b(b) = 2^b · b.

Corollary 3.6. If A is a tree in base b > 2 with height H(A) ≤ 2, then Goodstein's algorithm applied to the couple (A, b) reaches the stopping point (−1, B) with B bounded through the generalized k-agon Steinhaus-Moser function SM_k (see Definition 2.4), the last inequality coming from Corollary 4.9 proved below.
We remark that this corollary tells us that B − b − 1 is an effective bound for the number of steps the algorithm needs to reach 0.
Proof 3.6. The first equality comes from the fact that every step decreases the only digit by 1 while increasing the base by the same amount; thus going from the digit b − 1 to −1 requires b steps, which increase the base from b to 2b. The second one derives from the fact that, every time the biggest digit decreases by 1, the other k − 1 digits pose the same problem with a base updated by applying the function B_(k−1), and this has to be done b times.
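For k ≤ 1 the recursion of Theorem 3.6 can be cross-checked by brute force; the sketch below (names are ours) runs the actual Goodstein steps on numbers whose base-b exponents stay below every base encountered, so that plain digit rewriting suffices, and compares the stopping base with B_k.

```python
def stop_base(n, b):
    """Iterate 'reread the digits in base b+1, subtract 1' until the value -1
    is produced; return the base at which that happens. Valid as long as the
    exponents in play stay below the base."""
    while n >= 0:
        m, e, val = n, 0, 0
        while m:
            m, d = divmod(m, b)
            val += d * (b + 1) ** e
            e += 1
        n = val - 1
        b += 1
    return b

def B(k, b):
    """The recursion of Theorem 3.6: B_0(b) = 2b, B_k(b) = B_(k-1)^b(b)."""
    if k == 0:
        return 2 * b
    r = b
    for _ in range(b):
        r = B(k - 1, r)
    return r
```

For instance, the couple (b² − 1, b), i.e. digits b̄ at the exponents 1 and 0, stops exactly at base B_1(b) = 2^b · b.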
To prove the corollary, it is enough, as is known, to do it for A = b̄(b̄) + … + b̄(1) + b̄() = M_2, and we notice that in this case the bound follows from the theorem.

Proof 4.1. It is well known that x lies between the approximant b/a and the next one, so that the assertion follows immediately.

Lemma 4.2. Given A < B ∈ N such that x = ln B / ln A is an irrational real number, the continued fraction approximants b/a of x are such that the ratio of A^b and B^a is bounded between e^(−ε) and e^(ε), where ε := (ln A)/b.

Proof 4.2. By Lemma 4.1 we have the corresponding bound on |x − b/a|, and we conclude by observing that e^(±ε) = A^(±1/b) by the definition of ε.

Undistinguishable numbers
See also the introduction to [12].
Proposition 4.3. If, in the setting of Lemma 4.2, ε < 0.5 · 10^(−(k+1)), then A^b and B^a are k-undistinguishable, in the sense that in scientific notation they have the same expression considering only the first k or k + 1 significant digits of their decimal expansion.

Proof 4.3. Two numbers whose ratio is bounded by 1/(1 − 0.5 · 10^(−(k+1))) ≈ 1 + 0.5 · 10^(−(k+1)) ≈ exp(0.5 · 10^(−(k+1))) are sure to have the same scientific notation expression up to the (k + 1)-th significant digit, possibly differing in the last one (including the possibility of a carry); in this case the (k + 1)-th digit must be the same (the difference between the two approximations is bigger than double the difference of the two numbers) and we will have the same approximation to the k-th digit instead. Now we can apply Lemma 4.2, where by hypothesis ε < 0.5 · 10^(−(k+1)), so that the ratio of A^b and B^a is bounded by e^ε, i.e. the number we just talked about, and we know already that in this case the thesis holds.
Being b = 16785921 > ln 2 · 2 · 10^7 ≈ 13862944, we know that 2^16785921 and 3^10590737 are 6-undistinguishable powers, and indeed both have the following expressions in scientific notation:
5.3191952… · 10^5053065 ≈ 5.31920 · 10^5053065
5.3191955… · 10^5053065 ≈ 5.31920 · 10^5053065
that is, they give the same approximation to the 6-th digit (one of them actually rounds to 5.319196 at the 7-th digit, so we must take one digit less for the exact correspondence).
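This pair of 6-undistinguishable powers can be checked without ever writing down the 5-million-digit numbers: high-precision logarithms give the scientific-notation mantissa directly. A sketch (names are ours) using the standard decimal module:

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # plenty of guard digits for a 7-digit mantissa

def sci(base, exp):
    """Mantissa and decimal exponent of base**exp, via x = exp * log10(base):
    the integer part of x is the exponent, 10^(fractional part) the mantissa."""
    x = Decimal(exp) * Decimal(base).ln() / Decimal(10).ln()
    e = int(x)
    return Decimal(10) ** (x - e), e

m1, e1 = sci(2, 16785921)
m2, e2 = sci(3, 10590737)
# Both powers live at 10^5053065, with mantissas agreeing to 5.31919...
```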

Comparing Knuth and Steinhaus-Moser notations
We will consider only positive integers when not specified otherwise. Moreover k will be a counter ranging from 0 to n.
Proof 4.5. The first inequality is straightforward, as we have by induction that n in k triangles is always ≥ n ↑↑ (k + 1).
Both inductions start from the case k = 1, for which all three quantities are trivially equal to n^n (using the rules from Remark 2.3). We prove more specifically a sharper estimate: the original one is then a tower one level higher but with every A + 1 replaced by A, thus abundantly bigger. We proceed by induction, after checking that the case C = 1 is trivial; the induction step then reduces to an elementary inequality.
We point out that both inequalities become equalities when k = 0 (using the rules from Remark 2.3).

Lemma 4.8.
When k ≥ 2 one has: Proof 4.8. We start by excluding the trivial cases A = 1 or B = 1. We proceed by induction on k, remarking that Lemma 4.6 gives the starting case k = 2, and supposing that the assertion already holds for k − 1. We will use the abbreviation E := A ↑^k (B − 1).
We prove more specifically a sharper estimate: the original one is then a tower of ↑^(k−1)-hyperoperations one level higher but with every A + 1 replaced by A, thus abundantly bigger. We now proceed by induction on C, after checking that the case C = 1 is trivial; the induction step then follows immediately, as expected.