Saturday, December 27, 2014

Measuring Kernel Density

As previously discussed, we can compute the expected kernel density S0 of a mask list of size Q, where Q is usually but not necessarily a power of 2. But this is only useful if we can also measure the actual kernel density S of an arbitrary such list.

To do this, let's revisit RandomMapGet[] and declare it at the "In[1]:=" prompt:
__________________________________________________________________________
In[1]:= RandomMapGet[bitcount_]:=(intcount0=BitShiftLeft[1, bitcount]; intmax0=intcount0-1; randommap0=Array[RandomInteger[intmax0]&, intcount0]; Return[randommap0])

In[2]:= bitcount=16;

In[3]:= randommap=RandomMapGet[bitcount];
__________________________________________________________________________

Now randommap contains (Q = (2 ^ 16)) 16-bit masks.

How large is its kernel? To answer this question, we need to discover which of all possible 16-bit masks come back to themselves under iteration. In other words, how many of them are on limit cycles?

We could in theory try every single mask and test whether or not it returns to itself in at most Q iterations. But that would take a long time.

A faster way to do this is to use double buffers, notionally map X and map Y:

1. Create bitmap Z consisting of all zeroes.

2. Set map X to the input map, which is randommap in this case.

3. Start of loop: one index at a time, set Y[i] to X[X[i]] (assuming zero-based indexes, which is annoyingly not the case in Mathematica).

4. After Y has been completely populated thus, do the reverse: one index at a time, set X[i] to Y[Y[i]].

5. Copy Z to Z'.

6. Evaluate the "profile" of X: set bits in Z corresponding to masks present in X; if the mask is present one or more times, then its bit is one, else zero.

7. Loop back to step 3 until (Z = Z').

8. Count the number of ones in Z and divide by Q; this is the kernel density S.

This could probably be optimized further, but it's fast enough for some useful minimodel analysis. Here's the code:
__________________________________________________________________________
KernelDensityGet[bitcount_, samplemap_]:=(intcount1=BitShiftLeft[1, bitcount]; iteratedmapa1=samplemap; iteratedmapb1=samplemap; bitmap1=BitShiftLeft[1, intcount1]; profilebitmap1=bitmap1; profilebitmapprev1=BitOr[bitmap1, 1]; While[profilebitmap1!=profilebitmapprev1, profilebitmapprev1=profilebitmap1; For[i1=1, i1<=intcount1, i1++, iteratedmapb1[[i1]]=iteratedmapa1[[iteratedmapa1[[i1]]+1]]]; For[i1=1, i1<=intcount1, i1++, iteratedmapa1[[i1]]=iteratedmapb1[[iteratedmapb1[[i1]]+1]]]; profilebitmap1=bitmap1; For[i1=1, i1<=intcount1, i1++, bit1=iteratedmapa1[[i1]]; If[BitGet[profilebitmap1, bit1]==0, profilebitmap1=BitSet[profilebitmap1, bit1]]]]; s1=(DigitCount[profilebitmap1, 2, 1]-1)/intcount1+0.; Return[s1])
__________________________________________________________________________
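If the one-liner above is hard to parse, here is a functionally equivalent sketch in more conventional style (the names are ad hoc, and it uses Union[] in place of the bitmap profile):
__________________________________________________________________________
(* A sketch equivalent to KernelDensityGet[]: repeatedly compose the map with
   itself and stop once its set of surviving masks (the "profile") stops
   shrinking; that surviving set is the kernel. Masks are 0-based but stored
   in a 1-based Mathematica list. *)
KernelDensitySketch[samplemap_List]:=Module[{map=samplemap, q=Length[samplemap], profile, profileprev},
  profile=Union[map];
  profileprev={};
  While[profile=!=profileprev,
    profileprev=profile;
    map=map[[map+1]];      (* map <- map composed with itself *)
    map=map[[map+1]];      (* square again, as in the double-buffer loop *)
    profile=Union[map]];   (* masks still present after iteration *)
  N[Length[profile]/q]]
__________________________________________________________________________
KernelDensitySketch[randommap] should return the same value as KernelDensityGet[bitcount, randommap].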

Now run it on the previously generated randommap (your answer will vary):
__________________________________________________________________________
In[4]:= s=KernelDensityGet[bitcount, randommap]

Out[4]= 0.0055542
__________________________________________________________________________

You can compute S0 from KernelDensityExpectedGet[] at the aforelinked post. It turns out to be 0.00489068 for 16 bits. Thus the actual kernel density in my one random test above exceeded expectation by about 14%. Unlike the dyspoissonism floor, which is a lower bound, the actual kernel density can be greater or less than the expected value. I have no idea whether or not 14% is "good enough", but the important thing is that (1) by iterating this process many times, we could obtain a very accurate estimate of adherence to expected kernel density, and (2) we have yet another "hard" randomness evaluation tool in our hands by which to measure the relative merits of various PRNGs. Furthermore, kernel skew provides a normalized method of comparing the kernel density among disparate mask lists.
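For example, a crude estimate of the mean kernel density over several freshly generated random maps could be had as follows (just a sketch using the functions above; the trial count of 8 is arbitrary, and the loop takes a while at 16 bits):
__________________________________________________________________________
(* Average the measured kernel density over 8 independent random maps, for
   comparison against the expected value S0. *)
Mean[Table[KernelDensityGet[bitcount, RandomMapGet[bitcount]], {8}]]
__________________________________________________________________________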

The Kernel Density Randomness Metric

Apart from dyspoissonism, another powerful statistical analysis tool is what I call "kernel density". It measures the fraction of all possible masks which remain after a mask list is iterated through itself until every mask in the list is on a limit cycle (potentially, one of many such cycles). Kernel density assumes a mask list with Q block numbers and Q possible masks, as discussed here. We refer to this metric as "order-Q kernel density". Q is usually a power of 2, but this is not required. And remember that the mask count refers to masks which could occur, but do not necessarily do so. For example, the mask list might consist of the 19-bit outputs of a TRNG or PRNG, each of which is guaranteed to be on [0, (2^19)-1], but none of which are guaranteed to occur.

This mask list, by definition, maps a block number B to a mask M, i.e.

B -> M(B)

But we could take this a step further by treating M as just another block number:

B -> M(B) -> M(M(B))

We could iterate a third time, etc. At some point, because there are at most Q unique masks (and usually about QR, where R is the reflexive density constant), we would repeat a previous mask, whereupon the sequence would continue just as it had after its previous occurrence. We could iterate indefinitely, but would never escape this sequence. The period of the sequence could be as short as 1 (e.g. 5 -> 5...) or as long as Q. We call such a repeating sequence a "limit cycle" of the mask list.
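As a quick illustration (a throwaway sketch, not part of the analysis proper), here is one way to expose the limit cycle that a single mask falls into, using a toy 8-bit mask list:
__________________________________________________________________________
(* Follow mask 0 through a random 8-bit mask list until some value repeats;
   the values from the first occurrence of that repeated value onward form
   the limit cycle. Masks are 0-based but stored in a 1-based list. *)
map=RandomInteger[{0, 255}, 256];
trajectory=NestWhileList[map[[#+1]]&, 0, UnsameQ, All];
cyclestart=First[Flatten[Position[trajectory, Last[trajectory]]]];
Length[trajectory[[cyclestart;;-2]]]   (* the period of this particular limit cycle *)
__________________________________________________________________________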

How many 1-cycles, 2-cycles... etc. should we expect? In other words, what is the expected frequency distribution (histogram) of the limit cycle lengths? While there is surely a precise (and complicated) answer to this question, it would not likely constitute a useful randomness quality metric, on account of the wide variability among cycle lengths -- even assuming an ideal TRNG.

However, the density of the union of all limit cycles (said union being called the "kernel") with respect to Q possible masks, is informative. By the way, all limit cycles are disjoint because each mask can only point to one other mask when it is interpreted as a block number. So what I call the "kernel density", S, is then simply the number of unique masks in the kernel, divided by Q. Put another way, S is the probability that a random whole less than Q happens to be a mask which actually occurs and furthermore appears in the kernel.

What is the expected value, S0, of S under conditions of an ideal order-N TRNG? In order to answer this question, we need to look at the probability that some given mask M is a member of a limit cycle of any particular length.

First of all, what's the probability P1 that M belongs to a 1-cycle, e.g. M -> M...? This is obviously

P1 = (1 / Q)

because given Q theoretically possible choices of mask, only one of them equals M.

And what about the probability P2 that M belongs to a 2-cycle? That would be:

P2 = (1 / Q)(1 - (1 / Q))

because the second mask must equal M (probability (1 / Q)), but first must not equal M (probability (1 - (1 / Q))). Similarly, we can compute the probability P3 that M belongs to a 3-cycle:

P3 = (1 / Q)(1 - (1 / Q))(1 - (2 / Q))

By extension, the probability P[C] that M belongs to a C-cycle, where C is on [2, Q] is:

P[C] = (1 / Q)(Π(i = (1, (C - 1)), (1 - (i / Q))))

or, recursively:

P[C + 1] = P[C](1 - (C / Q))

Thus the overall probability S0 that M is part of any limit cycle -- equivalently, that it belongs to the kernel -- is given by:

S0 = (1 / Q)(1 + Σ(i = (2, Q), Π(j = (1, (i - 1)), (1 - (j / Q)))))

because all P[i] correspond to mutually exclusive events, and are thus additive. S0 is, by definition, the expected kernel density.

First of all, the above formula for S0 is given in an intuitive form, but please don't compute it that way! Dyspoissometer computes it very efficiently using logarithmic summation and other optimizations. See dyspoissometer_kernel_density_expected_fast_get() and dyspoissometer_kernel_density_expected_slow_get() in the source code.
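That said, for a toy Q the intuitive form is cheap enough to serve as a sanity check (a sketch only; the names are ad hoc):
__________________________________________________________________________
(* Evaluate the S0 formula directly for Q = 8 using exact arithmetic; the
   result (about 0.4056) should agree with KernelDensityExpectedGet[3] below,
   up to floating-point rounding. *)
q=8;
s0exact=(1/q)(1+Sum[Product[1-j/q, {j, 1, i-1}], {i, 2, q}]);
N[s0exact]
__________________________________________________________________________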

Now let's make some actual computations. Here is some rather obtuse MathKernel code for doing so, which adheres to my usual weird coding standards. By the way, you can use the LogGamma[] Mathematica function to do this more efficiently and accurately:
__________________________________________________________________________
KernelDensityExpectedGet[bitcount_]:=(intcount0=BitShiftLeft[1, bitcount]; s0=1.; product0=s0; intcountrecip0=s0/intcount0; For[i0=1, i0<intcount0, i0++, product0*=1-i0*intcountrecip0; s0prev0=s0; s0+=product0; If[s0==s0prev0, i0=intcount0]]; s0*=intcountrecip0; Return[s0])
__________________________________________________________________________

So what is S0 for an order-32 mask list, for example?
__________________________________________________________________________
In[1]:= KernelDensityExpectedGet[bitcount_]:=(intcount0=BitShiftLeft[1, bitcount]; s0=1.; product0=s0; intcountrecip0=s0/intcount0; For[i0=1, i0<intcount0, i0++, product0*=1-i0*intcountrecip0; s0prev0=s0; s0+=product0; If[s0==s0prev0, i0=intcount0]]; s0*=intcountrecip0; Return[s0])

In[2]:= KernelDensityExpectedGet[32]*2^32

Out[2]= 82136.9
__________________________________________________________________________
We multiplied S0 by (Q = (2 ^ 32)) in order to obtain the actual number of masks expected to appear in the kernel. The answer is approximately 82,137. Note that although this value is on the order of ((Q ^ 0.5) = 65,536), it's clearly larger than the LMICBR of 77,163. That's because kernel density isn't LMICBR, although the concepts are related.

In the limit that Q approaches infinity, I conjecture that:

(Q * S0) / (((πQ / 2)^0.5) - (1 / 3)) -> 1

which is based on an approximation I found here. In this case, the above approximation returns about 82136.9 -- virtually identical to the Mathematica value. The (1 / 3) is most likely the first in an infinite series of rational values which serve as error correction terms to the approximation.
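Here is a quick check of that conjecture against the direct computation (a sketch; approx[] is just a throwaway name):
__________________________________________________________________________
(* Compare the conjectured asymptotic form to the directly computed value of
   Q*S0 for Q = 2^32. *)
approx[q_]:=Sqrt[Pi*q/2]-1/3;
N[approx[2^32]]                    (* about 82136.9 *)
KernelDensityExpectedGet[32]*2^32  (* 82136.9, per Out[2] above *)
__________________________________________________________________________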

Now I suppose you're wondering how we can efficiently measure the kernel density of "any old" mask list with Q block numbers and Q possible masks. That's the topic of the next post.

Tuesday, December 16, 2014

Asymptotic Symmetric Security with Lambda-One Poisson Distributions

There is a critical question in the development of cryptographic systems, and in the subject of entropy in general: the computational complexity of invertibility. Indeed, this question is at the core of the famous P-vs-NP question. Here we investigate the relationship between Poisson distributions and invertibility.

To begin with, we can think of most symmetric ("shared-secret") cryptosystems as pseudorandom mask set generators, wherein a set of 2^N block numbers are mapped to equally many xor masks, each of which is to be xored to a given plaintext block in order to obtain a ciphertext, and xored once again to invert the process. (Xor is thus trivially invertible. The inversion problem we will explore below is the problem of deriving the block number from a xor mask, given a partially known plaintext from which a mask fragment is easily obtained.) For their part, the block numbers increase by ones from zero to (2^N - 1). Each block number is a function of an initial value (IV) transmitted in the clear, a private symmetric (shared) key, and perhaps other information such as a subblock value relative to the base of an encrypted file. What's important in this definition is that, given a block number B, anyone can easily generate its corresponding mask M(B); furthermore, the next subblock would be encrypted with M(B+1); and so on. To put it simply, if an attacker is able to derive the block number corresponding to any subblock of an encrypted file, then he can trivially decrypt the entire file. To be clear: the base of a file is defined as having subblock zero, not block zero.

Realistically, small subsets of encrypted files are often known to an attacker. By xoring these plaintext fragments to the ciphertext, mask fragments may in turn be obtained. The question here is, given a "few" such fragments (or, as we may as well assume, so much as a single mask fragment long enough to be unique), how difficult would it be for the attacker to obtain the corresponding block number, thereby enabling decryption in polytime?

We can begin by considering mask set "invertibility", which refers to the property of a function which permits it to be partially or fully inverted. We might, for example, have the following (horribly weak) 3-bit encryption scheme, which maps a block number B to a xor mask M:

B -> M
000 -> 010
001 -> 111
010 -> 011
011 -> 101
100 -> 000
101 -> 100
110 -> 001
111 -> 110

So for instance, suppose we need to encrypt block 2 (010). Using this table, we find that we must xor the plaintext with 011. A peer with knowledge of the same table can decrypt by applying this same xor again. (Equivalently, we could map a 3-bit plaintext to a 3-bit ciphertext by directly translating the former via the above table; this is however insecure in practice due to likely statistical biases in the plaintext.)
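Here is the same toy scheme in code form (a sketch only; the list below simply transcribes the M column above):
__________________________________________________________________________
(* The 3-bit table above as a 1-based Mathematica list indexed by block
   number B, plus xor encryption and decryption of a single block. *)
masktable={2^^010, 2^^111, 2^^011, 2^^101, 2^^000, 2^^100, 2^^001, 2^^110};
encrypt[b_, plain_]:=BitXor[plain, masktable[[b+1]]];
decrypt[b_, cipher_]:=BitXor[cipher, masktable[[b+1]]];
decrypt[2, encrypt[2, 2^^110]]==2^^110   (* True: block 2 round-trips *)
__________________________________________________________________________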

Thus xoring twice undoes the encryption; in this sense, the encryption is invertible in a manner which would succeed regardless of how we might change the values in the M column. But again, as mentioned above, it is instead the inversion of the mask set which is informative as to the degree of security. In this case, all values of M can be inverted to unique values of B. For this reason, this encryption algo is not unlike AES, in the sense that it's (perfectly) invertible. Such algos are said to be "permutative" because the mask set (i.e. the M column above, viewed as a list indexed by B) constitutes a permutation of {0, 1, 2... (Q - 1)}, where Q = (2 ^ N) and N is the number of bits in each block number as well as each mask. Although Q need not be a power of 2, we will assume so because this is always the case in practice, and it simplifies the analysis.

For years (and even largely today), such permutative encryption has often been considered ideal because it is optimally "fair", in the sense that no particular mask occurs more or less often than any other. As such, in the limit of infinite Q, the ciphertext would indeed appear to be noise from the perspective of any observer with much-less-than-Q computational resources. Indeed, the essence of good encryption is to render the impression of random noise despite assurance of invertibility.

But there is a problem: given enough chances, true random ("real world") unbiased N-bit binary noise will repeat itself with certainty after (Q + 1) masks, and probably after only O(Q^0.5) masks. In other words, the same M value will occur twice, or even more times, in the mask set. So in the above example, the following mask set would be slightly more credible:

B -> M
000 -> 010
001 -> 111
010 -> 011
011 -> 101
100 -> 000
101 -> 100
110 -> 001
111 -> 111

because 111 occurs twice (and 110 does not occur). With only 8 masks, the difference is trivial, but in the limit of large N, the difference is instrumental to maintaining or destroying the illusion of true random noise. More importantly, these subtle distinctions can serve as canaries for weak mathematical structure which may not be obvious.

First, some definitions: the "frequency" F of a mask M is the number of times which it occurs in the entire mask set -- for example, 2 in the case of 111. The "population" P of a frequency is the number of M values having frequency F; in this case, (F = 0) has (P = 1) because only 110 is missing, (F = 1) has (P = 6), and (F = 2) has (P = 1). The population of every higher F value is zero.
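These definitions are easy to check in code (a sketch; maskset below is just the M column of the second table):
__________________________________________________________________________
(* Frequency F of each of the 8 possible masks in the second table above,
   followed by the population P of each frequency. *)
maskset={2^^010, 2^^111, 2^^011, 2^^101, 2^^000, 2^^100, 2^^001, 2^^111};
freqs=Table[Count[maskset, m], {m, 0, 7}];
Tally[freqs]   (* {F, P} pairs: {{1, 6}, {0, 1}, {2, 1}} *)
__________________________________________________________________________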

What is more, there is a very specific expected distribution of P. (The expected value of F is simply one, because the number of possible masks is equal to the number of block numbers, given that Q is a power of 2 and we're dealing in bits.) In the specific case of mapping (2 ^ N) unique values of B to equally many but not necessarily unique values of M, we expect, in the limit of large Q, a lambda-one Poisson distribution (LOPD) of the fractional (as opposed to absolute) populations of such frequencies, as illustrated at WolframAlpha and described at Wikipedia.

To be clear, a Poisson distribution assumes the limit of infinite Q, in which P is described as a fraction of Q as opposed to a whole (nonnegative integer). (WolframAlpha stops at the P value for (F = 4) just by way of convenience.)

The "lambda-one" aspect refers to the fact that the expected value λ of this distribution is one, because the number of blocks and masks are equal, so each mask occurs once on average.

Mathematically, a (generic) Poisson distribution F(X) is given by:

F(X) = (λ ^ X) / ((e ^ λ)(X!))

where e is Euler's number and which is defined only for whole values of X. As required for a valid probability distribution, its infinite sum (equivalently in this case, its sum over whole values of X) is one, as in this example where (λ = 2.6).

Substituting (λ = 1) implies that the LOPD G(X) is given by:

G(X) = 1 / (e(X!))

As you can see here, the sum of G(X) over the wholes also converges to one, as expected. 
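A quick numerical illustration (a sketch; g[] is just a throwaway name):
__________________________________________________________________________
(* The lambda-one Poisson distribution: the fractional populations of the
   first few frequencies, and the (exactly unit) sum over all wholes. *)
g[x_]:=1/(E x!);
N[Table[g[x], {x, 0, 4}]]     (* {0.367879, 0.367879, 0.18394, 0.0613132, 0.0153283} *)
Sum[g[x], {x, 0, Infinity}]   (* 1 *)
__________________________________________________________________________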

LOPDs have some elegant properties, above and beyond the properties of Poisson distributions in general:

1. The fractional populations of (F = 0) and (F = 1) are identical (because ((0!) = (1!) = 1)), namely, (1 / e). Thus both absolute populations approach (Q / e) in the limit of infinite Q.

2. The fraction of all possible masks which actually occur is asymptotically the reflexive density constant.

3. The P values for increasing values of F are monotonically nonincreasing, and in fact decrease inverse-factorially after (F = 1). (This rapid decline allows us to spot nonrandom distributions quite easily, which as mentioned above might imply encryption which is weaker than it would otherwise appear.)

Back to encryption functions. In light of the above, we can postulate that the ideal N-bit symmetric encryption algo is constructed thus (if only we had enough time to implement it):

1. We take Q samples from an ideal (unbiased) N-bit TRNG. We record each of these to a data table. (Yes, the size of this table is exponential in N; remember, we're talking about a theoretical ideal here. We must appeal to a notional "oracle" who can write this table to classical storage in tractable time for anyone who asks. Moreover, the table can only be written in the exact order it was received from the TRNG (sorted by block number); the oracle cannot do anything else.)

2. We publish this table to the world. (By implication, the oracle must be invoked each time someone needs a local instantiation thereof.)

3. Alice wants to send private data to Bob, so she meets him in the park and gives him an initial block value, B0. Likewise, Bob provides her with an independently chosen B1. B0 is thus the Alice-to-Bob key, and B1 is the Bob-to-Alice key.

4. Alice sends packets of ciphertext to Bob, the first of which is encrypted using xor mask M(B0), the next using M(B0+1), etc.; Bob does the same, starting with M(B1). (Incrementation wraps from (Q-1) to zero.) They can thus send up to Q packets in each direction while ensuring that no useful statistical biases exist in the ciphertext (even assuming that a man-in-the-middle (MITM) knows the plaintext to which to xor them, thereby revealing the masks). A toy simulation of this protocol appears just below.
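Here is that toy simulation with (N = 8) (a sketch only; the names are ad hoc, and RandomInteger[] stands in for the oracle's TRNG):
__________________________________________________________________________
(* Step 1: the published random mask table. Step 3: Alice's secret starting
   block B0. Step 4: xor-encrypt five sequential packets, then verify that
   xoring again recovers the plaintext. *)
n=8; q=2^n;
oracletable=RandomInteger[{0, q-1}, q];
b0=RandomInteger[{0, q-1}];
plaintext=RandomInteger[{0, q-1}, 5];
ciphertext=Table[BitXor[plaintext[[k]], oracletable[[Mod[b0+k-1, q]+1]]], {k, 5}];
Table[BitXor[ciphertext[[k]], oracletable[[Mod[b0+k-1, q]+1]]], {k, 5}]==plaintext   (* True *)
__________________________________________________________________________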

Now, here is how such an MITM could crack this encryption, where "crack" means learning the value of even a single bit of plaintext with anything better than unbiased (50/50) uncertainty. (Some people think that "cracking" means "finding the value of bits which were initially unknown". That's a superfluous requirement in the real world, where merely skewing the distribution of private bits (which are in superposition from the MITM perspective, and thus more like qubits than bits) can provide a competitive advantage.)

1. The MITM derives several masks by xoring several ciphertext blocks to known plaintexts, thereby revealing several sequential masks, M0, M1, etc. (For example, he might know that Alice always sends the same email header before transmitting the email body.) Eventually, he encounters a ciphertext block encrypted with unknown mask M(i). In order to reveal the hidden plaintext, he need only determine its block number, B(M(i)). By finding {M0, M1...} in the public mask table, he can discover B0, and thereby trivially decrypt the ciphertext block. Moreover, all subsequent blocks could then be trivially decrypted, given their offsets relative to the base of the ciphertext. Furthermore, because the oracle's table approximates an LOPD, most of the work involves searching for M0; there is a 50% chance that it occurs exactly once, so {M1, M2...} merely confirm the match.

2. But the MITM is in trouble: the mask table is sorted by block number -- not mask. Without any help from the oracle, he now has the intractable task of searching for the first known mask by examining one mask at a time from the map. This task has complexity O(Q). Thus in the limit of intractably large Q, the encryption appears to be secure because its inversion complexity is exponential in N even in the average case.

3. But the MITM has a somewhat faster alternative: he can buy as many storage systems as he can afford, and tell the oracle to fill each of them with copies of the mask map. He can then parallelize his search, checking K masks at a time for the same known mask. In the rare case that he finds a match, he can trivially compare the subsequent {M1, M2...} to determine whether he has actually cracked the encryption, as opposed to having found a disconnected match on M0. He can thus reduce the expected search duration by a factor of K, but this is still O(Q / K) and thus still exponential in N. (For large K, the light delay between the root node and each machine is O(K ^ (1 / 3)), due to the need to traverse a 3D machine array, so the acceleration factor is cubically smaller. Connecting the machines in a binary tree does not help because the search program must be sent to each target at least once.)

4. The MITM could also use a (nonexistent) quantum computer to perform the search in as little as O(Q ^ 0.5) time by use of Grover's algo, assuming that the mask map could be stored in quantum-readable classical memory in trivial time, courtesy of the oracle. He could accelerate this further by use of multiple quantum computers, but due to the classical connections between them, this would be a waste of resources relative to the construction of a larger monolithic quantum computer.
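Continuing the toy simulation above, the brute force of step 2 looks like this in miniature (a sketch; with (Q = 256) the scan is instant, but it scales as O(Q)):
__________________________________________________________________________
(* The MITM recovers a run of three masks from known plaintext, then scans
   every candidate block number in the public table for a matching run. *)
knownmasks=Table[oracletable[[Mod[b0+k-1, q]+1]], {k, 3}];
candidates=Select[Range[0, q-1],
  Table[oracletable[[Mod[#+k-1, q]+1]], {k, 3}]==knownmasks &];
MemberQ[candidates, b0]   (* True: the key block number has been recovered *)
__________________________________________________________________________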

Note that this encryption algo is even strong in the absurd context in which Alice and Bob exchange Q ciphertext blocks in each direction, and the attacker knows (Q - 1) of the corresponding plaintext blocks (but not their corresponding block numbers); the Qth plaintext block would remain secure in the exponential sense discussed above. But in this case of huge Q, some acceleration could be had by consolidating the (Q - 1) implicitly known masks into a binary tree with as many sorted leaves, which could be employed to find a given mask from the oracle's table in O(Q ^ (1 / 3)) time, or O(Q ^ (1 / 6)) with help from Grover. But constructing the tree would require a sweep of the oracle's table, which would take O(Q ^ (1 / 3)) time because at least the original instance thereof must reside in classical memory. It might be more efficient to store only (Q ^ 0.5) items into the tree, requiring only O(Q ^ 0.5) sequential reads from the oracle's table. But this is still exponential in N.

So what does this all imply from a practical perspective? I offer the following by way of conjecture:

1. The task of inverting a single member of a mask set generated by an ideal (unbiased) TRNG is in EXPTIME. But even if the inversion search were to be conducted in random order, the time would not be reduced (and would in fact increase, due to poor pipelining); hence the task is also in NEXPTIME. Moreover, the search is always hard in the sense of requiring the above expected compute time; it would waste more time to search for an ultrarare (asymptotically nonexistent) "easy" search case than it would pay back, on average. (For example, it might be that B = M in every case just by (amazing) luck, which would enable trivial decryption, but this case has asymptotic probability zero, so it would not be worth testing for.)

2. The more closely a given pseudorandom mask map resembles one generated by an ideal TRNG, the harder the encryption it affords.

3. Inversely, the more extreme the divergence between the fractional frequency populations of a given mask set and an LOPD, the weaker the encryption which it affords. (Note in particular that (uniformly white) permutations suffer from such divergence.)

The first point is as intuitively true as can be if, as most people expect, NP is not a subset of P. After all, what function could possibly be harder than a mapping created by sampling an ideal TRNG?

The second point relates to AES: I once thought that AES was technically weak because it has one-to-one (perfect) invertibility. For example, with 128-bit blocks, it would fail to repeat itself for the first time at the point predicted for a TRNG -- roughly upon the (2 ^ 64)th block according to the birthday attack approximation. (See also the exact closed-form expression for the LMICBR presented in section 2.3.1 over here.) So in theory this shortcoming could violate deniability. (That is, under some obscure circumstances with statistical knowledge of the plaintext, an MITM could asymptotically prove the existence of a ciphertext, as opposed to, say, an uninitialized storage module filled with random bits.) Nevertheless, privacy would be intact. So, in practice, this weakness would be insignificant.

However, I failed to notice the elephant in the room. It was only much later that I realized that perfect invertibility is a canary (an early warning system) for weak mathematical structure underlying a transitively weak cryptosystem -- especially in a (2 ^ N) modulo space where binary hierarchical structure can easily occur. After all, how can one create a one-to-one function executable in polytime without employing some deep underlying elegance? Elegance is an asset in the quest to understand mathematical objects, but in cryptography it's anathema to security, speed advantages notwithstanding: paradoxically, the goal of symmetric cryptography is to create simple, fast mask generators whose properties are nonetheless otherwise inelegant. AES met the first goal exceptionally well, as have numerous cryptosystems since, designed under the uniform-whiteness-is-best philosophy. The LOPD is an elegant archetype against which to evaluate the purity of inelegance, as it were. A metric of such purity, whose discovery arose out of this article itself, is dyspoissonism.