Tuesday, December 16, 2014

Asymptotic Symmetric Security with Lambda-One Poisson Distributions

There is a critical question in the development of cryptographic systems, and in the study of entropy in general: the computational complexity of invertibility. Indeed, this question is at the core of the famous P-vs-NP question. Here we investigate the relationship between Poisson distributions and invertibility.

To begin with, we can think of most symmetric ("shared-secret") cryptosystems as pseudorandom mask set generators, wherein a set of 2^N block numbers is mapped to equally many xor masks, each of which is to be xored to a given plaintext block in order to obtain a ciphertext, and xored once again to invert the process. (Xor is thus trivially invertible. The inversion problem we will explore below is the problem of deriving the block number from a xor mask, given a partially known plaintext from which a mask fragment is easily obtained.) For their part, the block numbers increase by ones from zero to (2^N - 1). Each block number is a function of an initial value (IV) transmitted in the clear, a private symmetric (shared) key, and perhaps other information such as a subblock value relative to the base of an encrypted file. What's important in this definition is that, given a block number B, anyone can easily generate its corresponding mask M(B); furthermore, the next subblock would be encrypted with M(B+1); and so on. To put it simply, if an attacker is able to derive the block number corresponding to any subblock of an encrypted file, then he can trivially decrypt the entire file. To be clear: the base of a file is defined as having subblock zero, not block zero.

Realistically, small subsets of encrypted files are often known to an attacker. By xoring these plaintext fragments to the ciphertext, mask fragments may in turn be obtained. The question here is, given a "few" such fragments (or, as we may as well assume, so much as a single mask fragment long enough to be unique), how difficult would it be for the attacker to obtain the corresponding block number, thereby enabling decryption in polytime?
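To make this model concrete, here is a minimal Python sketch of the mask-set view described above. The mask generator (SHA-256 used as a keyed function) and all names are purely illustrative stand-ins, not any cipher endorsed here:

# Minimal sketch of the mask-set view of a symmetric cipher. SHA-256 is
# only a stand-in mask generator for illustration.
import hashlib

BLOCK_BYTES = 32  # one mask per 32-byte subblock

def mask(key, block_number):
    # M(B): anyone holding the key can map a block number to its mask.
    return hashlib.sha256(key + block_number.to_bytes(16, "big")).digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def crypt(key, base_block, data):
    # Subblock i is xored with M(base_block + i); running the same
    # function on the ciphertext inverts it, since xor is an involution.
    out = bytearray()
    for i in range(0, len(data), BLOCK_BYTES):
        chunk = data[i : i + BLOCK_BYTES]
        out += xor(chunk, mask(key, base_block + i // BLOCK_BYTES))
    return bytes(out)

key = b"shared secret"
ct = crypt(key, 7, b"attack at dawn" + b"\x00" * 18)
assert crypt(key, 7, ct).startswith(b"attack at dawn")

# A known-plaintext attacker recovers a mask fragment by xoring:
fragment = xor(ct[:14], b"attack at dawn")
assert fragment == mask(key, 7)[:14]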

We can begin by considering mask set "invertibility", which refers to the property of a function which permits it to be partially or fully inverted. We might, for example, have the following (horribly weak) 3-bit encryption scheme, which maps a block number B to a xor mask M:

B -> M
000 -> 010
001 -> 111
010 -> 011
011 -> 101
100 -> 000
101 -> 100
110 -> 001
111 -> 110

So for instance, suppose we need to encrypt block 2 (010). Using this table, we find that we must xor the plaintext with 011. A peer with knowledge of the same table can decrypt by applying this same xor again. (Equivalently, we could map a 3-bit plaintext to a 3-bit ciphertext by directly translating the former via the above table; this is however insecure in practice due to likely statistical biases in the plaintext.)
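In code, this toy scheme amounts to nothing more than a table lookup plus a xor. A minimal Python sketch of the table above:

# The toy 3-bit mask table, as a dict from block number B to mask M.
MASKS = {0b000: 0b010, 0b001: 0b111, 0b010: 0b011, 0b011: 0b101,
         0b100: 0b000, 0b101: 0b100, 0b110: 0b001, 0b111: 0b110}

plaintext = 0b110
ciphertext = plaintext ^ MASKS[0b010]          # encrypt block 2 with mask 011
assert ciphertext == 0b101
assert ciphertext ^ MASKS[0b010] == plaintext  # xoring again decrypts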

Thus xoring twice undoes the encryption; in this sense, the encryption is invertible in a manner which would succeed regardless of how we might change the values in the M column. But again, as mentioned above, it is instead the inversion of the mask set which is informative as to the degree of security. In this case, all values of M can be inverted to unique values of B. For this reason, this encryption algo is not unlike AES, in the sense that it's (perfectly) invertible. Such algos are said to be "permutative" because the mask set (i.e. the set of all possible values of M) constitutes a permutation of {0, 1, 2... (Q - 1)}, where Q = (2 ^ N) and N is the number of bits in each block number as well as each mask. Although Q need not be a power of 2, we will assume so because this is always the case in practice, and it simplifies the analysis.

For years (and even largely today), such permutative encryption has often been considered ideal because it is so optimally "fair", in the sense that no particular mask occurs more or less often than any other. As such, in the limit of infinite Q, the ciphertext would indeed appear to be noise from the perspective of any observer with much-less-than-Q computational resources. Indeed, the essence of good encryption is to render the impression of random noise despite assurance of invertibility.

But there is a problem: given enough chances, true random ("real world") unbiased N-bit binary noise will repeat itself with certainty within (Q + 1) masks, and probably after only O(Q^0.5) masks. In other words, the same M value will occur twice, or even more times, in the mask set. So in the above example, the following mask set would be slightly more credible:

B -> M
000 -> 010
001 -> 111
010 -> 011
011 -> 101
100 -> 000
101 -> 100
110 -> 001
111 -> 111

because 111 occurs twice (and 110 does not occur). With only 8 masks, the difference is trivial, but in the limit of large N, the difference is instrumental in maintaining or destroying the illusion of true random noise. More importantly, these subtle distinctions can serve as canaries for weak mathematical structure which may not be obvious.

First, some definitions: the "frequency" F of a mask M is the number of times it occurs in the entire mask set -- for example, 2 in the case of 111. The "population" P of a frequency is the number of M values having frequency F; in this case, (F = 0) has (P = 1) because only 110 is missing, (F = 1) has (P = 6), and (F = 2) has (P = 1). The populations of all higher F values are zero.
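These tallies are easy to compute mechanically, as in this minimal Python sketch applied to the second (repeating) mask set above:

# Frequency F of each mask value, and population P of each frequency,
# for the 8-mask set containing a duplicated 111.
from collections import Counter

masks = [0b010, 0b111, 0b011, 0b101, 0b000, 0b100, 0b001, 0b111]
Q = 8  # 2^3 possible masks

freq = Counter(masks)         # F for each mask which occurs at least once
pop = Counter(freq.values())  # P for each (F >= 1)
pop[0] = Q - len(freq)        # masks which never occur

print(dict(pop))  # {1: 6, 2: 1, 0: 1}: six singletons, 111 twice, 110 absent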

What is more, there is a very specific expected distribution of P. (The expected frequency of each mask is simply one, because the number of possible masks is equal to the number of block numbers, given that Q is a power of 2 and we're dealing in bits.) In the specific case of mapping (2 ^ N) unique values of B to equally many but not necessarily unique values of M, we expect, in the limit of large Q, a lambda-one Poisson distribution (LOPD) of the fractional (as opposed to absolute) populations of such frequencies, as illustrated at WolframAlpha and described at Wikipedia.

To be clear, a Poisson distribution assumes the limit of infinite Q, in which P is described as a fraction of Q as opposed to a whole (nonnegative integer). (WolframAlpha stops at the P value for (F = 4) just by way of convenience.)

The "lambda-one" aspect refers to the fact that the expected value λ of this distribution is one, because the number of blocks and masks are equal, so each mask occurs once on average.

Mathematically, a (generic) Poisson distribution F(X) is given by:

F(X) = (λ ^ X) / ((e ^ λ)(X!))

where e is Euler's number; the distribution is defined only for whole values of X. As required of a valid probability distribution, its infinite sum (equivalently in this case, its sum over whole values of X) is one, as in this example where (λ = 2.6).

Substituting (λ = 1) implies that the LOPD G(X) is given by:

G(X) = 1 / (e(X!))

As you can see here, the sum of G(X) over the wholes also converges to one, as expected. 
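A few lines of Python suffice to evaluate G(X) and confirm the convergence of its sum:

# Evaluate G(X) = 1 / (e * X!) and check that it sums to ~1.
from math import e, factorial

def G(x):
    return 1.0 / (e * factorial(x))

print([round(G(x), 4) for x in range(5)])  # [0.3679, 0.3679, 0.1839, 0.0613, 0.0153]
print(sum(G(x) for x in range(20)))        # ~1.0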

LOPDs have some elegant properties, above and beyond the properties of Poisson distributions in general:

1. The fractional populations of (F = 0) and (F = 1) are identical (because ((0!) = (1!) = 1)), namely, (1 / e). Thus both absolute populations approach (Q / e) in the limit of infinite Q.

2. The fraction of all possible masks which actually occur is asymptotically the reflexive density constant, (1 - (1 / e)).

3. The P values for increasing values of F are monotonically nonincreasing, and in fact decrease inverse-factorially after (F = 1). (This rapid decline allows us to spot nonrandom distributions quite easily, which as mentioned above might imply encryption which is weaker than it would otherwise appear.)
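These properties are easy to verify empirically. The following Python sketch draws Q masks uniformly at random (a pseudorandom stand-in for a TRNG) and compares the fractional populations against the LOPD prediction:

# Compare empirical fractional frequency populations against 1 / (e * X!).
import random
from collections import Counter
from math import e, factorial

N = 20
Q = 1 << N
freq = Counter(random.randrange(Q) for _ in range(Q))
pop = Counter(freq.values())
pop[0] = Q - len(freq)  # masks which never occurred

for f in range(5):
    print(f, pop[f] / Q, 1.0 / (e * factorial(f)))
# The two columns converge as N grows; in particular, the fractional
# populations of (F = 0) and (F = 1) both approach (1 / e).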

Back to encryption functions. In light of the above, we can postulate that the ideal N-bit symmetric encryption algo is constructed thus (if only we had the resources to implement it):

1. We take Q samples from an ideal (unbiased) N-bit TRNG. We record each of these to a data table. (Yes, the size of this table is exponential in N; remember, we're talking about a theoretical ideal here. We must appeal to a notional "oracle" who can write this table to classical storage in tractable time for anyone who asks. Moreover, the table can only be written in the exact order it was received from the TRNG (sorted by block number); the oracle cannot do anything else.)

2. We publish this table to the world. (By implication, the oracle must be invoked each time someone needs a local instantiation thereof.)

3. Alice wants to send private data to Bob, so she meets him in the park and gives him an initial block value, B0. Likewise, Bob provides her with an independently chosen B1. B0 is thus the Alice-to-Bob key, and B1 is the Bob-to-Alice key.

4. Alice sends packets of ciphertext to Bob, the first of which is encrypted using xor mask M(B0), the next using M(B0+1), etc.; Bob does the same, starting with M(B1). (Incrementation wraps from (Q-1) to zero.) They can thus send up to Q packets in each direction while ensuring that no useful statistical biases exist in the ciphertext (even assuming that a man-in-the-middle (MITM) knows the plaintext to which to xor them, thereby revealing the masks).
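A toy model of this construction, with os.urandom standing in for both the ideal TRNG and the oracle (and with sizes absurdly small, of course), might look like this:

# The "oracle's table": Q masks sampled from a TRNG stand-in, stored in
# block-number order only. The table itself is public; B0 is the secret.
import os

N = 16
Q = 1 << N
TABLE = [os.urandom(2) for _ in range(Q)]  # 2-byte masks for the toy

def send(b0, packets):
    # Packet i is xored with TABLE[(b0 + i) mod Q]; the same call decrypts.
    return [bytes(x ^ y for x, y in zip(p, TABLE[(b0 + i) % Q]))
            for i, p in enumerate(packets)]

b0 = 12345  # the Alice-to-Bob key, exchanged in the park
ct = send(b0, [b"hi", b"yo"])
assert send(b0, ct) == [b"hi", b"yo"]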

Now, here is how such an MITM could crack this encryption, where "crack" means learning the value of even a single bit of plaintext with anything better than unbiased (50/50) uncertainty. (Some people think that "cracking" means "finding the value of bits which were initially unknown". That's a superfluous requirement in the real world, where merely skewing the distribution of private bits (which are in superposition from the MITM perspective, and thus more like qubits than bits) can provide a competitive advantage.)

1. The MITM derives several masks by xoring several ciphertext blocks to known plaintexts, thereby revealing several sequential masks, M0, M1, etc. (For example, he might know that Alice always sends the same email header before transmitting the email body.) Eventually, he encounters a ciphertext block encrypted with unknown mask M(i). In order to reveal the hidden plaintext, he need only determine its block number, B(M(i)). By finding {M0, M1...} in the public mask table, he can discover B0, and thereby trivially decrypt the ciphertext block. Moreover, all subsequent blocks could then be trivially decrypted, given their offsets relative to the base of the ciphertext. Furthermore, because the oracle's table approximates an LOPD, most of the work involves searching for M0; there is roughly a (1 / e) (37%) chance that it occurs exactly once, so {M1, M2...} merely confirm the match.

2. But the MITM is in trouble: the mask table is sorted by block number -- not mask. Without any help from the oracle, he now has the intractable task of searching for the first known mask by examining one mask at a time from the map. This task has complexity O(Q). Thus in the limit of intractably large Q, the encryption appears to be secure because its inversion complexity is exponential in N even in the average case.

3. But the MITM has a somewhat faster alternative: he can buy as many storage systems as he can afford, and tell the oracle to fill each of them with copies of the mask map. He can then parallelize his search, checking K masks at a time for the same known mask. In the rare case that he finds a match, he can trivially compare the subsequent {M1, M2...} to determine whether he has actually cracked the encryption, as opposed to having found a disconnected match on M0. He can thus reduce the expected search duration by a factor of K, but this is still O(Q / K) and thus still exponential in N. (For large K, the light delay between the root node and each machine is O(K ^ (1 / 3)), due to the need to traverse a 3D machine array, so the acceleration factor is cubically smaller. Connecting the machines in a binary tree does not help because the search program must be sent to each target at least once.)

4. The MITM could also use a (nonexistent) quantum computer to perform the search in as little as O(Q ^ 0.5) time by use of Grover's algo, assuming that the mask map could be stored in quantum-readable classical memory in trivial time, courtesy of the oracle. He could accelerate this further by use of multiple quantum computers, but due to the classical connections between them, this would be a waste of resources relative to the construction of a larger monolithic quantum computer.
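For concreteness, here is the brute-force search of step 2 against the toy model above (reusing its TABLE, Q and b0); the successor masks serve only to reject chance collisions on M0:

# Linear scan of the public table for the first recovered mask M0,
# confirming any hit against the masks which follow it. Expected cost
# is Q/2 probes, i.e. O(Q).
def crack(known_masks):
    m0, rest = known_masks[0], known_masks[1:]
    for b in range(Q):
        if TABLE[b] == m0:
            # Disambiguate a chance collision on M0 via its successors.
            if all(TABLE[(b + 1 + i) % Q] == m for i, m in enumerate(rest)):
                return b  # block number recovered; decryption is now trivial
    return None

masks_from_known_plaintext = [TABLE[b0], TABLE[b0 + 1], TABLE[b0 + 2]]
assert crack(masks_from_known_plaintext) == b0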

Note that this encryption algo is strong even in the absurd context in which Alice and Bob exchange Q ciphertext blocks in each direction, and the attacker knows (Q - 1) of the corresponding plaintext blocks (but not their corresponding block numbers); the Qth plaintext block would remain secure in the exponential sense discussed above. But in this case of huge Q, some acceleration could be had by consolidating the (Q - 1) implicitly known masks into a binary tree with as many sorted leaves, which could be employed to find a given mask from the oracle's table in O(Q ^ (1 / 3)) time, or O(Q ^ (1 / 6)) with help from Grover. But constructing the tree would require a sweep of the oracle's table, requiring O(Q) sequential reads because at least the original instance thereof must reside in classical memory. It might be more efficient to store only (Q ^ 0.5) items into the tree, requiring only O(Q ^ 0.5) sequential reads from the oracle's table. But this is still exponential in N.

So what does this all imply from a practical perspective? I offer the following by way of conjecture:

1. The task of inverting a single member of a mask set generated by an ideal (unbiased) TRNG is in EXPTIME. But even if the inversion search were to be conducted in random order, the time would not be reduced (and would in fact increase, due to poor pipelining); hence the task is also in NEXPTIME. Moreover, the search is always hard in the sense of requiring the above expected compute time; it would waste more time to search for an ultrarare (asymptotically nonexistent) "easy" search case than it would pay back, on average. (For example, it might be that B = M in every case just by (amazing) luck, which would enable trivial decryption, but this case has asymptotic probability zero, so it would not be worth testing for.)

2. The more closely a given pseudorandom mask map resembles one generated by an ideal TRNG, the harder the encryption it affords.

3. Inversely, the more extreme the divergence between the fractional frequency populations of a given mask set and an LOPD, the weaker the encryption which it affords. (Note in particular that (uniformly white) permutations suffer from such divergence.)

The first point is intuitively as true as can be if, as most people expect, NP is not a subset of P. After all, what function could possibly be harder than a mapping created by sampling an ideal TRNG?

The 2nd point relates to AES: I once thought that AES was technically weak because it has one-to-one (perfect) invertibility. For example, with 128-bit blocks, it would fail to repeat itself for the first time at the point predicted for a TRNG -- roughly upon the (2 ^ 64)th block according to the birthday attack approximation. (See also the exact closed-form expression for the LMICBR presented in section 2.3.1 over here.) So in theory this shortcoming could violate deniability. (That is, under some obscure circumstances with statistical knowledge of the plaintext, an MITM could asymptotically prove the existence of a ciphertext, as opposed to, say, an uninitialized storage module filled with random bits.) Nevertheless, privacy would be intact. So, in practice, this weakness would be insignificant.
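The back-of-envelope arithmetic behind that (2 ^ 64) figure is just the birthday approximation; a quick sanity check in Python (using the standard median-first-collision estimate, not the exact LMICBR expression):

# Median index of the first repeat among uniform 128-bit masks:
# approximately sqrt(2 * ln(2) * Q).
from math import sqrt, log

Q = 2 ** 128
median = sqrt(2 * log(2) * Q)
print(median / 2 ** 64)  # ~1.18, i.e. roughly the (2 ^ 64)th block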

However, I failed to notice the elephant in the room. It was only much later that I realized that perfect invertibility is a canary (an early warning system) for a weak mathematical structure underlying a transitively weak cryptosystem -- especially in a (2 ^ N) modulo space where binary hierarchical structure can easily occur. After all, how can one create a one-to-one function executable in polytime without employing some deep underlying elegance? Elegance is an asset in the quest to understand mathematical objects; but in cryptography, it's anathema to security, speed advantages notwithstanding: paradoxically, the goal of symmetric cryptography is to create simple, fast mask generators whose properties are nonetheless otherwise inelegant. AES met the first goal exceptionally well, as have numerous cryptosystems since, designed under the uniform-whiteness-is-best philosophy. The LOPD is an elegant archetype against which to evaluate the purity of inelegance, as it were. A metric of such purity, whose discovery arose out of this article itself, is dyspoissonism.
