Saturday, December 27, 2014

The Kernel Density Randomness Metric

Apart from dyspoissonism, another powerful statistical analysis tool is what I call "kernel density". It measures the fraction of all possible masks which remain after a mask list is iterated through itself until every mask in the list is on a limit cycle (potentially, one of many such cycles). Kernel density assumes a mask list with Q block numbers and Q possible masks, as discussed here. We refer to this metric as "order-Q kernel density". Q is usually a power of 2, but this is not required. And remember that the mask count refers to masks which could occur, but do not necessarily do so. For example, the mask list might consist of the 19-bit outputs of a TRNG or PRNG, each of which is guaranteed to be on [0, (2^19)-1], but none of which are guaranteed to occur.

This mask list, by definition, maps a block number B to a mask M, i.e.

B -> M(B)

But we could take this a step further by treating M as just another block number:

B -> M(B) -> M(M(B))

We could iterate a third time, etc. At some point, because there are at most Q unique masks (and usually about QR, where R is the reflexive density constant), we would repeat a previous mask, whereupon the sequence would continue just as it had after its previous occurrence. We could iterate indefinitely, but would never escape this sequence. The period of the sequence could be as short as 1 (e.g. 5 -> 5...) or as long as Q. We call such a repeating sequence a "limit cycle" of the mask list.
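To make the iteration concrete, here is a small Python sketch (my own illustration, not Dyspoissometer code; `mask_list` and `limit_cycle` are names I've chosen) that follows a mask list from a starting block number until a mask repeats, then returns the limit cycle it fell into:

```python
def limit_cycle(mask_list, start):
    # Walk B -> M(B) -> M(M(B)) -> ... until some mask repeats; the limit
    # cycle is everything from that mask's first visit onward.
    seen = {}                      # mask -> step at which it was first visited
    m = start
    step = 0
    while m not in seen:
        seen[m] = step
        m = mask_list[m]
        step += 1
    first = seen[m]
    return [k for k, s in seen.items() if s >= first]

# Example: starting from block 0, this order-8 mask list falls into the
# 2-cycle 3 -> 6 -> 3.
print(limit_cycle([3, 0, 5, 6, 2, 5, 3, 1], 0))   # -> [3, 6]
```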

How many 1-cycles, 2-cycles... etc. should we expect? In other words, what is the expected frequency distribution (histogram) of the limit cycle lengths? While there is surely a precise (and complicated) answer to this question, it would not likely constitute a useful randomness quality metric, on account of the wide variability among cycle lengths -- even assuming an ideal TRNG.

However, the density of the union of all limit cycles (said union being called the "kernel") with respect to the Q possible masks is informative. By the way, all limit cycles are disjoint because each mask can point to only one other mask when it is interpreted as a block number. So what I call the "kernel density", S, is then simply the number of unique masks in the kernel, divided by Q. Put another way, S is the probability that a random whole number less than Q happens to be a mask which actually occurs and, furthermore, appears in the kernel.
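The definition translates directly into code. In this Python sketch (again my own illustration, not the efficient algorithm Dyspoissometer uses), a mask M is in the kernel exactly when iterating the mask list from M eventually returns to M, which gives a simple O(Q^2) rendering:

```python
def kernel_density(mask_list):
    # A mask m lies in the kernel iff iterating the mask list from m
    # eventually returns to m (it is then on a limit cycle).
    q = len(mask_list)
    kernel = set()
    for m in range(q):
        x = mask_list[m]
        for _ in range(q):
            if x == m:
                kernel.add(m)
                break
            x = mask_list[x]
    return len(kernel) / q

# Example: the kernel of this order-8 mask list is {3, 5, 6}, so S = 3/8.
print(kernel_density([3, 0, 5, 6, 2, 5, 3, 1]))   # -> 0.375
```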

What is the expected value, S0, of S under conditions of an ideal order-Q TRNG? In order to answer this question, we need to look at the probability that some given mask M is a member of a limit cycle of any particular length.

First of all, what's the probability P1 that M belongs to a 1-cycle, e.g. M -> M...? This is obviously

P1 = (1 / Q)

because given Q theoretically possible choices of mask, only one of them equals M.

And what about the probability P2 that M belongs to a 2-cycle? That would be:

P2 = (1 / Q)(1 - (1 / Q))

because the second mask must equal M (probability (1 / Q)), but the first must not equal M (probability (1 - (1 / Q))). Similarly, we can compute the probability P3 that M belongs to a 3-cycle:

P3 = (1 / Q)(1 - (1 / Q))(1 - (2 / Q))

By extension, the probability P[C] that M belongs to a C-cycle, where C is on [2, Q] is:

P[C] = (1 / Q)(Π(i = (1, (C - 1)), (1 - (i / Q))))

or, recursively:

P[C + 1] = P[C](1 - (C / Q))
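The recurrence is easy to tabulate. Here is a short Python sketch (the names are mine, chosen for illustration) that generates every P[C] from P[1] = (1 / Q):

```python
def cycle_probabilities(q):
    # p[c] = probability that a given mask belongs to a c-cycle, via
    # P[1] = 1/Q and the recurrence P[C + 1] = P[C] * (1 - C/Q).
    p = [0.0] * (q + 1)
    p[1] = 1.0 / q
    for c in range(1, q):
        p[c + 1] = p[c] * (1.0 - c / q)
    return p

p = cycle_probabilities(16)
# p[2] equals (1/16)(1 - 1/16) and p[3] equals (1/16)(1 - 1/16)(1 - 2/16),
# matching the P2 and P3 expressions above.
print(p[1], p[2], p[3])
```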

Thus the overall probability S0 that M is part of any limit cycle -- equivalently, that it belongs to the kernel -- is given by:

S0 = (1 / Q)(1 + Σ(i = (2, Q), Π(j = (1, (i - 1)), (1 - (j / Q)))))

because the events counted by the P[i] are mutually exclusive, so their probabilities simply add. S0 is, by definition, the expected kernel density.

First of all, the above formula for S0 is given in an intuitive form, but please don't compute it that way! Dyspoissometer computes it very efficiently using logarithmic summation and other optimizations. See dyspoissometer_kernel_density_expected_fast_get() and dyspoissometer_kernel_density_expected_slow_get() in the source code.

Now let's make some actual computations. Here is some rather obtuse MathKernel code for doing so, which adheres to my usual weird coding standards. By the way, you can use the LogGamma[] Mathematica function to do this more efficiently and accurately:
__________________________________________________________________________
KernelDensityExpectedGet[bitcount_]:=(intcount0=BitShiftLeft[1, bitcount]; s0=1.; product0=s0; intcountrecip0=s0/intcount0; For[i0=1, i0<intcount0, i0++, product0*=1-i0*intcountrecip0; s0prev0=s0; s0+=product0; If[s0==s0prev0, i0=intcount0]]; s0*=intcountrecip0; Return[s0])
__________________________________________________________________________

So what is S0 for an order-32 mask list, for example?
__________________________________________________________________________
In[1]:= KernelDensityExpectedGet[bitcount_]:=(intcount0=BitShiftLeft[1, bitcount]; s0=1.; product0=s0; intcountrecip0=s0/intcount0; For[i0=1, i0<intcount0, i0++, product0*=1-i0*intcountrecip0; s0prev0=s0; s0+=product0; If[s0==s0prev0, i0=intcount0]]; s0*=intcountrecip0; Return[s0])

In[2]:= KernelDensityExpectedGet[32]*2^32

Out[2]= 82136.9
__________________________________________________________________________
We multiplied S0 by (Q = (2 ^ 32)) in order to obtain the actual number of masks expected to appear in the kernel. The answer is approximately 82,137. Note that although this value is on the order of ((Q ^ 0.5) = 65,536), it's clearly larger than the LMICBR of 77,163. That's because kernel density isn't LMICBR, although the concepts are related.

In the limit that Q approaches infinity, I conjecture that:

(Q * S0) / (((πQ / 2)^0.5) - (1 / 3)) -> 1

which is based on an approximation I found here. In this case, the above approximation returns approximately 82136.9 -- matching the Mathematica value to the displayed precision. The (1 / 3) is most likely the first term in an infinite series of rational error correction terms in the approximation.
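The convergence can be checked numerically for modest Q (this check is mine, not from the post; `s0_exact` simply sums the P[C] recurrence derived earlier):

```python
import math

def s0_exact(q):
    # Expected kernel density: sum of P[C] for C = 1..Q, with each term
    # generated from the recurrence P[C + 1] = P[C] * (1 - C/Q).
    total = 0.0
    term = 1.0 / q
    for c in range(1, q + 1):
        total += term
        term *= 1.0 - c / q
    return total

for bits in (8, 12, 16):
    q = 1 << bits
    exact = q * s0_exact(q)                          # expected kernel size
    approx = math.sqrt(math.pi * q / 2.0) - 1.0 / 3.0
    print(bits, exact, approx, exact / approx)
```

Already at Q = 2^16 the ratio is within a small fraction of a percent of 1, consistent with the conjecture.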

Now I suppose you're wondering how we can efficiently measure the kernel density of "any old" mask list with Q block numbers and Q possible masks. That's the topic of the next post.
