Investigating a Claim About Joint Distributions

A while ago, a friend asked me if I could prove the following claim:

"If X is a random variable with an arbitrary distribution, there exists a random variable X' such that the joint distribution of [X,X'] is uniform." (Colbeck and Renner, "No Extension of Quantum Theory", Nature Communications, August 2, 2011)

I'm terrible at proving things. But inspired by Shalosh B. Ekhad, I decided to take a computational approach.

The Approach


Here's how it goes. Since there's no restriction on the type of random variables, I'll keep it simple and assume that we have two discrete random variables.

X has outcomes a, b, and c
X' has outcomes R, S, T, and U

When there are 3 values for X and 4 values for X', the number of joint outcomes is 3 times 4 = 12. In general, the total number of distinct joint outcomes will be the number of possible outcomes for X multiplied by the number of possible outcomes for X'. In the case I've picked, we know that at the end of the day, the probability of each of the outcomes above will be 1/12th -- i.e. [X,X'] is uniformly distributed.

Here are all the possible outcomes of [X,X']:

{{a,R},{a,S},{a,T},{a,U},{b,R},{b,S},{b,T},{b,U},{c,R},{c,S},{c,T},{c,U}}

I said I was going to keep it simple, didn't I? Here's what we're going to do.

Step 1: Assign X an arbitrary non-uniform discrete distribution. Incidentally, I fixed the probability of a at 0.2, the probability of b at 0.3, and that of c at 0.5.

Step 2: Simulate a large number of outcomes of X based on the probabilities fixed in step 1. This produces a list of outcomes such as

{b,b,c,b,a,c,a,c,c,c,b,a,b,a,a,c,a,c,a,b, ..., c}.

Step 3: Generate candidate probability distributions for X'. The approach is going to be to test each of these candidate distributions for X' and see if it results in a uniform distribution for [X,X'].

To generate a large number of probability distributions for a random variable X' that can take on any of 4 values, I used Mathematica (or as it's now called, the Wolfram Language).  For example, here's one of the X' probability distributions I generated: {0.09,0.15,0.19,0.57}. And there are about 10,000 of these that are randomly generated.

If you'd like to read the Mathematica files, you can download the PDF or the Wolfram notebook files.

The concludes the set up. We'll now test each of these candidate distributions and see if there's one that gives us a uniform probability for [X,X'].


The Investigation


We'll investigate the candidate distributions for X' by testing each one. To conduct a test, we pretend we have 2 urns. The first urn contains 20,000 tokens of a, b, and c distributed according to the probabilities we picked in step 1 above. The second urn contains 20,000 tokens of R, S, T, and U distributed according to the first candidate distribution for X' we generated. We then randomly select pairs of tokens from the urns. This gives us values for [X,X'] and we can check to see if [X,X'] is uniformly distributed. By measuring the Euclidean distance between the required uniform probability distribution and the distribution we obtained, we can get a sense of how far off the candidate distribution is from the distribution we need in order for [X,X'] to be uniform.

And we do this test for each of the 10,000 candidate distributions for X' that were randomly generated; so that's 10,000 tests in all. What did we find?


The Results


For each of the 10,000 candidate distributions we examined (represented by the numbers on the x axis of the chart below), we plotted the Euclidean distance of the distribution we got for [X,X'] from the required probability distribution for [X,X'] -- the distances are measured along the y axis of the chart below.


Not bad. The lowest Euclidean distance is around 0.1 -- it's actually 0.1094. What is the X' distribution that resulted in this low value? It's {0.26,0.27,0.24,0.23} which looks like a distribution for X' that's pretty close to uniform! Could this be a clue to how to set up the distribution of X' to get the uniform distribution for [X,X'] no matter what the initial distribution of X is?

What distribution of [X,X'] values did we get when the Euclidean distance was at a minimum? Here it is:

{0.0592, 0.0566, 0.0467, 0.0458, 0.0791, 0.0852, 0.0744, 0.0673, 0.1349, 0.1333, 0.1156, 0.1203}.

What we really needed if [X,X'] is going to be uniform is:

{0.0833333, 0.0833333, 0.0833333, 0.0833333, 0.0833333, 0.0833333, 0.0833333, 0.0833333, 0.0833333, 0.0833333, 0.0833333, 0.0833333}.

So even though the Euclidean distances are quite small, the difference between the joint distribution we got from the simulation and the actual uniform joint distribution we need is quite far apart.


Conclusion


After testing 10,000 different randomly generated distributions of X', we haven't been able to find one that gets us close to a uniform distribution for [X,X']. The best we can do is to keep the probabilities for X' uniform -- but even this results in a distribution for [X,X'] that is quite far from uniform. The evidence so far points to the claim being false. Of course, this is not a proof that the claim is false -- it was never meant to be. It is just a way to learn more about how simple joint distributions behave.

While the approach is illuminating, the conclusion might be wrong either because we've misinterpreted the claim or somehow messed up the experiments. The good news is that both these mistakes are quite easy to correct.

If you'd like to read the Mathematica files, you can download the PDF or the Wolfram notebook files.