# Jeremy Stein - Journal

« Herdsperson Wanted | Why I Like Ecclesiastes » |

# Future Random Number

OK, my fellow geeks. I need a hand here. (Non-geeks welcome also.)

I want to specify a random number that will not be known by anyone (including myself) until a given date. That way, I can publicly say that I will use this random number, and everyone will know that I have no control over it, and when I use the number, it will be easy to verify that I am using the number I said I would. Confused yet? Let me give an example:

My only idea so far is to say “the random number I will use will be the result of multiplying the closing stock value on 31-December-2004 of every stock on the NYSE, the result being truncated (floored) and modulus 1000”. That should give me a nice relatively random number from 0-999. The problem with this idea is that I don’t know an easy way to get this information. And there might be too many stocks on the exchange to make this practical. I’d like a number that is a little easier to verify.

To summarize, I need a number that will be:

1) unknown (non-existent) until a given future date that I can pick in advance

2) obviously not influenced by me

3) relatively random (a decent distribution over at least 100 possible values)

4) easy to verify

Any suggestions?

## 13 Comments

**Mark A. Hershberger replied:**

Random.org offers true random numbers to anyone on the internet.

That seems to fit your four criteria. Random.org is used by a lot of people.

Otherwise, I’d just suggest getting a Linux or *BSD box and reading /dev/urandom. I’m sure NT has something similar.

August 30th, 2004 at 1:00 pm. Permalink.

**Shannon replied:**

uhhhhh……

August 30th, 2004 at 4:32 pm. Permalink.

**Jeremy replied:**

Mark,

Thanks for the Random.org suggestion. I hadn’t heard of it until I started Googling on this question. The problem is that it didn’t seem to meet criteria 4. If I announced, “I used random.org and got the number 296813”, how can I prove that I really did that? Even if a screen shot was considered proof, how would anyone know that I didn’t just keep getting random numbers until I got one that I liked?

August 31st, 2004 at 8:02 am. Permalink.

**Matt Smith replied:**

This answer might be a little obvious and simple but…

How about using the results of a lotto drawing?

The numbers are freely available, and easy to obtain even after the specified date. They have a very public reputation for producing random numbers, reinforced by the independent auditing and the swirling ping-pong ball thing — almost hypnotic, don’t you think?

The “Pick 10” would be a good candidate. Twenty numbers are picked each day from the range of numbers between 1 and 80. So the only trick is to find out how to convert the twenty selected numbers into a specified range, like 0-999.

At first I was thinking about multiplying all those numbers together and doing a mod, but there are problems with floating point precision, among other things. So for the sake of keeping things “simple,” how about this…

Let’s say that we need a ten digit number, take the second digit from each of the last ten numbers from pick ten and string them together. So if the pick ten was…

1, 4, 8, 9, 11, 17, 21, 23, 24, 37, 4

**2**, 4**9**, 5**3**, 5**6**, 5**7**, 5**9**, 6**3**, 6**6**, 7**4**, 7**6**

…the number is 2,936,793,646.Problems:

By using the second digit, each digit is a fairly random choice between 0 and 9. The big down-side (that I can see) is the exclusion of repeating numbers. Once the number 52 is selected it cannot be selected again. This will decrease the likelihood of another number being selected that ends with a two. In fact, now that I think of it, if you need to get a number with a more than 7 digits (like in the example), you will never get a number where all of the digits are equal. So, 111,111,111 is not a possibility. Numbers within a smaller range will be less affected by this problem — 3 digit numbers for example. This problem could be avoided by only selecting a single digit each night — such as the second digit of the tenth number. But then it takes a few days to get enough digits.Ugh, even worse, since the numbers are sorted, the distribution of the second digit will not be random. In the past 12 months of Pick 10, here are the odds for the second digit.

So the best way to get a random number, from what is turning out to be the horrible idea of using lotto, is to grab the second digit from some of the middle numbers. The numbers on the end tend to be less random as a result of the sorting.

Wow, that was a poor idea. Let me see if I can’t come up with something a little more promising.

August 30th, 2004 at 1:46 pm. Permalink.

**Jeremy replied:**

Matt,

Excellent analysis of the problem. I thought of the Lotto yesterday as I was discussing this with my wife but, having never played the lotto, I didn’t know that they were ordered and didn’t re-use numbers. That does cause some problems. I suspect it would still be OK if you used the product-of-the-numbers and modulus solution, but I’m a little suspicious about the distribution if I pick a large modulus (I suspect it clusters).

(I think I could get around the problem with the really large number by using Java’s BigInteger class, or with a really good calculator, or with a pencil [grin])

Why do I need this number? Well, at the moment, I’m more interested in the answer to this puzzle than in the original reason I came up with the puzzle, but for historical purposes: I was thinking about a public programming contest where each program had to control a tank. The last tank standing wins. There would be some random elements (like initial placement and order, etc.). But I would want anyone to run the resulting contest program to see the results for themselves. So, I’d want this “randomness” to be consistent. I would want to have the random sequence used be controlled by a fixed seed. But I wouldn’t want anyone to think that I picked a random seed that yielded results to my liking. Thus, I’d need a number that is:

1) unknown (non-existent) until a given future date that I can pick in advance

2) obviously not influenced by me

3) relatively random

4) easy to verifyI don’t know whether I’ll ever write a programming contest, but the future well-known random number problem is interesting.

August 31st, 2004 at 9:11 am. Permalink.

**Jeremy replied:**

My co-worker, Aaron Claypoole, worked on this problem a little bit and found that the Nasdaq 100 is a pretty nice-size set of well-defined values. I was drawn to the volumes. The last digits of those volumes numbers are probably pretty random, and fairly easy to obtain.

August 31st, 2004 at 9:21 am. Permalink.

**Mark A. Hershberger replied:**

I’m gonna bet that anything that comes from the stock market numbers aren’t going to be truly random. They may be sufficiently random for your purposes, though, since you don’t want to depend upon trust.

A lot of research has gone into the numbers that the stock market generates, so look at that before you assume that they’re random enough.

Also, look at rfc3797 which talks about publicly verifiable sources of randomness. It contains this note: “(Experience has indicated that stock prices

and/or volumes are a poor source of unambiguous data due trading

suspensions, company mergers, delistings, splits, multiple markets,

etc.)”.Another that they mention that you could consider: daily balance in the US Treasury on a specified day.

August 31st, 2004 at 10:47 am. Permalink.

**Jeremy replied:**

Ah! Thanks for that pointer to rfc3797. That is very interesting. I

**knew**someone else must have worked on that problem before.August 31st, 2004 at 11:08 am. Permalink.

**ezra replied:**

Ah, very interesting. You want a number that’s derived from some deterministic process, given some objective, widely (?) available information, such that the expected distribution of the number is close to uniform, no matter how much information you have before that day.

How easy exactly does it need to be to verify this number? Do you want to do it by computer? Or would you prefer to do it manually? I would think, actually, that the closing data of the NYSE would be pretty easy to gather automatically; very easy if you have access to some for-pay web service, I bet, and moderately easy if you don’t.

If you want to do it manually, how about taking all the headlines on the front page of the NY Times for that day, and using some numerical coding to boil it down into a number? Assign each character a number and do some checksum on that string of numbers, taking the headlines in a specified order. Now, how big does the range of the variable have to be for your needs? Because you’d want to calculate how much English text you’d need to get the kind of distribution you want. If you just took the first word, the range of possible values is not likely to be very large; if you just took two words, it’s still not very large, since the second word is going to have a certain correlation with the first. There are something like 50,000 English words; amongst those the NY Times is likely to use a very small subset in a headline. But, supposing there were just 1000 free possibilities in each position, a few dozen words would be enough, I think.

The way you phrase the problem, it sounds like you need to have an exact value, i.e. the variable needs to take discrete values. Would you be willing to accept a continuous variable if you could assert that it was verifiable within some tolerance? This would open up a lot more possibilities for source material. I guess that doesn’t change the question very much. If you had a variable that might range from 0 to 10 and any two measurements of it were within +/- 0.005 of each other, then that’s enough to determine a precise value out of 1000 discrete possibilities, right?

You’ve also got me thinking about whether this needs to depend on social phenomena (stock exchange, newspaper headlines, etc.) or if you could drive it from something natural (weather, tides, numbers of leaves dropped by a certain tree in autumn). What community of others needs to be able to verify this number? Are they distributed throughout the world, or do they all live in the same place?

If you have the chance to build some kind of gizmo, you could develop a random variable by accumulating state based on some events in a particular place, for example, the time between footsteps at a particular street corner, or between moments when a certain light sensor is obscured because of someone passing in front of it. I’m assuming you don’t want to go to that much trouble!

If the community of verifiers is small-scale, you could do something interesting and adventurous; if it’s one person who you know, for example, you could say, “On April 22 next year, you and I will go for a hike and at some point we will stop and take one picture with a digital camera. The number is given by the last k bits of the MD5 checksum of the resulting TIFF file.” (you’d have to avoid lossy compression, whose results might be susceptible to influence).

Now you’ve got me intrigued as to why you want this. For establishing some cryptographic secret? For settling some contest?

Cheers,

EzraSeptember 9th, 2004 at 11:05 pm. Permalink.

**ezra replied:**

RFC 3797: Neat! I like it. I wonder if a simpler technique–one just as strong–could be devised.

September 9th, 2004 at 11:21 pm. Permalink.

**stefan replied:**

Since you mentioned the Dow and someone else mentioned NASDQ, why not use the Dow Jones Industrial average as of close Dec 31st?

November 11th, 2004 at 10:40 pm. Permalink.

**Jim replied:**

I’m a little late to the party here, but here in NYC, if you “play the numbers” with you’re local bookie, the daily number is generally the last three digits of the daily total handle (total amount bet for the day) at Aqueduct or Belmont (two local racetracks). It’s published in three local papers, as well as the Racing Form. This is obviously similar to using stock market info, but adds a little local color.

February 25th, 2005 at 4:01 pm. Permalink.

**Jeremy Stein replied:**

Thanks for the suggestion, Jim!

February 25th, 2005 at 4:16 pm. Permalink.

## Leave a Reply