Mathematically, How Many Different Tweets Could There Ever Be?

There’s only so much you can say in 140 paltry characters. So how many different tweets could ever be sent?

Randall Munroe has tackled that very question in today’s wonderful What If? post. First, a very crude approximation:

Tweets are 140 characters long. There are 26 letters in English, or 27 if you include spaces. Using that alphabet, there are 27^140 ≈ 10^200 possible strings. But Twitter doesn’t limit you to those characters. You have all of Unicode to play with, which has room for over a million different characters. The way Twitter counts Unicode characters is complicated, but the number of possible strings could be as high as 10^800.
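The crude 27-character estimate is easy to check for yourself. Here is a minimal Python sketch of that arithmetic; the function and constant names are made up for illustration, and it only covers the 27-symbol alphabet, not the Unicode case.

```python
# Count the strings of a fixed length over a small alphabet.
# Names here are illustrative, not taken from the What If? post.

TWEET_LENGTH = 140   # characters per tweet, as stated in the article
ALPHABET_SIZE = 27   # 26 English letters plus the space


def count_strings(alphabet_size, length):
    """Each position can hold any symbol, so the count is size ** length."""
    return alphabet_size ** length


def order_of_magnitude(n):
    """Power of ten of a positive integer: its digit count minus one."""
    return len(str(n)) - 1


total = count_strings(ALPHABET_SIZE, TWEET_LENGTH)
print(f"27^140 is on the order of 10^{order_of_magnitude(total)}")
```

Python integers have arbitrary precision, so the exact 201-digit value is computed directly rather than approximated with floating point.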

But that would create a jumble of horrible characters that don’t make any sense. So what about combinations of those characters that, say, make sense to English speakers? Turns out, that’s a very difficult question to answer:

Claude Shannon… had a clever method for measuring the information content of a language. [He] determined that the information content of typical written English was around 1.0 to 1.2 bits per letter. This means that a good compression algorithm should be able to compress ASCII English text — which is eight bits per letter — to about 1/8th of its original size. Indeed, if you use a good file compressor on a .txt ebook, that’s about what you’ll find.

If a piece of text contains n bits of information, in a sense it means that there are 2^n different messages it can convey. There’s a bit of mathematical juggling here, but the bottom line is that it suggests there are on the order of about 2^(140×1.1) ≈ 2×10^46 meaningfully different English tweets, rather than 10^200 or 10^800.
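The same kind of back-of-the-envelope check works for the Shannon-based figure. The sketch below assumes 1.1 bits per letter, the midpoint of the 1.0 to 1.2 range quoted above, and works in logarithms to keep the numbers manageable; the names are again purely illustrative.

```python
import math

TWEET_LENGTH = 140      # characters per tweet
BITS_PER_LETTER = 1.1   # assumed midpoint of Shannon's 1.0 to 1.2 bits


def log10_message_count(length, bits_per_letter):
    """log10 of 2 ** (length * bits_per_letter), the number of distinct messages."""
    total_bits = length * bits_per_letter
    return total_bits * math.log10(2)


exponent = log10_message_count(TWEET_LENGTH, BITS_PER_LETTER)
power = math.floor(exponent)
mantissa = 10 ** (exponent - power)
print(f"about {mantissa:.1f} x 10^{power} meaningfully different tweets")
```

Moving the assumed bits per letter anywhere within the 1.0 to 1.2 range shifts the exponent by a few powers of ten, which is why the result is best read as an order-of-magnitude estimate.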

That is a staggering number, which would take a person nearly 10^47 seconds to read out loud. And if you want to get a grasp of how long that really is, read Munroe’s post. I guarantee you’ll be shocked. [What If?]