To a fairly good approximation, the nth most common word in a large sample of English text occurs with frequency proportional to 1/n, as illustrated in the first picture below. This fact was first noticed around the end of the 1800s, and was attributed in 1949 by George Zipf to a general, though vague, Principle of Least Effort for human behavior. I suspect that in fact the law has a rather simple probabilistic origin. Consider generating a long piece of text by picking at random from k letters and a space. Now collect and rank all the "words" delimited by spaces that are formed. When k = 1, the nth most common word will have frequency c^(-n). But when k ≥ 2, it turns out that the nth most common word will have a frequency that approximates c/n. If all k letters have equal probabilities, there will be many words with equal frequency, so the distribution will contain steps, as in the second picture below. If the k letters have non-commensurate probabilities, then a smooth distribution is obtained, as in the third picture. If all letter probabilities are equal, then words will simply be ranked by length, with all k^m words of length m occurring with frequency p^m. The normalization of probabilities, which requires the sum of k^m p^m over all lengths m ≥ 1 to equal 1, then implies p = 1/(2k), and since the word at rank roughly k^m then has probability 1/(2k)^m, Zipf's law follows.
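The random-text construction described above can be tried out numerically. What follows is only a minimal sketch in Python, under assumptions that are not in the text: the k letters and the space are all drawn with equal probability, empty "words" between adjacent spaces are discarded, and the function names (ranked_word_frequencies, log_log_slope), the seed, and the sample sizes are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def ranked_word_frequencies(k, n_chars, seed=0):
    """Return word frequencies, most common first, for random text.

    Assumption: the text is n_chars symbols drawn uniformly from k letters
    plus a space (one concrete reading of the construction in the text).
    """
    rng = random.Random(seed)
    alphabet = [chr(ord("a") + i) for i in range(k)] + [" "]
    text = "".join(rng.choices(alphabet, k=n_chars))
    # split() with no argument discards the empty "words" between adjacent spaces.
    counts = Counter(text.split())
    total = sum(counts.values())
    return [count / total for _, count in counts.most_common()]


def log_log_slope(freqs, top):
    """Least-squares slope of log(frequency) against log(rank) over the top ranks."""
    top = min(top, len(freqs))
    xs = [math.log(rank) for rank in range(1, top + 1)]
    ys = [math.log(f) for f in freqs[:top]]
    mean_x = sum(xs) / top
    mean_y = sum(ys) / top
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    # k = 1: frequencies at successive ranks should fall off geometrically.
    one_letter = ranked_word_frequencies(k=1, n_chars=200_000)
    ratios = [b / a for a, b in zip(one_letter, one_letter[1:4])]
    print("k=1 ratios of successive frequencies:", [round(r, 2) for r in ratios])

    # k = 4: the top ranks should show a stepped, roughly power-law decay.
    four_letters = ranked_word_frequencies(k=4, n_chars=200_000)
    print("k=4 log-log slope over top 84 ranks:", round(log_log_slope(four_letters, 84), 2))
```

The slope is fitted only over the top ranks (here the 4 + 16 + 64 = 84 words of length at most 3), since rarer words are seen too few times in a sample of this size to give reliable frequencies. Because of the steps in the equal-probability case, the fitted slope should be read as a rough indication of power-law decay, not as a precise exponent.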