
Power Laws and Zipf’s Law in Quantitative Linguistics

A recurring pattern when quantifying popularity in many different domains is the power law. The “Networks, Crowds, and Markets” textbook, for instance, gives a few examples of power laws outside the context of web pages. One is that the fraction of telephone numbers that receive k calls per day is roughly proportional to 1/k^2. Similarly, the fraction of scientific papers that receive k citations is roughly proportional to 1/k^3. A conceptual extension of the power laws in these examples is Zipf’s law. Like them, Zipf’s law is an empirical law that shows up in several domains, but it was originally established in linguistics. In its original formulation, it states that in a large corpus of text, the frequency of any word is inversely proportional to its rank in the frequency table (with the most frequent word given rank 1). So the second most frequent word should occur about half as often as the most frequent one, the third about a third as often, and so on.
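To make the rank-frequency relationship concrete, here is a minimal Python sketch that ranks the words of a text by frequency and compares each observed count with what Zipf’s law would predict from the top word. The function names (`rank_frequency`, `zipf_prediction`), the simple tokenizer, and the toy sentence are my own assumptions for illustration; none of them come from the textbook or from Wyllys’s paper.

```python
from collections import Counter
import re


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_prediction(top_count, rank):
    """Count predicted at a given rank if frequency is proportional to 1/rank."""
    return top_count / rank


# Toy sample, far too small to actually exhibit Zipf's law.
sample = "the cat and the dog and the bird saw the cat"
table = rank_frequency(sample)
top_count = table[0][2]

for rank, word, count in table:
    predicted = zipf_prediction(top_count, rank)
    print(f"{rank:>4}  {word:<6} observed={count}  predicted={predicted:.2f}")
```

On a sentence this short the fit means nothing; the pattern only becomes visible when the same comparison is run on a large corpus, such as a full book.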

The paper “Empirical and Theoretical Bases of Zipf’s Law” by Ronald Wyllys examines why Zipf’s law arises in most languages. It first explains Zipf’s law much as I do in the previous paragraph. Wyllys then notes that the law is a somewhat puzzling phenomenon, since Zipf himself did not establish a clear logical explanation for it. However, Wyllys deems Mandelbrot’s explanation “intellectually much more satisfying than Zipf’s”. Mandelbrot’s explanation is that Zipf’s law is a good approximation because it also accomplishes the task of minimizing communication costs. Mandelbrot measures the communication cost of words in terms of the letters that spell them and the spaces between them, so the cost increases with the number of letters. Minimizing costs in terms of phonemes yields an approximation from which Zipf’s law itself follows. Wyllys also goes into more detail on other attempts to explain why Zipf’s law arises. While the rich-get-richer rule explains why power laws arise in most cases, in the domain of language it is interesting to see which other inherent structures of language give rise to Zipf’s law.

Paper can be found here:

