The Power-law Distribution in Tagging Systems and the Influence of Tagging Suggestions
Tagging means using descriptive words to define or label a resource. Tag frequencies are commonly thought to follow a power-law distribution for two reasons. First, users imitate each other’s decisions when the tagging system presents them with suggested tags, which resembles the lecture example in which music downloads produce a “rich get richer” effect on songs. Second, users share the same background knowledge.
Several models explain the emergence of the power-law distribution in tagging systems. A classical one is the Polya Urn, which starts with an urn holding balls of different colors. At each step, a ball is drawn at random and returned to the urn together with an extra ball of the same color. This model can describe the tagging process: different tags correspond to different colors, and adding a second ball of the drawn color corresponds to users imitating an existing tag. Another model is the Yule-Simon model. Building on the Polya Urn, it also accounts for the addition of new tags. It introduces a parameter p, the probability of adding a new tag (this resembles pointing to a new node in the information network); an existing tag is copied with probability (1-p) (this resembles pointing to the node that a node points to in the information network). Some researchers extended the Yule-Simon model with a second parameter r that represents how quickly the memory kernel decays, because recently added tags are more likely to be imitated than older ones.
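The two models can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of any cited work: the function name simulate_tags and its arguments are made up for this post, and the memory argument is only a crude hard cutoff standing in for the decaying memory kernel, not the r parameter itself.

```python
import random
from collections import Counter


def simulate_tags(steps, p, initial=(0,), memory=None, seed=0):
    """Generate a stream of tag assignments (tags are integer ids).

    With probability p a brand-new tag is added (the Yule-Simon step).
    Otherwise an earlier assignment is copied. Copying a uniformly random
    earlier assignment picks each tag in proportion to its current count,
    which is the Polya Urn reinforcement step.

    Assumption: `memory`, if given, restricts copying to the most recent
    `memory` assignments. It is a hard window, not a smooth decay.
    """
    rng = random.Random(seed)
    stream = list(initial)
    next_tag = max(stream) + 1
    while len(stream) < steps:
        if rng.random() < p:
            stream.append(next_tag)  # introduce a new tag
            next_tag += 1
        else:
            window = stream if memory is None else stream[-memory:]
            stream.append(rng.choice(window))  # imitate an existing tag
    return stream


if __name__ == "__main__":
    urn = simulate_tags(10_000, p=0.0, initial=(0, 1, 2))  # pure Polya Urn
    yule_simon = simulate_tags(10_000, p=0.1)              # new tags allowed
    print(Counter(urn).most_common(3))
    print(Counter(yule_simon).most_common(5))
```

Setting p=0.0 with several initial tags gives the plain Polya Urn, since no new colors ever enter. Any p above zero lets new tags appear, and sorting the resulting counts by rank shows the heavy head and long tail that the models are meant to explain.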
It seems plausible that the tagging system forms a power-law distribution because of the tagging suggestions. However, an experiment on tagging reached a counter-intuitive conclusion: without tagging suggestions, the tagging system naturally forms a power-law distribution; with tagging suggestions, the power-law distribution is distorted. The experiment asked 222 participants to apply tags to 11 websites that appeal to the general public and have over 200 tags. There were two treatments: one presented 7 tagging suggestions to the users, and the other presented none. The researchers used del.icio.us as the tagging system and employed its algorithm. After collecting the data and analyzing them with the Kolmogorov-Smirnov complexity test, the researchers found that the power law only holds when there are no tagging suggestions, and that the suggestions actually prevent the power-law distribution from emerging. To account for this, the researchers explained that tagging suggestions do reinforce how often tags are imitated and shorten the “tail”, but even without suggestions, users still choose by themselves the tags that would have been suggested. In their conclusion, they mentioned that “words in natural language naturally follow a power law”.
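To make the idea of testing a power-law fit concrete, here is a rough sketch of a Kolmogorov-Smirnov style distance between observed tag counts and a power law. It is not the researchers' procedure. The function name ks_distance, the fixed exponent alpha, and the choice to truncate the power law at the largest observed count are all assumptions made for illustration.

```python
from collections import Counter


def ks_distance(counts, alpha):
    """Largest gap between the empirical CDF of tag counts and the CDF of
    a truncated discrete power law P(x) proportional to x**(-alpha),
    defined on x = 1 .. max(counts).

    Assumption: alpha is supplied by the caller rather than fitted.
    """
    largest = max(counts)
    weights = [x ** -alpha for x in range(1, largest + 1)]
    total = sum(weights)
    observed = Counter(counts)
    n = len(counts)

    empirical = model = gap = 0.0
    for x, w in zip(range(1, largest + 1), weights):
        empirical += observed.get(x, 0) / n  # share of tags used x times
        model += w / total                   # power-law share for x
        gap = max(gap, abs(empirical - model))
    return gap


if __name__ == "__main__":
    # Hypothetical data: how many times each distinct tag was applied.
    heavy_tailed = [1] * 100 + [2] * 25 + [3] * 11 + [4] * 6
    reversed_shape = [4] * 100 + [3] * 25 + [2] * 11 + [1] * 6
    print(ks_distance(heavy_tailed, alpha=2.0))
    print(ks_distance(reversed_shape, alpha=2.0))
```

A smaller distance means the counts sit closer to the assumed power law. Comparing the distance for a condition with suggestions against one without would be one way to express the kind of difference the experiment reports.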
Therefore, tags in systems without suggestions converge to a power-law distribution, while suggestions weaken the formation of that distribution. The users’ behavior of “imitating tags”, or “applying highly frequent tags”, exists under both conditions.
Resource: An Experimental Analysis of Suggestions in Collaborative Tagging
https://ieeexplore.ieee.org/document/5286089