Perseus
As any Classics major or other person with a penchant for translating ancient texts knows, Perseus, the digital library of ancient texts run by Tufts University, is invaluable. This is partly because its dictionary function does far more than a normal dictionary: when you give it a specific word, it not only gives you the translation but also parses the word for you. In inflected languages like Greek and Latin, this is amazingly useful. It is difficult to find the verb “fero, tuli, latum” in an alphabetical dictionary when one encounters only the form “latum” in a text and does not know what verb it comes from. Even more amazingly, when Perseus has the text you are translating on file, you don’t have to type in the word you need to look up. You can simply visit Perseus’ online version of the text and click on each word you don’t know, which brings you to a page like this:
http://www.perseus.tufts.edu/hopper/morph?l=commode&la=la&can=commode0&prior=et&d=Perseus:text:2007.01.0040:section=1&i=1
Here I have clicked on the word ‘commode’ on the fifth line of this page (http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3atext%3a2007.01.0040) of Cicero’s De Amicitia. In many cases, as here, a given word form will have several, sometimes many, possible meanings. In such cases, Perseus uses its ‘digital library’ function to help the user choose between them; it gives, as you can see, a percentage (a proposed probability, or a signal) for each possible meaning. This signal takes several quantities into account: how often this form carries this particular meaning, as a fraction of all the times the form appears in the texts in Perseus’ library; the type of word that precedes the word you click; and, most interestingly, user votes. The user can also see directly how many votes each meaning has received, next to its percentage. The more users vote on a particular word, the more weight is given to the user-vote portion of the signal Perseus presents.
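To make the idea of a vote weight that grows with the number of votes concrete, here is a minimal sketch in Python. It is not Perseus’ actual formula, which is not given here: the function name blended_signal, the constant k, the weighting rule n / (n + k), and the example numbers are all assumptions chosen for illustration, and the preceding-word component is left out for brevity.

```python
def blended_signal(corpus_share, vote_counts, k=10.0):
    """Blend corpus-based shares with user-vote shares (illustrative only).

    corpus_share: dict mapping each meaning to its share of corpus occurrences
    vote_counts:  dict mapping each meaning to its number of user votes
    k:            hypothetical constant; the vote weight is n / (n + k),
                  so votes count for little until many have been cast
    """
    total_votes = sum(vote_counts.values())
    vote_weight = total_votes / (total_votes + k)
    signal = {}
    for meaning, prior in corpus_share.items():
        if total_votes:
            vote_share = vote_counts.get(meaning, 0) / total_votes
        else:
            vote_share = 0.0
        signal[meaning] = (1 - vote_weight) * prior + vote_weight * vote_share
    return signal


# Made-up numbers: two candidate meanings for one word form.
corpus = {"meaning_a": 0.7, "meaning_b": 0.3}
no_votes = blended_signal(corpus, {})
ten_votes = blended_signal(corpus, {"meaning_b": 10})
```

With no votes, the signal is just the corpus share. With ten votes and k = 10, the votes and the corpus each contribute half of the signal, so a unanimous early crowd still cannot fully override the corpus evidence.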
This use of Perseus’ digital corpus shows an interesting awareness of network effects and information cascades. In most cases, of course, only one meaning will work within a full translation, but sometimes – as in the case linked above – there are actually multiple possibilities and multiple possible decisions. The user votes alone would seem to invite a cascade: the first few users vote for a particular meaning, and later users then assume that meaning is correct, use it, find that it works within a translation, and vote for it (without, human laziness being what it is, even trying the other possibility). This works well as a model of an information cascade.
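The vote-only scenario can be sketched with a simplified textbook cascade model. This is an illustration, not a description of how Perseus users actually behave: the function name run_cascade, the labels "A" and "B", and the rule that a voter ignores their own judgment once one meaning leads by two or more votes are all assumptions.

```python
def run_cascade(private_signals):
    """Simulate sequential voters who can see all earlier votes.

    private_signals: list of "A" or "B", each voter's own best guess.
    A voter follows the crowd once one option leads by two or more
    votes; otherwise the voter votes for their own guess.
    """
    votes = []
    for own_guess in private_signals:
        lead = votes.count("A") - votes.count("B")
        if lead >= 2:
            votes.append("A")
        elif lead <= -2:
            votes.append("B")
        else:
            votes.append(own_guess)
    return votes


# Suppose "A" is the correct meaning, but the first two voters guess "B".
votes = run_cascade(["B", "B", "A", "A", "A", "A"])
```

Here every voter ends up voting "B", even though four of the six would have chosen "A" on their own: two early mistakes are enough to lock in a false cascade when votes are the only visible information.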
However, Perseus takes steps to make false information cascades less likely. First, it gives every user information beyond other users’ votes: a reliable signal for every meaning, directly comparable with the signals for the other meanings, based partly on user votes but mostly on other information. If one clicks the ‘More Info’ button (located by the meaning with the highest percentage), it even tells you how the percentage was calculated, and thus what the percentage would be without other users’ votes. Second, the system weights the user-vote portion of the signal differently depending on how many users have voted in all. If only a few users have voted, most of the signal will consist of information other than user votes. This means that even the next few voters after the first few will still be relying largely on that other information: that is, there will be a larger base of early voters who vote using mostly information other than user votes. Thus later voters, even if they make decisions based on votes, will be more likely to choose correctly.
This system of representing user votes in signals is, I think, quite an elegant attempt to mitigate some of the negative effects of information cascades while retaining many of the benefits of a large user base.