Lexicographical Distances in the Language Network

210-Languages

This article provides a visualization of how closely related various languages are to one another. Closeness is measured as lexical distance, which the article calculates by summing the Brown-Holman-Wichmann distances between two languages over a collection of words it classifies as “stable word stems.” The graph covers 210 languages (and is further broken down into several more specific graphs), and from this calculation several distinct subgroups of languages emerge, with the Indo-European subgroup making up a little less than half of the languages studied. The article also goes a little further into the theory behind these kinds of lexicographical networks, likening the relationships between languages to building a two-dimensional matrix.
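To make the "sum of per-word distances" idea more concrete, here is a minimal sketch of how such a two-dimensional matrix could be filled in. It is not the article's method: I am assuming a length-normalized Levenshtein (edit) distance as a stand-in for the Brown-Holman-Wichmann distance, and the three-word lists in `WORDS` are a hypothetical toy sample, not the article's stable word stems.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def word_distance(a: str, b: str) -> float:
    """ASSUMPTION: normalized edit distance stands in for the real metric."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def lexical_distance(words_a, words_b) -> float:
    """Sum the per-word distances over two aligned word lists."""
    return sum(word_distance(a, b) for a, b in zip(words_a, words_b))


def distance_matrix(word_lists):
    """Build a symmetric matrix of pairwise lexical distances."""
    names = list(word_lists)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = lexical_distance(word_lists[names[i]], word_lists[names[j]])
        matrix[i][j] = matrix[j][i] = d
    return names, matrix


# Hypothetical toy data: same three meanings in each language.
WORDS = {
    "English": ["water", "night", "two"],
    "German": ["wasser", "nacht", "zwei"],
    "Spanish": ["agua", "noche", "dos"],
}

names, matrix = distance_matrix(WORDS)
for name, row in zip(names, matrix):
    print(name, [round(d, 2) for d in row])
```

Each row and column corresponds to one language, the diagonal is zero, and the matrix is symmetric, so only the upper triangle actually needs to be computed.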

The visualization is an interesting application of graph theory, specifically the notion of graph structure. The first characteristic of note is the three-dimensional layout of the network in the first graph. I find this a particularly useful way to envision graphs, as it tames the sprawling, sometimes convoluted nature of the language network. Secondly, even though the edge lines are omitted, a reader can still determine the relative distances between language families from the clustering. Each cluster represents a group of languages that are similar enough to be grouped together. What I find interesting is how many subdivisions the Indo-European component has: apparently these languages are fairly similar, but not similar enough to be lumped into one cluster. It is rather strange, then, that these groups aren’t parsed further, as they are in the second article.
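As a rough illustration of how clusters can fall out of distances alone, without drawing any edges, here is a small sketch that groups languages whenever they can be linked by a chain of pairwise distances at or below a cutoff. This is my own simplification, not the procedure the article uses, and the names, the distances in `MATRIX`, and the `threshold` value are all made up for the example.

```python
def cluster_by_threshold(names, matrix, threshold):
    """Group items reachable through links with distance <= threshold.

    Equivalent to finding connected components of the graph that keeps
    only the "close enough" edges.
    """
    unvisited = set(range(len(names)))
    clusters = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        stack, members = [start], [start]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if matrix[i][j] <= threshold]
            for j in near:
                unvisited.remove(j)
            stack.extend(near)
            members.extend(near)
        clusters.append(sorted(names[k] for k in members))
    return clusters


# Hypothetical distances, chosen only to show two clear groups.
NAMES = ["English", "German", "Spanish", "Italian"]
MATRIX = [
    [0.0, 1.5, 2.4, 2.5],
    [1.5, 0.0, 2.6, 2.7],
    [2.4, 2.6, 0.0, 1.2],
    [2.5, 2.7, 1.2, 0.0],
]

clusters = cluster_by_threshold(NAMES, MATRIX, threshold=2.0)
print(clusters)
```

Raising the cutoff merges groups and lowering it splits them, which is one way to think about why a large family could show up as several subdivisions rather than one cluster.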
