At one point in this blog, I talked about how historical linguists determine where and when the Indo-European languages were likely to have begun. Such methods usually place the Indo-European homeland somewhere in the Black Sea steppes around 3,500 B.C. What I didn’t know until recently, however, is that this turns out not to be the end of the story. In a recent paper in Science (Bouckaert et al., 2012), a group of researchers uses Bayesian phylogeographic methods to locate an Indo-European homeland.

What are Bayesian phylogeographic methods, you may ask? As far as I understand, they start with the cladistic methods used to infer phylogenies (i.e. family trees of biological relationships) from a list of (usually genetic) traits. They then combine those trees with location information to chart geographic, as well as temporal, change. Beyond that, all I know is that the approach is often used to chart the spread of disease: phylogeographic analysis of virus samples from a specific outbreak, for example, can help determine when and where the outbreak began. And now this technology is being brought to bear on the outbreak of Indo-European.

This would be all well and good, except that the homeland posited by this research is very much not the Black Sea steppes 5,500 years ago, but Anatolia, 9,000 years ago. By their model, Indo-European developed roughly contemporaneously with agriculture and spread along with it for thousands of years before the steppe nomads entered the picture. Here is the New York Times’ Nicholas Wade on the two theories. Historical linguists by and large seem to be skeptical of the new findings, largely because the evidence that proto-Indo-European already had words for horses and other steppe technologies and phenomena is so strong.
How could the agrarian Anatolians have spread the word for “wheel” through the same processes as other proto-Indo-European words if the wheel itself hadn’t been invented yet? There is additional skepticism because few historical linguists understand how phylogeography works, so the researchers’ tools remain somewhat opaque to the field. An obvious critique is that language spread is less “tree-like” than genetic spread. Languages in the same area tend to take on the same characteristics and even share vocabulary, despite not being closely related. The choice of traits (realistically, the word list) fed into the phylogeographic machinery therefore becomes crucial. What did the researchers use, and how did they control for potential non-tree-like developments? Find out by visiting their site, which has a layman’s introduction, an animated map of their proposed evolution of Indo-European, a response to critics, and a link to the paper itself. They note that their method correctly predicted the origin of the Romance languages to be Rome about 2,000 years ago. On the other hand, Razib Khan points out that it pegs Romani as the outgroup for the modern Indo-Aryan languages. One mistaken branching does not invalidate a phylogenetic method, of course. But the fact that the misplaced language is specifically Romani, spoken by a historically nomadic population and carrying many loanwords, does suggest the method may have specific limitations.
In general, bringing biological methods “back” into the study of language is a very interesting topic, but one fraught with disagreement. Don’t assume that something published in Science on the topic is incontrovertible (or, conversely, junk). For example, about two years ago Quentin Atkinson, one of the researchers on the above paper, published a blockbuster paper in Science (Atkinson, 2011) arguing that phonemic diversity declines with distance from the African heartland in the same way that genetic diversity does. He therefore proposed that all languages evolved out of Africa, with phonemes lost through a series of founder-effect-like reductions in diversity as populations spread. That paper generated a lot of press and even more controversy. What measures of “phonemic diversity” were used, and are they appropriate? (Mark Liberman noted that tonal diversity seems to be over-weighted, so that places where tonal languages cluster may have their “phonemic diversity” overestimated.) Are the effects of neighbouring languages on each other being taken into account? And is there any reason to think that anything like the founder effect occurs for phonemes at all? As far as I can tell, the question is far from settled, but many of the criticisms of the Atkinson paper seem valid. Perhaps the only lame-ass conclusion I can give that would be undisputed here is that importing genetics methods into linguistics is an exciting new phenomenon that holds a lot of promise, and new developments are eagerly awaited.
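For readers who haven’t met the founder effect, a toy simulation may help show the genetic mechanism that Atkinson’s argument borrows. What follows is a minimal Python sketch of a serial founder effect under simplifying assumptions of my own, not taken from either paper. It models a chain of colonies, each started by a small random group drawn from the previous one and then regrown to full size without any new variants arising. The function name `serial_founder_chain` and all parameter values are hypothetical choices for illustration. Because each colony can only carry variants its founders brought along, the number of distinct variants can never go up along the chain. Whether phoneme inventories behave anything like this is exactly what the critics dispute.

```python
import random


def serial_founder_chain(n_alleles=40, pop_size=200, founders=20, steps=6, seed=1):
    """Toy serial founder effect (hypothetical illustration, no mutation).

    Returns the number of distinct variants in the ancestral population
    and in each successive colony along a chain of migrations.
    """
    rng = random.Random(seed)
    # Ancestral population: each individual carries one of n_alleles variants.
    population = [rng.randrange(n_alleles) for _ in range(pop_size)]
    diversity = [len(set(population))]
    for _ in range(steps):
        # A small founder group leaves to start a new colony...
        founder_group = rng.sample(population, founders)
        # ...which regrows to full size by copying founders only
        # (assumption: no new variants arise along the way).
        population = [rng.choice(founder_group) for _ in range(pop_size)]
        diversity.append(len(set(population)))
    return diversity


if __name__ == "__main__":
    # Distinct variant counts, moving step by step away from the origin.
    print(serial_founder_chain())
```

Raising `founders` makes each bottleneck milder, so the decline is typically shallower; setting it close to `pop_size` nearly removes the effect.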