Behind the optimization of human language: what is the life cycle of words with identical consonants?
Is there a reason behind the structure of the words we use? One pattern seen across languages, and one that cannot be explained by chance, is the relative scarcity of words with identical consonants (i.e. two identical consonants separated by a vowel, as in “dodeliner”, “beber” or “cookie”). Linguists from the University of Zurich and the NCCR Evolving Language investigated a new hypothesis that could explain this curious phenomenon.

To facilitate efficient communication, words in the world’s spoken languages must be easy for speakers to pronounce and for hearers to process and comprehend. However, certain sound patterns found in words pose problems for speakers and hearers. For instance, words containing pairs of identical consonants separated by a vowel are difficult to utter and process. Analyses showed that these structures are underrepresented in the world’s languages: while many languages allow such words (e.g., French “dodeliner” (to shake one’s head), Spanish “beber” (to drink) or English “cookie”), they are far less frequent than would be expected under chance.
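To make the pattern and the idea of "expected under chance" concrete, here is a minimal Python sketch. It is an illustration only, not the method used in the study: it works on spelling rather than on sounds, treats the letters a, e, i, o, u and y as vowels, and uses a simple consonant shuffle as a stand-in for a chance baseline. The function names are hypothetical.

```python
import random

# Assumption: these letters count as vowels; everything else alphabetic is a consonant.
VOWELS = set("aeiouy")

def is_consonant(ch: str) -> bool:
    return ch.isalpha() and ch not in VOWELS

def has_identical_consonants(word: str) -> bool:
    """True if the spelling contains the same consonant letter twice, separated by one vowel."""
    w = word.lower()
    for i in range(len(w) - 2):
        first, middle, last = w[i], w[i + 1], w[i + 2]
        if first == last and is_consonant(first) and middle in VOWELS:
            return True
    return False

def ic_share(words) -> float:
    """Fraction of words in the list that contain the pattern."""
    return sum(has_identical_consonants(w) for w in words) / len(words)

def shuffled_share(words, seed: int = 0) -> float:
    """IC share after shuffling all consonant letters across the list,
    keeping every word's vowel/consonant skeleton intact."""
    rng = random.Random(seed)
    pool = [ch for w in words for ch in w.lower() if is_consonant(ch)]
    rng.shuffle(pool)
    letters = iter(pool)
    shuffled = [
        "".join(next(letters) if is_consonant(ch) else ch for ch in w.lower())
        for w in words
    ]
    return ic_share(shuffled)

print(has_identical_consonants("beber"))      # True (b-e-b)
print(has_identical_consonants("dodeliner"))  # True (d-o-d)
print(has_identical_consonants("cookie"))     # False: the two /k/ sounds are spelled "c" and "k"
```

The "cookie" case shows why a real analysis needs phonemic transcriptions rather than spelling. Comparing `ic_share(words)` with `shuffled_share(words)` averaged over many seeds would give a rough sense of under- or overrepresentation in a given word list.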
Until now, little was known about the specific evolutionary forces responsible for this optimization and the resulting underrepresentation of identical consonants. Using phylogenetic analysis of etymological databases, researchers from the University of Zurich identified the mechanisms underlying this phenomenon in a recently published paper.
Life and death of identical consonants
Words containing identical consonants (IC) are statistically rarer than chance alone would predict. What mechanism could be causing this? Researchers have previously put forward three candidate explanations for this underrepresentation: (1) words with IC are less likely to come into existence (“a lower birth rate”); (2) words with IC are more likely to change form and lose their IC (“a higher mutation rate”); and (3) words with IC are more likely to fall out of use (“a higher loss rate”).
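A toy calculation can show why these three mechanisms are hard to tell apart from word counts alone. The sketch below is not the study's model; it assumes a simple pool of IC words with a constant birth rate and constant per-word mutation and loss rates, and all numbers are invented for illustration.

```python
def equilibrium_count(birth: float, mutation: float, loss: float) -> float:
    """Steady-state number of IC words under the toy dynamics
    dN/dt = birth - (mutation + loss) * N, i.e. N* = birth / (mutation + loss)."""
    return birth / (mutation + loss)

# Hypothetical reference rates (words per unit time, and per-word rates).
reference = equilibrium_count(birth=10.0, mutation=0.25, loss=0.25)

# Each mechanism on its own lowers the steady-state count.
lower_birth = equilibrium_count(birth=5.0, mutation=0.25, loss=0.25)
higher_mutation = equilibrium_count(birth=10.0, mutation=0.75, loss=0.25)
higher_loss = equilibrium_count(birth=10.0, mutation=0.25, loss=0.75)

print(reference, lower_birth, higher_mutation, higher_loss)  # 20.0 10.0 10.0 10.0
```

In this toy setting, halving the birth rate and tripling either the mutation or the loss rate all produce the same shortfall, which is why reconstructing word histories, rather than counting present-day words, is needed to separate the mechanisms.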
Working from a data set of cognate words – i.e. words that are etymologically linked to each other, but may not have the same meaning, like Latin “manducare” (to chew), Italian “mangiare” (to eat) and French “manger” (to eat) – from three language family trees, the University of Zurich researchers reconstructed the evolution of related words. “The probabilistic model used in the paper is able to tell us the most likely history of a word form as it evolves down the family tree into its descendants in different languages, which makes it possible to estimate birth, mutation, and loss ratios,” explains Chundra Cathcart, senior researcher at the University of Zurich and first author of the study. The data set consists mostly of modern languages, such as Finnish, Amharic, or Tagalog, but also includes a few extinct languages such as Akkadian and Classical Tibetan.
According to their results, the main mechanism that seems to explain the low frequency of words with identical consonants in the world’s languages is a significantly lower birth rate. “This means words containing identical consonants are less likely to enter into use in the world’s languages than words without them,” Cathcart clarifies. The results for the mutation rate and the loss rate are more modest and, in some cases, contradict prior beliefs. “Crucially, these results contradict a commonly held view that words with identical consonants are more likely to die out than those without,” he says. “Here, we don’t find any evidence for this view – once they come into existence, words with identical consonants are not more likely to die out than words without them.”
Considering the multiplicity of answers
To add nuance to their results, the researchers explored another possible mechanism for the underrepresentation of identical consonants, this time focusing on basic-meaning words. “Basic meanings are vocabulary items that are thought to be relatively more stable and more frequently used than others, and found in the so-called Swadesh 100 list,” explains Cathcart. An example of a basic meaning would be “eat”, as opposed to the non-basic “fishhook”. “During their lifetime, evolutionary forces seem to work to keep forms with identical consonants out of languages’ basic vocabularies,” he clarifies.
To better understand lexical competition and the replacement of basic words, the researchers used a database composed of cognate-concept traits for basic words – i.e. etymologically related word forms found with the same meaning in different languages. For example, French and Italian share a cognate-concept trait in that a cognate form is used for the meaning “to eat” (French “manger” and Italian “mangiare”), which excludes the non-cognate Latin form “edere”. In contrast, cognate traits express whether an etymologically related form is found in different languages, regardless of meaning. For instance, Latin “manducare” (to chew) would share a cognate trait with French “manger” (to eat) and Italian “mangiare” (to eat).
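The difference between the two kinds of trait can be sketched as a small data structure. The example below is only an illustration built from the words mentioned above; the record layout and the cognate-set labels ("MANDUCARE", "EDERE") are hypothetical and do not reflect the format of the databases used in the study.

```python
from collections import defaultdict

# Each record: (language, word form, hypothetical cognate-set label, meaning).
RECORDS = [
    ("Latin", "manducare", "MANDUCARE", "chew"),
    ("Latin", "edere", "EDERE", "eat"),
    ("Italian", "mangiare", "MANDUCARE", "eat"),
    ("French", "manger", "MANDUCARE", "eat"),
]

def cognate_traits(records):
    """Group languages by cognate set, regardless of meaning."""
    traits = defaultdict(set)
    for language, _form, cognate_set, _meaning in records:
        traits[cognate_set].add(language)
    return dict(traits)

def cognate_concept_traits(records):
    """Group languages by (cognate set, meaning) pairs."""
    traits = defaultdict(set)
    for language, _form, cognate_set, meaning in records:
        traits[(cognate_set, meaning)].add(language)
    return dict(traits)

# Cognate trait: Latin, Italian and French all have a reflex of the same form.
print(sorted(cognate_traits(RECORDS)["MANDUCARE"]))
# Cognate-concept trait: only Italian and French use that form for "eat".
print(sorted(cognate_concept_traits(RECORDS)[("MANDUCARE", "eat")]))
```

Keying on the cognate set alone groups Latin with its descendants, while keying on the pair of cognate set and meaning separates Latin, where the form meant “to chew”.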
Comparing data sets from five language families, the researchers investigated the same questions on a different scale: Do word forms without IC enter basic vocabulary more often than forms with IC? Are changes leading to IC-less words more frequent in basic vocabulary items? Do forms with IC fall out of use more frequently than those without in basic-meaning words? Results from their analysis revealed differences in the roles of the mechanisms involved. Indeed, in an overwhelming majority of languages, words with identical consonants seemed more likely to lose their basic-meaning status, meaning the “higher loss rate” hypothesis is the more likely explanation in this case. The other hypotheses are not confirmed in all of the families analyzed. “These results encourage us to decompose linguistic evolution into multiple component parts,” Cathcart emphasizes.
Suboptimal structures and subverting limitations
The results obtained by Cathcart and his team provide a nuanced view of the different processes responsible for making languages optimal for their users, and highlight the limits of traditional theories. “There is a tendency in cognitive science to assume that if some feature of language is suboptimal for communication, language change as a whole will conspire to get rid of this feature,” the researcher observes. “Here, we show that though there is a significant bottleneck in terms of the creation of words containing identical consonants, as words evolve, there is not always a strong pressure to get rid of identical consonants (either by getting rid of identical consonants in a word, or by getting rid of the word).”
Currently, digitized etymological databases are not always suited to this kind of computational analysis. As a result, the study does not cover all of the world’s languages, which limits how far its findings can be generalized. According to Cathcart, this situation may change in the next few years as new tools for data processing are developed.
Another open question is why exactly identical consonants are suboptimal for human adults, especially considering that they are widespread in child-directed communication. “There are already some psycholinguistic experiments that explore the difficulty of producing and processing sequences of identical consonants, including work by researchers from the NCCR Evolving Language which suggests that pronunciation of this pattern is difficult due to motor planning,” Cathcart states. He adds that he is currently collaborating with the aforementioned group to determine whether difficulty in pronunciation of these sequences varies across different types of consonants.