Dataset Index

The NCCR Evolving Language offers the following datasets as open-source downloads:


Su, Y., MacGregor, L. J., Olasagasti, I., & Giraud, A.-L. (2023). A deep hierarchy of predictions enables online meaning extraction in a computational model of human speech comprehension. Retrieved from
Lancheros, M., Friedrichs, D., & Laganaro, M. (2023). What Do Differences between Alternating and Sequential Diadochokinetic Tasks Tell Us about the Development of Oromotor Skills? An Insight from Childhood to Adulthood. Retrieved from
Lorenz, C., Hao, X., Tomka, T., Rüttimann, L., & Hahnloser, R. H. R. (2023). Interactive extraction of diverse vocal units from a planar embedding without the need for prior sound segmentation. Retrieved from
Cathcart, C. (2023). The evolution of similarity avoidance: a phylogenetic approach to phonotactic change. Retrieved from
Sarkar, E., & Doss, M. M. (2023). Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers? Retrieved from
An, A., Jiang, C., Rodriguez, M. A., Nastase, V., & Merlo, P. (2023). BLM-AgrF: A New French Benchmark to Investigate Generalization of Agreement in Neural Networks. Retrieved from
Dalboni da Rocha, J. L., Kepinska, O., Schneider, P., Benner, J., Degano, G., Schneider, L., & Golestani, N. (2023). Multivariate Concavity Amplitude Index (MCAI) for characterizing Heschl’s gyrus shape. Retrieved from
Arango-Isaza, E., Capodiferro, M. R., Aninao, M. J., Babiker, H., Aeschbacher, S., Achilli, A., … Barbieri, C. (2023). The genetic history of the Southern Andes from present-day Mapuche ancestry. Retrieved from
Watson, S. K., Mine, J. G., O’Neill, L. G., Mueller, J. L., Russell, A. F., & Townsend, S. W. (2023). Cognitive constraints on vocal combinatoriality in a social bird. Retrieved from
Isasi-Isasmendi, A., Andrews, C., Flecken, M., Laka, I., Daum, M. M., Meyer, M., … Sauppe, S. (2023). The Agent Preference in Visual Event Apprehension. Retrieved from
Di Pietro, S. V., Karipidis, I. I., Pleisch, G., & Brem, S. (2023). Neurodevelopmental trajectories of letter and speech sound processing from preschool to the end of elementary school.
Brügger, R. K., Willems, E. P., & Burkart, J. M. (2023). Looking out for each other: coordination and turn taking in common marmoset vigilance.
Haugg, A., Frei, N., Menghini, M., Stutz, F., Steinegger, S., Röthlisberger, M., & Brem, S. (2023). Self-regulation of visual word form area activation with real-time fMRI neurofeedback. Retrieved from
Rychen, J., Eckerle, A., Liu, X., & Schucker, S. (2023). Underwater sounds, including killer whale and humpback whale vocalizations, recorded in northern Norway in January 2023.
Henderson, J., & Fehr, F. J. (2023). A VAE for Transformers with Nonparametric Variational Information Bottleneck. Retrieved from
Rychen, J., Paitz, P., Edme, P., Smolinski, K., Brackenhoff, J., & Fichtner, A. (2023). Test experiments with distributed acoustic sensing and hydrophone arrays for locating underwater sounds.
Leroux, M., Schel, A. M., Wilke, C., Chandia, B., Zuberbühler, K., Slocombe, K. E., & Townsend, S. W. (2023). Call combinations and compositional processing in wild chimpanzees. Retrieved from


Bittar, A., & Garner, P. N. (2022). A surrogate gradient spiking baseline for speech command recognition. Retrieved from
Gu, N., Ash, E., & Hahnloser, R. H. R. (2022). MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes. Retrieved from
Gao, Y., Gu, N., Lam, J., & Hahnloser, R. H. R. (2022). Do Discourse Indicators Reflect the Main Arguments in Scientific Papers? Retrieved from
Bittar, A., & Garner, P. N. (2022). Bayesian Recurrent Units and the Forward-Backward Algorithm. Retrieved from
Demartsev, V., Manser, M. B., & Tattersall, G. J. (2022). Vocalization-associated respiration patterns: thermography-based monitoring and detection of preparation for calling.
Cathcart, C., Herce, B., & Bickel, B. (2022). Decoupling Speed of Change and Long-Term Preference in Language Evolution: Insights From Romance Verb Stem Alternations. Retrieved from
Bolton, T. A. W., Van De Ville, D., Amico, E., Preti, M. G., & Liégeois, R. (2022). The arrow-of-time in neuroimaging time series identifies causal triggers of brain function. Retrieved from
Cathcart, C. A. (2022). Dialectal Layers in West Iranian: A Hierarchical Dirichlet Process Approach to Linguistic Relationships. Retrieved from
Sarkar, E., Prasad, R., & Doss, M. M. (2022). Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering. Retrieved from
Rochant, N., Allassonnière-Tang, M., & Cathcart, C. (2022). The evolutionary trends of noun class systems in Atlantic languages. Retrieved from
Watson, S. K., Lambeth, S. P., & Schapiro, S. J. (2022). Innovative multi-material tool use in the pant-hoot display of a chimpanzee. Retrieved from
Griffa, A., Amico, E., Liégeois, R., Van De Ville, D., & Preti, M. G. (2022). Brain structure-function coupling provides signatures for task decoding and individual fingerprinting. Retrieved from
Rossano, F., Terwilliger, J., Bangerter, A., Genty, E., Heesen, R., & Zuberbühler, K. (2022). How 2- and 4-year-old children coordinate social interactions with peers.
Gu, N., Gao, Y., & Hahnloser, R. H. R. (2022). Local Citation Recommendation with Hierarchical-Attention Text Encoder and SciBERT-Based Reranking. Retrieved from
Kaiping, G. A., Steiger, M. S., & Chousou-Polydouri, N. (2022). Lexedata: A toolbox to edit CLDF lexical datasets. Retrieved from
Proix, T., Delgado Saa, J., Christen, A., Martin, S., Pasley, B. N., Knight, R. T., … Giraud, A.-L. (2022). Imagined speech can be decoded from low- and cross-frequency intracranial EEG features.
Mearing, A. S., Burkart, J. M., Dunn, J., Street, S. E., & Koops, K. (2022). The evolutionary drivers of primate scleral coloration.
Wermelinger, S., Moersdorf, L., Ammann, S., & Daum, M. M. (2022). Exploring the role of COVID-19 pandemic-related changes in social interactions on preschoolers’ emotion labeling. Retrieved from
Miani, A., Hills, T., & Bangerter, A. (2022). Interconnectedness and (in)coherence as a signature of conspiracy worldviews. Retrieved from
Mansfield, J., Saldana, C., Hurst, P., Nordlinger, R., Stoll, S., Bickel, B., & Perfors, A. (2022). Category Clustering and Morphological Learning. Retrieved from
Neureiter, N., Ranacher, P., Efrat-Kowalsky, N., Kaiping, G. A., Weibel, R., Widmer, P., & Bouckaert, R. R. (2022). Detecting contact in language trees: a Bayesian phylogenetic model with horizontal transfer.
Lester, N. A., Moran, S., Küntay, A. C., Allen, S. E. M., Pfeiler, B., & Stoll, S. (2022). Detecting structured repetition in child-surrounding speech: Evidence from maximally diverse languages. Retrieved from
Bickel, B., Nichols, J., Zakharko, T., Witzlack-Makarevich, A., Hildebrandt, K., Rießler, M., … Lowe, J. B. (2022). The AUTOTYP database.
Ger, E., Küntay, A. C., Göksun, T., Stoll, S., & Daum, M. M. (2022). Do typological differences in the expression of causality influence preschool children’s causal event construal? Retrieved from
Jorgewich-Cohen, G., Townsend, S. W., Padovese, L. R., Klein, N., Praschag, P., Ferrara, C. R., … Sánchez-Villagra, M. R. (2022). Common evolutionary origin of acoustic communication in choanate vertebrates. Retrieved from
Sotiropoulos, A. G., Arango-Isaza, E., Ban, T., Barbieri, C., Bourras, S., Cowger, C., … Wicker, T. (2022). Global genomic analyses of wheat powdery mildew reveal association of pathogen spread with historical human migration and trade. Retrieved from
Bouchard, A., & Zuberbühler, K. (2022). An intentional cohesion call in male chimpanzees of Budongo Forest. Retrieved from
de Vevey, M., Bouchard, A., Soldati, A., & Zuberbühler, K. (2022). Thermal imaging reveals audience-dependent effects during cooperation and competition in wild chimpanzees.
Fröhlich, M., & van Schaik, C. P. (2022). Social tolerance and interactional opportunities as drivers of gestural redoings in orang-utans. Retrieved from
Ger, E., You, G., Küntay, A. C., Göksun, T., Stoll, S., & Daum, M. M. (2022). Gradual Route to Productivity: Evidence from Turkish Morphological Causatives. Retrieved from
Kaiping, G. A., & Neureiter, N. (2022). Clocks with bursts: Phylogenetic inference of schismogenesis in language evolution.
Spiess, S., Mylne, H. K., Engesser, S., Mine, J. G., O’Neill, L. G., Russell, A. F., & Townsend, S. W. (2022). Syntax-like Structures in Maternal Contact Calls of Chestnut-Crowned Babblers (Pomatostomus ruficeps).
Egurtzegi, A., Blasi, D. E., Bornkessel-Schlesewsky, I., Laka, I., Meyer, M., Bickel, B., & Sauppe, S. (2022). Cross-linguistic differences in case marking shape neural power dynamics and gaze behavior during sentence planning. Retrieved from
Wilke, C., Lahiff, N. J., Sabbi, K. H., Watts, D. P., Townsend, S. W., & Slocombe, K. E. (2022). Declarative referential gesturing in a wild chimpanzee (Pan troglodytes). Retrieved from
Engesser, S., & Manser, M. B. (2022). Collective close calling mediates group cohesion in foraging meerkats via spatially determined differences in call rates.
Berthet, M., Mesbahi, G., Cäsar, C., & Zuberbühler, K. (2022). Impact of predator model presentation paradigms on titi monkey alarm sequences. Retrieved from
Barbieri, C., Blasi, D. E., Arango-Isaza, E., Sotiropoulos, A. G., Hammarström, H., Wichmann, S., … Shimizu, K. K. (2022). A global analysis of matches and mismatches between human genetic and linguistic histories.
Kaiping, G. A., & Klamer, M. (2022). The dialect chain of the Timor-Alor-Pantar language family: A new analysis using systematic Bayesian phylogenetics. Retrieved from
Atanasova, T., & Laganaro, M. (2022). Word Production Changes through Adolescence: A Behavioral and ERP Investigation of Referential and Inferential Naming.
Averly, B., Sridhar, V. H., Demartsev, V., Gall, G., Manser, M., & Strandburg-Peshkin, A. (2022). Disentangling influence over group speed and direction reveals multiple patterns of influence in moving meerkat groups.
Wermelinger, S., Moersdorf, L., & Daum, M. M. (2022). How experience shapes infants’ communicative behaviour: Comparing gaze following in infants with and without pandemic experience. Retrieved from
Soldati, A., Fedurek, P., Dezecache, G., Call, J., & Zuberbühler, K. (2022). Audience sensitivity in chimpanzee display pant hoots. Retrieved from
Pasqualotto, A., Altarelli, I., De Angeli, A., Menestrina, Z., Bavelier, D., & Venuti, P. (2022). Enhancing reading skills through a video game mixing action mechanics and cognitive training.
Ravignani, A., & Garcia, M. (2022). A cross-species framework to identify vocal learning abilities in mammals.
Mine, J. G., Slocombe, K. E., Willems, E. P., Gilby, I. C., Yu, M., Thompson, M. E., … Machanda, Z. P. (2022). Vocal signals facilitate cooperative hunting in wild chimpanzees.
Fröhlich, M., van Schaik, C. P., van Noordwijk, M. A., & Knief, U. (2022). Individual variation and plasticity in the infant-directed communication of orang-utan mothers.


Fröhlich, M., Bartolotta, N., Fryns, C., Wagner, C., Momon, L., Jaffrezic, M., … van Schaik, C. P. (2021). Orangutans have larger gestural repertoires in captivity than in the wild—A case of weak innovation? Retrieved from
Matsumae, H., Ranacher, P., Savage, P. E., Blasi, D. E., Currie, T. E., Koganebuchi, K., … Bickel, B. (2021). Exploring correlations in genetic and cultural variation across language families in northeast Asia. Retrieved from
Rychen, J., Rodrigues, D. I., Tomka, T., Rüttimann, L., Yamahachi, H., & Hahnloser, R. H. R. (2021). A system for controlling vocal communication networks. Retrieved from
Aguirre-Fernández, G., Barbieri, C., Graff, A., Pérez de Arce, J., Moreno, H., & Sánchez-Villagra, M. R. (2021). Cultural macroevolution of musical instruments in South America. Retrieved from
Heesen, R., Zuberbühler, K., Bangerter, A., Iglesias, K., Rossano, F., Pajot, A., … Genty, E. (2021). Evidence of joint commitment in great apes’ natural joint actions.
Bonvin, A., Brugger, L., & Berthele, R. (2021). Lexical measures as a proxy for bilingual language dominance? Retrieved from
Gönül, G., & Paulus, M. (2021). Children’s reasoning about the efficiency of others’ actions: The development of rational action prediction.
Hollenstein, N., Pirovano, F., Zhang, C., Jäger, L., & Beinborn, L. (2021). Multilingual Language Models Predict Human Reading Behavior. Retrieved from
Carling, G., & Cathcart, C. (2021). Evolutionary dynamics of Indo-European alignment patterns.
Miani, A., Hills, T., & Bangerter, A. (2021). LOCO: The 88-million-word language of conspiracy corpus. Retrieved from
Wirsich, J., Jorge, J., Iannotti, G. R., Shamshiri, E. A., Grouiller, F., Abreu, R., … Vulliémoz, S. (2021). The relationship between EEG and fMRI connectomes is reproducible across simultaneous EEG-fMRI studies from 1.5T to 7T.
Le Floch, A., Bouchard, A., Gallot, Q., & Zuberbühler, K. (2021). Lesser spot-nosed monkeys coordinate alarm call production with associated Campbell’s monkeys. Retrieved from
Brügger, R. K., Willems, E. P., & Burkart, J. M. (2021). Do marmosets understand others’ conversations? A thermography approach.
Ranacher, P., Neureiter, N., van Gijn, R., Sonnenhauser, B., Escher, A., Weibel, R., … Bickel, B. (2021). Contact-tracing in cultural evolution: a Bayesian mixture model to detect geographic areas of language contact. Retrieved from
Jing, Y., Widmer, P., & Bickel, B. (2021). Word Order Variation is Partially Constrained by Syntactic Complexity. Retrieved from
Neureiter, N., Ranacher, P., van Gijn, R., Bickel, B., & Weibel, R. (2021). Can Bayesian phylogeography reconstruct migrations and expansions in linguistic evolution?
Manfredi, M., Celebic, C., & Daum, M. M. (2021). When dogs meow: An electrophysiological study of lexical–semantic processing in toddlers. Retrieved from
Fröhlich, M., Bartolotta, N., Fryns, C., Wagner, C., Momon, L., Jaffrezic, M., … van Schaik, C. P. (2021). Multicomponent and multisensory communicative acts in orang-utans may serve different functions.
Mohammadshahi, A., & Henderson, J. (2021). Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement. Retrieved from
Sauppe, S., Choudhary, K. K., Giroud, N., Blasi, D. E., Norcliffe, E., Bhattamishra, S., … Bickel, B. (2021). Neural signatures of syntactic variation in speech planning. Retrieved from
Jing, Y., Blasi, D., & Bickel, B. (2021). Dependency length minimization and its limits: a possible role for a probabilistic version of the Final-Over-Final Condition (to appear in Language).
Van De Ville, D., Farouj, Y., Preti, M. G., Liégeois, R., & Amico, E. (2021). When makes you unique: Temporality of the human brain fingerprint.
Ranacher, P., & Neureiter, N. (2021). Hidden spatial clusters - and how to find them. Retrieved from
Carling, G., & Cathcart, C. (2021). Reconstructing the evolution of Indo-European grammar.
Ger, E., Stuber, L., Küntay, A. C., Göksun, T., Stoll, S., & Daum, M. M. (2021). Influence of causal language on causal understanding: A comparison between Swiss German and Turkish. Retrieved from
Shimizu, K. K., Copetti, D., Okada, M., Wicker, T., Tameshige, T., Hatakeyama, M., … Handa, H. (2021). De Novo Genome Assembly of the Japanese Wheat Cultivar Norin 61 Highlights Functional Variation in Flowering Time and Fusarium-Resistant Genes in East Asian Genotypes. Retrieved from The data are also accessible at IPK, Germany (last accessed on Dec. 15, 2020), and on the BLAST server at the National Institute of Genetics, Japan (last accessed on Dec. 15, 2020). The gene annotation of the Fhb1 locus can be downloaded from
Sauppe, S., & Flecken, M. (2021). Speaking for seeing: Sentence structure guides visual event apprehension. Retrieved from
You, G., Bickel, B., Daum, M. M., & Stoll, S. (2021). Child-directed speech is optimized for syntax-free semantic inference. Retrieved from
Sato, T., Adachi, N., Kimura, R., Hosomichi, K., Yoneda, M., Oota, H., … Ishida, H. (2021). Whole Genome Sequencing of a 900-year-old Human Skeleton Supports Two Past Migration Events from the Russian Far East to Northern Japan. Retrieved from
Heesen, R., Bangerter, A., Zuberbühler, K., Iglesias, K., Neumann, C., Pajot, A., … Genty, E. (2021). Assessing joint commitment as a process in great apes.


Hovsepyan, S., Olasagasti, I., & Giraud, A.-L. (2020). Combining predictive coding and neural oscillations enables online syllable recognition in natural speech. Retrieved from
Cathcart, C. A., Hölzl, A., Jäger, G., Widmer, P., & Bickel, B. (2020). Numeral classifiers and number marking in Indo-Iranian: A phylogenetic approach. Retrieved from
You, G., Daum, M., & Stoll, S. (2020). Processing causatives in first language acquisition: A computational approach. Retrieved from
Widmer, M., Jenny, M., Behr, W., & Bickel, B. (2020). Morphological structure can escape reduction effects from mass admixture of second language speakers. Retrieved from
Watson, S. K., Burkart, J. M., Schapiro, S. J., Lambeth, S. P., Mueller, J. L., & Townsend, S. W. (2020). Nonadjacent dependency processing in monkeys, apes, and humans. Retrieved from
Kurthen, I., Meyer, M., Schlesewsky, M., & Bornkessel-Schlesewsky, I. (2020). Individual differences in peripheral hearing and cognition reveal sentence processing differences in healthy older adults. Retrieved from
Wermelinger, S., Gampe, A., Helbling, N., & Daum, M. M. (2020). Do you understand what I want to tell you? Early sensitivity in bilinguals’ iconic gesture perception and production. Retrieved from
Collier, K., Radford, A. N., Stoll, S., Watson, S. K., Manser, M. B., Bickel, B., & Townsend, S. W. (2020). Dwarf mongoose alarm calls: investigating a complex non-human animal call.
Watson, S. K., Heesen, R., Hedwig, D., Robbins, M. M., & Townsend, S. W. (2020). An exploration of Menzerath’s law in wild mountain gorilla vocal sequences. Retrieved from
Sokoliuk, R., Degano, G., Banellis, L., Melloni, L., Hayton, T., Sturman, S., … Cruse, D. (2020). Covert Speech Comprehension Predicts Recovery From Acute Unresponsive States. Retrieved from
Andrieu, J., Penny, S. G., Bouchet, H., Malaivijitnond, S., Reichard, U. H., & Zuberbühler, K. (2020). White-handed gibbons discriminate context-specific song compositions.
Marchesotti, S., Nicolle, J., Merlet, I., Arnal, L. H., Donoghue, J. P., & Giraud, A.-L. (2020). Selective enhancement of low-gamma activity by tACS improves phonemic processing and reading accuracy in dyslexia. Retrieved from
Bangerter, A., Mayor, E., & Knutsen, D. (2020). Lexical entrainment without conceptual pacts? Revisiting the matching task. Retrieved from
Shorland, G., Genty, E., Guéry, J.-P., & Zuberbühler, K. (2020). Investigating self-recognition in bonobos: mirror exposure reduces looking time to self but not unfamiliar conspecifics.