Can Large Language Models pass an intelligence test? – Paola Merlo, Faculty of Humanities, Computational linguistics.
How do we evaluate LLMs and determine the aspects and limits of their intelligent behaviour? When exposed to tests of analytic intelligence, human problem-solvers identify rules applied to relevant objects and attributes. Based on the induced rules, they can generalise and provide a solution to the test. An analogous language task (called BLM) has recently been proposed for LLMs. In this work, we investigate what linguistic reasoning LLMs develop by asking them to solve some simple variants of the BLM task. We find that current state-of-the-art generative models can handle the task: they easily understand the instructions and can provide step-by-step explanations. The explanations they provide show that LLMs can overcome two of the main hurdles: correspondence finding (object and attribute identification) and item novelty. However, overall they struggle to find the correct underlying global rules, even when they find the right answer. We argue that these findings support the usefulness of the task as a method to test the limits and specific properties of the generalisation ability of Large Language Models, providing an intrinsic evaluation method inspired by tests of human intelligence.
Curled in a Velvet Knot: Counterfactual Explorations of the Affordances of Tiger and Leopard Prints Using AI Image Synthesis – Bokar Lamine N’Diaye, Faculty of Humanities, Digital humanities.
As they interpolate between points in their “latent space”, ML generative processes offer a counterfactual exploration of the images in their training data, unbound by the emotional or cultural affordances that restrict the production of real images. This study relies on a hand-picked corpus of images connected by the presence of tiger and leopard fur patterns (either as live animals or as man-made graphic patterns, with a specific focus on the positive or negative socio-cultural value they ascribe to the human bodies depicted alongside them across history and in different cultural contexts). Established through the Visual Contagions database, which allows sifting through the segmented images of almost four million issues of 80k+ periodicals across the world for visual similarities, the curated corpus is used to train a fine-tuned Stable Diffusion model, whose generated variations may form a comparative foil to the initial set. This work hopes to constitute a framework for an ethical, self-aware use of generative AI, an aid to the definition of a transversal research object, an exploration of the affordances that shape the production of visual artefacts, as well as a possible bridge between academic corpora and artistic practices.
Scientists constantly navigate a directed space where nodes are hypotheses and arrows are experiments. These arrows have a cause-and-effect directionality, from initial conditions to outcomes, and scientists navigate that space based on complexity and cost considerations. It turns out that their situation is analogous to that of taxi drivers aiming to move their clients around. We will first describe a citizen science experiment, opportunistically conducted out of the FacLab, to help drivers reverse-engineer the algorithms applicable to their wayfinding (matching, routing, pricing). This project received the European Union Citizen Science Prize 2023. Secondly, we will provide the theoretical framework that shows this analogy can actually be made formal. Thirdly, we will circle back to LLMs and present the recent advance of Llemma (an LLM trained on a diversely formatted mathematical corpus) through the same prism. Finally, given the current pace of development, we anticipate giving an update on the uptake by the math community and the anticipated extension into other fields.
13 November 2023 – 12h15-14h00
Registration mandatory – Under this link
Non-UNIGE employees are welcome to join. To register, please use the affiliation code “NCCR Evolving Language”.