Can Humans Supervise Increasingly Ultracrepidarian AI?

With Jose Hernandez-Orallo


Large language models have evolved to solve increasingly complex problems, yet they still fail at many that humans find simple. This discordance with human difficulty expectations strongly affects the reliability of these models, as users cannot identify a safe operating region in which the model can be expected to be correct. With the extensive use of scaling up and shaping up (such as RLHF) in newer generations of LLMs, we asked whether this discordance has been resolved. In a recent Nature paper, we examined several LLM families and showed that instances that are easy for humans are usually easy for the models as well. However, scaled-up, shaped-up models do not secure areas of low difficulty in which either the model does not err or human supervision can spot the errors. We also found that early models often avoid answering user questions, whereas scaled-up, shaped-up models give apparently sensible yet wrong answers much more often, including errors on difficult questions that human supervisors frequently overlook. Finally, we disentangled whether this behaviour arises from scaling up or from shaping up, and discovered new scaling laws showing that larger models become more incorrect and, especially, more ultracrepidarian: they increasingly operate beyond their competence. These findings highlight the need for a fundamental shift in the design and development of general-purpose artificial intelligence, particularly in high-stakes areas where a predictable distribution of errors is paramount.

The talk will be based on the recent paper: L. Zhou, W. Schellaert, F. Martínez-Plumed, Y. Moros-Daval, C. Ferri and J. Hernández-Orallo (2024) “Larger and more instructable language models become less reliable”, Nature 634, 61–68.
