June 25
Nika Haghtalab
UC Berkeley
Title: Distortion of AI Alignment: Does Preference Optimization optimize for preferences?
Abstract: After pre-training, large language models are aligned with human preferences based on pairwise comparisons. State-of-the-art alignment methods (such as PPO-based RLHF and DPO) are built on the assumption of aligning with a single preference model, despite being deployed in settings where users have diverse preferences. As a result, it is not even clear that these alignment methods produce models that satisfy users on average, a minimal requirement. Drawing on social choice theory and modeling users' comparisons through individual Bradley-Terry (BT) models, we introduce an alignment method's distortion: the worst-case ratio between the optimal achievable average utility and the average utility of the learned policy. The notion of distortion helps draw sharp distinctions between alignment methods: Nash Learning from Human Feedback achieves the minimax optimal distortion of a constant. We also give a fine-grained understanding of the distortion of RLHF (PPO- or DPO-based), which can suffer unbounded distortion in the worst case.
Josh Alman
Columbia University
Title: Fine-Grained Complexity and the pursuit of Fast Attention
Abstract: The attention mechanism is the key behind the Transformer architecture and many Large Language Models (LLMs). However, computing it in a straightforward way takes quadratic time, and this quadratic scaling is frequently cited as the bottleneck to making LLMs more efficient. In this talk, I'll survey a line of work in which we use tools from fine-grained complexity theory to address this challenge. I'll discuss the possibility of designing faster algorithms for attention itself, and then a few candidate ways to replace attention with other mechanisms guided by theory. I'll aim for the talk to be accessible to listeners without a background in fine-grained complexity or in LLMs.
Jon Kleinberg
Cornell University
Title: Breadth and Density for Language Generation in the Limit
Abstract: The emergence of large language models has prompted a surge of interest in theoretical models that might give us insight into both their successes and their shortcomings. We'll give an overview of a particular line of recent work in this direction, focusing on a surprising set of positive results that shows it is possible to give guarantees for language-generation algorithms even in the absence of any probabilistic assumptions, in a framework known as "language generation in the limit". These results suggest interesting notions of "breadth" in language generation, attempting to formalize the idea that different algorithms for this problem might all meet the specification but differ significantly in their expressiveness, that is, in how richly they can generate from the underlying language. We explore how these ideas can be formalized using combinatorial notions of the density of one infinite set in another, and also discuss recent progress on related questions involving resource constraints, resource augmentation, and mistake bounds. The talk will be based on joint work with Moses Charikar, Anay Mehrotra, Sendhil Mullainathan, Chirag Pabbaraju, Charlotte Peale, Omer Reingold, Amin Saberi, Grigoris Velegkas, and Fan Wei.