Understanding Large Language Models via a TCS Lens

Overview

A workshop for mapping the theory of LLMs.

Large language models have transformed text generation and are already being used across domains ranging from information retrieval to scientific discovery. This progress has sparked a growing body of theoretical work in the TCS community that aims to understand the mathematical foundations of language generation and modern language models.

The theoretical study of language is not new: it connects to classic ideas from computation, information theory, and formal language theory. Today, researchers are bringing tools from learning theory, automata theory, computational complexity, communication complexity, and algorithm design, to bear on the modern landscape of LLMs.

The goal is twofold: to give STOC attendees an accessible entry point into these emerging theoretical frameworks, and to connect researchers who are approaching the foundations of LLMs from different angles.

Schedule

Workshop schedule

All talks are in Grand Ballroom A.

June 25

8:30–9:20

Nika Haghtalab

UC Berkeley

Title: Distortion of AI Alignment: Does Preference Optimization optimize for preferences?

Show abstract Hide abstract

Abstract: After pre-training, large language models are aligned with human preferences based on pairwise comparisons. State-of-the-art alignment methods (such as PPO-based RLHF and DPO) are built on the assumption of aligning with a single preference model, despite being deployed in settings where users have diverse preferences. As a result, it is not even clear that these alignment methods produce models that satisfy users on average — a minimal requirement. Drawing on social choice theory and modeling users' comparisons through individual Bradley-Terry (BT) models, we introduce an alignment method's distortion: the worst-case ratio between the optimal achievable average utility, and the average utility of the learned policy. The notion of distortion helps draw sharp distinctions between alignment methods: Nash Learning from Human Feedback achieves the minimax optimal distortion of a constant. We also give a fine-grained understanding of the distortion of RLHF (PPO or DPO based) which can suffer unbounded distortion in the worst-case.

9:20–10:10

Josh Alman

Columbia University

Title: Fine-Grained Complexity and the pursuit of Fast Attention

Show abstract Hide abstract

Abstract: The attention mechanism is the key behind the Transformer architecture and many other Large Language Models (LLMs). However, computing it in a straightforward way takes quadratic time, and this quadratic scaling is frequently cited as the bottleneck to making LLMs more efficient. In this talk, I'll survey a line of work in which we use tools from fine-grained complexity theory to address this challenge. I'll discuss the possibility of designing faster algorithms for attention itself, and then a few candidate ways to replace attention with other mechanisms guided by theory. I'll aim for the talk to be accessible to listeners without a background in fine-grained complexity or in LLMs.

10:10–11:00

Jon Kleinberg

Cornell University

Title: Breadth and Density for Language Generation in the Limit

Show abstract Hide abstract

Abstract: The emergence of large language models has prompted a surge of interest into theoretical models that might give us insight into both their successes and their shortcomings. We'll give an overview of a particular line of recent work in this direction, focusing on a surprising set of positive results that shows it is possible to give guarantees for language-generation algorithms even in the absence of any probabilistic assumptions, in a framework known as "language generation in the limit". These results suggest interesting notions of "breadth" in language generation, attempting to formalize the idea that different algorithms for this problem might all meet the specification but differ significantly in their expressiveness — in how richly they can generate from the underlying language. We explore how these ideas can be formalized using combinatorial notions of the density of one infinite set in another, and also discuss recent progress on related questions involving resource constraints, resource augmentation, and mistake bounds. The talk will be based on joint work with Moses Charikar, Anay Mehrotra, Sendhil Mullainathan, Chirag Pabbaraju, Charlotte Peale, Omer Reingold, Amin Saberi, Grigoris Velegkas, and Fan Wei.

June 26

8:30–9:20

Binghui Peng

University of Maryland

Title: Hands-On Experience Using Frontier LLMs for TCS Research

[Slides]

Show abstract Hide abstract

Abstract: In this talk, I will share hands-on experience using large language models for automated theoretical research. We design a simple prover–verifier pipeline and evaluate a frontier LLM on open problems from three sources: my own paper, involving a problem I know well; COLT open problems, in an area I am familiar with; and commutative-algebra questions, about which I know almost nothing. In the second part, I will discuss a simple agentic workflow that automatically formalizes proofs in Lean.

9:20–10:10

Vatsal Sharan

University of Southern California

Title: Using Algorithms to Understand Transformers (and Using Transformers to Understand Algorithms)

[Slides]

Show abstract Hide abstract

Abstract: We will discuss how algorithmic tools and understanding borrowed from optimization theory, Fourier transforms, and Boolean function analysis can help understand the mechanisms employed by Transformers to solve basic computational tasks such as linear regression and addition. We will examine the role of the architecture and pre-trained data in enabling Transformers to learn their employed mechanisms. Finally, we will discuss work on using Transformers themselves to discover and design data structures for tasks such as nearest neighbor search.

10:10–11:00

Bingbin Liu

Kempner Institute, Harvard University

Title: Improving Reasoning Efficiency Through Theoretically Informed Sandboxes

[Slides]

Show abstract Hide abstract

Abstract: Modern LLMs are complex systems with many interacting components. Theoretically informed "sandboxes" offer a useful lens into their inner workings and limitations, yielding insights that inform practice. This talk presents examples of such sandboxes for studying LLM reasoning. The first part examines the existence of compact reasoning solutions: using automata as a sandbox for sequential reasoning, we show that there exist shallow, parallel solutions whose decompositions help reveal architectural limitations. The second part focuses on learning efficiency, using sparse parity as a sandbox to show how data-related factors can accelerate learning.

Organizers

For questions about the workshop, please contact the organizers.