Schedule
All times are local.
Abstract
TBD.
Abstract
This talk explores recent theoretical and methodological advancements in diffusion models. On the theoretical side, we demonstrate that diffusion models can circumvent the curse of dimensionality by discovering intrinsic low-dimensional structures within data distributions. We discuss this capability for both continuous and discrete variables and establish their theoretical optimality, showing that these advantages are driven by their feature learning abilities. Methodologically, we introduce novel post-training techniques for diffusion models handling both continuous and discrete variables. Leveraging Doob's $h$-transform as our primary technical tool, we demonstrate that integrating this transform with the efficient sampling capabilities of diffusion models facilitates effective post-training without explicit parameter updates. Specifically, for discrete diffusion models, our approach enables reinforcement learning on the unmasking order distribution without requiring model updates, yielding substantial performance improvements.
Abstract
How do neural networks memorize the training data? How can we quantify, promote, or prevent memorization? This talk addresses these problems by studying two popular training algorithms: (1) Stochastic Gradient Langevin Dynamics (SGLD) and (2) Gradient Descent (GD) in the Edge of Stability (EoS) regime. For SGLD, which asymptotically samples from a Gibbs distribution, we show that we can upper and lower bound memorization by “generalization” via a connection to an average-case variant of differential privacy. For GD, we show how data distribution and model architecture design interact with the ability of neural networks to “shatter” data points. Finally, I present preliminary results on how to leverage these insights to tune memorization up and down for selected data points.
Abstract
Foundation models are becoming runtime decision-makers. At inference time, they decode under constraints, compare alternatives, search over hypotheses, call tools, coordinate with other agents, verify intermediate states, and defend against adversarial inputs. This creates a central control problem: how should an agent choose its next computation, action, or workflow under uncertainty, risk, latency, and safety constraints? In this talk, I will present adaptive test-time control as a unifying framework for reasoning and planning agents. Rather than treating decoding, search, reflection, collaboration, verification, and escalation as fixed inference recipes, we view them as controllable actions selected by a runtime policy. I will ground this framework in recent work from our group across a hierarchy of test-time control. At the generation level, GenARM, Transfer Q-Star, and Collab show how reward models, value estimates, and mixtures of agents can steer decoding and alignment at inference time. But control is only as reliable as its critic; ReForm closes this loop by using reward-guided failure discovery to expose and patch reward-model errors. At the action level, Agentic Critical Training trains agents to judge better actions among alternatives, turning reflection into action-quality control rather than imitation. At the workflow level, FlowBank adaptively selects among complementary multi-agent workflows under performance–cost tradeoffs. Together, these works suggest a path from controlled decoding to controlled agency: agents that allocate computation, evidence, collaboration, and safety intervention where they matter most. The broader message is that reliable planning agents require runtime policies for allocating computation, evidence, collaboration, verification, workflow structure, and safety intervention. 
The goal is to build agents that are selective, robust, and accountable — systems that deliver reliable behavior per unit compute in interactive, multimodal, and tool-rich environments.