Research
I study artificial intelligence, machine learning, and language models, with a focus on social reasoning, evaluation, and interpretability. The goal is to understand where these capabilities come from, how they are represented inside the model, and how to test them in domains where mistakes matter.
Core Agenda
Data provenance: Tracing social capability back to concrete training data choices.
Representations: Studying how social roles and personas appear in activation space.
Evaluation: Building benchmarks where institutions, incentives, and instructions interact.
Safety: Understanding how useful social reasoning and misuse risk rise together.
Most of my work sits in one of these four lanes, but the questions are connected: what in the data teaches models about people, what structure that creates inside the model, and what evaluation is strong enough to show whether the behavior generalizes.
Current focus
I study where social capability comes from in the training corpus. Recent work uses gradient-based attribution and targeted unlearning to connect behavior changes to specific regions of Dolma3 rather than treating model behavior as a black box.
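The gradient-based attribution idea can be sketched in miniature. This is an illustrative toy in the spirit of TracIn-style influence scoring, not the actual pipeline: the influence of a training example on a probe behavior is approximated by the dot product of their loss gradients, so training examples that push the model toward the probe behavior score positive and conflicting examples score negative. The logistic model, synthetic "regions," and all names here are assumptions for the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_logloss(w, x, y):
    # Gradient of binary cross-entropy for one example under a
    # logistic model: (p - y) * x
    p = sigmoid(x @ w)
    return (p - y) * x

def attribute(w, train_X, train_y, probe_x, probe_y):
    """Score each training example by gradient alignment with the probe."""
    g_probe = grad_logloss(w, probe_x, probe_y)
    return np.array([grad_logloss(w, x, y) @ g_probe
                     for x, y in zip(train_X, train_y)])

# Toy corpus: two "regions" of training data with the same label.
# Region A supports the probe's input-output mapping; region B has the
# opposite features, so it teaches a conflicting mapping.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1, 0.1, (5, 2)),   # region A: supporting
               rng.normal(-1, 0.1, (5, 2))])  # region B: conflicting
y = np.array([1] * 10)
w = np.array([0.5, 0.5])

scores = attribute(w, X, y, probe_x=np.array([1.0, 1.0]), probe_y=1)
print(scores[:5].mean() > 0, scores[5:].mean() < 0)
```

In a real setting the gradients come from the language model's loss on corpus documents, and high-magnitude regions become candidates for targeted unlearning to confirm the attribution causally.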
Representations
I look at how professions, personas, and other social categories are organized in activation space, and when that structure is compositional or steerable instead of merely descriptive.
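The "steerable" claim has a concrete form that a minimal sketch can show. Under the assumption (illustrative, not the method used in this work) that a role occupies a linear direction in activation space, the direction can be estimated as a difference of mean activations between two prompt sets and then added to a new activation to shift it toward that role; the synthetic activations and persona names below are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # hidden size of the toy model

# Synthetic residual-stream activations for prompts written in two
# personas, separated along the first coordinate by construction.
doctor_acts = rng.normal(0, 1, (50, d)) + np.eye(d)[0] * 3.0
lawyer_acts = rng.normal(0, 1, (50, d)) - np.eye(d)[0] * 3.0

# Difference-of-means "role direction", normalized to unit length.
direction = doctor_acts.mean(0) - lawyer_acts.mean(0)
direction /= np.linalg.norm(direction)

def steer(activation, direction, alpha):
    """Add a scaled role direction to an activation vector."""
    return activation + alpha * direction

neutral = rng.normal(0, 1, d)
steered = steer(neutral, direction, alpha=4.0)

# The steered activation projects more strongly onto the role axis.
print(steered @ direction > neutral @ direction)
```

The interesting empirical questions start where this sketch ends: whether such directions compose across roles, and whether adding one actually changes downstream behavior rather than just the projection.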
Measurement
I build evaluation tools such as FLaME, FIFE, and FinForge to test model behavior where institutions, incentives, and instructions interact in ways that are difficult to fake with shallow pattern matching.
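One way to make "difficult to fake with shallow pattern matching" concrete is to score correctness and instruction compliance separately, so a right answer that breaks the task's constraints still fails. The sketch below is a hedged illustration of that evaluation shape, not actual FLaME, FIFE, or FinForge code; the task, constraints, and strings are hypothetical.

```python
def evaluate(response: str, answer: str, constraints: list) -> dict:
    """Return separate surface-correctness and compliance verdicts."""
    correct = answer.lower() in response.lower()
    violations = [name for name, check in constraints if not check(response)]
    return {"correct": correct,
            "compliant": not violations,
            "violations": violations}

# Hypothetical instruction: answer in at most 12 words, no hedging words.
constraints = [
    ("max_12_words", lambda r: len(r.split()) <= 12),
    ("no_hedging",   lambda r: "probably" not in r.lower()),
]

good = evaluate("Net income was 4.2 million.", "4.2 million", constraints)
sloppy = evaluate("It was probably 4.2 million, though figures vary by source and year.",
                  "4.2 million", constraints)
# Both responses contain the right number; only one follows the instruction.
print(good["compliant"], sloppy["compliant"])
```

A correctness-only grader marks both responses as passes; separating the two verdicts is what exposes the subtle violation.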
Implications
I treat safety as part of the same research program. Better models of people can create utility, but they also make misuse and manipulation easier, so the explanatory work has to keep pace with the capability work.
These projects are the clearest examples of the broader agenda: measurement where the stakes are concrete, and interpretability that ties model behavior back to a mechanism instead of a vague story.
Current focus
Using attribution and targeted unlearning to identify which parts of Dolma3 are responsible for social reasoning behavior, so changes in capability can be tied to specific data decisions.
ACL 2025
Benchmarked 23 foundation models across 20+ financial NLP tasks to show where domain fluency survives real evaluation pressure and where it breaks down.
Preprint, 2025
Built a benchmark for instruction following in finance, where subtle violations are easy to miss if evaluation looks only at surface correctness.
Preprint, 2026
Created a scalable way to generate financial evaluation data when privacy, rarity, or collection cost make real-world benchmarks too narrow to rely on alone.
EMNLP Workshop 2025
Used open-ended wargames as a testbed for strategic reasoning, coordination, and long-horizon decision-making, where models have to track other agents over time rather than respond to isolated prompts.