Research
I study artificial intelligence, machine learning, and language models, with a focus on social reasoning, evaluation, and interpretability. The goal is to understand where these capabilities come from, how they are represented inside the model, and how to test them in domains where mistakes matter.
Core Agenda
Data provenance: Tracing social capability back to concrete training data choices.
Representations: Studying how social roles and personas appear in activation space.
Evaluation: Building benchmarks where institutions, incentives, and instructions interact.
Safety: Understanding how useful social reasoning and misuse risk rise together.
Most of my work sits in one of these four lanes, but the questions are connected: what in the data teaches models about people, what structure that creates inside the model, and what evaluation is strong enough to show whether the behavior generalizes.
Current focus
I study where social capability comes from in the training corpus. Recent work uses gradient-based attribution and targeted unlearning to connect behavior changes to specific regions of Dolma3 rather than treating model behavior as a black box.
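The gradient-based attribution idea can be sketched in miniature. This is an illustrative toy in the spirit of TracIn-style influence scoring, not the actual pipeline: the influence of a training example on a probe behavior is approximated by the dot product of their loss gradients, so training examples that push the model toward the probe behavior score positive and conflicting examples score negative. The logistic model, synthetic "regions," and all names here are assumptions for the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_logloss(w, x, y):
    # Gradient of binary cross-entropy for one example under a
    # logistic model: (p - y) * x
    p = sigmoid(x @ w)
    return (p - y) * x

def attribute(w, train_X, train_y, probe_x, probe_y):
    """Score each training example by gradient alignment with the probe."""
    g_probe = grad_logloss(w, probe_x, probe_y)
    return np.array([grad_logloss(w, x, y) @ g_probe
                     for x, y in zip(train_X, train_y)])

# Toy corpus: two "regions" of training data with the same label.
# Region A supports the probe's input-output mapping; region B has the
# opposite features, so it teaches a conflicting mapping.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1, 0.1, (5, 2)),   # region A: supporting
               rng.normal(-1, 0.1, (5, 2))])  # region B: conflicting
y = np.array([1] * 10)
w = np.array([0.5, 0.5])

scores = attribute(w, X, y, probe_x=np.array([1.0, 1.0]), probe_y=1)
print(scores[:5].mean() > 0, scores[5:].mean() < 0)
```

In a real setting the gradients come from the language model's loss on corpus documents, and high-magnitude regions become candidates for targeted unlearning to confirm the attribution causally.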
Representations
I look at how professions, personas, and other social categories are organized in activation space, and when that structure is compositional or steerable instead of merely descriptive.
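The "steerable" claim has a concrete form that a minimal sketch can show. Under the assumption (illustrative, not the method used in this work) that a role occupies a linear direction in activation space, the direction can be estimated as a difference of mean activations between two prompt sets and then added to a new activation to shift it toward that role; the synthetic activations and persona names below are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # hidden size of the toy model

# Synthetic residual-stream activations for prompts written in two
# personas, separated along the first coordinate by construction.
doctor_acts = rng.normal(0, 1, (50, d)) + np.eye(d)[0] * 3.0
lawyer_acts = rng.normal(0, 1, (50, d)) - np.eye(d)[0] * 3.0

# Difference-of-means "role direction", normalized to unit length.
direction = doctor_acts.mean(0) - lawyer_acts.mean(0)
direction /= np.linalg.norm(direction)

def steer(activation, direction, alpha):
    """Add a scaled role direction to an activation vector."""
    return activation + alpha * direction

neutral = rng.normal(0, 1, d)
steered = steer(neutral, direction, alpha=4.0)

# The steered activation projects more strongly onto the role axis.
print(steered @ direction > neutral @ direction)
```

The interesting empirical questions start where this sketch ends: whether such directions compose across roles, and whether adding one actually changes downstream behavior rather than just the projection.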
Measurement
I build evaluation tools such as FLaME, FIFE, and FinForge to test model behavior where institutions, incentives, and instructions interact in ways that are difficult to fake with shallow pattern matching.
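One way to make "difficult to fake with shallow pattern matching" concrete is to score correctness and instruction compliance separately, so a right answer that breaks the task's constraints still fails. The sketch below is a hedged illustration of that evaluation shape, not actual FLaME, FIFE, or FinForge code; the task, constraints, and strings are hypothetical.

```python
def evaluate(response: str, answer: str, constraints: list) -> dict:
    """Return separate surface-correctness and compliance verdicts."""
    correct = answer.lower() in response.lower()
    violations = [name for name, check in constraints if not check(response)]
    return {"correct": correct,
            "compliant": not violations,
            "violations": violations}

# Hypothetical instruction: answer in at most 12 words, no hedging words.
constraints = [
    ("max_12_words", lambda r: len(r.split()) <= 12),
    ("no_hedging",   lambda r: "probably" not in r.lower()),
]

good = evaluate("Net income was 4.2 million.", "4.2 million", constraints)
sloppy = evaluate("It was probably 4.2 million, though figures vary by source and year.",
                  "4.2 million", constraints)
# Both responses contain the right number; only one follows the instruction.
print(good["compliant"], sloppy["compliant"])
```

A correctness-only grader marks both responses as passes; separating the two verdicts is what exposes the subtle violation.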
Implications
I treat safety as part of the same research program. Better models of people can create utility, but they also make misuse and manipulation easier, so the explanatory work has to keep pace with the capability work.
These projects are the clearest examples of the broader agenda: measurement where the stakes are concrete, and interpretability that ties model behavior back to a mechanism instead of a vague story.
Current focus
Using attribution and targeted unlearning to identify which parts of Dolma3 are responsible for social reasoning behavior, so changes in capability can be tied to specific data decisions.
ACL 2025
Benchmarked 23 foundation models across 20+ financial NLP tasks to show where domain fluency survives real evaluation pressure and where it breaks down.
Preprint, 2025
Built a benchmark for instruction following in finance, where subtle violations are easy to miss if evaluation looks only at surface correctness.
Preprint, 2026
Created a scalable way to generate financial evaluation data when privacy, rarity, or collection cost make real-world benchmarks too narrow to rely on alone.
EMNLP Workshop 2025
Used open-ended wargames as a testbed for strategic reasoning, coordination, and long-horizon decision-making, where models have to track other agents over time rather than respond to isolated prompts.