Glenn Matlin

Research

How do language models learn about people?

I study artificial intelligence, machine learning, and language models with a focus on social reasoning, evaluation, and interpretability. The goal is to trace where these capabilities come from, understand how they are represented inside the model, and test them in domains where mistakes matter.


Core Agenda

Data provenance: Tracing social capability back to concrete training data choices.

Representations: Studying how social roles and personas appear in activation space.

Evaluation: Building benchmarks where institutions, incentives, and instructions interact.

Safety: Understanding how useful social reasoning and misuse risk rise together.

Research Directions

Most of my work sits in one of four lanes, but the questions are connected: what in the data teaches models about people, what structure that creates inside the model, and what evaluation is strong enough to show whether the behavior generalizes.

Current focus

Data Provenance & Attribution

I study where social capability comes from in the training corpus. Recent work uses gradient-based attribution and targeted unlearning to connect behavior changes to specific regions of Dolma3 rather than treating model behavior as a black box.
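As a rough illustration of the attribution half, the sketch below scores candidate training snippets TracIn-style, by the dot product between their loss gradients and the gradient of a query behavior. The model, prompts, and snippets are placeholders (the actual work targets OLMo3 and Dolma3), and practical versions compress the gradients with random projections rather than materializing them in full.

```python
# Hypothetical sketch of gradient-based training-data attribution
# (TracIn-style): a training example is credited for a behavior in
# proportion to the dot product of its loss gradient with the
# gradient of a query example. Model and texts are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

def loss_grad(text: str) -> torch.Tensor:
    """Flattened gradient of the LM loss on `text` over all parameters."""
    model.zero_grad()
    ids = tok(text, return_tensors="pt").input_ids
    model(ids, labels=ids).loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()
                      if p.grad is not None])

# Query: a behavior to explain. Candidates: invented training snippets.
g_query = loss_grad("A judge must remain impartial toward both parties.")
for snippet in ["The referee treated both teams fairly.",
                "Bananas are rich in potassium."]:
    score = torch.dot(loss_grad(snippet), g_query).item()
    print(f"{score:+.3e}  {snippet}")
```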

Representations

Internal Social Structure

I look at how professions, personas, and other social categories are organized in activation space, and when that structure is compositional or steerable instead of merely descriptive.
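One concrete way to test for steerable structure is a contrast-based steering vector: take the mean activation difference between prompts that instantiate two personas, then add that direction back into the residual stream during generation. The sketch below assumes a GPT-2 stand-in, an arbitrary layer, and invented prompts; it illustrates the probe, not any specific result.

```python
# Hypothetical probe for steerable persona structure: if a social role
# is linearly represented, the activation difference between
# contrasting prompts should shift generations when added back in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")      # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
block = model.transformer.h[6]                   # illustrative layer
alpha = 4.0                                      # illustrative strength

def last_token_activation(prompt: str) -> torch.Tensor:
    """Residual-stream activation at the chosen layer, last position."""
    acts = []
    handle = block.register_forward_hook(
        lambda mod, inp, out: acts.append(out[0][:, -1, :].detach()))
    model(**tok(prompt, return_tensors="pt"))
    handle.remove()
    return acts[0].squeeze(0)

# Contrast two invented personas to extract a candidate direction.
direction = (last_token_activation("As a doctor, I would say")
             - last_token_activation("As a lawyer, I would say"))

def steer(module, inputs, output):
    # Add the persona direction to every position's hidden state.
    return (output[0] + alpha * direction,) + output[1:]

handle = block.register_forward_hook(steer)
ids = tok("My professional advice is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=20,
                                do_sample=False)[0]))
handle.remove()
```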

Measurement

Evaluation in High-Stakes Domains

I build evaluation tools such as FLaME, FIFE, and FinForge to test model behavior where institutions, incentives, and instructions interact in ways that are difficult to fake with shallow pattern matching.

Implications

Safety Alongside Capability

I treat safety as part of the same research program. Better models of people can be genuinely useful, but they also make misuse and manipulation easier, so the explanatory work has to keep pace with the capability work.

Selected Work

These projects are the clearest examples of the broader agenda: measurement where the stakes are concrete, and interpretability that ties model behavior back to a mechanism instead of a vague story.

Tracing Social Reasoning in OLMo3

Current focus

Using attribution and targeted unlearning to identify which parts of Dolma3 are responsible for social reasoning behavior, so changes in capability can be tied to specific data decisions.
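A minimal sketch of the targeted-unlearning side, under stand-in names: gradient ascent on a small "forget" text, regularized by ordinary descent on a "retain" text so the rest of the model's behavior is preserved. The real project operates on OLMo3 and regions of Dolma3; nothing below is its actual code.

```python
# Hypothetical sketch of targeted unlearning: ascend the loss on a
# forget example while descending on a retain example. Model, texts,
# learning rate, and step budget are all placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")        # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget = "Invented snippet carrying the capability to remove."
retain = "Invented snippet of behavior that must be preserved."

def lm_loss(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt").input_ids
    return model(ids, labels=ids).loss

for step in range(10):                             # illustrative budget
    opt.zero_grad()
    # Negative sign = ascent on the forget loss; the retain term
    # penalizes collateral damage to everything else.
    (-lm_loss(forget) + lm_loss(retain)).backward()
    opt.step()
```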

Finance Language Model Evaluation (FLaME)

ACL 2025

Benchmarked 23 foundation models across 20+ financial NLP tasks to show where domain fluency survives real evaluation pressure and where it breaks down.

FIFE: Fine-grained Instruction Following Evaluation

Preprint, 2025

Built a benchmark for instruction following in finance, where subtle violations are easy to miss if evaluation looks only at surface correctness.
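One reading of "fine-grained", sketched under invented constraints: verify each instruction programmatically and report a verdict per constraint, so a fluent, surface-correct answer that quietly breaks one rule still gets flagged. This is an illustration of the evaluation style, not the FIFE implementation.

```python
# Hypothetical fine-grained instruction checks: each constraint gets
# its own verdict instead of one global correctness label. The three
# rules below are invented examples, not FIFE's actual rubric.
import re

def check_response(response: str) -> dict[str, bool]:
    return {
        # e.g. "report the figure in USD millions"
        "uses_usd_millions": bool(
            re.search(r"\$\d[\d,.]*\s*million", response)),
        # e.g. "do not give investment advice"
        "no_advice": not re.search(r"\b(buy|sell|hold)\b",
                                   response.lower()),
        # e.g. "answer in at most two sentences"
        "max_two_sentences": len(
            re.findall(r"\.(?:\s|$)", response)) <= 2,
    }

resp = "Revenue was $12.4 million. I would buy the stock."
verdicts = check_response(resp)
print(verdicts)   # fluent and numerically fine, but 'no_advice' fails
print("all constraints satisfied:", all(verdicts.values()))
```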

FinForge: Semi-Synthetic Financial Benchmark Generation

Preprint, 2026

Created a scalable way to generate financial evaluation data when privacy, rarity, or collection cost make real-world benchmarks too narrow to rely on alone.
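The semi-synthetic idea can be sketched as parameterized templating: a scenario template fixes realistic structure, sampled parameters vary each instance, and the ground-truth answer is computed from those parameters instead of collected from scarce or private data. The template and numbers below are invented, not FinForge output.

```python
# Hypothetical semi-synthetic benchmark generator: sample parameters
# into a fixed scenario template and derive the answer from them, so
# instances are unlimited and never touch private real-world records.
import random

TEMPLATE = ("{firm} reports Q{q} revenue of ${rev}M against ${cost}M "
            "in operating costs. What is the operating margin?")

def make_instance(seed: int) -> dict:
    rng = random.Random(seed)              # seeded for reproducibility
    rev = rng.randint(50, 500)
    cost = rng.randint(10, rev - 1)        # keep the margin positive
    return {
        "question": TEMPLATE.format(firm=f"Firm{seed}",
                                    q=rng.randint(1, 4),
                                    rev=rev, cost=cost),
        # Ground truth is derived from the parameters, not collected.
        "answer_pct": round(100 * (rev - cost) / rev, 1),
    }

for item in (make_instance(i) for i in range(3)):
    print(item["question"], "->", item["answer_pct"], "%")
```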

Shall We Play a Game? Language Models for Open-ended Wargames

EMNLP Workshop 2025

Used open-ended wargames as a testbed for strategic reasoning, coordination, and long-horizon decision-making when models have to track people rather than isolated prompts.

Browse Full Publication Archive →
