I pursue these questions in finance, wargaming, and other high-stakes domains where misunderstanding people has real costs.
I want to know which parts of training data teach social reasoning, and which parts teach the wrong lessons.
Current focus
I probe model internals to see how social concepts are encoded. Persona steering vectors are one tool I use to map that structure.
I build evaluations that separate genuine understanding from pattern-matching, especially in high-stakes settings.
If a model can predict people well, it can also be used to manipulate them. I study the mechanisms so we can detect and constrain that risk.
Finance Language Model Evaluation (FLaME) · Glenn Matlin, Mika Okamoto, Huzaifa Pardawala, Yang Yang, Sudheer Chava · Findings of ACL 2025 · arXiv · DOI
Shall We Play a Game? Language Models for Open-ended Wargames · Glenn Matlin, Parv Mahajan, Isaac Song, Yixiong Hao, Ryan Bard, et al. · Wordplay @ EMNLP 2025 · arXiv
Trust by Design: Skill Profiles for Transparent, Cost-Aware LLM Routing · Mika Okamoto, Ansel Kaplan Erol, Glenn Matlin · Preprint, 2026 · arXiv
FinForge: Semi-Synthetic Financial Benchmark Generation · Glenn Matlin, Akhil Theerthala, Avinash Gupta, JM Aravinth, Ricardo Castillo, et al. · Preprint, 2026 · arXiv
Before my PhD, I worked as a data scientist in healthcare and fintech. I kept seeing models hit the metric while missing what people needed in practice, and that tension pulled me back into research.
If you’re working on model behavior, social reasoning, or high-stakes evaluation, I’d love to compare notes.