Publications

Capability Provenance in Language Models: A Case Study in Social Reasoning

interpretability

training-data attribution

social reasoning

Training-data attribution maps which regions of a model’s pretraining corpus support social versus STEM reasoning, and the two draw on qualitatively distinct regions.

Jul 21, 2026

Glenn Matlin, Chandreyi Chakraborty, Saehee Eom, Mika Okamoto, Rayan Castilla, Louis Jaburi, Alvin Deng, Taywon Min, Lucia Quirke, Stella Biderman, Mark Riedl

Trust by Design: Skill Profiles for Transparent, Cost-Aware LLM Routing

LLM Systems

Interpretability

BELLA (Budget-Efficient LLM Selection via Automated skill-profiling): interpretable, skill-based model selection that makes cost-performance trade-offs explicit rather than hiding them in a black box.

Feb 2, 2026

Mika Okamoto, Ansel Kaplan Erol, Glenn Matlin

FinForge: Semi-Synthetic Financial Benchmark Generation

NLP

Finance

Benchmarks

A scalable semi-synthetic pipeline for financial evaluation benchmarks, producing FinForge-5k: 5,000+ human-validated QA pairs across 11 finance subdomains from a 100k-document corpus.

Jan 11, 2026

Glenn Matlin, Akhil Theerthala, Anant Gupta, Anirudh JM, Rayan Castilla, Yi Mei Ng, Sudheer Chava

Financial Instruction Following Evaluation (FIFE)

NLP

Finance

Evaluation

A high-difficulty benchmark of 88 human-authored prompts with chainable, verifiable constraints, evaluating 53 models on complex financial instruction following.

Dec 1, 2025

Glenn Matlin, Siddharth, Anirudh JM, Aditya Shukla, Yahya Hassan, Sudheer Chava

Shall We Play a Game? Language Models for Open-ended Wargames

Security

Wargaming

A position paper and scoping review of 100 studies on AI in wargames, with a novel ontology of open-endedness, deployment recommendations, and open research challenges.

Sep 21, 2025

Glenn Matlin, Parv Mahajan, Isaac Song, Yixiong Hao, Ryan Bard, Stu Topp, Evan Montoya, M. Rehan Parwani, Soham Shetty, Mark Riedl

Do Language Models Agree with Human Perceptions of Suspense in Stories?

NLP

Computational Linguistics

Narrative

Replicating four seminal psychology studies with language models: LMs can tell whether a text is suspenseful, but not how suspenseful, nor how suspense rises and falls.

Aug 13, 2025

Glenn Matlin, Devin Zhang, Rodrigo Barroso Loza, Diana M. Popescu, Joni Isbell, Chandreyi Chakraborty, Mark Riedl

Financial Language Model Evaluation (FLaME)

NLP

Finance

ACL

The first holistic benchmarking suite for financial NLP: 23 foundation language models evaluated across 20 core finance tasks, with open-source framework, data, and results.

Jun 18, 2025

Glenn Matlin, Mika Okamoto, Huzaifa Pardawala, Yang Yang, Sudheer Chava

UnfoldML: Cost-Aware and Uncertainty-Based Dynamic 2D Prediction for Multi-Stage Classification

Machine Learning

Systems

NeurIPS

A cost-aware and uncertainty-based framework for dynamic 2D prediction in multi-stage classification systems.

Oct 19, 2022

Yanbo Xu, Alind Khare, Glenn Matlin, Monish Ramadoss, Rishikesan Kamaleswaran, Chao Zhang, Alexey Tumanov

Papers and preprints.