Machine Learning Research at Georgia Tech
I study artificial intelligence, machine learning, and language models, with a focus on social reasoning, evaluation, and interpretability in finance, wargaming, law, and other high-stakes domains.

Each project asks the same question from a different angle: what do models infer about people, and how do we test whether that understanding is real?
ACL 2025
Benchmarked 23 foundation models across 20+ financial NLP tasks to show where domain fluency breaks down and where it holds up under real evaluation pressure.
EMNLP Workshop 2025
Used open-ended wargames as a testbed for strategic reasoning, coordination, and long-horizon decision-making when models have to track people rather than isolated prompts.
Preprint, 2026
Made multi-model routing more legible by exposing capability and cost tradeoffs instead of hiding them behind a single black-box system decision.
Preprint, 2026
Built a scalable way to generate financial benchmarks when privacy, rarity, or cost makes real-world evaluation data too limited to rely on alone.
I work where models interact with people, institutions, and incentives rather than with clean toy tasks.
In finance, language models meet regulation, incentives, and real decision costs, which makes evaluation grounded rather than abstract.
Open-ended planning exposes whether a model can track beliefs, goals, and uncertainty over time instead of pattern-matching one step at a time.
Law, healthcare, and other institutional settings are where social misunderstanding turns into brittle automation, bad advice, or manipulation risk.
If you’re working on model behavior, interpretability, or high-stakes evaluation, I’d love to compare notes.