
Research

Research interests in AI reasoning, evaluation, safety, and policy

Research Interests

I study how AI systems learn about humans from our data, how they generalize from patterns, and how to use those findings to improve capabilities and safety.

Evaluation and Measurement

Grounded benchmarks from authoritative sources. Testing what models actually know versus what they memorize, across knowledge types and difficulty levels.

AI Safety and Alignment

Transparency, interpretability, and trust calibration. Making model behavior legible and auditable so humans can verify recommendations before acting on them.

AI Policy

Post-AGI governance, safety standards, and the institutional frameworks needed for deploying AI in high-stakes settings.


Dissertation: RAPID-AI

Reasoning, Analysis, and Planning for Interactive Decision-Making with Language Models

My PhD research integrates the interests above into a unified methodology for socio-technical systems (STS) — domains like finance, government, security, law, and medicine where outcomes depend on both technical facts and human institutions.

I focus on three pillars:

  1. Measurement — constructing grounded, expert-level evaluations of STS knowledge and reasoning
  2. Training — using curated/synthetic data, supervised fine-tuning, and reinforcement learning to improve reasoning, analysis, and planning
  3. Assurance — methods for transparency, interpretability, and trust calibration so humans understand and reliably use LM recommendations

The result is a general methodology and open artifacts — datasets, models, and evaluation protocols — that transfer across domains and raise the reliability of LM-assisted decisions.

Research Thrusts

T1 — Grounded STS Evaluation

Authoritative-source corpora → cited QAs and multi-step tasks testing knowledge, multi-hop reasoning, planning, and trade-offs. Outputs carry inline quotes and citations, and are teacher-graded, citation-validated, and deduplicated.
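The citation-validation step can be illustrated with a minimal sketch: check that every quoted span in a model answer appears verbatim in the named source document. Everything here (the function name, the `[doc_id: "quote"]` citation format, and the toy data) is hypothetical, not the project's actual pipeline code.

```python
# Minimal sketch of citation validation: extract [doc_id: "quote"]
# citations from an answer and verify each quoted span appears
# verbatim in the cited source. All names are illustrative.
import re

def validate_citations(answer: str, sources: dict[str, str]) -> list[tuple[str, bool]]:
    """Return (quote, found_verbatim) for each citation in the answer."""
    results = []
    for doc_id, quote in re.findall(r'\[(\w+):\s*"([^"]+)"\]', answer):
        source_text = sources.get(doc_id, "")
        results.append((quote, quote in source_text))
    return results

sources = {"doc1": "The statute requires disclosure within 30 days."}
answer = 'Disclosure is mandatory [doc1: "requires disclosure within 30 days"].'
print(validate_citations(answer, sources))
# [('requires disclosure within 30 days', True)]
```

Verbatim substring matching is the simplest possible check; a real pipeline would likely also normalize whitespace and handle paraphrased or partially quoted spans.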

T2 — Training for Reasoning & Planning

Curated and synthetic data → SFT → RL with structure-aware rewards, out-of-distribution (OOD) guards, and self-agreement sampling. Focus on tractable levers that move reasoning quality — not pretraining scale.
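Self-agreement sampling, one of the levers above, can be sketched as majority voting over repeated samples from the same model. This is a toy illustration under stated assumptions: `sample_answer` stands in for an actual model call, and the function name is hypothetical.

```python
# Minimal sketch of self-agreement sampling: draw several candidate
# answers and keep the one most candidates agree on, along with the
# agreement rate. `sample_answer` is a stand-in for a model call.
from collections import Counter

def self_agreement(sample_answer, n_samples: int = 5):
    """Return the modal answer and the fraction of samples agreeing with it."""
    candidates = [sample_answer() for _ in range(n_samples)]
    answer, count = Counter(candidates).most_common(1)[0]
    return answer, count / n_samples

# A deterministic stand-in model always agrees with itself.
answer, agreement = self_agreement(lambda: "42", n_samples=5)
```

The agreement rate doubles as a cheap confidence signal: low agreement across samples often flags questions where the model's reasoning is unstable.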

T3 — Assurance & Interpretability

Cognitive-pattern analysis, mechanistic and behavioral probes, and trust calibration. Delivering human-facing artifacts (rationales, uncertainty signals, constraint checks) that make model behavior legible and auditable.
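One standard way to quantify trust calibration is expected calibration error: bin predictions by stated confidence and compare each bin's mean confidence to its accuracy. The sketch below is a toy illustration of that measure, not the project's assurance code.

```python
# Minimal sketch of a calibration check: expected calibration error (ECE).
# Predictions are binned by stated confidence; ECE is the weighted average
# gap between each bin's mean confidence and its accuracy.

def expected_calibration_error(confidences, correct, n_bins=10):
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += len(b) / total * abs(avg_conf - accuracy)
    return ece

# Toy data: answers stated at 0.9 confidence were all correct,
# answers stated at 0.1 confidence were all wrong.
print(expected_calibration_error([0.9, 0.9, 0.1, 0.1], [True, True, False, False]))
# ≈ 0.1
```

An ECE near zero means stated confidence tracks actual accuracy, which is the property a human verifier needs before acting on a recommendation.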

T4 — Human-in-the-Loop Interaction

Interfaces that require justification before action, capture counterfactuals, and support red-teaming of proposed plans.


Publications

For citation information, see individual publication pages or my Google Scholar profile.
