Glenn Matlin

Collaboration

Work With Me

This page exists so neither of us has to do the vague-email dance. If you want to work with me, read it first. It covers what I work on, how I run projects, what I expect, and what I offer in return. The goal is a quick read: serious people should be able to tell if this is a fit — and if it isn’t, you’ll know that just as quickly.

I mainly work with Georgia Tech undergrads and master’s students, plus the occasional external collaborator. If you’re at Georgia Tech, research-for-credit is often the best route — it adds real stakes, real structure, and a real reason to take the work seriously. If you’re looking for a PhD advisor, that’s not me. For PhD admissions, reach out to Mark Riedl (Director, ML Center) or Sudheer Chava (Chair, Finance).


Research Areas

  • Data Provenance: attribution and unlearning — where capability comes from.
  • Persona Vectors: activation spaces, steering, and internal representations.
  • Social Reasoning: theory of mind, belief tracking, evaluation design.
  • Safety & Misuse: sycophancy, deception, manipulation — measuring failure modes.
  • Wargaming: open-ended strategic reasoning under uncertainty.
  • Negotiation: multi-agent cooperation, coalitions, and incentives.
  • Finance: domain-specific reasoning where “sounds right” is useless.

What We’re Working On

My research is organized around one question: what do language models learn about people? I care about where that comes from in the data, how it is represented inside the model, how we can make it legible enough to use on purpose, and what happens when we drop those capabilities into messy human systems. I am not interested in models just sounding socially fluent. I want to know how they conceptualize people, how they reason about us, and what that means when the stakes are real.

Attribution

Data Provenance & Training Data Attribution

I care about where language models learn to conceptualize humanity. If a model can infer motives, roles, norms, or patterns of social behavior, I want to know what data put that there. That is why I care about attribution and unlearning: can we trace a behavior back to specific parts of training data, and can we show that those parts were actually causal?

This area is for people who like attribution, unlearning, and hard questions about where model behavior actually came from.
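To make the flavor concrete, here is a minimal sketch of a TracIn-style influence score: rank candidate training examples by the dot product of their loss gradients with the gradient on a probe behavior. The model, the toy texts, and the full-gradient version are all simplifying assumptions for illustration; the papers below use approximations to make this scale.

    # Minimal sketch: TracIn-style influence via gradient dot products.
    # Model, probe, and candidate texts are toy stand-ins; real attribution
    # methods subsample or approximate these gradients to scale.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def loss_grad(text: str) -> torch.Tensor:
        """Flattened gradient of the LM loss on a single example."""
        model.zero_grad()
        batch = tok(text, return_tensors="pt")
        model(**batch, labels=batch["input_ids"]).loss.backward()
        return torch.cat([p.grad.flatten() for p in model.parameters()
                          if p.grad is not None])

    # The behavior we want to trace, and candidate training examples.
    probe = "Alice hid the keys, so Bob looked in the wrong drawer."
    candidates = [
        "People search where they last saw an object, not where it is.",
        "The capital of France is Paris.",
    ]

    g_probe = loss_grad(probe)
    for text in candidates:
        score = torch.dot(loss_grad(text), g_probe).item()
        print(f"{score:+.3e}  {text}")  # higher = more responsible

Unlearning runs the same logic in reverse: remove or counter-train the high-influence examples, then check whether the behavior actually degrades. That is the causal test.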

Read these before you reach out:

  • Akyürek et al. — Towards Tracing Knowledge in Language Models Back to the Training Data
  • Chang et al. — Scalable Influence and Fact Tracing for Large Language Model Pretraining
  • Zhao et al. — Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning
  • Ruis et al. — Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
  • Grosse et al. — Studying Large Language Model Generalization with Influence Functions

Representations

Persona Vectors & Internal Representations

I care about giving people tools to understand and use their language models effectively. For me, that means getting past black-box vibes and asking what the model is actually encoding about people internally. Persona vectors are one way into that question: are there stable directions for roles, perspectives, or styles of reasoning, and can we steer them without fooling ourselves?

This area is for people who like activation spaces, steering, and making model internals less mystical.
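As a concrete entry point, here is a minimal sketch of contrastive activation steering in the spirit of the Rimsky et al. paper below. The prompt pair, layer index, and steering scale are hypothetical choices, and the honest version of this experiment needs held-out behavioral evals so we are not fooling ourselves.

    # Minimal sketch: extract a contrastive "persona direction" and add it
    # back during generation. Layer, prompts, and scale are arbitrary picks.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    LAYER = 6  # which block's residual stream to read and steer

    def resid(text: str) -> torch.Tensor:
        """Last-token residual activation after block LAYER."""
        batch = tok(text, return_tensors="pt")
        with torch.no_grad():
            hs = model(**batch, output_hidden_states=True).hidden_states
        return hs[LAYER + 1][0, -1, :]  # hs[0] is the embedding output

    direction = resid("I am a cautious, skeptical analyst.") \
              - resid("I am a reckless, credulous hype man.")
    direction = direction / direction.norm()

    def steer(module, inputs, output):
        # Add the direction to the block's hidden states on every pass.
        return (output[0] + 4.0 * direction,) + output[1:]

    handle = model.transformer.h[LAYER].register_forward_hook(steer)
    batch = tok("My advice on this investment:", return_tensors="pt")
    out = model.generate(**batch, max_new_tokens=30, do_sample=False)
    handle.remove()
    print(tok.decode(out[0], skip_special_tokens=True))

If steering along the direction changes behavior in the predicted way, and steering against it reverses the change, that is evidence the direction encodes something real.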

Read these before you reach out:

  • Zou et al. — Representation Engineering: A Top-Down Approach to AI Transparency
  • Rimsky et al. — Steering Llama 2 via Contrastive Activation Addition
  • Lee et al. — Do LLMs Have Distinct and Consistent Personality? TRAIT
  • Serapio-García et al. — A Psychometric Framework for Evaluating and Shaping Personality Traits in Large Language Models
  • Chen et al. — Persona Vectors: Monitoring and Controlling Character Traits in Language Models

Evaluation

Social Reasoning & Theory of Mind

I care about whether language models can actually reason about people: who knows what, who believes what, what someone intends, and how those mental states change over time. A lot of “social intelligence” claims are fake because the benchmark is weak. This area is about building and stress-testing evaluations for belief tracking, perspective-taking, common ground, deception, and social inference — then using those tests to separate real social reasoning from shallow pattern matching.

This area is for people who like social cognition, benchmark design, and careful evaluation of mental-state reasoning.
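For a sense of what evaluation design means here, a single false-belief item can be as small as the sketch below. The schema and scoring are invented for illustration; the point is that every item carries a tempting wrong answer, the true world state, that shallow pattern matching gravitates toward.

    # Toy sketch: one false-belief eval item as structured data. The schema
    # and story are invented; real benchmarks (below) go much further with
    # higher-order beliefs, temporal change, and adversarial distractors.
    from dataclasses import dataclass

    @dataclass
    class ToMItem:
        story: str
        question: str
        belief_answer: str   # correct under the character's false belief
        reality_answer: str  # true world state: the classic trap

    item = ToMItem(
        story=("Sally puts her marble in the basket and leaves. "
               "While she is gone, Anne moves the marble to the box."),
        question="Where will Sally look for her marble first?",
        belief_answer="the basket",
        reality_answer="the box",
    )

    def score(model_answer: str, item: ToMItem) -> bool:
        """Pass only if the model tracks the belief, not just the world."""
        answer = model_answer.lower()
        return item.belief_answer in answer and item.reality_answer not in answer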

Read these before you reach out:

  • Chen et al. — ToMBench: Benchmarking Theory of Mind in Large Language Models
  • Kim et al. — FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions
  • Xu et al. — OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models
  • He et al. — HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models
  • Xiao et al. — Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States
  • Gandhi et al. — Understanding Social Reasoning in Language Models with Language Models

Safety

Safety & Misuse of Social Modeling

If a model can reason about beliefs, intentions, and vulnerability, it can do more than help people. It can flatter them, manipulate them, mislead evaluators, or target the users most likely to be persuaded. I care about measuring those failure modes, understanding their mechanisms, and building ways to detect or constrain them.

This area is for people who can hold both ideas at once: the capability matters, and the failure mode does too.

Read these before you reach out:

  • Sharma et al. — Towards Understanding Sycophancy in Language Models
  • Wen et al. — Language Models Learn to Mislead Humans via RLHF
  • Williams et al. — On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
  • Liu et al. — LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models
  • Scheurer et al. — Large Language Models Can Strategically Deceive Their Users When Put Under Pressure

Strategy

Wargaming & Strategic Decision-Making

Wargames force models out of canned benchmark land. They have to plan, bluff, negotiate, interpret uncertainty, and deal with other agents pushing back. I care most about open-ended wargames, where language matters and the stakes are not fake just because the environment is simulated.

Wargaming is not strictly about violent military conflict. Wargames are a tool for reasoning about any situation with opposing forces and high stakes — climate change, pandemics, rogue AI, market crises. The point is structured reasoning under pressure, not the battlefield.

This area is for people who like multi-agent reasoning, planning, and open-ended environments where the benchmark does not do the thinking for you.

Read these before you reach out:

  • Matlin et al. — Shall We Play a Game? Language Models for Open-ended Wargames
  • Bakhtin et al. — Human-level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning
  • Kramár et al. — Negotiation and Honesty in Artificial Intelligence Methods for the Board Game of Diplomacy
  • Lamparth et al. — Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations
  • Rivera et al. — Escalation Risks from Language Models in Military and Diplomatic Decision-Making

Cooperation

Negotiation & Cooperation

I do not just care about bargaining in the narrow sense. I care about deals, coalitions, cooperation under conflict, and what happens when models have to manage incentives over multiple turns with other agents who want different things.

This area is for people who like multi-turn social interaction, incentives, coalitions, and strategic communication.
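To ground the multi-turn incentive framing, here is a toy repeated-game harness. The payoff matrix is the classic iterated prisoner's dilemma and the policies are hand-written stand-ins; in the studies below, one or both players would be a language model choosing moves through natural-language negotiation.

    # Toy sketch: incentives over repeated turns. Payoffs are the classic
    # iterated prisoner's dilemma; policies are stand-ins for LLM agents.
    PAYOFF = {  # (my move, their move) -> my payoff; C = cooperate, D = defect
        ("C", "C"): 3, ("C", "D"): 0,
        ("D", "C"): 5, ("D", "D"): 1,
    }

    def tit_for_tat(history):
        return "C" if not history else history[-1][1]  # mirror their last move

    def always_defect(history):
        return "D"

    def play(agent_a, agent_b, rounds=10):
        hist_a, hist_b = [], []
        score_a = score_b = 0
        for _ in range(rounds):
            a, b = agent_a(hist_a), agent_b(hist_b)
            score_a += PAYOFF[(a, b)]
            score_b += PAYOFF[(b, a)]
            hist_a.append((a, b))
            hist_b.append((b, a))
        return score_a, score_b

    # Defection wins this matchup (9, 14), but two cooperators score 30 each:
    print(play(tit_for_tat, always_defect))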

Read these before you reach out:

  • Bianchi et al. — How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis
  • Abdelnabi et al. — Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation
  • Vaccaro et al. — Advancing AI Negotiations: New Theory and Evidence from a Large-Scale Autonomous Negotiations Competition
  • Mukobi et al. — Welfare Diplomacy: Benchmarking Language Model Cooperation
  • Akata et al. — Playing Repeated Games with Large Language Models

Domain

Finance & Financial NLP

Finance is where plausible-sounding bullshit gets expensive. I care about whether models can actually reason over filings, calculations, instructions, and evidence — not just sound finance-coded.

This area is for people who care about domain-specific reasoning, evaluation, and high-stakes use cases where “sounds right” is useless.
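One concrete version of why “sounds right” is useless: grade answers by recomputing the arithmetic from the stated figures, FinQA-style. The filing snippet and numbers below are invented for illustration.

    # Toy sketch of a FinQA-flavored check: the answer must be a verifiable
    # calculation over stated figures. The snippet and figures are invented.
    filing = {"revenue_2023": 12_400, "revenue_2022": 11_000}  # $ thousands

    question = "What was revenue growth from 2022 to 2023, in percent?"
    gold = (filing["revenue_2023"] - filing["revenue_2022"]) \
           / filing["revenue_2022"] * 100  # 12.73%

    def grade(model_answer: float, tol: float = 0.1) -> bool:
        """A fluent but unverifiable number fails; only the arithmetic counts."""
        return abs(model_answer - gold) <= tol

    print(round(gold, 2), grade(12.73))  # 12.73 True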

Read these before you reach out:

  • Chen et al. — FinQA: A Dataset of Numerical Reasoning over Financial Data
  • Chen et al. — ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering
  • Islam et al. — FinanceBench: A New Benchmark for Financial Question Answering
  • Matlin et al. — Finance Language Model Evaluation (FLaME)
  • Xie et al. — FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning

How I Work

I lead the research, but I’m not your boss. Those are different roles. I set direction, give context, and push the work forward, but I am not here to micromanage people or force them to work. We are working toward a shared goal, and I want collaborators who act like collaborators: challenge my ideas, ask sharp questions, and then go do the work.

The people who do well with me get stuff done. They show up to meetings. They prepare ahead of time. They take ownership of tasks and get them done quickly instead of waiting around until someone pressures them. They ask clarifying questions when needed, but they do not need constant hand-holding. And when they finish something, they can explain what they did, why they did it that way, and how it turned out.

I reward competence, not compliance. I have had people join projects that were already months underway and, in a fraction of the time, earn second authorship because they showed up, asked good questions, did the work, and got things done. People who are reliable, capable, and do not make excuses can move very fast with me.

What does not work is the opposite: missing meetings, talking more than doing, treating AI tools like they are optional, needing every next step spelled out, not trying to improve, or outsourcing simple tasks that you should have been able to handle yourself. I am not going to drag people across the finish line. The desire to excel has to come from you. Without that fire, this will not work.

There’s a Ronnie Coleman line I think about a lot: “Everybody wants to be a bodybuilder, but nobody wants to lift no heavy-ass weights.” A lot of people want the paper, the CV line, the authorship, the idea of doing research. Fewer people want to do the reps. I care about whether you will do the reps.

Communication

Most communication happens in Slack. Ask questions early. Do not sit on blockers for days. During paper pushes, expect two meetings a week plus the weekly lab meeting. During quieter periods, once a week is usually enough. Either way, come prepared.

Writing & Feedback

I will give detailed feedback. I will not rewrite your draft for you. Your job is to revise until it is good. The point is not just to get one paper out — it is to get better at doing research and better at writing.

Pace

When we are pushing, we move fast. Deadlines are real, and the pace can be intense. When the push is over, we recover. This is not endless grind for its own sake. It is focused sprints with real breaks.

Credit

If you contribute meaningfully, you are on the paper. I care about contribution, not seniority games.

No Silos

I do not like siloed research. People in the group should know what other people are working on and talk to each other directly when the work overlaps. But “talk to each other” does not mean outsourcing basic thinking. Try the obvious things first. Then ask good questions.

Ownership

Take a task, drive it to done. Do not wait for reminders. When it is finished, be ready to explain what you did, why you did it that way, and how it turned out. The faster you show this, the more I will trust you.

What You Need to Bring

You do not need to arrive perfect. You do need to close gaps quickly — and three things are non-negotiable.

AI-Native Coding

You need to be comfortable working with AI coding agents — Claude Code, Codex, Cursor, whatever your tool is. This is not optional. We are not here to hand-craft artisanal code. We are here to do good research, and code is a means to that end. If you are unwilling to use agents for coding, or unwilling to learn, we are not a match.

10 Hours a Week Minimum

Less than that and projects stall. I would rather work with fewer people who are actually present than more people who show up sporadically.

Reliability

Show up when you say you will. Deliver what you committed to. Communicate early when something is off track. This matters more to me than raw technical skill.

Stack, Resources & the GT Path

What I expect you to use day-to-day, what I cover, and how Georgia Tech affiliation factors in.

Your Stack

  • Python / PyTorch for the core work.
  • GitHub for PRs, branches, and issues.
  • Overleaf / LaTeX for papers.
  • Weights & Biases for experiment tracking (a minimal sketch follows below).
  • Google Docs / Slides for collaboration.
  • AI coding agents (Claude Code, Codex, Cursor) every day.

Bonus: sprint-based workflows, Kanban boards, and ticket systems. Projects run on sprint cycles with tracked tickets.
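If Weights & Biases is new to you, the day-to-day habit is small. A minimal sketch, with a hypothetical project name and stand-in metrics:

    # Minimal sketch of the expected tracking habit: every run gets logged.
    # Project name, config values, and the fake loss are placeholders.
    import random
    import wandb

    run = wandb.init(
        project="example-project",  # hypothetical
        config={"lr": 3e-4, "batch_size": 32, "seed": 0},
    )
    for step in range(100):
        loss = 1.0 / (step + 1) + 0.01 * random.random()  # stand-in for training
        wandb.log({"train/loss": loss}, step=step)
    run.finish()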

Compute & Accounts

You do not pay for compute. I have GPUs, API credits, and infrastructure through DARPA, NSF, Together AI, Modal Labs, and Thinking Machines Lab. That part is on me.

You do need your own AI assistant accounts — ChatGPT Pro, Claude Pro, or equivalent. I treat that as a cost of doing serious research. I’m working on reimbursement; for now assume it’s part of the job.

Georgia Tech Path

Lack of publications does not automatically disqualify you. I am open to promising Georgia Tech undergrads and master’s students. What matters is whether you have read the work, can think and write clearly, and are accountable for what you ship.

Research-for-credit is often the cleanest way to make that real — it creates structure, stakes, and a reason to take the work seriously.

Is This a Good Fit?

I’m not looking for generic enthusiasm. I’m looking for people who have read the work, can explain what they want to do, and can do it without constant supervision. I don’t need a bunch of helpers. I want young scientists.

Strong signs

  • You’ve read at least one of my papers and have a specific reaction to it.
  • You can point to relevant research experience, or you make up for not having it with unusually clear thinking and initiative.
  • Your writing is brief, clear, and high-signal.
  • You can name a problem you want to work on and why.
  • You take initiative without needing constant pressure.

Bad signs

  • Your pitch is “open to whatever needs help right now.”
  • You haven’t read the work.
  • Your writing is vague, sloppy, or much longer than it needs to be.
  • You say you want to “help out” but cannot explain what you want to learn, build, or solve.
  • You need constant hand-holding, reminders, or pressure to move.

Flexible is good. Directionless is not. What I offer in return is real responsibility, real respect, and hard problems worth solving. Do strong work and you can earn authorship, trust, and room to grow fast. I reward competence, not compliance.

Get In Touch

Before you reach out: read at least one paper, have a specific reaction to it, know your primary area, and be able to name the exact problem you want to work on and the strongest piece of evidence that you can do it. Keep your answers short — parsimony matters, and more words will not rescue a weak answer.

Read this page. Then decide whether we should talk. Don’t send me a generic paragraph about being passionate about AI — tell me what you read, what you think, and what you want to do.

I read every email. The strong ones get a 15-minute intro call. The vague ones do not.

An intake form is coming online soon. In the meantime, email me directly.

Email: glenn@gatech.edu · LinkedIn · GitHub
