
Organizations

@polyunlp


Yanjun Chen

Second-year PhD student in the Department of Computing at The Hong Kong Polytechnic University, advised by Prof. Wenjie Li (Maggie) and Prof. Wei Zhang.

I work on Environment-Centric AI: treating the training environment of intelligent agents as a designed object. The environment is not a given. It has components (reward, feedback, observation, evaluation), and each of them can be analyzed and redesigned.

More at battam1111.github.io.

Selected work

  • Exact Is Easier: Credit Assignment for Cooperative LLM Agents (in submission, arXiv:2603.06859). Cooperative LLM histories are deterministic, so per-agent counterfactual credit can be computed exactly. The work delivers a learning algorithm that outperforms every approximate multi-agent RL alternative, plus the first method-agnostic tool for auditing credit quality.

  • The Accuracy Paradox in RLHF (EMNLP 2024). Moderately accurate reward models train better language models than highly accurate ones, measured on relevance, factuality, and completeness. The work treats reward-model accuracy as a property of environment design.

  • battam1111.github.io: source for my homepage, built on al-folio with custom SCSS; trilingual (EN/中/日).
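To illustrate why determinism makes counterfactual credit exactly computable, here is a minimal sketch. It is not the paper's algorithm; it assumes a hypothetical deterministic `reward_fn` that scores a joint action of all agents, and uses a simple difference-reward style baseline: agent i's credit is the actual reward minus the mean reward over agent i's candidate actions, with the other agents' actions held fixed. Because `reward_fn` is deterministic, each counterfactual is obtained by direct re-evaluation rather than by sampling or a learned critic.

```python
from typing import Callable, List, Sequence, Tuple

JointAction = Tuple[str, ...]


def counterfactual_credit(
    reward_fn: Callable[[JointAction], float],
    joint_action: JointAction,
    candidates: Sequence[Sequence[str]],
) -> List[float]:
    """Per-agent credit: actual reward minus the mean reward obtained when
    agent i's action is swapped for each of its candidates (others fixed).

    Illustrative sketch only; reward_fn and candidates are hypothetical inputs.
    """
    actual = reward_fn(joint_action)
    credits = []
    for i, options in enumerate(candidates):
        # Deterministic reward: each counterfactual is evaluated exactly once.
        counterfactual_rewards = [
            reward_fn(joint_action[:i] + (alt,) + joint_action[i + 1:])
            for alt in options
        ]
        baseline = sum(counterfactual_rewards) / len(counterfactual_rewards)
        credits.append(actual - baseline)
    return credits


def toy_reward(joint: JointAction) -> float:
    # Hypothetical two-agent task: planning is worth 3, solving is worth 1.
    return float(3 * (joint[0] == "plan") + 1 * (joint[1] == "solve"))


if __name__ == "__main__":
    credits = counterfactual_credit(
        toy_reward, ("plan", "solve"), [["plan", "skip"], ["solve", "guess"]]
    )
    print(credits)  # [1.5, 0.5]
```

In this toy case the planner receives more credit than the solver because removing its contribution costs more reward. With stochastic environments the same baseline would have to be estimated; the determinism of cooperative LLM histories is what removes that approximation.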

Find me

Pinned

  1. Myco (Public)

    Self-evolving cognitive organism for AI agents: eternal devouring, eternal evolution.

    Python · 62 stars · 7 forks

  2. EIT-EAST-Lab/C3 (Public)

    Official implementation of the paper "Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration". (by Yanjun Chen)

    Python · 29 stars

  3. AccuracyParadox-RLHF (Public)

    [EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models".

    Python · 8 stars