Tianwei Ni   倪天炜

I am a final-year PhD student at Mila - Quebec AI Institute and Université de Montréal, advised by Pierre-Luc Bacon. Earlier in my PhD, I worked closely with Benjamin Eysenbach at Princeton University and Aditya Mahajan at McGill University. I have also worked at Amazon as an applied scientist intern.

My research centers on deep reinforcement learning (RL), spanning three training paradigms:

  1. Canonical deep RL -- online learning from scratch, with emphasis on partial observability, representation learning, self-supervised learning, and sequence modeling.
  2. Offline RL & foundation models for decision making -- learning from static, diverse datasets, focusing on uncertainty awareness and zero-shot generalization.
  3. Post-training with RL -- fine-tuning pretrained models for complex tasks such as large language model (LLM) reasoning and planning, leveraging synthetic data and continual learning.

My work spans the full spectrum, from scientific understanding to practical algorithms and scalable implementation, which I believe is essential to building strong general AI.

           

News
  • Nov 2024: I am honored to receive the RBC Borealis AI Fellowship, awarded to 10 PhD students across Canada each year.
  • Sept 2024 - Feb 2025: Applied scientist intern at Amazon Web Services in Santa Clara, supervised by Rasool Fakoor and Allen Nie on the generalist agent team.
  • Jan 2024: One paper got accepted at ICLR as a poster. See you in Vienna!
  • Sept 2023: One paper got accepted at NeurIPS as an oral. See you in New Orleans!
  • Aug 2023: I passed my predoc exam and became a PhD candidate.
Highlighted Papers

Selected papers are listed below in reverse chronological order; please see Google Scholar for the full publication list.
Notation: * indicates equal contribution.

Teaching Large Language Models to Reason through Learning and Forgetting
Tianwei Ni, Allen Nie, Sapana Chaudhary, Yao Liu, Huzefa Rangwala, Rasool Fakoor
arXiv

Inference-time search improves LLM reasoning but is expensive at deployment. We propose unlikelihood fine-tuning (UFT), which trains LLMs to follow correct reasoning paths and forget incorrect ones collected from various reasoning algorithms, while enabling fast inference.

Do Transformer World Models Give Better Policy Gradients?
Michel Ma*, Tianwei Ni, Clement Gehring, Pierluca D'Oro*, Pierre-Luc Bacon
International Conference on Machine Learning (ICML), 2024
and ICLR 2024 Workshop on Generative Models for Decision Making (oral)
arXiv

Led by Michel and Pierluca, we develop a model-based policy gradient method for long-horizon planning. By conditioning solely on action sequences, the world model yields better policy gradients than state-conditioned models and, at times, even ground-truth simulators.

Bridging State and History Representations: Understanding Self-Predictive RL
Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon
International Conference on Learning Representations (ICLR), 2024
and NeurIPS 2023 Workshop on Self-Supervised Learning: Theory and Practice (oral)
arXiv / OpenReview / 1-hour Talk

Provide a unified view of state and history representations in MDPs and POMDPs, and investigate the challenges, solutions, and benefits of learning self-predictive representations in standard MDPs, distracting MDPs, and sparse-reward POMDPs.

When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment
Tianwei Ni, Michel Ma, Benjamin Eysenbach, Pierre-Luc Bacon
Conference on Neural Information Processing Systems (NeurIPS), 2023 (oral)
and NeurIPS 2023 Workshop on Foundation Models for Decision Making
arXiv / OpenReview / Poster / 11-min Talk / Mila Blog

Investigate the architectural aspects of history representations in RL by decoupling two temporal dependencies -- memory and credit assignment -- with rigorous quantification of each.

Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs
Tianwei Ni, Benjamin Eysenbach, Ruslan Salakhutdinov
International Conference on Machine Learning (ICML), 2022
project page / arXiv / CMU ML Blog

Find and implement simple but often strong recurrent model-free baselines for many POMDP problems, including meta-RL, robust RL, generalization in RL, and temporal credit assignment.

Service

Reviewer: ICML 2023-24, NeurIPS 2023-24, ICLR 2024, RLC 2025

Teaching Assistant: 10-703 Deep Reinforcement Learning and Control, Carnegie Mellon University, Fall 2020

Personal Journey

Before embarking on my PhD journey, I was a research intern on embodied AI at the Allen Institute for AI (AI2), mentored by Jordi Salvador and Luca Weihs. I earned my Master's degree in Machine Learning at Carnegie Mellon University, where I studied deep RL with Benjamin Eysenbach and Ruslan Salakhutdinov and explored human-agent collaboration with Katia Sycara. My research journey began with computer vision for medical images, supervised by Alan Yuille at Johns Hopkins University. I earned my Bachelor's degree in Computer Science at Peking University.

Fun fact: I have received university education in three languages - Chinese, English, and French.

The website template is based on Jon Barron's source code.