Tianwei Ni   倪天炜

I am a PhD candidate at Mila - Quebec AI Institute and Université de Montréal, advised by Pierre-Luc Bacon. Previously, I worked closely with Benjamin Eysenbach at Princeton University and Aditya Mahajan at McGill University.

My research centers on reinforcement learning (RL) for sequential decision-making under uncertainty. I aim to equip RL with better frameworks, algorithms, and implementations that can tackle real-world challenges beyond toy tasks. To advance this vision of strong AI, I draw on sequence modeling, representation learning, planning, and more -- techniques I consider essential for building a modern RL system.


News
  • Sept 2024: I started my applied scientist internship at Amazon in Santa Clara, supervised by Rasool Fakoor.
  • Jan 2024: One paper got accepted at ICLR as a poster. See you in Vienna!
  • Sept 2023: One paper got accepted at NeurIPS as an oral. See you in New Orleans!
  • Aug 2023: I passed my predoc exam and became a PhD candidate.
Highlighted Papers

Below are selected papers in reverse chronological order; please see the full publication list on Google Scholar.
Notation: * indicates equal contribution.

Do Transformer World Models Give Better Policy Gradients?
Michel Ma*, Tianwei Ni, Clement Gehring, Pierluca D'Oro*, Pierre-Luc Bacon
International Conference on Machine Learning (ICML), 2024
and ICLR 2024 Workshop on Generative Models for Decision Making (oral)
arXiv

Led by Michel and Pierluca, we craft a model-based policy gradient method for long-horizon planning. By conditioning solely on action sequences, the world model yields better gradients than state-based models and sometimes even the ground-truth simulator.

Bridging State and History Representations: Understanding Self-Predictive RL
Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon
International Conference on Learning Representations (ICLR), 2024
and NeurIPS 2023 Workshop on Self-Supervised Learning: Theory and Practice (oral)
arXiv / OpenReview / 1-hour Talk /

Provide a unified view of state and history representations in MDPs and POMDPs, and further investigate the challenges, solutions, and benefits of learning self-predictive representations in standard MDPs, distracting MDPs, and sparse-reward POMDPs.

When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment
Tianwei Ni, Michel Ma, Benjamin Eysenbach, Pierre-Luc Bacon
Conference on Neural Information Processing Systems (NeurIPS), 2023 (oral)
and NeurIPS 2023 Workshop on Foundation Models for Decision Making
arXiv / OpenReview / Poster / 11-min Talk / Mila Blog /

Investigate how the architecture of history representations in RL handles two distinct temporal dependencies -- memory and credit assignment -- with rigorous quantification of each.

Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs
Tianwei Ni, Benjamin Eysenbach, Ruslan Salakhutdinov
International Conference on Machine Learning (ICML), 2022
project page / arXiv / CMU ML Blog /

Find and implement a simple but often strong recurrent model-free baseline for POMDPs, spanning meta-RL, robust RL, generalization in RL, and temporal credit assignment.

Service

Reviewer: ICML 2023-24, NeurIPS 2023-24, ICLR 2024

Teaching Assistant: 10-703 Deep Reinforcement Learning and Control, Carnegie Mellon University, Fall 2020

Personal Journey

Before embarking on my PhD journey, I was a research intern in embodied AI at the Allen Institute for AI (AI2), mentored by Jordi Salvador and Luca Weihs. I earned my Master's degree in Machine Learning at Carnegie Mellon University, where I studied deep RL guided by Ben Eysenbach and Russ Salakhutdinov and explored human-agent collaboration advised by Katia Sycara. My research journey started with computer vision for medical images, supervised by Alan Yuille at Johns Hopkins University. I earned my Bachelor's degree in Computer Science at Peking University.

Fun fact: I have received university education in three languages - Chinese, English, and French.

The website template is adapted from Jon Barron's source code.