Tags
Reinforcement Learning
- From Policy Gradient to PPO
PPO, one of the most widely used RL algorithms.
- Understanding Bellman Equations
Why and how are Bellman equations used in RL?
Multi-Agent Systems
- A Brief Introduction to Dec-POMDP
About Dec-POMDP, its computation, and planning.
- Optimal Q-functions for Dec-POMDP
Why can't we compute π* via the normative definition of Q* in a Dec-POMDP?
Generative AI
- A Brief Introduction to Diffusion Models
Math foundations of diffusion models.
- Reward Modeling in LLM Alignment
How are the reward models in RLHF built?
See instructions here.
If you spot anything off, please send me an email.