Tags

Reinforcement Learning

From Policy Gradient to PPO
PPO, one of the most widely used RL algorithms.
Understanding Bellman Equations
Why and how are Bellman equations used in RL?

Multi-Agent Systems

A Brief Introduction to Dec-POMDP
About Dec-POMDP, its computation, and planning.
Optimal Q-functions for Dec-POMDP
Why can't we compute π^* via the normative definition of Q^* in a Dec-POMDP?

Generative AI

A Brief Introduction to Diffusion Models
Math foundations of diffusion models.
Reward Modeling in LLM Alignment
How are the reward models in RLHF built?

See instructions here. If you spot anything off, please send me an email.