Jeff Liu Lab
Reinforcement Learning
RL Overview
Table of Contents
Overview
1. Markov Decision Process (MDP)
1.1 Basic Framework
1.2 Core Objective
1.3 Bellman Equations
1.4 MDP Extensions
2. RL Taxonomy
2.1 Overall Taxonomy Tree
2.2 Model-Free vs Model-Based
2.3 On-Policy vs Off-Policy
2.4 Offline Reinforcement Learning
2.5 Single-Agent vs Multi-Agent
3. Key Algorithm Map
3.1 By Development Timeline
3.2 By Application Scenario
4. Core Components of Deep RL
4.1 Function Approximation
4.2 Key Techniques for Stable Training
4.3 Exploration Strategies
5. Connection to LLM Post-Training
5.1 RLHF Pipeline
5.2 Beyond RLHF
5.3 LLM from an RL Perspective
6. Frontier Directions
6.1 Current Hot Topics
6.2 Open Challenges
7. Suggested Learning Path
References
Further Reading