RLHF Book by Nathan Lambert 2026
FOUNDATION LAYER
┌─────────────────────────────────────────────────────────────┐
│ 1. Overview         2. History         3. Training Overview │
│ [What & Why]        [Evolution]        [Problem Setup]      │
└─────────────────────────────────────────────────────────────┘
CORE METHODOLOGY LAYER
┌─────────────────────────────────────────────────────────────┐
│ 4. Instruction Fine-tuning   [Data Preparation]             │
│ 5. Reward Modeling           [Preference Learning]          │
│ 6. Reinforcement Learning    [Core Optimization]            │
└─────────────────────────────────────────────────────────────┘
ADVANCED TECHNIQUES LAYER
┌─────────────────────────────────────────────────────────────┐
│ 7. Reasoning & Inference-Time Scaling  [2024-25 Frontier]   │
│ 8. Direct Alignment Algorithms         [DPO & Variants]     │
│ 9. Rejection Sampling                  [Simple RL Alt]      │
└─────────────────────────────────────────────────────────────┘
DATA & THEORY LAYER
┌─────────────────────────────────────────────────────────────┐
│ 10. Nature of Preferences    [Philosophy]                   │
│ 11. Preference Data          [Collection & Design]          │
│ 12. Synthetic Data           [AI Feedback & CAI]            │
└─────────────────────────────────────────────────────────────┘
APPLICATION & CONTROL LAYER
┌─────────────────────────────────────────────────────────────┐
│ 13. Tool Use & Function Calling  [Agentic Capabilities]     │
│ 14. Style and Information        [Format & Tone]            │
│ 15. Regularization               [Stability Controls]       │
│ 16. Over-Optimization            [Failure Modes]            │
└─────────────────────────────────────────────────────────────┘
EVALUATION & DEPLOYMENT LAYER
┌─────────────────────────────────────────────────────────────┐
│ 17. Evaluation                    [Measurement]             │
│ 18. Product, UX, Model Character  [Real-World Deploy]       │
└─────────────────────────────────────────────────────────────┘
REFERENCE
┌─────────────────────────────────────────────────────────────┐
│ Appendix A: Definitions      [Technical Reference]          │
└─────────────────────────────────────────────────────────────┘
Foundation Layer
- Chapter 1: Overview - What RLHF is and why it matters
- Chapter 2: History - Evolution of the field
- Chapter 3: Training Overview - Problem setup and fundamentals
Core Methodology Layer
- Chapter 4: Instruction Fine-tuning - Data preparation techniques
- Chapter 5: Reward Modeling - Learning from preferences
- Chapter 6: Reinforcement Learning - Core optimization algorithms
Advanced Techniques Layer
- Chapter 7: Reasoning & Inference-Time Scaling - 2024-25 frontier methods
- Chapter 8: Direct Alignment Algorithms - DPO and variants
- Chapter 9: Rejection Sampling - A simple alternative to full RL optimization
Data & Theory Layer
- Chapter 10: Nature of Preferences - Philosophical foundations
- Chapter 11: Preference Data - Collection and design strategies
- Chapter 12: Synthetic Data - AI feedback and Constitutional AI
Application & Control Layer
- Chapter 13: Tool Use & Function Calling - Agentic capabilities
- Chapter 14: Style and Information - Format and tone control
- Chapter 15: Regularization - Stability controls
- Chapter 16: Over-Optimization - Failure modes and mitigation
Evaluation & Deployment Layer
- Chapter 17: Evaluation - Measurement techniques
- Chapter 18: Product, UX, Model Character - Real-world deployment
Reference
- Appendix A: Definitions - Technical reference guide