- Ch12. Synthetic Data
- Ch7. Reasoning & Inference-time Scaling
- Ch9. Instruction Fine-Tuning (IFT/SFT)
- Data Preparation in RLHF -- Ch6 (Preference Data) vs Ch9 (SFT Data)
- PPO in LLMs vs PPO in Walker2D
- PPO vs DPO in RLHF
- Single-GPU (RTX 4090) RLHF Training Pipeline w/ TRL
- The Complete InstructGPT Recipe (Ch 4.2.1)