RLHF
Reinforcement Learning from Human Feedback, alignment, and post-training of LLMs (Manning, 2026)