Tweet by AK on RLHF
REINFORCEMENT LEARNING

AK
⁦‪@_akhaliq‬⁩
Efficient RLHF: Reducing the Memory Usage of PPO

paper page: huggingface.co/papers/2309.00…

Reinforcement Learning with Human Feedback (RLHF) has revolutionized language modeling by aligning models with human preferences. However, the RL stage, Proximal Policy Optimization (PPO),…

9/5/23, 11:03 PM
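
For context (not from the paper itself): the PPO stage of a standard RLHF pipeline typically keeps several models in memory at once (policy, reference, reward, and value), which is where the memory pressure the paper targets comes from. Below is a minimal sketch of the usual PPO clipped surrogate objective, not the paper's method; the tensors are random stand-ins for real per-token rollout data.

```python
import torch

def ppo_clipped_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    # Probability ratio between the current policy and the rollout-time policy.
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # PPO maximizes the minimum of the two surrogates; negate to get a loss.
    return -torch.min(unclipped, clipped).mean()

# Toy usage: random values standing in for real per-token log-probs and advantages.
torch.manual_seed(0)
logprobs = torch.randn(8)
old_logprobs = logprobs + 0.1 * torch.randn(8)
advantages = torch.randn(8)
print(ppo_clipped_loss(logprobs, old_logprobs, advantages))
```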

Joseph Thornton
