SALMON: Self-Alignment with Principle-Following Reward Models
abs: arxiv.org/abs/2310.05910
code: github.com/IBM/SALMON
This paper from IBM proposes a new RLAIF paradigm that uses the LLM to judge responses against specific principles and trains a reward model conditioned on those principles.