Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, Chelsea Finn

2023-02-27
alignment, optimization

Abstract

This paper introduces Direct Preference Optimization (DPO), a method for fine-tuning language models on human preference data without explicit reward modeling or reinforcement learning. Exploiting a closed-form mapping between reward functions and optimal policies, DPO trains the policy directly with a simple classification loss over preference pairs. Its empirical results, showing performance matching or exceeding RLHF pipelines, helped shape subsequent work in alignment and optimization.
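To make the abstract concrete, the following is a minimal sketch of the DPO objective on a single preference pair, assuming per-sequence log-probabilities for the chosen and rejected responses under both the policy being trained and a frozen reference policy (the function name and arguments are illustrative, not from the paper's code):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is a log-probability of the full response sequence;
    beta scales the implicit reward (a common choice is around 0.1).
    """
    # Implicit rewards are beta-scaled log-ratios against the reference policy.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: the Bradley-Terry preference likelihood.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Note the loss reduces to log 2 when policy and reference agree (zero margin), and shrinks as the policy assigns relatively more probability to the chosen response than the reference does.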