Understanding DeepSeek's AI Reward Models: Aligning with Human Preferences