Issues for Implementation #30
Comments
Thank you for your interest! To reproduce the best-performing method in the paper, we recommend following the iterative data construction and training procedure described there. We will soon open-source a method that uses RLAIF-V directly as the reward model to construct high-quality data; in our quantitative experiments, data constructed this way is also quite efficient and achieves good results even with a single round of training (though still suboptimal, as noted in the paper). We hope this helps!
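For readers who want to experiment before the official release, here is a minimal sketch of how a reward model can be used to turn sampled candidate answers into preference pairs. The function names (`build_preference_pairs`, `score_fn`) and the pair format are hypothetical illustrations under that assumption, not the released RLAIF-V pipeline.

```python
from typing import Callable, Dict, List

def build_preference_pairs(
    questions: List[str],
    candidates_per_question: List[List[str]],
    score_fn: Callable[[str, str], float],  # hypothetical wrapper around a reward model
) -> List[Dict[str, str]]:
    """Score sampled answers with a reward model; keep the best/worst as one preference pair."""
    pairs = []
    for question, candidates in zip(questions, candidates_per_question):
        if len(candidates) < 2:
            continue  # need at least two answers to form a pair
        scored = sorted((score_fn(question, ans), ans) for ans in candidates)
        worst_score, worst = scored[0]
        best_score, best = scored[-1]
        if best_score > worst_score:  # skip ties: no usable preference signal
            pairs.append({"prompt": question, "chosen": best, "rejected": worst})
    return pairs
```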
Is there an expected time for releasing this version of the code?
Fantastic work! Is there an estimated timeline for the upcoming release of "the method that uses RLAIF-V as a reward model to directly construct high-quality data"? Really looking forward to it!
Hello, this is very meaningful work!
When I trained on the LLaVA 1.5 data from the dataset you provided, I found the following on the Object Hallucination benchmark:
full fine-tuning: 19.13 and 9.32
LoRA fine-tuning: 10.07 and 5.30
These results are far below the performance of your model fine-tuned with the online strategy. Is there any strategy, or particular data, I could use to get close to your performance?
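(Editor's note, not from the authors.) When a LoRA run lags behind full fine-tuning, the adapter rank and the set of targeted modules are common knobs to revisit. Below is a minimal sketch using the Hugging Face `peft` library; the hyperparameter values, target module names, and the `base_model` placeholder are illustrative assumptions, not the settings used in the paper.

```python
from peft import LoraConfig, get_peft_model  # assumes `peft` is installed

# Illustrative values only -- not the paper's configuration.
lora_config = LoraConfig(
    r=128,                 # a larger rank usually narrows the gap to full fine-tuning
    lora_alpha=256,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

# `base_model` stands for an already-loaded LLaVA-1.5 policy model (placeholder):
# peft_model = get_peft_model(base_model, lora_config)
# peft_model.print_trainable_parameters()
```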