
Issues for Implementation #30

Open
injadlu opened this issue Nov 12, 2024 · 3 comments

Comments

@injadlu

injadlu commented Nov 12, 2024

Hello, this is very meaningful work!
When I trained on the LLaVA 1.5 data from the dataset you provide, I observed the following on the Object Hallucination benchmark:
full fine-tuning: 19.13 and 9.32
LoRA fine-tuning: 10.07 and 5.30
These are far below the performance of your model fine-tuned with the online strategy. Is there any strategy, or any particular data, that would let me approach your results?

@yiranyyu
Collaborator

Thank you for your interest! To reproduce the best-performing method in the paper, we recommend following the iterative approach to data construction and training as outlined there. We will soon release a method for directly constructing high-quality data using RLAIF-V as the reward model. Our quantitative experiments show that this approach not only improves data efficiency but also achieves promising results in single-stage training (though suboptimal, as noted in the paper). We hope this helps!
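To make the iterative recipe more concrete, below is a minimal Python sketch of what one round of that loop could look like: sample several responses from the current policy, rank them with a reward model, keep the best and worst as a preference pair, and run a DPO update. This is a hypothetical illustration, not the RLAIF-V codebase: `Policy`, `RewardModel`, and the injected `dpo_train` step are placeholder interfaces you would back with your own model, reward scorer, and preference trainer (e.g. a DPO trainer such as trl's `DPOTrainer`).

```python
from typing import Callable, Protocol

# Placeholder interfaces -- hypothetical, not the actual RLAIF-V API.
class Policy(Protocol):
    def generate(self, prompt: str) -> str: ...

class RewardModel(Protocol):
    def score(self, prompt: str, response: str) -> float: ...

def build_preference_pairs(policy: Policy, reward_model: RewardModel,
                           prompts: list[str], n_samples: int = 4) -> list[dict]:
    """Construct (chosen, rejected) pairs by sampling and reward-ranking."""
    pairs = []
    for prompt in prompts:
        # Sample several candidate responses from the current policy.
        candidates = [policy.generate(prompt) for _ in range(n_samples)]
        # Rank candidates by reward-model score, ascending.
        ranked = sorted(candidates, key=lambda r: reward_model.score(prompt, r))
        # Highest-scored response is "chosen", lowest is "rejected".
        pairs.append({"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]})
    return pairs

def iterative_training(policy: Policy, reward_model: RewardModel,
                       prompts: list[str],
                       dpo_train: Callable[[Policy, list[dict]], Policy],
                       num_rounds: int = 4) -> Policy:
    """Alternate between preference-data construction and DPO training."""
    for _ in range(num_rounds):
        pairs = build_preference_pairs(policy, reward_model, prompts)
        # dpo_train is supplied by the caller; each round retrains the
        # policy on pairs built from its own latest generations.
        policy = dpo_train(policy, pairs)
    return policy
```

The key property this sketch tries to capture is that each round's preference data is regenerated from the latest policy, which is what distinguishes the iterative recipe from training once on a fixed preference set.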

@gagaein

gagaein commented Nov 21, 2024

Is there an expected time for releasing this version of the code?

@lufanma

lufanma commented Nov 21, 2024

Really great work! Is there an estimated time for the upcoming release of "the method that directly constructs high-quality data using RLAIF-V as the reward model"? Really looking forward to it!
