
Issues for Implementation #30

Open
injadlu opened this issue Nov 12, 2024 · 3 comments

Comments

@injadlu

injadlu commented Nov 12, 2024

Hello, this is very meaningful work!
When I trained on the LLaVA 1.5 data from the dataset you provide, I observed the following on the Object Hallucination benchmark:
full fine-tuning: 19.13 and 9.32
LoRA fine-tuning: 10.07 and 5.30
These are far below the performance of your model fine-tuned with the online strategy. Is there any strategy, or any particular data, that would let me approach your results?

@yiranyyu
Collaborator

Thank you for your interest! To reproduce the best-performing method in the paper, we recommend following the iterative approach to data construction and training as outlined there. We will soon release a method for directly constructing high-quality data using RLAIF-V as the reward model. Our quantitative experiments show that this approach not only improves data efficiency but also achieves promising results in single-stage training (though suboptimal, as noted in the paper). We hope this helps!
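To make the iterative recipe more concrete, below is a minimal Python sketch of what one round of that loop could look like: sample several responses from the current policy, rank them with a reward model, keep the best and worst as a preference pair, and run a DPO update. This is a hypothetical illustration, not the RLAIF-V codebase: `Policy`, `RewardModel`, and the injected `dpo_train` step are placeholder interfaces you would back with your own model, reward scorer, and preference trainer (e.g. a DPO trainer such as trl's `DPOTrainer`).

```python
from typing import Callable, Protocol

# Placeholder interfaces -- hypothetical, not the actual RLAIF-V API.
class Policy(Protocol):
    def generate(self, prompt: str) -> str: ...

class RewardModel(Protocol):
    def score(self, prompt: str, response: str) -> float: ...

def build_preference_pairs(policy: Policy, reward_model: RewardModel,
                           prompts: list[str], n_samples: int = 4) -> list[dict]:
    """Construct (chosen, rejected) pairs by sampling and reward-ranking."""
    pairs = []
    for prompt in prompts:
        # Sample several candidate responses from the current policy.
        candidates = [policy.generate(prompt) for _ in range(n_samples)]
        # Rank candidates by reward-model score, ascending.
        ranked = sorted(candidates, key=lambda r: reward_model.score(prompt, r))
        # Highest-scored response is "chosen", lowest is "rejected".
        pairs.append({"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]})
    return pairs

def iterative_training(policy: Policy, reward_model: RewardModel,
                       prompts: list[str],
                       dpo_train: Callable[[Policy, list[dict]], Policy],
                       num_rounds: int = 4) -> Policy:
    """Alternate between preference-data construction and DPO training."""
    for _ in range(num_rounds):
        pairs = build_preference_pairs(policy, reward_model, prompts)
        # dpo_train is supplied by the caller; each round retrains the
        # policy on pairs built from its own latest generations.
        policy = dpo_train(policy, pairs)
    return policy
```

The key property this sketch tries to capture is that each round's preference data is regenerated from the latest policy, which is what distinguishes the iterative recipe from training once on a fixed preference set.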

@gagaein

gagaein commented Nov 21, 2024

Is there an expected time for releasing this version of the code?

@lufanma

lufanma commented Nov 21, 2024

Really great work! Is there an estimated time for the upcoming release of "the method that directly constructs high-quality data using RLAIF-V as the reward model"? Really looking forward to it!
