This repository is the official GitHub page of MLBCAP, the first-place winner of the 2nd SciCap Challenge. MLBCAP has been accepted for presentation at AI4Research @ AAAI 2025.
Paper: https://arxiv.org/abs/2501.02552
Dataset (HuggingFace): Link
Scientific figure captioning is a challenging task that demands contextually accurate descriptions of visual content. Existing approaches often oversimplify it by treating it as either an image-to-text conversion problem or a text summarization problem, which leads to suboptimal results. Furthermore, commonly used datasets derived from arXiv papers contain many low-quality captions, making them unsuitable for effectively training large language models (LLMs).
MLBCAP addresses these challenges by leveraging a multi-LLM collaborative approach to generate high-quality captions. 🚀
This dataset is derived from the 2nd SciCap Challenge and uses the competition's hidden test set. It consists of high-quality synthetic captions generated by MLBCAP.
Note: This dataset is based on the hidden test dataset from the challenge, and the original captions from arXiv papers are not publicly available.
The 2nd SciCap Challenge was held during IJCAI 2024 (August 3-9, Jeju Island, South Korea). The competition featured two tracks based on caption length constraints:
- Short Caption Track: At least 30% of the generated captions must be shorter than the author-written captions.
- Long Caption Track: At least 30% of the generated captions must be longer than the author-written captions.
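As a rough illustration of the two length constraints above, the following sketch computes the share of generated captions that are shorter (or longer) than their author-written counterparts and compares it with the 30% threshold. It assumes that caption length is counted in whitespace-separated words, which is not specified here; the function names `fraction_satisfying` and `meets_track_constraint` are hypothetical and are not part of the challenge tooling.

```python
from typing import Sequence


def fraction_satisfying(
    generated: Sequence[str], reference: Sequence[str], track: str
) -> float:
    """Share of generated captions that are shorter (track="short") or
    longer (track="long") than the paired author-written caption."""
    if len(generated) != len(reference):
        raise ValueError("generated and reference must be paired one-to-one")
    if track not in ("short", "long"):
        raise ValueError("track must be 'short' or 'long'")
    if not generated:
        return 0.0

    def length(text: str) -> int:
        # Assumption: length is measured in whitespace-separated words.
        return len(text.split())

    hits = 0
    for gen, ref in zip(generated, reference):
        if track == "short" and length(gen) < length(ref):
            hits += 1
        elif track == "long" and length(gen) > length(ref):
            hits += 1
    return hits / len(generated)


def meets_track_constraint(
    generated: Sequence[str],
    reference: Sequence[str],
    track: str,
    threshold: float = 0.30,
) -> bool:
    """True if at least `threshold` of the captions satisfy the track rule."""
    return fraction_satisfying(generated, reference, track) >= threshold
```

A check of this kind could be run on a batch of generated captions before submission; if the official length measure differs (for example, characters or tokenizer tokens), only the `length` helper would need to change.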
The dataset includes the following features:
- figure_type: Extracted from the SciCap dataset
- ocr: Extracted from the SciCap dataset
- paragraph: Extracted from the SciCap dataset
- mention: Extracted from the SciCap dataset
- categories: Extracted from the SciCap dataset
- figure_description: Generated by GPT-4o
- mlbcap_long: Captions generated by MLBCAP-long
- mlbcap_short: Captions generated by MLBCAP-short
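To make the feature list concrete, here is a minimal sketch of how a single record might be handled once it has been loaded as a Python dictionary. It assumes each record is a flat mapping keyed by the feature names listed above; the value types are not documented here, so the example treats them as opaque. The helpers `missing_features` and `select_caption` are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, List

# Feature names as listed in this README.
EXPECTED_FEATURES = [
    "figure_type",
    "ocr",
    "paragraph",
    "mention",
    "categories",
    "figure_description",
    "mlbcap_long",
    "mlbcap_short",
]


def missing_features(record: Dict[str, Any]) -> List[str]:
    """Return the expected feature names that are absent from a record."""
    return [name for name in EXPECTED_FEATURES if name not in record]


def select_caption(record: Dict[str, Any], track: str) -> Any:
    """Pick the MLBCAP caption for the given track ("long" or "short")."""
    if track not in ("long", "short"):
        raise ValueError("track must be 'long' or 'short'")
    # Assumption: the caption fields are named mlbcap_long / mlbcap_short.
    return record[f"mlbcap_{track}"]
```

A quick sanity check such as `missing_features(record) == []` on a few rows can confirm that the loaded data matches the schema described here before the captions are used downstream.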
Human evaluation within the SciCap Challenge confirms the high quality of MLBCAP-generated captions. Three judges evaluated the captions with the following results:
- MLBCAP-long: Judged to be of higher quality than the original captions written by arXiv authors. 💪
- MLBCAP-short: Judged to be of similar quality to the original captions written by arXiv authors. 🤝
If you use MLBCAP in your research, please cite our paper:
@misc{kim2025multillmcollaborativecaptiongeneration,
title={Multi-LLM Collaborative Caption Generation in Scientific Documents},
author={Jaeyoung Kim and Jongho Lee and Hong-Jun Choi and Ting-Yao Hsu and Chieh-Yang Huang and Sungchul Kim and Ryan Rossi and Tong Yu and Clyde Lee Giles and Ting-Hao 'Kenneth' Huang and Sungchul Choi},
year={2025},
eprint={2501.02552},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.02552},
}