MLBCAP

This repository is the official GitHub page of MLBCAP, the first-place winner of the 2nd SciCap Challenge. MLBCAP has been accepted for presentation at AI4Research @ AAAI 2025.

Paper: Link
Dataset (HuggingFace): Link

📌 Introduction

Scientific figure captioning is a challenging task that demands contextually accurate descriptions of visual content. Existing approaches often oversimplify the task by treating it as either an image-to-text conversion problem or a text summarization problem, which leads to suboptimal results. Furthermore, commonly used datasets derived from arXiv papers contain many low-quality captions, making them poorly suited to training large language models (LLMs).

MLBCAP addresses these challenges by leveraging a multi-LLM collaborative approach to generate high-quality captions. 🚀

📊 Dataset Overview

This dataset stems from the 2nd SciCap Challenge and is built on the competition's hidden test set. It consists of high-quality synthetic captions generated by MLBCAP.

Note: This dataset is based on the hidden test dataset from the challenge, and the original captions from arXiv papers are not publicly available.

🏆 2nd SciCap Challenge

The 2nd SciCap Challenge was held during IJCAI 2024 (August 3-9, Jeju Island, South Korea). The competition featured two tracks, defined by caption length constraints:

Short Caption Track: At least 30% of the generated captions must be shorter than the author-written captions.
Long Caption Track: At least 30% of the generated captions must be longer than the author-written captions.
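
The two track constraints can be checked with a few lines of code. The following is a minimal sketch, not the challenge's official checker: the function names are hypothetical, and measuring caption length in whitespace-separated words is an assumption, since the length unit is not stated here.

```python
from typing import Callable, Sequence


def word_count(caption: str) -> int:
    """Caption length in whitespace-separated words (an assumed length measure)."""
    return len(caption.split())


def _fraction(
    generated: Sequence[str],
    reference: Sequence[str],
    compare: Callable[[int, int], bool],
) -> float:
    """Fraction of caption pairs whose lengths satisfy `compare`."""
    if len(generated) != len(reference):
        raise ValueError("generated and reference captions must be paired one-to-one")
    if not generated:
        raise ValueError("at least one caption pair is required")
    hits = sum(
        compare(word_count(g), word_count(r)) for g, r in zip(generated, reference)
    )
    return hits / len(generated)


def meets_short_track(
    generated: Sequence[str], reference: Sequence[str], threshold: float = 0.30
) -> bool:
    """True if at least `threshold` of generated captions are shorter than the author-written ones."""
    return _fraction(generated, reference, lambda g, r: g < r) >= threshold


def meets_long_track(
    generated: Sequence[str], reference: Sequence[str], threshold: float = 0.30
) -> bool:
    """True if at least `threshold` of generated captions are longer than the author-written ones."""
    return _fraction(generated, reference, lambda g, r: g > r) >= threshold
```

Captions of equal length count toward neither track in this sketch, which follows the "shorter than" and "longer than" wording of the rules.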

✨ Features of the Dataset

The dataset includes the following features:

figure_type: Extracted from the SciCap dataset
ocr: Extracted from the SciCap dataset
paragraph: Extracted from the SciCap dataset
mention: Extracted from the SciCap dataset
categories: Extracted from the SciCap dataset
figure_description: Generated by GPT-4o
mlbcap_long: Captions generated by MLBCAP-long
mlbcap_short: Captions generated by MLBCAP-short
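
As a rough illustration of how these features might be used, the sketch below checks that a record carries all eight fields and picks the caption for a given track. It assumes each record is a plain mapping keyed by the feature names above; the helper names are hypothetical and the value types are deliberately left unspecified, since they are not documented here.

```python
from typing import List, Mapping

# Feature names as listed above.
EXPECTED_FEATURES = (
    "figure_type",
    "ocr",
    "paragraph",
    "mention",
    "categories",
    "figure_description",
    "mlbcap_long",
    "mlbcap_short",
)

# Hypothetical mapping from a track name to the feature holding its caption.
TRACK_TO_FEATURE = {"long": "mlbcap_long", "short": "mlbcap_short"}


def missing_features(record: Mapping[str, object]) -> List[str]:
    """Return the expected feature names that are absent from `record`."""
    return [name for name in EXPECTED_FEATURES if name not in record]


def caption_for_track(record: Mapping[str, object], track: str) -> object:
    """Return the MLBCAP caption for `track`, which must be "long" or "short"."""
    if track not in TRACK_TO_FEATURE:
        raise ValueError(f"unknown track: {track!r}")
    return record[TRACK_TO_FEATURE[track]]
```

A record that passes `missing_features` with an empty result can be handed to `caption_for_track(record, "long")` or `caption_for_track(record, "short")`.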

🌟 Quality of MLBCAP's Captions

Human evaluation conducted as part of the SciCap Challenge confirms the high quality of MLBCAP-generated captions. Three judges evaluated the captions, with the following results:

MLBCAP-long: Rated higher in quality than the original captions written by arXiv authors. 💪
MLBCAP-short: Rated similar in quality to the original captions written by arXiv authors. 🤝

📎 Citation

If you use MLBCAP in your research, please cite our paper:

@misc{kim2025multillmcollaborativecaptiongeneration,
      title={Multi-LLM Collaborative Caption Generation in Scientific Documents}, 
      author={Jaeyoung Kim and Jongho Lee and Hong-Jun Choi and Ting-Yao Hsu and Chieh-Yang Huang and Sungchul Kim and Ryan Rossi and Tong Yu and Clyde Lee Giles and Ting-Hao 'Kenneth' Huang and Sungchul Choi},
      year={2025},
      eprint={2501.02552},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.02552}, 
}
