This repository is the official GitHub page of MLBCAP, the first-place winner of the 2nd SciCap Challenge. MLBCAP has been accepted for presentation at AI4Research @AAAI 2025.

teamreboott/MLBCAP

MLBCAP

This repository is the official GitHub page of MLBCAP, the first-place winner of the 2nd SciCap Challenge. MLBCAP has been accepted for presentation at AI4Research @ AAAI 2025.

Paper: https://arxiv.org/abs/2501.02552
Dataset (HuggingFace): Link

📌 Introduction

Scientific figure captioning is a challenging task that demands contextually accurate descriptions of visual content. Existing approaches often oversimplify the task by treating it as either an image-to-text conversion problem or a text summarization problem, which leads to suboptimal results. Furthermore, commonly used datasets derived from arXiv papers are plagued by low-quality captions, making them unsuitable for effectively training large language models (LLMs).

MLBCAP addresses these challenges by leveraging a multi-LLM collaborative approach to generate high-quality captions. 🚀

[Figure: MLBCAP diagram]


📊 Dataset Overview

This dataset stems from the results of the 2nd SciCap Challenge and is built on the competition's hidden test dataset. It consists of high-quality synthetic captions generated by MLBCAP.

Note: This dataset is based on the hidden test dataset from the challenge, and the original captions from arXiv papers are not publicly available.


🏆 2nd SciCap Challenge

The 2nd SciCap Challenge was held during IJCAI 2024 (August 3-9, Jeju Island, South Korea). The competition featured two tracks based on caption length constraints:

  • Short Caption Track: At least 30% of the generated captions must be shorter than the author-written captions.
  • Long Caption Track: At least 30% of the generated captions must be longer than the author-written captions.
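
The two track constraints can be checked with a few lines of Python. This is a minimal sketch, not the challenge's official scoring code: the function names are hypothetical, and measuring caption length in whitespace-separated words is an assumption, since the text above does not say how length is measured.

```python
from typing import Sequence


def _n_words(caption: str) -> int:
    # Assumption: caption length is counted in whitespace-separated words.
    return len(caption.split())


def length_fraction(generated: Sequence[str], author: Sequence[str], shorter: bool) -> float:
    """Fraction of generated captions that are strictly shorter (or strictly
    longer, if shorter=False) than the paired author-written caption."""
    if len(generated) != len(author):
        raise ValueError("generated and author must have the same length")
    if not generated:
        raise ValueError("at least one caption pair is required")
    hits = 0
    for gen, ref in zip(generated, author):
        if shorter and _n_words(gen) < _n_words(ref):
            hits += 1
        elif not shorter and _n_words(gen) > _n_words(ref):
            hits += 1
    return hits / len(generated)


def meets_track_constraint(
    generated: Sequence[str],
    author: Sequence[str],
    track: str,
    threshold: float = 0.30,
) -> bool:
    """Check the 30% rule for the 'short' or 'long' caption track."""
    if track not in ("short", "long"):
        raise ValueError("track must be 'short' or 'long'")
    fraction = length_fraction(generated, author, shorter=(track == "short"))
    return fraction >= threshold
```

For example, `meets_track_constraint(my_captions, author_captions, "short")` returns `True` when at least 30% of `my_captions` are shorter than their paired author-written captions under the word-count assumption.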

✨ Features of the Dataset

The dataset includes the following features:

  • figure_type: Extracted from the SciCap dataset
  • ocr: Extracted from the SciCap dataset
  • paragraph: Extracted from the SciCap dataset
  • mention: Extracted from the SciCap dataset
  • categories: Extracted from the SciCap dataset
  • figure_description: Generated by GPT-4o
  • mlbcap_long: Captions generated by MLBCAP-long
  • mlbcap_short: Captions generated by MLBCAP-short
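
As a rough illustration of working with these features, the sketch below checks that a record carries all eight fields and selects the caption for a given track. It assumes each record is available as a plain Python dict keyed by the feature names listed above; the helper names are hypothetical, and the value types of the fields are not specified here, so the code does not rely on them.

```python
# Feature names as listed in this README.
EXPECTED_FIELDS = (
    "figure_type",
    "ocr",
    "paragraph",
    "mention",
    "categories",
    "figure_description",
    "mlbcap_long",
    "mlbcap_short",
)

# Maps a challenge track to the feature that holds its caption.
CAPTION_FIELD_BY_TRACK = {"long": "mlbcap_long", "short": "mlbcap_short"}


def missing_fields(record: dict) -> list:
    """Return the expected feature names that are absent from a record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


def caption_for_track(record: dict, track: str):
    """Return the MLBCAP caption stored for the 'long' or 'short' track."""
    if track not in CAPTION_FIELD_BY_TRACK:
        raise ValueError("track must be 'long' or 'short'")
    return record[CAPTION_FIELD_BY_TRACK[track]]
```

Calling `missing_fields(record)` before further processing gives a quick way to catch records that were loaded with an incomplete set of features.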

🌟 Quality of MLBCAP's Captions

Human evaluation within the SciCap Challenge confirms the high quality of MLBCAP-generated captions. Three judges evaluated the captions with the following results:

  • MLBCAP-long: Demonstrated higher quality than the original captions written by arXiv authors. 💪
  • MLBCAP-short: Achieved quality similar to that of the original captions written by authors. 🤝

[Figure: Quality evaluation]


📎 Citation

If you use MLBCAP in your research, please cite our paper:

@misc{kim2025multillmcollaborativecaptiongeneration,
      title={Multi-LLM Collaborative Caption Generation in Scientific Documents}, 
      author={Jaeyoung Kim and Jongho Lee and Hong-Jun Choi and Ting-Yao Hsu and Chieh-Yang Huang and Sungchul Kim and Ryan Rossi and Tong Yu and Clyde Lee Giles and Ting-Hao 'Kenneth' Huang and Sungchul Choi},
      year={2025},
      eprint={2501.02552},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.02552}, 
}
