PRIV-Creation/Awesome-GPT4V

Awesome GPT4V


We focus on the evaluation and analysis of GPT-4V(ision).

A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering.
Yunxin Li, Longyue Wang, Baotian Hu, Xinyu Chen, Wanqi Zhong, Chenyang Lyu, Min Zhang.
arXiv 2023.11. [PDF]

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions.
Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin.
arXiv 2023.11. [PDF]

GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration.
Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi.
arXiv 2023.11. [PDF]

To See Is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning.
Junke Wang, Lingchen Meng, Zejia Weng, Bo He, Zuxuan Wu, Yu-Gang Jiang.
arXiv 2023.11. [PDF]

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation.
An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang.
arXiv 2023.11. [PDF]

GPT-4V(ision) as A Social Media Analysis Engine.
Hanjia Lyu, Jinfa Huang, Daoan Zhang, Yongsheng Yu, Xinyi Mou, Jinsheng Pan, Zhengyuan Yang, Zhongyu Wei, Jiebo Luo.
arXiv 2023.11. [PDF]

On The Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving.
Licheng Wen, Xuemeng Yang, Daocheng Fu, Xiaofeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi.
arXiv 2023.11. [PDF]

Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges.
Chenhang Cui, Yiyang Zhou, Xinyu Yang, Shirley Wu, Linjun Zhang, James Zou, Huaxiu Yao.
arXiv 2023.11. [PDF]

Towards Generic Anomaly Detection and Understanding: Large-scale Visual-linguistic Model (GPT-4V) Takes The Lead.
Yunkang Cao, Xiaohao Xu, Chen Sun, Xiaonan Huang, Weiming Shen.
arXiv 2023.11. [PDF]

GPT-4V(ision) as A Generalist Evaluator for Vision-Language Tasks.
Xinlu Zhang, Yujie Lu, Weizhi Wang, An Yan, Jun Yan, Lianke Qin, Heng Wang, Xifeng Yan, William Yang Wang, Linda Ruth Petzold.
arXiv 2023.11. [PDF]

A Comprehensive Study of GPT-4V's Multimodal Capabilities in Medical Imaging.
Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, Leyang Cui, Zhaopeng Tu, Longyue Wang, Luping Zhou.
arXiv 2023.10. [PDF]

Multimodal ChatGPT for Medical Applications: An Experimental Study of GPT-4V.
Zhiling Yan, Kai Zhang, Rong Zhou, Lifang He, Xiang Li, Lichao Sun.
arXiv 2023.10. [PDF]

Exploring OCR Capabilities of GPT-4V(ision): A Quantitative and In-depth Evaluation.
Yongxin Shi, Dezhi Peng, Wenhui Liao, Zening Lin, Xinhong Chen, Chongyu Liu, Yuyi Zhang, Lianwen Jin.
arXiv 2023.10. [PDF]

An Early Evaluation of GPT-4V(ision).
Yang Wu, Shilong Wang, Hao Yang, Tian Zheng, Hongbo Zhang, Yanyan Zhao, Bing Qin.
arXiv 2023.10. [PDF][Github]

HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models.
Fuxiao Liu, Tianrui Guan, Zongxia Li, Lichang Chen, Yaser Yacoob, Dinesh Manocha, Tianyi Zhou.
arXiv 2023.10. [PDF]

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation.
Zhengyuan Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang.
arXiv 2023.10. [PDF][Github]

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision).
Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, Lijuan Wang.
arXiv 2023.09. [PDF]
