LikeGiver

A Survey of Multimodal Dialogue Models

2023-11-09

Update, 2024.1.29:

Please go straight to the latest survey from Tencent:

https://arxiv.org/abs/2401.13601

MM-LLMs: Recent Advances in MultiModal Large Language Models

Duzhen Zhang, Yahan Yu, Chenxing Li, Jiahua Dong, Dan Su, Chenhui Chu, Dong Yu

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also empower a diverse range of MM tasks. In this paper, we provide a comprehensive survey aimed at facilitating further research of MM-LLMs. Specifically, we first outline general design formulations for model architecture and training pipeline. Subsequently, we provide brief introductions of 26 existing MM-LLMs, each characterized by its specific formulations. Additionally, we review the performance of MM-LLMs on mainstream benchmarks and summarize key training recipes to enhance the potency of MM-LLMs. Lastly, we explore promising directions for MM-LLMs while concurrently maintaining a real-time tracking website for the latest developments in the field. We hope that this survey contributes to the ongoing advancement of the MM-LLMs domain.

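Earlier content:

I had actually surveyed some of these models before, but the notes were messy, so I never published them. Now that GPT-4V is out, it is a good time to revisit those earlier models.

This survey focuses mainly on open-source models.

First, the model capability comparison figure from mPLUG-Owl2: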

mPLUG.jpeg

Although the paper is fairly recent, its results do not seem to be as good as CogVLM's:

2023-11-09 13-30-46 的屏幕截图.png

(On hold for a while; I will write more when I have time.)

2023.12.22:

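This has been on hold for rather a long time, so I am simply attaching the PDF. Have a look for yourselves; I will come back with further updates: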

1_多模态大模型(agent)调研 (1).pdf


Cover image, SDXL, "dialogue, robot, cartoon, camera, colorful, abstract"

output_image-dcbq.png