2024.1.31 HelloWorld 4.0 version update
HelloWorld 4.0 is a progressive, transitional version in the move from BLIP+CLIP tagging to GPT4V tagging. I first trained a model purely on GPT4V-tagged data, then merged it with a large proportion of HelloWorld 3.2 and a small proportion of other models. Compared with 3.2, the new version shows improved prompt adherence and broader concept coverage.
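The merge described above is a weighted average of model checkpoints. The sketch below shows the idea on plain Python dicts standing in for state dicts; the actual merge ratios and tooling used for HelloWorld 4.0 are not published, so the 0.7/0.3 split and the parameter names here are purely illustrative.

```python
def merge_checkpoints(checkpoints, weights):
    """Weighted average of checkpoints that share the same parameter names.

    checkpoints: list of dicts mapping parameter name -> list of floats
    weights: merge ratios, expected to sum to 1.0
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "merge ratios should sum to 1"
    merged = {}
    for name in checkpoints[0]:
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(weights, checkpoints))
            for i in range(len(checkpoints[0][name]))
        ]
    return merged

# Illustrative ratios only -- not the real HelloWorld 4.0 recipe.
base = {"layer.weight": [1.0, 2.0]}    # stands in for HelloWorld 3.2
gpt4v = {"layer.weight": [3.0, 4.0]}   # stands in for the pure GPT4V model
merged = merge_checkpoints([base, gpt4v], [0.7, 0.3])
```

In practice the same loop would run over real tensors (e.g. via `safetensors` and `torch`), but the weighted-sum logic is identical.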
The GPT4V-tagged training set has doubled from the 4,000 images of the HelloWorld 3 series to 8,000, covering not only portraits but also animals, architecture, nature, food, illustrations, and more. However, the purely GPT4V-tagged version ran into overfitting, which is preliminarily attributed to the doubled image count. One goal for the next iteration is to work out how to include as many non-portrait concepts as possible while still training portraits sufficiently. For now, the merge of the new and old versions has been fine-tuned to ensure a smooth transition between versions, so the expanded concept set and the benefits of GPT4V tagging are not yet very perceptible. These advantages will become increasingly apparent in the upcoming 5th- and 6th-generation models.
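Tagging a dataset with GPT4V amounts to sending each image to a vision-capable chat model and collecting the caption. The sketch below only builds the OpenAI chat-completions request payload for one image (no network call); the prompt wording and `max_tokens` value are assumptions, since the actual tagging prompt used for HelloWorld is not published.

```python
import base64

def build_caption_request(image_path, model="gpt-4-vision-preview"):
    """Build a chat-completions payload asking a GPT4V-class model to
    caption one training image. Prompt text is a hypothetical example."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this image as a detailed training caption."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "max_tokens": 300,
    }
```

A tagging run would loop this over all 8,000 images and write each returned caption to a sidecar `.txt` file next to the image, the usual layout for fine-tuning datasets.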