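HunyuanDiT (混元DiT) is a text-to-image generation model based on the diffusion transformer, with fine-grained understanding of both Chinese and English. To build HunyuanDiT, the Transformer structure, text encoder, and positional encoding were carefully designed. A complete data pipeline was built to update and evaluate data, supporting iterative model optimization. To achieve fine-grained text understanding, a multimodal large language model was trained to refine the text captions of images. As a result, HunyuanDiT can hold multi-turn conversations with users, generating and refining images based on context.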
Version Detail
HunYuanDiT
Project Permissions
Use without crediting me
Share merges of this model
Use different permissions on merges
Use Permissions
Sell generated images
Use on generation services
Sell this model or merges
Commercial Use