MiniGPT-4 is a vision-language model that aims to reproduce many of GPT-4's multimodal capabilities by pairing a vision encoder with Vicuna, a large language model (LLM). The two components are connected by a single projection layer, which maps the encoder's visual features into the LLM's input space. This design lets MiniGPT-4 perform many of the tasks demonstrated with GPT-4, including generating detailed image descriptions and creating websites from handwritten drafts. It can also write stories and poems inspired by images, solve problems shown in images, and teach users how to cook based on food photos.
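To make the role of the projection layer concrete, the following is a minimal sketch rather than MiniGPT-4's actual code. It assumes the projection is a plain linear map (weights plus bias) and that the projected image features are simply placed in front of the text embeddings before being passed to the LLM. The dimensions, the random weights, and the names `make_linear`, `project`, `VISION_DIM`, and `LLM_DIM` are all hypothetical and chosen only for illustration; real models use far larger dimensions and learned weights.

```python
import random

def make_linear(in_dim, out_dim, seed=0):
    """Create toy weights for a linear layer (hypothetical stand-in for learned weights)."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias

def project(features, weight, bias):
    """Apply y = W x + b to every feature vector in `features`."""
    return [
        [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]
        for vec in features
    ]

# Toy sizes (assumptions): the vision encoder emits 4-dim vectors,
# while the LLM expects 6-dim token embeddings.
VISION_DIM = 4
LLM_DIM = 6

weight, bias = make_linear(VISION_DIM, LLM_DIM)

# Stand-in for vision encoder output: three feature vectors for one image.
image_features = [
    [0.5, -1.0, 0.25, 2.0],
    [1.0, 0.0, -0.5, 0.3],
    [0.0, 0.7, 0.7, -1.2],
]

# Stand-in for the embeddings of two text prompt tokens.
text_embeddings = [[0.0] * LLM_DIM, [0.1] * LLM_DIM]

# Project the visual features into the LLM's embedding size, then
# prepend them to the text embeddings to form one input sequence.
visual_tokens = project(image_features, weight, bias)
llm_input = visual_tokens + text_embeddings

print(len(llm_input), len(llm_input[0]))  # prints "5 6"
```

The point of the sketch is that the projection only changes the size of each visual feature vector so that it matches the LLM's embedding size. Once the sizes match, image-derived vectors and text token embeddings can share one input sequence.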