Please upload an image and press "Generate Captions", and the model will generate captions for it.
The model uses google/vit-base-patch16-224-in21k or openai/clip-vit-base-patch32
as the image encoder, trained together with a custom transformer decoder to generate captions.
The available caption styles are "Factual 🤖", "Creative 🤪", and "Human like 🫀",
which correspond to argmax (greedy) decoding, top-K sampling, and top-P (nucleus) sampling, respectively.
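
To illustrate how the three styles differ, here is a minimal sketch of one decoding step under each strategy. It is not the app's actual code: the function names, the toy logits, and the values of k and p are assumptions made for illustration, and the real decoder may apply these strategies differently (for example with temperature or a different vocabulary size).

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ranked_tokens(probs):
    """Token ids sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def greedy_step(logits):
    """'Factual': always pick the single most likely token (argmax)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_step(logits, k, rng):
    """'Creative': sample among the k most likely tokens only."""
    probs = softmax(logits)
    keep = ranked_tokens(probs)[:k]
    # random.choices renormalizes the weights internally.
    return rng.choices(keep, weights=[probs[i] for i in keep], k=1)[0]


def top_p_step(logits, p, rng):
    """'Human like': sample from the smallest set of top tokens whose
    cumulative probability reaches p (nucleus sampling)."""
    probs = softmax(logits)
    keep, cumulative = [], 0.0
    for i in ranked_tokens(probs):
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return rng.choices(keep, weights=[probs[i] for i in keep], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)               # fixed seed for repeatability
    toy_logits = [2.0, 1.0, 0.1, -1.0]   # hypothetical scores for 4 tokens
    print("greedy:", greedy_step(toy_logits))
    print("top-k :", top_k_step(toy_logits, k=2, rng=rng))
    print("top-p :", top_p_step(toy_logits, p=0.9, rng=rng))
```

Greedy decoding is deterministic, so the same image always yields the same caption. The two sampling strategies draw at random from a restricted set of likely tokens, which is why repeated runs can produce different captions.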