llava-1.5-7b-hf
Model ID: @cf/llava-hf/llava-1.5-7b-hf
LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.
Properties
Task Type: Image-to-Text
Code Examples
Workers - TypeScript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const res: any = await fetch("https://cataas.com/cat");
    const blob = await res.arrayBuffer();
    const input = {
      image: [...new Uint8Array(blob)],
      prompt: "Generate a caption for this image",
      max_tokens: 512,
    };
    const response = await env.AI.run(
      "@cf/llava-hf/llava-1.5-7b-hf",
      input
    );
    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;
Response
{
  "description": " This is a photo of a supdog."
}
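The request and response shapes used in the example above can be captured in a few small helpers. The following is a minimal sketch, not part of the Workers AI API: the names `LlavaInput`, `LlavaOutput`, `buildLlavaInput`, `isLlavaOutput`, and `extractCaption` are hypothetical, and the only fields assumed are the ones that appear in the example (`image`, `prompt`, `max_tokens`, and `description`).

```typescript
// Hypothetical types: field names are taken from the example above,
// everything else here is an illustrative assumption.
interface LlavaInput {
  image: number[];
  prompt: string;
  max_tokens: number;
}

interface LlavaOutput {
  description: string;
}

// Convert raw image bytes into the plain number array used in the example.
function buildLlavaInput(
  bytes: ArrayBuffer,
  prompt: string,
  maxTokens = 512
): LlavaInput {
  return {
    image: Array.from(new Uint8Array(bytes)),
    prompt,
    max_tokens: maxTokens,
  };
}

// Narrow an unknown result to the shape shown in the sample response.
function isLlavaOutput(value: unknown): value is LlavaOutput {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { description?: unknown }).description === "string"
  );
}

// Return the trimmed caption, or null if the result has an unexpected shape.
function extractCaption(value: unknown): string | null {
  return isLlavaOutput(value) ? value.description.trim() : null;
}
```

Inside the handler, the result of `env.AI.run(...)` could be passed to `extractCaption` so that an unexpected response shape is handled explicitly instead of being serialized as-is. Trimming is a design choice here, since the sample response begins with a leading space.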
API Schema
The following schema is based on JSON Schema.
Input JSON Schema
Output JSON Schema