Vision LLMs

LLMs that can work with images, screenshots, or multimodal file inputs in addition to text.

Browse vision LLM tools filtered by practical fit and workflow needs.

15 matching tools.

Tools in this category

ChatGPT

Free cloud LLM for writing, research, and file-based analysis.

  • Freemium
  • cloud-llm
  • chat-assistant
  • multimodal

Best for: Daily writing, rewriting, and brainstorming; Quick research and summary work from uploaded files

Claude

Cloud LLM known for strong writing quality and explicit model-improvement controls.

  • Freemium
  • cloud-llm
  • chat-assistant
  • multimodal

Best for: Proposal and client communication drafting, Long-form editing and narrative refinement

DeepSeek-VL2

Mixture-of-experts local vision-language family for OCR, documents, charts, and grounded multimodal reasoning.

  • Free
  • local-inference
  • open-weights
  • self-hosted

Best for: Private visual document analysis, Multimodal document understanding

Gemini

Free cloud LLM with published daily prompt limits and research-focused workflows.

  • Freemium
  • cloud-llm
  • chat-assistant
  • multimodal

Best for: Research briefs and competitive scans, Long-form summarization and outline generation

Gemma 3

Portable open-weight family with long context and multimodal options under custom terms.

  • Free
  • local-inference
  • open-weights
  • on-device

Best for: Local assistants with manageable compliance processes, Multimodal summarization and extraction

GLM (Z.AI)

Z.AI’s cloud GLM assistant and API stack for coding, reasoning, and multilingual business workflows.

  • Freemium
  • cloud-llm
  • chat-assistant
  • multimodal

Best for: Cloud coding assistants and technical drafting, Multilingual business operations support

InternVL 3.5

Apache-2.0 multimodal family with many size options and a strong focus on reasoning, OCR, and agent-style visual tasks.

  • Free
  • local-inference
  • open-weights
  • self-hosted

Best for: Multimodal internal analysis workflows, Builders experimenting with vision-language tasks

Le Chat

Mistral’s cloud LLM chat with clear plan-level training defaults and opt-out controls.

  • Freemium
  • cloud-llm
  • chat-assistant
  • multimodal

Best for: Multilingual drafting and editing, Teams that require explicit training opt-out controls

Llama 3.2 Vision

Vision-capable Llama model for local image-plus-text understanding tasks.

  • Free
  • local-inference
  • open-weights
  • self-hosted

Best for: Local image + text analysis workflows, Multimodal document understanding

Llama 4

Open-weight multimodal family with massive context, but significant policy and license constraints.

  • Free
  • local-inference
  • open-weights
  • multimodal

Best for: Large multi-document summarization pipelines, Multimodal internal analysis workflows

MiniCPM-V 2.6

Efficient local VLM with strong OCR, multi-image, and video understanding in an 8B-class footprint.

  • Free
  • local-inference
  • open-weights
  • self-hosted

Best for: Private visual document analysis, Multimodal local assistant workflows

Molmo

Open vision-language family from AI2 focused on strong multimodal quality with Apache-2.0 licensing.

  • Free
  • local-inference
  • open-weights
  • self-hosted

Best for: Multimodal document understanding, Private visual document analysis

Phi-3.5 Vision Instruct

Compact MIT-licensed multimodal model for local image, OCR, chart, and multi-image reasoning tasks.

  • Free
  • local-inference
  • open-weights
  • on-device

Best for: Multimodal document understanding, Private visual document analysis

Qwen Chat

Alibaba’s cloud Qwen assistant with multilingual support and enterprise-grade API access through Model Studio.

  • Freemium
  • cloud-llm
  • chat-assistant
  • multimodal

Best for: Multilingual drafting and rewriting, Cost-controlled cloud assistant operations

Qwen2.5 VL

Multimodal Qwen model family for local vision-language workflows.

  • Free
  • local-inference
  • open-weights
  • self-hosted

Best for: Multimodal local assistant workflows, Private visual document analysis
