LFM2.5-VL-Extract · Liquid AI | Report 05-06-2026

1 Context: Liquid AI right now

Liquid AI has been releasing models at a steady pace for some time. In late May, it released LFM2.5-8B-A1B, a Mixture of Experts (MoE) model with 8.3B total parameters but only 1.5B active per token, optimized for tool calling and agentic workloads on consumer hardware. It follows the same line as the earlier LFM2 and LFM2.5 models, all designed for embedded and edge deployment.

With the new Extract models, Liquid AI takes this logic a step further: instead of producing free text that has to be parsed afterwards, the models directly return structured JSON whose fields the user defines through a YAML schema.

🔗 The LFM2.5 family at a glance

LFM2.5-8B-A1B · MoE · 8.3B total / 1.5B active · Tool calling, agentic

LFM2.5-VL-1.6B / 450M · Vision-language · Captioning, OCR, reasoning

LFM2.5-VL-XB-Extract ← New · Structured JSON extraction from images

2 The two models

Recommended

LFM2.5-VL-1.6B-Extract

1.6B parameters (LM) + ~400M SigLIP2 encoder · ~2B total

Best trade-off between accuracy and performance
JSON Validity: 99.6%
F1 Score: 99.6%
VLM Judge Score: 90.6%
Context: 128K tokens

Ultra fast

LFM2.5-VL-450M-Extract

450M parameters (LM) + ~100M SigLIP2 encoder · ~550M total

For edge deployment and fast inference
JSON Validity: 98.9%
F1 Score: 98.8%
VLM Judge Score: 84.5%
Context: 128K tokens

Both models share the same mechanism: you define the fields to extract as a YAML schema in the system prompt, the model analyzes the image, and it returns a JSON object conforming to that schema. Enumerations (predefined values) are supported for tighter control over the output.

3 How it works

System prompt (YAML schema)

wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface

JSON output

{
  "wood_color": "light tan to beige with darker brown streaks",
  "wood_texture": "smooth with visible grain patterns",
  "wood_pattern": "wavy, linear, irregular"
}
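The schema-to-JSON contract above can be checked mechanically on the consumer side. The following is a minimal sketch, assuming the schema is a flat list of `name: description` entries as in the example; `parse_flat_schema` and `check_fields` are hypothetical helper names, and the parser is deliberately not a general YAML parser.

```python
import json


def parse_flat_schema(schema_text: str) -> dict[str, str]:
    """Parse 'name: description' lines; indented lines continue the previous field.

    Deliberately minimal: this is not a general YAML parser.
    """
    fields: dict[str, str] = {}
    last_name = None
    for line in schema_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and last_name is not None:
            # Indented continuation of the previous description
            fields[last_name] += " " + line.strip()
            continue
        name, _, description = line.partition(":")
        last_name = name.strip()
        fields[last_name] = description.strip()
    return fields


def check_fields(schema_text: str, response_text: str) -> dict[str, list[str]]:
    """Compare the field names in the schema with the keys of the JSON response."""
    expected = set(parse_flat_schema(schema_text))
    data = json.loads(response_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    returned = set(data)
    return {
        "missing": sorted(expected - returned),
        "unexpected": sorted(returned - expected),
    }


SCHEMA = """
wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface
"""

RESPONSE = """{
  "wood_color": "light tan to beige with darker brown streaks",
  "wood_texture": "smooth with visible grain patterns",
  "wood_pattern": "wavy, linear, irregular"
}"""

print(check_fields(SCHEMA, RESPONSE))  # {'missing': [], 'unexpected': []}
```

A check like this only covers field names, which is also what the F1 metric in the benchmark section measures; it says nothing about whether the values are right.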

Enumerations are also possible:

Schema with enum

wood_texture: The tactile quality of the wood surface,
  select from smooth, rough, or grainy

The model is then expected to return exactly one of smooth, rough, or grainy.
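Since the enum constraint is expressed in natural language inside the prompt, a defensive consumer may still want to verify it. A minimal sketch, assuming the allowed values are kept in a separate mapping on the caller's side (`ALLOWED` and `enum_violations` are hypothetical names, not part of the model's API):

```python
import json

# Allowed values per field, mirroring the "select from ..." clause of the schema.
ALLOWED = {"wood_texture": {"smooth", "rough", "grainy"}}


def enum_violations(response_text: str, allowed: dict[str, set[str]]) -> dict[str, object]:
    """Return the fields whose value is missing or outside the allowed set."""
    data = json.loads(response_text)
    return {
        field: data.get(field)
        for field, values in allowed.items()
        if data.get(field) not in values
    }


print(enum_violations('{"wood_texture": "smooth"}', ALLOWED))    # {}
print(enum_violations('{"wood_texture": "polished"}', ALLOWED))  # {'wood_texture': 'polished'}
```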

Technical architecture

Vision encoder: SigLIP2 (~400M for the 1.6B model, ~100M for the 450M model)
Backbone: hybrid conv+attention
Precision: bfloat16
Input: single image, dynamic resolution
Recommended decoding: greedy (temperature=0)
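As a rough sizing aid, the parameter counts and the bfloat16 precision listed above give a lower bound on the memory needed for the weights. This is a back-of-the-envelope sketch only: it uses the approximate totals quoted in this report (~2B and ~550M), assumes 2 bytes per parameter, and ignores activations, the KV cache, and runtime overhead.

```python
BYTES_PER_BF16_PARAM = 2  # bfloat16 stores each parameter in 16 bits

# Approximate total parameter counts (LM + vision encoder) quoted in this report.
TOTAL_PARAMS = {
    "LFM2.5-VL-1.6B-Extract": 2.0e9,
    "LFM2.5-VL-450M-Extract": 550e6,
}


def weight_gib(n_params: float) -> float:
    """Memory taken by the weights alone, in GiB."""
    return n_params * BYTES_PER_BF16_PARAM / 2**30


for name, n_params in TOTAL_PARAMS.items():
    print(f"{name}: ~{weight_gib(n_params):.1f} GiB of weights")
```

Under these assumptions the weights come to roughly 3.7 GiB for the larger model and about 1 GiB for the smaller one.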

4 Performance

Evaluated on a benchmark of 2,000 (image, schema, JSON) triplets, with labels generated by an ensemble of frontier multimodal models. Three metrics:

JSON Validity: % of outputs that are strictly parseable JSON
F1 Score: match between returned and expected field names (macro-averaged)
VLM Judge Score: quality of the extracted values, as judged by Qwen3.5-35B-A3B
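The first two metrics are simple enough to reproduce on your own data. The sketch below is one plausible reading of them, not the benchmark's actual scoring code: it assumes that "JSON Validity" means the output parses as a JSON object, and that the macro-averaged F1 is a per-sample F1 over field names averaged across samples, with unparseable outputs scored as 0.

```python
import json


def is_valid_json_object(text: str) -> bool:
    """True if the text parses as JSON and the top-level value is an object."""
    try:
        return isinstance(json.loads(text), dict)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def field_f1(predicted_keys, expected_keys) -> float:
    """F1 between the set of returned field names and the expected ones."""
    predicted, expected = set(predicted_keys), set(expected_keys)
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


def evaluate(samples) -> dict[str, float]:
    """samples: list of (response_text, expected_field_names) pairs."""
    valid_count = 0
    f1_scores = []
    for response_text, expected_keys in samples:
        if is_valid_json_object(response_text):
            valid_count += 1
            f1_scores.append(field_f1(json.loads(response_text).keys(), expected_keys))
        else:
            f1_scores.append(0.0)  # assumption: invalid output scores zero
    n = len(samples)
    return {
        "json_validity": 100 * valid_count / n,
        "f1": 100 * sum(f1_scores) / n,
    }


samples = [
    ('{"a": 1, "b": 2}', ["a", "b"]),  # perfect match
    ('{"a": 1}', ["a", "b"]),          # one field missing
    ("not json", ["a"]),               # unparseable output
]
print(evaluate(samples))
```

The VLM Judge score is not covered here because it requires a judge model to grade the values.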

LFM2.5-VL-1.6B-Extract vs competitors

Model	Params	JSON Validity (%)	F1 Score (%)	VLM Judge (%)
LFM2.5-VL-1.6B-Extract	1.6B	99.6	99.6	90.6
LFM2.5-VL-1.6B (non-Extract)	1.6B	91.8	75.8	66.0
FastVLM-1.5B	1.91B	87.3	80.3	50.9
SmolVLM2-2.2B-Instruct	2.25B	84.4	82.9	64.8
Qwen3.5-2B	2.27B	97.9	97.7	89.7
InternVL3.5-2B	2.35B	99.6	99.2	87.7
Qwen3-VL-4B-Instruct	4.44B	99.8	99.7	92.0

LFM2.5-VL-450M-Extract vs competitors

Model	Params	JSON Validity (%)	F1 Score (%)	VLM Judge (%)
LFM2.5-VL-450M-Extract	0.45B	98.9	98.8	84.5
LFM2.5-VL-450M (non-Extract)	0.45B	97.7	93.5	73.4
Qwen3.5-0.8B	0.87B	96.4	96.3	82.3
InternVL3.5-1B	1.06B	98.0	96.5	80.7
InternVL3.5-2B	2.35B	99.6	99.2	87.7

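Key takeaway: at 1.6B, the -Extract variant clearly outperforms the base non-Extract model (99.6 vs 91.8 JSON Validity, 99.6 vs 75.8 F1, 90.6 vs 66.0 VLM Judge), which suggests that tuning for structured extraction makes a large difference. At 450M the gap is narrower on validity (98.9 vs 97.7) but still substantial on VLM Judge (84.5 vs 73.4).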
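The gain from the Extract tuning can be read directly off the two tables. The short sketch below just subtracts the base scores from the Extract scores; the numbers are copied from the tables above, and the variable names are illustrative.

```python
METRICS = ("JSON Validity", "F1 Score", "VLM Judge")

# (JSON Validity, F1 Score, VLM Judge) as reported in the tables above.
SCORES = {
    "1.6B": {"extract": (99.6, 99.6, 90.6), "base": (91.8, 75.8, 66.0)},
    "450M": {"extract": (98.9, 98.8, 84.5), "base": (97.7, 93.5, 73.4)},
}


def gains(size: str) -> dict[str, float]:
    """Point gain of the Extract variant over the base model, per metric."""
    pair = SCORES[size]
    return {
        metric: round(extract - base, 1)
        for metric, extract, base in zip(METRICS, pair["extract"], pair["base"])
    }


for size in SCORES:
    print(size, gains(size))
```

This gives +7.8 / +23.8 / +24.6 points for the 1.6B model and +1.2 / +5.3 / +11.1 points for the 450M model, so the largest improvements are on the value-quality metric rather than on raw JSON validity.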

5 Usage example

from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image

model_id = "LiquidAI/LFM2.5-VL-1.6B-Extract"

# Load the model in bfloat16 and let device_map pick the available device.
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", dtype="bfloat16",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = load_image("dashboard.jpg")

# Fields to extract, one "name: description" entry per line (YAML schema).
fields_yaml = """
clock_time: The time shown on the dashboard clock
ambient_temp: The ambient temperature reading
fuel_level: The current fuel gauge level
"""

# The schema goes in the system prompt; the user turn carries only the image.
system_prompt = f"""Extract the following from the image: {fields_yaml}
Respond with only a JSON object. Do not include any text outside the JSON."""

conversation = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": [{"type": "image", "image": image}]},
]

inputs = processor.apply_chat_template(
    conversation, add_generation_prompt=True,
    return_tensors="pt", return_dict=True, tokenize=True,
).to(model.device)

# Greedy decoding (do_sample=False), as recommended for extraction.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
response = processor.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)[0]

print(response)  # e.g. {"clock_time": "14:32", "ambient_temp": "22°C", "fuel_level": "67%"}
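The `response` printed above is still a string, and the reported JSON validity is high but not 100%, so downstream code should parse it defensively. A minimal sketch, assuming you want `None` rather than an exception on bad output; `parse_extraction` is a hypothetical helper, and the handling of Markdown code fences is a precaution, not a behavior the source documents for this model.

```python
import json

FENCE = "`" * 3  # Markdown code fence, built this way to keep the example readable


def parse_extraction(response: str):
    """Return the extracted fields as a dict, or None if the output is unusable."""
    text = response.strip()
    if text.startswith(FENCE):
        # Drop an opening fence line (possibly tagged "json") and a closing fence.
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    try:
        data = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    return data if isinstance(data, dict) else None


fields = parse_extraction('{"clock_time": "14:32", "ambient_temp": "22°C", "fuel_level": "67%"}')
if fields is None:
    print("extraction failed; retry or route to manual review")
else:
    print(fields["clock_time"])  # 14:32
```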

6 Potential use cases

📋 Documents & invoices

Extract amounts, dates, and order numbers from photos of documents, without a separate parsing pipeline.

🚗 Cockpit & in-car

Real-time cabin understanding: dashboard, passenger presence, visual alerts.

🏭 Industrial inspection

Anomaly detection, object counting, and compliance checks on the production line.

🛒 E-commerce & retail

Auto-tagging of products with structured attributes: color, size, material, condition.

📊 Video analytics

Collecting statistics across video frames, processed one frame at a time since the model takes a single image as input: pedestrian flow, occupancy, movement.

🛡️ Safety & alerts

Detection of critical events (fall, fire, leak, intrusion), with automatic triggering of downstream systems.

💡 Why this is interesting for Power Apps / the Microsoft ecosystem

Imagine a Power Automate flow that takes a photo of an invoice, sends it to an on-device Extract model (the 450M variant), and receives JSON that can be written directly to SharePoint or Dataverse. No free-text parsing step, no cloud round trip, no per-call API cost.

7 Sources

🐦 Liquid AI thread on Twitter: LFM2.5-VL-Extract announcement

x.com/liquidai/status/2062686748291846307

🤗 LFM2.5-VL-1.6B-Extract (Hugging Face)

huggingface.co/LiquidAI/LFM2.5-VL-1.6B-Extract

🤗 LFM2.5-VL-450M-Extract (Hugging Face)

huggingface.co/LiquidAI/LFM2.5-VL-450M-Extract

📖 Vision Models (Liquid Docs)

docs.liquid.ai/lfm/models/vision-models

📰 LFM2.5-8B-A1B (Liquid AI blog)

liquid.ai/blog/lfm2-5-8b-a1b

📄 LFM2 Technical Report (arXiv 2511.23404)

arxiv.org/abs/2511.23404