Human vs. AI Perceptual Alignment

An investigation into whether Vision-Language Models categorize scientific visualizations in ways that align with expert judgment. The study evaluates 13 models against an expert-labeled image set and measures agreement on visual purpose and encoding patterns.

The increasing use of AI to interpret visual data rests on a key assumption: that models “see” charts and figures in ways that match human expertise. This study tests that assumption by comparing Vision-Language Model outputs with expert annotations.

To explore this, we designed a systematic evaluation comparing the classifications of 13 models against a ground truth of expert annotations on a labeled set of scientific visualizations. The focus was on pure visual categorization: assessing a model's ability to identify a visualization's purpose, encoding, and dimensionality without any textual context. The engineering emphasized rigor and reproducibility, using a multi-provider setup built on tools like LangChain so that every model could be queried under identical conditions.
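
To make the multi-provider setup concrete, here is a minimal sketch of how such an evaluation loop might look in LangChain. The model names, taxonomy categories, and prompt wording are illustrative assumptions, not the study's exact configuration:

```python
# Minimal sketch: send the same image and taxonomy prompt to multiple VLM
# providers through LangChain's shared chat interface. Model names, categories,
# and prompt text below are placeholders, not the study's actual setup.
import base64

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

PROMPT = (
    "Classify this scientific visualization. "
    "Purpose (e.g., exploration, explanation, comparison)? "
    "Encoding (e.g., position, color, size, shape)? "
    "Dimensionality (2D or 3D)? "
    "Answer as 'purpose; encoding; dimensionality'."
)

# One entry per provider; the real study spans 13 models across several APIs.
MODELS = {
    "gpt-4o": ChatOpenAI(model="gpt-4o", temperature=0),
    "claude-3-5-sonnet": ChatAnthropic(
        model="claude-3-5-sonnet-20241022", temperature=0
    ),
}


def classify(image_path: str) -> dict[str, str]:
    """Send one image to every model; return each model's raw label string."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    # Multimodal message: the taxonomy prompt plus the base64-encoded image.
    message = HumanMessage(content=[
        {"type": "text", "text": PROMPT},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{b64}"}},
    ])
    return {name: model.invoke([message]).content
            for name, model in MODELS.items()}
```

Because every provider is exposed through the same chat interface, the identical image-plus-prompt message can be replayed across all models without per-API glue code, which is the main reason a tool like LangChain helps here.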

The goal is not to crown a “best” model, but to provide a measured, quantitative view of the alignment gap. The results show where current models agree with human consensus and where they reliably diverge, including specific failure modes on complex encodings. This research, accepted at IEEE VIS 2025, contributes evidence about current capabilities and limitations in visual data analysis.
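
As a rough illustration of how agreement with expert consensus can be quantified, the sketch below computes per-model accuracy and chance-corrected Cohen's kappa against the ground-truth labels. The DataFrame layout and column names are assumptions for illustration, not the paper's actual analysis code:

```python
# Sketch of the agreement measurement: raw accuracy plus Cohen's kappa
# against expert labels. Assumes one row per image, an 'expert' column with
# the ground-truth label, and one column of predictions per model.
import pandas as pd
from sklearn.metrics import accuracy_score, cohen_kappa_score


def alignment_report(df: pd.DataFrame, model_cols: list[str]) -> pd.DataFrame:
    """Summarize each model's agreement with the expert ground truth."""
    rows = []
    for col in model_cols:
        rows.append({
            "model": col,
            # Fraction of images where the model matches the expert label.
            "accuracy": accuracy_score(df["expert"], df[col]),
            # Chance-corrected agreement; more informative when some
            # categories dominate the label distribution.
            "kappa": cohen_kappa_score(df["expert"], df[col]),
        })
    return pd.DataFrame(rows).sort_values("kappa", ascending=False)
```

Reporting kappa alongside accuracy matters because visualization categories are rarely balanced: a model can score high raw accuracy by favoring the majority class while still diverging from expert judgment on the rarer, harder encodings.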

Stack

While the problem is more important than the tools, the tech stack tells a story about the project's architecture and trade-offs. Here's what this project is built on: