Human vs. AI Perceptual Alignment


An investigation into whether Vision-Language Models perceive scientific visualizations with the same nuance as human experts. The research evaluates 13 state-of-the-art models on a curated set of images, measuring their alignment with expert judgments on visual purpose and encoding patterns, providing a quantitative view of the gap between machine and human perception.

The increasing use of AI to interpret visual data rests on a fundamental assumption: that these models perceive charts and figures in a way that aligns with human expertise. But is this assumption valid? This research investigates that question, exploring whether the emergent perceptual abilities of Vision-Language Models are consistent with the nuanced judgments of human experts.

To explore this, we designed a systematic evaluation, comparing the classifications of 13 state-of-the-art models against a ground truth of expert annotations on a curated set of scientific visualizations. The focus was on pure visual categorization—assessing a model’s ability to identify a visualization’s purpose, encoding, and dimensionality without any textual context. The engineering behind the study was designed for rigor and reproducibility, using a multi-provider setup with tools like LangChain to ensure a broad and fair comparison.
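To make the setup concrete, here is a minimal sketch of what a multi-provider evaluation harness can look like with LangChain. The model names, prompt wording, and two-provider pool are illustrative assumptions for this post, not the study's actual code or its 13-model roster:

```python
import base64
from pathlib import Path

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# Hypothetical provider pool; the study's actual 13 models are not listed here.
MODELS = {
    "gpt-4o": ChatOpenAI(model="gpt-4o", temperature=0),
    "claude-3-5-sonnet": ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0),
}

# Image-only classification prompt: no surrounding textual context is given,
# mirroring the study's focus on pure visual categorization.
PROMPT = (
    "Classify this scientific visualization. "
    "Answer with: purpose, encoding, dimensionality."
)

def classify(image_path: str) -> dict[str, str]:
    """Send the same image-only prompt to every provider and collect answers."""
    b64 = base64.b64encode(Path(image_path).read_bytes()).decode()
    message = HumanMessage(content=[
        {"type": "text", "text": PROMPT},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
    ])
    return {name: model.invoke([message]).content for name, model in MODELS.items()}
```

Routing every model through one message format is what makes the comparison fair: each provider sees an identical image and an identical prompt, so differences in output reflect the model, not the harness.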

The goal of this work is not to crown a superior model, but to provide a measured, quantitative view of the alignment gap between human and machine perception. The results offer a fine-grained analysis of where current models succeed and, more critically, where they diverge from human consensus, highlighting specific weaknesses in interpreting complex visual encodings. This research, accepted at IEEE VIS 2025, contributes to a more grounded understanding of the capabilities and limitations of AI in the critical domain of visual data analysis.
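For a sense of how such an alignment gap can be quantified: agreement between a model's categorical labels and expert consensus is commonly scored with raw accuracy plus a chance-corrected statistic such as Cohen's kappa. The labels and metric choice below are an illustrative sketch, not the paper's reported methodology:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical per-image labels for one task (e.g., visualization "purpose"):
# expert consensus vs. one model's predictions.
expert = ["explanatory", "exploratory", "explanatory", "confirmatory"]
model  = ["explanatory", "explanatory", "explanatory", "confirmatory"]

print("accuracy:", accuracy_score(expert, model))    # raw agreement rate
print("kappa:   ", cohen_kappa_score(expert, model)) # agreement corrected for chance
```

Chance correction matters here because some categories dominate real visualization corpora, so a model can score high raw accuracy by guessing the majority class while still diverging from expert judgment on the hard cases.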

Stack

While the problem is more important than the tools, the tech stack tells a story about the project's architecture and trade-offs. Here's what this project is built on: