# peter.gy > AI-friendly documentation for peter.gy *Complete documentation content below* # Hi, I'm Péter > I build systems for agentic data visualization and analysis, based in Zürich and Vienna. 🇨🇭 Zürich, Switzerland | 🇦🇹 Vienna, Austria I build visual analytics systems for research and production. My work centers on knowledge representation for steering and grounding AI systems. I currently apply this to agentic data analysis, visualization recommendation, and automated chart critique. Currently, I’m wrapping up my MSc in Data Science at the University of Vienna with [Torsten Möller](https://cs.univie.ac.at/torsten.moeller) while working with the IVIA Lab team at ETH Zürich. I’m also lucky to collaborate on research with [Laura Koesten](https://laurakoesten.github.io/) at MBZUAI and [Dominik Moritz](https://www.domoritz.de/) at Carnegie Mellon University.
Before diving into research, I spent nearly a decade shipping production software. I built telemetry platforms at Magna , shipped geospatial tools for the City of Vienna and the Austrian Federal Ministry , and led the engineering team at STOIC . Today, I balance my focus between industry and academia to keep my research grounded in engineering reality. ## Featured Projects ### [Agentic Visual Reporting](/projects/agentic-visual-reporting) Winner of the IEEE VIS 2025 VISxGenAI Challenge. An agentic system for data analysis that pairs LLM agents with deterministic visualization modules. It produces interactive reports for readers and executable notebooks for analysts, so results can be inspected, rerun, and adapted. AIResearchVisualization+ 4 [Demo ](https://peter-gy.github.io/VISxGenAI-2025/gallery/vispub-submission)[Info ](https://peter-gy.github.io/VISxGenAI-2025)[Code ](https://github.com/peter-gy/VISxGenAI-2025)[Paper](https://arxiv.org/pdf/2509.05721) ### [Driva](/projects/driva) An exploration of how to translate vehicle telemetry into driver-facing feedback. The project focuses on turning raw automotive signals into interpretable metrics that support safer and more efficient driving. AnalyticsFull-stackPlatform+ 5 [Project ](https://www.magna.com/products/powertrain/automotive-applications---cloud-services)[Slides](https://www.magna.com/docs/default-source/2023-iaa/product-insight/powertrain_storyboards.pdf) ### [Draco 2](/projects/draco2) A constraint-based system for visualization recommendation. It encodes design knowledge as logical rules and uses a renderer-agnostic format, so researchers and practitioners can extend and validate chart designs in a computational way. 
AIResearchVisualization+ 2 [Code ](https://github.com/cmudig/draco2)[Paper ](https://arxiv.org/pdf/2308.14247.pdf)[Slides ](/projects/draco2/slides.pdf)[Post](https://vda.cs.univie.ac.at/news-events/detail/news/congratulations-peter-ferenc-gyarmati/) [Show More Projects](/projects) ## Timeline January 4, 2026 Arrived in Zürich to start collaborating with ETH Zürich ’s IVIA Lab . December 14, 2025 Wrapped up my stay in Abu Dhabi at MBZUAI . Had a lot of fun with the team. Submitted a [conference paper](https://arxiv.org/abs/2512.20306v1) and had multiple PRs accepted to Apple ’s [Embedding Atlas](https://github.com/apple/embedding-atlas/pulls?q=is%3Apr+is%3Aclosed+author%3Apeter-gy). November 3, 2025 Won 1st place at the [VISxGenAI Workshop](https://visxgenai.github.io/) challenge at [IEEE VIS 2025](https://ieeevis.org/year/2025/welcome) for my [agentic system for visual reporting](/projects/agentic-visual-reporting). October 21, 2025 Received the Research Award for Students from the University of Vienna Faculty of Computer Science, recognizing exceptional research productivity during my MSc. September 26, 2025 Got to attend MBZUAI ’s honorary doctorate [ceremony](https://www.linkedin.com/posts/mbzuai_mbzuai-activity-7377327432506269696-yW75/) for [Sam Altman](https://x.com/sama), CEO of OpenAI , in the presence of HH Sheikh Khaled bin Mohamed bin Zayed Al Nahyan. September 15, 2025 Arrived in Abu Dhabi to begin my visiting role at MBZUAI . September 14, 2025 My work on an [agentic system for visual reporting](/projects/agentic-visual-reporting) was invited for presentation at the [VISxGenAI Workshop](https://visxgenai.github.io/) at [IEEE VIS 2025](https://ieeevis.org/year/2025/welcome). June 24, 2025 My [research](/projects/human-vs-ai-perceptual-alignment-study) on the alignment of human and AI perception in visualization was accepted for presentation at [IEEE VIS 2025](https://ieeevis.org/year/2025/). 
May 20, 2025 Began a freelance engagement with a US AI lab, focused on improving the agentic software engineering capabilities of their models. February 11, 2025 Accepted a visiting role in the Human-Computer Interaction department at MBZUAI in Abu Dhabi. My work focuses on an AI system for principled, context-aware feedback on visualization design. February 1, 2025 Left STOIC after our long-term visions diverged. I’m grateful for the support my colleagues showed during the transition. December 18, 2024 Received the [Best of the Best award](https://informatik.univie.ac.at/fakultaet/media-wall/fotogalerien/fakultaetsveranstaltung2024/) from the University of Vienna for outstanding academic performance and shared my experiences in an accompanying [interview](https://informatik.univie.ac.at/studium/fuer-studieninteressierte/informatik-absolventinnen-erzaehlen/interviews-mit-den-besten-absolventinnen-im-studienjahr-20222023/interview-mit-peter-ferenc-gyarmati/). October 21, 2024 Showcased STOIC ’s AI-powered portfolio management system to investors and stakeholders at the London Stock Exchange . June 10, 2024 Presented the redesigned STOIC platform to investors in London, demonstrating an experimental hybrid system for eliminating hallucinations in data-driven LLM outputs. Promoted to VP of Engineering the same week. January 1, 2024 Initiated the migration of the STOIC platform from Node.js to Python on AWS to better support large-scale AI and ML workloads. October 25, 2023 Presented our short paper on [Draco 2](/projects/draco2) at [IEEE VIS 2023](https://virtual.ieeevis.org/year/2023/paper_v-short-1018.html) in Melbourne, Australia, and spent a week engaging with the international visualization community. October 15, 2023 The [PLUTO platform](/projects/pluto-public-value-assessment-tool) was officially launched at the [World Health Summit](https://www.who.int/news-room/events/detail/2023/10/15/default-calendar/world-health-summit-2023) in Berlin. 
October 1, 2023 To align my studies with my professional focus, I transitioned to the Master’s in Data Science program at the University of Vienna , concentrating on AI, ML, and data visualization in a Human-Computer Interaction context. September 27, 2023 Presented our [Draco 2](/projects/draco2) research on automating visualization recommendation with Answer Set Programming at the University of Potsdam . August 26, 2023 Married after seven years together. July 3, 2023 Assumed the role of Lead Developer at STOIC , contributing to the development of an Intelligent Data Cloud that augments human intelligence with generative AI, machine learning, and symbolic logic. April 21, 2023 Delivered a technical [talk](https://x.com/sssecki/status/1649326676835201029) on code generation and metaprogramming in Dart at the [FlutterVienna](https://x.com/fluttervienna) meetup. March 22, 2023 Won the hackathon at the [FlutterVienna](https://x.com/fluttervienna) meetup, receiving tickets to the [WeAreDevelopers World Congress](https://www.wearedevelopers.com/world-congress) in Berlin. March 1, 2023 Began my Master’s in Computer Science at the University of Vienna . December 18, 2022 Began a collaboration with the Political Science Department at the University of Vienna to build a digital platform for their Public VaLUe Assessment TOol (PLUTO), based on a [data solidarity framework published in _The Lancet_](https://www.thelancet.com/journals/landig/article/PIIS2589-7500\(22\)00189-3/fulltext). December 1, 2022 Graduated with a Bachelor’s in Computer Science from the University of Vienna with distinction. October 1, 2022 Awarded the Merit Scholarship from the University of Vienna , ranking first in the Computer Science program with a perfect 1.0/1.0 ECTS average for the academic year. July 28, 2022 [Contributed](https://github.com/uwdata/draco/pull/110) to the Draco visualization project to support an integration for my [BSc thesis project](https://github.com/peter-gy/VisRecly). 
This led to an invitation from [@domoritz](https://github.com/domoritz) to join the [Draco 2 research collaboration](/projects/draco2) and start weekly discussions with researchers from University of Washington , University of Maryland , and Carnegie Mellon University . June 1, 2022 Transitioned to a full-time, full-stack developer role at LEAN-FORGE while completing my Computer Science degree. June 1, 2021 To deepen my frontend skills, I joined LEAN-FORGE as a Mobile and Web Developer, working with Flutter and Next.js . I did this in parallel with my work at TU Wien EEG and my studies. October 15, 2020 Joined the Energy Economics Research Group at TU Wien as a System Administrator and Backend Developer. I worked on geospatial analytics, built full-stack platforms for energy system modeling, and maintained bare-metal servers for long-running simulations and sensitive datasets. September 1, 2019 Relocated to Vienna to begin my Bachelor’s in Computer Science at the University of Vienna . July 4, 2019 Graduated from the [IB Diploma Programme](https://www.ibo.org/programmes/diploma-programme/) with a score of 42/45, setting a new school record at the time and achieving its first 7 in Higher Level Mathematics. May 10, 2019 My Java project for learning graph algorithms received a special commendation and the Talent Passport at the national finals of the [28th Youth Scientific and Innovation Talent Search Competition](https://www.innovacio.hu/3a_hu_28_1fordulo.php). The award led to an invitation to the [Hungarian Innovation Grand Prix](https://nkfih.gov.hu/english/online/2019-hungarian-innovation-grand-prix) at the Hungarian Parliament. April 26, 2019 As a team of two high school students, [advanced to the finals](https://www.linkedin.com/posts/petergy_coding-competitions-activity-6529047965628272640-koYt) of the Thyssenkrupp Coding Tomorrow Cup, competing alongside senior programmers and MSc teams. 
January 29, 2019 My project on interactive graph algorithm visualization was selected for the national finals of the [28th Youth Scientific and Innovation Talent Search Competition](https://www.innovacio.hu/3a_hu_28_1fordulo.php). It also came with mentorship from physicist [Norbert Kroó](https://wigner.hu/en/infopages/kroo.norbert) at the Institute for Particle and Nuclear Physics. November 5, 2018 To make complex graph algorithms more accessible, I built a Java framework for interactive learning and visualization. I submitted it to the [28th Youth Scientific and Innovation Talent Search Competition](https://www.innovacio.hu/3a_hu_28_1fordulo.php). October 30, 2018 Achieved [first place](https://fb.watch/C7qpQ_6UIZ/) in the opening round of Ericsson ’s programming championship by adapting Dijkstra’s algorithm to solve a pathfinding problem in a competitive simulation. May 15, 2017 Awarded a full scholarship to the [IB Diploma Programme](https://www.ibo.org/programmes/diploma-programme/), specializing in Higher Level Mathematics, Computer Science, and German. April 3, 2017 Joined Ripassa as a Software Developer to build automations and integrations in Python and backend services with Java . April 1, 2016 Wrote my first line of code with [Swift Playgrounds](https://developer.apple.com/swift-playground/). Having studied several human languages, I noticed a similar structure in programming and shifted my focus to this new form of problem-solving. January 18, 2016 After 12 years of competitive basketball (including medals in national youth championships), I shifted my focus to academic and technical work. --- # Projects > A selection of my work across production systems and research tooling. A selection of my work across production systems and research tooling. Each project includes a short write-up and links to code, demos, papers, or recordings. ### [Agentic Visual Reporting](/projects/agentic-visual-reporting) Winner of the IEEE VIS 2025 VISxGenAI Challenge. 
An agentic system for data analysis that pairs LLM agents with deterministic visualization modules. It produces interactive reports for readers and executable notebooks for analysts, so results can be inspected, rerun, and adapted. AIResearchVisualization+ 4 [Demo ](https://peter-gy.github.io/VISxGenAI-2025/gallery/vispub-submission)[Info ](https://peter-gy.github.io/VISxGenAI-2025)[Code ](https://github.com/peter-gy/VISxGenAI-2025)[Paper](https://arxiv.org/pdf/2509.05721) ### [Consultancy Web Platform](/projects/consultancy-web-platform) A bilingual web platform for a digital consultancy that needed a professional online presence with full editorial control. The system lets non-technical editors publish and maintain multilingual content without creating a developer bottleneck. WebFull-stackPlatform [Website](https://ripassa.hu/) ### [Human vs. AI Perceptual Alignment](/projects/human-vs-ai-perceptual-alignment-study) An investigation into whether Vision-Language Models categorize scientific visualizations in ways that align with expert judgment. The study evaluates 13 models on a labeled image set and measures agreement on visual purpose and encoding patterns. AIResearchVisualization+ 2 [Analysis ](https://molab.marimo.io/notebooks/nb_P78Fecf4gZkYE4MCKXcanW)[Code ](https://github.com/peter-gy/AutoVisType)[Paper ](https://arxiv.org/pdf/2509.05718)[Poster](/projects/human-vs-ai-perceptual-alignment-study/poster.pdf) ### [D2 Widget](/projects/d2-widget) A Python widget that embeds the D2 diagram language directly into Jupyter and Marimo. It provides live, text-based diagramming with inline previews, so diagrams stay close to the surrounding analysis. VisualizationOpensource+ 2 [Demo ](https://d2-widget.peter.gy)[Code](https://github.com/peter-gy/d2-widget) ### [Quarto x Gradio Extension](/projects/quarto-gradio-extension) A Quarto extension that embeds runnable Gradio UIs directly into documentation pages. 
Examples run in the browser via Pyodide, so readers can experiment without local setup or a backend. VisualizationOpensource+ 2 [Demo ](https://quarto-gradio.peter.gy/playground/chatbots/chatbot.html)[Website ](https://quarto-gradio.peter.gy/)[Code ](https://github.com/peter-gy/quarto-gradio)[Application](https://peter-gy.github.io/static/uni/2024w/mds/project/) ### [Men's Health Web Platform](/projects/mens-health-web-platform) A web platform for a men's health publication focused on aligning editorial strategy with audience needs. It pairs a fast public site with a lightweight analytics pipeline that helps editors spot content gaps and topical imbalances beyond page-view metrics. WebFull-stackPlatformAnalytics [Website](https://apa.hu) ### [Driva](/projects/driva) An exploration of how to translate vehicle telemetry into driver-facing feedback. The project focuses on turning raw automotive signals into interpretable metrics that support safer and more efficient driving. AnalyticsFull-stackPlatform+ 5 [Project ](https://www.magna.com/products/powertrain/automotive-applications---cloud-services)[Slides](https://www.magna.com/docs/default-source/2023-iaa/product-insight/powertrain_storyboards.pdf) ### [PLUTO: Public Value Assessment Tool](/projects/pluto-public-value-assessment-tool) A questionnaire-based tool for assessing the public value of data practices. It translates abstract ethics questions into a structured workflow and outputs a risk-benefit analysis with concrete recommendations. ResearchWebOpensource+ 3 [Tool ](https://pluto.univie.ac.at/)[Code ](https://github.com/PLUTO-UniWien/PLUTO)[Paper ](https://arxiv.org/pdf/2509.12773)[Launch Event](https://www.youtube.com/watch?v=BpC68YzMVzM) ### [Draco 2](/projects/draco2) A constraint-based system for visualization recommendation. It encodes design knowledge as logical rules and uses a renderer-agnostic format, so researchers and practitioners can extend and validate chart designs in a computational way. 
AIResearchVisualization+ 2 [Code ](https://github.com/cmudig/draco2)[Paper ](https://arxiv.org/pdf/2308.14247.pdf)[Slides ](/projects/draco2/slides.pdf)[Post](https://vda.cs.univie.ac.at/news-events/detail/news/congratulations-peter-ferenc-gyarmati/) --- # Agentic Visual Reporting > Winner of the IEEE VIS 2025 VISxGenAI Challenge. An agentic system for data analysis that pairs LLM agents with deterministic visualization modules. It produces interactive reports for readers and executable notebooks for analysts, so results can be inspected, rerun, and adapted. Most data analysis workflows force a difficult trade-off between speed and reliability. Manual analysis produces trustworthy results but doesn’t scale, while automated tools are fast but often yield black-box outputs that are difficult to verify or adapt. This project explores a different approach to resolve this tension. The system uses a pipeline of **eleven specialized AI agents**, but its core design principle is a hybrid architecture. It delegates creative and interpretive work—like planning insights and generating narratives—to AI, while assigning tasks that demand precision and consistency to deterministic components. This separation of concerns is fundamental to producing results that are both insightful and verifiable. The result is a set of **two complementary outputs** that serve different needs. For readers, it generates interactive web reports for exploring data beyond the initial analysis. For analysts, it produces executable Marimo notebooks, providing full transparency to verify or adapt the process. This dual-output model is a deliberate step towards a more effective human-AI collaboration, where the AI augments human analysis rather than trying to replace it. The system is built with observability in mind. Every model call and data transformation is tracked using tools like Langfuse . 
It also uses in-browser computation for interactive exploration and object storage like Cloudflare R2 for report artifacts. Invited for a live presentation at the [IEEE VISxGenAI 2025 Workshop Challenge](https://visxgenai.github.io/), the project prototypes a workflow where AI drafts and deterministic modules enforce constraints. The goal is to **augment rather than replace** human analysis by keeping outputs inspectable and rerunnable. ## Stack While the problem is more important than the tools, the tech stack tells a story about the project's architecture and trade-offs. Here's what this project is built on: ### Platforms & Runtimes [Python](https://www.python.org/) [Runs the 11-agent orchestration, data processing, report generation pipeline, and provenance notebooks.](https://www.python.org/) [Pyodide](https://pyodide.org/) [Provides the browser-side Python runtime for dynamically generated Marimo notebooks.](https://pyodide.org/) [Clingo](https://potassco.org/clingo/) [Solves logic programs for selecting visualizations under formal design constraints.](https://potassco.org/clingo/) [Node.js](https://nodejs.org/) [Runs documentation tooling and build steps for report artifacts and site generation.](https://nodejs.org/) [TypeScript](https://www.typescriptlang.org/) [Implements the Observable Notebook builder service and the VitePress config for the docs site.](https://www.typescriptlang.org/) [JavaScript](https://developer.mozilla.org/en-US/docs/Web/JavaScript) [Implements the generated Observable notebooks and their interactive report UI.](https://developer.mozilla.org/en-US/docs/Web/JavaScript) ### Frontend & Visualization [Observable Notebook](https://observablehq.com/notebooks/2) [Hosts interactive HTML reports that readers can explore and analysts can edit.](https://observablehq.com/notebooks/2) [Mosaic](https://idl.uw.edu/mosaic/) [Enables coordinated views and cross-filtering in generated reports.](https://idl.uw.edu/mosaic/) 
[Draco](https://github.com/cmudig/draco2) [Synthesizes visualization specifications from constraints for chart generation.](https://github.com/cmudig/draco2) [Vega-Lite](https://vega.github.io/vega-lite/) [Renders charts from Draco-derived specifications for the final report visuals.](https://vega.github.io/vega-lite/) [Vue.js](https://vuejs.org/) [Implements interactive components within the documentation site and gallery.](https://vuejs.org/) [VitePress](https://vitepress.dev/) [Builds the documentation site and gallery as a static site.](https://vitepress.dev/) ### AI & Machine Learning [DSPy](https://dspy.ai/) [Orchestrates 11 specialized agents via modular prompts and tracing.](https://dspy.ai/) [OpenAI API](https://platform.openai.com/) [Provides models for dataset description, insight planning, and narrative generation.](https://platform.openai.com/) [Anthropic API](https://www.anthropic.com/api) [Provides models for coding-heavy agent steps alongside other providers.](https://www.anthropic.com/api) [Gemini](https://ai.google.dev/) [Provides models used for mapping codes to human-readable labels.](https://ai.google.dev/) [Google Vertex AI](https://cloud.google.com/vertex-ai) [Provides an additional managed inference backend used selectively for agent runs.](https://cloud.google.com/vertex-ai) [OpenRouter API](https://openrouter.ai/) [Provides alternative models behind a uniform API for stage-specific routing.](https://openrouter.ai/) ### Data Engineering [DuckDB](https://duckdb.org/) [Runs SQL queries both on the server and in the browser (via WASM) for interactive exploration.](https://duckdb.org/) [Polars](https://pola.rs/) [Applies transformations to the raw input dataset based on agent-generated metadata.](https://pola.rs/) [NumPy](https://numpy.org/) [Computes dataset statistics used during insight discovery.](https://numpy.org/) [Parquet](https://parquet.apache.org/) [Stores dataset artifacts so they can be queried and processed outside the agent 
pipeline.](https://parquet.apache.org/) [Pandas](https://pandas.pydata.org/) [Reads remote datasets in Pyodide-backed notebooks when browser-side engines cannot fetch them directly.](https://pandas.pydata.org/) [Apache Arrow](https://arrow.apache.org/) [Moves data between engines without copies (e.g., DuckDB ↔ Polars).](https://arrow.apache.org/) ### Backend & APIs [Pydantic](https://docs.pydantic.dev/) [Validates and serializes agent inputs/outputs and supports schema-driven prompting.](https://docs.pydantic.dev/) [OpenTelemetry](https://opentelemetry.io/) [Collects agent traces in a standard format for observability and debugging.](https://opentelemetry.io/) [Vega-Altair](https://altair-viz.github.io/) [Renders Vega-Lite visualizations from Draco specifications via a Python API.](https://altair-viz.github.io/) ### External Services [Langfuse](https://langfuse.com/) [Tracks traces, token usage, and latency across agents for observability and QA.](https://langfuse.com/) [Infisical](https://infisical.com/) [Manages API keys used by agents for submissions to the evaluation server.](https://infisical.com/) [Umami](https://umami.is/) [Tracks usage analytics for the documentation site.](https://umami.is/) ### Cloud & DevOps [Cloudflare R2](https://www.cloudflare.com/products/r2/) [Stores report assets and artifacts via S3-compatible object storage.](https://www.cloudflare.com/products/r2/) [Coolify](https://coolify.io/) [Runs self-hosted Langfuse, Infisical, Umami, and the Observable Notebook Builder service.](https://coolify.io/) [Docker](https://www.docker.com/) [Packages the Observable Notebook Builder service for deployment on Coolify.](https://www.docker.com/) [GitHub Actions](https://github.com/features/actions) [Builds the VitePress documentation site and deploys it to GitHub Pages.](https://github.com/features/actions) ### Development Tooling [uv](https://github.com/astral-sh/uv) [Installs and resolves Python dependencies for local development and reproducible 
runs.](https://github.com/astral-sh/uv) [Ruff](https://docs.astral.sh/ruff/) [Lints and formats Python sources in the agent pipeline.](https://docs.astral.sh/ruff/) [Marimo](https://marimo.io/) [Notebook environment for running end-to-end pipelines during development and debugging.](https://marimo.io/) [pnpm](https://pnpm.io/) [Manages Node.js dependencies for docs and report build tooling.](https://pnpm.io/) [Vite](https://vitejs.dev/) [Builds and previews the documentation site and related frontend assets.](https://vitejs.dev/) [Biome](https://biomejs.dev/) [Formats and lints JavaScript/TypeScript code for docs and build tooling.](https://biomejs.dev/) --- # Consultancy Web Platform > A bilingual web platform for a digital consultancy that needed a professional online presence with full editorial control. The system lets non-technical editors publish and maintain multilingual content without creating a developer bottleneck. For a digital consultancy, projecting a professional image across local and international markets is a common challenge. The harder part is doing it while keeping **complete editorial control** in the hands of the team, without turning every content update into an engineering task. The goal was a platform that stays fast and accessible, but remains simple for non-technical editors to operate. The solution is a split architecture: a headless CMS for editing and a separate delivery layer for the public site. Editors can publish and maintain bilingual content through the CMS, while the public site stays performant and stable. A key architectural decision was to **decouple content management from content delivery**. Using a headless CMS like Strapi provides a flexible and self-hosted backend for editors, while a framework like Next.js handles the high-performance delivery of the public-facing site. 
This separation is what enables the balance of editorial freedom and cross-regional reach, ensuring the platform is both easy to update and fast for all users, regardless of their location. ## Stack While the problem is more important than the tools, the tech stack tells a story about the project's architecture and trade-offs. Here's what this project is built on: ### Platforms & Runtimes [TypeScript](https://www.typescriptlang.org/) [Implements all frontend and shared configuration with static typing for Next.js pages, components, and environment validation.](https://www.typescriptlang.org/) [Node.js](https://nodejs.org/) [Runs Next.js and Strapi in development and production, including build-time tasks and the Strapi CMS server.](https://nodejs.org/) [Next.js](https://nextjs.org/) [React meta framework powering the site with server-side rendering, static generation, and API routes for dynamic content.](https://nextjs.org/) ### Frontend & Visualization [React](https://reactjs.org/) [Renders the public website UI, components, and interactive elements in the Next.js app.](https://reactjs.org/) [Tailwind CSS](https://tailwindcss.com/) [Used for styling the web UI and shared components in a consistent, variable-driven manner with utility-first classes.](https://tailwindcss.com/) [Radix UI](https://www.radix-ui.com/) [Provides accessible primitives (navigation, dropdown) as the base for customized UI components used across the site.](https://www.radix-ui.com/) ### Backend & APIs [Strapi](https://strapi.io/) [Hosts the headless CMS for pages, insights, navigation, and media with admin UI and REST API consumed by the Next.js app with webhooks for dynamic revalidation of static content.](https://strapi.io/) [PostgreSQL](https://www.postgresql.org/) [Backs Strapi's content storage with a reliable, scalable relational database.](https://www.postgresql.org/) ### External Services [MailerLite](https://www.mailerlite.com/) [Supports newsletter/signup embeds via the universal 
snippet types, enabling initialization of forms and popups on the website.](https://www.mailerlite.com/) [Google Analytics](https://analytics.google.com/analytics/web/) [Tracks and reports website traffic, providing insights into user behavior and engagement.](https://analytics.google.com/analytics/web/) ### Cloud & DevOps [Vercel](https://vercel.com/) [Used for hosting the Next.js frontend with automatic deployments from the GitHub monorepo.](https://vercel.com/) [Docker](https://www.docker.com/) [Containerizes the Strapi CMS for production with a multi-stage image that builds and runs the CMS service.](https://www.docker.com/) [Coolify](https://coolify.io/) [Used for self-hosting the Strapi CMS with managed Docker deployment, reverse proxy, and SSL.](https://coolify.io/) [GitHub Actions](https://github.com/features/actions) [Used for automatically building the Strapi CMS Docker image and pushing it to GitHub Container Registry on new releases.](https://github.com/features/actions) ### Development Tooling [pnpm](https://pnpm.io/) [Manages monorepo workspaces for web and CMS, controls install scope, and locks dependency versions across packages.](https://pnpm.io/) [Turbopack](https://turbo.build/pack) [Accelerates Next.js development and build steps configured for the frontend app, improving local iteration speed.](https://turbo.build/pack) [Biome](https://biomejs.dev/) [Formats and lints at the workspace root to maintain consistent code style across packages.](https://biomejs.dev/) --- # D2 Widget > A Python widget that embeds the D2 diagram language directly into Jupyter and Marimo. It provides live, text-based diagramming with inline previews, so diagrams stay close to the surrounding analysis. The workflow for creating diagrams in computational notebooks is often a broken experience. It forces a context switch to external tools, reducing a dynamic process of thought and exploration into a static one of importing and managing images. 
This interruption breaks the very interactive loop that makes notebooks a powerful environment for analysis and documentation. This widget brings the diagramming process back into the notebook. It integrates the **declarative diagram language** D2 directly into environments like Jupyter and Marimo , so you can write text-based diagrams and see live previews inline. The goal is to treat diagrams as code: versionable, reproducible, and integrated with the surrounding analysis. The implementation uses anywidget to bridge the Python backend with a lightweight frontend. By handling notebook-specific integration details behind the scenes, the widget stays simple to use for quick sketches and programmatic diagram generation. ## Stack While the problem is more important than the tools, the tech stack tells a story about the project's architecture and trade-offs. Here's what this project is built on: ### Platforms & Runtimes [Python](https://www.python.org/) [Implements the widget backend and the Jupyter integration.](https://www.python.org/) [TypeScript](https://www.typescriptlang.org/) [Implements the widget frontend in JavaScript with type safety.](https://www.typescriptlang.org/) [D2](https://d2lang.com/) [Renders declarative diagram text into SVG output for notebook cells.](https://d2lang.com/) ### Frontend & Visualization [Marimo](https://marimo.io/) [Hosts the playground app used to test and demonstrate d2-widget bindings.](https://marimo.io/) [Jupyter](https://jupyter.org/) [Runs and displays the widget inside notebook cells.](https://jupyter.org/) [anywidget](https://anywidget.dev/) [Bridges the Python backend and JavaScript frontend for rendering and events.](https://anywidget.dev/) ### External Services [PyPI](https://pypi.org/) [Distributes d2-widget for installation via pip.](https://pypi.org/) ### Cloud & DevOps [GitHub Actions](https://github.com/features/actions) [Runs CI and deploys the playground to GitHub Pages.](https://github.com/features/actions) ### 
Development Tooling [uv](https://github.com/astral-sh/uv) [Manages Python dependencies and virtual environments.](https://github.com/astral-sh/uv) [Ruff](https://docs.astral.sh/ruff/) [Lints and formats the Python codebase.](https://docs.astral.sh/ruff/) [Hatch](https://hatch.pypa.io/) [Packages the Python widget with integrated JavaScript assets.](https://hatch.pypa.io/) [pnpm](https://pnpm.io/) [Manages JavaScript dependencies and build artifacts.](https://pnpm.io/) [ESBuild](https://esbuild.github.io/) [Bundles the widget frontend for notebook integration.](https://esbuild.github.io/) [Biome](https://biomejs.dev/) [Formats and lints the JavaScript/TypeScript code.](https://biomejs.dev/) --- # Draco 2 > A constraint-based system for visualization recommendation. It encodes design knowledge as logical rules and uses a renderer-agnostic format, so researchers and practitioners can extend and validate chart designs in a computational way. Creating effective visualizations requires a deep understanding of design theory, yet most automated tools are rigid, making it difficult to customize them with new knowledge. They often treat design as a fixed template, which fails to capture the nuanced principles that guide human perception. This turns what should be an evolving practice into a static one, disconnected from ongoing research. Draco 2 approaches this problem differently by treating visualization design not as a set of templates, but as a system of **logical constraints**. The core innovation is a **generic, renderer-agnostic specification format** that decouples the abstract principles of good design from any single rendering library. This allows the system to reason about the effectiveness of a chart (why certain choices work and others don’t) in a formal, computational way. Using an Answer Set Programming solver like Clingo, the framework can recommend optimal designs or validate existing ones against its knowledge base of design rules.
By making design expertise computational and extensible, the project aims to democratize effective data visualization. This research, recognized with a **Best Short Paper Honorable Mention Award** at [IEEE VIS 2023](http://ieeevis.org/year/2023/welcome), represents a step towards intelligent tools that can not only generate charts but also explain the reasoning behind them. ## Stack While the problem is more important than the tools, the tech stack tells a story about the project's architecture and trade-offs. Here's what this project is built on: ### Platforms & Runtimes [Python](https://www.python.org/) [Implements the Draco 2 framework and its execution environment.](https://www.python.org/) [Clingo](https://potassco.org/clingo/) [Solves the Answer Set Programming problems used for constraint-based reasoning.](https://potassco.org/clingo/) [Pyodide](https://pyodide.org/) [Provides a browser-side Python runtime for interactive documentation and JupyterLite notebooks.](https://pyodide.org/) [JupyterLite](https://jupyterlite.readthedocs.io/en/latest/) [Runs interactive tutorials and examples directly in the browser.](https://jupyterlite.readthedocs.io/en/latest/) ### Frontend & Visualization [Vega-Altair](https://altair-viz.github.io/) [Renders Draco specifications into Vega-Lite charts.](https://altair-viz.github.io/) [Jupyter Book](https://jupyterbook.org/) [Builds the documentation site with interactive notebooks and API references.](https://jupyterbook.org/) [Sphinx](https://www.sphinx-doc.org/) [Generates API documentation and processes reStructuredText content.](https://www.sphinx-doc.org/) ### AI & Machine Learning [scikit-learn](https://scikit-learn.org/) [Learns constraint weights from preference data (e.g., via linear SVMs).](https://scikit-learn.org/) ### Data Engineering [Narwhals](https://narwhals-dev.github.io/narwhals/) [Provides engine-agnostic DataFrame processing and schema extraction.](https://narwhals-dev.github.io/narwhals/) 
[Pandas](https://pandas.pydata.org/) [Serves as a primary DataFrame backend via Narwhals.](https://pandas.pydata.org/) [NumPy](https://numpy.org/) [Supports numerical operations used throughout the pipeline.](https://numpy.org/) ### Backend & APIs [FastAPI](https://fastapi.tiangolo.com/) [Exposes visualization recommendation capabilities via HTTP endpoints.](https://fastapi.tiangolo.com/) [Pydantic](https://docs.pydantic.dev/) [Defines and validates API request/response models and schema.](https://docs.pydantic.dev/) ### External Services [PyPI](https://pypi.org/) [Distributes the draco package for installation via pip.](https://pypi.org/) [Binder](https://mybinder.org/) [Hosts live tutorials and examples without local installation.](https://mybinder.org/) [Codecov](https://codecov.io/) [Tracks test coverage in CI.](https://codecov.io/) ### Cloud & DevOps [Docker](https://www.docker.com/) [Packages the system for consistent deployment and testing.](https://www.docker.com/) [GitHub Actions](https://github.com/features/actions) [Runs CI, builds docs, and deploys the site to GitHub Pages.](https://github.com/features/actions) ### Development Tooling [Dev Containers](https://containers.dev/) [Provides a pre-configured development environment for contributors.](https://containers.dev/) [uv](https://github.com/astral-sh/uv) [Manages Python dependencies for development and builds.](https://github.com/astral-sh/uv) [Poetry](https://python-poetry.org/) [Previously used for Python dependency management.](https://python-poetry.org/) [Hatch](https://hatch.pypa.io/) [Builds and packages the Python project for distribution.](https://hatch.pypa.io/) [pre-commit](https://pre-commit.com/) [Runs formatting and linting checks via Git hooks.](https://pre-commit.com/) [Ruff](https://docs.astral.sh/ruff/) [Lints and formats the Python codebase.](https://docs.astral.sh/ruff/) --- # Driva > An exploration of how to translate vehicle telemetry into driver-facing feedback. 
The project focuses on turning raw automotive signals into interpretable metrics that support safer and more efficient driving. Modern vehicles produce large volumes of telemetry, but raw measurements are not the same as insight. The hard part is translation: turning a stream of signals into feedback a driver can understand and act on. This project explored that translation, with an emphasis on clarity and trust. The core of the work was designing a system, as part of Magna’s **SmartBridge™ suite**, that could bridge the gap between machine-generated data and human understanding. This required a robust architecture capable of processing and orchestrating real-time data, but the focus was always on the end experience. The goal was to deliver clarity, helping people see not just how far they drove, but **how** they drove: safely, efficiently, and sustainably. To maintain coherence across a complex, multi-platform system, a key principle was establishing a **single source of truth** for communication between services. By automatically generating typed SDKs for the Flutter mobile app directly from the backend’s API specifications, we could ensure that the different parts of the system spoke the same language. This approach wasn’t just a technical convenience; it was a way to manage the inherent complexity of building a reliable, device-agnostic experience. Ultimately, this project served as a proof of concept for a more human-centric approach to automotive IoT. With careful engineering, even low-level signals can become practical feedback for everyday drivers. ## Stack While the problem is more important than the tools, the tech stack tells a story about the project's architecture and trade-offs.
Here's what this project is built on: ### Platforms & Runtimes [Dart](https://dart.dev/) [Implements the Flutter mobile application's UI and platform build targets to deliver native Android and iOS apps for vehicle telemetry and user interactions.](https://dart.dev/) [TypeScript](https://www.typescriptlang.org/) [Drives server and shared library development across backend services and serverless handlers, enabling typed contracts and safer refactors across the platform.](https://www.typescriptlang.org/) [JavaScript](https://developer.mozilla.org/en-US/docs/Web/JavaScript) [Powers browser runtime for the React web dashboard and small utility scripts used in site automation and tests.](https://developer.mozilla.org/en-US/docs/Web/JavaScript) [Ruby](https://www.ruby-lang.org/) [Used for Fastlane scripts that automate Flutter app builds, code signing, and TestFlight/App Store deployments.](https://www.ruby-lang.org/) [Node.js](https://nodejs.org/) [Runs compiled TypeScript backend services, serverless function emulators, and development servers used throughout backend and frontend workflows.](https://nodejs.org/) [Android](https://www.android.com/) [Target platform for Flutter app builds and distribution to testers and end users.](https://www.android.com/) [iOS](https://www.apple.com/ios/) [Target platform for Flutter app builds and TestFlight/App Store distribution for mobile users.](https://www.apple.com/ios/) ### Frontend & Visualization [Flutter](https://flutter.dev/) [Constructs the cross-platform mobile application to present driving telemetry, handle device integrations, and produce production-ready Android and iOS builds.](https://flutter.dev/) [React](https://reactjs.org/) [Renders the single-page DriveInsights dashboard, composes visualization components and maps, and manages client-side interactions for trip analysis and recorded-trip workflows.](https://reactjs.org/) [React Router](https://reactrouter.com/) [Manages client-side navigation and deep linking 
across dashboard views, recorded trips, and settings screens to preserve UI state during navigation.](https://reactrouter.com/) [Material UI](https://mui.com/) [Implements the dashboard's design system and responsive component primitives used in tables, forms, and dialogs to keep UI consistent across views.](https://mui.com/) [Leaflet](https://leafletjs.com/) [Displays interactive geospatial maps and trip traces in the web UI, enabling pan/zoom, markers, and visual route overlays for route analysis.](https://leafletjs.com/) [Bootstrap](https://getbootstrap.com/) [Provides CSS utilities and a responsive grid used in portions of the web UI and landing pages for consistent spacing and layout fallbacks.](https://getbootstrap.com/) ### Data Engineering [Azure Cosmos DB](https://azure.microsoft.com/en-us/services/cosmos-db/) [Stores telemetry, trip records, and application entities in a globally distributed NoSQL database used by aggregation and query workloads.](https://azure.microsoft.com/en-us/services/cosmos-db/) ### Backend & APIs [NestJS](https://nestjs.com/) [Hosts API endpoints and server-side logic that process telemetry, orchestrate aggregation workflows, and expose OpenAPI specifications consumed by client and automation tooling.](https://nestjs.com/) [OpenAPI](https://www.openapis.org/) [Generates machine-readable API specifications used to produce client SDKs, validate contract changes, and drive API-driven CI tasks.](https://www.openapis.org/) ### External Services [Azure IoT Hub](https://azure.microsoft.com/en-us/products/iot-hub) [Receives device telemetry from in-vehicle IoT devices and forwards messages into backend ingestion pipelines and simulator tools.](https://azure.microsoft.com/en-us/products/iot-hub) [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) [Houses static assets and user-uploaded media (vehicle and user images) in containerized blobs used by the web UI and mobile app for asset 
delivery.](https://azure.microsoft.com/en-us/products/storage/blobs) [Azure Active Directory B2C](https://azure.microsoft.com/en-us/products/microsoft-entra-ds) [Provides enterprise authentication and single sign-on for dashboard users and API clients, integrated with frontend auth flows and backend token validation.](https://azure.microsoft.com/en-us/products/microsoft-entra-ds) ### Cloud & DevOps [Fastlane](https://fastlane.tools/) [Automates Flutter app build, code signing, and deployment workflows for iOS and Android applications.](https://fastlane.tools/) [Azure Functions](https://azure.microsoft.com/en-us/products/functions) [Hosts serverless aggregation and event-driven functions that execute trip processing, scheduled tasks, and lightweight server-side jobs.](https://azure.microsoft.com/en-us/products/functions) [Azure App Service](https://azure.microsoft.com/en-us/products/app-service) [Hosts web dashboard and API web apps with automated deployment from CI and standard platform scaling for production traffic.](https://azure.microsoft.com/en-us/products/app-service) [GitHub Actions](https://github.com/features/actions) [Runs CI workflows that lint, test, build, and deploy frontend and backend projects to Azure on merges to the main branch.](https://github.com/features/actions) ### Development Tooling [Jest](https://jestjs.io/) [Runs unit and integration tests for TypeScript services and React components with coverage thresholds enforced in CI.](https://jestjs.io/) [ESLint](https://eslint.org/) [Enforces JavaScript/TypeScript linting rules and prevents common issues across frontend and backend repositories, integrated into CI lint checks.](https://eslint.org/) [Prettier](https://prettier.io/) [Normalizes code formatting across projects to produce consistent diffs and to enable automated formatting in CI.](https://prettier.io/) [npm](https://www.npmjs.com/) [Manages Node.js package installation and scripts used across web and backend projects to run builds, 
tests, and code generators.](https://www.npmjs.com/) --- # Human vs. AI Perceptual Alignment > An investigation into whether Vision-Language Models categorize scientific visualizations in ways that align with expert judgment. The study evaluates 13 models on a labeled image set and measures agreement on visual purpose and encoding patterns. The increasing use of AI to interpret visual data rests on a key assumption: that models “see” charts and figures in ways that match human expertise. This study tests that assumption by comparing Vision-Language Model outputs with expert annotations. To explore this, we designed a systematic evaluation, comparing the classifications of **13 models** against a ground truth of **expert annotations** on a labeled set of scientific visualizations. The focus was on pure visual categorization—assessing a model’s ability to identify a visualization’s purpose, encoding, and dimensionality without any textual context. The engineering behind the study was designed for rigor and reproducibility, using a multi-provider setup with tools like LangChain to ensure a broad comparison. The goal is not to crown a “best” model, but to provide a **measured, quantitative view** of the alignment gap. The results show where current models agree with human consensus and where they reliably diverge, including specific failure modes on complex encodings. This research, accepted at [IEEE VIS 2025](http://ieeevis.org/year/2025/welcome), contributes evidence about current capabilities and limitations in visual data analysis. ## Stack While the problem is more important than the tools, the tech stack tells a story about the project's architecture and trade-offs. 
Here's what this project is built on: ### Platforms & Runtimes [Python](https://www.python.org/) [Implements the evaluation pipeline and analysis workflows.](https://www.python.org/) ### Frontend & Visualization [Vega-Altair](https://altair-viz.github.io/) [Generates charts summarizing model performance across difficulty levels and encoding types.](https://altair-viz.github.io/) [Plotly](https://plotly.com/) [Creates interactive confusion matrices and multi-label classification visualizations.](https://plotly.com/) [Figma](https://www.figma.com/) [Designs the research poster and supporting visual assets.](https://www.figma.com/) ### AI & Machine Learning [LangChain](https://langchain.com/) [Orchestrates multi-provider model integration and structured outputs for categorization tasks.](https://langchain.com/) [OpenAI API](https://platform.openai.com/) [Provides models used for zero-shot visualization categorization evaluation.](https://platform.openai.com/) [Gemini](https://ai.google.dev/) [Provides models used for zero-shot visualization categorization evaluation.](https://ai.google.dev/) [Mistral AI](https://www.mistral.ai/) [Provides models used for zero-shot visualization categorization evaluation.](https://www.mistral.ai/) [Meta LLaMA](https://ai.meta.com/llama/) [Provides models used for zero-shot visualization categorization evaluation.](https://ai.meta.com/llama/) [Qwen](https://qwen.ai/) [Provides models used for zero-shot visualization categorization evaluation.](https://qwen.ai/) [OpenRouter API](https://openrouter.ai/) [Routes requests to multiple providers behind a unified API for model comparison.](https://openrouter.ai/) [scikit-learn](https://scikit-learn.org/) [Computes multi-label classification metrics and confusion matrices.](https://scikit-learn.org/) ### Data Engineering [Polars](https://pola.rs/) [Processes the VIS30K dataset for stratified sampling and performance analysis.](https://pola.rs/) [Apache Arrow](https://arrow.apache.org/) 
[Supports columnar data processing and Parquet serialization.](https://arrow.apache.org/) [Parquet](https://parquet.apache.org/) [Stores structured analysis results in a partitioned format for efficient querying and caching.](https://parquet.apache.org/) ### Backend & APIs [SQLite](https://www.sqlite.org/index.html) [Caches HTTP responses and LLM outputs to avoid redundant API calls.](https://www.sqlite.org/index.html) [Pydantic](https://docs.pydantic.dev/) [Defines and validates structured output schemas (purpose, encoding, dimensionality).](https://docs.pydantic.dev/) ### External Services [Langfuse](https://langfuse.com/) [Tracks token usage and latency across the evaluation runs.](https://langfuse.com/) [Ollama](https://ollama.com/) [Enables local model deployment and testing for on-device evaluation scenarios.](https://ollama.com/) ### Development Tooling [Marimo](https://marimo.io/) [Builds interactive notebooks for inference, evaluation, and visualization.](https://marimo.io/) [uv](https://github.com/astral-sh/uv) [Manages Python dependencies for a reproducible research environment.](https://github.com/astral-sh/uv) [Ruff](https://docs.astral.sh/ruff/) [Lints and formats the research codebase.](https://docs.astral.sh/ruff/) --- # Men's Health Web Platform > A web platform for a men's health publication focused on aligning editorial strategy with audience needs. It pairs a fast public site with a lightweight analytics pipeline that helps editors spot content gaps and topical imbalances beyond page-view metrics. The challenge for a modern publication is twofold: deliver a fast, reliable experience to readers, and give the editorial team feedback that goes beyond page views. Editorial intuition matters, but it is hard to see the broader catalog: which topics are neglected, and whether related articles are well linked. 
This project addresses that gap by pairing a performance-first public site with a lightweight analytics pipeline designed for editors, not data scientists. The public-facing site, built with Next.js, is optimized for speed and reliability. Behind the scenes, a separate system provides the editorial team with **simple, actionable signals** about their work. Instead of complex dashboards, the focus is on answering practical questions. Using text embeddings from the OpenAI API, the system can understand the semantic relationships between articles. This helps editors visualize their content strategy, spot topical gaps, and identify opportunities for cross-promotion that might not be immediately obvious. It’s a way to **augment editorial judgment**, not replace it. The visuals below show this in practice. The first gives a quick overview of the topical balance across the site, while the second maps the semantic neighborhood of articles, revealing clusters of related content. These tools are designed to be immediately useful for someone planning the next piece of content. ## Stack While the problem is more important than the tools, the tech stack tells a story about the project's architecture and trade-offs.
Here's what this project is built on: ### Platforms & Runtimes [TypeScript](https://www.typescriptlang.org/) [Provides typed authoring for the Next.js frontend and server-side code to reduce runtime errors and improve developer DX.](https://www.typescriptlang.org/) [Node.js](https://nodejs.org/) [Executes server-side TypeScript code for the web platform's Next.js frontend and API routes.](https://nodejs.org/) [Next.js](https://nextjs.org/) [Serves the public website and compiles localized Hungarian pages with server-side rendering and incremental revalidation tied to CMS updates.](https://nextjs.org/) [Python](https://www.python.org/) [Runs the statistics/analytics service that generates interactive analysis and embedding pipelines for editorial content.](https://www.python.org/) ### Frontend & Visualization [React](https://reactjs.org/) [Renders interactive UI components across the public site and client-side experience, including measurement-aware analytics snippets.](https://reactjs.org/) [Tailwind CSS](https://tailwindcss.com/) [Implements utility-first styling and responsive design across the public site to maintain a consistent editorial look and compact CSS output.](https://tailwindcss.com/) [styled components](https://styled-components.com/) [Applies component-scoped styling for specific interactive UI elements that require runtime theming and style encapsulation.](https://styled-components.com/) [Streamlit](https://streamlit.io/) [Hosts the internal statistics dashboard that editors use to explore article similarity, run visual analyses, and interact with dimensionality-reduced views.](https://streamlit.io/) [Plotly](https://plotly.com/) [Renders interactive charts and exploratory visualizations within the analytics UI for non-technical editorial users.](https://plotly.com/) [Vega-Altair](https://altair-viz.github.io/) [Generates declarative visualizations used in analysis notebooks and the internal exploratory views to surface dataset 
patterns.](https://altair-viz.github.io/) ### AI & Machine Learning [scikit-learn](https://scikit-learn.org/) [Computes clustering and model-based transforms used for content grouping and as an input to visualization/insight generators.](https://scikit-learn.org/) [umap-learn](https://umap-learn.readthedocs.io/en/latest/) [Produces lower-dimensional embeddings for visualization (UMAP) to create interactive 2D/3D layouts of article-embedding spaces.](https://umap-learn.readthedocs.io/en/latest/) [OpenAI API](https://platform.openai.com/) [Generates text embeddings and NLP-driven insights used to power content similarity searches and automated analysis in the statistics pipeline.](https://platform.openai.com/) ### Data Engineering [LanceDB](https://lancedb.com/) [Stores and queries text embeddings for nearest-neighbor searches and similarity ranking used by the analytics pipeline and search tooling.](https://lancedb.com/) [Polars](https://pola.rs/) [Transforms and filters large CSV/content exports using lazy evaluation to prepare data for embedding, visualization, and publishing workflows.](https://pola.rs/) [Apache Arrow](https://arrow.apache.org/) [Handles columnar serialization and efficient exchange of parquet/arrow data between data-processing stages in the analytics pipeline.](https://arrow.apache.org/) ### Backend & APIs [Sanity](https://www.sanity.io/) [Hosts editorial content as a headless CMS and supplies the canonical article dataset that both the public site and analytics pipelines consume.](https://www.sanity.io/) [GROQ](https://groq.dev/) [Queries and filters content from the Sanity CMS using GROQ syntax to power both the public site and analytics data extraction.](https://groq.dev/) ### External Services [Google Analytics](https://analytics.google.com/analytics/web/) [Collects consent-aware client-side metrics and engagement events used for editorial analytics and measurement dashboards.](https://analytics.google.com/analytics/web/) [Google 
Ads](https://ads.google.com/) [Manages and tracks advertising campaigns to drive targeted traffic.](https://ads.google.com/) [MailerLite](https://www.mailerlite.com/) [Manages email newsletters and subscriber lists, enabling editorial teams to distribute content updates and health tips to their audience.](https://www.mailerlite.com/) ### Cloud & DevOps [AWS](https://aws.amazon.com/) [Provides S3 object store for embeddings, exported artifacts, and static assets accessed by both web and analytics components.](https://aws.amazon.com/) [Fly.io](https://fly.io/) [Hosts the Python analytics service with configured regional settings and runtime parameters for the statistics application.](https://fly.io/) [Vercel](https://vercel.com/) [Deploys and serves the Next.js public site, providing CDN distribution and edge routing for global visitors.](https://vercel.com/) [Docker](https://www.docker.com/) [Packages the analytics service environment for consistent deployment and healthchecked container runs in the hosting platform.](https://www.docker.com/) [GitHub Actions](https://github.com/features/actions) [Schedules and runs automation jobs (e.g., nightly cache revalidation) to keep public content caches and site revalidation up to date.](https://github.com/features/actions) ### Development Tooling [uv](https://github.com/astral-sh/uv) [Accelerates Python dependency installation and reproducible environment setup used inside the project's containerized builds.](https://github.com/astral-sh/uv) [Hatch](https://hatch.pypa.io/) [Builds and packages the Python analytics project for editable installs and CI-driven artifact generation.](https://hatch.pypa.io/) [Ruff](https://docs.astral.sh/ruff/) [Provides fast linting and code quality checks for Python sources to catch common issues and enforce style rules before commits.](https://docs.astral.sh/ruff/) [Prettier](https://prettier.io/) [Enforces consistent code formatting across the repository, improving cross-team readability and 
pre-commit hygiene.](https://prettier.io/) [ESLint](https://eslint.org/) [Runs static analysis on TypeScript/JavaScript sources to prevent common runtime issues and uphold style rules.](https://eslint.org/) --- # PLUTO: Public Value Assessment Tool > A questionnaire-based tool for assessing the public value of data practices. It translates abstract ethics questions into a structured workflow and outputs a risk-benefit analysis with concrete recommendations. The widespread use of personal data creates a fundamental challenge: how do we assess its true public value? For organizations, it is difficult to know if their practices are genuinely beneficial to society, and for individuals, the risks and rewards of sharing their information are often opaque. This ambiguity makes it hard to build trust and accountability in a data-driven world. This project, the **P**ublic va**LU**e assessment **TO**ol (PLUTO), was created to bring clarity to this complex issue. It reframes the abstract principles of data ethics into a simple, **structured assessment** that anyone can use. By guiding users through a concise questionnaire, the tool helps both data users and data subjects reason about the implications of data use in a systematic and accessible way. The output is not a lengthy report, but an interactive visualization on a **risk-benefit matrix**. The experience is delivered through a web platform built with Next.js and Strapi, with all components containerized in Docker. We validated the tool through user testing and deployed it for use by partner organizations. ## Stack While the problem is more important than the tools, the tech stack tells a story about the project's architecture and trade-offs.
Here's what this project is built on: ### Platforms & Runtimes [TypeScript](https://www.typescriptlang.org/) [Primary programming language for the web frontend and CMS backend, providing type safety across the entire application stack](https://www.typescriptlang.org/) [Node.js](https://nodejs.org/) [JavaScript runtime environment executing the Next.js server-side rendering and Strapi CMS backend services](https://nodejs.org/) [Next.js](https://nextjs.org/) [React meta framework for the main web application with server-side rendering, API routes, and static generation of content pages](https://nextjs.org/) [Python](https://www.python.org/) [Core language for the analytics package, powering data processing and visualization components using Marimo notebooks](https://www.python.org/) [Marimo](https://marimo.io/) [Used for creating the analytics dashboard that visualizes survey completion times and submission statistics combined by pulling data from Strapi and Umami analytics](https://marimo.io/) [Streamlit](https://streamlit.io/) [In a prior version of the system, Streamlit was used instead of Marimo to interactively visualize potential survey submission result distributions based on simulated data](https://streamlit.io/) ### Frontend & Visualization [React](https://reactjs.org/) [UI library powering the interactive survey interface, result visualization components, and admin dashboard for the PLUTO assessment tool](https://reactjs.org/) [Tailwind CSS](https://tailwindcss.com/) [CSS framework used for consistent styling and responsive design](https://tailwindcss.com/) [Radix UI](https://www.radix-ui.com/) [Headless UI components library providing accessible accordion, hover cards, and navigation menu components for the assessment interface](https://www.radix-ui.com/) [SurveyJS](https://surveyjs.io/) [Renders the dynamic questionnaire interface with 25 questions across four assessment dimensions, handling form validation and user interactions](https://surveyjs.io/) 
[D3.js](https://d3js.org/) [Creates the interactive quadrant plot visualization that displays risk-benefit analysis results in a four-quadrant matrix for survey respondents](https://d3js.org/)

[Vega-Altair](https://altair-viz.github.io/) [Statistical visualization library generating charts and graphs for the analytics dashboard showing survey metrics](https://altair-viz.github.io/)

### Data Engineering

[DuckDB](https://duckdb.org/) [In-process analytical database powering the analytics dashboard with SQL queries for processing survey submission data and generating insights](https://duckdb.org/)

[Polars](https://pola.rs/) [Fast DataFrame library processing survey response data and submission analytics in the Python-based analytics component](https://pola.rs/)

### Backend & APIs

[Strapi](https://strapi.io/) [Headless CMS that manages survey questions, content pages, and submission data through RESTful APIs for the assessment tool with webhooks for dynamic revalidation of static content](https://strapi.io/)

[SQLite](https://www.sqlite.org/index.html) [Embedded database storing survey configurations, user submissions, and content management data for the Strapi CMS backend](https://www.sqlite.org/index.html)

### External Services

[Umami](https://umami.is/) [Privacy-focused, self-hosted web analytics service tracking user interactions and survey completion rates for the assessment tool](https://umami.is/)

[Microsoft Clarity](https://clarity.microsoft.com/) [User behavior analytics platform providing heatmaps and session recordings to understand how users interact with the assessment interface](https://clarity.microsoft.com/)

[HeyForm](https://heyform.com/) [Form builder service collecting user feedback and suggestions about the assessment tool through embedded feedback forms](https://heyform.com/)

### Cloud & DevOps

[Docker](https://www.docker.com/) [Containerizes all three application components (web, CMS, analytics) for consistent deployment and development environment setup](https://www.docker.com/)

[GitHub Actions](https://github.com/features/actions) [Used to automatically build Docker images for each component and push them to GitHub Container Registry on every commit to the main branch](https://github.com/features/actions)

[Portainer](https://www.portainer.io/) [Used to self-host all system components on a VPS, managing Docker containers and monitoring resource usage](https://www.portainer.io/)

### Development Tooling

[pnpm](https://pnpm.io/) [Package manager managing dependencies across the monorepo workspace with efficient disk usage and fast installation for Node.js packages](https://pnpm.io/)

[Biome](https://biomejs.dev/) [Code formatter and linter enforcing consistent JavaScript/TypeScript code style across the entire monorepo workspace](https://biomejs.dev/)

[Nx](https://nx.dev/) [In a previous version of the monorepo, Nx was used to manage project builds and dependencies but was later replaced by pnpm workspaces for simplicity](https://nx.dev/)

[uv](https://github.com/astral-sh/uv) [Fast Python package installer and resolver managing dependencies for the analytics package with improved performance over pip](https://github.com/astral-sh/uv)

[Ruff](https://docs.astral.sh/ruff/) [Fast Python linter and formatter maintaining code quality and style consistency in the analytics package](https://docs.astral.sh/ruff/)

---

# Quarto x Gradio Extension

> A Quarto extension that embeds runnable Gradio UIs directly into documentation pages. Examples run in the browser via Pyodide, so readers can experiment without local setup or a backend.

The fundamental limitation of most technical and educational material is the gap between theory and practice. Static code snippets and screenshots can explain a concept, but they cannot truly demonstrate it, forcing a context switch that breaks the flow of learning.
This is especially true when teaching complex mathematical concepts in data science, where understanding often remains abstract until a student can interact with the underlying algorithm.

This project explores a simple way to close that gap by **embedding runnable interfaces directly into the documentation**. Instead of only reading about a concept, a student can interact with an implementation on the same page. For example, a live demo of **[Huffman encoding](https://peter-gy.github.io/static/uni/2024w/mds/project/content/02-data-compression---huffman.html)** is embedded next to its explanation.

This is made possible by a pragmatic architectural choice: execute examples entirely in the reader’s browser. Using Gradio with Pyodide keeps the experience self-contained. A lightweight Quarto filter handles the embedding so authors can keep writing normal Quarto content.

## Stack

While the problem is more important than the tools, the tech stack tells a story about the project's architecture and trade-offs.
Here's what this project is built on:

### Platforms & Runtimes

[Python](https://www.python.org/) [Executes example Gradio interfaces entirely in the browser via Pyodide, enabling serverless app demos within documentation pages](https://www.python.org/)

[JavaScript](https://developer.mozilla.org/en-US/docs/Web/JavaScript) [Implements a small client-side UX script that warns mobile users when Gradio Lite content is present and loads site analytics](https://developer.mozilla.org/en-US/docs/Web/JavaScript)

[Lua](https://www.lua.org/) [Drives the Pandoc filter that detects Python cells and injects `<gradio-lite>` markup so examples render as live apps in the generated site](https://www.lua.org/)

[Pyodide](https://pyodide.org/) [Runs Python code client-side so Gradio apps in the docs execute in the reader's browser without any backend](https://pyodide.org/)

### Frontend & Visualization

[Quarto](https://quarto.org/) [Builds the documentation website and applies the custom Lua filter to embed Gradio Lite apps directly within content pages](https://quarto.org/)

[Gradio](https://gradio.app/) [Renders interactive Gradio UIs in the browser using `<gradio-lite>` tags, turning example Python cells into live, serverless apps](https://gradio.app/)

[Reveal.js](https://revealjs.com/) [Enables optional slide presentations that can include the same embedded, in-browser Gradio app experiences](https://revealjs.com/)

[Bootstrap](https://getbootstrap.com/) [Provides the site's visual foundation via the Darkly theme so embedded Gradio components match the site's styling](https://getbootstrap.com/)

[Plotly](https://plotly.com/) [Powers interactive charts inside example Gradio interfaces to demonstrate rich, client-side visualizations](https://plotly.com/)

### External Services

[Umami](https://umami.is/) [Tracks documentation usage via a self-hosted analytics endpoint embedded on every page through a site include](https://umami.is/)

### Development Tooling

[Jupyter](https://jupyter.org/) [Supports iterating on Python examples and notebooks that are later published as fully client-side, interactive docs](https://jupyter.org/)

[uv](https://github.com/astral-sh/uv) [Manages Python development dependencies used to author and test the extension and its example content](https://github.com/astral-sh/uv)
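
The filter's core step, turning an opted-in Python cell into `<gradio-lite>` markup, can be sketched roughly as follows. This is an illustration in Python, not the extension's actual Lua code: `CodeBlock` and `RawBlock` are simplified stand-ins for the Pandoc element types of the same name, and the `gradio` opt-in class, `to_gradio_lite`, and `apply_filter` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CodeBlock:
    """Simplified stand-in for a Pandoc CodeBlock element."""
    text: str
    classes: list[str] = field(default_factory=list)


@dataclass
class RawBlock:
    """Simplified stand-in for a Pandoc RawBlock element."""
    format: str
    text: str


# Hypothetical opt-in class; the real filter may select cells differently.
APP_CLASS = "gradio"


def to_gradio_lite(block):
    """Replace opted-in Python code blocks with raw <gradio-lite> HTML."""
    if (
        isinstance(block, CodeBlock)
        and "python" in block.classes
        and APP_CLASS in block.classes
    ):
        html = f"<gradio-lite>\n{block.text.strip()}\n</gradio-lite>"
        return RawBlock(format="html", text=html)
    return block  # every other block passes through untouched


def apply_filter(blocks):
    """Visit a flat list of blocks, mimicking how a filter walks a document."""
    return [to_gradio_lite(b) for b in blocks]


# Example cell: a tiny Gradio app of the kind that would run in the browser.
app_source = '''
import gradio as gr

def greet(name):
    return f"Hello, {name}!"

gr.Interface(fn=greet, inputs="text", outputs="text").launch()
'''

doc = [
    CodeBlock(text="print('static snippet')", classes=["python"]),
    CodeBlock(text=app_source, classes=["python", APP_CLASS]),
]

result = apply_filter(doc)
print(result[1].text)
```

Only the second cell is rewritten; the plain Python snippet stays a regular code block. In a real page the Gradio Lite script and stylesheet also have to be loaded for the element to render, which this sketch leaves out along with any escaping of the cell's source.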