Releases: microsoft/data-formulator
Data Formulator 0.7-alpha
More Charts, New Experience, Enterprise-Ready
🚧 *This version is in fact a big redesign and probably deserves v1.0. For now, we're shipping it as 0.7-alpha for fun; a proper, detailed write-up on the new architecture is coming soon.*
Version: 0.7.0a1 (alpha) · Files changed: ~282 · +84k / −16k lines
What's New
📊 Dramatically Expanded Visualization Support
The chart template system has been rebuilt with a new semantic engine, expanding from ~15 chart types to 30 Vega-Lite chart types:
| Category | Chart Types |
|---|---|
| Scatter & Point | Scatter Plot, Regression, Boxplot, Strip Plot (new), Ranged Dot Plot |
| Bar | Bar Chart, Grouped Bar Chart, Stacked Bar Chart, Histogram, Lollipop Chart (new), Pyramid Chart, Heatmap |
| Line & Area | Line Chart, Dotted Line Chart, Bump Chart (new), Area Chart (new), Streamgraph (new) |
| Part-to-Whole | Pie Chart (new), Rose Chart (new), Waterfall Chart (new) |
| Statistical | Density Plot (new), Candlestick Chart (new), Radar Chart (new) |
| Map | US Map (new), World Map (new) |
| Custom | Custom Point, Custom Line, Custom Bar, Custom Rect, Custom Area |
Semantic field analysis automatically infers temporal, categorical, quantitative, and geographic types to recommend the right chart for the data.
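To make the idea of semantic field analysis concrete, here is a minimal sketch of how column types might be inferred and mapped to a chart. It is not the actual `semantic_types` implementation: the function names, the `GEO_NAMES` set, the single date format, and the recommendation table are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical list of column names treated as geographic (an assumption).
GEO_NAMES = {"country", "state", "city", "latitude", "longitude"}


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_date(value: str) -> bool:
    # Only ISO dates are recognized here; a real engine would accept many formats.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def infer_semantic_type(name: str, values: list[str]) -> str:
    """Classify a column as geographic, temporal, quantitative, or categorical."""
    if name.lower() in GEO_NAMES:
        return "geographic"
    if values and all(_is_date(v) for v in values):
        return "temporal"
    if values and all(_is_number(v) for v in values):
        return "quantitative"
    return "categorical"


def recommend_chart(x_type: str, y_type: str) -> str:
    """Toy lookup from the two encoded field types to a chart name."""
    rules = {
        ("temporal", "quantitative"): "Line Chart",
        ("categorical", "quantitative"): "Bar Chart",
        ("quantitative", "quantitative"): "Scatter Plot",
        ("geographic", "quantitative"): "World Map",
    }
    # Fall back to a generic custom mark when no rule matches.
    return rules.get((x_type, y_type), "Custom Point")
```

For example, a `date` column of ISO dates paired with a numeric `price` column would be classified as temporal and quantitative, which this toy table maps to a Line Chart.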
💬 Hybrid Chat + Data Thread & Enhanced Agent Mode
- Redesigned Data Thread — Chat-based interaction is woven directly into the exploration thread. Users converse with agents inline alongside data transformations and chart results, replacing the separate chat panel.
- Richer thread cards showing transformation lineage, chart previews, and agent reasoning in a unified timeline.
- New agent mode — Agents autonomously plan multi-step explorations, generate chart recommendations, and produce data insights, all surfaced inline in the thread.
- Conversational data loading via integrated chat-based data ingestion.
🤖 Redesigned Agent Architecture
The backend agent system has been significantly restructured — consolidating previously fragmented agents into a cleaner, more capable design:
- Unified `DataAgent` replaces four separate agents (`agent_py_concept_derive`, `agent_py_data_rec`, `agent_sql_data_rec`, `agent_sql_data_transform`) with a single agent that handles both Python and SQL data transformations.
- New `agent_data_transform`: Dedicated data transformation agent.
- New `agent_data_rec`: Recommendation agent that suggests charts and exploration directions.
- New `agent_chart_insight`: Generates natural-language insights from chart results.
- Shared `semantic_types`: Type system used by both backend agents and frontend chart engine for consistent field inference.
🏗️ Workspace / Data Lake Architecture (Enterprise-Ready)
A new persistent, identity-based Workspace layer replaces the previous in-memory DB approach:
- `Workspace` manages per-user directories with a `workspace.yaml` metadata catalog tracking every table's lineage, schema, provenance, and source type.
- Uploaded files (CSV, Excel, JSON, etc.) preserved as-is; data-loader sources stored as Parquet via PyArrow.
- `CacheManager` and `FileManager` for efficient caching and file lifecycle.
- Azure Blob and Cached Azure Blob workspace backends for cloud deployments.
- `WorkspaceFactory` selects the correct workspace backend from configuration.
- New modular route layer replaces monolithic app routes.
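The factory pattern described above, where configuration picks the workspace backend, can be sketched roughly as follows. The class names, config keys (`backend`, `root`, `container`), and path layout are hypothetical and are not taken from the Data Formulator codebase.

```python
from dataclasses import dataclass


@dataclass
class LocalWorkspace:
    """Hypothetical local backend: one directory per user."""
    root: str
    user_id: str

    def table_path(self, table: str) -> str:
        return f"{self.root}/{self.user_id}/{table}.parquet"


@dataclass
class AzureBlobWorkspace:
    """Hypothetical cloud backend: one blob prefix per user."""
    container: str
    user_id: str

    def table_path(self, table: str) -> str:
        return f"azure://{self.container}/{self.user_id}/{table}.parquet"


def create_workspace(config: dict, user_id: str):
    """Select a workspace backend from configuration (keys are assumptions)."""
    backend = config.get("backend", "local")
    if backend == "local":
        return LocalWorkspace(root=config.get("root", "./workspaces"), user_id=user_id)
    if backend == "azure_blob":
        return AzureBlobWorkspace(container=config["container"], user_id=user_id)
    raise ValueError(f"unknown workspace backend: {backend!r}")
```

Keeping the selection in one function means the rest of the application only depends on the shared `table_path` interface, not on where the data actually lives.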
🔒 Security Hardening
- Code signing for AI-generated Python code.
- Sandboxed execution with `local` and `docker` backends.
- Authentication layer for user identity.
- Flask rate limiting to protect API endpoints.
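The release notes do not say which signing scheme is used for AI-generated code. As one plausible illustration, the sketch below signs code with an HMAC when it is generated and refuses to run anything whose signature does not verify. The key handling and function names are assumptions, not the project's actual API.

```python
import hashlib
import hmac
import secrets

# Assumption: a per-process secret key; a real deployment would manage keys differently.
SIGNING_KEY = secrets.token_bytes(32)


def sign_code(code: str, key: bytes = SIGNING_KEY) -> str:
    """Return a hex HMAC-SHA256 signature for a piece of generated code."""
    return hmac.new(key, code.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_code(code: str, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check the signature in constant time before the code is executed."""
    return hmac.compare_digest(sign_code(code, key), signature)
```

With this scheme, code that is modified between generation and execution (for example, by a tampered client request) no longer matches its signature and can be rejected before it reaches the sandbox.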
📦 Other Notable Changes
- UV-first build: fully reproducible builds with `uv.lock`; `uv sync` + `uv run data_formulator` is now the recommended development workflow.
- Unified data upload dialog and refresh data dialog.
- Demo streaming routes for live data scenarios.
- `api-keys.env.template` consolidated into `.env.template`.
Getting Started
    # Recommended (uv)
    uvx data_formulator
    # Or via pip
    pip install data_formulator==0.7.0a1
    python -m data_formulator

Community Contributions
Thanks to our contributors:
- @BAIGUANGMEI — Map projection support (projection types & centers) (#232)
- @IAMkecheng — ECharts renderer: scatter plot & bar chart (#236)
- @joshpoll — GoFish grouped bar sizing fix (#235)
Alpha notice: This is a pre-release. APIs and features may change before the stable 0.7.0 release. Please report issues and share feedback!
Full Changelog: 0.6...0.7.0a1
Data Formulator 0.6: Live Data
⚡ ⚡ Real-time insights from live data
- Connect to URLs and databases with automatic refresh intervals.
- Visualizations update automatically as your data changes, giving you live insights.
Demo: Track International Space Station Position Speed Live (unmute for audio intro)
demo-live-data.mp4
How to work with live data
When loading data from URLs and databases, enable watch mode so it automatically loads new data when updates are available.
Explore data as you typically do, using agent mode or interactive mode, and enjoy a set of live visualizations that provide real-time insights as the data updates. Reports created from these visualizations also benefit from the live updates.
To make it easier to try the live features, we created a few examples in the Load Data from URLs section:
- Stock data (based on yfinance) - Track stock prices with daily or intraday updates, including historical data and financial metrics
- International Space Station (ISS) positions, based on OpenNotify - Real-time tracking of ISS location (updates every 30 seconds)
- Earthquakes, based on USGS - Monitor significant earthquakes worldwide with minute-by-minute updates
- Weather data, based on Open Meteo - Track current weather conditions and forecasts for major US cities (updates daily or hourly)
- Live sales feed (simulated) - Real-time e-commerce sales data for testing and demonstrations (updates every 5 seconds)
🎨 UI Updates
- Unified UI for data loading: Streamlined interface for loading data from various sources
- Direct drag-and-drop fields: Drag fields directly from the data table to update visualization designs
For Developers
Key Code Changes
Frontend:
- New `useDataRefresh` hook (`src/app/useDataRefresh.tsx`) manages automatic refresh intervals for stream and database sources
- Extended `DataSourceConfig` interface with `autoRefresh` and `refreshIntervalSeconds` fields
- New Redux action `updateTableSourceRefreshSettings` for managing refresh settings
- UI components for watch mode configuration in data upload dialogs and table cards
- Removed Concept Shelf component; concepts are now directly shown in the data table for drag-and-drop operations.
Backend:
- New `/data-loader/refresh-table` endpoint for refreshing database tables with content hash tracking
- Demo stream routes in `demo_stream_routes.py` for live data examples (stocks, ISS, earthquakes, weather, sales)
Architecture:
- Auto-refresh supports both `stream` (URL) and `database` source types
- Frontend controls refresh timing; backend handles data fetching
- Content hash comparison optimizes updates by detecting actual data changes
- No breaking changes; all new fields are optional
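The content hash comparison mentioned above can be sketched as follows: the backend hashes the freshly fetched rows and only returns them when the hash differs from the one the client already holds. The function names and the response shape are illustrative assumptions, not the actual `/data-loader/refresh-table` contract.

```python
import hashlib
import json


def content_hash(rows: list[dict]) -> str:
    """Hash table content deterministically (keys sorted, values stringified)."""
    payload = json.dumps(rows, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def refresh_table(previous_hash, fetch_rows):
    """Fetch rows and report whether they differ from the previous snapshot.

    `fetch_rows` is a hypothetical callable that returns the current rows.
    """
    rows = fetch_rows()
    new_hash = content_hash(rows)
    if new_hash == previous_hash:
        # Nothing changed: skip sending rows so the frontend can skip re-rendering.
        return {"changed": False, "hash": new_hash, "rows": None}
    return {"changed": True, "hash": new_hash, "rows": rows}
```

In this sketch the frontend timer would call the refresh on each interval and pass back the last hash it received, so unchanged data costs one comparison rather than a full table transfer.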
Data Formulator 0.5.1
What's Changed
- Update YouTube link in About component by @Chenglong-MS in #190
- Dev by @Chenglong-MS in #191
- Bump js-yaml from 4.1.0 to 4.1.1 by @dependabot[bot] in #195
- Dev by @Chenglong-MS in #197
- Bump vega from 5.33.0 to 6.2.0 by @dependabot[bot] in #199
- Chartifact popup by @danmarshall in #200
- Bump validator from 13.15.20 to 13.15.22 by @dependabot[bot] in #207
- Sanitize password from PostgreSQL error logs and add MySQL connection cleanup by @Copilot in #210
- Add `__del__` method to MySQLDataLoader for connection cleanup by @Copilot in #209
- Feat/expand chart support by @KaranPradhan266 in #211
- Add BigQuery DataLoader support by @hurairahmateen in #206
- Add MongoDB Support by @BAIGUANGMEI in #213
- Dev by @Chenglong-MS in #208
New Contributors
- @Copilot made their first contribution in #210
- @KaranPradhan266 made their first contribution in #211
- @hurairahmateen made their first contribution in #206
- @BAIGUANGMEI made their first contribution in #213
Full Changelog: 0.5...0.5.1
Data Formulator 0.5
Vibe with your data, in control
It has been a while since our last launch! This time we bring agent interaction to Data Formulator, to make it easier for you to play with data.
Play with it and let us know if it is fun! 👀 Check out our online demo here: https://data-formulator.ai/
Featuring
- 📊 Load (almost) any data: load structured data, extract data from screenshots, from messy text blocks, or connect to databases.
- 🤖 Explore data with AI agents:
- In agent mode, provide a high-level goal and ask agents to explore data for you.
- To stay in control, directly interact with agents: ask for recommendations or specify chart designs with UI + NL inputs, and AI agents will formulate data to realize your design.
- Use data threads to control branching exploration paths: backtrack, branch, or follow up.
- ✅ Verify AI generated results: interact with charts and inspect data, formulas, explanations, and code.
- 📝 Create reports to share insights: choose charts you want to share, and ask agents to create reports grounded in data formulated throughout exploration.
data-formulator-v0.5-4k-mini.mp4
(If you like to build, join us in building the future of data analysis!)
Data Formulator 0.2
Data Formulator 0.2 now supports working with large datasets, powered by the backend database!
Demonstration: Exploration of Metacritic's Best Games and Reviews - 2025
- This Kaggle dataset contains 13k+ games and 1.6M+ reviews of the best games based on Metacritic reviews.
- Data source: https://www.kaggle.com/datasets/davutb/metacritic-games
- Exploration:
- What's the relation between user scores and critic scores?
- Which games have really high user reviews but really low critic scores?
- How does the score distribution compare between critics and users?
df-demo-game-reviews.mp4
Release details: data visualization with large datasets.
Data Formulator integrates DuckDB as the backend local database to support data exploration with large datasets (millions of rows). It is also possible to connect external databases through DuckDB. Not all connections are supported at the moment, but that's the beginning!
- Upload large datasets to the local database, or connect to existing databases (MySQL or PostgreSQL) to work with large data.
- A subset of sample data is pulled to the frontend for exploration. You can roll the dice 🎲 or sort the data by different columns to view different samples.
- Manage local database with the Database manager.
- Interaction with Data Formulator as usual:
- Use drag and drop to specify a chart, and Data Formulator dynamically generates a SQL query to fetch the data that instantiates it. This process is quite fast!
- Specify new visualization fields / provide NL instructions as usual, and the newly introduced NL2SQL agents generate SQL queries based on your instructions to prepare the data and create visualizations.
- Anchor a dataset, follow up, join some tables, and you can dive deep into insights pretty fast!
- (Minor feature updates)
- Updated how derived concepts work in Data Formulator: data transformation is executed in the backend, and the updated data is appended to the new dataset. New concepts can be applied directly to the new dataset in one click.
- Improved system performance with configurable sandboxing options (main process versus subprocess) for LLM-generated code (~3s interaction time reduction).
- Configurable default visualization size in the main panel.
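The drag-and-drop flow in the list above (fields in, SQL query out) can be illustrated with a small sketch. Data Formulator uses DuckDB; the example below uses Python's built-in `sqlite3` as a stand-in so that it runs without extra dependencies. The helper names, the `games` table, and the sample rows are hypothetical.

```python
import sqlite3

ALLOWED_AGGREGATES = {"avg", "sum", "count", "min", "max"}


def quote_ident(name: str) -> str:
    """Quote an identifier so field names cannot inject SQL."""
    return '"' + name.replace('"', '""') + '"'


def chart_query(table: str, x: str, y: str, aggregate: str = "avg") -> str:
    """Build an aggregation query for a chart with x and y encodings."""
    if aggregate not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate!r}")
    return (
        f"SELECT {quote_ident(x)}, {aggregate.upper()}({quote_ident(y)}) AS {quote_ident(y)} "
        f"FROM {quote_ident(table)} GROUP BY {quote_ident(x)} ORDER BY {quote_ident(x)}"
    )


# Tiny in-memory table standing in for a large backend dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (platform TEXT, user_score REAL)")
conn.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [("PC", 8.0), ("PC", 6.0), ("Switch", 9.0)],
)
rows = conn.execute(chart_query("games", "platform", "user_score")).fetchall()
print(rows)  # [('PC', 7.0), ('Switch', 9.0)]
```

Because the aggregation runs in the database, only the small result set travels to the frontend, which is what keeps chart creation fast on large tables.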
More explorations on the demo dataset:
- What are your favorite games, and how do their reviews change over time?
- Which franchise has consistently improved its reviews?
- Which games have the most different reviews across platforms?
- Which games have many positive critic reviews but no users who bother to play them?
- What about review trends for No Man's Sky?
Well, it is time to upgrade Data Formulator and play with it! Let us know what you come up with :)
Data Formulator 0.1.7
With Data Anchor, we can anchor an intermediate dataset to isolate its derivation context from its predecessors. Tables created from the anchor take the anchored table as direct input (not the original data).
This can be helpful for cleaning the initial input data (so we always work with cleaned data afterwards), or when we want to focus our analysis on a subset of the dataset.
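One way to picture the anchoring behavior is as a lineage chain that stops at the anchored table. The sketch below is an illustrative model only: the `Table` class and `derivation_context` function are hypothetical and do not reflect Data Formulator's internal data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Table:
    name: str
    parent: Optional["Table"] = None
    anchored: bool = False


def derivation_context(table: Table) -> list[str]:
    """Names of the tables visible as context when deriving from `table`.

    The walk goes up the lineage and stops at the first anchored table,
    so nothing before the anchor is included.
    """
    context = [table.name]
    node = table
    while not node.anchored and node.parent is not None:
        node = node.parent
        context.append(node.name)
    return context


raw = Table("movies_raw")
cleaned = Table("movies_cleaned", parent=raw, anchored=True)
profit = Table("director_profit", parent=cleaned)
print(derivation_context(profit))  # ['director_profit', 'movies_cleaned']
```

Without the anchor on `movies_cleaned`, the same call would also include `movies_raw` in the context.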
Example 1: Clean table
Use an anchor to clean the table, so that all follow-up analyses are built on top of the clean data. The analysis of director profit is based on the filtered data.
anchor-clean-data.mp4
Example 2: Analyze a subset
Create a subset from the original table to focus the analysis. The AI agent is less likely to be confused, and analysis is faster. The anchored asian-energy dataset includes only countries from Asia.
anchor-subset-analysis.mp4
Illustration
The anchored thread has its own context, with no more access to the original data. You still have the option to add the original data back using the "multi-table" approach from the previous release. You can also go back to the original data to create another branch there.

Data Formulator 0.1.6
Highlight
It was supposed to only be some improvements and bug fixes over 0.1.5, but ended up getting much better --- Data Formulator now supports working with multiple tables! 🔥🔥🔥
When you add multiple tables to Data Formulator, you can select which base tables Data Formulator will use to derive the data (in the chart builder). This means Data Formulator can flexibly decide how to join or combine multiple tables together to create a visualization or answer your question.
In the demo below, we have a dataset of UK wheat production.
- To visualize wheat production by UK monarch, we can load a second table (here I ask GPT-4o to generate the table out of nowhere since it has knowledge about history :)).
- Then, we can drag a field from the second table to indicate that we want Data Formulator to leverage both tables to generate the chart, and it does.
- In the second demo, we can manually tell Data Formulator that it needs to consider both tables to answer "average wheat production per monarch", and it will also join the two tables to create the answer.
df-multi-table-demo.mp4
Besides this feature, we have improved and fixed various UI and model selection issues reported by the community. Thanks, everyone, for your suggestions! Let us know what you would like to see in Data Formulator next. :)
What's Changed
- Add and manipulate multiple table (dev) by @noless3011 in #89
- Dev: multi-table support & fixes by @Chenglong-MS in #99
- Bump vega from 5.23.0 to 5.26.0 by @dependabot in #95
New Contributors
- @noless3011 made their first contribution in #89
Full Changelog: 0.1.5.1...0.1.6
Data Formulator 0.1.5
What's New
Support more models!
- You can specify a model in the model_selection dialog. We have tested with OpenAI, Azure, Anthropic, and Ollama (codellama:7b, llama3.2).
- To preload credentials into Data Formulator, provide them via api-keys.env (see https://github.com/microsoft/data-formulator/blob/main/api-keys.env.template). Data Formulator will test and load them into the tool on start.
Still, check out this Data Formulator experience video:
data-formulator-ms-year-report-demo.mp4
0.1.5.1 -- fix the file upload bug
Data Formulator 0.1.4
This is an update to the previous version with better error message display to help users debug what's going on if Data Formulator fails to run. It also introduces direct conversation with a table, which could be useful for data cleaning.
- We also improved the data visualization challenges in Data Formulator. Can you complete them all?
- Comment in the issue when you do, or share your results/questions with others! [comment here]
Enjoy this version! If there is any feedback, let us know.
data-formulator-ms-year-report-demo.mp4
Data Formulator v0.1.3.3
This is an update to the previous version with better error message display to help users debug what's going on if Data Formulator fails to run.
Update in 0.1.3.2: also includes a port option to run Data Formulator on a different port if the default one is occupied.
Update in 0.1.3.3: also lets you provide cleaning instructions when uploading an image.
Enjoy this version! If there is any feedback, let us know.
Here is a demo of this new version!



