What is exploratory data analysis?
Exploratory data analysis (EDA) is the first step you take with any new dataset: understanding its shape, quality and character before committing to a model, a chart or a business decision. The term was popularised by the statistician John Tukey, whose book Exploratory Data Analysis was published in 1977, and EDA remains a core step in modern data science workflows.
Concretely, EDA answers questions like:
- How many rows and columns are there, and what types are they?
- How much data is missing, and where is it concentrated?
- Which values repeat? Are there obvious duplicates or near-duplicates?
- What do the distributions of each numeric column look like?
- Which columns are correlated, and which are independent?
- Are there outliers, impossible values, or suspicious clusters?
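The last two questions take the most judgement. As a minimal pandas sketch (the function names are hypothetical, and the 0.8 threshold and Tukey's 1.5 × IQR fences are common conventions, not necessarily what this tool uses), strongly correlated pairs and outliers can be flagged like this:

```python
import pandas as pd


def strongly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.8) -> list:
    """Return (col_a, col_b, r) for numeric pairs with |Pearson r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()  # Pearson by default
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # upper triangle: skip self-pairs and mirrored pairs
            r = corr.loc[a, b]
            if abs(r) >= threshold:  # NaN (e.g. a constant column) never passes
                pairs.append((a, b, round(float(r), 3)))
    return pairs


def tukey_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values of s outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]


df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50, 60, 70, 80, 90],
    "shoe_eu": [40, 38, 41, 39, 40],
})
print(strongly_correlated_pairs(df))  # → [('height_cm', 'weight_kg', 1.0)]
print(tukey_outliers(pd.Series([25, 32, 47, 32, 200])).tolist())  # → [200]
```

A flagged value is only a candidate: whether 200 is a typo or a real observation is a question for domain knowledge, not statistics.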
Good EDA takes minutes on a clean dataset and hours on a messy one. This tool automates the routine first pass of that work (counts, missing values, duplicates, summary statistics and basic charts) in a single upload, so you can decide in seconds whether a dataset is worth a deeper look.
What this tool is not
Being explicit about limits saves everyone time:
- It is not a data-cleaning tool. We report missing values and duplicates; we do not impute, dedupe or transform your data. Use pandas, OpenRefine or a spreadsheet for that.
- It is not a machine learning platform. No models are trained, no predictions are made. EDA is the step before modelling.
- It is not a BI dashboard. There are no saved reports, no scheduled refreshes, no shareable links. Each upload is an isolated session.
- It is not a data storage service. Your file is analysed in memory and released when you close the tab. Nothing is persisted.
- It is not a replacement for domain knowledge. Statistics tell you what; only you know why. Treat the output as a starting point, not a conclusion.
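For the cleaning step the first point leaves to you, a minimal pandas sketch follows. The function name is hypothetical, and median imputation plus an "unknown" placeholder is only one simple strategy; the right choice depends on why the values are missing.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame, placeholder: str = "unknown") -> pd.DataFrame:
    """Drop exact duplicate rows, then fill missing values column by column."""
    cleaned = df.drop_duplicates().copy()  # copy so the input is left untouched
    for col in cleaned.columns:
        if pd.api.types.is_numeric_dtype(cleaned[col]):
            # Numeric gaps: use the column median (less sensitive to outliers).
            cleaned[col] = cleaned[col].fillna(cleaned[col].median())
        else:
            # Non-numeric gaps: use an explicit placeholder value.
            cleaned[col] = cleaned[col].fillna(placeholder)
    return cleaned


raw = pd.DataFrame({"x": [1.0, None, 3.0, 3.0], "y": ["a", None, "b", "b"]})
print(basic_clean(raw))
```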
Who is this for?
- Data scientists who just received a new dataset and want a 60-second overview before opening a notebook.
- Analysts who need to sanity-check a partner’s CSV before building a report.
- Product managers and founders who want to understand a data export without setting up a Python environment.
- Students and teachers who need a quick demonstration of what EDA looks like on real data.
- Journalists and researchers evaluating an unfamiliar open dataset.
If you already have a notebook open and pandas imported, you don’t need this. If you have a fresh file and want to know whether it deserves a deeper look, you do.
What you can do with it
- Upload a CSV (.csv) or Excel (.xlsx) file, up to 50 MB.
- Get an instant health report: rows, columns, dtypes, missing cells, duplicated values per column, duplicate rows.
- See numeric column statistics: minimum, maximum, mean, median, standard deviation.
- Inspect categorical column summaries: mode, unique-value counts, top values.
- Review a preview of the first rows of your data.
- For multi-sheet Excel files, pick which sheet to analyse.
- Explore interactive distributions and correlations in the Visualisations tab.
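To make the report fields concrete, here is a minimal pandas sketch of how they can be computed. It is an illustration, not this tool's actual code: the function names are hypothetical, and "duplicated values per column" is interpreted here as occurrences beyond the first of each value within that column.

```python
import pandas as pd


def health_report(df: pd.DataFrame) -> dict:
    """Dataset-level counts: shape, types, missing cells and duplicates."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing_cells": int(df.isna().sum().sum()),
        # Repeats beyond the first occurrence of each value (NaN counts as a value).
        "duplicated_values_per_column": {
            col: int(df[col].duplicated().sum()) for col in df.columns
        },
        "duplicate_rows": int(df.duplicated().sum()),
    }


def column_summaries(df: pd.DataFrame, top_n: int = 3):
    """Per-column statistics for numeric and non-numeric columns."""
    numeric_stats = (
        df.select_dtypes(include="number")
        .agg(["min", "max", "mean", "median", "std"])
        .T  # one row per numeric column
    )
    categorical = {}
    for col in df.select_dtypes(exclude="number").columns:
        s = df[col]
        modes = s.mode()  # may hold several values on ties; keep the first
        categorical[col] = {
            "mode": modes.iloc[0] if not modes.empty else None,
            "unique": int(s.nunique()),  # NaN is not counted
            "top_values": s.value_counts().head(top_n).to_dict(),
        }
    return numeric_stats, categorical


df = pd.DataFrame({
    "price": [10.0, 20.0, 20.0, None],
    "colour": ["red", "blue", "red", "red"],
})
print(health_report(df))
stats, cats = column_summaries(df)
print(stats)
print(cats)
```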
How it works
- You upload a file. Drag a CSV or XLSX onto the page, or click to browse.
- The server analyses it in memory. Your file is parsed into a pandas DataFrame and summarised on the spot.
- The browser renders the report. Tables and charts appear in seconds. The file is released from memory when your session ends.
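The parse step can be sketched with pandas reading from an in-memory buffer, so the upload never touches disk. This is a minimal illustration under assumptions, not the service's actual code: `parse_upload` is a hypothetical name, the upload is assumed to arrive as raw bytes plus a filename, and reading .xlsx requires the openpyxl package.

```python
import io

import pandas as pd


def parse_upload(filename: str, payload: bytes, sheet_name=0) -> pd.DataFrame:
    """Parse uploaded bytes into a DataFrame using only memory, never disk."""
    buffer = io.BytesIO(payload)  # file-like object backed by RAM
    name = filename.lower()
    if name.endswith(".csv"):
        return pd.read_csv(buffer)
    if name.endswith(".xlsx"):
        # sheet_name=0 is the first sheet; pass a sheet name to switch.
        return pd.read_excel(buffer, sheet_name=sheet_name, engine="openpyxl")
    raise ValueError(f"Unsupported file type: {filename}")


df = parse_upload("orders.csv", b"id,amount\n1,9.99\n2,15.00\n")
print(df.shape)  # → (2, 2)
```

Once the function returns and the caller drops its references, the buffer and DataFrame are ordinary Python objects that the garbage collector reclaims.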
There is no signup, no sign-in, and no data is written to disk on our side.
Privacy and data handling
- Your file is processed in memory only. It is never written to disk, never logged, and never stored.
- Uploads travel over HTTPS.
- We do not use your data to train any model.
- We do not share your data with third parties.
Current limits
- Files up to 50 MB (roughly 500,000 rows of typical tabular data).
- CSV and XLSX only. JSON, Parquet and SQL dumps are on the roadmap.
- No persistent storage. If you close the tab, the analysis is gone; nothing is saved server-side.
Frequently asked questions
Is it free?
Yes. Upload, analyse, leave. No signup required.
Do I need to install anything?
No. The analysis runs on our servers and the report is displayed in your browser, so a modern browser and an internet connection are the only requirements.
What file formats are supported?
CSV (comma-separated values) and Excel (.xlsx). If your file uses a different delimiter or text encoding, open it in Excel or a text editor and re-export as UTF-8 CSV.
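If you have Python available, pandas can do the same conversion. A minimal sketch, assuming you already know the source delimiter and encoding; the semicolon and Latin-1 defaults below are examples only, and `to_utf8_csv` is a hypothetical helper name.

```python
import pandas as pd


def to_utf8_csv(
    src: str, dst: str, sep: str = ";", encoding: str = "latin-1"
) -> pd.DataFrame:
    """Read a delimited text file in its original encoding; write comma-separated UTF-8."""
    df = pd.read_csv(src, sep=sep, encoding=encoding)
    df.to_csv(dst, index=False, encoding="utf-8")  # comma is to_csv's default separator
    return df


# Usage: to_utf8_csv("export_semicolon.csv", "export_utf8.csv")
```

If the wrong encoding is guessed, accented characters come out garbled rather than raising an error, so check a few rows of the output.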
Is my data safe?
Your file is analysed in memory and never persisted. See the Privacy and data handling section above for the full policy.
Can I analyse files larger than 50 MB? (Why do I get "413 Payload Too Large"?)
No. 50 MB is a hard cap, and uploading a larger file returns a 413 Payload Too Large error. Take a random sample of your data and upload the smaller file instead. For most EDA questions (column types, missing-value rates, distributions, correlations), a representative sample gives close approximations of the full-dataset answers; exact counts such as duplicate rows, extreme minimum and maximum values, and rare categories can differ.
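One way to take that sample is with pandas. A minimal sketch: `sample_csv` is a hypothetical helper, and it reads the whole file into memory first, which is usually fine for files somewhat over the cap. Choose `frac` roughly as 50 MB divided by your file's size, leaving some headroom.

```python
import pandas as pd


def sample_csv(src: str, dst: str, frac: float = 0.1, seed: int = 42) -> int:
    """Write a uniform random sample of src's rows to dst and return the row count."""
    df = pd.read_csv(src)
    # Rows are drawn without replacement; a fixed seed makes the sample reproducible.
    sample = df.sample(frac=frac, random_state=seed)
    sample.to_csv(dst, index=False)
    return len(sample)


# Usage: for a 120 MB file, 50 / 120 is about 0.42, so frac=0.35 leaves headroom.
# sample_csv("big.csv", "big_sample.csv", frac=0.35)
```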
My Excel file has multiple sheets — which one gets analysed?
The first sheet by default. A sheet selector appears after upload so you can switch.
How is this different from ydata-profiling (formerly pandas-profiling) or Sweetviz?
Those are excellent Python libraries you import into a notebook. This is a hosted tool that runs without installation, aimed at quick one-off analyses. If you’re already in a Jupyter environment, use ydata-profiling. If you’re not, use this.
Can I export the report?
Not yet. The current report is interactive in the browser only. PDF and Markdown export are on the roadmap.
How can I contact you?
We welcome questions and feedback. Please contact us.