## What is exploratory data analysis?

**Exploratory data analysis (EDA)** is the first step you take with any new
dataset: understanding its shape, quality and character before committing to a
model, a chart or a business decision. The approach was championed by the
statistician **John Tukey**, whose book *Exploratory Data Analysis* was published
in 1977, and it remains a core part of modern data science workflows.

Concretely, EDA answers questions like:

- How many rows and columns are there, and what types are they?
- How much data is missing, and where is it concentrated?
- Which values repeat? Are there obvious duplicates or near-duplicates?
- What do the distributions of each numeric column look like?
- Which columns are correlated, and which are independent?
- Are there outliers, impossible values, or suspicious clusters?
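As an illustration, here is how those questions map onto pandas calls. This is a minimal sketch that assumes the data is already loaded into a DataFrame. `quick_eda` is a hypothetical name, not this tool's code. The outlier check uses Tukey's 1.5 × IQR fences, which is one common convention and not necessarily the rule any particular tool applies.

```python
import pandas as pd


def quick_eda(df: pd.DataFrame) -> dict:
    """Answer the first-pass EDA questions for a DataFrame (illustrative sketch)."""
    numeric = df.select_dtypes(include="number")

    # Outliers via Tukey's fences: values beyond 1.5 * IQR from the quartiles.
    # Quantiles skip NaN; comparisons against NaN are False, so gaps are not counted.
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

    return {
        "shape": df.shape,                                   # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),           # type of each column
        "missing_per_column": df.isna().sum().to_dict(),     # where gaps concentrate
        "duplicate_rows": int(df.duplicated().sum()),        # exact repeats of earlier rows
        "distributions": numeric.describe().to_dict(),       # count, mean, std, quartiles
        "correlations": numeric.corr().round(2).to_dict(),   # Pearson by default
        "outliers_per_column": outside.sum().to_dict(),
    }
```

Non-numeric columns are excluded from the distribution, correlation and outlier checks; categorical columns need their own summaries (mode, unique counts, top values).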

Good EDA takes minutes on a clean dataset and hours on a messy one. This tool
automates the routine first pass (shape, types, missing values, duplicates and
summary statistics) in a single upload, so you can decide in seconds whether a
dataset is worth a deeper look.

## What this tool is *not*

Being explicit about limits saves everyone time:

- **It is not a data-cleaning tool.** We *report* missing values and
  duplicates; we do not impute, dedupe or transform your data. Use pandas,
  OpenRefine or a spreadsheet for that.
- **It is not a machine learning platform.** No models are trained, no
  predictions are made. EDA is the step *before* modelling.
- **It is not a BI dashboard.** There are no saved reports, no scheduled
  refreshes, no shareable links. Each upload is an isolated session.
- **It is not a data storage service.** Your file is analysed in memory and
  released when you close the tab. Nothing is persisted.
- **It is not a replacement for domain knowledge.** Statistics tell you
  *what*; only you know *why*. Treat the output as a starting point, not a
  conclusion.
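If you do clean the data yourself in pandas, the two operations this tool reports on but never performs (deduplication and imputation) take only a few lines. This is a minimal sketch. `basic_clean` is a hypothetical helper, and median imputation is just one option; whether it is appropriate depends on your data and on why the values are missing.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and fill numeric gaps with each column's median."""
    # Keep the first occurrence of each fully identical row.
    cleaned = df.drop_duplicates().copy()
    numeric_cols = cleaned.select_dtypes(include="number").columns
    # Median of each numeric column, computed after deduplication; NaN is skipped.
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())
    return cleaned.reset_index(drop=True)
```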

## Who is this for?

- **Data scientists** who just received a new dataset and want a 60-second
  overview before opening a notebook.
- **Analysts** who need to sanity-check a partner's CSV before building a
  report.
- **Product managers and founders** who want to understand a data export
  without setting up a Python environment.
- **Students and teachers** who need a quick demonstration of what EDA looks
  like on real data.
- **Journalists and researchers** evaluating an unfamiliar open dataset.

If you already have a notebook open and pandas imported, you don't need this.
If you have a fresh file and want to know whether it deserves a deeper look,
you do.

## What you can do with it

- Upload a **CSV** (`.csv`) or **Excel** (`.xlsx`) file, up to 50&nbsp;MB.
- Get an instant **health report**: rows, columns, dtypes, missing cells,
  duplicated values per column, duplicate rows.
- See **numeric column statistics**: minimum, maximum, mean, median, standard
  deviation.
- Inspect **categorical column summaries**: mode, unique-value counts, top
  values.
- Review a **preview** of the first rows of your data.
- For multi-sheet Excel files, pick which sheet to analyse.
- Explore **interactive distributions and correlations** in the
  Visualisations tab.
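For readers who want to reproduce a per-column health report in a notebook, the sketch below computes the statistics listed above with pandas. It is an approximation, not the tool's implementation. `column_summary` is a hypothetical name. "Duplicated values" is interpreted here as non-null values that repeat an earlier value in the same column, which is an assumption about how that metric is defined.

```python
import pandas as pd


def column_summary(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing cells, duplicated values, type-specific stats."""
    rows = []
    for name in df.columns:
        col = df[name]
        row = {
            "column": name,
            "dtype": str(col.dtype),
            "missing": int(col.isna().sum()),
            # Assumed definition: non-null values repeating an earlier value.
            "duplicated_values": int(col.dropna().duplicated().sum()),
            "unique": int(col.nunique()),
        }
        if pd.api.types.is_numeric_dtype(col):
            # Numeric columns: min, max, mean, median, standard deviation.
            row.update(min=col.min(), max=col.max(), mean=col.mean(),
                       median=col.median(), std=col.std())
        else:
            # Categorical columns: mode and the three most frequent values.
            counts = col.value_counts()
            row.update(mode=counts.index[0] if not counts.empty else None,
                       top_values=counts.head(3).to_dict())
        rows.append(row)
    # Statistics that do not apply to a column's type show up as NaN.
    return pd.DataFrame(rows).set_index("column")
```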

## How it works

1. **You upload a file.** Drag a CSV or XLSX onto the page, or click to
   browse.
2. **The server analyses it in memory.** Your file is parsed into a pandas
   DataFrame and summarised on the spot.
3. **The browser renders the report.** Tables and charts appear in seconds.
   The file is released from memory when your session ends.
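Step 2 relies on the fact that pandas can parse a file-like object held in memory, so the upload never has to touch the filesystem. The sketch below assumes the server receives the upload as raw bytes plus a filename. `load_upload` is a hypothetical function for illustration, not this tool's actual code. Reading `.xlsx` with pandas requires the `openpyxl` package.

```python
import io

import pandas as pd


def load_upload(filename: str, payload: bytes, sheet_name=0) -> pd.DataFrame:
    """Parse uploaded bytes into a DataFrame without writing to disk (sketch)."""
    buffer = io.BytesIO(payload)  # in-memory, file-like view of the upload
    lower = filename.lower()
    if lower.endswith(".csv"):
        return pd.read_csv(buffer)
    if lower.endswith(".xlsx"):
        # sheet_name=0 selects the first sheet; pass a sheet name or index to switch.
        return pd.read_excel(buffer, sheet_name=sheet_name)
    raise ValueError(f"Unsupported file type: {filename}")
```

When the request handler returns and no references to the buffer or DataFrame remain, Python can reclaim that memory, which matches the "nothing is persisted" model described above.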

There is no signup, no sign-in, and no data is written to disk on our side.

## Privacy and data handling

- Your file is processed **in memory only**. It is never written to disk,
  never logged, and never stored.
- Uploads travel over **HTTPS**.
- We do not use your data to train any model.
- We do not share your data with third parties.

## Current limits

- **Files up to 50&nbsp;MB** (roughly 500,000 rows at about 100 bytes per row;
  wider tables fit fewer rows).
- **CSV and XLSX only.** JSON, Parquet and SQL dumps are on the roadmap.
- **No persistent storage.** If you close the tab, the analysis is gone;
  nothing is saved server-side.
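The 500,000-row figure is simple arithmetic: 50&nbsp;MB divided by roughly 100 bytes per row. To check a CSV locally before uploading, you can estimate its row count the same way, from its total size and the average length of its first lines. This is a rough sketch. `precheck_csv` is hypothetical, and treating the cap as 50 × 1024 × 1024 bytes is an assumption; the exact byte limit is not specified here.

```python
import os
from itertools import islice

# Assumption: the 50 MB cap is 50 MiB. The exact byte limit is not documented here.
MAX_BYTES = 50 * 1024 * 1024


def precheck_csv(path: str, sample_lines: int = 1000) -> dict:
    """Estimate whether a CSV fits under the upload cap, and roughly how many rows it has."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        lines = list(islice(f, sample_lines))  # first lines only; cheap on huge files
    avg_line = sum(len(line) for line in lines) / max(len(lines), 1)
    return {
        "bytes": size,
        "fits": size <= MAX_BYTES,
        # Rough estimate (includes the header line): total size / average line size.
        "estimated_rows": int(size / avg_line) if avg_line else 0,
    }
```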

## Frequently asked questions

### Is it free?

Yes. Upload, analyse, leave. No signup required.

### Do I need to install anything?

No. You only need a modern browser and an internet connection. Parsing and
analysis happen in memory on our server and the report is rendered in your
browser, so there is nothing to set up locally.

### What file formats are supported?

CSV (comma-separated values) and Excel (`.xlsx`). If your file uses a
different delimiter or text encoding, open it in Excel or a text editor and
re-export as UTF-8 CSV.
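If you prefer to convert in Python instead of Excel, pandas can read a file with a non-comma delimiter and a legacy encoding, then write it back as comma-separated UTF-8. This sketch assumes a semicolon-separated, Windows-1252 encoded input, a common combination in European Excel exports; adjust `sep` and `encoding` to match your file. `to_utf8_csv` is a hypothetical helper name.

```python
import pandas as pd


def to_utf8_csv(src: str, dst: str, sep: str = ";", encoding: str = "cp1252") -> None:
    """Re-encode a delimited text file as comma-separated UTF-8."""
    # Assumed input format: semicolon delimiter, Windows-1252 text encoding.
    df = pd.read_csv(src, sep=sep, encoding=encoding)
    # Default separator for to_csv is a comma; index=False avoids an extra column.
    df.to_csv(dst, index=False, encoding="utf-8")
```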

### Is my data safe?

Your file is analysed in memory and never persisted. See the
**Privacy and data handling** section above for the full policy.

### Can I analyse files larger than 50&nbsp;MB? (also: "413 Payload too large")

No. 50&nbsp;MB is a hard cap, and uploading a larger file surfaces as a
**413 Payload too large** error. Take a random sample of your data and
upload the smaller file instead — for most EDA questions (shape, missing
values, distributions, correlations) a representative sample gives you
the same answers as the full dataset.
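One way to draw that sample from a CSV too large to open comfortably is to stream it in chunks and keep the same fraction of rows from each chunk. This is a minimal sketch; `sample_large_csv` is a hypothetical helper, and the 10% fraction and seed are arbitrary defaults. Sampling a fixed fraction per chunk approximates a uniform random sample of the whole file, and the seed makes the result reproducible.

```python
import pandas as pd


def sample_large_csv(src: str, dst: str, frac: float = 0.1, seed: int = 42,
                     chunksize: int = 100_000) -> int:
    """Write a random sample of rows from a large CSV; return the sample size."""
    parts = []
    # Stream the file so it never has to fit in memory all at once.
    for i, chunk in enumerate(pd.read_csv(src, chunksize=chunksize)):
        # Same fraction from every chunk; offset the seed so chunks differ.
        parts.append(chunk.sample(frac=frac, random_state=seed + i))
    sample = pd.concat(parts, ignore_index=True)
    sample.to_csv(dst, index=False)
    return len(sample)
```

For example, `sample_large_csv("big.csv", "sample.csv", frac=0.05)` would keep about 5% of the rows. Pick a fraction that brings the output comfortably under 50&nbsp;MB.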

### My Excel file has multiple sheets — which one gets analysed?

The first sheet by default. A sheet selector appears after upload so you
can switch.

### How is this different from ydata-profiling (formerly pandas-profiling) or Sweetviz?

Those are excellent Python libraries you import into a notebook. This is a
hosted tool that runs without installation, aimed at quick one-off
analyses. If you're already in a Jupyter environment, use ydata-profiling.
If you're not, use this.

### Can I export the report?

Not yet. The current report is interactive in the browser only. PDF and
Markdown export are on the roadmap.