## What is exploratory data analysis?

**Exploratory data analysis (EDA)** is the first step you take with any new dataset: understanding its shape, quality and character before committing to a model, a chart or a business decision. The term was popularised by the statistician **John Tukey** in his **1977** book *Exploratory Data Analysis*, and it remains a core part of modern data science workflows.

Concretely, EDA answers questions like:

- How many rows and columns are there, and what types are they?
- How much data is missing, and where is it concentrated?
- Which values repeat? Are there obvious duplicates or near-duplicates?
- What do the distributions of each numeric column look like?
- Which columns are correlated, and which are independent?
- Are there outliers, impossible values, or suspicious clusters?

Good EDA takes minutes on a clean dataset and hours on a messy one. This tool compresses the first 80% of that work into a single upload, so you can decide in seconds whether a dataset is worth a deeper look.

## What this tool is *not*

Being explicit about limits saves everyone time:

- **It is not a data-cleaning tool.** We *report* missing values and duplicates; we do not impute, dedupe or transform your data. Use pandas, OpenRefine or a spreadsheet for that.
- **It is not a machine learning platform.** No models are trained, no predictions are made. EDA is the step *before* modelling.
- **It is not a BI dashboard.** There are no saved reports, no scheduled refreshes, no shareable links. Each upload is an isolated session.
- **It is not a data storage service.** Your file is analysed in memory and released when you close the tab. Nothing is persisted.
- **It is not a replacement for domain knowledge.** Statistics tell you *what*; only you know *why*. Treat the output as a starting point, not a conclusion.

## Who is this for?

- **Data scientists** who just received a new dataset and want a 60-second overview before opening a notebook.
- **Analysts** who need to sanity-check a partner's CSV before building a report.
- **Product managers and founders** who want to understand a data export without setting up a Python environment.
- **Students and teachers** who need a quick demonstration of what EDA looks like on real data.
- **Journalists and researchers** evaluating an unfamiliar open dataset.

If you already have a notebook open and pandas imported, you don't need this. If you have a fresh file and want to know whether it deserves a deeper look, you do.

## What you can do with it

- Upload a **CSV** (`.csv`) or **Excel** (`.xlsx`) file, up to 50 MB.
- Get an instant **health report**: rows, columns, dtypes, missing cells, duplicated values per column, duplicate rows.
- See **numeric column statistics**: minimum, maximum, mean, median, standard deviation.
- Inspect **categorical column summaries**: mode, unique-value counts, top values.
- Review a **preview** of the first rows of your data.
- For multi-sheet Excel files, pick which sheet to analyse.
- Explore **interactive distributions and correlations** in the Visualisations tab.

## How it works

1. **You upload a file.** Drag a CSV or XLSX onto the page, or click to browse.
2. **The server analyses it in memory.** Your file is parsed into a pandas DataFrame and summarised on the spot.
3. **The browser renders the report.** Tables and charts appear in seconds. The file is released from memory when your session ends.

There is no signup, no sign-in, and no data is written to disk on our side.

## Privacy and data handling

- Your file is processed **in memory only**. It is never written to disk, never logged, and never stored.
- Uploads travel over **HTTPS**.
- We do not use your data to train any model.
- We do not share your data with third parties.

## Current limits

- **Files up to 50 MB** (roughly 500,000 rows of typical tabular data).
- **CSV and XLSX only.** JSON, Parquet and SQL dumps are on the roadmap.
- **No persistent storage.** If you close the tab, the analysis is gone; nothing is saved server-side.

## Frequently asked questions

### Is it free?

Yes. Upload, analyse, leave. No signup required.

### Do I need to install anything?

No. You use the tool from your browser and the analysis runs on our server, so a modern browser and an internet connection are the only requirements.

### What file formats are supported?

CSV (comma-separated values) and Excel (`.xlsx`). If your file uses a different delimiter or text encoding, open it in Excel or a text editor and re-export as UTF-8 CSV.

### Is my data safe?

Your file is analysed in memory and never persisted. See the **Privacy and data handling** section above for the full policy.

### Can I analyse files larger than 50 MB? (also: "413 Payload too large")

No. 50 MB is a hard cap, and uploading a larger file surfaces as a **413 Payload too large** error. Take a random sample of your data and upload the smaller file instead. For most EDA questions (shape, missing values, distributions, correlations) a representative sample gives you close to the same answers as the full dataset, although rare values and extreme outliers may not appear in it.

### My Excel file has multiple sheets: which one gets analysed?

The first sheet by default. A sheet selector appears after upload so you can switch.

### How is this different from pandas-profiling, ydata-profiling or Sweetviz?

Those are excellent Python libraries you import into a notebook. This is a hosted tool that runs without installation, aimed at quick one-off analyses. If you're already in a Jupyter environment, use ydata-profiling. If you're not, use this.

### Can I export the report?

Not yet. The current report is interactive in the browser only. PDF and Markdown export are on the roadmap.
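
### How can I take a random sample of a large CSV?

The 50 MB answer above suggests uploading a random sample instead of the full file. One possible way to produce that sample is sketched below, using only Python's standard library. It is an illustration, not part of this tool: the function name `sample_csv`, its parameters and the file names are hypothetical, and it assumes a UTF-8 CSV whose first row is the header. It uses reservoir sampling, so the source file is read once and never loaded into memory in full.

```python
import csv
import random


def sample_csv(src_path, dst_path, n_rows, seed=0):
    """Write a uniform random sample of up to n_rows data rows to dst_path.

    Hypothetical helper: uses reservoir sampling (Algorithm R), reading the
    source file in a single pass. The header row is always kept.
    Returns the number of data rows written.
    """
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    reservoir = []

    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            raise ValueError("source file is empty")

        for i, row in enumerate(reader):
            if i < n_rows:
                # Fill the reservoir with the first n_rows rows.
                reservoir.append(row)
            else:
                # Replace a random slot with probability n_rows / (i + 1).
                j = rng.randint(0, i)  # inclusive on both ends
                if j < n_rows:
                    reservoir[j] = row

    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(reservoir)

    return len(reservoir)
```

For example, `sample_csv("big.csv", "sample.csv", 100_000)` would write a header plus at most 100,000 randomly chosen rows. The sampled rows are not guaranteed to keep their original order, which does not matter for the summary statistics in the report but does matter if row order carries meaning (for instance, a time series).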