EDA

What
When
Where
Who
Why
How
How many

What is EDA (Exploratory Data Analysis)?

EDA is the process of summarizing and visualizing data to understand its underlying patterns, relationships, and trends. It is usually the first step in a data analysis and helps surface both insights and potential problems in the data.

When is EDA used?

EDA is used at the beginning of a data analysis project to gain an initial understanding of the data and to identify areas of interest or concern. It can also be revisited throughout the analysis to explore the data further as new questions arise.

Where is EDA used?

EDA is used in a wide range of fields, including the natural sciences, engineering, business, and the social sciences. It applies wherever data must be analyzed and insights extracted.

Who uses EDA?

EDA is used by data analysts, data scientists, researchers, and anyone who needs to analyze and understand data.

Why is EDA important?

EDA is important because it gives a deeper understanding of the data and reveals issues or patterns that are not immediately apparent. It also helps to formulate hypotheses and to guide further analysis.

How does EDA work?

EDA typically involves visualizing the data through graphs and charts, calculating summary statistics, identifying outliers and missing data, and exploring relationships between variables. It is an iterative process, where new insights may lead to further analysis and refinement of the data.
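As a minimal sketch of such a first pass, the snippet below computes summary statistics and counts missing values for each column of a small table. The records, the column names (age, income), and the helper summarize are made up for illustration, and only Python's standard library is used; in practice a data-frame library would typically do this work.

```python
import statistics

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000},
    {"age": 29, "income": None},
    {"age": 41, "income": 61000},
    {"age": None, "income": 48000},
    {"age": 38, "income": 75000},
]

def summarize(rows, column):
    """Return basic summary statistics for one column, ignoring missing values."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(values),                  # non-missing observations
        "missing": len(rows) - len(values),    # how many values are absent
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),     # sample standard deviation
    }

summary = {col: summarize(rows, col) for col in ("age", "income")}
for col, stats in summary.items():
    print(col, stats)
```

Because the process is iterative, a result here (for example, a column with many missing values) would usually prompt a closer look at that column before moving on.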

How many steps are there in EDA?

There is no fixed number of steps in EDA, because it is a flexible and iterative process. The following six techniques are commonly used, in whatever order and combination the data calls for:

  1. Visualization: creating plots of the data, such as scatter plots, histograms, and box plots, to identify patterns and relationships.

  2. Summary statistics: calculating measures such as the mean, median, and standard deviation to gain a basic understanding of each variable.

  3. Outlier detection: identifying data points that differ markedly from the rest of the data and may need further investigation.

  4. Missing data handling: locating missing values and deciding how to handle them, for example by imputing them or by removing records with too many missing values.

  5. Correlation analysis: calculating correlations between variables to identify potential relationships and dependencies in the data.

  6. Data transformations: transforming the data, for example by normalizing or scaling it, to make it more amenable to analysis.
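Three of the techniques above (outlier detection, correlation analysis, and data transformation) can be sketched in a few lines. This is one illustrative choice among many, not a prescribed method: it assumes Tukey's 1.5 x IQR rule for outliers, the Pearson coefficient for correlation, and z-score standardization for scaling. The data and the function names iqr_outliers, pearson, and standardize are hypothetical.

```python
import math
import statistics

# Hypothetical toy measurements; 95 is planted as an obvious outlier in x.
x = [10, 12, 11, 13, 12, 95, 11, 14]
y = [20, 25, 22, 27, 24, 26, 21, 29]

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def standardize(values):
    """Rescale values to zero mean and unit sample standard deviation."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]

outliers = iqr_outliers(x)

# Compare the correlation with and without the flagged points.
clean = [(p, q) for p, q in zip(x, y) if p not in outliers]
r_all = pearson(x, y)
r_clean = pearson([p for p, _ in clean], [q for _, q in clean])

# Standardize only the cleaned x values.
x_scaled = standardize([p for p, _ in clean])

print("outliers:", outliers)
print("r with outliers:", round(r_all, 3))
print("r without outliers:", round(r_clean, 3))
```

In this toy data the single planted outlier noticeably weakens the measured correlation, which is one reason outliers are usually examined before correlations are interpreted. Whether to drop, cap, or keep such points depends on why they occur.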