16  Exploratory Data Analysis (EDA)

SETTLING IN

After today, we’ll be focusing on the course project. You’ll work in groups of 3-4, and each group will pick and analyze its own dataset. The people you’re sitting with today are NOT necessarily your project group! BUT let’s practice some brainstorming and get to know what other people are thinking about. Specifically, share the following with each other. And don’t think too hard! Just share what’s top of mind today.

  • What is your major / minor / concentration, declared or intended?
  • What are some personal hobbies, passions, things you’ve been thinking about, or things you’d like to learn more about?



Learning goals
  • Understand the first steps to take when you encounter a new dataset
  • Develop comfort in knowing how to explore data to understand it
  • Develop comfort in formulating research questions





Additional resources

Read:





WHERE ARE WE?!? Starting a data project

This final, short unit will help prepare us as we launch into course projects. In order to even start these projects, we need some sense of the following:

  1. data import: how to find data, store data, load data into RStudio, and do some preliminary data checks & cleaning

  2. exploratory data analysis (EDA)
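
To make the "preliminary data checks" in step 1 more concrete, here is a minimal sketch in base R. It assumes R's built-in `airquality` data as a stand-in for a freshly imported dataset; that choice, the object names, and the file path mentioned in the comment are illustrative assumptions, not part of the course materials.

```r
# Assumption: the built-in `airquality` data stands in for a dataset we just imported.
# With a real file, an import step such as read.csv("path/to/file.csv")
# (hypothetical path) would come first.
air <- airquality

# How big is the dataset? (rows = cases, columns = variables)
dim(air)

# What are the variables, and how is each one stored?
str(air)

# What do the first few rows look like?
head(air)

# How many values are missing in each variable?
missing_counts <- colSums(is.na(air))
missing_counts

# Quick numerical summary of every variable
summary(air)
```

Checks like these tend to surface problems early (unexpected variable types, missing values, implausible ranges) before any deeper analysis begins.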





16.1 Warm-up

What is EDA?!

EDA is a preliminary, exploratory, and iterative analysis of our data relative to our general research questions of interest.





How is this different from what we’ve been doing?

We’ve been focusing on the individual tools needed for different steps within an EDA. Now we’ll bring them all together in a more cohesive process.





EXAMPLE

Peng example





EDA essentials

  • Start small.
    We often start with lots of data – some of it useful, some of it not. To start:

    • Focus on just a small set of variables of interest.
    • Break down your research question into smaller pieces.
    • Obtain the simplest numerical & visual summaries that are relevant to your research questions.
  • Ask questions.
    We typically start a data analysis with at least some general research questions in mind. In obtaining numerical and graphical summaries that provide insight into these questions, we must ask:

    • what questions do these summaries answer?
    • what questions don’t these summaries answer?
    • what’s surprising or interesting here?
    • what follow-up questions do these summaries provoke?
  • Play! Be creative. Don’t lock yourself into a rigid idea of what should happen.

  • Repeat.
    Repeat this iterative questioning and analysis process as necessary, letting our reflections on the previous questions inspire our next steps.
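
One pass through the essentials above might look roughly like the sketch below. It uses only base R and the built-in `airquality` data; the dataset, the research question ("How does ozone relate to temperature?"), and the object names are all assumptions made for illustration.

```r
# Assumption: built-in `airquality` data; hypothetical question:
# "How does ozone relate to temperature?"
air <- airquality

# Start small: keep only the few variables relevant to the question
small <- air[, c("Ozone", "Temp", "Month")]

# Simplest numerical summary of the outcome of interest
summary(small$Ozone)

# Simplest visual summaries: one variable, then the relationship
hist(small$Ozone, main = "Ozone", xlab = "Ozone (ppb)")
plot(Ozone ~ Temp, data = small)

# Ask questions, then repeat: a follow-up the plots might provoke is
# "Does ozone differ by month?" (rows with missing Ozone are dropped here)
by_month <- aggregate(Ozone ~ Month, data = small, FUN = mean)
by_month
boxplot(Ozone ~ Month, data = small)
```

Each summary answers one small question and suggests the next one, which is the iterative loop the list describes.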





16.2 Exercises

Do the Homework 7 qmd exercises.