Reading 2: Summary Statistics

Reading 2: Summary Statistics#

For the class on Wednesday, January 17th

Reading assignments#

  1. Read the following sections of [Mur22] (sections based on the 2023-06-21 draft PDF file)

    • Sec. 2.2.5 “Moments of a distribution”, and all of its subsections (2.2.5.1–2.2.5.4)

    • Sec. 2.2.6 “Limitations of summary statistics”

    • Sec. 3.1 “Joint distributions for multiple random variables”, and all of its subsections (3.1.1–3.1.5)

Questions#

Hint

Submit your answer on Canvas. Due at noon, Wednesday, Jan 17th.

  1. List any concepts that confuse you from your reading. Explain why they confuse you. If nothing confuses you, briefly summarize what you have learned from this reading assignment.

  2. Sec. 3.1.4 of [Mur22] discusses a few examples of spurious correlation due to hidden common causes. Try explaining this phenomenon using the concept of conditional independence (see Sec. 2.1.3.6 of [Mur22]).

Discussion Preview#

Note

We will discuss the following questions in class. They are included here so that you have a chance to think about them before class. You need not submit your answers as part of this assignment.

  1. Sec. 2.2.5 of [Mur22] discusses the mean and variance of a distribution, and Sec. 2.2.6 mentions mean and variance of data points. These two concepts: summary statistics of a distribution (random variable) vs. summary statistics of data points (samples) are closely related but distinct in nature. Discuss how they differ.

  2. Give some examples in physics and astronomy (or your field of study) of events (or random variables) that are uncorrelated but dependent (cf. Sec. 3.1.3 of [Mur22]).

  3. You are trying to measure the resistance of a resistor. You put the resistor in a simple DC circuit, and then measure both the voltage across the resistor and the current through the resistor. You then use the Ohm’s law \(V=IR\) to calculate the resistance. If your voltage and current measurements both have an uncertainty of 1%, how would you estimate the uncertainty of the calculated resistance? What assumptions do you have to make for this estimation? How would you improve your estimation?