Lab 4: Classical Inference II: Estimating and Comparing Distributions

For the class on Wednesday, January 24th

A. Standard Error of Statistics

The probability density function of an exponential distribution has the form of:

\[ p(x; \lambda) = \lambda e^{-\lambda x},\text{ defined for all } x \geq 0. \]

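Note that when using np.random.default_rng().exponential, the scale argument must be set to \(1/\lambda\), because NumPy parameterizes the exponential distribution by its scale (which equals its mean) rather than by its rate.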
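As a quick sanity check of this parameterization, the minimal sketch below draws a large sample and compares its mean with \(1/\lambda\). The seed, the sample size, and the variable names are arbitrary choices made for illustration only.

```python
import numpy as np

lam = 0.5  # rate parameter (lambda) of the exponential distribution
rng = np.random.default_rng(0)  # seed 0 is an arbitrary choice for reproducibility

# NumPy expects the scale, i.e. 1 / lambda, not lambda itself
samples = rng.exponential(scale=1 / lam, size=100_000)

# The sample mean should land close to 1 / lam
print("sample mean:", samples.mean(), "expected:", 1 / lam)
```

If the sample mean came out near \(\lambda\) instead of \(1/\lambda\), that would suggest the rate was passed where the scale was expected.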

A1. Direct simulation

Tip

The implementation here will be very similar to Lab 2 Part A, so consider reusing some of your code. The questions and focus will be different, though.

  1. Generate \(n=100\) random variates from an exponential distribution with \(\lambda = 0.5\) (scale=2).

  2. Calculate the sample mean, sample median, and sample variance. Record these values.

  3. Repeat [Steps 1 and 2] \(k=10,000\) times.

  4. Calculate the standard deviation of the \(k\) sample means, \(k\) sample medians, and \(k\) sample variances. These should be very good estimates of the Standard Error of the Mean, the Standard Error of the Median, and the Standard Error of the Variance.
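The four steps above can be sketched as follows. This is one possible vectorized layout, not the required implementation: the function name simulate_standard_errors, its seed argument, and the choice of ddof=1 are assumptions made for this example.

```python
import numpy as np

def simulate_standard_errors(n=100, lam=0.5, k=10_000, seed=None):
    """Estimate standard errors of the mean, median, and variance by direct simulation."""
    rng = np.random.default_rng(seed)

    # Steps 1 and 3: draw all k samples of size n at once, one row per repetition
    draws = rng.exponential(scale=1 / lam, size=(k, n))

    # Step 2: compute each statistic for every row
    means = draws.mean(axis=1)
    medians = np.median(draws, axis=1)
    variances = draws.var(axis=1, ddof=1)  # ddof=1 gives the unbiased sample variance

    # Step 4: the spread of each statistic across repetitions estimates its standard error
    return {
        "mean": means.std(ddof=1),
        "median": medians.std(ddof=1),
        "variance": variances.std(ddof=1),
    }

se = simulate_standard_errors(seed=42)  # seed 42 is arbitrary
print(se)
```

Wrapping the simulation in a function makes it easy to rerun with other values of n and lam when checking how the standard errors scale.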

# Include your implementation for A1 here

Questions for Part A1

Based on the result you got from A1, write down the formulae (in terms of \(n\) and \(\lambda\)) for

  • the Standard Error of the Mean,

  • the Standard Error of the Median, and

  • the Standard Error of the Variance

for the exponential distribution.

You may want to change the values of \(n\) and \(\lambda\) and rerun your simulation so that you can verify your answers.


// Add your answers here


A2. Bootstrap

Pretend that you didn’t know that we are working with the exponential distribution, and that we only had one realization of \(n=100\) random variates.

Use the bootstrap resampling method to estimate the Standard Error of the Mean, the Standard Error of the Median, and the Standard Error of the Variance.

Questions for Part A2

Do you expect the results from the bootstrap resampling method to agree with what you obtained from A1? Why or why not? Then carry out A2 to verify your answer.


// Add your answers here


# Include your implementation for A2 here
import numpy as np

###############
# Do not change this line
data = np.random.default_rng(1234).exponential(scale=2, size=100)
###############

# The line below gives you *one* bootstrap sample (drawn with replacement).
data_bootstrap = np.random.default_rng().choice(data, size=len(data))
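Extending the single bootstrap sample above to many resamples might look like the sketch below. The function name bootstrap_standard_errors, the number of resamples n_boot, and the seed are hypothetical choices for illustration; the data line is copied from the cell above so that the block runs on its own.

```python
import numpy as np

# Same fixed realization as in the cell above
data = np.random.default_rng(1234).exponential(scale=2, size=100)

def bootstrap_standard_errors(data, n_boot=10_000, seed=None):
    """Estimate standard errors of the mean, median, and variance by bootstrap resampling."""
    rng = np.random.default_rng(seed)
    n = len(data)

    # Each row is one bootstrap sample: n values drawn from data with replacement
    resamples = rng.choice(data, size=(n_boot, n), replace=True)

    return {
        "mean": resamples.mean(axis=1).std(ddof=1),
        "median": np.median(resamples, axis=1).std(ddof=1),
        "variance": resamples.var(axis=1, ddof=1).std(ddof=1),
    }

se_boot = bootstrap_standard_errors(data, seed=0)  # seed 0 is arbitrary
print(se_boot)
```

Because every resample is drawn from the same 100 observed values, the resulting estimates reflect that one realization rather than the true underlying distribution.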

B. Comparing samples

You have two sets of data points.

If you don’t know what distribution those data points come from, typically you can use the Kolmogorov-Smirnov test (K-S test) to check if the underlying distributions of the two data sets are the same.

You can use the scipy.stats.kstest function, which implements the K-S test and will return both the test statistic and the corresponding \(p\)-value.
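As a minimal usage sketch, the example below applies scipy.stats.kstest to two illustrative samples. The normal distributions, the shift of 0.5, and the seed are assumptions chosen only to show the call; they are not the lab's data sets.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(2024)  # arbitrary seed
sample_a = rng.normal(loc=0.0, scale=1.0, size=100)
sample_b = rng.normal(loc=0.5, scale=1.0, size=100)  # shifted mean, for illustration

# Passing a second array in place of a CDF makes kstest run a two-sample test
result = kstest(sample_a, sample_b)

print("K-S statistic:", result.statistic)
print("p-value:", result.pvalue)
```

A small \(p\)-value indicates that the two empirical distributions differ by more than would be expected if both samples came from the same underlying distribution.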

However, some simpler tests can do the job when you already know that the two data sets were both sampled from, say, an exponential distribution, just with potentially different values of \(\lambda\).

Questions for Part B

  1. For the latter case (both come from an exponential distribution), formulate a simple test to check if the underlying exponential distributions of the two data sets have the same value of \(\lambda\) or not.

  2. Implement both your test and the K-S test on the data sets included in the implementation cell below. Test between 1 vs. 2, 1 vs. 3, and 1 vs. 4. Interpret the results from both your test and the K-S test, and briefly explain if they agree with your expectation.
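One possible simple test, offered here as a sketch and not necessarily the one this lab has in mind, uses the fact that the sum of \(n\) exponential variates with rate \(\lambda\), multiplied by \(2\lambda\), follows a chi-squared distribution with \(2n\) degrees of freedom. Under the null hypothesis of equal \(\lambda\), the ratio of the two sample means therefore follows an \(F\) distribution with \((2n_x, 2n_y)\) degrees of freedom. The function name exponential_rate_test and the illustrative samples are assumptions.

```python
import numpy as np
from scipy.stats import f

def exponential_rate_test(x, y):
    """Two-sided F-test of H0: two exponential samples share the same lambda."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Under H0, the ratio of sample means follows F(2 * n_x, 2 * n_y)
    ratio = x.mean() / y.mean()
    cdf = f.cdf(ratio, 2 * len(x), 2 * len(y))

    # Two-sided p-value: twice the smaller tail probability, capped at 1
    p_value = min(1.0, 2 * min(cdf, 1 - cdf))
    return ratio, p_value

# Illustrative samples only (arbitrary seed, scales, and size)
rng = np.random.default_rng(7)
x = rng.exponential(scale=3, size=200)
y = rng.exponential(scale=1, size=200)

ratio, p_value = exponential_rate_test(x, y)
print("ratio of means:", ratio, "p-value:", p_value)
```

Because this test relies on the exponential assumption, it would be expected to be more sensitive than the K-S test when that assumption holds, and it can be misleading when it does not.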


// Add your answers here


# Include your implementation for Part B here
import numpy as np
from scipy.stats import kstest

#########
# do not change the following four lines
data1 = np.random.default_rng(123).exponential(scale=2, size=100)
data2 = np.random.default_rng(456).exponential(scale=2, size=100)
data3 = np.random.default_rng(789).exponential(scale=1, size=100)
data4 = np.random.default_rng(753).exponential(scale=1.9, size=100)
#########

# TODO: Apply your test from Part B, Q1 on data1 and data2


# TODO: Apply your test from Part B, Q1 on data1 and data3


# TODO: Apply your test from Part B, Q1 on data1 and data4


# TODO: Apply K-S test on data1 and data2


# TODO: Apply K-S test on data1 and data3


# TODO: Apply K-S test on data1 and data4


# Remember to go back and answer Part B, Q2 above!

Final question

Roughly how much time did you spend on this lab (not including the time you spent in class)?


// Write your answers here


Tip

How to submit this notebook on Canvas?

Once you have completed this lab and the notebook properly displays all of your code, text answers, and results after a full run, convert the notebook to HTML format by running the following:

jupyter nbconvert --to html /path/to/labs/04.ipynb 

Then, upload the resulting HTML file to Canvas for the corresponding assignment.