__Chapter 02 Highlights and Questions (I)__

Highlights for discussion (Sections 2.1–2.4)

- Page 31: The difference between frequentist interpretation and Bayesian interpretation
- For some events, a probability only makes sense under the Bayesian interpretation, e.g., weather prediction (a one-off event with no repeatable trials)

- Page 32: Model uncertainty vs. data uncertainty
- The importance of distinguishing different types of uncertainty in active learning

- Page 33: Random variables \( X \)
- Sample space \( \mathcal{X} \) (or state space): all possible outcomes
- Event: a set of outcomes from the sample space

- Page 33: Discrete random variables
- The sample space \( \mathcal{X} \) is finite or countably infinite

- Page 34: Continuous random variables
- Cumulative distribution function (cdf): \( P(x)=Pr(X\leq x) \)
- Probability density function (pdf): \( p(x) = \frac{d}{dx}P(x) \)
- Quantiles: if \( P(x_q) = Pr(X\leq x_q) = q \), then \( x_q \) is called the \(q\)-th quantile of \(P\)
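The cdf/pdf/quantile relationships above can be checked concretely with Python's built-in `statistics.NormalDist`; the standard normal is an assumed example here, since the notes define these quantities for a generic continuous random variable.

```python
from statistics import NormalDist

# Standard normal as a concrete example (an illustrative choice,
# not one fixed by the notes).
Z = NormalDist(mu=0.0, sigma=1.0)

# cdf: P(x) = Pr(X <= x)
print(Z.cdf(0.0))   # 0.5 by symmetry

# pdf: p(x) = dP/dx
print(Z.pdf(0.0))   # 1/sqrt(2*pi), about 0.3989

# quantile: x_q such that Pr(X <= x_q) = q
x_975 = Z.inv_cdf(0.975)
print(x_975)        # about 1.96, the familiar 95%-interval endpoint
```

Note that `inv_cdf` is exactly the quantile function: it inverts the cdf, so `Z.cdf(Z.inv_cdf(q))` returns `q`.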

- Page 36: Sets of related random variables
- Joint distribution
- Marginal distribution
- Conditional distribution
- Chain rule of probability
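These four ideas can all be exercised on one tiny joint table. The numbers below are made up for illustration; the joint is stored as a dict `{(x, y): p(x, y)}` over two binary variables.

```python
# Hypothetical joint distribution over binary X, Y (illustrative numbers).
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

# Marginal distribution (sum rule): p(x) = sum_y p(x, y)
p_x = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (0, 1)}

# Conditional distribution: p(y | x) = p(x, y) / p(x)
p_y_given_x1 = {y: joint[(1, y)] / p_x[1] for y in (0, 1)}

# Chain rule: p(x, y) = p(x) * p(y | x)
assert abs(joint[(1, 1)] - p_x[1] * p_y_given_x1[1]) < 1e-12

print(p_x)           # marginal of X
print(p_y_given_x1)  # distribution of Y given X = 1
```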

- Page 37: Independence and conditional independence
- Independence: \( p(X,Y)=p(X)\cdot p(Y) \)
- Conditional independence: \( p(X,Y\mid Z) = p(X\mid Z)\cdot p(Y\mid Z) \)
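The independence definition translates directly into a numerical check: \( X \perp Y \) iff the joint factorizes into the product of its marginals at every point. The joint table below is a made-up example constructed to be independent.

```python
import itertools

def is_independent(joint, tol=1e-9):
    """Check p(x, y) == p(x) * p(y) for every (x, y) in the table."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(abs(joint[(x, y)] - p_x[x] * p_y[y]) < tol
               for x, y in itertools.product(xs, ys))

# Marginals here are p(x) = (0.4, 0.6), p(y) = (0.3, 0.7),
# and each joint entry equals the corresponding product.
joint = {(0, 0): 0.12, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.42}
print(is_independent(joint))  # True
```

Conditional independence works the same way, but with every probability conditioned on \( Z = z \) first.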

- Page 38: Moments of a distribution
- Page 41: Bayes' rule
- Formulation: \( p(H=h\mid Y=y)=\frac{p(H=h)p(Y=y\mid H=h)}{p(Y=y)} \)
- Prior distribution, likelihood, posterior distribution, marginal distribution
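The four pieces of Bayes' rule line up one-to-one with the code below, a sketch using the classic diagnostic-test setup with made-up numbers: \( H \) is "disease present", \( Y \) is "test positive".

```python
prior = 0.01        # p(H=1): prior distribution
sensitivity = 0.9   # p(Y=1 | H=1): likelihood
false_pos = 0.05    # p(Y=1 | H=0)

# Marginal likelihood (evidence): p(Y=1) = sum_h p(H=h) p(Y=1 | H=h)
evidence = prior * sensitivity + (1 - prior) * false_pos

# Posterior: p(H=1 | Y=1) = p(H=1) p(Y=1 | H=1) / p(Y=1)
posterior = prior * sensitivity / evidence
print(round(posterior, 3))  # about 0.154
```

Even with a positive test, the posterior stays low because the prior is so small; the evidence term is dominated by false positives from the healthy population.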

- Page 45: Bernoulli distribution
- Definition of a Bernoulli distribution \( \text{Ber}(y\mid\theta) = \theta^{y}(1-\theta)^{1-y} \), where \( y\in\{0,1\} \)
- Mean of a Bernoulli distribution \( E(Y) = P(Y=1) = \theta \)

- Page 46: Binomial distribution
- A generalization of the Bernoulli distribution to \( N \geq 1 \) independent trials, \( \text{Bin}(y\mid N,\theta)=\binom{N}{y}\theta^{y}(1-\theta)^{N-y} \); the Bernoulli distribution is the special case \( N = 1 \)
- Mean \( N\cdot\theta \)
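A quick sanity check of the binomial pmf, with illustrative values \( N = 10 \), \( \theta = 0.3 \): the pmf should sum to 1 over \( y = 0,\dots,N \), and the mean should come out to \( N\theta = 3 \).

```python
from math import comb

def binom_pmf(y, N, theta):
    # Bin(y | N, theta) = C(N, y) * theta^y * (1 - theta)^(N - y)
    return comb(N, y) * theta ** y * (1 - theta) ** (N - y)

N, theta = 10, 0.3  # illustrative values
total = sum(binom_pmf(y, N, theta) for y in range(N + 1))
mean = sum(y * binom_pmf(y, N, theta) for y in range(N + 1))
print(total, mean)  # approximately 1.0 and 3.0
```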

Questions for Sections 2.1–2.4

- What is the difference between the frequentist interpretation of probability and the Bayesian interpretation? Do you have some examples in which Bayesian interpretation fits better than the frequentist one?
- What are the moments of a distribution? Why do we need them?
- What is a mode of a distribution? Any possible relationship between a mode and the mean of a distribution?
- What is the prior distribution? What about posterior distribution?
- What is the definition of the Bernoulli distribution? What is the relation between the single parameter \( \theta \), the expectation of \( Y \), and \( P(Y=1) \)?
- What is the definition of a binomial distribution? What is the expectation of a binomial random variable?
- Can you derive the useful properties of the sigmoid function in table 2.3? Some of those are useful when analyzing logistic regression models and some simple neural network models.
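For the last question, two of the standard sigmoid identities (which appear in the book's Table 2.3) can be checked numerically as a warm-up before deriving them by hand: \( \sigma(-x) = 1 - \sigma(x) \) and \( \frac{d}{dx}\sigma(x) = \sigma(x)(1-\sigma(x)) \). The sample points below are arbitrary.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

for x in (-2.0, -0.5, 0.0, 1.3, 4.0):
    # Symmetry: sigma(-x) = 1 - sigma(x)
    assert abs(sigmoid(-x) - (1 - sigmoid(x))) < 1e-12
    # Derivative: compare a central finite difference against
    # the closed form sigma(x) * (1 - sigma(x))
    h = 1e-6
    numeric_grad = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    assert abs(numeric_grad - sigmoid(x) * (1 - sigmoid(x))) < 1e-6

print("sigmoid identities hold at the sampled points")
```

The derivative identity is the one that shows up in logistic-regression gradients, which is why it is worth deriving by hand as the question suggests.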