Highlights for discussion (section 2.1 - 2.4)

• Page 31: The difference between frequentist interpretation and Bayesian interpretation
• For some events, a probability only makes sense under the Bayesian interpretation — weather prediction is one example
• Page 32: Model uncertainty vs. data uncertainty
• The importance of distinguishing different types of uncertainty in active learning
• Page 33: random variables $$X$$
• Sample space $$\mathcal{X}$$ (or state space): all possible outcomes
• Event: a set of outcomes from the sample space
• Page 33: Discrete random variables
• The sample space $$\mathcal{X}$$ is finite or countably infinite
• Page 34: Continuous random variables
• Cumulative distribution function (cdf): $$P(x)=Pr(X\leq x)$$
• Probability density function (pdf): $$p(x) = \frac{d}{dx}P(x)$$
• Quantiles: if $$P(x_q) = Pr(X\leq x_q) = q$$, then $$x_q$$ is called the $$q$$-th quantile of $$P$$
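The cdf/pdf/quantile relationships above can be checked numerically. A minimal sketch, using the standard normal distribution as an illustrative choice (the helper functions `cdf`, `pdf`, and `quantile` are ad hoc, not from the book):

```python
import math

# Standard normal, chosen only as a concrete example
def cdf(x):                      # P(x) = Pr(X <= x)
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pdf(x):                      # p(x) = dP/dx
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# The pdf is the derivative of the cdf: finite-difference check
x, h = 1.3, 1e-6
assert abs((cdf(x + h) - cdf(x - h)) / (2 * h) - pdf(x)) < 1e-6

# The q-th quantile x_q solves cdf(x_q) = q; find it by bisection
def quantile(q, lo=-10.0, hi=10.0):
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

x_q = quantile(0.975)            # approx. 1.96 for the standard normal
assert abs(cdf(x_q) - 0.975) < 1e-9
```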
• Page 36: Sets of related random variables
• Joint distribution
• Marginal distribution
• Conditional distribution
• Chain rule of probability
• Page 37: Independence and conditional independence
• Independence: $$p(X,Y)=p(X)\cdot p(Y)$$
• Conditional independence: $$p(X,Y\mid Z) = p(X\mid Z)\cdot p(Y\mid Z)$$
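A tiny joint-distribution table ties the last two bullet groups together: marginals come from summing out a variable, conditionals from dividing by a marginal, and independence means the joint factors into the product of marginals. The numbers below are made up for illustration:

```python
import numpy as np

# Joint distribution p(X, Y) of two binary variables as a 2x2 table:
# rows index X, columns index Y. Values are illustrative only.
p_xy = np.array([[0.30, 0.20],
                 [0.30, 0.20]])

# Marginals: sum out the other variable
p_x = p_xy.sum(axis=1)           # [0.5, 0.5]
p_y = p_xy.sum(axis=0)           # [0.6, 0.4]

# Conditional: p(Y | X=0) = p(X=0, Y) / p(X=0)
p_y_given_x0 = p_xy[0] / p_x[0]  # [0.6, 0.4]

# Independence holds iff p(X, Y) = p(X) p(Y) in every cell
assert np.allclose(p_xy, np.outer(p_x, p_y))
```

Here the check passes, so this particular joint happens to be independent; perturbing any cell (while renormalizing) breaks the factorization.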
• Page 38: Moments of a distribution
• Page 41: Bayes’ rule
• Formulation: $$p(H=h\mid Y=y)=\frac{p(H=h)p(Y=y\mid H=h)}{p(Y=y)}$$
• Prior distribution, likelihood, posterior distribution, marginal distribution
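Bayes' rule is easy to exercise on a classic diagnostic-test setup. The numbers below are hypothetical (a 1% prior, 90% sensitivity, 5% false-positive rate), chosen only to show how the prior, likelihood, and marginal combine into the posterior:

```python
# Hypothetical numbers for a binary hypothesis H and test result Y
prior = 0.01                 # p(H=1)
like_pos_given_h1 = 0.90     # p(Y=1 | H=1), sensitivity
like_pos_given_h0 = 0.05     # p(Y=1 | H=0), false-positive rate

# Marginal p(Y=1) via the sum rule
evidence = prior * like_pos_given_h1 + (1 - prior) * like_pos_given_h0

# Bayes' rule: p(H=1 | Y=1) = p(H=1) p(Y=1 | H=1) / p(Y=1)
posterior = prior * like_pos_given_h1 / evidence
print(round(posterior, 3))   # 0.154
```

Even with a fairly accurate test, the posterior stays low because the prior is small — a useful intuition pump for the prior/likelihood/posterior vocabulary.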
• Page 45: Bernoulli distribution
• Definition of a Bernoulli distribution $$\text{Ber}(y\mid\theta) = \theta^{y}(1-\theta)^{1-y}$$, where $$y\in\{0,1\}$$
• Mean of a Bernoulli distribution $$E(Y) = P(Y=1) = \theta$$
• Page 46: Binomial distribution
• The distribution of the number of successes in $$N$$ independent Bernoulli trials; it reduces to the Bernoulli distribution when $$N=1$$
• Mean $$N\cdot\theta$$
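The relationship between the two distributions can be made concrete by building a binomial sample as a sum of Bernoulli trials and checking the mean $$N\cdot\theta$$ by simulation (a quick sketch, not from the book):

```python
import random
random.seed(0)

theta, N = 0.3, 10

def bernoulli(theta):
    # One trial: 1 with probability theta, else 0
    return 1 if random.random() < theta else 0

def binomial(N, theta):
    # A Binomial(N, theta) variable counts successes in N Bernoulli trials
    return sum(bernoulli(theta) for _ in range(N))

samples = [binomial(N, theta) for _ in range(100_000)]
mean = sum(samples) / len(samples)
assert abs(mean - N * theta) < 0.1   # E[Y] = N * theta = 3.0
```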

Questions for section 2.1 - 2.4

• What is the difference between the frequentist interpretation of probability and the Bayesian interpretation? Do you have some examples in which Bayesian interpretation fits better than the frequentist one?
• What are the moments of a distribution? Why do we need them?
• What is a mode of a distribution? Any possible relationship between a mode and the mean of a distribution?
• What is the prior distribution? What about posterior distribution?
• The definition of Bernoulli distribution? What is the relation between the single parameter $$\theta$$, the expectation of $$Y$$, and $$P(Y=1)$$?
• The definition of a binomial distribution? The expectation of a binomial random variable?
• Can you derive the useful properties of the sigmoid function in table 2.3? Some of those are useful when analyzing logistic regression models and some simple neural network models.
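For the last question, the key sigmoid identities (which, as far as I recall, appear in table 2.3 — worth double-checking against the book) can be verified numerically before deriving them by hand:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in [-3.0, -0.5, 0.0, 1.7]:
    s = sigmoid(x)
    # Symmetry: 1 - sigma(x) = sigma(-x)
    assert abs((1 - s) - sigmoid(-x)) < 1e-12
    # Derivative: sigma'(x) = sigma(x) (1 - sigma(x)), checked by
    # central finite differences
    h = 1e-6
    num_deriv = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    assert abs(num_deriv - s * (1 - s)) < 1e-6
    # The logit inverts the sigmoid: log(s / (1 - s)) = x
    assert abs(math.log(s / (1 - s)) - x) < 1e-9
```

The derivative identity in particular is what makes the logistic-regression gradient so clean.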