Highlights for discussion (sections 2.1 - 2.4)

  • Page 31: The difference between frequentist interpretation and Bayesian interpretation
    • Some events only make sense under a Bayesian interpretation of probability, for example, weather prediction
  • Page 32: Model uncertainty vs. data uncertainty
    • The importance of distinguishing different types of uncertainty in active learning
  • Page 33: Random variables \( X \)
    • Sample space \( \mathcal{X} \) (or state space): all possible outcomes
    • Event: a set of outcomes from the sample space
  • Page 33: Discrete random variables
    • The sample space \( \mathcal{X} \) is finite or countably infinite
  • Page 34: Continuous random variables
    • Cumulative distribution function (cdf): \( P(x)=Pr(X\leq x) \)
    • Probability density function (pdf): \( p(x) = \frac{d}{dx}P(x) \)
    • Quantiles: if \( P(x_q) = Pr(X\leq x_q) = q \), then \( x_q \) is called the \(q\)-th quantile of \(P\)
  • Page 36: Sets of related random variables
    • Joint distribution
    • Marginal distribution
    • Conditional distribution
    • Chain rule of probability
  • Page 37: Independence and conditional independence
    • Independence: \( p(X,Y)=p(X)\cdot p(Y) \)
    • Conditional independence: \( p(X,Y\mid Z) = p(X\mid Z)\cdot p(Y\mid Z) \)
  • Page 38: Moments of a distribution
  • Page 41: Bayes’ rule
    • Formulation: \( p(H=h\mid Y=y)=\frac{p(H=h)p(Y=y\mid H=h)}{p(Y=y)} \)
    • Prior distribution, likelihood, posterior distribution, marginal likelihood (the normalizing constant \( p(Y=y) \))
  • Page 45: Bernoulli distribution
    • Definition of a Bernoulli distribution \( \text{Ber}(y\mid\theta) = \theta^{y}(1-\theta)^{1-y} \), where \( y\in\{0,1\} \)
    • Mean of a Bernoulli distribution \( E(Y) = P(Y=1) = \theta \)
  • Page 46: Binomial distribution
    • Counts the number of successes \( y \) in \( N \) independent Bernoulli trials: \( \text{Bin}(y\mid N,\theta)=\binom{N}{y}\theta^{y}(1-\theta)^{N-y} \); reduces to the Bernoulli distribution when \( N=1 \)
    • Mean \( N\cdot\theta \)
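The Bayes' rule highlight above can be checked numerically. The sketch below uses hypothetical prior and likelihood values (the numbers and hypothesis names are made up for illustration) for a discrete hypothesis \( H \) with two values:

```python
# Hypothetical numbers illustrating Bayes' rule for a discrete hypothesis H.
# prior is p(H=h); likelihood is p(Y=y | H=h) for one observed value y.
prior = {"h1": 0.3, "h2": 0.7}
likelihood = {"h1": 0.9, "h2": 0.2}

# Marginal likelihood p(Y=y): sum over all hypotheses (the denominator).
evidence = sum(prior[h] * likelihood[h] for h in prior)

# Posterior p(H=h | Y=y) = p(H=h) * p(Y=y | H=h) / p(Y=y).
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
```

Dividing by the marginal likelihood is what makes the posterior a proper distribution (its values sum to 1).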
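The Bernoulli and Binomial means listed above (\( \theta \) and \( N\theta \)) can be verified directly from the pmfs; a minimal sketch, with \( \theta \) and \( N \) chosen arbitrarily:

```python
from math import comb

# Bernoulli pmf: Ber(y | theta) = theta**y * (1 - theta)**(1 - y), y in {0, 1}.
theta = 0.3
ber_mean = sum(y * theta**y * (1 - theta)**(1 - y) for y in (0, 1))  # = theta

# Binomial pmf: Bin(y | N, theta) = C(N, y) * theta**y * (1 - theta)**(N - y).
N = 10
pmf = [comb(N, y) * theta**y * (1 - theta)**(N - y) for y in range(N + 1)]
bin_mean = sum(y * p for y, p in enumerate(pmf))  # = N * theta
```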

Questions for sections 2.1 - 2.4

  • What is the difference between the frequentist and Bayesian interpretations of probability? Can you give examples where the Bayesian interpretation fits better than the frequentist one?
  • What are the moments of a distribution? Why do we need them?
  • What is a mode of a distribution? Is there any relationship between the mode and the mean of a distribution?
  • What is the prior distribution? What about posterior distribution?
  • What is the definition of the Bernoulli distribution? What is the relation between its single parameter \( \theta \), the expectation of \( Y \), and \( P(Y=1) \)?
  • What is the definition of the binomial distribution? What is the expectation of a binomial random variable?
  • Can you derive the useful properties of the sigmoid function in table 2.3? Some of these are useful when analyzing logistic regression models and simple neural networks.
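For the last question, three sigmoid identities commonly listed alongside that table can be checked numerically: the symmetry \( \sigma(-x)=1-\sigma(x) \), the derivative \( \sigma'(x)=\sigma(x)(1-\sigma(x)) \), and the logit as the inverse of \( \sigma \). A minimal sketch (the test point \( x=1.7 \) is arbitrary):

```python
import math

def sigmoid(x):
    """Logistic sigmoid: sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Inverse of the sigmoid: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

x = 1.7  # arbitrary test point

# Symmetry: sigma(-x) = 1 - sigma(x).
symmetry_gap = abs(sigmoid(-x) - (1.0 - sigmoid(x)))

# Derivative: sigma'(x) = sigma(x) * (1 - sigma(x)),
# checked against a central finite difference.
h = 1e-6
numeric_deriv = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
analytic_deriv = sigmoid(x) * (1.0 - sigmoid(x))

# logit inverts sigmoid.
roundtrip_gap = abs(logit(sigmoid(x)) - x)
```

The derivative identity is the one that shows up when differentiating the log-likelihood of logistic regression.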