Data 102: Inference
Data 102 Notes #
Here are my notes for the Fall 2022 offering of Data 102, Berkeley’s Inference for Data Science course.
Data 102 explores two major concepts: making decisions under uncertainty and modeling the real world. Both rely on making assumptions. Here are some definitions:
- Frequentist: the data $X$ is random, the parameter $\theta$ is fixed
- Bayesian: the data $X$ is random, and the parameter $\theta$ is also random
- Parametric: make assumptions about the relationship between $x$ and $y$, then use these assumptions to find the best value of $\theta$ given the data
- Nonparametric: don't make any assumptions, and find any good function $f$ such that $y \approx f(x)$
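To make the parametric/nonparametric distinction concrete, here is a minimal sketch (the Gaussian sample and the query $P(X \le 2)$ are made up for illustration, not taken from the notes):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sample: 1,000 draws from a Normal(2, 1.5) distribution
data = rng.normal(loc=2.0, scale=1.5, size=1000)

# Parametric: assume the data is Gaussian and estimate theta = (mu, sigma) by MLE
mu_hat, sigma_hat = data.mean(), data.std()

def normal_cdf(x, mu, sigma):
    """CDF of a Normal(mu, sigma) distribution at x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Nonparametric: no distributional assumption; use the empirical CDF directly
def empirical_cdf(x, sample):
    """Fraction of sample points <= x."""
    return float(np.mean(sample <= x))

# Both approaches estimate P(X <= 2); the parametric answer is only
# as good as the Gaussian assumption behind it
p_parametric = normal_cdf(2.0, mu_hat, sigma_hat)
p_nonparametric = empirical_cdf(2.0, data)
```

Since the data here really is Gaussian, the two estimates agree closely; with misspecified data, the parametric estimate can be badly biased while the empirical CDF stays consistent.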
Table of Contents #
- Binary Decision Making
Binary Decision Making is the simplest kind of decision we can make: 1 or 0, yes or no, true or...
- Hypothesis Testing: Null/alternative hypotheses, multiple hypothesis testing, controlling FWER/FDR, online decision making, likelihood ratios
- Decision Theory: Loss functions, risk, bias-variance tradeoff
- Parameter Estimation
Suppose we observe $n$ data points $X_1$ to $X_n$. Let $\theta$ be some unknown parameter that describes the distribution the data...
- Sampling
Intro In practice, getting the exact probability of an inference is not required as long as we get a rough estimate...
- Regression and GLMs
Posterior Predictive Distribution: "if we saw some data, what future data might we expect?"...
- Nonparametric Methods
What does nonparametric mean? Nonparametric methods make no assumptions about the distribution of the data or parameters; the null hypothesis is...
- Interpretability
What do we look for in predictions? Accuracy: We want predictions to be close to the true values. Simplicity: We want the...
- Causality
Prediction vs Causality Prediction: using data, can we guess what $y$ will be? Causation: does $X$ cause $y$ to...
- Concentration Inequalities
The goal of concentration inequalities is to provide bounds on the probability of a random variable taking values in its...
- Bandits
Main idea: making repeated decisions based on feedback, factoring in the tradeoff between exploring new decisions or keeping existing good...
- Markov Decision Processes
What is a Markov Decision Process? A Markov Decision Process is a Markov model that solves nondeterministic search problems (where an...
- Reinforcement Learning
Introduction Reinforcement Learning (RL) is an example of online planning, where agents have no prior knowledge of rewards or transitions and...
How to contribute #
See the contributing guide:
"Thanks for your interest in contributing to my notes! There's a lot of room for improvement, and I don't have..."
For the most part, these notes should be pretty complete in terms of content, but could use some cleaning up (as well as more examples).