Five articles from ‘The Unofficial Google Data Science Blog’

Wencao Yang
1 min read · Nov 8, 2021


The Unofficial Google Data Science Blog is a good starting point for every data science beginner. Here, I recommend five technical articles from the blog that I find the most informative and inspirational, covering statistics, sampling, A/B testing, time series, and more.

1. Importance sampling

link: https://www.unofficialgoogledatascience.com/2019/08/estimating-prevalence-of-rare-events.html

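This post introduces importance sampling for estimating the prevalence of rare events, and explains why the resulting estimator is unbiased and can have lower variance than simple random sampling.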
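To make the variance argument concrete, here is a minimal sketch of importance sampling on a generic textbook problem, not the setup from the linked post: estimating P(X > 4) for a standard normal X. The function names, the threshold, and the shifted proposal N(4, 1) are all assumptions chosen for illustration.

```python
import math
import numpy as np

def naive_estimate(n, threshold, rng):
    """Plain Monte Carlo: sample from the target N(0, 1) and count exceedances."""
    x = rng.standard_normal(n)
    return np.mean(x > threshold)

def importance_sampling_estimate(n, threshold, rng):
    """Sample from a proposal N(threshold, 1) and reweight by p(x) / q(x)."""
    x = rng.normal(loc=threshold, scale=1.0, size=n)
    # log p(x) - log q(x) = -x^2/2 + (x - t)^2/2 = -t*x + t^2/2
    log_w = -threshold * x + 0.5 * threshold**2
    # Weighting each draw by p/q keeps the estimator unbiased for P(X > t).
    return np.mean(np.exp(log_w) * (x > threshold))

rng = np.random.default_rng(0)
threshold = 4.0
true_p = 0.5 * math.erfc(threshold / math.sqrt(2))  # exact normal tail probability

naive = naive_estimate(100_000, threshold, rng)
is_est = importance_sampling_estimate(100_000, threshold, rng)
print(f"true={true_p:.3e}  naive={naive:.3e}  importance={is_est:.3e}")
```

With the same number of draws, the naive estimate is built from only a handful of exceedances (often just a few, sometimes none), while the reweighted estimate uses roughly half of its draws, which is where the variance reduction comes from.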

2. Poisson bootstrap

link: https://www.unofficialgoogledatascience.com/2015/08/an-introduction-to-poisson-bootstrap26.html

In the standard bootstrap, the number of times each observation appears in a resample follows a Binomial(n, 1/n) distribution. Because Binomial(n, 1/n) converges to Poisson(1) as n grows, the Poisson bootstrap draws an independent Poisson(1) count for each observation instead. This way, n does not need to be known ahead of time.
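The idea can be sketched in a few lines of NumPy. This is a minimal illustration under assumed names (`poisson_bootstrap_means` and the synthetic exponential data are hypothetical), estimating the standard error of a sample mean; it is not code from the linked post.

```python
import numpy as np

def poisson_bootstrap_means(data, n_replicates, rng):
    """Return one resampled mean per replicate using Poisson(1) counts."""
    data = np.asarray(data, dtype=float)
    # counts[b, i] is how many times observation i appears in replicate b.
    # Each count is drawn independently, so n never has to be known up front.
    counts = rng.poisson(lam=1.0, size=(n_replicates, data.size))
    totals = counts @ data
    # Replicate sizes vary around n; a size of zero is possible in principle
    # but has probability exp(-n), which is negligible here.
    sizes = counts.sum(axis=1)
    return totals / sizes

rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=500)

replicate_means = poisson_bootstrap_means(data, n_replicates=1000, rng=rng)
bootstrap_se = replicate_means.std(ddof=1)
analytic_se = data.std(ddof=1) / np.sqrt(data.size)
print(f"bootstrap SE={bootstrap_se:.4f}  analytic SE={analytic_se:.4f}")
```

Because each observation's count is independent of all the others, the same scheme can be applied to a data stream or to shards processed in parallel, with each replicate keeping only a running weighted sum and a running count.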

3. Treatment adoption in A/B tests

link: https://www.unofficialgoogledatascience.com/2018/03/quicker-decisions-in-imperfect-mobile.html

This post discusses treatment adoption in A/B tests, using mobile experiments as the example. It covers the Intent to Treat (ITT) and Treatment on the Treated (TOT) effects, and then discusses propensity scores and inverse propensity sampling.
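The difference between ITT and TOT is easiest to see in a simulation. The sketch below is built on assumptions that are not taken from the linked post: only users assigned to treatment can adopt, the treatment has no effect on non-adopters, and TOT is estimated as ITT divided by the adoption rate (a common ratio estimator). All variable names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Random assignment to treatment (True) or control (False).
assigned = rng.integers(0, 2, size=n).astype(bool)

# Latent trait: would this user adopt the new version if it were offered?
would_adopt = rng.random(n) < 0.4

# Adopters differ from non-adopters at baseline (selection effect).
baseline = rng.normal(10.0, 2.0, size=n) + 1.5 * would_adopt

# Only assigned users who would adopt actually receive the treatment.
adopted = assigned & would_adopt
true_effect = 1.0
outcome = baseline + true_effect * adopted

# ITT: compare by assignment, ignoring who actually adopted.
itt = outcome[assigned].mean() - outcome[~assigned].mean()

# TOT: rescale ITT by the adoption rate among assigned users.
adoption_rate = adopted[assigned].mean()
tot = itt / adoption_rate

# Naive comparison of adopters against control is biased by selection.
naive = outcome[adopted].mean() - outcome[~assigned].mean()

print(f"ITT={itt:.3f}  adoption={adoption_rate:.3f}  TOT={tot:.3f}  naive={naive:.3f}")
```

In this setup the ITT is diluted to roughly 40% of the true effect, the ratio estimator recovers the effect on adopters, and the naive comparison overstates it because adopters had higher outcomes to begin with.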

4. Network structure and network effects in A/B tests

link: https://www.unofficialgoogledatascience.com/2018/01/designing-ab-tests-in-collaboration.html

This post uses Google Cloud Platform as an example. It walks through the steps for choosing the experimental unit given the hierarchical structure of the collaboration network. The key is to stratify the components by size and usage.
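As a rough sketch of what randomizing at the component level can look like, the code below finds the connected components of a toy collaboration graph, groups them into size strata, and alternates arms within each stratum. Treating "component" as a connected component, the stratum boundaries, and every function name are assumptions for illustration; stratifying by usage is omitted because it would need usage data.

```python
import random
from collections import defaultdict

def connected_components(nodes, edges):
    """Group nodes into connected components with a small union-find."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = defaultdict(list)
    for v in nodes:
        groups[find(v)].append(v)
    return list(groups.values())

def size_stratum(size):
    """Hypothetical size buckets; real cut points would come from the data."""
    if size == 1:
        return "singleton"
    if size <= 3:
        return "small"
    return "large"

def assign_arms(components, rng):
    """Shuffle components within each size stratum, then alternate arms."""
    strata = defaultdict(list)
    for comp in components:
        strata[size_stratum(len(comp))].append(comp)

    arm_of = {}
    for comps in strata.values():
        rng.shuffle(comps)
        for i, comp in enumerate(comps):
            arm = "treatment" if i % 2 == 0 else "control"
            for v in comp:
                arm_of[v] = arm  # every member of a component shares one arm
    return arm_of

nodes = list(range(12))
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (6, 7), (7, 8), (9, 10)]
components = connected_components(nodes, edges)
arm_of = assign_arms(components, random.Random(0))
print(sorted(len(c) for c in components), arm_of)
```

Keeping collaborators in the same arm avoids one user seeing the treatment while a collaborator does not, and stratifying by size keeps a few very large components from landing in one arm by chance.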

5. Time series

link: https://www.unofficialgoogledatascience.com/2017/04/our-quest-for-robust-time-series.html

This post shows a very general approach to time series analysis:

clean data

adjust for holiday, seasonality, and day-of-week effects

disaggregation and reconciliation

ensemble models
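The steps above can be sketched on a synthetic daily series. This is a simplified illustration under stated assumptions, not the method from the linked post: cleaning is a clip based on the median absolute deviation, only the day-of-week effect is adjusted, the ensemble is the median of three deliberately simple forecasts, and disaggregation and reconciliation are left out. All function names are hypothetical.

```python
import numpy as np

def clean(y, k=5.0):
    """Clip points lying more than k robust standard deviations from the median."""
    med = np.median(y)
    mad = np.median(np.abs(y - med))
    width = k * 1.4826 * mad
    return np.clip(y, med - width, med + width)

def day_of_week_factors(y):
    """Multiplicative factor per weekday; assumes the series starts on weekday 0."""
    dow = np.arange(y.size) % 7
    return np.array([y[dow == d].mean() for d in range(7)]) / y.mean()

def forecast(y, horizon):
    y = clean(np.asarray(y, dtype=float))
    factors = day_of_week_factors(y)
    t = np.arange(y.size)
    adjusted = y / factors[t % 7]  # remove the day-of-week effect

    future_t = np.arange(y.size, y.size + horizon)
    slope, intercept = np.polyfit(t, adjusted, 1)
    candidates = np.vstack([
        np.full(horizon, adjusted[-1]),           # last observed value
        np.full(horizon, adjusted[-28:].mean()),  # mean of the last four weeks
        intercept + slope * future_t,             # linear trend
    ])
    combined = np.median(candidates, axis=0)      # simple robust ensemble
    return combined * factors[future_t % 7]       # restore the day-of-week effect

rng = np.random.default_rng(7)
pattern = np.array([1.0, 1.05, 1.1, 1.05, 1.0, 0.8, 0.7])
history = 100 * pattern[np.arange(84) % 7] + rng.normal(0, 2, size=84)
history[30] = 1000.0  # an injected anomaly that the cleaning step should tame

fc = forecast(history, horizon=14)
print(np.round(fc, 1))
```

Taking the median across models means a single badly behaved forecast cannot dominate the result, which is the main appeal of ensembling simple models.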


Written by Wencao Yang

Data Scientist & Physics PhD
