Five articles from ‘The Unofficial Google Data Science Blog’
The Unofficial Google Data Science Blog is a good starting point for data science beginners. Here I recommend five technical articles from the blog that I found the most informative and inspiring, covering statistics, sampling, A/B testing, and time series.
1. Importance sampling
link: https://www.unofficialgoogledatascience.com/2019/08/estimating-prevalence-of-rare-events.html
This post introduces importance sampling and explains why it can reduce the variance of an estimate while keeping the estimate unbiased.
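To make the idea concrete, here is a minimal sketch of my own, not taken from the post. It estimates the rare tail probability P(X > 4) for a standard normal X, once by plain sampling and once by sampling from a proposal shifted onto the rare region and reweighting by the density ratio. The threshold, the N(4, 1) proposal, and the function names are all assumptions chosen for illustration.

```python
import math
import numpy as np

def naive_estimate(n, threshold, rng):
    # Plain Monte Carlo: almost no draws land in the rare region.
    x = rng.standard_normal(n)
    return np.mean(x > threshold)

def importance_estimate(n, threshold, rng):
    # Assumed proposal q = N(threshold, 1), which oversamples the rare region.
    x = rng.normal(loc=threshold, scale=1.0, size=n)
    # Weight = p(x) / q(x) = exp(-x^2/2) / exp(-(x - t)^2/2) = exp(-t*x + t^2/2)
    log_w = -threshold * x + 0.5 * threshold**2
    return np.mean((x > threshold) * np.exp(log_w))

rng = np.random.default_rng(0)
threshold = 4.0
true_p = 0.5 * math.erfc(threshold / math.sqrt(2))  # exact normal tail probability

naive = [naive_estimate(10_000, threshold, rng) for _ in range(200)]
weighted = [importance_estimate(10_000, threshold, rng) for _ in range(200)]

print("true value          :", true_p)
print("naive    mean / std :", np.mean(naive), np.std(naive))
print("weighted mean / std :", np.mean(weighted), np.std(weighted))
```

Both estimators average out to the true value, but the reweighted one should show a much smaller spread across repetitions, which is the variance reduction the post is about.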
2. Poisson bootstrap
link: https://www.unofficialgoogledatascience.com/2015/08/an-introduction-to-poisson-bootstrap26.html
In the standard bootstrap, the number of times each observation appears in a resample of size n follows a Binomial(n, 1/n) distribution. Because Binomial(n, 1/n) converges to Poisson(1) as n grows, we can instead draw an independent Poisson(1) count for each observation. This way there is no need to know n ahead of time.
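A rough sketch of how this could look in a single pass over the data, assuming we want bootstrap replicates of the mean. The function name `poisson_bootstrap_stream` and the exponential test data are my own choices for illustration, not something from the post.

```python
import numpy as np

def poisson_bootstrap_stream(stream, n_replicates, rng):
    """Bootstrap replicates of the mean, computed in one pass over the stream."""
    sums = np.zeros(n_replicates)
    counts = np.zeros(n_replicates)
    for x in stream:
        # For each replicate, draw how many times this observation is included.
        k = rng.poisson(lam=1.0, size=n_replicates)
        sums += k * x
        counts += k
    # Guard against a replicate that received no observations (only likely for tiny n).
    keep = counts > 0
    return sums[keep] / counts[keep]

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=2000)  # assumed example data

# The function only sees an iterator, so it never needs len(data).
means = poisson_bootstrap_stream(iter(data), n_replicates=500, rng=rng)
ci_low, ci_high = np.percentile(means, [2.5, 97.5])
print("sample mean:", data.mean())
print("95% interval from Poisson bootstrap:", ci_low, ci_high)
```

Because each observation is handled independently, the same loop can be split across machines and the per-replicate sums and counts added up afterwards.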
3. Treatment adoption in A/B tests
link: https://www.unofficialgoogledatascience.com/2018/03/quicker-decisions-in-imperfect-mobile.html
This post discusses treatment adoption in A/B tests, using mobile experiments as an example. It covers Intent to Treat (ITT) and Treatment on the Treated (TOT) estimates, then moves on to propensity scores and inverse propensity sampling.
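The difference between ITT and TOT is easier to see on simulated data. The sketch below is my own and makes strong assumptions: only users assigned to treatment can adopt it, non-adopters are unaffected, and more engaged users are more likely to adopt. Under those assumptions TOT can be recovered as ITT divided by the adoption rate. All variable names and numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
true_effect = 2.0  # assumed effect on users who actually receive the treatment

assigned = rng.random(n) < 0.5          # random assignment to the treatment arm
engagement = rng.normal(size=n)         # latent user engagement (hypothetical)
# Assumption: more engaged users are more likely to adopt the new version.
adopt_prob = 1.0 / (1.0 + np.exp(-engagement))
would_adopt = rng.random(n) < adopt_prob
treated = assigned & would_adopt        # control users can never be treated

outcome = 10 + 3 * engagement + true_effect * treated + rng.normal(size=n)

# ITT: compare by assignment, ignoring whether the treatment was adopted.
itt = outcome[assigned].mean() - outcome[~assigned].mean()
adoption_rate = treated[assigned].mean()
# TOT under the assumptions above: scale ITT by the adoption rate.
tot = itt / adoption_rate
# Naive comparison of adopters against control is biased by self-selection.
naive = outcome[treated].mean() - outcome[~assigned].mean()

print("adoption rate:", adoption_rate)
print("ITT:", itt)
print("TOT:", tot, "(true effect is", true_effect, ")")
print("naive adopters vs. control:", naive)
```

ITT is diluted by the users who never adopted, while the naive comparison overstates the effect because adopters were more engaged to begin with.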
4. Network structure and network effects in A/B tests
link: https://www.unofficialgoogledatascience.com/2018/01/designing-ab-tests-in-collaboration.html
This post uses Google Cloud Platform as an example. It walks through the steps for choosing the experimental unit given the hierarchical structure of the collaboration network. The key is to stratify the components by size and usage.
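As a rough illustration of stratified assignment, the sketch below groups components into strata by size and randomizes within each stratum so that both arms get a similar mix of small and large components. It is my own simplification: it stratifies on size only (not usage), uses rank-based strata, and the function name and heavy-tailed example sizes are assumptions.

```python
import numpy as np

def stratified_assignment(sizes, n_strata, rng):
    """Assign each component to arm 0 or 1, balanced within size strata."""
    sizes = np.asarray(sizes)
    n = sizes.size
    # Rank-based strata: each stratum holds roughly the same number of components.
    order = np.argsort(sizes, kind="stable")
    strata = np.empty(n, dtype=int)
    strata[order] = np.arange(n) * n_strata // n
    arm = np.empty(n, dtype=int)
    for s in range(n_strata):
        members = np.flatnonzero(strata == s)
        shuffled = rng.permutation(members)
        # Alternate arms over the shuffled members so each stratum is balanced.
        arm[shuffled] = np.arange(shuffled.size) % 2
    return strata, arm

rng = np.random.default_rng(7)
# Hypothetical heavy-tailed component sizes: many small, a few very large.
sizes = (rng.pareto(1.5, size=1000) * 10).astype(int) + 1
strata, arm = stratified_assignment(sizes, n_strata=5, rng=rng)

for s in range(5):
    in_s = strata == s
    print(s, "control:", np.sum(arm[in_s] == 0), "treatment:", np.sum(arm[in_s] == 1))
```

Without stratification, a handful of very large components landing in one arm by chance can dominate the comparison.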
5. Time series
link: https://www.unofficialgoogledatascience.com/2017/04/our-quest-for-robust-time-series.html
This post shows a very general approach to time series analysis:
- clean the data
- adjust for holiday, seasonality, and day-of-week effects
- disaggregation and reconciliation
- ensemble models
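The steps above can be sketched on a toy daily series. This is a heavily simplified version of my own, not the method from the post: cleaning is a robust clip, the only seasonal adjustment is a multiplicative day-of-week factor, the ensemble is the median of three very simple forecasts, and disaggregation and reconciliation are skipped entirely. The function name `forecast_daily` and the simulated data are assumptions.

```python
import numpy as np

def forecast_daily(y, horizon):
    y = np.asarray(y, dtype=float)
    n = y.size
    dow = np.arange(n) % 7

    # 1. Clean: clip values far from the median (MAD-based bounds, an assumed rule).
    med = np.median(y)
    mad = np.median(np.abs(y - med))
    cleaned = np.clip(y, med - 5 * 1.4826 * mad, med + 5 * 1.4826 * mad)

    # 2. Adjust: remove a multiplicative day-of-week effect.
    factors = np.array([cleaned[dow == d].mean() for d in range(7)]) / cleaned.mean()
    adjusted = cleaned / factors[dow]

    # 3. Ensemble: combine several simple forecasts of the adjusted series.
    t = np.arange(n)
    future_t = np.arange(n, n + horizon)
    slope, intercept = np.polyfit(t, adjusted, 1)
    candidates = np.vstack([
        np.full(horizon, adjusted[-1]),           # last observed value
        np.full(horizon, adjusted[-28:].mean()),  # mean of the last four weeks
        intercept + slope * future_t,             # linear trend
    ])
    combined = np.median(candidates, axis=0)

    # 4. Put the day-of-week effect back.
    return combined * factors[future_t % 7]

rng = np.random.default_rng(3)
t = np.arange(140)
pattern = np.array([1.0, 1.0, 1.0, 1.0, 1.1, 0.8, 0.7])  # assumed weekly shape
level = 100 + 0.1 * t
y = level * pattern[t % 7] * (1 + 0.02 * rng.normal(size=t.size))
y[50] = 1000.0  # an injected outlier for the cleaning step to handle

forecast = forecast_daily(y, horizon=14)
print(np.round(forecast, 1))
```

Taking the median across models means a single badly behaved forecast cannot pull the combined result far off.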