Thursday, August 27, 2020

Blocking in Randomized Field Trials

When I first learned about randomized complete block designs as an undergraduate, to me it was just another set of computations to memorize for the test. (This was before I understood statistics as a way of thinking, not a box of tools.) However, blocking is an important way to think about your experiment.

In their well-known experimental design text, Steel and Torrie write:

"in many situations it is known beforehand that certain experimental units, if treated alike, will behave differently....designs or layouts can be constructed so that the portion of variability attributed to the recognized source can be measured and thus excluded from the experimental error." 

In other words, blocking improves the precision of estimates in randomized designs. In experimental research, blocking often implies randomly assigning treatment and control groups within blocks (or strata) based on a set of observed pre-treatment covariates. By guaranteeing that treatment and control units are identical in their covariate values, we eliminate the chance that differences in covariates among treatment and control units will impact inferences. 
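To make the mechanics concrete, here is a minimal Python sketch of block randomization. The plots, the soil-type blocking variable, and the function name are all hypothetical, made up for illustration. Within each block, half of the units are assigned to treatment at random (with an odd-sized block, the extra unit goes to control).

```python
import random
from collections import defaultdict

def block_randomize(units, block_key, seed=0):
    """Randomly assign half of each block to treatment.

    units: list of dicts describing experimental units.
    block_key: function mapping a unit to its block label.
    Returns a dict from unit index to 1 (treatment) or 0 (control).
    """
    rng = random.Random(seed)

    # Group unit indices by block label.
    blocks = defaultdict(list)
    for i, unit in enumerate(units):
        blocks[block_key(unit)].append(i)

    # Shuffle within each block and treat the first half.
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        n_treat = len(members) // 2  # odd blocks: extra unit goes to control
        for position, i in enumerate(members):
            assignment[i] = 1 if position < n_treat else 0
    return assignment

# Hypothetical field-trial plots, blocked on soil type.
plots = [{"soil": s} for s in ["clay"] * 4 + ["loam"] * 6 + ["sand"] * 2]
assignment = block_randomize(plots, block_key=lambda u: u["soil"], seed=42)
```

Because the shuffle happens inside each block, every soil type contributes the same number of treated and control plots, whatever the random draw turns out to be.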

With a large enough sample size and successfully implemented randomization, we expect treatment and control units to be 'balanced' at least on average across covariate values. However, it is always wise to assess covariate balance after randomization to ensure that this is the case. 
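One simple way to assess balance is the standardized mean difference of a pre-treatment covariate between the two groups. The sketch below uses simulated data, so the covariate and the sample size of 200 are assumptions for illustration only. A commonly cited rule of thumb treats absolute values below roughly 0.1 as good balance, but that threshold is a convention, not a test.

```python
import random
import statistics

def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_var = (statistics.variance(treated) + statistics.variance(control)) / 2
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_var ** 0.5

rng = random.Random(1)

# Hypothetical pre-treatment covariate (say, baseline yield) for 200 units.
baseline = [rng.gauss(50, 10) for _ in range(200)]

# Complete randomization: shuffle the indices and treat the first half.
order = list(range(200))
rng.shuffle(order)
treated_idx = set(order[:100])

treated = [x for i, x in enumerate(baseline) if i in treated_idx]
control = [x for i, x in enumerate(baseline) if i not in treated_idx]

smd = standardized_mean_difference(treated, control)
print(f"standardized mean difference: {smd:.3f}")
```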

One argument for blocking is that it prevents chance imbalances from arising in the first place. Even when randomization is successfully implemented, treatment and control units have similar covariate values only on average, or in expectation. With block randomization, treatment and control units are guaranteed to be identical on the covariates used to form the blocks.
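A small simulation can illustrate the difference between balance in expectation and balance by design. This is only a sketch under made-up assumptions: four blocks of ten units, a large block effect, and a true treatment effect of zero, so any spread in the difference in means is pure noise. It compares how much that estimate varies across repeated complete randomizations versus repeated block randomizations.

```python
import random
import statistics

def simulate(n_per_block=10, n_blocks=4, block_effect=5.0, reps=2000, seed=0):
    """Return the SD of the difference in means under complete and
    block randomization, with a true treatment effect of zero."""
    rng = random.Random(seed)

    # Fixed outcomes: block b shifts every outcome in it by b * block_effect.
    block_of = [b for b in range(n_blocks) for _ in range(n_per_block)]
    outcome = [b * block_effect + rng.gauss(0, 1) for b in block_of]
    n = len(outcome)

    def diff_in_means(treated):
        t = [outcome[i] for i in range(n) if i in treated]
        c = [outcome[i] for i in range(n) if i not in treated]
        return statistics.mean(t) - statistics.mean(c)

    complete, blocked = [], []
    for _ in range(reps):
        # Complete randomization: any half of the units may be treated.
        complete.append(diff_in_means(set(rng.sample(range(n), n // 2))))

        # Block randomization: exactly half of each block is treated.
        treated = set()
        for b in range(n_blocks):
            members = [i for i in range(n) if block_of[i] == b]
            treated.update(rng.sample(members, n_per_block // 2))
        blocked.append(diff_in_means(treated))

    return statistics.stdev(complete), statistics.stdev(blocked)

sd_complete, sd_blocked = simulate()
print(f"SD of estimate, complete: {sd_complete:.2f}, blocked: {sd_blocked:.2f}")
```

Under complete randomization, an unlucky draw can put more high-yield blocks in one arm, and that shows up as extra variability in the estimate. Under block randomization the block effect cancels out of every comparison, which is the variability Steel and Torrie describe as being "excluded from the experimental error."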

Blocking vs. Matching and Regression

It is common practice, when we find imbalances in certain covariates, to 'control' for them after the fact, often using linear regression. Gerber and Green discuss blocking extensively. They claim, however, that for experiments with more than 100 observations, the gains in precision from block randomization over a completely randomized design (with possible regression adjustment for imbalances) become negligible (citing Rosenberger and Lachin, 2002). They caution, though, that having to resort to regression with controls introduces the temptation to interpret control variables causally in ways that are inappropriate (see also Keele et al., 2020).

In observational settings where randomization does not occur, we often try to mimic the covariate balance we would get in a randomized experiment through matching or regression. But there are important differences. Regression and matching create comparisons where covariate values are the same across treatment and control units in expectation, or 'on average', and only for covariates that are observable and measurable. King and Nielsen are critical of propensity score matching (PSM): they claim it attempts to mimic a completely randomized design, when we should be striving for observational methods that target blocked randomized designs.

"The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods."
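As a rough illustration of what "approximating full blocking" can look like with observational data, here is a minimal sketch of exact matching on a single discrete covariate. This is not King and Nielsen's procedure, just one simple member of the family of non-propensity-score matching methods. The records and age groups are invented for the example. Treated and control units are compared only within strata that share the same covariate value, much as a blocked experiment compares units within blocks.

```python
import statistics
from collections import defaultdict

def exact_match_estimate(records):
    """Stratify on the covariate, take the treated-minus-control difference
    in mean outcomes within each stratum, and average those differences
    weighted by the number of treated units. Strata that lack either
    treated or control units are dropped."""
    strata = defaultdict(lambda: {"t": [], "c": []})
    for covariate, treated, outcome in records:
        strata[covariate]["t" if treated else "c"].append(outcome)

    total, weight = 0.0, 0
    for groups in strata.values():
        if groups["t"] and groups["c"]:
            diff = statistics.mean(groups["t"]) - statistics.mean(groups["c"])
            total += diff * len(groups["t"])
            weight += len(groups["t"])
    if weight == 0:
        raise ValueError("no stratum contains both treated and control units")
    return total / weight

# Hypothetical records: (age group, treated flag, outcome).
records = [
    ("young", 1, 12.0), ("young", 0, 10.0), ("young", 0, 10.0),
    ("old", 1, 22.0), ("old", 1, 22.0), ("old", 0, 20.0),
    ("mid", 1, 99.0),  # no control unit in this stratum, so it is dropped
]
estimate = exact_match_estimate(records)
print(f"matched estimate: {estimate:.1f}")
```

Unlike a blocked experiment, the strata here can only be built from covariates we happened to observe, so this mimics the balance of blocking without the protection that randomization gives against unobserved differences.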


References:

Gerber, Alan S., and Donald P. Green. 2012. Field Experiments: Design, Analysis, and Interpretation. New York: W.W. Norton

Keele, L., Stevenson, R., & Elwert, F. (2020). The causal interpretation of estimated associations in regression models. Political Science Research and Methods, 8(1), 1-13. doi:10.1017/psrm.2019.31

Gary King and Richard Nielsen. 2019. “Why Propensity Scores Should Not Be Used for Matching.” Political Analysis, 27, 4. Copy at https://j.mp/2ovYGsW

Imai K, King G, Stuart EA. Misunderstandings among experimentalists and observationalists in causal inference. Journal of the Royal Statistical Society Series A. 2008;171(2):481–502.

Robert George Douglas Steel, James Hiram Torrie, and David A. Dickey. 1997. Principles and Procedures of Statistics: A Biometrical Approach. McGraw-Hill.

Saturday, August 22, 2020

The Benefits of Graduate Study

Not everyone needs to be a PhD-holding rocket scientist to do complex analytical work in applied fields, so what are some additional benefits of graduate study? What if you just skipped the time, money, and energy spent in graduate school and went straight to writing code?

This made me think of a Talking Biotech podcast with Kevin Folta discussing the movie Food Evolution. Toward the end they discussed some critiques of the film, and a common critique about research in general is bias due to conflicts of interest. Kevin states:

"I've trained for 30 years to be able to understand statistics and experimental design and interpretation...I'll decide based on the quality of the data and the experimental design....that's what we do."

Besides taking on the criticisms of science, this emphasized two important points.

1) Graduate study teaches you to understand statistics, experimental design, and interpretation, and this requires a new way of thinking. At the undergraduate level I learned some basics that were quite useful for empirical work. In graduate school I learned what is analogous to a new language. The additional properties of estimators, proofs, and theorems taught in graduate statistics courses suddenly made the things I had learned before make better sense. This background helped me to translate and interpret other people's work and learn from it, and to learn new methodologies or extend existing ones. But it was the seminars and applied research that made it come to life: learning to 'do science' through new ways of thinking about how to solve problems with statistics and experimental design, and, as Kevin says, interpretation.

2) Graduate study is an extendable framework. Learning and doing statistics is a career-long process, and graduate study gives you a foundation to keep building on. This recognizes the gulf between textbook and applied statistics and econometrics.

 
