In their well-known experimental design text, Steel and Torrie (1997) write:
"in many situations it is known beforehand that certain experimental units, if treated alike, will behave differently....designs or layouts can be constructed so that the portion of variability attributed to the recognized source can be measured and thus excluded from the experimental error."
In other words, blocking improves the precision of estimates in randomized designs. In experimental research, blocking typically means randomly assigning units to treatment and control within blocks (or strata) defined by a set of observed pre-treatment covariates. By guaranteeing that treatment and control units are identical on the blocked covariates, we eliminate the possibility that chance differences in those covariates will distort our inferences.
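To make this concrete, here is a minimal sketch of block randomization in Python, assuming a pandas DataFrame with a single categorical blocking covariate (the names `df`, `stratum`, and `block_randomize` are hypothetical, not from any particular package):

```python
# A minimal sketch of block randomization: within each block, randomly
# assign half the units to treatment and half to control.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

def block_randomize(df, block_col):
    """Assign treatment within blocks so each block is split 50/50."""
    assignments = pd.Series(0, index=df.index, name="treated")
    for _, block in df.groupby(block_col):
        shuffled = rng.permutation(block.index.to_numpy())
        treated_idx = shuffled[: len(shuffled) // 2]  # first half of the shuffled block
        assignments.loc[treated_idx] = 1
    return assignments

df = pd.DataFrame({"stratum": ["A"] * 4 + ["B"] * 4})
df["treated"] = block_randomize(df, "stratum")
print(df.groupby("stratum")["treated"].mean())  # 0.5 within every block
```

Because assignment is randomized separately inside each block, the treatment and control groups are exactly balanced on the blocking covariate by construction, not just in expectation.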
With a large enough sample size and successfully implemented randomization, we expect treatment and control units to be 'balanced,' at least on average, across covariate values. Even so, it is always wise to assess covariate balance after randomization to verify that this is the case.
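One common diagnostic for this is the standardized mean difference (SMD) between groups for each covariate. Below is a minimal sketch on simulated data; the covariate names `age` and `income` and the data-generating process are assumptions for the example:

```python
# A minimal sketch of a post-randomization balance check using
# standardized mean differences (SMDs).
import numpy as np
import pandas as pd

def standardized_mean_diff(df, covariate, treat_col="treated"):
    """SMD = (treated mean - control mean) / pooled standard deviation."""
    t = df.loc[df[treat_col] == 1, covariate]
    c = df.loc[df[treat_col] == 0, covariate]
    pooled_sd = np.sqrt((t.var(ddof=1) + c.var(ddof=1)) / 2)
    return (t.mean() - c.mean()) / pooled_sd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "treated": rng.integers(0, 2, 500),
    "age": rng.normal(40, 10, 500),
    "income": rng.normal(50_000, 12_000, 500),
})
for cov in ["age", "income"]:
    # A common rule of thumb treats |SMD| < 0.1 as acceptable balance.
    print(cov, round(standardized_mean_diff(df, cov), 3))
```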
One argument for blocking is to prevent such imbalances from arising in the first place. Even when randomization is deemed to be successfully implemented, treatment and control units have similar covariate values only on average, or in expectation; in any single realized sample they may still differ. With block randomization, treatment and control units are guaranteed to be identical on the blocked covariates.
Blocking vs. Matching and Regression
If we find imbalances in certain covariates or control variables, it is common practice to 'control' for them after the fact, often using linear regression. Gerber and Green (2012) discuss blocking extensively. They argue, however, that for experiments with more than roughly 100 observations, the gains in precision from block randomization over a completely randomized design (with possible regression adjustment for imbalanced covariates) become negligible (citing Rosenberger and Lachin, 2002). But they also caution that resorting to regression with controls introduces the temptation to interpret the control variables causally in ways that are inappropriate (see also Keele et al., 2020).
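As an illustration of this kind of post-hoc regression adjustment, here is a sketch using statsmodels on simulated data; the variable names and the data-generating process are assumptions made for the example:

```python
# A minimal sketch of regression adjustment in a completely randomized
# design, with a covariate included only to improve precision.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "age": rng.normal(40, 10, n),
})
# Simulated outcome: true treatment effect of 2.0, plus an age effect.
df["y"] = 2.0 * df["treated"] + 0.5 * df["age"] + rng.normal(0, 5, n)

unadjusted = smf.ols("y ~ treated", data=df).fit()
adjusted = smf.ols("y ~ treated + age", data=df).fit()

# The coefficient on `treated` estimates the average treatment effect;
# the coefficient on `age` should NOT be interpreted causally.
print(unadjusted.params["treated"], adjusted.params["treated"])
```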
In observational settings, where randomization does not occur, we often try to mimic the covariate balance of a randomized experiment through matching or regression. But there are important differences. Regression and matching create comparisons where covariate values are the same across treatment and control units only in expectation, or 'on average,' and only for observable, measured covariates. King and Nielsen (2019) are critical of propensity score matching on the grounds that it attempts to mimic a completely randomized design, when we should instead strive for observational methods that target the more efficient blocked randomized design:
"The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods."
References:
Gerber, Alan S., and Donald P. Green. 2012. Field Experiments: Design, Analysis, and Interpretation. New York: W.W. Norton.
Imai, Kosuke, Gary King, and Elizabeth A. Stuart. 2008. "Misunderstandings Between Experimentalists and Observationalists About Causal Inference." Journal of the Royal Statistical Society, Series A 171(2): 481-502.
Keele, Luke, Randolph T. Stevenson, and Felix Elwert. 2020. "The Causal Interpretation of Estimated Associations in Regression Models." Political Science Research and Methods 8(1): 1-13. doi:10.1017/psrm.2019.31.
King, Gary, and Richard Nielsen. 2019. "Why Propensity Scores Should Not Be Used for Matching." Political Analysis 27(4). https://j.mp/2ovYGsW.
Steel, Robert G. D., James H. Torrie, and David A. Dickey. 1997. Principles and Procedures of Statistics: A Biometrical Approach. McGraw-Hill.