Thursday, August 27, 2020

Blocking in Randomized Field Trials

When I first learned about randomized complete block designs as an undergraduate, they were just another set of computations to memorize for the test (this was before I understood statistics as a way of thinking rather than a box of tools). However, blocking is an important way to think about your experiment.

In Steel and Torrie's well known experimental design text, they discuss:

"in many situations it is known beforehand that certain experimental units, if treated alike, will behave differently....designs or layouts can be constructed so that the portion of variability attributed to the recognized source can be measured and thus excluded from the experimental error." 

In other words, blocking improves the precision of estimates in randomized designs. In experimental research, blocking often implies randomly assigning treatment and control groups within blocks (or strata) based on a set of observed pre-treatment covariates. By guaranteeing that treatment and control units are identical in the blocked covariate values, we eliminate the chance that differences in those covariates between treatment and control units will impact inferences.

With a large enough sample size and successfully implemented randomization, we expect treatment and control units to be 'balanced' at least on average across covariate values. However, it is always wise to assess covariate balance after randomization to ensure that this is the case. 

One argument for blocking is to prevent such imbalances. Even when randomization is successfully implemented, treatment and control units will have similar covariate values only on average, or in expectation. With block randomization, treatment and control units are guaranteed to be identical across the blocked covariate values.
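As a concrete sketch of the mechanics (the units, covariate, and block sizes here are hypothetical), block randomization can be implemented by shuffling and splitting within each block:

```python
import random

def block_randomize(units, block_key, seed=0):
    """Assign treatment randomly within blocks defined by block_key.

    Within each block, half the units (rounded down) are treated, so
    treated and control groups match exactly on the blocking covariate.
    """
    rng = random.Random(seed)
    blocks = {}
    for u in units:
        blocks.setdefault(block_key(u), []).append(u)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for u in members[:half]:
            assignment[u["id"]] = 1  # treatment
        for u in members[half:]:
            assignment[u["id"]] = 0  # control
    return assignment

# Hypothetical field-trial plots blocked on soil type.
units = [{"id": i, "soil": "clay" if i % 2 else "loam"} for i in range(20)]
assign = block_randomize(units, block_key=lambda u: u["soil"])
```

Because assignment happens within each block, treated and control units agree exactly on the blocking covariate, whereas a completely randomized design would only balance it in expectation.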

Blocking vs. Matching and Regression

It is common practice, if we find imbalances in certain covariates, to 'control' for them after the fact, often using linear regression. Gerber and Green discuss blocking extensively. They claim, however, that for experiments with more than 100 observations, the gains in precision from block randomization over a completely randomized design (with possible regression adjustment for imbalances) become negligible (citing Rosenberger and Lachin, 2002). However, they caution that having to resort to regression with controls introduces the temptation to interpret control variables causally in ways that are inappropriate (see also Keele, 2020).

In observational settings where randomization does not occur, we often try to mimic the covariate balance we would get in a randomized experiment through matching or regression. But there are important differences. Regression and matching create comparisons where covariate values are the same across treatment and control units in expectation, or 'on average,' for observable and measurable covariates. King and Nielsen are critical of propensity score matching, claiming it attempts to mimic a completely randomized design when we should be striving for observational methods that target fully blocked randomized designs.

"The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods."
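To make the contrast concrete, here is a minimal sketch of exact matching on a toy dataset (the covariates and rows are hypothetical): each unique combination of covariate values defines a stratum, and only strata containing both treated and control units are kept, approximating a fully blocked design:

```python
from collections import defaultdict

def exact_match(data, covariates):
    """Keep only strata (unique covariate combinations) that contain
    both treated (t=1) and control (t=0) units."""
    strata = defaultdict(list)
    for row in data:
        strata[tuple(row[c] for c in covariates)].append(row)
    matched = []
    for members in strata.values():
        treated = [r for r in members if r["t"] == 1]
        control = [r for r in members if r["t"] == 0]
        if treated and control:  # the stratum works like a 'block'
            matched.extend(treated + control)
    return matched

# Toy data: the treated unit in stratum ('old', 'B') has no control
# counterpart and is dropped.
data = [
    {"t": 1, "age": "young", "region": "A"},
    {"t": 0, "age": "young", "region": "A"},
    {"t": 1, "age": "old", "region": "B"},
]
matched = exact_match(data, ["age", "region"])
```

This is the kind of stratum-by-stratum comparison King and Nielsen argue other matching methods can approximate but propensity score matching, which collapses all covariates into one score, cannot.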


References:

Gerber, Alan S., and Donald P. Green. 2012. Field Experiments: Design, Analysis, and Interpretation. New York: W.W. Norton

Keele, L., Stevenson, R., & Elwert, F. (2020). The causal interpretation of estimated associations in regression models. Political Science Research and Methods, 8(1), 1-13. doi:10.1017/psrm.2019.31

Gary King and Richard Nielsen. 2019. “Why Propensity Scores Should Not Be Used for Matching.” Political Analysis, 27, 4. Copy at https://j.mp/2ovYGsW

Imai K, King G, Stuart EA. Misunderstandings among experimentalists and observationalists in causal inference. Journal of the Royal Statistical Society Series A. 2008;171(2):481–502.

Steel, R.G.D., Torrie, J.H., & Dickey, D.A. (1997). Principles and Procedures of Statistics: A Biometrical Approach. McGraw-Hill.

Saturday, August 22, 2020

The Benefits of Graduate Study

Not everyone needs to be a PhD-holding rocket scientist to do complex analytical work in applied fields, so what are some additional benefits of graduate study? What if you just skipped the time, money, and energy spent in graduate school and went straight to writing code?

This made me think of a Talking Biotech podcast with Kevin Folta discussing the movie Food Evolution. Toward the end they discussed some critiques of the film, and a common critique about research in general is bias due to conflicts of interest. Kevin states:

"I've trained for 30 years to be able to understand statistics and experimental design and interpretation...I'll decide based on the quality of the data and the experimental design....that's what we do."

Besides taking on the criticisms of science, this emphasized two important points.

1) Graduate study teaches you to understand statistics, experimental design, and interpretation, and this requires a new way of thinking. At the undergraduate level I learned some basics that were quite useful for empirical work. In graduate school I learned what is analogous to a new language. The additional properties of estimators, proofs, and theorems taught in graduate statistics courses suddenly made the things I had learned before make better sense. This background helped me translate and interpret other people's work, learn from it, and learn new methodologies or extend others. But it was the seminars and applied research that made it come to life: learning to 'do science' through new ways of thinking about how to solve problems with statistics, experimental design, and, as Kevin says, interpretation.

2) Graduate study is an extendable framework. Learning and doing statistics is a career-long process, and graduate training helps you recognize the gulf between textbook and applied statistics and econometrics.

 

Sunday, July 26, 2020

Assessing Balance in Matched and Randomized Studies

(cross posted at Econometric Sense)

The purpose of propensity score matching is to achieve balance between treatment and control groups. Heuristically, it is an attempt to ‘recreate’ a situation similar to a randomized experiment where all subjects are essentially the same except for the treatment (Thoemmes and Kim, 2011). Matching itself should not be viewed so much as an estimation technique, but as a pre-processing step to ensure that members assigned to treatment and control groups have similar covariate distributions 'on average' (Ho et al., 2007).

This understanding of matching often gets lost among practitioners, and it is evident in attempts to use statistical significance tests (like t-tests) to assess baseline differences in covariates between treatment and control groups. This is often (mistakenly) done as a means to (1) determine which variables to match on and (2) determine whether appropriate balance has been achieved after matching.

Stuart (2010) discusses this:

"Although common, hypothesis tests and p-values that incorporate information on the sample size (e.g., t-tests) should not be used as measures of balance, for two main reasons (Austin, 2007; Imai et al., 2008). First, balance is inherently an in-sample property, without reference to any broader population or super-population. Second, hypothesis tests can be misleading as measures of balance, because they often conflate changes in balance with changes in statistical power. Imai et al. (2008) show an example where randomly discarding control individuals seemingly leads to increased balance, simply because of the reduced power."
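A standard sample-size-free alternative, recommended by Stuart (2010) among others, is the standardized mean difference, which depends only on in-sample means and variances. A minimal sketch in Python (the variable names are illustrative):

```python
import math

def standardized_difference(x_treat, x_control):
    """Standardized mean difference: (mean_t - mean_c) divided by the
    square root of the average of the two group variances. It is an
    in-sample quantity and does not shrink or grow with n the way a
    p-value does."""
    m1 = sum(x_treat) / len(x_treat)
    m0 = sum(x_control) / len(x_control)
    v1 = sum((x - m1) ** 2 for x in x_treat) / (len(x_treat) - 1)
    v0 = sum((x - m0) ** 2 for x in x_control) / (len(x_control) - 1)
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)

# A shift of one within-group SD gives a standardized difference of 1.0.
smd = standardized_difference([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
```

A common rule of thumb treats absolute standardized differences below roughly 0.1 as acceptable balance, though that threshold is a convention, not a hypothesis test.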

Imai et al. (2008) elaborate. Using simulation they demonstrate that:

"The t-test can indicate that balance is becoming better whereas the actual balance is growing worse, staying the same or improving. Although we choose the most commonly used t-test for illustration, the same problem applies to many other test statistics that are used in applied research. For example, the same simulation applied to the Kolmogorov–Smirnov test shows that its p-value monotonically increases as we randomly drop more control units. This is because a smaller sample size typically produces less statistical power and hence a larger p-value"

and

"from a theoretical perspective, balance is a characteristic of the sample, not some hypothetical population, and so, strictly speaking, hypothesis tests are irrelevant in this context"
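The mechanics behind this can be illustrated with a toy calculation (the numbers are hypothetical): hold the mean difference between groups, the actual imbalance, fixed and simply discard control units; the t statistic shrinks, and hence the p-value rises, purely because power is lost:

```python
import math

def t_statistic(a, b):
    """Pooled two-sample t statistic."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    sp = math.sqrt((ssa + ssb) / (len(a) + len(b) - 2))
    return (ma - mb) / (sp * math.sqrt(1 / len(a) + 1 / len(b)))

treat = [2.0, 3.0, 4.0] * 30    # mean 3.0, n = 90
control = [1.0, 2.0, 3.0] * 30  # mean 2.0, n = 90

# The mean difference (the imbalance itself) is identical in both
# comparisons, yet |t| falls when two-thirds of the control units
# are discarded -- the p-value would rise even though balance is
# no better.
t_full = t_statistic(treat, control)
t_dropped = t_statistic(treat, control[:30])
```

This is the inflation Imai et al. describe: a balance 'test' that rewards throwing away data is measuring power, not balance.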

OK so based on this view of matching as a data pre-processing step in an observational setting, using hypothesis tests and p-values to assess balance doesn't seem to make sense. But what about randomized controlled trials and randomized field trials? In those cases randomization is used as a means to achieve balance outright instead of matching after the fact in an observational setting. Even better, we hope to achieve balance on unobservable confounders that we could never measure or match on. But sometimes randomization isn't perfect in this regard, especially in smaller samples.

Altman (1985) discusses the implication of using significance tests to assess balance in randomized clinical trials:

"Randomised allocation in a clinical trial does not guarantee that the treatment groups are comparable with respect to baseline characteristics. It is common for differences between treatment groups to be assessed by significance tests but such tests only assess the correctness of the randomisation, not whether any observed imbalances between the groups might have affected the results of the trial. In particular, it is quite unjustified to conclude that variables that are not significantly differently distributed between groups cannot have affected the results of the trial."

"The possible effect of imbalance in a prognostic factor is considered, and it is shown that non‐significant imbalances can exert a strong influence on the observed result of the trial, even when the risk associated with the factor is not all that great."

More recent discussions include Pocock et al. (2002), who found that nearly 50% of practitioners were still employing significance testing to assess covariate balance in randomized trials, and Egbewale (2015).

So if using significance tests for balance assessment in matched and randomized studies is so 1985....why are we still doing it?

References:

Altman, D.G. (1985), Comparability of Randomised Groups. Journal of the Royal Statistical Society: Series D (The Statistician), 34: 125-136. doi:10.2307/2987510

Austin, P.C. (2007). The performance of different propensity score methods for estimating marginal odds ratios. Stat Med, 26(16), 3078-3094.

Egbewale, B.E. (2015). Statistical issues in randomised controlled trials: a narrative synthesis. Asian Pacific Journal of Tropical Biomedicine, 5(5), 354-359. ISSN 2221-1691.

Ho, Daniel E. and Imai, Kosuke and King, Gary and Stuart, Elizabeth A., Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, Vol. 15, pp. 199-236, 2007, Available at SSRN: https://ssrn.com/abstract=1081983

Imai K, King G, Stuart EA. Misunderstandings among experimentalists and observationalists in causal inference. Journal of the Royal Statistical Society Series A. 2008;171(2):481–502.

Pocock SJ, Assmann SE, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med. 2002;21(19):2917-2930. doi:10.1002/sim.1296

Stuart EA. Matching methods for causal inference: A review and a look forward. Stat Sci. 2010;25(1):1-21. doi:10.1214/09-STS313

Thoemmes, F. J. & Kim, E. S. (2011). A systematic review of propensity score methods in the social  sciences. Multivariate Behavioral Research, 46(1), 90-118.

Saturday, May 30, 2020

Biometricians vs. Mendelians

Great post over at the Biochemist Blog: Biometricians vs Mendelians: What a century-old debate can teach researchers today.

"As researchers within genetics come together from increasingly specialised fields, each with their own pre-conceptions and biases, perhaps what is needed is a 21st Century Ronald Fisher to bring together the disparate ideas across biology to address these larger questions. As cross-disciplinary collaborations become more popular and incentives for such projects more common, the future for another 100 years of quantitative genetics becomes increasingly exciting and vibrant."

Biometry vs Biostatistics vs Bioinformatics

There is a very good post by Emi Tanaka on her Savvy Statistics blog titled: "What’s in a name? The difference between Bioinformatics, Biometrics and Biostatistics: Doing analytics in the life sciences? What’s your identity label?" I'll share some highlights focused mostly on biometry but encourage reading the whole thing. 

The International Biometrics Society includes a definition in their website:

"The terms “Biometrics” and “Biometry” have been used since early in the 20th century to refer to the field of development of statistical and mathematical methods applicable to data analysis problems in the biological sciences. Statistical methods for the analysis of data from agricultural field experiments to compare the yields of different varieties of wheat, for the analysis of data from human clinical trials evaluating the relative effectiveness of competing therapies for disease, or for the analysis of data from environmental studies on the effects of air or water pollution on the appearance of human disease in a region or country are all examples of problems that would fall under the umbrella of “Biometrics” as the term has been historically used."


Biometrics has a rich history dating back to Ronald Fisher's conception of modern-day statistics. The Biometrics journal, while having an emphasis on problems in the biological sciences, is strongly geared toward statisticians.

Biometry, the active pursuit of biological knowledge by quantitative methods.
— R.A. Fisher, 1948

Much of Fisher's work was motivated by agriculture, and perhaps because the field is rooted in this part of its history, 'biometrician' often refers to a person who is adept at applying statistical methods in agriculture. Although biometrics is not constrained to genetics alone, Fisher was an important figure in both statistics and genetics.

I really enjoyed how she concluded her article:

In the end, a title is just a title, a name is just a name and a label is just a label. I personally care more about working with people whose principle and values are aligned with mine. To that end, I believe everyone should choose an identity that they are comfortable and happy with. I appreciate that some people have identities that are transient; some don’t and some are still figuring it out. Regardless, it’s good to reflect and think.

This actually makes me think of Thich Nhat Hanh as he writes in Peace is Every Step: "So we must be careful not to imprison ourselves in concepts."


