The significance of the unique effect of one predictor or a set of predictors
in a regression model is determined by (1) PRE (Proportional Reduction
in Error, also called partial eta_squared in ANOVA or partial
R_squared in regression), (2) the number of parameters in the
regression model, and (3) the sample size. As a result, given PRE,
the number of parameters in the regression model, and the expected
statistical power, we can plan the sample size that one predictor or a set of
predictors needs to reach the expected statistical power (usually 0.80) at
the chosen significance level (usually 0.05). This is exactly what
power_lm() is designed to do.
To compare the power of r_squared and PRE, Keng also provides power_r()
to conduct power analysis for Pearson’s r. power_r() has self-explanatory
arguments and is easy to use, so it is not detailed in the vignettes.
power_lm()
Among the arguments of power_lm(), PRE, PC, and PA merit
further explanation.
PRE
PRE is partial R_squared, the square of the partial correlation, so you
can calculate PRE from the partial correlation. Other statistical
software and R packages often plan sample size for regression models
through Cohen’s f_squared or its square root, Cohen’s f. power_lm() uses
PRE instead because PRE and its square root, the partial correlation, are
easier to interpret. The partial correlation is the net correlation between the
outcome of the regression (e.g., depression) and the predictor (e.g.,
problem-focused coping) or set of predictors (e.g., the dummy codes of
class) of interest. Put differently, the partial correlation is the
correlation between the outcome and the predictor or set of predictors
of interest after controlling for all other predictors, no matter how
many there are. Planning therefore starts from a reasonable guess about the
partial correlation. For example, we may guess that, after controlling for other
predictors, the partial correlation between the outcome depression and
the predictor problem-focused coping is 0.2; then PRE = 0.2^2 = 0.04.
If you instead have the effect size Cohen’s f_squared or f for
problem-focused coping predicting depression, you can convert it to PRE.
Keng provides the function calc_PRE() to help users convert the
partial correlation r_p, f_squared, or f to PRE.
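The conversions described above rest on standard identities: PRE = r_p^2, PRE = f_squared / (1 + f_squared), and f_squared = PRE / (1 - PRE). The following base-R sketch illustrates them; the helper names are hypothetical and are not Keng functions, and in practice calc_PRE() is the intended tool.

```r
# Hypothetical helpers (not part of Keng) illustrating the standard identities.
pre_from_rp <- function(r_p) r_p^2                            # partial correlation -> PRE
pre_from_f2 <- function(f_squared) f_squared / (1 + f_squared) # Cohen's f_squared -> PRE
pre_from_f  <- function(f) pre_from_f2(f^2)                    # Cohen's f -> PRE
f2_from_pre <- function(PRE) PRE / (1 - PRE)                   # PRE -> Cohen's f_squared

pre_from_rp(0.2)                 # a partial correlation of 0.2 gives PRE = 0.04
f2_from_pre(0.04)                # the matching Cohen's f_squared
pre_from_f2(f2_from_pre(0.04))   # the round trip returns the original PRE
```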
PA and PC
Suppose that your regression model has m predictors in total. This
model is the augmented model (Model A); it contains both the focal
predictors (e.g., gender, the dummy codes of class) and other
less important predictors such as covariates. The number of parameters of
the augmented model, PA, is m + 1, since the
intercept is also a parameter. PA should be at least 1.
The model without the focal predictors is the compact model (Model
C). If there are k focal predictors, the number of parameters of the
compact model, PC, is m + 1 - k. PC should be at least 0.
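One way to check PA and PC is to fit both models and count their coefficients. The sketch below uses the mtcars data that ships with R purely as a stand-in; the choice of variables is an assumption made for illustration and does not come from Keng.

```r
# mtcars ships with base R; the variables are only an illustration.
fit_A <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)  # Model A: covariates + focal dummy codes
fit_C <- lm(mpg ~ wt + hp, data = mtcars)                # Model C: focal dummy codes removed

PA <- length(coef(fit_A))  # intercept + wt + hp + 2 dummy codes for the 3-level factor
PC <- length(coef(fit_C))  # intercept + wt + hp
k  <- PA - PC              # number of focal predictors
c(PA = PA, PC = PC, k = k)
```

Counting coefficients this way includes the intercept automatically, which is the step most easily forgotten when PA and PC are worked out by hand.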
Note that power_lm() follows Aberson (2019), and the
planned sample size is more conservative than that of other statistical
software such as G*Power. However, the difference is small and negligible.
Given that regression analysis is equivalent to the t-test and ANOVA,
power_lm() can plan the sample size for perhaps all
common research designs.
If you are interested in the power and required sample size of the full regression model, you can treat all predictors as a set. Model C is then the intercept-only model, so PC = 1. If your regression model has m predictors, PA = m + 1.
The total PRE of the m predictors is actually R_squared. Assuming that the total PRE of the m predictors is 0.02 and m is 10, we plan the sample size using the following code:
library(Keng)
power_lm(PRE = 0.02, PC = 1, PA = 11)
#> -- Given ---------------------------
#> PRE = 0.02
#> f_squared = 0.02040816
#> PC = 1
#> PA = 11
#> sig_level = 0.05
#> -- Post-hoc power analysis --------
#> n = NULL
#> power_post = NULL
#> -- Sample size planning -----------
#> Expected power = 0.8
#> Minimum sample size = 816
As shown above, this design needs at least 816 cases/observations.
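To make the logic of sample size planning concrete, here is a minimal base-R sketch based on the noncentral F distribution. It assumes that the noncentrality parameter equals f_squared times the error degrees of freedom; this is one common convention, and it is an assumption here rather than a description of how power_lm() is implemented. The function names are hypothetical.

```r
# Sketch only: assumes ncp = f_squared * df2, which may differ from power_lm()'s internals.
power_sketch <- function(n, PRE, PC, PA, sig_level = 0.05) {
  df1 <- PA - PC                      # number of focal parameters
  df2 <- n - PA                       # error degrees of freedom of Model A
  f_squared <- PRE / (1 - PRE)
  ncp <- f_squared * df2              # assumed noncentrality parameter
  F_crit <- qf(1 - sig_level, df1, df2)
  pf(F_crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

min_n_sketch <- function(PRE, PC, PA, power = 0.80, sig_level = 0.05) {
  n <- PA + 1                         # smallest n with positive error df
  while (power_sketch(n, PRE, PC, PA, sig_level) < power) n <- n + 1
  n
}

n_min <- min_n_sketch(PRE = 0.02, PC = 1, PA = 11)
power_sketch(n_min, PRE = 0.02, PC = 1, PA = 11)
```

The value of n_min can be compared with the minimum sample size that power_lm() reports; any gap would reflect a different noncentrality convention.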
If you are interested in the power and required sample size of one continuous predictor, and your regression model has m predictors, then PA = m + 1 and PC = (m + 1) - 1.
If you are interested in the two-way moderation model, the focal predictor is the two-way interaction term. If your regression model has m predictors, then PA = m + 1 and PC = (m + 1) - 1.
If you are interested in the three-way moderation model, the focal predictors are two two-way interaction terms and one three-way interaction term. If your regression model has m predictors, then PA = m + 1 and PC = (m + 1) - 3.
If you are interested in the difference between the mean of one group and 0, you may turn to the one-sample t-test. Alternatively, you can fit an intercept-only model, in which the focal parameter is the intercept. In this case PA = 1 and PC = 0.
Note that in this case you must use the correct
PRE to obtain the correct power and planned sample size. Do not
compute PRE from Cohen’s one-sample d; instead,
compute PRE from the t value of the one-sample
t-test by converting t to r and then
converting r to PRE. You can also compute the
correct PRE using the compare_lm() function.
library(Keng)
data("depress")
# WRONG PRE for the one-sample t-test
(Cohens_d <- effectsize::cohens_d(dm1 ~ 1, data = depress)$Cohens_d)
#> [1] 4.985268
(r_from_d <- effectsize::d_to_r(Cohens_d, n1 = 94))
#> [1] 0.9287831
(PRE_from_d <- r_from_d^2)
#> [1] 0.862638
# CORRECT PRE for the one-sample t-test
out <- t.test(dm1 ~ 1, depress)
(r_from_t <- effectsize::t_to_r(t = out$statistic, df_error = out$parameter)$r)
#> [1] 0.9806709
(PRE_from_t <- r_from_t^2)
#> [1] 0.9617153
# PRE from lm()
fit0 <- lm(dm1 ~ 0, depress)
fit1 <- lm(dm1 ~ 1, depress)
compare_lm(fit0, fit1)[7, 4]
#> [1] NA
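The t-to-r-to-PRE route above amounts to PRE = t^2 / (t^2 + df). For the intercept-only case this can be checked by hand against the definition of PRE as the proportional reduction in error from Model C (no parameters, so every prediction is 0) to Model A (the intercept). The sketch below uses made-up scores rather than the depress data, and only base R.

```r
# Made-up scores, for illustration only (not the depress data).
y <- c(2.1, 3.4, 1.8, 2.9, 3.7, 2.5, 3.1, 2.2)

# Route 1: from the one-sample t-test, PRE = t^2 / (t^2 + df).
tt <- t.test(y, mu = 0)
t_val <- unname(tt$statistic)
df_error <- unname(tt$parameter)
PRE_from_t <- t_val^2 / (t_val^2 + df_error)

# Route 2: from the errors of the two models.
SSE_C <- sum(y^2)              # Model C has no parameters and predicts 0
SSE_A <- sum((y - mean(y))^2)  # Model A estimates the intercept (the mean)
PRE_from_SSE <- 1 - SSE_A / SSE_C

all.equal(PRE_from_t, PRE_from_SSE)  # the two routes agree
```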
If you are interested in the difference between two groups (e.g., experimental vs. control), you may turn to the t-test. Alternatively, you can treat the group variable as a binary predictor and conduct a regression analysis, in which the focal predictor is the binary group predictor. If your regression model has m predictors, then PA = m + 1 and PC = (m + 1) - 1.
If you are interested in the difference between multiple groups, you may turn to ANOVA. Alternatively, you can treat the group variable as a multicategorical independent variable and code it using a coding scheme such as dummy coding. Whichever coding scheme you use, a multicategorical independent variable with j levels should be coded into (j - 1) predictors, which form the set of focal predictors. If your regression model has m predictors, among which there are (j - 1) codes, then PA = m + 1 and PC = (m + 1) - (j - 1).
If you are interested in an outcome that was repeatedly measured, you may turn to repeated-measures ANOVA. In essence, repeated-measures ANOVA computes the difference scores of interest (contrasts) and then conducts a between-subjects ANOVA on them. Similarly, you can compute the difference score of interest and then conduct regression analyses.
A special case arises when there is no between-subject factor. In that case, treat the difference score as the outcome and fit an intercept-only model, as in the one-sample t-test.
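As a small illustration of the difference-score approach with no between-subject factor, the base-R sketch below uses made-up pre/post scores. It fits an intercept-only model to the difference score and confirms that its t value matches the paired t-test, so the design can be planned with PA = 1 and PC = 0.

```r
# Made-up pre/post scores, for illustration only.
pre  <- c(10, 12, 9, 14, 11, 13, 10, 12)
post <- c(12, 13, 9, 17, 12, 15, 13, 12)
d <- post - pre                     # difference score

fit <- lm(d ~ 1)                    # intercept-only model on the difference score
t_lm <- unname(coef(summary(fit))["(Intercept)", "t value"])
t_paired <- unname(t.test(post, pre, paired = TRUE)$statistic)

all.equal(t_lm, t_paired)           # the two t values agree
PRE <- t_lm^2 / (t_lm^2 + df.residual(fit))  # PRE of the intercept for this design
```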
Aberson, C. L. (2019). Applied power analysis for the behavioral sciences. Routledge.