The . notation in a formula is commonly taken to mean "all other variables in data that do not already appear in the formula". Consider the following:
df <- data.frame(y = rnorm(10), A = runif(10), B = rnorm(10))
mod <- lm(y ~ ., data = df)
coef(mod)
R> coef(mod)
(Intercept)           A           B
    -0.8389      0.5635     -0.2160
Ignore the values above; what matters is that the model has two terms (plus the intercept), taken from the names in names(df) other than y. This is exactly the same as writing out the full formula
mod <- lm(y ~ A + B, data = df)
but involves less typing. It is a convenient shortcut when the model formula might include many variables.
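One way to see the expansion for yourself is to ask R for the expanded formula. The following is a minimal sketch, assuming a seeded copy of the data frame above; the names expanded, mod_dot, mod_full and same_fit are made up for illustration.

```r
# Seeded copy of the example data so the run is reproducible (the seed is an assumption)
set.seed(1)
df <- data.frame(y = rnorm(10), A = runif(10), B = rnorm(10))

# terms() with a data argument replaces the dot with the remaining column names
expanded <- formula(terms(y ~ ., data = df))
print(expanded)

# Fit the model both ways and compare the coefficients
mod_dot  <- lm(y ~ ., data = df)
mod_full <- lm(y ~ A + B, data = df)
same_fit <- isTRUE(all.equal(coef(mod_dot), coef(mod_full)))
print(same_fit)
```

If the two fits agree, same_fit is TRUE, which is what you would expect given that both formulas describe the same model.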
The other place this crops up is in update(), where the second argument is a formula and one uses . to indicate "what was already there". For example:
coef(update(mod, . ~ . - B))
R> coef(update(mod, . ~ . - B))
(Intercept)           A
    -0.8156      0.5919
Hence the first ., to the left of ~, expands to "keep the existing response variable y", whilst the second ., to the right of ~, expands to A + B, and hence we have A + B - B, which cancels to A.
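The same substitution can be checked on a bare formula, since update() also has a method for formulas and does not need a fitted model. This is only a sketch; old, new_f and bigger are illustrative names, and Z is a hypothetical extra variable that does not have to exist for the formula to be built.

```r
# Start from the written-out formula used in the model above
old <- y ~ A + B

# The dots stand for the old left- and right-hand sides; "- B" then drops B
new_f <- update(old, . ~ . - B)
print(new_f)

# Adding a term works the same way (Z is a hypothetical variable name)
bigger <- update(old, . ~ . + Z)
print(bigger)
```

Working on formulas alone like this is a cheap way to confirm what an update() call will do before refitting a large model.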