I want to calculate the mean and standard deviation, by group, for each column in a subset of a large data frame.
I'm trying to understand why some of the answers to similar questions aren't working for me; I'm still pretty new at R and I'm sure there are a lot of subtleties (and not-so-subtle things!) I'm completely missing.
I have a large data frame similar to this one:
mydata <- data.frame(Experiment = rep(c("E1", "E2", "E3", "E4"), each = 9),
Treatment = c(rep(c("A", "B", "C"), each = 3), rep(c("A", "C", "D"), each = 3), rep(c("A", "D", "E"), each = 3), rep(c("A", "B", "D"), each = 3)),
Day1 = sample(1:100, 36),
Day2 = sample(1:100, 36),
Day3 = sample(1:150, 36),
Day4 = sample(50:150, 36))
I need to subset the data by Experiment and by Treatment, for example:
testB <- mydata[(mydata[, "Experiment"] %in% c("E1", "E4"))
& mydata[, "Treatment"] %in% c("A", "B"),
c("Treatment", "Day1", "Day2", "Day4")]
Then, for each column in testB, I want to calculate the mean and standard deviation for each Treatment group.
I started by trying tapply (on just one column to begin with), but I get NA for Treatment groups that shouldn't be in testB. That isn't a big problem with this small dataset, but it is pretty irksome with my real data:
> tapply(testB$Day1, testB$Treatment, mean)
       A        B        C        D        E
70.66667 61.00000       NA       NA       NA
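The NA entries are consistent with Treatment being a factor that still carries its unused levels after subsetting. A minimal sketch of that behaviour, using a hypothetical toy data frame (`toy`, not the data above) and assuming Treatment is stored as a factor:

```r
# Hypothetical toy data, not the data from the question.
toy <- data.frame(
  Treatment = factor(c("A", "A", "B", "B", "C", "C")),
  Day1 = c(10, 20, 30, 50, 70, 90)
)

# Subsetting rows keeps all original factor levels, including the unused "C".
sub <- toy[toy$Treatment %in% c("A", "B"), ]
with_unused <- tapply(sub$Day1, sub$Treatment, mean)

# droplevels() removes factor levels that no longer occur in the subset.
sub <- droplevels(sub)
without_unused <- tapply(sub$Day1, sub$Treatment, mean)

print(with_unused)     # includes an NA entry for "C"
print(without_unused)  # only "A" and "B" remain
```

If Treatment were a character column instead (the default for data.frame() from R 4.0 onward), tapply would only see the groups actually present.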
I tried implementing solutions from "Compute mean and standard deviation by group for multiple variables in a data.frame". Using aggregate worked:
ag <- aggregate(. ~ Treatment, testB, function(x) c(mean = mean(x), sd = sd(x)))
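One thing to be aware of with this form of aggregate: because the function returns a vector of two values, each summarised column in the result is a two-column matrix rather than a pair of ordinary columns. A rough sketch of flattening it, on a hypothetical toy data frame (`toy`) rather than testB:

```r
# Hypothetical toy data, not the data from the question.
toy <- data.frame(
  Treatment = c("A", "A", "B", "B"),
  Day1 = c(10, 20, 30, 50),
  Day2 = c(1, 3, 5, 9)
)

ag <- aggregate(. ~ Treatment, toy,
                function(x) c(mean = mean(x), sd = sd(x)))

# ag has 3 columns; Day1 and Day2 are each a matrix with columns mean and sd.
# do.call(data.frame, ...) expands those matrices into separate columns.
flat <- do.call(data.frame, ag)

print(names(flat))
```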
But I can't get the data.table solutions to work.
library(data.table)
testB[, sapply(.SD, function(x) list(mean=mean(x), sd=sd(x))), by = Treatment]
testB[, c(mean = lapply(.SD, mean), sd = lapply(.SD, sd)), by = Treatment]
Both gave me the error message:
Error in `[.data.frame`(testB, , c(mean = lapply(.SD, mean), sd = lapply(.SD, :
unused argument(s) (by = Treatment)
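The `[.data.frame` in the message suggests that testB is still a plain data.frame, so the `by` argument (which belongs to data.table's `[` method) is not recognised. A minimal sketch of converting first, assuming the data.table package is installed and using a hypothetical toy data frame (`toy`) in place of testB:

```r
library(data.table)

# Hypothetical toy data, not the data from the question.
toy <- data.frame(
  Treatment = c("A", "A", "B", "B"),
  Day1 = c(10, 20, 30, 50),
  Day2 = c(1, 3, 5, 9)
)

# as.data.table() returns a data.table copy; setDT(toy) would convert in place.
toy_dt <- as.data.table(toy)

# `[` now dispatches to data.table's method, so .SD and `by` are understood.
res <- toy_dt[, c(mean = lapply(.SD, mean), sd = lapply(.SD, sd)),
              by = Treatment]

print(res)
```

The result should have one row per Treatment, with a mean and an sd column for each of Day1 and Day2.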
What am I doing wrong?
Thanks in advance for helping a clueless beginner!