subset - R: Warning when subsetting a data frame with a factor, but not with a character
Let's start with some data:
set.seed(0)
data <- data.frame('group' = rep(c('control', 'disease'), 10),
                   'sv_ml' = rnorm(20),
                   'co_l' = rnorm(20))
Now let's create a factor out of the 2 variables of interest, sv_ml and co_l.
var <- as.factor(colnames(data)[colnames(data) != 'group'])
Subsetting based on sv_ml works whether I first convert to character or not:
mean(data[data$group == 'control', var[1]]) # 0.2077689
mean(data[data$group == 'control', as.character(var[1])]) # 0.2077689
But subsetting based on co_l only works if I first convert to character:
mean(data[data$group == 'control', var[2]]) # NA
mean(data[data$group == 'control', as.character(var[2])]) # 0.194133
The first line returns NA, with the following warning:
Warning message: argument is not numeric or logical: returning NA
I understand that I can avoid the problem by converting factors to characters before using them to subset a data frame. However, I'd like to understand why this is happening, and why it happens with one factor but not the other.
A warning for those who come across this post.
Thanks to the answer below, I now know that when you attempt to subset a data frame based on a factor, R uses the numeric representation of the factor. In this case, the numeric representation of sv_ml is 2 and that of co_l is 1 (based on the default alphabetical ordering of the levels). It just so happened that the first column of the data frame, group, is a factor, so I got the warning. The second column happened to be sv_ml, so (quote unquote) "luckily" I got the right answer.
Let's say things had been set up differently.
set.seed(0)
data <- data.frame('group' = rep(c('control', 'disease'), 10),
                   'x' = rnorm(20),
                   'sv_ml' = rnorm(20),
                   'co_l' = rnorm(20))
var <- as.factor(colnames(data)[colnames(data) != 'group'])
In this case, x is the first element of the factor, but its numeric representation is 3. Therefore, when subsetting based on the factor, I silently get the mean of the wrong column (the third column, sv_ml).
mean(data[data$group == 'control', var[1]]) # 0.194133
mean(data[data$group == 'control', 'x']) # 0.2077689
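One way to guard against this silent mix-up is to convert the index to character before it ever reaches the subsetting operator. The sketch below assumes the second setup above; `col_by_label` is a hypothetical helper name introduced only for illustration, not something from base R.

```r
set.seed(0)
data <- data.frame(group = rep(c("control", "disease"), 10),
                   x     = rnorm(20),
                   sv_ml = rnorm(20),
                   co_l  = rnorm(20))
var <- as.factor(colnames(data)[colnames(data) != "group"])

# Hypothetical helper: select a single column by its label.
# A factor index is converted to character first, so the lookup
# goes by name rather than by the factor's integer code.
col_by_label <- function(df, idx) {
  if (is.factor(idx)) idx <- as.character(idx)
  df[[idx]]
}

control <- data[data$group == "control", ]
mean_x_safe   <- mean(col_by_label(control, var[1]))  # looks up column "x"
mean_x_direct <- mean(control$x)
```

Here `mean_x_safe` and `mean_x_direct` agree, because the factor element is resolved through its label rather than its code.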
Dearie, dearie me. We must be careful, mustn't we?
The reason is that when you do not convert the factors to character, they are treated as numeric (their integer codes) in subsetting.
var
[1] sv_ml co_l
as.numeric(var)
[1] 2 1
Hence, sv_ml is treated as 2 and gives the second column, as intended, while co_l is treated as 1 and returns the first column, which is the column group. Taking the mean of a factor vector gives the warning you see and returns NA.
mean(data$group)
[1] NA
Warning message:
In mean.default(data$group) :
  argument is not numeric or logical: returning NA
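To see which columns a factor index actually reaches, it can help to index the column names themselves. The following is a minimal sketch using the original two-variable setup; the names `picked_by_factor` and `picked_by_label` are made up for this illustration.

```r
set.seed(0)
data <- data.frame(group = rep(c("control", "disease"), 10),
                   sv_ml = rnorm(20),
                   co_l  = rnorm(20))
var <- as.factor(colnames(data)[colnames(data) != "group"])

# Levels are sorted alphabetically, so co_l gets code 1 and sv_ml code 2
codes <- as.integer(var)

# Indexing by the factor is equivalent to indexing by its integer codes
picked_by_factor <- names(data)[var]

# Indexing by the labels picks the columns that were actually intended
picked_by_label <- names(data[as.character(var)])
```

With this setup, `picked_by_factor` contains "sv_ml" and "group", while `picked_by_label` contains "sv_ml" and "co_l", which matches the behaviour described above.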