subset - R: Warning when subsetting a data frame with a factor, but not with a character
Let's start with some data:

    set.seed(0)
    data <- data.frame('group' = rep(c('control', 'disease'), 10),
                       'sv_ml' = rnorm(20),
                       'co_l' = rnorm(20))

Now let's create a factor out of the two variables of interest, sv_ml and co_l.

    var <- as.factor(colnames(data)[colnames(data) != 'group'])

Subsetting based on sv_ml works whether or not I first convert to character:

    mean(data[data$group == 'control', var[1]]) # 0.2077689
    mean(data[data$group == 'control', as.character(var[1])]) # 0.2077689

But subsetting based on co_l works only if I first convert to character:

    mean(data[data$group == 'control', var[2]]) # NA
    mean(data[data$group == 'control', as.character(var[2])]) # 0.194133

The first line returns NA, with the following warning:

    Warning message:
    argument is not numeric or logical: returning NA

I understand that I can avoid this problem by converting factors to characters before using them to subset a data frame. However, I'd like to understand why this is happening, and why it happens with one factor but not the other.

A warning to anyone who comes across this post.

Thanks to the answer below, I now know that when you attempt to subset a data frame based on a factor, R uses the numeric representation of the factor. In this case, the numeric representation of sv_ml is 2 and that of co_l is 1 (based on the default alphabetical ordering of the levels). It so happened that the first column of the data frame was a factor, so I got the warning and NA. The second column happened to be sv_ml, so I (quote unquote) "luckily" got the right answer.
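
To make that mapping visible, here is a minimal sketch that rebuilds the data from the question and prints which column each element of the factor actually points at. The names `codes` and `selected` are only illustrative and are not part of the original post.

```r
set.seed(0)
data <- data.frame(group = rep(c("control", "disease"), 10),
                   sv_ml = rnorm(20),
                   co_l  = rnorm(20))
var <- as.factor(colnames(data)[colnames(data) != "group"])

# Levels are sorted alphabetically, regardless of the order of the input
print(levels(var))              # "co_l" "sv_ml"

# The integer codes are what positional indexing ends up using
codes <- as.integer(var)
print(codes)                    # 2 1

# Column names found at those positions in the data frame
selected <- names(data)[codes]
print(selected)                 # "sv_ml" "group"
```

The second element of `var` is labelled co_l, but its code is 1, and position 1 in the data frame is group.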
Let's say the data had been set up differently.

    set.seed(0)
    data <- data.frame('group' = rep(c('control', 'disease'), 10),
                       'x' = rnorm(20),
                       'sv_ml' = rnorm(20),
                       'co_l' = rnorm(20))

    var <- as.factor(colnames(data)[colnames(data) != 'group'])

In this case, x is the first element of the factor, but its numeric representation is 3. Therefore, when subsetting based on the factor representation, I get the mean of the wrong column, with no warning at all.

    mean(data[data$group == 'control', var[1]]) # 0.194133
    mean(data[data$group == 'control', 'x']) # 0.2077689

Dearie dearie me, we must be careful, mustn't we.
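
One way to guard against this silent mix-up is to route every column selection through a small helper that converts factors to character and checks that the names exist. The sketch below assumes a hypothetical helper called `as_col_names()`. It is not a base R function, just an illustration of the idea.

```r
set.seed(0)
data <- data.frame(group = rep(c("control", "disease"), 10),
                   x     = rnorm(20),
                   sv_ml = rnorm(20),
                   co_l  = rnorm(20))
var <- as.factor(colnames(data)[colnames(data) != "group"])

# Hypothetical helper: always return column names as character,
# and fail loudly if any of them is not a column of df
as_col_names <- function(df, cols) {
  if (is.factor(cols)) cols <- as.character(cols)
  stopifnot(is.character(cols), all(cols %in% names(df)))
  cols
}

is_control <- data$group == "control"

# Selecting by label now picks x, not whatever sits at position 3
safe_mean <- mean(data[is_control, as_col_names(data, var[1])])
expected  <- mean(data$x[is_control])
print(isTRUE(all.equal(safe_mean, expected)))   # TRUE
```

Because the helper stops when a name is missing, a typo or a stale factor level produces an error instead of a plausible-looking number from the wrong column.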

The reason is that when you do not convert the factors to character, they are treated as numeric in subsetting.

    var
    [1] sv_ml co_l
    as.numeric(var)
    [1] 2 1

Hence, sv_ml is considered as '2' and gives the second column as intended, but co_l is considered as '1' and returns the first column, which is the column group. The mean of a vector of factors gives the warning you see and returns NA.

    mean(data$group)
    [1] NA
    Warning message:
    In mean.default(data$group) :
      argument is not numeric or logical: returning NA
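
The explanation above can be checked step by step with explicit integer codes. This sketch passes `stringsAsFactors = TRUE` so that group is a factor, as it is in the output above. That argument is an assumption added here so the example behaves the same on versions of R where character columns are no longer converted by default. The names `position`, `picked` and `result` are illustrative.

```r
set.seed(0)
data <- data.frame(group = rep(c("control", "disease"), 10),
                   sv_ml = rnorm(20),
                   co_l  = rnorm(20),
                   stringsAsFactors = TRUE)
var <- as.factor(colnames(data)[colnames(data) != "group"])

# The code behind the label co_l is 1, so position 1 is what gets selected
position <- as.integer(var[2])
picked   <- data[[position]]

# The selected column is group, not co_l
print(identical(picked, data$group))    # TRUE

# mean() on a factor warns and returns NA; the warning is silenced here
result <- suppressWarnings(mean(picked))
print(is.na(result))                    # TRUE
```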