[R] create multiple categorical variables in a data frame using a loop

David Winsemius dwinsemius at comcast.net
Fri Apr 20 03:58:25 CEST 2018


> On Apr 19, 2018, at 1:22 PM, David Winsemius <dwinsemius using comcast.net> wrote:
> 
> 
>> On Apr 19, 2018, at 11:20 AM, Ding, Yuan Chun <ycding using coh.org> wrote:
>> 
>> Hi All,
>> 
>> I want to create a categorical variable, cat.pfoa, in the file of pfas.pheno (a data frame) based on log2pfoa values. I can do it using the following code.
>> 
>> pfas.pheno <-within(pfas.pheno, {cat.pfoa<-NA
>> cat.pfoa[pfas.pheno$log2pfoa <=quantile(pfas.pheno$log2pfoa,0.25, na.rm =T)]<-0
>> cat.pfoa[pfas.pheno$log2pfoa >=quantile(pfas.pheno$log2pfoa,0.75, na.rm =T)]<-2
>> cat.pfoa[pfas.pheno$log2pfoa >=quantile(pfas.pheno$log2pfoa,0.25, na.rm =T)
>>          &pfas.pheno$log2pfoa <=quantile(pfas.pheno$log2pfoa,0.75, na.rm =T)]<-1
>> })
> 
> This would be somewhat more compact and easier to maintain if you used findInterval (untested in the absence of a data object, which is your responsibility):
> 
> pfas.pheno <-within(pfas.pheno, {
> cat.pfoa  <- findInterval( log2pfoa , c(-Inf, quantile( log2pfoa, c(.25,.75), na.rm =TRUE), Inf))-1 } )
> 
> 
> `findInterval` numbers its intervals from 1, so to get a sequence starting at 0 just subtract 1.
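
To make the interval numbering concrete, here is a minimal sketch on made-up values (the vector `x` and the object names are hypothetical, not taken from the poster's data). It assumes the default `quantile` method and shows that a missing input stays missing:

```r
# Toy values; `x` is invented for illustration only.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, NA)

# Quartile cut points, ignoring the missing value.
q <- quantile(x, c(.25, .75), na.rm = TRUE)

# Break points -Inf, Q1, Q3, Inf give interval numbers 1, 2, 3;
# subtracting 1 shifts them to 0, 1, 2.
grp <- findInterval(x, c(-Inf, q, Inf)) - 1

# Tabulate the groups, keeping NA visible.
table(grp, useNA = "ifany")
```

Note that `findInterval` uses intervals closed on the left, so a value exactly equal to the upper quartile lands in the top group.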
> 
> 
>> However, I have 7 additional similar variables, so I wrote the following code, but it does not work.
>> 
>> for (i in c("log2pfoa","log2pfos", "log2pfna", "log2pfdea",   "log2pfuda", "log2pfhxs", "log2et_pfosa_acoh", "log2me_pfosa_acoh"))  {
>> cat.var <- paste0("cat.",i)
>> pfas.pheno <- within(pfas.pheno, {eval(parse(text= cat.var))<-NA
> 
> Nope. You cannot use R like a macro processor, at least not easily. R names are not the same as character values. They "live in different realities". The `get` and `assign` functions can be used to "promote" character values to real R names and make assignments from and to what would otherwise be merely character values.
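
The distinction between a name and a character value can be sketched as follows. The objects here are toy stand-ins created in the current environment, not columns of the poster's data frame:

```r
# Hypothetical toy objects, not from the original post.
log2pfoa <- c(0.5, 1.5, 2.5)
nm <- "log2pfoa"            # a character value, not an R name

nm                          # evaluates to the string itself
get(nm)                     # looks up the object whose name is that string

# assign() creates an object whose name is built from a string.
assign(paste0("cat.", nm), c(0, 1, 2))
cat.log2pfoa                # now exists in the current environment
```

This only works for free-standing objects; columns inside a data frame are not found by `get` unless the data frame is made visible, for example inside `within` or `with`.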
> 
> Perhaps this (also mostly untested, except for the strategy of making `assign` create a new dataframe column):
> 
> for (i in c("log2pfoa","log2pfos", "log2pfna", "log2pfdea",   "log2pfuda", "log2pfhxs", "log2et_pfosa_acoh",  
>             "log2me_pfosa_acoh"))  {
>  cat.var <- paste0("cat.",i)
>  assign( cat.var, findInterval( get(i) , c(-Inf, quantile( get(i), c(.25,.75), na.rm =T), Inf))-1,  
>                   envir=as.environment( pfas.pheno ) ) }

That wasn't good advice. I would rather suggest (but still untested in the absence of a good demo dataset from the OP):

for (i in c("log2pfoa","log2pfos", "log2pfna", "log2pfdea",   "log2pfuda", "log2pfhxs", "log2et_pfosa_acoh",  
            "log2me_pfosa_acoh"))  {
 cat.var <- paste0("cat.",i)
 x <- pfas.pheno[[ i ]]   # get(i) would not find a data frame column out here
 pfas.pheno[[ cat.var ]] <-  findInterval( x , c(-Inf, quantile( x, c(.25,.75), na.rm =TRUE), Inf))-1 }

The "[[<-" function supports character values as column names during assignment.
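
As a rough check of that strategy, here is a sketch on a made-up data frame. `toy` and its random contents are stand-ins for `pfas.pheno`, and only two of the column names are used. It also reads each source column with `toy[[i]]`, which is an assumption on my part, so that nothing depends on the columns being visible as free-standing objects:

```r
# Made-up stand-in for pfas.pheno; values are random, not real data.
set.seed(1)
toy <- data.frame(log2pfoa = c(rnorm(19), NA),
                  log2pfos = c(NA, rnorm(19)))

for (i in c("log2pfoa", "log2pfos")) {
  x <- toy[[i]]                                    # column looked up by string
  breaks <- c(-Inf, quantile(x, c(.25, .75), na.rm = TRUE), Inf)
  toy[[paste0("cat.", i)]] <- findInterval(x, breaks) - 1   # "[[<-" with a string
}

# Tabulate each new column, keeping NA visible.
lapply(toy[grep("^cat\\.", names(toy))], table, useNA = "ifany")
```

Rows with a missing input keep `NA` in the new column, which matches the behaviour of the original `within` code that started from `cat.pfoa <- NA`.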


> 
-- 

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law



