[R] create multiple categorical variables in a data frame using a loop
David Winsemius
dwinsemius at comcast.net
Fri Apr 20 03:58:25 CEST 2018
> On Apr 19, 2018, at 1:22 PM, David Winsemius <dwinsemius using comcast.net> wrote:
>
>
>> On Apr 19, 2018, at 11:20 AM, Ding, Yuan Chun <ycding using coh.org> wrote:
>>
>> Hi All,
>>
>> I want to create a categorical variable, cat.pfoa, in the file of pfas.pheno (a data frame) based on log2pfoa values. I can do it using the following code.
>>
>> pfas.pheno <-within(pfas.pheno, {cat.pfoa<-NA
>> cat.pfoa[pfas.pheno$log2pfoa <=quantile(pfas.pheno$log2pfoa,0.25, na.rm =T)]<-0
>> cat.pfoa[pfas.pheno$log2pfoa >=quantile(pfas.pheno$log2pfoa,0.75, na.rm =T)]<-2
>> cat.pfoa[pfas.pheno$log2pfoa >=quantile(pfas.pheno$log2pfoa,0.25, na.rm =T)
>> &pfas.pheno$log2pfoa <=quantile(pfas.pheno$log2pfoa,0.75, na.rm =T)]<-1
>> })
>
> This would be somewhat more compact and easier to maintain if you used findInterval (untested in the absence of a data object, which is your responsibility):
>
> pfas.pheno <- within(pfas.pheno, {
>   cat.pfoa <- findInterval(log2pfoa, c(-Inf, quantile(log2pfoa, c(.25, .75), na.rm = TRUE), Inf)) - 1 })
>
>
> `findInterval` numbers its intervals from 1, so to get a sequence starting at 0 just subtract 1.
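
A minimal sketch of that idea on a made-up vector (the data and the names `x`, `q` and `cat.x` are invented for illustration and are not the OP's `pfas.pheno`). One caveat: by default `findInterval` treats each interval as closed on the left, so a value exactly equal to the 75% cut point lands in the top category, whereas the original `within` code would leave it in the middle one.

```r
# Toy data, invented for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8, NA)

# Quartile cut points, ignoring the missing value
q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)

# Intervals are [-Inf, q25), [q25, q75), [q75, Inf), numbered 1, 2, 3;
# subtracting 1 gives categories 0, 1, 2. An NA in x stays NA.
cat.x <- findInterval(x, c(-Inf, q, Inf)) - 1
cat.x
```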
>
>
>> However, I have additional 7 similar variables, so I wrote the following code, but it does not work.
>>
>> for (i in c("log2pfoa","log2pfos", "log2pfna", "log2pfdea", "log2pfuda", "log2pfhxs", "log2et_pfosa_acoh", "log2me_pfosa_acoh")) {
>> cat.var <- paste0("cat.",i)
>> pfas.pheno <- within(pfas.pheno, {eval(parse(text= cat.var))<-NA
>
> Nope. You cannot use R like a macro processor, at least not easily. R names are not the same as character values. They "live in different realities". The `get` and `assign` functions can be used to "promote" character values to real R names, so that you can read from and assign to objects whose names you hold only as character values.
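
To make the distinction concrete, here is a small sketch with invented names (`nm` and `my.value` are hypothetical and appear nowhere in the original code):

```r
nm <- "my.value"   # a character value, not an R name

# assign() creates an object whose name is taken from the character value
assign(nm, 42)

my.value           # the object can now be referred to by its real R name
get(nm)            # get() looks the object up from the character value
```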
>
> Perhaps this (also mostly untested, except for the strategy of making `assign` create a new dataframe column):
>
> for (i in c("log2pfoa","log2pfos", "log2pfna", "log2pfdea", "log2pfuda", "log2pfhxs", "log2et_pfosa_acoh",
> "log2me_pfosa_acoh")) {
> cat.var <- paste0("cat.",i)
> assign( cat.var, findInterval( get(i) , c(-Inf, quantile( get(i), c(.25,.75), na.rm =TRUE), Inf)) - 1,
> envir=as.environment( pfas.pheno ) ) }
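That wasn't good advice: `as.environment` applied to a data frame builds a new environment holding copies of its columns, so the assignment never reaches pfas.pheno itself. I would rather suggest (but still untested in the absence of a good demo dataset from the OP):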
for (i in c("log2pfoa","log2pfos", "log2pfna", "log2pfdea", "log2pfuda", "log2pfhxs", "log2et_pfosa_acoh",
            "log2me_pfosa_acoh")) {
  cat.var <- paste0("cat.", i)
  x <- pfas.pheno[[i]]   # pull the column by its character name rather than with get()
  pfas.pheno[[cat.var]] <- findInterval(x, c(-Inf, quantile(x, c(.25, .75), na.rm = TRUE), Inf)) - 1
}
The "[[<-" function supports character values as column names during assignment.
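
As a rough check of that approach, here is a sketch on a small made-up data frame. `toy` and its values are invented stand-ins for pfas.pheno, with only two of the eight columns, so treat it as an illustration rather than a test on the real data:

```r
# Hypothetical stand-in for pfas.pheno
toy <- data.frame(log2pfoa = c(1, 2, 3, 4, 5, 6, 7, 8),
                  log2pfos = c(8, 6, NA, 2, 4, 1, 3, 5))

for (i in c("log2pfoa", "log2pfos")) {
  x <- toy[[i]]                                    # read the column by name
  cuts <- c(-Inf, quantile(x, c(0.25, 0.75), na.rm = TRUE), Inf)
  toy[[paste0("cat.", i)]] <- findInterval(x, cuts) - 1   # write a new column by name
}

names(toy)   # the two cat.* columns are appended after the originals
toy
```

Both `[[` and `[[<-` take the column name as an ordinary character value, so neither `get`, `assign` nor `eval(parse())` is needed inside the loop.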
>
--
David Winsemius
Alameda, CA, USA
'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law