[R] New Variable from Several Existing Variables
David Winsemius
dwinsemius at comcast.net
Sat Feb 27 02:43:57 CET 2010
And if your data is in a dataframe (... please include an example of
the results of str() next time...) :
> dfrm <- rd.txt("Column1, Column2, Column3
+ Yes,Yes,Yes
+ Yes,No,Yes
+ No,No,No
+ No,Yes,No
+ Yes,Yes,No", sep=",")
> # rd.txt is just a wrapper I use for
> # read.table(textConnection( ), header=TRUE, ... )
> dfrm$newvar <- apply(subset(dfrm, select=c(Column1, Column2, Column3)), 1,
+                      function(x) { if (all(x=="Yes")) {"Yes"} else {"No"} } )
> dfrm
  Column1 Column2 Column3 newvar
1     Yes     Yes     Yes    Yes
2     Yes      No     Yes     No
3      No      No      No     No
4      No     Yes      No     No
5     Yes     Yes      No     No
Notice that I created this variable in a manner that did not require
the use of every column of the dataframe.
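To make that point concrete, here is a minimal sketch, assuming a hypothetical extra column `Other` that should not take part in the test. It also swaps `apply()` for a vectorized `rowSums()` comparison; that substitution is only an illustration, not the code used above.

```r
# Rebuild the example data, plus a hypothetical column 'Other' that is ignored.
dfrm <- data.frame(
  Column1 = c("Yes", "Yes", "No", "No", "Yes"),
  Column2 = c("Yes", "No", "No", "Yes", "Yes"),
  Column3 = c("Yes", "Yes", "No", "No", "No"),
  Other   = c("No", "No", "No", "No", "No"),   # not part of the test
  stringsAsFactors = FALSE
)

# Only these columns take part in the test.
cols <- c("Column1", "Column2", "Column3")

# dfrm[cols] == "Yes" is a logical matrix; a row passes when the number of
# TRUE entries in it equals the number of selected columns.
dfrm$newvar <- ifelse(rowSums(dfrm[cols] == "Yes") == length(cols), "Yes", "No")
dfrm
```

Keeping the column names in a character vector such as `cols` means the same line works unchanged if the set of columns to test grows or shrinks.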
--
David
On Feb 26, 2010, at 7:57 PM, Don MacQueen wrote:
> If your data is in a matrix named "orgdata" :
>
> newvar <- apply(orgdata , 1, function(arow, if (all(arow=='Yes'))
> 'Yes' else 'No'
Yes, at least 2 missing parens and an unneeded comma, perhaps:
newvar <- apply(orgdata , 1, function(arow) if (all(arow=='Yes'))
'Yes' else 'No' )
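For anyone who wants to run the corrected call, a small sketch follows. The matrix `orgdata` is rebuilt here from the example rows in the original question; building it with `matrix()` is an assumption about how such data might be stored. The `cbind()` and subsetting steps are the ones from Don's message.

```r
# Example rows from the original question, stored as a character matrix.
orgdata <- matrix(
  c("Yes", "Yes", "Yes",
    "Yes", "No",  "Yes",
    "No",  "No",  "No",
    "No",  "Yes", "No",
    "Yes", "Yes", "No"),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Column1", "Column2", "Column3"))
)

# MARGIN = 1 applies the function to each row in turn.
newvar <- apply(orgdata, 1, function(arow) if (all(arow == "Yes")) "Yes" else "No")

# Append the new column, then keep only the rows flagged "Yes".
newdata <- cbind(orgdata, newvar)
finaloutcome <- newdata[newvar == "Yes", , drop = FALSE]
finaloutcome
```

The `drop = FALSE` is an addition: without it, a single surviving row would be returned as a plain vector rather than a one-row matrix.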
>
> newdata <- cbind(orgdata, newvar)
>
> finaloutcome <- newdata[ newvar=='Yes',]
>
>
> The key to this is the apply() function.
>
> I might have missed some parentheses...
>
> There are other ways; this is just one. I might think of a simpler
> one if I gave it more time...
>
> -Don
>
> At 4:40 PM -0800 2/26/10, wookie1976 wrote:
>> I am new to R, but have been using SAS for years. In this
>> transition period,
>> I am finding myself pulling my hair out to do some of the simplest
>> things.
>> An example of this is that I need to generate a new variable based
>> on the
>> outcome of several existing variables in a data row. In other
>> words, if the
>> variable in all three existing columns are "Yes", then then the new
>> variable
>> should also be "Yes", however if any one of the three existing
>> variables is
>> a "No", then then new variable should be a "No". I would then use
>> that new
>> variable as an exclusion for data in a new or existing dataset
>> (i.e., if
>> NewVariable = "No" then delete):
>>
>> Take this:
>> Column1, Column2, Column3
>> Yes, Yes, Yes
>> Yes, No, Yes
>> No, No, No
>> No, Yes, No
>> Yes, Yes, No
>>
>> Generate this:
>> Column1, Column2, Column3, NewVariable1
>> Yes, Yes, Yes, Yes
>> Yes, No, Yes, No
>> No, No, No, No
>> No, Yes, No, No
>> Yes, Yes, No, No
>>
>> And end up with this:
>> Column1, Column2, Column3, NewVariable1
>> Yes, Yes, Yes, Yes
>>
>> Any suggestions on how to efficiently do this in either the
>> existing or a
>> new dataset?
>>
You might have simplified this a bit if you let the columns be logical
rather than character.
> dfrm$newvar <- apply(subset(dfrm, select=c(Column1, Column2, Column3)), 1,
+                      function(x) { (all(x=="Yes")) } )
> dfrm
  Column1 Column2 Column3 newvar
1     Yes     Yes     Yes   TRUE
2     Yes      No     Yes  FALSE
3      No      No      No  FALSE
4      No     Yes      No  FALSE
5     Yes     Yes      No  FALSE
You would then be able to apply simpler tests with operators and
functions that accept the logical data type.
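As a rough illustration of what a logical column allows, here is a sketch that rebuilds the example data with `data.frame()` (an assumption; any equivalent construction would do) and uses the logical `newvar` directly for the exclusion step asked about in the original question. The names `kept` and `dropped` are hypothetical.

```r
dfrm <- data.frame(
  Column1 = c("Yes", "Yes", "No", "No", "Yes"),
  Column2 = c("Yes", "No", "No", "Yes", "Yes"),
  Column3 = c("Yes", "Yes", "No", "No", "No"),
  stringsAsFactors = FALSE
)
dfrm$newvar <- apply(dfrm[c("Column1", "Column2", "Column3")], 1,
                     function(x) all(x == "Yes"))

# A logical column can index rows directly; no comparison to "Yes" is needed.
kept <- dfrm[dfrm$newvar, ]

# Negation gives the rows that would be excluded.
dropped <- dfrm[!dfrm$newvar, ]

# Logical values count as 0/1, so sum() and mean() give the number and
# proportion of rows that pass.
sum(dfrm$newvar)
mean(dfrm$newvar)
```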
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT