[R] Grouping sets of data, performing function and re-assigning values

Joshua Wiley jwiley.psych at gmail.com
Fri Aug 27 22:00:48 CEST 2010


Hi Johnny,

Something like this

rbind(NA, dat.med)[as.numeric(dat$image.group), ]

should do the trick (with the data you provided and Ista's code).  The
key is that dat.med has one row for each level of the factor
image.group that actually has data, in the same order as the levels.
The idea is to convert the factor created by cut(), which records the
group each row of dat belongs to, into its integer codes (2, 2, 2, 3,
3, 3, etc.), and to use those codes to select the matching rows from
dat.med.
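
To make the indexing concrete, here is a minimal sketch on made-up data. The values are hypothetical, and aggregate() is only a base-R stand-in for the ddply() summary in Ista's code, used so the example runs without plyr.

```r
# Toy stand-in for the real data: image numbers 4-9 only, so the first
# cut() level, (0,3], has no rows (values are hypothetical).
dat <- data.frame(a.ImageNumber = c(4, 4, 5, 6, 7, 8, 9, 9),
                  Measurement   = c(1, 3, 5, 7, 2, 4, 6, 8))
dat$image.group <- cut(dat$a.ImageNumber,
                       breaks = seq(0, max(dat$a.ImageNumber), by = 3))

# One row per group that has data (base-R stand-in for the ddply summary).
dat.med <- aggregate(Measurement ~ image.group, data = dat, FUN = median)

# Integer codes of the factor: (0,3] is 1, (3,6] is 2, (6,9] is 3.
codes <- as.numeric(dat$image.group)

# Pad with an NA row so that row k lines up with level k, then pick one
# summary row per original observation.
expanded <- rbind(NA, dat.med)[codes, ]
dat$Measurement.median <- expanded$Measurement
```

Every row of dat now carries the median of its own group, which is the "repeated" layout asked about in the quoted message.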

Since level 1 had no data (so no row in dat.med), I just added an NA
row at the top using rbind().  If instead levels 1, 13, and 15 were
missing, you would need to insert an NA row at each of those positions
for my code to work.  This is because a factor with 3 levels (like
yours) always converts to the numbers 1, 2, and 3.  Even if there is
no actual data for level 1, levels 2 and 3 still convert to 2 and 3.
So you need to make sure that row 2 of the padded dat.med matches
level 2 in image.group, and so on for every other level.
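
If padding rows by hand gets awkward, one alternative (not the code above, just a sketch on hypothetical data) is to look the rows up by group label with match(), so positions no longer matter. Again aggregate() stands in for the ddply() summary.

```r
# Hypothetical data in which the first and third of four groups are empty.
dat <- data.frame(a.ImageNumber = c(4, 5, 6, 10, 11, 12),
                  Measurement   = c(1, 2, 6, 3, 5, 10))
dat$image.group <- cut(dat$a.ImageNumber,
                       breaks = seq(0, max(dat$a.ImageNumber), by = 3))

# One row per group that has data.
dat.med <- aggregate(Measurement ~ image.group, data = dat, FUN = median)

# Look each row's group label up in dat.med by name, not by position,
# so gaps in the levels need no NA padding.
idx <- match(as.character(dat$image.group),
             as.character(dat.med$image.group))
dat$Measurement.median <- dat.med$Measurement[idx]
```

The same idx can be used to pull whole rows, as in dat.med[idx, ], when there are several summary columns.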

HTH,

Josh



On Fri, Aug 27, 2010 at 12:26 PM, Johnny Tkach <johnny.tkach at utoronto.ca> wrote:
> Hi Ista,
>
> Thanks for the help.  The 'cut' function seems to do the trick.
>
> I'm not sure why you suggested this line of code:
>> ddply(dat, .(image.group), transform, measure.median = median(Measurement))
>
> I think I might have confused the issue by putting a 'Measurement' column in my example in the body of the e-mail, while there is no such column in the actual data.
>
> The second ddply function on the cut data file seems to do the trick for taking the median of the relevant data. However, I still have one more question.  Would it be possible to assign the median data back to the original a.ImageNumber values?  In this situation, the same data would be associated with images 1 through 3, another set with images 4 through 6, and so on.
>
> For example (again, I use 'Measurement' just as a generic column):
>
> ImageNumber     Measurement
> 1                               1
> 1                               2
> 1                               3
> 2                               2
> 2                               2
> 3                               4
> 3                               3
> 3                               3
> 3                               4
>
> where the median of all the 'Measurement' data is 3 and the output would be:
>
> ImageNumber     Measurement
> 1                               3
> 1                               3
> 1                               3
> 2                               3
> 2                               3
> 3                               3
> 3                               3
> 3                               3
> 3                               3
>
> or
>
> ImageNumber     Measurement
> 1                               3
> 2                               3
> 3                               3
>
> I really appreciate your help with this.
>
> JT
>
> Johnny Tkach, PhD
> Donnelly CCBR, Rm. 1230
> Department of Biochemistry
> University of Toronto
> 160 College Street
> M5S 3E1
>
> phone - 416 946 5774
> fax - 416 978 8548
> johnny.tkach at utoronto.ca
>
> "Beauty's just another word I'm never certain how to spell"
>
>
>
>
> On Aug 27, 2010, at 2:01 PM, Ista Zahn wrote:
>
>> Hi Johnny,
>>
>> If I understand correctly, I think you can use cut() to create a grouping variable, and then calculate your summaries based on that. Something like
>>
>> dat <- read.csv("~/Downloads/exampledata.csv")
>>
>> dat$image.group <- cut(dat$a.ImageNumber, breaks = seq(0, max(dat$a.ImageNumber), by = 3))
>> library(plyr)
>> ddply(dat, .(image.group), transform, measure.median = median(Measurement))
>>
>> dat.med <- ddply(dat, .(image.group), summarize,
>>       a.AreaShape_Area.median = median(a.AreaShape_Area),
>>       a.Intensity_IntegratedIntensity_OrigRFP.median = median(a.Intensity_IntegratedIntensity_OrigRFP),
>>       a.Intensity_IntegratedIntensity_OrigGFP.median = median(a.Intensity_IntegratedIntensity_OrigGFP),
>>       b.Intensity_MeanIntensity_OrigGFP.median = median(b.Intensity_MeanIntensity_OrigGFP),
>>       EstCytoIntensity.median = median(EstCytoIntensity),
>>       TotalIntensity.median = median(TotalIntensity),
>>       NucToCytoRatio.median = median(NucToCytoRatio)
>>       )
>>
>> Best,
>> Ista
>> On Fri, Aug 27, 2010 at 5:28 PM, Johnny Tkach <johnny.tkach at utoronto.ca> wrote:
>> Hi all,
>>
>>
>> Since I could not attach a file to my original e-mail request, for those who want to look at an example of a data file I am working with, please use this link:
>>
>> http://dl.dropbox.com/u/4637975/exampledata.csv
>>
>> Thanks again,
>>
>> Johnny.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Ista Zahn
>> Graduate student
>> University of Rochester
>> Department of Clinical and Social Psychology
>> http://yourpsyche.org
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


