[R] finding most highly transcribed genes - ranking, sorting and subsets?

Martin Morgan mtmorgan at fhcrc.org
Fri Dec 7 18:51:53 CET 2007


Hi Alison --

It's a funny twist of terminology, isn't it? high rank (we're #1!)
corresponds to low value. Maybe a wimpy stats joke? Anyway, (a) if m
is assigned rownames (e.g., from the appropriate column of the 'genes'
data frame in the limma object, rownames(m) <- maList$genes$GeneName)
they'll be caried through the analysis and (b) if you've extracted m
from a limma MAList, then subsetting the MAList with hrow
(maList[hrow,]) will give you a new MAList with all the info carrying
through. This would be the better way to go.

Martin

"alison waller" <alison.waller at utoronto.ca> writes:

> Thanks so much Martin,
>
> This method is definitely more straightforward.  And you are right I don't
> think I was doing anything wrong before. However, I thought that rank, would
> rank the highest 1st, however after looking at the results using your
> methods, I realized it ranks the lowest number 1.  So I modified it for
> rank>18500.  And now I'm getting 300 rows for which the intensity is
> consistenly high.
>
> However, I am still laking some information.  For the results I can get a
> matrix of 300 rows and the corresponding intensities (from m) or rank (from
> h), but what I really want is the name of the original row, which
> corresponds to a specific spot on the array).
>
> I did msubset<-m[hrows,] and as mentioned I just get the rows numbered
> 1-300, while I want to essentially pickout the 300 rows from the original
> 19,000 rows maintaing the original row designation as it corresponds to a
> specific gene.
>
> Thanks again for any suggestions,
>
> Alison
>
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org] 
> Sent: Thursday, December 06, 2007 4:06 PM
> To: alison waller
> Subject: Re: [R] finding most highly transcribed genes - ranking, sorting
> and subsets?
>
> Hi Alison --
>
> I'm not sure where your problem is coming from, but R can help you to
> more efficiently do your task. Skipping the bioc terminology and data
> structures, you have a matrix
>
>> m <- matrix(runif(100000), ncol=10)
>
> you'd like to determine the rank of values in each column
>
>> r <- apply(m, 2, rank)
>
> identfiy those with high rank
>
>> h <- r < 500
>
> and find the rows for which the rank is always high
>
>> hrows <- apply(h, 1, all)
>
> you can then use hrows to subset your original matrix (m[hrows,]) or
> otherwise, e.g., how many rows with high rank
>
>> sum(hrows)
> [1] 0
>
> or perhaps the distribution of the number of columns in which high
> ranking genes occur.
>
>> table(apply(h, 1, sum))
>
>    0    1    2    3    4 
> 5996 3132  765  100    7 
>
> Martin
>
> "alison waller" <alison.waller at utoronto.ca> writes:
>
>> Hello,
>>
>>  
>>
>> I am not only interested in finding out which genes are the most highly
> up-
>> or down-regulated (which I have done using the linear models and Bayesian
>> statistics in Limma), but I also want to know which genes are consistently
>> highly transcribed (ie. they have a high intensity in the channel of
>> interest eg. Cy5 or Cy3 across the set of experiments).  I might have
> missed
>> a straight forward way to do this, or a valuable function, but I've been
>> using my own methods and going around in circles.
>>
>>  
>>
>> So far I've normalized within and between arrays, then returned the RG
>> values using RG<-RG.MA, then I ranked each R and G values for each array
> as
>> below.
>>
>> rankRG<-RG
>>
>> rankRG$R[,1]<-rank(rankRG$R[,1])
>>
>> rankRG$R[,2]<-rank(rankRG$R[,2]) .. and so on for 6 columns(ie. arrays, as
>> well as the G's)
>>
>>  
>>
>> then I thought I could pull out a subset of rankRG using something like;
>>
>> topRG<-rankRG
>>
>> topRG$R<-subset(topRG$R,topRG$R[,1]<500&topRG$R[,2]<500&topRG$R[,5]<500)
>>
>>  
>>
>> However, this just returned me a matrix with one row of $R (the ranks were
>> <500 for columns 1,2, and 5 and greater than 500 for 3,4,and 6).  However,
> I
>> can't believe that there is only one gene that is in the top 500 for R
>> intensitiy among those three arrays.
>>
>>  
>>
>> Am I doing something wrong?  Can someone think of a better way of doing
>> this?
>>
>>  
>>
>> Thanks
>>
>>  
>>
>> Alison
>>
>>  
>>
>>  
>>
>> ******************************************
>> Alison S. Waller  M.A.Sc.
>> Doctoral Candidate
>> awaller at chem-eng.utoronto.ca
>> 416-978-4222 (lab)
>> Department of Chemical Engineering
>> Wallberg Building
>> 200 College st.
>> Toronto, ON
>> M5S 3E5
>>
>>   
>>
>>  
>>
>>
>> 	[[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> -- 
> Dr. Martin Morgan, PhD
> Computational Biology Shared Resource Director
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
>

-- 
Dr. Martin Morgan, PhD
Computational Biology Shared Resource Director
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793



More information about the R-help mailing list