[BioC] Moderated t-test

James W. MacDonald jmacdon at uw.edu
Wed May 16 16:25:25 CEST 2012


Hi Li,

This isn't really the appropriate list for this question, as dist() 
isn't a BioC function. However, see below.

On 5/15/2012 6:45 PM, Wang, Li wrote:
> Dear list members
>
> I am confronted with an error when doing hierarchical clustering for expression value clustering in R.
>
>> d<- dist(n, method="euclidean")
> Warning message:
> In dist(n, method = "euclidean") : NAs introduced by coercion

That is your best hint. Any time NAs are introduced by coercion it means 
that you have non-numeric data. As an example, in dist(), this is what 
happens:

Say you start with a data.frame, x, that contains some non-numeric data:

 > x <- data.frame(letters, 1:26)

Inside dist(), this is turned into a matrix (a character matrix here, 
because of the letters column), then fed into some C code, coercing the 
matrix to a double-precision vector:

 > x <- as.matrix(x)
 > x <- as.double(x)
Warning message:
NAs introduced by coercion

And you have a problem. So check what is in your 'n' object.
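
One way to run that check is sketched below. The data frame is a made-up 
stand-in for your 'n' (the column names gene, s1 and s2 are hypothetical), 
and it assumes the non-numeric column holds identifiers that belong in the 
row names rather than in the data.

```r
# Hypothetical stand-in for 'n': an identifier column, two numeric
# columns, and one row with a missing value
n <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                s1 = c(1.2, 3.4, 5.6, NA),
                s2 = c(2.1, 4.3, 6.5, 7.7),
                stringsAsFactors = FALSE)

# Which columns are not numeric? These are the ones that as.double()
# cannot convert.
is_num <- sapply(n, is.numeric)
names(n)[!is_num]

# Build a numeric matrix from the numeric columns only, keeping the
# identifiers as row names rather than as data
m <- as.matrix(n[, is_num, drop = FALSE])
rownames(m) <- n$gene

# Drop incomplete rows from the data itself, not from the dist object
m <- m[complete.cases(m), , drop = FALSE]

d <- dist(m, method = "euclidean")  # no coercion warning
h <- hclust(d)                      # default linkage, for illustration only
```

Filtering the rows of the matrix before calling dist() keeps the distance 
object intact, which subsetting 'd' afterwards would not.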

Best,

Jim


>> h<- hclust(d, method="ward")
> Error in hclust(d, method = "ward") :
>    NA/NaN/Inf in foreign function call (arg 11)
>
> It seems that the NAs in my raw data affect it.
> I tried to remove NAs with the following two ways:
> d<- na.omit(d)
> d<- d[rowSums(!is.na(d))!=0, colSums(!is.na(d))!=0]
>
> However, they did not solve the problem.
>
> Any comments and suggestions are much appreciated.
>
> Thanks!
> Li
> ________________________________________
> From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] On Behalf Of Ekta Jain [Ekta_Jain at jubilantbiosys.com]
> Sent: Tuesday, May 15, 2012 12:13 AM
> To: Chintanu
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Moderated t-test
>
> Hi Chintanu,
> Sorry I went underground for a short time. Not sure if you have solved your problem, but I am sharing some info here in case it helps.
>
> If you sort your toptable in R it will sort everything except the row names. If you fit the annotation on your data before you do toptable it will sort all your data.
>
> If you find it easier you can sort toptable after you write it out to an .xls file and use cbind() without the sort in R.
>
> Best,
> Ekta
>
> From: Chintanu [mailto:chintanu at gmail.com]
> Sent: 10 May 2012 10:04
> To: bioconductor at r-project.org
> Cc: Ekta Jain
> Subject: Re: Moderated t-test
>
> Hi,
>
> Not sure whether that answers my question.
> In case I haven't been able to put forward the question correctly, I am trying it here again:
>
> fit<- lmFit(file[,-1], design=group) # Column 1 contains row-names, which are the gene/feature names.
>
> # Above, each of the output of the object, fit would refer to the corresponding gene/feature name.
>
> fit2<- eBayes(fit)
> tt<- topTable(fit2, number=Inf, adjust.method="BH")$t
>
> # Now, when topTable() is applied, the question is -
>
> # will it sort the data along with column 1 (that contains row/gene/feature names) such that it is then just a matter of retrieving the data corresponding to the respective genes using function like cbind ().
>
> # OR
>
> # Will topTable() sort everything EXCEPT column 1 ? If this happens, applying functions like cbind() will pick wrong combinations.
>
> Cheers,
>
> Chintanu
>
>
> =========================================================================
> On Wed, May 9, 2012 at 4:31 PM, Ekta Jain <Ekta_Jain at jubilantbiosys.com> wrote:
> Hi,
> The cbind function combines data frames column wise - you should read here http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/base/html/cbind.html
> Or
>> ?cbind on your R console.
> Toptable will only give you a subset of the entire data, i.e. the top genes, in which genes are ranked according to the F-statistic for that set of contrasts. To get 'toptable' results for all your genes you could do something like:
>> numGenes<- nrow(file)
>> toptableOut<- topTable(fit2, number=numGenes, adjust.method="BH")$t
> Best,
> -Ekta
>
>
> From: Chintanu [mailto:chintanu at gmail.com]
> Sent: 09 May 2012 11:49
> To: Ekta Jain
> Cc: bioconductor at r-project.org
> Subject: Re: Moderated t-test
>
> Hi Ekta,
>
> Thank you.
>
> However, my worry is that -
>
> whether topTable() will shuffle & sort the dataframe (except the row names of 1st column) such that when cbind() is eventually applied, it will only join the individual test outputs with the incorrect row-names !!
>
> Cheers,
> Chintanu
>
>
> =================================================================
> On Wed, May 9, 2012 at 4:06 PM, Ekta Jain <Ekta_Jain at jubilantbiosys.com> wrote:
> Hi Chintanu,
> You can use the cbind function.
>
>> toptableOut<- topTable(fit2, number=Inf, adjust.method="BH")$t
>> x<- cbind(file[,1],toptableOut)
> ## file[,1] will get you all rows of column 1
>> write.table(x,"<filename>.txt", quote=F, row.names=FALSE, sep="\t")
> Hope this helps,
> Ekta
>
> -----Original Message-----
> From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Chintanu
> Sent: 09 May 2012 11:21
> To: bioc
> Subject: [BioC] Moderated t-test
>
> Hi,
>
> I am trying to do a moderated t-test on two groups of samples as follows -
>
> file<- read.csv (file.choose(), header = TRUE) # Column 1 contains row-names
>
> group<- rep(0:1, c(3,5))
>
> library (limma)
>
> fit<- lmFit(file[,-1], design=group)  # Column 1 contains row-names
>
> fit2<- eBayes(fit)
>
> topTable(fit2, number=Inf, adjust.method="BH")$t
>
> I am not sure how to obtain the values of topTable() along with the
> corresponding row names.
>
> Could you please advise.
>
> Thank you.
>
> Cheers,
> Chintanu
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> The information contained in this electronic message and in any attachments to this message is confidential, legally privileged and intended only for use by the person or entity to which this electronic message is addressed. If you are not the intended recipient, and have received this message in error, please notify the sender and system manager by return email and delete the message and its attachments and also you are hereby notified that any distribution, copying, review, retransmission, dissemination or other use of this electronic transmission or the information contained in it is strictly prohibited. Please note that any views or opinions presented in this email are solely those of the author and may not represent those of the Company or bind the Company. Any commitments made over e-mail are not financially binding on the company unless accompanied or followed by a valid purchase order. This message has been scanned for viruses and dangerous content by Mail Scanner, and is believed to be clean. The Company accepts no liability for any damage caused by any virus transmitted by this email.
> www.jubl.com
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


