[BioC] New package to identify differentially expressed genes from RNA-seq data

Ulrike Goebel ugoebel at mpiz-koeln.mpg.de
Fri Oct 16 09:04:42 CEST 2009


Hi Likun,

Likun Wang wrote:
> Dear Ulrike,
>  
>  Are there invalid values in your gene expression file?
That was it! Some very high expression values had commas inserted by
Excel as thousands separators.
After removing the commas, it works fine.
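
For anyone hitting the same error, here is a minimal sketch of how one
might detect and strip such separators in R before calling DEGexp. The
sample rows and the helper name strip_thousands are made up for
illustration and are not part of DEGseq; only base R is used.

```r
# Hypothetical input: tab-separated, with an Excel-style thousands
# separator in one of the larger values (rows invented for illustration).
lines <- c("SGN-U573325\t6.17\t619.72",
           "SGN-U591447\t0.77\t1,101.16",
           "SGN-U592038\t6.27\t37.8")

rt <- read.table(text = lines, header = FALSE, sep = "\t",
                 stringsAsFactors = FALSE)

# A column containing "1,101.16" is not parsed as numeric, so this
# check flags it (is.numeric is FALSE for character and factor columns).
is_num <- sapply(rt[, 2:3], is.numeric)
print(is_num)

# strip_thousands is a hypothetical helper: it removes the commas and
# converts the result to numeric.
strip_thousands <- function(x) {
  as.numeric(gsub(",", "", as.character(x), fixed = TRUE))
}

rt[, 3] <- strip_thousands(rt[, 3])

# Write a cleaned, tab-separated copy without header or row names.
clean_file <- tempfile(fileext = ".txt")
write.table(rt, clean_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The cleaned file could then be passed as geneExpFile1 / geneExpFile2
in the DEGexp call quoted below.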

Seems to be a nice package!

Best, Ulrike

>  Look at the following example. All the values should be numeric.
>        
> > file1 <- "./test.txt"
> > rt1 <- read.table(file1, header=FALSE,sep="\t")
> > head(rt1,n=4)
>            V1   V2      V3
> 1 SGN-U573325 6.17 STRING1
> 2 SGN-U591447 0.77    <NA>     
> 3 SGN-U592038 6.27      OK
> 4 SGN-U573325 6.17  619.72
> > rt1[,2]
> [1] 6.17 0.77 6.27 6.17
> > rt1[,3]
> [1] STRING1 <NA>    OK      619.72
> Levels: 619.72 OK STRING1
> > mode(rt1[,2])
> [1] "numeric"
> > mode(rt1[,3])
> [1] "numeric"      # misleading: a factor is stored as integer codes
> > mode(as(rt1[,2], "matrix"))
> [1] "numeric"
> > mode(as(rt1[,3], "matrix"))
> [1] "character"  # this check reveals that the column is not numeric
>
> Please contact me anytime if this problem is not fixed. 
> Thanks.
> Best regards.
> ---------
> Likun
>  
> 2009/10/15 Ulrike Goebel <ugoebel at mpiz-koeln.mpg.de>
>
>     Dear Likun,
>
>     I am not sure whether the following is a problem of your package,
>     or my input ..
>
>     I wanted to compare two samples with a single replicate each,
>     using DEGexp(method="MARS") from DEGseq.
>
>     The input file simply looks like this:
>     SGN-U573325 6.17 619.72
>     SGN-U591447 0.77 101.16
>     SGN-U592038 6.27 37.8
>     ...
>     (The fields are tab-separated)
>
>     >DEGexp(geneExpFile1=my_infile,expCol1=2,
>     geneExpFile2=my_infile,expCol2=3,
>     groupLabel1="condition1",groupLabel2="condition2",
>     method="MARS",
>     sep="\t",
>     header=FALSE
>     )
>     Please wait...
>     Error in sum(exp_values) : invalid 'type' (character) of argument
>
>     I traced this back by calling the routine in debug mode:
>
>     debug: rt1 <- read.table(geneExpFile1, header = header, sep = sep)
>     Browse[2]>head(rt1,n=2)
>     V1 V2 V3
>     1 SGN-U573325 6.17 619.72
>     2 SGN-U591447 0.77 101.16
>     Browse[2]> mode(rt1[,expCol1[i]])
>     [1] "numeric"
>
>     Browse[2]>
>     debug: exp_values <- as(rt1[expCol1[i]], "matrix")
>     Browse[2]> mode(exp_values)
>     [1] "character"
>
>     I am not sure whether you have a reason to extract the columns
>     using "rt1[expCol1[i]]" rather
>     than "rt1[,expCol1[i]]" ? The latter *is* numeric ...
>
>     Best regards
>
>     Ulrike
>
>     > sessionInfo()
>     R version 2.10.0 Under development (unstable) (2009-08-01 r49053)
>     x86_64-unknown-linux-gnu
>
>     locale:
>     [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>     [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>     [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>     [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>     [9] LC_ADDRESS=C LC_TELEPHONE=C
>     [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>     attached base packages:
>     [1] tcltk stats graphics grDevices utils datasets methods
>     [8] base
>
>     other attached packages:
>     [1] DEGseq_0.99.0 samr_1.26 impute_1.18.1 qvalue_1.19.1
>
>     loaded via a namespace (and not attached):
>     [1] tools_2.10.0
>
>
>
>
>
>
>     Likun Wang wrote:
>
>         You can find it at
>         http://www.bioconductor.org/packages/2.5/bioc/html/DEGseq.html.
>         Thanks for your attention, contact me anytime.
>
>         2009/10/15 Naomi Altman <naomi at stat.psu.edu>
>
>          
>
>             I could not find this package on bioconductor.org. Thanks to
>             rules about software downloads here, it will take a while for
>             me to get R 2.10.0, and I would like to have a look at the
>             documentation in the meantime. Where could I find it?
>
>             Thanks,
>             Naomi
>
>
>             At 09:15 AM 10/14/2009, Likun Wang wrote:
>
>                
>
>                 Hi all,
>
>                 We present a new R package, DEGseq, for identifying
>                 differentially expressed genes from RNA-seq data. The input
>                 of DEGseq is uniquely mapped reads from RNA-seq data with a
>                 gene annotation of the corresponding genome, or gene (or
>                 transcript isoform) expression values provided by other
>                 programs. The output of DEGseq includes a text file and an
>                 XHTML summary page. The text file contains the expression
>                 values for the samples, and a P-value and two kinds of
>                 Q-values for each gene to denote its expression difference
>                 between libraries. Two novel MA-plot based methods, along
>                 with some existing methods, have been integrated into it.
>
>                 You may access it through the commands:
>                 > source("http://bioconductor.org/biocLite.R")   # R >= 2.10.0
>                 > biocLite("DEGseq")
>
>                   Comments, questions, etc, are all welcome.
>                    Best regards
>                 Likun
>
>
>                 _______________________________________________
>                 Bioconductor mailing list
>                 Bioconductor at stat.math.ethz.ch
>                 https://stat.ethz.ch/mailman/listinfo/bioconductor
>                 Search the archives:
>                 http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>                      
>
>             Naomi S. Altman                   814-865-3791 (voice)
>             Associate Professor
>             Dept. of Statistics               814-863-7114 (fax)
>             Penn State University             814-865-1348 (Statistics)
>             University Park, PA 16802-2111
>
>
>                
>
>
>
>          
>
>
>
>     -- 
>      Dr. Ulrike Goebel
>      Bioinformatics Support
>      Max-Planck Institute for Plant Breeding Research
>      Carl-von-Linne Weg 10
>      50829 Cologne
>      Germany
>      +49(0) 221 5062 121
>
>
>
>
> -- 
> Likun Wang
> MOE Key Laboratory of Bioinformatics and Bioinformatics Div,
> TNLIST / Department of Automation, Tsinghua University,
> Beijing 100084, China
> Tel: +86-10-62794294
> Fax: +86-10-62786911
> Email: wang.likun at gmail.com


-- 
  Dr. Ulrike Goebel
  Bioinformatics Support
  Max-Planck Institute for Plant Breeding Research
  Carl-von-Linne Weg 10
  50829 Cologne
  Germany
  +49(0) 221 5062 121


