[BioC] Entrez Gene ID to Probe Set Name
McGee, Monnie
mmcgee at mail.smu.edu
Thu Oct 23 18:17:37 CEST 2008
Here is the previous query with a more descriptive subject.
-----Original Message-----
From: McGee, Monnie
Sent: Thu 10/23/2008 11:14 AM
To: bioconductor at stat.math.ethz.ch
Subject: RE: Bioconductor Digest, Vol 68, Issue 23
Dear List,
Is there an elegant way to obtain the name of a probe set from an Affymetrix platform (it doesn't matter which one) corresponding to a given Entrez Gene ID? It seems fairly simple to obtain the Entrez ID if you have a probe set, but the reverse problem seems non-trivial, at least to me.
There's no particular reason I need to know. I just want to know if it's possible.
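Something along these lines is what I am imagining, as a rough, untested sketch (assuming an Affymetrix annotation package such as hgu95av2.db is attached; the chip choice and the example Entrez ID are arbitrary):
library(hgu95av2.db)                      # any Affymetrix ".db" annotation package
eg2probe <- revmap(hgu95av2ENTREZID)      # reverse the probe set -> Entrez Gene map
mget("1000", eg2probe, ifnotfound = NA)   # probe set name(s) for Entrez ID "1000"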
Thanks!
Monnie
Monnie McGee, Ph.D.
Associate Professor
Department of Statistical Science
Southern Methodist University
Ph: 214-768-2462
Fax: 214-768-4035
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch on behalf of bioconductor-request at stat.math.ethz.ch
Sent: Thu 10/23/2008 5:00 AM
To: bioconductor at stat.math.ethz.ch
Subject: Bioconductor Digest, Vol 68, Issue 23
Send Bioconductor mailing list submissions to
bioconductor at stat.math.ethz.ch
To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
bioconductor-request at stat.math.ethz.ch
You can reach the person managing the list at
bioconductor-owner at stat.math.ethz.ch
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."
Today's Topics:
1. GOstat: listing genes from hyperGTest (Tim Smith)
2. export toptables into Genespring (Pemmasani, Kalyani)
3. Re: Limma contrasts question (James W. MacDonald)
4. Re: GOstat: listing genes from hyperGTest (James W. MacDonald)
5. Re: Limma contrasts question (Daniel Brewer)
6. quality assessment and preprocessing for tiling array-based
CGH data (Leon Yee)
7. GOstats and org.EcK12.eg.db (Robert Castelo)
8. Re: quality assessment and preprocessing for tiling
array-based CGH data (Sean Davis)
9. Re: GOstat: listing genes from hyperGTest (Tim Smith)
10. Re: quality assessment and preprocessing for tiling
array-based CGH data (Leon Yee)
11. Re: Beadarray and illumina methylation arrays (Mark Dunning)
12. Re: quality assessment and preprocessing for tiling
array-based CGH data (Sean Davis)
13. Problem using Rgraphviz (edge weights going missing). (Dan Bolser)
14. Re: newbie problems with AnnBuilder (Mark Kimpel)
15. Re: newbie problems with AnnBuilder (Sean Davis)
16. Re: newbie problems with AnnBuilder (Mark Kimpel)
17. Re: GOstats and org.EcK12.eg.db (Robert Gentleman)
18. Re: quality assessment and preprocessing for tiling
array-based CGH data (Leon Yee)
19. Bioconductor installation problem: unable to access
repository (Shinichiro Wachi)
20. Re: quality assessment and preprocessing for tiling
array-based CGH data (Sean Davis)
21. Re: GOstat: listing genes from hyperGTest (James W. MacDonald)
22. Re: Bioconductor installation problem: unable to access
repository (Patrick Aboyoun)
23. Bioconductor 2.3 is released (Patrick Aboyoun)
24. Re: How to save result from limma (Jenny Drnevich)
25. scale questions (Hui-Yi Chu)
26. Re: [Fwd: batch info for cellHTS] (Florian Hahne)
27. problem with Category package and custom annotationDbi
(Mark Kimpel)
28. Re: problem with Category package and custom annotationDbi
(Marc Carlson)
29. Re: scale questions (Sean Davis)
30. Re: scale questions (Sean Davis)
31. Re: problem with Category package and custom annotationDbi
(Mark Kimpel)
32. Re: How to save result from limma (Gordon K Smyth)
33. Package "xps" "import.expr.scheme" error (Wei,Caimiao)
34. Re: Lumi and Beadstudio 1.5.13 (Leon Peshkin)
35. Exceptional offer following the technical problem
(Clara de Dessous Chéri)
----------------------------------------------------------------------
Message: 1
Date: Wed, 22 Oct 2008 03:43:33 -0700 (PDT)
From: Tim Smith <tim_smith_666 at yahoo.com>
Subject: [BioC] GOstat: listing genes from hyperGTest
To: bioc <bioconductor at stat.math.ethz.ch>
Message-ID: <257981.79114.qm at web58005.mail.re3.yahoo.com>
Content-Type: text/plain
Hi,
I was performing a hyperGTest for genes in Homo sapiens. For a set of input genes, this function returns some 'significant' GO terms. What I wanted to do next was to correlate each significant GO term (returned by this function) with the genes (from my set of input genes) associated with that GO term. However, I think that I may be using the wrong package/function to get the relevant set of genes.
Currently, what I'm doing is finding the significant GO terms by using the following code:
-----------------------
### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs
paramsGO <- new("GOHyperGParams", geneIds = genes1,
universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
testDirection = "over")
GO <- hyperGTest(paramsGO)
--------------------------
This gives me a set of significant GO terms. Now, I would like to find which subset of genes in 'genes1' is associated with each of the significant GO terms. To do this, I map all GO terms to their Entrez IDs with the 'org.Hs.eg.db' package using the following:
xx <- as.list(org.Hs.egGO2EG)
to get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't this number small?) that map to at least one Entrez ID. So, from here I look up which Entrez IDs are associated with my GO term of interest.
My problem is that often the GO term from hyperGTest is not associated with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) as described above), i.e. the GO term/ID is not in the list obtained from 'org.Hs.egGO2EG'. For example, the term 'GO:0043284' is returned by hyperGTest, but does not appear to be associated with any Entrez IDs in the org.Hs.eg.db package. Where could I be going wrong?
I would give a set of genes so that the example is reproducible, but [[elided Yahoo spam]]
Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here....
My sessionInfo() is:
--------------------------------
R version 2.7.2 (2008-08-25)
i386-pc-mingw32
locale:
LC_COLLATE=English_United
States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] grid splines tools stats graphics grDevices utils datasets methods base
other attached packages:
[1]
gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0
gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0
Category_2.6.0
[8] RBGL_1.16.0 annotate_1.18.0
xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0
GO.db_2.2.0 KEGG.db_2.2.0
[15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0
[22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0
loaded via a namespace (and not attached):
[1] cluster_1.11.11 MASS_7.2-44
---------------------------------
[[alternative HTML version deleted]]
------------------------------
Message: 2
Date: Wed, 22 Oct 2008 12:34:38 +0100
From: "Pemmasani, Kalyani" <kalyani.pemmasani at nuigalway.ie>
Subject: [BioC] export toptables into Genespring
To: <bioconductor at stat.math.ethz.ch>
Message-ID:
<6B017AD2AE2F6F489087FC986588136B88FA42 at EVS1.ac.nuigalway.ie>
Content-Type: text/plain; charset="iso-8859-1"
Hi all,
Is there a way to export topTables from limma into the GeneSpring software program (from Agilent Technologies) for clustering?
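So far the only route I can think of is writing the topTable output to a tab-delimited text file and then importing that into GeneSpring; a rough, untested sketch (the fit object name 'fit2' is just a placeholder):
library(limma)
tt <- topTable(fit2, coef = 1, number = nrow(fit2))   # all genes, ranked
write.table(tt, file = "toptable_for_genespring.txt",
            sep = "\t", quote = FALSE, row.names = FALSE)
Would that be a sensible way to do it, or is there something more direct?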
Best regards,
Kalyani
-------------------------------------------
Kalyani Pemmasani
Marie Curie research fellow
National Diagnostics Centre
National University of Ireland
Galway, IRELAND
e.mail: kalyani.pemmasani at nuigalway.ie
Ph.no: +353(0)91492815
Fax: +353 (0) 91586570
------------------------------
Message: 3
Date: Wed, 22 Oct 2008 09:07:16 -0400
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Subject: Re: [BioC] Limma contrasts question
To: Daniel Brewer <daniel.brewer at icr.ac.uk>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF2584.5010509 at med.umich.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Daniel Brewer wrote:
> Hi Jim,
>
> Could you go into the maths of the contrast formulas a bit? I would
> like to get a really solid understanding of what it is doing for future
> analyses.
Once you understand what the coefficients are, the contrasts are just
simple algebra. In your case, all of the coefficients are estimating the
difference between the sample and PC3M (e.g., Knockdown - PC3M).
So the algebra is something like this:
2(Knockdown - PC3M) - (Scramble - PC3M)
=
2*Knockdown - 2*PC3M - Scramble + PC3M
=
2*Knockdown - Scramble - PC3M
=
2*[Knockdown - (Scramble + PC3M)/2]
Which, up to the overall factor of 2, is knockdown minus the mean of the controls.
Note that this will be the numerator of the resulting t-statistic. The
denominator will be sort of an average of the variability within each of
the three groups being compared. So the question being answered is 'What
genes are different in Knockdown as compared to the average of the
controls?'. However, there is nothing here to test if the two controls
are similar at all (and you might not care).
So for instance, you might have a gene with average expression like this:
Knockdown = 10
PC3M = 4
Scramble = 7
If the intra-group variability is small for each sample type, then you
will likely get a significant t-statistic even though the two controls
are probably significantly different as well. Which is why I mentioned
earlier that you might want to test the Scramble - PC3M contrast as well.
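For concreteness, here is roughly how both contrasts could be set up in limma; this is only a sketch, and the coefficient names KD.vs.PC3M and Scr.vs.PC3M are placeholders for whatever your design actually uses:
library(limma)
cont <- makeContrasts(
    KDvsCtrls = KD.vs.PC3M - 0.5 * Scr.vs.PC3M,   # = Knockdown - (Scramble + PC3M)/2
    ScrVsPC3M = Scr.vs.PC3M,                      # checks the two controls against each other
    levels = c("KD.vs.PC3M", "Scr.vs.PC3M"))
fit2 <- eBayes(contrasts.fit(fit, cont))          # 'fit' is your existing lmFit object
topTable(fit2, coef = "KDvsCtrls")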
Best,
Jim
>
> Many thanks
>
> Dan
>
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
------------------------------
Message: 4
Date: Wed, 22 Oct 2008 09:10:39 -0400
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Subject: Re: [BioC] GOstat: listing genes from hyperGTest
Cc: bioc <bioconductor at stat.math.ethz.ch>
Message-ID: <48FF264F.50404 at med.umich.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Tim,
Does probeSetSummary() do what you want?
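If you are working with Entrez IDs directly rather than probe sets, geneIdsByCategory() from the Category package may be closer to what you want; a minimal sketch, using the GO result object from your code:
ps <- geneIdsByCategory(GO)   # list: GO ID -> input gene IDs annotated to it
ps[["GO:0043284"]]            # genes behind one particular term
Also note that hyperGTest counts genes annotated to a term or to any of its offspring, so org.Hs.egGO2ALLEGS, rather than org.Hs.egGO2EG, is the map that corresponds to what the test sees.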
Best,
Jim
Tim Smith wrote:
>
> Hi,
>
> I
> was performing a hyperGTest for genes in homo-sapiens. For a set of
> input genes, this function returns some 'significant' GO terms. What I
> wanted to now do was to co-relate each significant GO term (returned by
> this function) with genes (from my set of input genes) associated with
> that GO term. However, I think that I may be using the wrong
> package/function to get the releveant set of genes.
>
> Currently, what I'm doing is finding the significant GO terms by using the following code:
>
> -----------------------
> ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs
>
> paramsGO <- new("GOHyperGParams", geneIds = genes1,
> universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
> ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
> testDirection = "over")
>
> GO <- hyperGTest(paramsGO)
> --------------------------
> This
> gives me a set of significant GO terms. Now, I would like to find which
> subset of genes in 'genes1' is associated with each of the significant
> GO term. To do this I map all GO terms to their Entrez IDs using the
> 'org.Hs.eg.db' package using the following:
>
> xx <- as.list(org.Hs.egGO2EG)
>
> to
> get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't
> this number small?) that map to at least one Entrez ID. So, from here I
> look up which Entrez IDs are associated with my GO term of interest.
>
> My
> problem is that often, the GO term from hyperGTest is not associated
> with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described
> above ), i.e. the GO term/ID is not in the list obtained from
> 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by
> hyperGTest, but does not appear to be associated with any Entrez IDs in
> the org.Hs.eg.db package. Where could I be going wrong?
>
> I would give a set of genes so that the example is reproducible, but [[elided Yahoo spam]]
>
> Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here....
>
> My sessionInfo() is:
> --------------------------------
> R version 2.7.2 (2008-08-25)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United
> States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] grid splines tools stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1]
> gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0
> gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0
> Category_2.6.0
> [8] RBGL_1.16.0 annotate_1.18.0
> xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0
> GO.db_2.2.0 KEGG.db_2.2.0
> [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0
> [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 MASS_7.2-44
>
>
> ---------------------------------
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
------------------------------
Message: 5
Date: Wed, 22 Oct 2008 14:50:06 +0100
From: Daniel Brewer <daniel.brewer at icr.ac.uk>
Subject: Re: [BioC] Limma contrasts question
To: "James W. MacDonald" <jmacdon at med.umich.edu>,
bioconductor at stat.math.ethz.ch
Message-ID: <48FF2F8E.4020208 at icr.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1
James W. MacDonald wrote:
> Daniel Brewer wrote:
>
>> Hi Jim,
>>
>> Could you go into the maths of the contrast formulas a bit? I would
>> like to get a really solid understanding of what it is doing for future
>> analyses.
>
> Once you understand what the coefficients are, the contrasts are just
> simple algebra. In your case, all of the coefficients are estimating the
> difference between the sample and PC3M (e.g., Knockdown - PC3M).
>
> So the algebra is something like this:
>
> 2(Knockdown - PC3M) - (Scramble - PC3M)
> =
> 2Knockdown - 2PC3M - Scramble + PC3M
> =
> 2Knockdown - Scramble - PC3M
> =
> Knockdown - (Scramble + PC3M)/2
>
> Which is knockdown minus the mean of the controls.
>
> Note that this will be the numerator of the resulting t-statistic. The
> denominator will be sort of an average of the variability within each of
> the three groups being compared. So the question being answered is 'What
> genes are different in Knockdown as compared to the average of the
> controls?'. However, there is nothing here to test if the two controls
> are similar at all (and you might not care).
>
> So for instance, you might have a gene with average expression like this:
>
> Knockdown = 10
> PC3M = 4
> Scramble = 7
>
> If the intra-group variability is small for each sample type, then you
> will likely get a significant t-statistic even though the two controls
> are probably significantly different as well. Which is why I mentioned
> earlier that you might want to test the Scramble - PC3M contrast as well.
>
> Best,
>
> Jim
>
>
>>
>> Many thanks
>>
>> Dan
>>
>
Thanks for that, that is brilliant and has clarified everything. I did
a comparison of the controls and no significant genes were found,
which is encouraging.
Many thanks
Dan
--
**************************************************************
Daniel Brewer, Ph.D.
Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.brewer at icr.ac.uk
**************************************************************
The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP.
This e-mail message is confidential and for use by the a...{{dropped:2}}
------------------------------
Message: 6
Date: Wed, 22 Oct 2008 21:51:24 +0800
From: Leon Yee <yee.leon at gmail.com>
Subject: [BioC] quality assessment and preprocessing for tiling
array-based CGH data
To: bioconductor at stat.math.ethz.ch
Message-ID: <48FF2FDC.6020203 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Dear all,
Is there any well-established routine for quality assessment and
preprocessing of array CGH data, especially tiling array-based CGH data?
I found that most quality assessment methods for array data concern
expression arrays, while few relate to array CGH data.
We are using the Agilent 244K rat CGH array, and we now have the
text files produced by Feature Extraction, but we don't know whether they
are of good quality. Could anyone help provide some clues? Thanks in advance!
After read.maimages(), we got an RGList object, which contains
several components including R, G, Rb, Gb, and so on. The probes are of
3 types: -1, 1 and 0. 0 means a normal probe; -1 means a negative control, I
guess, and the probe names are like (-)3xSLv1, NC1_00000002, etc. [no
corresponding probe sequence]; 1 means a positive control, I guess, and
the probe names are like DarkCorner, DCP_008001.0, RnCGHBrightCorner,
SRN_800002, etc. [no corresponding probe sequence]. There are 1275 probes
of type -1 and 4217 of type 1, each of which has its R, Rb, G, Gb
values. Do we need these values for quality assessment and
normalization? How?
In addition, among the normal probes, we have 1000 probes that are repeated
3 times on the array. How could we use these data for quality assessment
and normalization?
Regards,
Leon
------------------------------
Message: 7
Date: Wed, 22 Oct 2008 15:48:22 +0200
From: Robert Castelo <robert.castelo at upf.edu>
Subject: [BioC] GOstats and org.EcK12.eg.db
To: bioconductor at stat.math.ethz.ch
Message-ID: <1224683302.5889.35.camel at llull.imim.es>
Content-Type: text/plain
dear list,
I cannot get GOstats to work with the annotation for E. coli in
org.EcK12.eg.db. Please find below the code that reproduces the problem
including the error message, and my sessionInfo() at the end of this
email. I have included the same exercise with the human annotation
package org.Hs.eg.db which runs fine in my system. Any help with this
will be very much appreciated.
thanks!!
robert.
==========CODE STARTS HERE===========
library(org.Hs.eg.db)
library(org.EcK12.eg.db)
library(GOstats)
geneuniverse <- mappedkeys(org.EcK12.egSYMBOL)
set.seed(12345)
geneset <- sample(geneuniverse, size=100, replace=FALSE)
goHypGparams <- new("GOHyperGParams",
geneIds=geneset,
universeGeneIds=geneuniverse,
annotation="org.EcK12.eg.db", ontology="BP",
pvalueCutoff=1.0, conditional=TRUE,
testDirection="over")
goHypGcond <- hyperGTest(goHypGparams)
Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
variable "org.EcK12.egENTREZID" was not found
Error in mget(probes, ID2EntrezID(datPkg)) :
error in evaluating the argument 'envir' in selecting a method for
function 'mget'
geneuniverse <- mappedkeys(org.Hs.egSYMBOL)
set.seed(12345)
geneset <- sample(geneuniverse, size=100, replace=FALSE)
goHypGparams <- new("GOHyperGParams",
geneIds=geneset,
universeGeneIds=geneuniverse,
annotation="org.Hs.eg.db", ontology="BP",
pvalueCutoff=1.0, conditional=TRUE,
testDirection="over")
goHypGcond <- hyperGTest(goHypGparams)
sessionInfo()
R version 2.8.0 beta (2008-10-05 r46601)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] splines tools stats graphics grDevices utils
datasets
[8] methods base
other attached packages:
[1] org.Hs.eg.db_2.2.6 GOstats_2.7.0 Category_2.7.6
[4] genefilter_1.21.5 survival_2.34-1 RBGL_1.17.2
[7] annotate_1.19.2 xtable_1.5-4 GO.db_2.2.5
[10] graph_1.19.6 org.EcK12.eg.db_2.2.6 AnnotationDbi_1.3.12
[13] RSQLite_0.7-0 DBI_0.2-4 Biobase_2.1.7
loaded via a namespace (and not attached):
[1] cluster_1.11.11 GSEABase_1.3.6 XML_1.98-1
------------------------------
Message: 8
Date: Wed, 22 Oct 2008 10:02:35 -0400
From: "Sean Davis" <sdavis2 at mail.nih.gov>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
array-based CGH data
To: "Leon Yee" <yee.leon at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
<264855a00810220702m3ee17381y221a8e5ec18ee6ff at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at gmail.com> wrote:
> Dear all,
>
> Is there any well-established routine for quality assessment and
> preprocessing of array CGH data, especially tiling array-based CGH data? I
> found most of the quality assessment of array data are about expression
> array, while few are related to array CGH data.
> We are using agilent 244k CGH array of rat, and now we have the text
> files produced by Feature Extraction, don't know whether they are of good
[[elided Yahoo spam]]
>
> After read.maimage(), we got the RGlist object, which contain several
> components including R, G, Rb, Gb, and so on. The probes are of 3 types:
> -1, 1 and 0. 0 means normal probe; -1 mean negative control, i guess, and
> the probe names are like (-)3xSLv1, NC1_00000002, etc[no corresponding probe
> sequence]; 1 means positive control, i guess, and the probe names are like
> DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc[no
> corresponding probe sequence]. The number of -1 is 1275, while the number
> of 1 is 4217, each of which has its R, Rb, G, Gb values. Do we need these
> values for quality assessment and normalization? How?
> In addition, in the normal probes, we have 1000 probes repeating 3 times
> in the array. How could we use these data for quality assessment and
> normalization?
You generally will not want to do any normalization besides a possible
shift of the center. Any linear normalization that affects the slope
of the M vs. A plot or nonlinear normalization will likely decrease
signal. As for quality control, a good, general measure to track is
the dlrs, a robust measure of the standard deviation.
dlrs <- function(x) {
    nx <- length(x)
    if (nx < 3) {
        stop("Vector length>2 needed for computation")
    }
    tmp <- embed(x, 2)
    diffs <- tmp[, 2] - tmp[, 1]
    dlrs <- IQR(diffs) / (sqrt(2) * 1.34)
    return(dlrs)
}
For Agilent arrays, most of the dlrs values should be around or under 0.2,
generally. However, this might vary a bit based on lab-to-lab
variation. In any case, if there is a significant outlier, that is
suspect. The input to the above function is the log ratios for a
single array arranged in chromosome and position order.
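As a sketch of how this might be applied (assuming MA is an MAList of log
ratios and that MA$genes has Chr and Position columns; those column names
are assumptions about your annotation):
ord <- order(MA$genes$Chr, MA$genes$Position)
apply(MA$M[ord, , drop = FALSE], 2, dlrs)   # one dlrs value per array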
Sean
------------------------------
Message: 9
Date: Wed, 22 Oct 2008 07:24:50 -0700 (PDT)
Subject: Re: [BioC] GOstat: listing genes from hyperGTest
To: "James W. MacDonald" <jmacdon at med.umich.edu>
Cc: bioc <bioconductor at stat.math.ethz.ch>
Message-ID: <418040.17653.qm at web58004.mail.re3.yahoo.com>
Content-Type: text/plain
Thanks James. If I can tweak that function, I'll get exactly what I want.
I tried what you suggested and got the following error:
---------------------------
### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs
paramsGO <- new("GOHyperGParams", geneIds = genes1,
universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
testDirection = "over")
GO <- hyperGTest(paramsGO)
ps <- probeSetSummary(GO)
Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
variable "org.Hs.egENTREZID" was not found
--------------------------------
I guess the function would return the probe IDs if I were using them, but I have Entrez IDs as input.
Or am I doing something wrong?
thanks!
----- Original Message ----
From: James W. MacDonald <jmacdon at med.umich.edu>
Cc: bioc <bioconductor at stat.math.ethz.ch>
Sent: Wednesday, October 22, 2008 9:10:39 AM
Subject: Re: [BioC] GOstat: listing genes from hyperGTest
Hi Tim,
Does probeSetSummary() do what you want?
Best,
Jim
Tim Smith wrote:
>
> Hi,
>
> I
> was performing a hyperGTest for genes in homo-sapiens. For a set of
> input genes, this function returns some 'significant' GO terms. What I
> wanted to now do was to co-relate each significant GO term (returned by
> this function) with genes (from my set of input genes) associated with
> that GO term. However, I think that I may be using the wrong
> package/function to get the releveant set of genes.
>
> Currently, what I'm doing is finding the significant GO terms by using the following code:
>
> -----------------------
> ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs
>
> paramsGO <- new("GOHyperGParams", geneIds = genes1,
> universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
> ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
> testDirection = "over")
>
> GO <- hyperGTest(paramsGO)
> --------------------------
> This
> gives me a set of significant GO terms. Now, I would like to find which
> subset of genes in 'genes1' is associated with each of the significant
> GO term. To do this I map all GO terms to their Entrez IDs using the
> 'org.Hs.eg.db' package using the following:
>
> xx <- as.list(org.Hs.egGO2EG)
>
> to
> get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't
> this number small?) that map to at least one Entrez ID. So, from here I
> look up which Entrez IDs are associated with my GO term of interest.
>
> My
> problem is that often, the GO term from hyperGTest is not associated
> with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described
> above ), i.e. the GO term/ID is not in the list obtained from
> 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by
> hyperGTest, but does not appear to be associated with any Entrez IDs in
> the org.Hs.eg.db package. Where could I be going wrong?
>
[[elided Yahoo spam]]
>
> Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here....
>
> My sessionInfo() is:
> --------------------------------
> R version 2.7.2 (2008-08-25)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United
> States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] grid splines tools stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1]
> gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0
> gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0
> Category_2.6.0
> [8] RBGL_1.16.0 annotate_1.18.0
> xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0
> GO.db_2.2.0 KEGG.db_2.2.0
> [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0
> [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 MASS_7.2-44
>
>
> ---------------------------------
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
[[alternative HTML version deleted]]
------------------------------
Message: 10
Date: Wed, 22 Oct 2008 22:32:34 +0800
From: Leon Yee <yee.leon at gmail.com>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
array-based CGH data
To: Sean Davis <sdavis2 at mail.nih.gov>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF3982.9020005 at gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Sean Davis wrote:
> On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at gmail.com> wrote:
>> Dear all,
>>
>> Is there any well-established routine for quality assessment and
>> preprocessing of array CGH data, especially tiling array-based CGH data? I
>> found most of the quality assessment of array data are about expression
>> array, while few are related to array CGH data.
>> We are using agilent 244k CGH array of rat, and now we have the text
>> files produced by Feature Extraction, don't know whether they are of good
[[elided Yahoo spam]]
>>
>> After read.maimage(), we got the RGlist object, which contain several
>> components including R, G, Rb, Gb, and so on. The probes are of 3 types:
>> -1, 1 and 0. 0 means normal probe; -1 mean negative control, i guess, and
>> the probe names are like (-)3xSLv1, NC1_00000002, etc[no corresponding probe
>> sequence]; 1 means positive control, i guess, and the probe names are like
>> DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc[no
>> corresponding probe sequence]. The number of -1 is 1275, while the number
>> of 1 is 4217, each of which has its R, Rb, G, Gb values. Do we need these
>> values for quality assessment and normalization? How?
>> In addition, in the normal probes, we have 1000 probes repeating 3 times
>> in the array. How could we use these data for quality assessment and
>> normalization?
>
> You generally will not want to do any normalization besides a possible
> shift of the center. Any linear normalization that affects the slope
> of the M vs. A plot or nonlinear normalization will likely decrease
> signal. As for quality control, a good, general measure to track is
> the dlrs, a robust measure of the standard deviation.
>
>
> dlrs <-
> function(x) {
> nx <- length(x)
> if (nx<3) {
> stop("Vector length>2 needed for computation")
> }
> tmp <- embed(x,2)
> diffs <- tmp[,2]-tmp[,1]
> dlrs <- IQR(diffs)/(sqrt(2)*1.34)
> return(dlrs)
> }
>
> For agilent arrays, most of the dlrs should be around or under 0.2,
> generally. However, this might vary a bit based on lab-to-lab
> variation. In any case, if there is a significant outlier, that is
> suspect. The input to the above function is the log ratios for a
> single array arranged in chromosome and position order.
>
> Sean
>
Hi, Sean
Thanks for your advice. However, I still have several questions:
1. The input of dlrs is the log ratios; should that be the log ratio
extracted from the text file produced by Feature Extraction, or the one
calculated from RGList --> MAList? I have searched the mailing list and seen
a post of yours mentioning the difference between the log ratio from Feature
Extraction and the default M value from read.maimages.
2. I can get the log ratios of all features, including control types
-1 and 1, but these features don't have chromosome positions; does
this mean I don't need them for quality assessment?
3. Some probes with names like "chr2_random:xxxxx-yyyyyy" will not
get a proper mapping on the chromosome, so should I remove these values
from the input of dlrs?
4. How should I handle those 1000 probes that are repeated 3 times? They
will be mapped to the same chromosome position in groups of three.
Regards,
Leon
------------------------------
Message: 11
Date: Wed, 22 Oct 2008 15:42:11 +0100
From: "Mark Dunning" <md392 at cam.ac.uk>
Subject: Re: [BioC] Beadarray and illumina methylation arrays
To: "'Katrina bell'" <katrina.bell at mcri.edu.au>,
<bioconductor at stat.math.ethz.ch>
Message-ID: <000c01c93454$5dc9e720$195db560$@ac.uk>
Content-Type: text/plain; charset="us-ascii"
Hi Katrina,
I only have limited experience with methylation data, but hopefully I might
be able to give you a few pointers.
-The error with readIllumina is quite hard to diagnose without seeing the
example. I haven't seen any data from this type of methylation array. What
do the first few lines of the bead-level text files (.csv in your case) look
like? It could be that the x and y coordinates are in a slightly different
format from what we have seen before.
-Yes, 25% of outliers does seem a little high, I'm afraid. Have you also
looked at whereabouts the outliers are located on the arrays, or made some
imageplots? We have just developed a new function for automatic artefact
detection called BASH that will be available in the forthcoming Bioconductor
release. It could be interesting to run that on your data, as Illumina do
sometimes miss beads in obvious artefacts and remove too many beads on
the rest of the array. BASH should be a good compromise.
-Yes, currently the only way of reading methylation data into beadarray is
by using the bead-level data.
-I'm not very familiar with the output of BeadStudio. Do you get separate
detection values for the green and red channels? If so, then I don't think
it would be a problem if a bead type was detected in one channel but not the
other (since they are measures of either methylated or unmethylated signal,
respectively). Bead types that are not detected in either channel could be a
problem, though.
-I haven't really seen many guidelines for normalization and this is
something I would like to look into. There is an obvious dye-bias that needs
to be corrected and the background normalisation used by Illumina might
actually help in this regard (although I wouldn't usually recommend it for
other Illumina data). Quantile normalisation has been used for other types
of two-colour Illumina data (http://www.biomedcentral.com/1471-2105/9/409,
http://genome.cshlp.org/cgi/content/abstract/17/3/368) so it could work
here.
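As a very rough, untested sketch of that idea (assuming 'grn' and 'red' are
matrices of summarised intensities, features x arrays; the object names and
the channel assignment are assumptions on my part):
library(limma)
grn.q <- normalizeQuantiles(log2(grn))
red.q <- normalizeQuantiles(log2(red))
M <- grn.q - red.q    # log-ratio of the two channels after normalisation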
Hope this helps,
Mark
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Katrina bell
Sent: 21 October 2008 03:04
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Beadarray and illumina methylation arrays
Hello,
This is the first time I have used beadarray . I am using it for the
analysis of an illumina 27 methylation array and I am having a few issues I
hope that you could help me with.
1. The first time I tried to load the methylation data, I didn't specify
singleChannel=FALSE. It happily read in my 12 arrays with no problems
whatsoever. I tried plotting a few things, which worked fine. Seeing my
mistake, I then went back to reload my data with the red channel
(singleChannel=FALSE) and got the following error.
> BLData = readIllumina(arrayNames = targets$ChipBarcode, textType=".csv",
targets=targets, backgroundMethod="none", singleChannel=FALSE)
Found 12 arrays
Reading pixels of ./4408100017_A_Grn.tif
Calculating background
Sharpening Image
Calculating foregound
Background correcting: method = none
Reading pixels of ./4408100017_A_Red.tif
Calculating background
*** caught segfault ***
address (nil), cause 'memory not mapped'
Traceback:
1: .C("readBeadImage", as.character(tifFiles2[i]), as.double(RedX[ord]),
as.double(RedY[ord]), as.integer(numBeads), foreGround = double(length =
numBeads), backGround = double(length = numBeads),
as.integer(backgroundSize), as.integer(manip), as.integer(fground), PACKAGE
= "beadarray")
2: readIllumina(arrayNames = targets$ChipBarcode, textType = ".csv",
targets = targets, backgroundMethod = "none", singleChannel = FALSE)
session info Below.
So I ended up loading in the data with images=FALSE, which worked, but I
would like to be able to look at the background. Is there a way around this
issue?
2. When I plotted the outliers (bar chart) I got an astounding 25% for the
majority of my 12 samples, both in the red and green channel (unlogged
data). In addition, 3 of the samples had a peak of intensity at 5 in the
green channel, leading me to believe that I have some real quality control
issues with my samples. Any opinions/suggestions on these results would be
most welcome.
3. Is it correct that readBeadSummaryData is not set up for two-colour
arrays such as the methylation arrays? So the only way to look at
methylation data is through reading in BLData?
4. Some of my samples seem to have a large number of targets with a
detection p-value above 0.05 (BeadStudio output). Illumina have
indicated that they disregard these. If I cannot read in the bead summary
data from BeadStudio, I am assuming that these detection p-values cannot
be taken into account in the analysis? Or are there other methods that
remove/downgrade these less-than-optimal probes (most removed as
outliers?).
5. Has anyone had any experience with normalisation of the methylation
arrays? I know that many of the usual array methods are out of the question
because the assumption that most probes on the array will not be
differentially expressed is invalid. I read on a Bioconductor list someone
suggesting quantile normalisation? I would really appreciate any feedback
from people who have tried this or other methods, especially if they have
verified their methylation results.
Thanks for any help/advice you may be able to give.
Cheers
Katrina
> sessionInfo() below
R version 2.7.0 (2008-04-22)
x86_64-redhat-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8
;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADD
RESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] beadarray_1.8.0 affy_1.18.2 preprocessCore_1.2.1
[4] affyio_1.8.0 geneplotter_1.18.0 annotate_1.18.0
[7] xtable_1.5-2 AnnotationDbi_1.2.2 RSQLite_0.6-9
[10] DBI_0.2-4 lattice_0.17-6 Biobase_2.0.1
[13] limma_2.14.5
loaded via a namespace (and not attached):
[1] grid_2.7.0 KernSmooth_2.22-22 RColorBrewer_1.0-2
[[alternative HTML version deleted]]
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 12
Date: Wed, 22 Oct 2008 10:46:24 -0400
From: "Sean Davis" <sdavis2 at mail.nih.gov>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
array-based CGH data
To: "Leon Yee" <yee.leon at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
<264855a00810220746w6d5ecb11ica5542bf46042f20 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Wed, Oct 22, 2008 at 10:32 AM, Leon Yee <yee.leon at gmail.com> wrote:
> Sean Davis wrote:
>>
>> On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at gmail.com> wrote:
>>>
>>> Dear all,
>>>
>>> Is there any well-established routine for quality assessment and
>>> preprocessing of array CGH data, especially tiling array-based CGH data?
>>> I
>>> found most of the quality assessment of array data are about expression
>>> array, while few are related to array CGH data.
>>> We are using agilent 244k CGH array of rat, and now we have the text
>>> files produced by Feature Extraction, don't know whether they are of good
[[elided Yahoo spam]]
>>>
>>> After read.maimage(), we got the RGlist object, which contain several
>>> components including R, G, Rb, Gb, and so on. The probes are of 3 types:
>>> -1, 1 and 0. 0 means normal probe; -1 mean negative control, i guess, and
>>> the probe names are like (-)3xSLv1, NC1_00000002, etc[no corresponding
>>> probe
>>> sequence]; 1 means positive control, i guess, and the probe names are
>>> like
>>> DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc[no
>>> corresponding probe sequence]. The number of -1 is 1275, while the
>>> number
>>> of 1 is 4217, each of which has its R, Rb, G, Gb values. Do we need these
>>> values for quality assessment and normalization? How?
>>> In addition, in the normal probes, we have 1000 probes repeating 3
>>> times
>>> in the array. How could we use these data for quality assessment and
>>> normalization?
>>
>> You generally will not want to do any normalization besides a possible
>> shift of the center. Any linear normalization that affects the slope
>> of the M vs. A plot or nonlinear normalization will likely decrease
>> signal. As for quality control, a good, general measure to track is
>> the dlrs, a robust measure of the standard deviation.
>>
>>
>> dlrs <-
>> function(x) {
>> nx <- length(x)
>> if (nx<3) {
>> stop("Vector length>2 needed for computation")
>> }
>> tmp <- embed(x,2)
>> diffs <- tmp[,2]-tmp[,1]
>> dlrs <- IQR(diffs)/(sqrt(2)*1.34)
>> return(dlrs)
>> }
>>
>> For agilent arrays, most of the dlrs should be around or under 0.2,
>> generally. However, this might vary a bit based on lab-to-lab
>> variation. In any case, if there is a significant outlier, that is
>> suspect. The input to the above function is the log ratios for a
>> single array arranged in chromosome and position order.
>>
>> Sean
>>
>
> Hi, Sean
>
> Thanks for your advice. However, I have still several questions:
>
> 1. The input of dlrs is the log ratios, the log ration extracted from the
> text file produced by Feature Extraction? or calculated from RGlist -->
> MAlist ? I have searched the mailist and seen a post of you mentioned the
> difference of log ration from Feature Extraction and the default M value
> from read.maimages.
You can read the Agilent FE manual for more details, but you can
probably use either and come to very similar conclusions. If you use
the MAList version, make sure to use only median centering or no
normalization.
> 2. I can get the log ratios of all features including control type of -1
> and 1, but these features don't have chromosome positions, does this mean I
> don't need all of them for quality assessment?
We have not routinely used these probes, no. If an array fails
miserably, then these control probes might be useful for determining
the reason for the failure, though.
> 3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will not get a
> proper mapping on the chromosome, so I should remove these values from the
> input of dlrs. Is it so?
You can either remove them or treat chr2_random as a separate chromosome.
> 4. How could I handle those 1000 probes repeating 3 times? They will be
> mapped on the same chromosome position by three per group.
You could choose one at random or use a mean or median of them. My
guess is that they agree very closely with one another so the choice
should not affect the results much.
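A small sketch of the median option for a single array (the vector names
'logratio' and 'probeName' are assumptions; both are per-feature and in the
same order):
collapsed <- tapply(logratio, probeName, median)   # one value per distinct probe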
Sean
------------------------------
Message: 13
Date: Wed, 22 Oct 2008 15:54:25 +0100
From: "Dan Bolser" <dan.bolser at gmail.com>
Subject: [BioC] Problem using Rgraphviz (edge weights going missing).
To: bioconductor at stat.math.ethz.ch
Message-ID:
<2c8757af0810220754r3a16507l29dd77f43a5151d7 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Hi all,
After successfully installing Rgraphviz on this Centos 5.2 (RHEL 4)
box (basically with yum install *graphviz*** ..., then using the
standard BioC install procedure) and then navigating the packaging
error (that actually caused R to crash) described here:
http://lists.rpmforge.net/pipermail/users/2007-August/000958.html
(Just manually run dot -c as root)
I finally got to produce some nice graphs of my data. However, I found
that when I tried to write my graphs to file in dot format, the edge
weights go missing.
I load my graph like this:
library(Rgraphviz)
myNodes <- paste(nodes$MAPC)
myG <- new("graphNEL", nodes=myNodes)
for(i in 1:nrow(dat)){
  paste(dat[i,"A"], dat[i,"B"])
  myG <- addEdge(paste(dat[i,"A"]), paste(dat[i,"B"]), myG, dat[i,"N"])
}
plot(myG, "neato")
and all well and good (except that I couldn't get the edge weights to
translate into line thickness - more on that below).
Then I (foolishly?) tried:
agwrite(agopen(myG,"test"), filename="test")
This produced a file, but the weight attributes of the edges (in dot
format?) were all clearly set to 1 (and not the range of values that I
have stored in dat$N). I confirmed that my loop was putting the
weights somewhere with the following one-liner:
lapply(myG@edgeData@data, function(l){l$weight})
but I don't really know what this means. So I think it's a bug: the
graph should be written to file with the correct weight attributes,
AFAICT.
Poking around in the data structure I was confused by the apparent
number of edges...
ew <- unlist(edgeWeights(myG))
length(ew)
Why is this not equal to the number of edges that I put in (and the
number reported by just printing the graph object)?
I tried to set the width of the edges plotted on the graph by using
the penwidth attribute, but it had no effect, and was not written to
file as an attribute of the graph.
## Try defining some EDGE attributes
eAttrs <- list()
## Get the edge weights
ew <- unlist(edgeWeights(myG))
length(ew)
## Some need removing for some reason somehow...
ew <- ew[setdiff(seq(along = ew), removedEdges(myG))]
length(ew)
## Label the attributes vector (so it can be used properly)
names(ew) <- edgeNames(myG)
eAttrs$penwidth <- ew
plot(myG, "neato", edgeAttrs=eAttrs)
toFile(agopen(myG, name="test"), filename="test", edgeAttrs=eAttrs)
So I read in the vignette:
"A list of all available attributes is accessible online at:
http://www.graphviz.org/pub/scm/graphviz2/doc/info/attrs.html. Note that
there are some differences between default values and also some attributes
will not have an effect in Rgraphviz. Please see the man page for
graphvizAttributes for more details."
Where is the "man page for graphvizAttributes for more details" ??
It would be good to know what attributes Rgraphviz is ignoring and if
penwidth is one of them in particular.
Also the vignette didn't tell me who to contact in case of feedback,
bugs, questions, etc. Who else should I be bugging with this email?
Thanks for reading!
Dan.
--
R2spec --bioc -u
http://www.bioconductor.org/packages/release/bioc/src/contrib/Rgraphviz_1.18.1.tar.gz
------------------------------
Message: 14
Date: Wed, 22 Oct 2008 11:07:02 -0400
From: "Mark Kimpel" <mwkimpel at gmail.com>
Subject: Re: [BioC] newbie problems with AnnBuilder
To: "Marc Carlson" <mcarlson at fhcrc.org>
Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
Message-ID:
<6b93d1830810220807l788e52f8wa4bafef9726892d0 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
The build with AnnotationDbi was uneventful, but I have been unable to
install the package or use it as is.
If the package is just placed in my site-library, I get:
'ragene10stv1.db' is not a valid package -- installed < 2.0.0?
If I tar the package and try R CMD INSTALL, I get:
cannot extract package from 'ragene10stv1.db.tar.gz'
What approach should I be using?
Mark
------------------------------------------------------------
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Home
Skype: mkimpel
"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
******************************************************************
On Tue, Oct 21, 2008 at 6:41 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
>
> Hi Mark,
>
> AnnBuilder is on its way out. Please have a look at the SQLforge.pdf
> vignette which can be found here:
>
> http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html
>
> If you have further questions after reading this, then please just ask,
> and we should be able to help you out.
>
>
> Marc
>
>
>
> Mark Kimpel wrote:
> > I'm having problems getting AnnBuilder to work with the Affy Rat Gene ST
> > array data supplied by Affy. Using the code chunk below, I am able to get
> > AnnBuilder to create the annotation object, but it crashes, I believe, when
> > it tries to save it. I should also mention that I had a previous crash when
> > I had a madecdfenv package directory in place that used the same name. I got
> > a "file lock" error, so I temporarily renamed the directory to see if this
> > fixed the problem. As below, the error changed, but I still can't get the
> > script to work.
> >
> > I suspect that there is a fundamental misunderstaning on my part related to
> > how the annotation package should relate to the cdf package or some naming
> > convention related to one or both.
> >
> > Mark
> >
> >
> >> require(AnnBuilder); require(makecdfenv)
> >> myBase <- "RaGene-1_0-st-v1.na26.rn4.transcript.probe-entrez_gene.csv"
> >> myBaseType <- "ll"
> >> mySrcUrls <- getSrcUrl("all", "Rattus norvegicus")
> >>
> >> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
> >>
> > + myBaseType, pkgName =
> > substring(cleancdfname("RaGene-1_0-st-v1"),
> > + 1, (nchar(cleancdfname("RaGene-1_0-st-v1")) - 3)),
> > + pkgPath = '~/R_HOME/site-library-2.8.0', organism = "Rattus
> > norvegicus", version = "1.0",
> > + author = list(authors = "Mark W Kimpel", maintainer =
> > + "mkimpel at iupui.edu"), fromWeb = TRUE)
> > Read 1 item
> > Read 1 item
> > [1] "4450 2 2"
> > Error in lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook) :
> > cannot open file
> > '~/R_HOME/site-library-2.8.0/ragene10stv1/data/Rdata.rdb': No such file or
> > directory
> >
> > Enter a frame number, or 0 to exit
> >
> > 1: ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
> > myBaseTy
> > 2: makeLLDB(file.path(pkgPath, pkgName), compress = TRUE)
> > 3: tools:::makeLazyLoadDB(dataEnv, dbbase, compress = compress)
> > 4: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
> > envhook)
> > 5: function (e)
> > 6: lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook)
> >
> > Selection: 1
> > Browse[1]> annotation[1:2,]
> > ENTREZID PROBE ACCNUM GO
> > [1,] "100008565" "10766774" NA NA
> > [2,] "100034253" "10937540" NA "GO:0005525 at IEA;GO:0005622 at IEA"
> > PMID SYMBOL
> > [1,] "16804107;17292978" "Slc39a4l"
> > [2,] "8889548" "Gnl3l"
> > GENENAME CHR
> > [1,] "solute carrier family 39 (zinc transporter), member 4-like" NA
> > [2,] "guanine nucleotide binding protein-like 3 (nucleolar)-like" "X"
> > MAP REFSEQ UNIGENE CHRLOC PATH
> > [1,] NA "NM_001101021;NP_001094491" "Rn.10120" "NA" NA
> > [2,] "Xq14-q21" "NM_001081958;NP_001075427" "Rn.164274" "-40301218 at X" NA
> > ENZYME PFAM PROSITE
> > [1,] NA "NA at IPI00817057" "NA at IPI00817057"
> > [2,] NA "PF01926 at IPI00362844" "NA at IPI00362844"
> > ------------------------------------------------------------
> > Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
> > Indiana University School of Medicine
> >
> > 15032 Hunter Court, Westfield, IN 46074
> >
> > (317) 490-5129 Work, & Mobile & VoiceMail
> > (317) 399-1219 Home
> > Skype: mkimpel
> >
> > "The real problem is not whether machines think but whether men do." -- B.
> > F. Skinner
> > ******************************************************************
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> >
>
------------------------------
Message: 15
Date: Wed, 22 Oct 2008 11:53:59 -0400
From: "Sean Davis" <sdavis2 at mail.nih.gov>
Subject: Re: [BioC] newbie problems with AnnBuilder
To: "Mark Kimpel" <mwkimpel at gmail.com>
Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
Message-ID:
<264855a00810220853m5afad74apfb3df17499f90f49 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Wed, Oct 22, 2008 at 11:07 AM, Mark Kimpel <mwkimpel at gmail.com> wrote:
> Build with AnnotationDbi was uneventful, but I have been unable to
> install the package or use it as is.
>
> If the package is just placed in my site-library, I get:
> 'ragene10stv1.db' is not a valid package -- installed < 2.0.0?
>
> If I tar the package and try R CMD INSTALL, I get:
> cannot extract package from 'ragene10stv1.db.tar.gz'
>
> What approach should I be using?
You can just
R CMD INSTALL ragene10stv1.db
where ragene10stv1.db is the directory that contains the package
(right above the DESCRIPTION file).
Sean
> On Tue, Oct 21, 2008 at 6:41 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
>>
>> Hi Mark,
>>
>> AnnBuilder is on its way out. Please have a look at the SQLforge.pdf
>> vignette which can be found here:
>>
>> http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html
>>
>> If you have further questions after reading this, then please just ask,
>> and we should be able to help you out.
>>
>>
>> Marc
>>
>>
>>
>> Mark Kimpel wrote:
>> > I'm having problems getting AnnBuilder to work with the Affy Rat Gene ST
>> > array data supplied by Affy. Using the code chunk below, I am able to get
>> > AnnBuilder to create the annotation object, but it crashes, I believe, when
>> > it tries to save it. I should also mention that I had a previous crash when
>> > I had a madecdfenv package directory in place that used the same name. I got
>> > a "file lock" error, so I temporarily renamed the directory to see if this
>> > fixed the problem. As below, the error changed, but I still can't get the
>> > script to work.
>> >
>> > I suspect that there is a fundamental misunderstaning on my part related to
>> > how the annotation package should relate to the cdf package or some naming
>> > convention related to one or both.
>> >
>> > Mark
>> >
>> >
>> >> require(AnnBuilder); require(makecdfenv)
>> >> myBase <- "RaGene-1_0-st-v1.na26.rn4.transcript.probe-entrez_gene.csv"
>> >> myBaseType <- "ll"
>> >> mySrcUrls <- getSrcUrl("all", "Rattus norvegicus")
>> >>
>> >> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
>> >>
>> > + myBaseType, pkgName =
>> > substring(cleancdfname("RaGene-1_0-st-v1"),
>> > + 1, (nchar(cleancdfname("RaGene-1_0-st-v1")) - 3)),
>> > + pkgPath = '~/R_HOME/site-library-2.8.0', organism = "Rattus
>> > norvegicus", version = "1.0",
>> > + author = list(authors = "Mark W Kimpel", maintainer =
>> > + "mkimpel at iupui.edu"), fromWeb = TRUE)
>> > Read 1 item
>> > Read 1 item
>> > [1] "4450 2 2"
>> > Error in lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook) :
>> > cannot open file
>> > '~/R_HOME/site-library-2.8.0/ragene10stv1/data/Rdata.rdb': No such file or
>> > directory
>> >
>> > Enter a frame number, or 0 to exit
>> >
>> > 1: ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
>> > myBaseTy
>> > 2: makeLLDB(file.path(pkgPath, pkgName), compress = TRUE)
>> > 3: tools:::makeLazyLoadDB(dataEnv, dbbase, compress = compress)
>> > 4: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
>> > envhook)
>> > 5: function (e)
>> > 6: lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook)
>> >
>> > Selection: 1
>> > Browse[1]> annotation[1:2,]
>> > ENTREZID PROBE ACCNUM GO
>> > [1,] "100008565" "10766774" NA NA
>> > [2,] "100034253" "10937540" NA "GO:0005525 at IEA;GO:0005622 at IEA"
>> > PMID SYMBOL
>> > [1,] "16804107;17292978" "Slc39a4l"
>> > [2,] "8889548" "Gnl3l"
>> > GENENAME CHR
>> > [1,] "solute carrier family 39 (zinc transporter), member 4-like" NA
>> > [2,] "guanine nucleotide binding protein-like 3 (nucleolar)-like" "X"
>> > MAP REFSEQ UNIGENE CHRLOC PATH
>> > [1,] NA "NM_001101021;NP_001094491" "Rn.10120" "NA" NA
>> > [2,] "Xq14-q21" "NM_001081958;NP_001075427" "Rn.164274" "-40301218 at X" NA
>> > ENZYME PFAM PROSITE
>> > [1,] NA "NA at IPI00817057" "NA at IPI00817057"
>> > [2,] NA "PF01926 at IPI00362844" "NA at IPI00362844"
>> > ------------------------------------------------------------
>> > Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
>> > Indiana University School of Medicine
>> >
>> > 15032 Hunter Court, Westfield, IN 46074
>> >
>> > (317) 490-5129 Work, & Mobile & VoiceMail
>> > (317) 399-1219 Home
>> > Skype: mkimpel
>> >
>> > "The real problem is not whether machines think but whether men do." -- B.
>> > F. Skinner
>> > ******************************************************************
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at stat.math.ethz.ch
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> >
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
------------------------------
Message: 16
Date: Wed, 22 Oct 2008 12:05:25 -0400
From: "Mark Kimpel" <mwkimpel at gmail.com>
Subject: Re: [BioC] newbie problems with AnnBuilder
To: "Sean Davis" <sdavis2 at mail.nih.gov>
Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
Message-ID:
<6b93d1830810220905m46a4e41kfe052fd768d458e5 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Interesting: it works from a command line outside of R, but when I
tried R> system('R CMD INSTALL ragene10stv1.db') I received an error
message. I have used this approach successfully with other packages to
avoid leaving R and starting a shell, but with this package I get:
'* Installing to library '/home/mkimpel/R_HOME/site-library-2.8.0'
ERROR: no packages specified'
Well, thanks, it is now installed. Any comment on why my system()
approach might not have worked?
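As a rough illustration (not a diagnosis of the error above), one way to make
a system() install more robust is to hand it an expanded, quoted path; the
directory below is hypothetical:
pkgdir <- path.expand("~/pkgs/ragene10stv1.db")   # hypothetical location of the package source
system(paste("R CMD INSTALL", shQuote(pkgdir)))   # quote the path before passing it to the shell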
Mark
------------------------------------------------------------
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Home
Skype: mkimpel
"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
******************************************************************
On Wed, Oct 22, 2008 at 11:53 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> On Wed, Oct 22, 2008 at 11:07 AM, Mark Kimpel <mwkimpel at gmail.com> wrote:
>> Build with AnnotationDbi was uneventful, but I have been unable to
>> install the package or use it as is.
>>
>> If the package is just placed in my site-library, I get:
>> 'ragene10stv1.db' is not a valid package -- installed < 2.0.0?
>>
>> If I tar the package and try R CMD INSTALL, I get:
>> cannot extract package from 'ragene10stv1.db.tar.gz'
>>
>> What approach should I be using?
>
> You can just
>
> R CMD INSTALL ragene10stv1.db
>
> where ragene10stv1.db is the directory that contains the package
> (right above the DESCRIPTION file).
>
> Sean
>
>
>> On Tue, Oct 21, 2008 at 6:41 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
>>>
>>> Hi Mark,
>>>
>>> AnnBuilder is on its way out. Please have a look at the SQLforge.pdf
>>> vignette which can be found here:
>>>
>>> http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html
>>>
>>> If you have further questions after reading this, then please just ask,
>>> and we should be able to help you out.
>>>
>>>
>>> Marc
>>>
>>>
>>>
>>> Mark Kimpel wrote:
>>> > I'm having problems getting AnnBuilder to work with the Affy Rat Gene ST
>>> > array data supplied by Affy. Using the code chunk below, I am able to get
>>> > AnnBuilder to create the annotation object, but it crashes, I believe, when
>>> > it tries to save it. I should also mention that I had a previous crash when
>>> > I had a madecdfenv package directory in place that used the same name. I got
>>> > a "file lock" error, so I temporarily renamed the directory to see if this
>>> > fixed the problem. As below, the error changed, but I still can't get the
>>> > script to work.
>>> >
>>> > I suspect that there is a fundamental misunderstaning on my part related to
>>> > how the annotation package should relate to the cdf package or some naming
>>> > convention related to one or both.
>>> >
>>> > Mark
>>> >
>>> >
>>> >> require(AnnBuilder); require(makecdfenv)
>>> >> myBase <- "RaGene-1_0-st-v1.na26.rn4.transcript.probe-entrez_gene.csv"
>>> >> myBaseType <- "ll"
>>> >> mySrcUrls <- getSrcUrl("all", "Rattus norvegicus")
>>> >>
>>> >> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
>>> >>
>>> > + myBaseType, pkgName =
>>> > substring(cleancdfname("RaGene-1_0-st-v1"),
>>> > + 1, (nchar(cleancdfname("RaGene-1_0-st-v1")) - 3)),
>>> > + pkgPath = '~/R_HOME/site-library-2.8.0', organism = "Rattus
>>> > norvegicus", version = "1.0",
>>> > + author = list(authors = "Mark W Kimpel", maintainer =
>>> > + "mkimpel at iupui.edu"), fromWeb = TRUE)
>>> > Read 1 item
>>> > Read 1 item
>>> > [1] "4450 2 2"
>>> > Error in lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook) :
>>> > cannot open file
>>> > '~/R_HOME/site-library-2.8.0/ragene10stv1/data/Rdata.rdb': No such file or
>>> > directory
>>> >
>>> > Enter a frame number, or 0 to exit
>>> >
>>> > 1: ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
>>> > myBaseTy
>>> > 2: makeLLDB(file.path(pkgPath, pkgName), compress = TRUE)
>>> > 3: tools:::makeLazyLoadDB(dataEnv, dbbase, compress = compress)
>>> > 4: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
>>> > envhook)
>>> > 5: function (e)
>>> > 6: lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook)
>>> >
>>> > Selection: 1
>>> > Browse[1]> annotation[1:2,]
>>> > ENTREZID PROBE ACCNUM GO
>>> > [1,] "100008565" "10766774" NA NA
>>> > [2,] "100034253" "10937540" NA "GO:0005525 at IEA;GO:0005622 at IEA"
>>> > PMID SYMBOL
>>> > [1,] "16804107;17292978" "Slc39a4l"
>>> > [2,] "8889548" "Gnl3l"
>>> > GENENAME CHR
>>> > [1,] "solute carrier family 39 (zinc transporter), member 4-like" NA
>>> > [2,] "guanine nucleotide binding protein-like 3 (nucleolar)-like" "X"
>>> > MAP REFSEQ UNIGENE CHRLOC PATH
>>> > [1,] NA "NM_001101021;NP_001094491" "Rn.10120" "NA" NA
>>> > [2,] "Xq14-q21" "NM_001081958;NP_001075427" "Rn.164274" "-40301218 at X" NA
>>> > ENZYME PFAM PROSITE
>>> > [1,] NA "NA at IPI00817057" "NA at IPI00817057"
>>> > [2,] NA "PF01926 at IPI00362844" "NA at IPI00362844"
>>> > ------------------------------------------------------------
>>> > Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
>>> > Indiana University School of Medicine
>>> >
>>> > 15032 Hunter Court, Westfield, IN 46074
>>> >
>>> > (317) 490-5129 Work, & Mobile & VoiceMail
>>> > (317) 399-1219 Home
>>> > Skype: mkimpel
>>> >
>>> > "The real problem is not whether machines think but whether men do." -- B.
>>> > F. Skinner
>>> > ******************************************************************
>>> >
>>> > [[alternative HTML version deleted]]
>>> >
>>> > _______________________________________________
>>> > Bioconductor mailing list
>>> > Bioconductor at stat.math.ethz.ch
>>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> >
>>> >
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
------------------------------
Message: 17
Date: Wed, 22 Oct 2008 09:49:20 -0700
From: Robert Gentleman <rgentlem at fhcrc.org>
Subject: Re: [BioC] GOstats and org.EcK12.eg.db
To: Robert Castelo <robert.castelo at upf.edu>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF5990.20501 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Robert,
Yet another one that I will need to write some specialized code for.
I should get it done by the end of the week, and will push it to both
release and devel. I will post an email when it is done,
best wishes
Robert
Robert Castelo wrote:
> dear list,
>
> I cannot get to work GOstats with the annotation for E. coli in
> org.EcK12.eg.db. Please find below the code that reproduces the problem
> including the error message, and my sessionInfo() at the end of this
> email. I have included the same exercise with the human annotation
> package org.Hs.eg.db which runs fine in my system. Any help with this
> will be very much appreciated.
>
> thanks!!
> robert.
> ==========CODE STARTS HERE===========
>
> library(org.Hs.eg.db)
> library(org.EcK12.eg.db)
> library(GOstats)
>
> geneuniverse <- mappedkeys(org.EcK12.egSYMBOL)
> set.seed(12345)
> geneset <- sample(geneuniverse, size=100, replace=FALSE)
>
>
> goHypGparams <- new("GOHyperGParams",
> geneIds=geneset,
> universeGeneIds=geneuniverse,
> annotation="org.EcK12.eg.db", ontology="BP",
> pvalueCutoff=1.0, conditional=TRUE,
> testDirection="over")
>
> goHypGcond <- hyperGTest(goHypGparams)
>
> Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
> variable "org.EcK12.egENTREZID" was not found
> Error in mget(probes, ID2EntrezID(datPkg)) :
> error in evaluating the argument 'envir' in selecting a method for
> function 'mget'
>
> geneuniverse <- mappedkeys(org.Hs.egSYMBOL)
> set.seed(12345)
> geneset <- sample(geneuniverse, size=100, replace=FALSE)
>
>
> goHypGparams <- new("GOHyperGParams",
> geneIds=geneset,
> universeGeneIds=geneuniverse,
> annotation="org.Hs.eg.db", ontology="BP",
> pvalueCutoff=1.0, conditional=TRUE,
> testDirection="over")
>
> goHypGcond <- hyperGTest(goHypGparams)
>
>
> sessionInfo()
> R version 2.8.0 beta (2008-10-05 r46601)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines tools stats graphics grDevices utils
> datasets
> [8] methods base
>
> other attached packages:
> [1] org.Hs.eg.db_2.2.6 GOstats_2.7.0 Category_2.7.6
> [4] genefilter_1.21.5 survival_2.34-1 RBGL_1.17.2
> [7] annotate_1.19.2 xtable_1.5-4 GO.db_2.2.5
> [10] graph_1.19.6 org.EcK12.eg.db_2.2.6 AnnotationDbi_1.3.12
> [13] RSQLite_0.7-0 DBI_0.2-4 Biobase_2.1.7
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 GSEABase_1.3.6 XML_1.98-1
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
------------------------------
Message: 18
Date: Thu, 23 Oct 2008 01:14:09 +0800
From: Leon Yee <yee.leon at gmail.com>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
array-based CGH data
To: Sean Davis <sdavis2 at mail.nih.gov>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF5F61.5060906 at gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Sean Davis wrote:
>> Hi, Sean
>>
>> Thanks for your advice. However, I have still several questions:
>>
>> 1. The input of dlrs is the log ratios, the log ration extracted from the
>> text file produced by Feature Extraction? or calculated from RGlist -->
>> MAlist ? I have searched the mailist and seen a post of you mentioned the
>> difference of log ration from Feature Extraction and the default M value
>> from read.maimages.
>
> You can read the Agilent FE manual for more details, but you can
> probably use either and come to very similar conclusions. If you use
> the MAlist version, make sure to use only median centering or none for
> normalization.
>
>> 2. I can get the log ratios of all features including control type of -1
>> and 1, but these features don't have chromosome positions, does this mean I
>> don't need all of them for quality assessment?
>
> We have not routinely used these probes, no. If an array fails
> miserably, then these control probes might be useful for determining
> the reason for the failure, though.
>
>> 3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will not get a
>> proper mapping on the chromosome, so I should remove these values from the
>> input of dlrs. Is it so?
>
> You can either remove them or treat chr2_random as a separate chromosome.
>
>> 4. How could I handle those 1000 probes repeating 3 times? They will be
>> mapped on the same chromosome position by three per group.
>
> You could choose one at random or use a mean or median of them. My
> guess is that they agree very closely with one another so the choice
> should not affect the results much.
Hi, Sean
Thank you very much for your detailed reply and help.
Where can I find references or official documentation for the
dlrs method?
In addition, we have designed our array with dye swaps [test-cy3 vs
ref-cy5, and test-cy5 vs ref-cy3]. Is there any method for utilizing this
information for quality assessment?
Best wishes!
Leon
------------------------------
Message: 19
Date: Wed, 22 Oct 2008 17:13:10 +0000 (UTC)
From: Shinichiro Wachi <swachi at ucdavis.edu>
Subject: [BioC] Bioconductor installation problem: unable to access
repository
To: bioconductor at stat.math.ethz.ch
Message-ID: <loom.20081022T170334-485 at post.gmane.org>
Content-Type: text/plain; charset=us-ascii
I am installing Bioconductor on a new machine (an Intel Mac running OS X
10.5.5); the R version is 2.8.0.
sudo R is used for this session.
These are the error messages I get:
source("http://bioconductor.org/biocLite.R")
biocinstall()
Running biocinstall version 2.3.8 with R version 2.8.0 (under development)
Your version of R requires version 2.3 of Bioconductor.
Warning: unable to access index for repository
http://bioconductor.org/packages/2.3/bioc/bin/macosx//contrib/2.8
Warning: unable to access index for repository
http://bioconductor.org/packages/2.3/data/annotation/bin/macosx//contrib/2.8
Warning: unable to access index for repository
http://bioconductor.org/packages/2.3/data/experiment/bin/macosx//contrib/2.8
Warning: unable to access index for repository
http://bioconductor.org/packages/2.3/extra/bin/macosx//contrib/2.8
Warning: unable to access index for repository
http://cran.fhcrc.org/bin/macosx//contrib/2.8
Warning message:
package 'Biobase' is not available
http://bioconductor.org/packages/2.3/bioc/src/contrib/PACKAGES
will produce a web page listing the available packages. (I read a post that
asked for this; I am assuming it was a quick server diagnostic.)
Is there a problem with the server, or is this a temporary glitch in the
installer? Is there a workaround?
Much thanks.
Shin
------------------------------
Message: 20
Date: Wed, 22 Oct 2008 13:29:58 -0400
From: "Sean Davis" <sdavis2 at mail.nih.gov>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
array-based CGH data
To: "Leon Yee" <yee.leon at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
<264855a00810221029i18a08bc3nb1a34d786a6b30a6 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Wed, Oct 22, 2008 at 1:14 PM, Leon Yee <yee.leon at gmail.com> wrote:
> Sean Davis wrote:
>>>
>>> Hi, Sean
>>>
>>> Thanks for your advice. However, I have still several questions:
>>>
>>> 1. The input of dlrs is the log ratios, the log ration extracted from
>>> the
>>> text file produced by Feature Extraction? or calculated from RGlist -->
>>> MAlist ? I have searched the mailist and seen a post of you mentioned
>>> the
>>> difference of log ration from Feature Extraction and the default M value
>>> from read.maimages.
>>
>> You can read the Agilent FE manual for more details, but you can
>> probably use either and come to very similar conclusions. If you use
>> the MAlist version, make sure to use only median centering or none for
>> normalization.
>>
>>> 2. I can get the log ratios of all features including control type of -1
>>> and 1, but these features don't have chromosome positions, does this mean
>>> I
>>> don't need all of them for quality assessment?
>>
>> We have not routinely used these probes, no. If an array fails
>> miserably, then these control probes might be useful for determining
>> the reason for the failure, though.
>>
>>> 3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will not get
>>> a
>>> proper mapping on the chromosome, so I should remove these values from
>>> the
>>> input of dlrs. Is it so?
>>
>> You can either remove them or treat chr2_random as a separate chromosome.
>>
>>> 4. How could I handle those 1000 probes repeating 3 times? They will be
>>> mapped on the same chromosome position by three per group.
>>
>> You could choose one at random or use a mean or median of them. My
>> guess is that they agree very closely with one another so the choice
>> should not affect the results much.
>
> Hi, Sean
>
> Thank you very much for your detailed reply and help.
>
> Where can I get the references or official documentations about dlrs
> method?
It is a standard robust estimator of the variance and is not specific
to microarrays. If you look at the code, it simply takes the
differences between adjacent probes and then normalizes the result. If
the array is "noisy", the dlrs will be high. This assumes that the
contribution from large copy number changes is negligible, which is
likely true since even the most abnormal cancer samples have fewer
than 1000 breaks.
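As a rough sketch of a dlrs-style calculation along those lines (this is an
illustration, not the exact function; the per-chromosome split and the
IQR-based robust SD are assumptions):
dlrsLike <- function(logratio, chrom, pos) {
    ## adjacent-probe differences, taken within each chromosome in genomic order
    d <- unlist(lapply(split(seq_along(logratio), chrom), function(i) {
        i <- i[order(pos[i])]
        diff(logratio[i])
    }))
    ## robust SD of the differences, rescaled to a per-probe spread
    (IQR(d, na.rm = TRUE) / (2 * qnorm(0.75))) / sqrt(2)
}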
> In addition, we have design our array with dye-swap [test-cy3 vs ref-cy5,
> and test-cy5 vs ref-cy3]. Is there any method for utilizing the information
> here for quality assessment?
Not that I know of, but you could certainly look at correlations
between replicates, etc. Our experience with Agilent CGH arrays is
that the contribution due to dye bias is small compared to changes due
to copy number.
Sean
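One rough way to use that dye-swap information for QC is a correlation check
between the swapped hybridisations (assuming a limma MAList 'MA' with the two
swaps in columns 1 and 2; the sign flip accounts for the reversed dyes):
## values near 1 after flipping the sign of the swapped array suggest good agreement
cor(MA$M[, 1], -MA$M[, 2], use = "pairwise.complete.obs")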
------------------------------
Message: 21
Date: Wed, 22 Oct 2008 13:40:01 -0400
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Subject: Re: [BioC] GOstat: listing genes from hyperGTest
Cc: bioc <bioconductor at stat.math.ethz.ch>
Message-ID: <48FF6571.5090806 at med.umich.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Tim,
Yeah, probeSetSummary() is probably not what you want, if you are not
starting with an Affy chip. There are some gymnastics required to map
things back to the original Affy chip that you won't need to do. In
addition, if you are not using a conditional hypergeometric analysis, it
should be pretty simple to get what you want without even needing to
parse things out of the GOHyperGResult object. An example:
## fake up some data
> geneIds <- Lkeys(org.Hs.egGO)[sample(1:5000, 500)]
> univ <- Lkeys(org.Hs.egGO)
> param <- new("GOHyperGParams", geneIds = geneIds,
universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP")
> hyp <- hyperGTest(param)
> summary(hyp, categorySize=10)
      GOBPID      Pvalue OddsRatio   ExpCount Count Size                 Term
1 GO:0007338 0.002723500  29.25101 0.07808304     2   54 single fertilization
2 GO:0009566 0.002925855  28.16374 0.08097501     2   56        fertilization
So we have two terms of interest. Getting the Entrez Gene IDs from the
input set that map to these terms is easy:
> geneIds[geneIds %in% get("GO:0007338", revmap(org.Hs.egGO))]
[1] "100131137" "10007"
Now you might also want to know which 54 Entrez Gene IDs map to that
particular GO term. Since you are not conditioning, this includes that
particular GO term and all its offspring.
> offspring <- get("GO:0007338", GOBPOFFSPRING)
> egids <- unique(unlist(mget(c("GO:0007338", offspring),
revmap(org.Hs.egGO), ifnotfound=NA), use.names=FALSE))
> egids[!is.na(egids)]
[1] "1047" "4179" "4240" "4486" "4809" "5016"
[7] "6674" "7783" "7784" "7802" "7993" "8747"
[13] "8748" "8852" "9082" "10007" "10361" "22917"
[19] "26476" "53340" "57055" "57829" "64100" "93185"
[25] "158062" "442868" "100131137" "49" "410" "2683"
[31] "3010" "4184" "6677" "7142" "7455" "8857"
[37] "11055" "124626" "2054" "2741" "10343" "10566"
[43] "27297" "152015" "3074" "167" "928" "2515"
[49] "5104" "23553" "284359" "164684" "7141" "79400"
Best,
Jim
Tim Smith wrote:
> Thanks James. If I can tweak that function, I'll get exactly what I want.
>
> I tried what you suggested and got the following error:
>
> ---------------------------
> ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs
>
> paramsGO <- new("GOHyperGParams", geneIds = genes1,
> universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
> ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
> testDirection = "over")
>
> GO <- hyperGTest(paramsGO)
> ps <- probeSetSummary(GO)
>
> Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
> variable "org.Hs.egENTREZID" was not found
> --------------------------------
>
> I guess the function would return the probe ids if I was using them, but I have Entrez IDs as input.
>
> Or am I doing something wrong?
>
> thanks!
>
>
>
>
>
> ----- Original Message ----
> From: James W. MacDonald <jmacdon at med.umich.edu>
>
> Cc: bioc <bioconductor at stat.math.ethz.ch>
> Sent: Wednesday, October 22, 2008 9:10:39 AM
> Subject: Re: [BioC] GOstat: listing genes from hyperGTest
>
> Hi Tim,
>
> Does probeSetSummary() do what you want?
>
> Best,
>
> Jim
>
>
>
> Tim Smith wrote:
>>
>> Hi,
>>
>> I
>> was performing a hyperGTest for genes in homo-sapiens. For a set of
>> input genes, this function returns some 'significant' GO terms. What I
>> wanted to now do was to co-relate each significant GO term (returned by
>> this function) with genes (from my set of input genes) associated with
>> that GO term. However, I think that I may be using the wrong
>> package/function to get the releveant set of genes.
>>
>> Currently, what I'm doing is finding the significant GO terms by using the following code:
>>
>> -----------------------
>> ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs
>>
>> paramsGO <- new("GOHyperGParams", geneIds = genes1,
>> universeGeneIds = allGenes, annotation = "org.Hs.eg.db",
>> ontology = "BP", pvalueCutoff = 1, conditional = FALSE,
>> testDirection = "over")
>>
>> GO <- hyperGTest(paramsGO)
>> --------------------------
>> This
>> gives me a set of significant GO terms. Now, I would like to find which
>> subset of genes in 'genes1' is associated with each of the significant
>> GO term. To do this I map all GO terms to their Entrez IDs using the
>> 'org.Hs.eg.db' package using the following:
>>
>> xx <- as.list(org.Hs.egGO2EG)
>>
>> to
>> get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't
>> this number small?) that map to at least one Entrez ID. So, from here I
>> look up which Entrez IDs are associated with my GO term of interest.
>>
>> My
>> problem is that often, the GO term from hyperGTest is not associated
>> with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described
>> above ), i.e. the GO term/ID is not in the list obtained from
>> 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by
>> hyperGTest, but does not appear to be associated with any Entrez IDs in
>> the org.Hs.eg.db package. Where could I be going wrong?
>>
> [[elided Yahoo spam]]
>> Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here....
>>
>> My sessionInfo() is:
>> --------------------------------
>> R version 2.7.2 (2008-08-25)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United
>> States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] grid splines tools stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1]
>> gplots_2.6.0 gmodels_2.14.1 gtools_2.4.0
>> gdata_2.4.1 Rgraphviz_1.18.1 GOstats_2.6.0
>> Category_2.6.0
>> [8] RBGL_1.16.0 annotate_1.18.0
>> xtable_1.5-2 graph_1.18.0 PFAM.db_2.2.0
>> GO.db_2.2.0 KEGG.db_2.2.0
>> [15] org.Hs.eg.db_2.2.0 AnnotationDbi_1.2.0 RSQLite_0.6-8 DBI_0.2-4 genefilter_1.20.0 survival_2.34-1 affy_1.18.0
>> [22] preprocessCore_1.2.0 affyio_1.8.0 Biobase_2.0.0
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.11.11 MASS_7.2-44
>>
>>
>> ---------------------------------
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
------------------------------
Message: 22
Date: Wed, 22 Oct 2008 10:48:02 -0700
From: Patrick Aboyoun <paboyoun at fhcrc.org>
Subject: Re: [BioC] Bioconductor installation problem: unable to
access repository
To: Shinichiro Wachi <swachi at ucdavis.edu>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF6752.1020901 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Shin,
It appears that R 2.8 has changed the way it handles Mac OS X binary packages for users running Mac OS X 10.5 (Leopard). We have just become aware of this change and will be adjusting the Bioconductor Mac OS X repositories accordingly over the next few days. The good news is that R 2.8 supports binary packages for both Mac OS X 10.4 (Tiger) and Mac OS X 10.5 (Leopard). I'll send out an e-mail to this group when the Mac OS X 10.5 (Leopard) packages are available for BioC 2.3.
Patrick
Shinichiro Wachi wrote:
> I am installing Bioconductor on a new machine (Intel Mac running OSX 10.5.5), R
> version is 2.8.0.
>
> sudo R is used for this session.
>
> These are the error messages I get:
>
> source("http://bioconductor.org/biocLite.R")
> biocinstall()
>
> Running biocinstall version 2.3.8 with R version 2.8.0 (under development)
> Your version of R requires version 2.3 of Bioconductor.
> Warning: unable to access index for repository
> http://bioconductor.org/packages/2.3/bioc/bin/macosx//contrib/2.8
> Warning: unable to access index for repository
> http://bioconductor.org/packages/2.3/data/annotation/bin/macosx//contrib/2.8
> Warning: unable to access index for repository
> http://bioconductor.org/packages/2.3/data/experiment/bin/macosx//contrib/2.8
> Warning: unable to access index for repository
> http://bioconductor.org/packages/2.3/extra/bin/macosx//contrib/2.8
> Warning: unable to access index for repository
> http://cran.fhcrc.org/bin/macosx//contrib/2.8
> Warning message:
> package 'Biobase' is not available
>
> http://bioconductor.org/packages/2.3/bioc/src/contrib/PACKAGES
> will produce a web page displaying packages. (I read a post that asked for this.
> I am assuming this was a quick server diagnostics).
>
> Is there a problem with the server, or is this a temporary glitch in the
> installer? Is there a workaround?
>
> Much thanks.
>
> Shin
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
------------------------------
Message: 23
Date: Wed, 22 Oct 2008 13:36:30 -0700
From: Patrick Aboyoun <paboyoun at fhcrc.org>
Subject: [BioC] Bioconductor 2.3 is released
To: BioC list <bioconductor at stat.math.ethz.ch>
Message-ID: <48FF8ECE.4060201 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Bioconductors:
We are pleased to announce the release of Bioconductor 2.3. This release
includes 36 new software packages, and many changes to existing
packages. Bioconductor 2.3 is comprised of 294 software packages and is
compatible with the recently released R 2.8.0.
Please visit http://bioconductor.org for details and downloads.
IMPORTANT NOTE FOR MAC USERS: R 2.8.0 is using a new Mac OS X binary
package distribution system and the CRAN and BioC repositories need to
catch up with this change. If you are using Mac OS X, please refrain
from migrating to R-2.8.0 until these new binary package repositories
are put in place, or use 'type="source"' when installing packages using
biocLite.
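For example, a source-only install along the lines suggested above might look
like this (the package chosen is arbitrary):
source("http://bioconductor.org/biocLite.R")
biocLite("limma", type = "source")   # force a source build instead of a Mac binary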
Contents
========
o Release Highlight
o Getting Started with Bioconductor 2.3
o New Software Packages
o Software Packages in 2.2 that didn't make it to 2.3
Release Highlight
=================
This release contains a septet of packages (BSgenome, Biostrings,
ShortRead, IRanges, HilbertVis, HilbertVisGUI, and rtracklayer) that are
suited to analyze 'next generation' high-throughput DNA sequence data.
The BSgenome package provides the backbone for representing genome
sequences from many different organisms including human, mouse, rat,
dog, chimp, chicken, cow, fruit fly, honey bee, yeast, E. coli, C.
elegans, and arabidopsis. The Biostrings package performs fast or
flexible alignments between reads and genomes. The ShortRead package
provides tools for importation/exportation and quality assurance of
common data formats. The IRanges package offers an emerging
infrastructure for managing very large data objects and for range-based
data representation. The packages HilbertVis and HilbertVisGUI display
data with space-filling (Hilbert) curves that maintain the spatial
information implied by the linearity of chromosomes. The rtracklayer
package provides an interface to genome browsers and their annotation
tracks.
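As a small, hedged illustration of this stack (the genome package and
chromosome names below are assumptions used only for the example):
library(Biostrings)
library(BSgenome.Scerevisiae.UCSC.sacCer1)   # an example BSgenome data package
chrI <- Scerevisiae$chrI                     # chromosome I as a DNAString
countPattern("GATTACA", chrI)                # count exact matches of a short pattern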
Getting Started with Bioconductor 2.3
=====================================
IMPORTANT: MAC USERS: see the important note above.
To install Bioconductor 2.3
1. Install R 2.8.0. Bioconductor 2.3 has been designed expressly for
this version of R.
2. Follow the instructions here:
http://bioconductor.org/docs/install
Please visit http://bioconductor.org for details and downloads.
New Packages
============
The following packages are new in this release of Bioconductor; visit
http://bioconductor.org/packages/release/Software.html
for links to all package descriptions.
affyContam
Structured corruption of cel file data to demonstrate QA effectiveness
Agi4x44PreProcess
Preprocesses Agilent 4x44 array data
ArrayExpress
Accesses the ArrayExpress microarray database at EBI
arrayMvout
Analyzes AffyBatch instances
ArrayTools
Quality assessment and differential gene expression detection for
Affymetrix GeneChips
BicARE
Biclustering Analysis and Results Exploration
CGHbase
Base functions and classes for arrayCGH data analysis
CGHregions
Dimension reduction for arrayCGH data with minimal information loss
ChemmineR
Compound Data Mining Framework
CMA
Synthesis of microarray-based classification
DFP
Supervised technique for identifying differentially expressed genes
using Fuzzy Patterns (FPs).
domainsignatures
Finds significantly enriched gene classifications based on their
InterPro domain structure
dualKS
Training and classifying gene expression data sets using a
Kolmogorov-Smirnov rank-sum based algorithm
edgeR
Estimates and tests for differential expression in multiple digital
gene expression libraries
HELP
Pipeline for analyzing HELP microarray data that includes graphical and
mathematical tools
HilbertVis
Functions to visualize long vectors of integer data by means of Hilbert
curves
HilbertVisGUI
An interactive tool to visualize long vectors of integer data by means
of Hilbert curves
IRanges
Infrastructure for managing large data objects and range-based data
representations
ITALICS
Normalizes Affymetrix GeneChip Human Mapping 100K and 500K set
iterativeBMA
Bayesian Model Averaging (BMA) of classification models of 2-class
microarray samples
iterativeBMAsurv
Uses Bayesian Model Averaging (BMA) of survival analysis models of
microarray data
KCsmart
Multi-sample aCGH analysis package using kernel convolution
logitT
Implements the Logit-t algorithm
LPEadj
Extends the LPE algorithm
MEDME
Determines absolute and relative DNA methylation scores from MeDIP
enrichment measurements
miRNApath
Provides pathway enrichment techniques for miRNA expression data
microRNA
Accesses different data resources for microRNAs
minet
Implements methods for inferring mutual information networks from data.
multiscan
Estimates gene expressions from several laser scans of the same microarray
parody
Provides routines for univariate and multivariate outlier detection
PLPE
Performs tests for paired high-throughput data
RNAither
Analyzes cell-based RNAi screens
RpsiXML
Queries, data structure and interface to visualization of interaction
datasets
SIM
Finds associations between DNA copy number and gene expression
ShortRead
Representation of high-throughput, short-read sequencing data
xmapbridge
Plots graphs in the X:Map genome browser
Software Packages in 2.2 that didn't make it to 2.3
===================================================
1. SemSim
2. widgetInvoke
[[elided Yahoo spam]]
The Biocore Team
------------------------------
Message: 24
Date: Wed, 22 Oct 2008 16:04:53 -0500
From: Jenny Drnevich <drnevich at illinois.edu>
Subject: Re: [BioC] How to save result from limma
To: Gordon K Smyth <smyth at wehi.EDU.AU>
Cc: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Message-ID: <200810222104.m9ML4ss3012169 at expredir5.cites.uiuc.edu>
Content-Type: text/plain; charset="us-ascii"; format=flowed
Hi Gordon,
I just downloaded the new R 2.8.0 release and limma 2.16.0. I was
checking out the new sort.by="none" option in topTable and I found an
error in the help page for ?topTable. It doesn't list sort.by="none"
as a possibility for topTable or toptable, but does list it for
topTableF. However, it's actually the reverse: sort.by="none" works
when using topTable with only 1 coef but not with more than 1 coef.
Just thought I'd let you and the archives know...
Thanks again!
Jenny
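For the archives, a minimal sketch of the call in question (assuming 'fit' is
an eBayes-fitted MArrayLM object; as noted above, behaviour currently differs
between the single- and multi-coef cases):
library(limma)
## keep genes in their original row order instead of sorting by evidence
tab <- topTable(fit, coef = 1, number = nrow(fit), sort.by = "none")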
At 07:15 PM 8/19/2008, Gordon K Smyth wrote:
>On Tue, 19 Aug 2008, Jenny Drnevich wrote:
>
>>At 06:14 PM 8/13/2008, Gordon K Smyth wrote:
>>>OK, I've added sort="none" to the possibilities.
>>>Best wishes
>>>Gordon
>>
>>Hi Gordon,
>>
>>Should this change to topTable be up on BioC by now? I just updated
>>my packages on R 2.7.1 and the latest limma_2.14.5 does not have
>>it. Neither does the developmental version limma_2.15.10 on R 2.8.0
>>dev. Usually your changes appear very quickly...
>>
[[elided Yahoo spam]]
>>Jenny
>
>Not committed to BioC yet. I'm getting older and slower. Also,
>there will be a number of code additions in my next commit to BioC,
>which I'm still finalising.
>
>Best wishes
>Gordon
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu
------------------------------
Message: 25
Date: Wed, 22 Oct 2008 17:39:24 -0400
From: "Hui-Yi Chu" <huiyi.chu at gmail.com>
Subject: [BioC] scale questions
To: bioconductor at stat.math.ethz.ch
Message-ID:
<aaeddd3f0810221439r19fe9f56rc06ffc812fa8405e at mail.gmail.com>
Content-Type: text/plain
Dear List,
I think this may be a simple question for you, but I want to make sure before
moving on to further steps.
I have already done some data pre-processing for my Affymetrix yeast2 arrays.
My next step is to get *ratios* from various conditions in wt and mutant,
followed by a fold-change comparison. So my question is: at which step should
I scale my data frame for the comparison?
Here are parts of my code (the lines marked '### not sure' are the questions):
wt.pt.f1 <- exprs(esetsub[, 1])- exprs(esetsub[, 17])
wt.pt.f2 <- exprs(esetsub[, 2])- exprs(esetsub[, 18])
wt.pt.f <- cbind(wt.pt.f1, wt.pt.f2)
wt.pt.f <- new("ExpressionSet", exprs= as.matrix(wt.pt.f))
wt.pt.f <- scale(exprs(wt.pt.f)) ### not sure
mut.pt.f1 <- exprs(esetsub[, 9])- exprs(esetsub[, 21])
mut.pt.f2 <- exprs(esetsub[, 10])- exprs(esetsub[, 22])
mut.pt.f <- cbind(mut.pt.f1, mut.pt.f2)
mut.pt.f <- new("ExpressionSet", exprs= as.matrix(mut.pt.f))
mut.pt.f <- scale(exprs(mut.pt.f)) ### not sure
gg <- cbind(wt.pt.f, mut.pt.f)
gg <- scale(gg) ### not sure
pt.f1 <- gg[,3]-gg[,1]
pt.f2 <- gg[,4]-gg[,2]
gg1 <- cbind(gg, pt.f1, pt.f2)
gg2 <- new("ExpressionSet", exprs=as.matrix(gg))
....... followed by further steps
As seen above, where should I scale the ratios? Scale wt and mut separately
or together before getting pt.f1 and pt.f2 ratios??
[[elided Yahoo spam]]
Hui-Yi
[[alternative HTML version deleted]]
------------------------------
Message: 26
Date: Wed, 22 Oct 2008 15:09:08 -0700
From: Florian Hahne <fhahne at fhcrc.org>
Subject: Re: [BioC] [Fwd: batch info for cellHTS]
To: Yan Zhou <Yan.Zhou at fccc.edu>
Cc: Bioconductor_help <bioconductor at stat.math.ethz.ch>
Message-ID: <48FFA484.7050405 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Yan,
The idea was that the user constructs the batch array manually and
assigns it to the batch slot using the accessor method, e.g.
bt <- array(rep(1:2, each=5), dim=c(5,2,1))
batch(x) <- bt
I agree that it would be useful to have that functionality directly in
the import functions. The only natural place where the batch information
could go is in the platelist file. In the plateconf file, we don't have
the notion of plate replicates or samples any more.
Attached you will find a slightly modified readPlateList function which
evaluates an (optional) "Batch" column in the platelist file.
Hope that is the functionality you were looking for. Please let me know
if you have further input.
Thanks,
Florian
PS: I forwarded this conversation to the mailing list. Others might
benefit from it...
readPlateList <- function(filename,
path=dirname(filename),
name,
importFun,
verbose=interactive())
{
file <- basename(filename)
dfiles <- dir(path)
if(!(is.character(path)&&length(path)==1))
stop("'path' must be character of length 1")
pd <- read.table(file.path(path, file), sep="\t", header=TRUE,
as.is=TRUE)
checkColumns(pd, file, mandatory=c("Filename", "Plate", "Replicate"),
numeric=c("Plate", "Replicate", "Channel", "Batch"))
## consistency check for "importFun"
if (!missing(importFun)) {
if (!is(importFun, "function"))
stop("'importFun' should be a function to use to read the
raw data files")
} else {
## default function (compatible with the file format of the
plate reader)
importFun <- function(f) {
txt <- readLines(f, warn=FALSE)
sp <- strsplit(txt, "\t")
well <- sapply(sp, "[", 2)
val <- sapply(sp, "[", 3)
out <- list(data.frame(well=I(well), val=as.numeric(val)),
txt=I(txt))
return(out)
}
}
## check if the data files are in the given directory
a <- unlist(sapply(pd$Filename, function(z) grep(z, dfiles,
ignore.case=TRUE)))
if (length(a)==0)
stop(sprintf("None of the files were found in the given 'path':
%s", path))
f <- file.path(path, dfiles[a])
## check if 'importFun' gives the output in the desired form
aux <- importFun(f[1])
if (which(unlist(lapply(aux, is, "data.frame"))) != 1 |
!all(c("val", "well") %in% names(aux[[1]])) | length(aux)!=2)
stop("The output of 'importFun' must be a list with 2
components;\n",
"the first component should be a 'data.frame' with slots
'well' and 'val'.")
## auto-determine the plate format
well <- as.character(importFun(f[1])[[1]]$well)
let <- substr(well, 1, 1)
num <- substr(well, 2, 3)
let <- match(let, LETTERS)
num <- as.integer(num)
if(any(is.na(let))||any(is.na(num)))
stop(sprintf("Malformated column 'well' in input file %s", f[1]))
dimPlate <- c(nrow=max(let), ncol=max(num))
nrWell <- prod(dimPlate)
if(verbose)
cat(sprintf("%s: found data in %d x %d (%d well) format.\n", name,
dimPlate[1], dimPlate[2], nrWell))
## Should we check whether these are true?
## "96" = c(nrow=8, ncol=12),
## "384" = c(nrow=16, ncol=24),
nrRep <- max(pd$Replicate)
nrPlate <- max(pd$Plate)
combo <- paste(pd$Plate, pd$Replicate)
## Channel: if not given, this implies that there is just one
if("Channel" %in% colnames(pd)) {
nrChannel <- max(pd$Channel)
channel <- pd$Channel
combo <- paste(combo, pd$Channel)
} else {
nrChannel <- 1L
channel <- rep(1L, nrow(pd))
pd$Channel <- channel
}
whDup <- which(duplicated(combo))
if(length(whDup)>0L) {
idx <- whDup[1:min(5L, length(whDup))]
msg <- paste("The following rows are duplicated in the plateList
table:\n",
"\tPlate Replicate Channel\n", "\t",
paste(idx, combo[idx], sep="\t", collapse="\n\t"),
if(length(whDup)>5) sprintf("\n\t...and %d more.\n",
length(whDup)-5), "\n",
sep="")
stop(msg)
}
xraw <- array(NA_real_, dim=c(nrWell, nrPlate, nrRep, nrChannel))
intensityFiles <- vector(mode="list", length=nrow(pd))
names(intensityFiles) <- pd[, "Filename"]
status <- character(nrow(pd))
for(i in seq_len(nrow(pd))) {
if(verbose)
cat("\rReading ", i, ": ", pd$Filename[i], sep="")
ff <- grep(pd[i, "Filename"], dfiles, ignore.case=TRUE)
if (length(ff)!=1) {
f <- file.path(path, pd[i, "Filename"])
status[i] <- sprintf("File not found: %s", f)
} else {
f <- file.path(path, dfiles[ff])
names(intensityFiles)[i] <- dfiles[ff]
status[i] <- tryCatch({
out <- importFun(f)
pos <- convertWellCoordinates(out[[1]]$well, dimPlate)$num
intensityFiles[[i]] <- out[[2]]
xraw[pos, pd$Plate[i], pd$Replicate[i], channel[i]] <-
out[[1]]$val
"OK"
}, warning=function(e) paste(class(e)[1], e$message, sep=": "),
error=function(e) paste(class(e)[1],
e$message, sep=": ")
) ## tryCatch
} ## else
} ## for
if(verbose)
cat("\rRead", nrow(pd), "plates. \n\n")
## ---- Store the data as a "cellHTS" object ----
## arrange the assayData slot:
dat <- lapply(seq_len(nrChannel), function(ch)
matrix(xraw[,,,ch], ncol=nrRep, nrow=nrWell*nrPlate))
names(dat) <- paste("ch", seq_len(nrChannel), sep="")
adata <- do.call("assayDataNew", c(storage.mode="lockedEnvironment",
dat))
## arrange the phenoData slot:
pdata <- new("AnnotatedDataFrame",
data <- data.frame(replicate=seq_len(nrRep),
assay=rep(name, nrRep),
stringsAsFactors=FALSE),
varMetadata=data.frame(labelDescription=c("Replicate
number",
"Biological
assay"),
channel=factor(rep("_ALL_", 2L),
levels=c(names(dat), "_ALL_")),
row.names=c("replicate", "assay"),
stringsAsFactors=FALSE))
## arrange the featureData slot:
well <- convertWellCoordinates(seq_len(nrWell), pdim=dimPlate)$letnum
fdata <- new("AnnotatedDataFrame",
data <- data.frame(plate=rep(seq_len(nrPlate),
each=nrWell),
well=rep(well, nrPlate),
controlStatus=factor(rep("unknown",
nrWell*nrPlate)),
stringsAsFactors=FALSE),
varMetadata=data.frame(labelDescription=c("Plate
number", "Well ID",
"Well
annotation"),
row.names=c("plate", "well",
"controlStatus"),
stringsAsFactors=FALSE))
res <- new("cellHTS",
assayData=adata,
phenoData=pdata,
featureData=fdata,
plateList=cbind(pd[,1L,drop=FALSE], status=I(status),
pd[,-1L,drop=FALSE]),
intensityFiles=intensityFiles)
## if there is a batch column in the platelist file we want to import it
if("Batch" %in% colnames(pd)){
bat <- pd$Batch[order(pd$Replicate, pd$Channel)]
dim(bat) <- c(max(plate(res)), ncol(res), length(channelNames(res)))
res at batch <- bat
}
## output the possible errors that were encountered along the way:
whHadProbs <- which(status!="OK")
if(length(whHadProbs)>0 & verbose) {
idx <- whHadProbs[1:min(5, length(whHadProbs))]
msg <- paste("Please check the following problems encountered
while reading the data:\n",
"\tFilename \t Error\n", "\t",
paste(plateList(res)$Filename[idx], status[idx],
sep="\t", collapse="\n\t"),
if(length(whHadProbs)>5) sprintf("\n\t...and %d
more.\n",
length(whHadProbs)-5), "\n", sep="")
stop(msg)
}
return(res)
}
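A hypothetical call to the modified function above (file and screen names are
made up); with a numeric "Batch" column in the platelist file, the batch slot
should be filled on import:
x <- readPlateList("Platelist.txt", name = "MyScreen", path = "KcViabData")
batch(x)   # batch array populated from the "Batch" column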
Yan Zhou wrote:
> Dear Florian,
>
> I understand the different meaning of batch from the 2 cellHTS
> packages. I just don't know how to add the "batch" slot.(we have a
> large screen with screen done on different day,we want to use the
> variance adjustment by batch function). I tried to add a "batch"
> column in the plate configuration file, but doesn't seem to be taken
> into cellHTS2 object. then I tried to add "batch" column in the plate
> list file , still didn't do anything. In another word, I couldn't
> figure out how to add the batch slot to the cellHTS object. please
[[elided Yahoo spam]]
>
> yan
>
> Florian Hahne wrote:
>
>> Hi Yan,
>> Ligia forwarded your mail to me.
>>
>> The batch concept is a little bit different in cellHTS2. Basically,
>> we separated the ability to included several plate configurations and
>> the batch-specific parameter estimation (e.g. in the normalizePlate
>> function). For the former, you can use a regular expression syntax in
>> your plate configuration file, while the latter has to be added
>> manually into the new 'batch' slot. All of this is explained in much
>> more detail in the package vignette: cellHTS2 - Main vignette
>> (complete version): End-to-end analysis of cell-based screens
>>
>> Let me know if this doesn't get you anywhere.
>>
>> Florian
>>
>> ligia at ebi.ac.uk wrote:
>>
>>> Hi Florian,
>>>
>>> Hope you're doing fine.
>>> Could you take care of this for me?
>>> Cheers,
>>> Ligia
>>>
>>>
>>> ------------------------- Original Message ----------------------------
>>> Subject: batch info for cellHTS
>>> From: "Yan Zhou" <Yan.Zhou at fccc.edu>
>>> Date: Fri, October 17, 2008 19:14
>>> To: ligia at ebi.ac.uk
>>> --------------------------------------------------------------------------
>>>
>>>
>>> Dear Ligia,
>>>
>>> I'm using the cellHTS2 package for HTS data analysis. For the old
>>> cellHTS package, I knew how to incorporate the batch information. But
>>> for the new cellHTS2 package, I couldn't have it done right. I attached
>>> my plate configuration file and also plate list file with this email. I
>>> was wondering whether you would have time to help me take a look, and
>>> point me to the right directions. Thanks a lot for your time and
>>> kind help.
>>>
>>> yan
>>>
>>
>>
>>
>
--
Florian Hahne, PhD
Computational Biology Program
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-3148
fhahne at fhcrc.org
------------------------------
Message: 27
Date: Wed, 22 Oct 2008 19:19:26 -0400
From: "Mark Kimpel" <mwkimpel at gmail.com>
Subject: [BioC] problem with Category package and custom annotationDbi
To: "Bioconductor Newsgroup" <bioconductor at stat.math.ethz.ch>
Message-ID:
<6b93d1830810221619x7ad5a565hf08bf287839f74f0 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Today I believe I successfully built an annotation package for my Affy
Rat Gene ST data using AnnotationDbi; at least I got no errors during
the build, and it loads properly. I get the following error output,
however, when I try to run hyperGTest (package Category) on a vector
of Entrez Gene IDs and a vector of the gene universe of the chip. I
suspect I did something wrong when building the annotation package,
but I have no clue what that could be. I've used this same code with
chipsets whose annotation packages were built by the Bioconductor team
without issue.
> params <- new("GOHyperGParams", geneIds = myEGs,
+ universeGeneIds = myGeneUniverse,
+ annotation = annotation(AOP$eSet),
+ ontology = "BP", pvalueCutoff = 0.05, conditional
= TRUE, testDirection = "over")
> params
A GOHyperGParams instance
category: GO
annotation: ragene10stv1
> hyperGTest(params)
Error in getUniverseHelper(probes, datPkg, entrezIds) :
No Entrez Gene ids left in universe
Enter a frame number, or 0 to exit
1: hyperGTest(params)
2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", "hyperGT
3: is(object, Cl)
4: is(object, Cl)
5: universeBuilder(p)
6: universeBuilder(p)
7: getUniverseViaGo(p)
8: getUniverseHelper(probes, datPkg, entrezIds)
Selection: 8
Called from: eval(expr, envir, enclos)
Browse[1]> ls()
[1] "datPkg" "entrezIds" "probes" "univ"
Browse[1]> datPkg
An object of class "ArabadopsisDatPkg"
Slot "name":
[1] "ragene10stv1"
Browse[1]> entrezIds[1:5]
[1] "65049" "60444" "313914" "140941" "306868"
Browse[1]> probes[1:5]
[1] "10701636" "10701643" "10701654" "10701663" "10701679"
Browse[1]> univ
character(0)
Browse[1]>
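As a hedged diagnostic sketch (assuming the package loads as ragene10stv1.db;
this is not a fix), one could check whether the universe IDs are recognised as
Entrez Gene IDs or are in fact probe IDs:
library(ragene10stv1.db)
eg <- unlist(as.list(ragene10stv1ENTREZID))   # probe ID -> Entrez Gene ID map
sum(myGeneUniverse %in% eg)                   # universe entries that are known Entrez IDs
sum(myGeneUniverse %in% names(eg))            # universe entries that are probe IDs instead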
------------------------------------------------------------
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Home
Skype: mkimpel
"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
------------------------------
Message: 28
Date: Wed, 22 Oct 2008 16:41:38 -0700
From: Marc Carlson <mcarlson at fhcrc.org>
Subject: Re: [BioC] problem with Category package and custom
annotationDbi
To: Mark Kimpel <mwkimpel at gmail.com>
Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
Message-ID: <48FFBA32.3030406 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1
Hi Mark,
If the package you built and installed were called "MarkPackage.db",
then what would be the output of "MarkPackage()" (right after you loaded
it)? Also, what is the output of sessionInfo()?
Marc
Mark Kimpel wrote:
> Today I believe I successfuly built a annotation package for my Affy
> Rat Gene ST data using annotationDbi, at least I got no errors during
> the build and it loads properly. I get the following error output,
> however, when I try to run hyperGTest, package Category, on a vector
> of Entrez Gene IDs and a vector of the gene universe of the chip. I
> suspect I did something wrong when building the annotation package,
> but I have no clue what that could be. I've used this same code with
> chipsets whose annotation packages are built by the BioConductor team
> without issue.
>
>
>> params <- new("GOHyperGParams", geneIds = myEGs,
>>
> + universeGeneIds = myGeneUniverse,
> + annotation = annotation(AOP$eSet),
> + ontology = "BP", pvalueCutoff = 0.05, conditional
> = TRUE, testDirection = "over")
>
>> params
>>
> A GOHyperGParams instance
> category: GO
> annotation: ragene10stv1
>
>> hyperGTest(params)
>>
> Error in getUniverseHelper(probes, datPkg, entrezIds) :
> No Entrez Gene ids left in universe
>
> Enter a frame number, or 0 to exit
>
> 1: hyperGTest(params)
> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", "hyperGT
> 3: is(object, Cl)
> 4: is(object, Cl)
> 5: universeBuilder(p)
> 6: universeBuilder(p)
> 7: getUniverseViaGo(p)
> 8: getUniverseHelper(probes, datPkg, entrezIds)
>
> Selection: 8
> Called from: eval(expr, envir, enclos)
> Browse[1]> ls()
> [1] "datPkg" "entrezIds" "probes" "univ"
> Browse[1]> datPkg
> An object of class "ArabadopsisDatPkg"
> Slot "name":
> [1] "ragene10stv1"
>
> Browse[1]> entrezIds[1:5]
> [1] "65049" "60444" "313914" "140941" "306868"
> Browse[1]> probes[1:5]
> [1] "10701636" "10701643" "10701654" "10701663" "10701679"
> Browse[1]> univ
> character(0)
> Browse[1]>
>
> ------------------------------------------------------------
> Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
> Indiana University School of Medicine
>
> 15032 Hunter Court, Westfield, IN 46074
>
> (317) 490-5129 Work, & Mobile & VoiceMail
> (317) 399-1219 Home
> Skype: mkimpel
>
> "The real problem is not whether machines think but whether men do."
> -- B. F. Skinner
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
------------------------------
Message: 29
Date: Wed, 22 Oct 2008 19:51:17 -0400
From: "Sean Davis" <sdavis2 at mail.nih.gov>
Subject: Re: [BioC] scale questions
To: "Hui-Yi Chu" <huiyi.chu at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
<264855a00810221651n656b5a8ak226daa013f282fab at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Wed, Oct 22, 2008 at 5:39 PM, Hui-Yi Chu <huiyi.chu at gmail.com> wrote:
> Dear List,
>
> I think this may be a simple question for you but I wanna make it sure for
> further steps.
> I have already done some of data pre-processing procedures for my affymetrix
> yeast2 arrays. My next step is to get *ratios* from various conditions in wt
> and mutant following by fold-change comparison. So my question is which step
> I should scale my dataframe for comparison?
>
> Here are parts of my codes (codes with underline are the questions):
>
> wt.pt.f1 <- exprs(esetsub[, 1])- exprs(esetsub[, 17])
> wt.pt.f2 <- exprs(esetsub[, 2])- exprs(esetsub[, 18])
> wt.pt.f <- cbind(wt.pt.f1, wt.pt.f2)
> wt.pt.f <- new("ExpressionSet", exprs= as.matrix(wt.pt.f))
> *wt.pt.f <- scale(exprs(wt.pt.f)) ### not sure
>
> *mut.pt.f1 <- exprs(esetsub[, 9])- exprs(esetsub[, 21])
> mut.pt.f2 <- exprs(esetsub[, 10])- exprs(esetsub[, 22])
> mut.pt.f <- cbind(mut.pt.f1, mut.pt.f2)
> mut.pt.f <- new("ExpressionSet", exprs= as.matrix(mut.pt.f))
> *mut.pt.f <- scale(exprs(mut.pt.f)) **### not sure*
>
> gg <- cbind(wt.pt.f, mut.pt.f)
> *gg <- scale(gg)* *### not sure*
> pt.f1 <- gg[,3]-gg[,1]
> pt.f2 <- gg[,4]-gg[,2]
> gg1 <- cbind(gg, pt.f1, pt.f2)
> gg2 <- new("ExpressionSet", exprs=as.matrix(gg))
> ....... followed by further steps
>
>
> As seen above, where should I scale the ratios? Scale wt and mut separately
> or together before getting pt.f1 and pt.f2 ratios??
[[elided Yahoo spam]]
I may be misunderstanding, but why do you want to scale the log
ratios? Generally, you would not scale them at all. And I really
cannot tell why you are forming ratios in the first place? Do you
have replicates of any kind?
Sean
------------------------------
Message: 30
Date: Wed, 22 Oct 2008 20:31:33 -0400
From: "Sean Davis" <sdavis2 at mail.nih.gov>
Subject: Re: [BioC] scale questions
To: "Hui-Yi Chu" <huiyi.chu at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
<264855a00810221731p1abc243cj1a59dc6726ebc7e0 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Wed, Oct 22, 2008 at 8:16 PM, Hui-Yi Chu <huiyi.chu at gmail.com> wrote:
> Hi Sean,
>
> Yes, they are replicate in ratios. In other words, these ratios are from
> untreatment and treatment of wt1, wt2, mut1,mut2, so total is 8 ratios as
> below.
>
> untreatmen wt1, wt2, mut1, mut2
> treatment wt1, wt2, mut1, mut2
>
> And I wanna get the second values like:
> r1: untreatment mut1/wt1
> r2: untreatment mut2/wt2
> r3: treatment mut1/wt1
> r4: treatment mut2/wt2
>
> Now we have 4 ratios. And then I will compare r3 vs r1, r4 vs r2 to get most
> fold changes genes. (I know I need three replicates, but I cannot convince
> my adviser, therefore, this is the strategy I can use so far. ) So the
> question is should I scale the 8 ratios from wt and mut separately (twice)
> or together(once) before I get the four values?
Hi, Hui-Yi.
You should not do any scaling. You can use the log fold changes
directly. You seem to understand that this is not an optimal design,
so I won't belabor that point.
Sean
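A minimal sketch of that unscaled comparison, reusing the pt.f1/pt.f2
differences from the code earlier in the thread (whether those match the
intended r3 vs r1 and r4 vs r2 contrasts is an assumption):
## on log2-scale data these differences are already log fold changes,
## so no scale() call is needed before comparing them
avg.lfc <- (pt.f1 + pt.f2) / 2
head(sort(abs(avg.lfc), decreasing = TRUE))   # genes with the largest average fold change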
>
> On Wed, Oct 22, 2008 at 7:51 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>
>> On Wed, Oct 22, 2008 at 5:39 PM, Hui-Yi Chu <huiyi.chu at gmail.com> wrote:
>> > Dear List,
>> >
>> > I think this may be a simple question for you but I wanna make it sure
>> > for
>> > further steps.
>> > I have already done some of data pre-processing procedures for my
>> > affymetrix
>> > yeast2 arrays. My next step is to get *ratios* from various conditions
>> > in wt
>> > and mutant following by fold-change comparison. So my question is which
>> > step
>> > I should scale my dataframe for comparison?
>> >
>> > Here are parts of my codes (codes with underline are the questions):
>> >
>> > wt.pt.f1 <- exprs(esetsub[, 1])- exprs(esetsub[, 17])
>> > wt.pt.f2 <- exprs(esetsub[, 2])- exprs(esetsub[, 18])
>> > wt.pt.f <- cbind(wt.pt.f1, wt.pt.f2)
>> > wt.pt.f <- new("ExpressionSet", exprs= as.matrix(wt.pt.f))
>> > *wt.pt.f <- scale(exprs(wt.pt.f)) ### not sure
>> >
>> > *mut.pt.f1 <- exprs(esetsub[, 9])- exprs(esetsub[, 21])
>> > mut.pt.f2 <- exprs(esetsub[, 10])- exprs(esetsub[, 22])
>> > mut.pt.f <- cbind(mut.pt.f1, mut.pt.f2)
>> > mut.pt.f <- new("ExpressionSet", exprs= as.matrix(mut.pt.f))
>> > *mut.pt.f <- scale(exprs(mut.pt.f)) **### not sure*
>> >
>> > gg <- cbind(wt.pt.f, mut.pt.f)
>> > *gg <- scale(gg)* *### not sure*
>> > pt.f1 <- gg[,3]-gg[,1]
>> > pt.f2 <- gg[,4]-gg[,2]
>> > gg1 <- cbind(gg, pt.f1, pt.f2)
>> > gg2 <- new("ExpressionSet", exprs=as.matrix(gg))
>> > ....... followed by further steps
>> >
>> >
>> > As seen above, where should I scale the ratios? Scale wt and mut
>> > separately
>> > or together before getting pt.f1 and pt.f2 ratios??
[[elided Yahoo spam]]
>>
>> I may be misunderstanding, but why do you want to scale the log
>> ratios? Generally, you would not scale them at all. And I really
>> cannot tell why you are forming ratios in the first place? Do you
>> have replicates of any kind?
>>
>> Sean
>
>
------------------------------
Message: 31
Date: Wed, 22 Oct 2008 21:26:16 -0400
From: "Mark Kimpel" <mwkimpel at gmail.com>
Subject: Re: [BioC] problem with Category package and custom
annotationDbi
To: "Marc Carlson" <mcarlson at fhcrc.org>
Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
Message-ID:
<6b93d1830810221826y71050c5ckb14dbca4644c9bf7 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
> ragene10stv1()
Quality control information for ragene10stv1:
This package has the following mappings:
ragene10stv1ACCNUM has 0 mapped keys (of 29214 keys)
ragene10stv1ALIAS2PROBE has 31410 mapped keys (of 31410 keys)
ragene10stv1CHR has 20998 mapped keys (of 29214 keys)
ragene10stv1CHRLENGTHS has 23 mapped keys (of 23 keys)
ragene10stv1CHRLOC has 13092 mapped keys (of 29214 keys)
ragene10stv1CHRLOCEND has 13092 mapped keys (of 29214 keys)
ragene10stv1ENSEMBL has 15812 mapped keys (of 29214 keys)
ragene10stv1ENSEMBL2PROBE has 14570 mapped keys (of 14570 keys)
ragene10stv1ENTREZID has 21170 mapped keys (of 29214 keys)
ragene10stv1ENZYME has 1586 mapped keys (of 29214 keys)
ragene10stv1ENZYME2PROBE has 705 mapped keys (of 705 keys)
ragene10stv1GENENAME has 21170 mapped keys (of 29214 keys)
ragene10stv1GO has 13813 mapped keys (of 29214 keys)
ragene10stv1GO2ALLPROBES has 9850 mapped keys (of 9850 keys)
ragene10stv1GO2PROBE has 7430 mapped keys (of 7430 keys)
ragene10stv1MAP has 20351 mapped keys (of 29214 keys)
ragene10stv1PATH has 4357 mapped keys (of 29214 keys)
ragene10stv1PATH2PROBE has 206 mapped keys (of 206 keys)
ragene10stv1PFAM has 17187 mapped keys (of 29214 keys)
ragene10stv1PMID has 12198 mapped keys (of 29214 keys)
ragene10stv1PMID2PROBE has 36641 mapped keys (of 36641 keys)
ragene10stv1PROSITE has 17187 mapped keys (of 29214 keys)
ragene10stv1REFSEQ has 18683 mapped keys (of 29214 keys)
ragene10stv1SYMBOL has 21170 mapped keys (of 29214 keys)
ragene10stv1UNIGENE has 17585 mapped keys (of 29214 keys)
ragene10stv1UNIPROT has 9826 mapped keys (of 29214 keys)
Additional Information about this package:
DB schema: RATCHIP_DB
DB schema version: 1.0
Organism: Rattus norvegicus
Date for NCBI data: 2008-Sep2
Date for GO data: 200808
Date for KEGG data: 2008-Sep2
Date for Golden Path data: 2006-Jun20
Date for IPI data: 2008-Sep02
Date for Ensembl data: 2008-Jul23
> sessionInfo()
R version 2.8.0 (2008-10-20)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] ragene10stv1.db_1.0.0 RSQLite_0.7-0 DBI_0.2-4
[4] AnnotationDbi_1.4.0 Biobase_2.2.0 graph_1.20.0
loaded via a namespace (and not attached):
[1] cluster_1.11.11
>
------------------------------------------------------------
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219 Home
Skype: mkimpel
"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
******************************************************************
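One quick check relevant to the empty universe is whether the probe set IDs
on the chip map to Entrez Gene IDs at all in the new package. A minimal
sketch, assuming ragene10stv1.db above is the package being queried:

library(ragene10stv1.db)

## number of probe set IDs that have an Entrez Gene ID
length(mappedkeys(ragene10stv1ENTREZID))

## spot-check a handful of probe set IDs against the Entrez map
probes <- head(ls(ragene10stv1ENTREZID), 5)
unlist(mget(probes, ragene10stv1ENTREZID))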
On Wed, Oct 22, 2008 at 7:41 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
> Hi Mark,
>
> If the package you built and installed were called "MarkPackage.db",
> then what would be the output of "MarkPackage()" (right after you loaded
> it)? Also, what is the output of sessionInfo()?
>
> Marc
>
>
>
>
> Mark Kimpel wrote:
>> Today I believe I successfully built an annotation package for my Affy
>> Rat Gene ST data using AnnotationDbi; at least I got no errors during
>> the build and it loads properly. However, I get the error output below
>> when I try to run hyperGTest (package Category) on a vector of Entrez
>> Gene IDs and a vector giving the gene universe of the chip. I suspect
>> I did something wrong when building the annotation package, but I have
>> no clue what that could be. I've used this same code with chips whose
>> annotation packages were built by the Bioconductor team, without issue.
>>
>>
>>> params <- new("GOHyperGParams", geneIds = myEGs,
>> + universeGeneIds = myGeneUniverse,
>> + annotation = annotation(AOP$eSet),
>> + ontology = "BP", pvalueCutoff = 0.05,
>> + conditional = TRUE, testDirection = "over")
>>
>>> params
>>>
>> A GOHyperGParams instance
>> category: GO
>> annotation: ragene10stv1
>>
>>> hyperGTest(params)
>>>
>> Error in getUniverseHelper(probes, datPkg, entrezIds) :
>> No Entrez Gene ids left in universe
>>
>> Enter a frame number, or 0 to exit
>>
>> 1: hyperGTest(params)
>> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", "hyperGT
>> 3: is(object, Cl)
>> 4: is(object, Cl)
>> 5: universeBuilder(p)
>> 6: universeBuilder(p)
>> 7: getUniverseViaGo(p)
>> 8: getUniverseHelper(probes, datPkg, entrezIds)
>>
>> Selection: 8
>> Called from: eval(expr, envir, enclos)
>> Browse[1]> ls()
>> [1] "datPkg" "entrezIds" "probes" "univ"
>> Browse[1]> datPkg
>> An object of class "ArabadopsisDatPkg"
>> Slot "name":
>> [1] "ragene10stv1"
>>
>> Browse[1]> entrezIds[1:5]
>> [1] "65049" "60444" "313914" "140941" "306868"
>> Browse[1]> probes[1:5]
>> [1] "10701636" "10701643" "10701654" "10701663" "10701679"
>> Browse[1]> univ
>> character(0)
>> Browse[1]>
>>
>> ------------------------------------------------------------
>> Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
>> Indiana University School of Medicine
>>
>> 15032 Hunter Court, Westfield, IN 46074
>>
>> (317) 490-5129 Work, & Mobile & VoiceMail
>> (317) 399-1219 Home
>> Skype: mkimpel
>>
>> "The real problem is not whether machines think but whether men do."
>> -- B. F. Skinner
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
>
------------------------------
Message: 32
Date: Thu, 23 Oct 2008 14:07:48 +1100 (AUS Eastern Daylight Time)
From: Gordon K Smyth <smyth at wehi.EDU.AU>
Subject: Re: [BioC] How to save result from limma
To: Jenny Drnevich <drnevich at illinois.edu>
Cc: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Message-ID: <Pine.WNT.4.64.0810231405491.2800 at PC602.alpha.wehi.edu.au>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Dear Jenny,
Thanks for the heads-up. I believe I have fixed it entirely now in limma
2.16.2. Try it out and see if you can break it.
Best wishes
Gordon
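For reference, a small usage sketch of the option under discussion ('eset'
and 'design' are placeholder objects, and the behaviour assumes limma
2.16.2 or later as noted above):

library(limma)
fit <- eBayes(lmFit(eset, design))
## single coefficient, rows left in their original order
tt  <- topTable(fit, coef = 1, number = nrow(fit), sort.by = "none")
## several coefficients at once, also unsorted
ttF <- topTableF(fit, number = nrow(fit), sort.by = "none")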
On Wed, 22 Oct 2008, Jenny Drnevich wrote:
> Hi Gordon,
>
> I just downloaded the new R 2.8.0 release and limma 2.16.0. I was checking
> out the new sort.by="none" option in topTable and I found an error in the help
> page for ?topTable. It doesn't list sort.by="none" as a possibility for
> topTable or toptable, but does list it for topTableF. However, it's actually
> the reverse; sort.by="none" works when using topTable with only 1 coef but
> not with more than 1 coef. Just thought I'd let you and the archives know...
>
> Thanks again!
> Jenny
>
> At 07:15 PM 8/19/2008, Gordon K Smyth wrote:
>
>
>> On Tue, 19 Aug 2008, Jenny Drnevich wrote:
>>
>>> At 06:14 PM 8/13/2008, Gordon K Smyth wrote:
>>>> OK, I've added sort="none" to the possibilities.
>>>> Best wishes
>>>> Gordon
>>>
>>> Hi Gordon,
>>>
>>> Should this change to topTable be up on BioC by now? I just updated my
>>> packages on R 2.7.1 and the latest limma_2.14.5 does not have it. Neither
>>> does the developmental version limma_2.15.10 on R 2.8.0 dev. Usually your
>>> changes appear very quickly...
>>>
[[elided Yahoo spam]]
>>> Jenny
>>
>> Not committed to BioC yet. I'm getting older and slower. Also, there will
>> be a number of code additions in my next commit to BioC, which I'm still
>> finalising.
>>
>> Best wishes
>> Gordon
>
> Jenny Drnevich, Ph.D.
>
> Functional Genomics Bioinformatics Specialist
> W.M. Keck Center for Comparative and Functional Genomics
> Roy J. Carver Biotechnology Center
> University of Illinois, Urbana-Champaign
>
> 330 ERML
> 1201 W. Gregory Dr.
> Urbana, IL 61801
> USA
>
> ph: 217-244-7355
> fax: 217-265-5066
> e-mail: drnevich at illinois.edu
>
------------------------------
Message: 33
Date: Wed, 22 Oct 2008 22:28:59 -0500
From: "Wei,Caimiao" <caiwei at mdanderson.org>
Subject: [BioC] Package "xps" "import.expr.scheme" error
To: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Message-ID:
<19D18D962A716B4EA5D26CB24F14DC26339D0DE943 at DCPWVMBXC1VS3.mdanderson.edu>
Content-Type: text/plain
I am importing chip definition and annotation files to create a ROOT scheme and get this error:
Error in import.expr.scheme(filename = "Scheme_HGU133p2_na26", filedir = scmdir, :
error in function 'ImportExprSchemes'
> libdir <- "/mypath/Affy/libraryfiles"
> anndir <- "/mypath/Affy/Annotation"
> scmdir <- "/mypath/CRAN/Workspaces/Schemes"
>
> scheme.hgu133p2.na26 <- import.expr.scheme(filename="Scheme_HGU133p2_na26",
+ filedir=scmdir,schemefile=paste(libdir,"HG-U133_Plus_2.cdf",sep="/"),
+ probefile=paste(libdir,"HG-U133_Plus_2.probe.tab",sep="/"),
+ annotfile=paste(anndir,"HG-U133_Plus_2.na26.annot.csv",sep="/"))
Error in import.expr.scheme(filename = "Scheme_HGU133p2_na26", filedir = scmdir, :
error in function 'ImportExprSchemes'
> sessionInfo()
R version 2.7.2 (2008-08-25)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xps_1.0.2
>
Thanks for any help!
Caimiao
[[alternative HTML version deleted]]
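Before digging into xps itself, it may be worth ruling out a path problem
(a trivial sketch using the placeholder paths from the post above): confirm
that all three input files exist exactly where import.expr.scheme() is told
to look.

file.exists(paste(libdir, "HG-U133_Plus_2.cdf", sep = "/"))
file.exists(paste(libdir, "HG-U133_Plus_2.probe.tab", sep = "/"))
file.exists(paste(anndir, "HG-U133_Plus_2.na26.annot.csv", sep = "/"))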
------------------------------
Message: 34
Date: Thu, 23 Oct 2008 02:49:47 -0400
From: Leon Peshkin <pesha at hms.harvard.edu>
Subject: Re: [BioC] Lumi and Beadstudio 1.5.13
To: <bioconductor at stat.math.ethz.ch>
Message-ID: <9EF526FD-9EA0-4489-83D1-6F3E9B93FAE9 at hms.harvard.edu>
Content-Type: text/plain; charset="US-ASCII"; format=flowed; delsp=yes
Hi Pan,
I was wondering if you could help me resolve an issue with the lumi
package. I am able to load Illumina data with lumiR, but when I try
background adjustment it fails:
> A0 <- lumiR("killme.txt", convertNuID = FALSE, inputAnnotation = FALSE)
> B0 <- lumiB(A0, method = 'bgAdjust')
Error in `[.data.frame`(control, , sampleNames(lumiBatch)) :
  undefined columns selected
-Leon
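A small diagnostic sketch for the error above: lumiB(method = 'bgAdjust')
subsets the control-probe data by sample name (that is what the error
message shows), so it is worth checking that control data were actually
read in and that their column names match the sample names. 'killme.txt'
is the file from the post, and controlData() is the lumi accessor assumed
to apply here:

library(lumi)

A0 <- lumiR("killme.txt", convertNuID = FALSE, inputAnnotation = FALSE)
ctrl <- controlData(A0)   # an empty data frame, or mismatched column names,
dim(ctrl)                 # would explain "undefined columns selected"
colnames(ctrl)
sampleNames(A0)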
------------------------------
Message: 35
Date: Thu, 23 Oct 2008 10:43:18 +0200 (CEST)
From: Clara de Dessous Chéri <email at dessouscheri.emv1.com>
Subject: [BioC] Exceptional offer following the technical problem
To: <bioconductor at stat.math.ethz.ch>
Message-ID: <20276360648.3897244.1224751398367 at schr1>
Content-Type: text/plain
To be sure of receiving all our emails, we advise you to add
email at dessouscheri.emv1.com to your address book.
If this email does not display correctly, you can view it via this link.
Copyright 2008 - Copyright Dessous Cheri, all rights reserved.
If you no longer wish to receive the Dessous Cheri newsletter, use the
unsubscribe link.
[[alternative HTML version deleted]]
------------------------------
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
End of Bioconductor Digest, Vol 68, Issue 23