[BioC] Bioconductor Digest, Vol 106, Issue 21

Thu Dec 22 17:25:28 CET 2011

-----Original Message-----
From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of bioconductor-request at r-project.org
Sent: Thursday, December 22, 2011 3:00 AM
To: bioconductor at r-project.org
Subject: Bioconductor Digest, Vol 106, Issue 21

Send Bioconductor mailing list submissions to
	bioconductor at r-project.org

To subscribe or unsubscribe via the World Wide Web, visit
	https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
	bioconductor-request at r-project.org

You can reach the person managing the list at
	bioconductor-owner at r-project.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."

Today's Topics:

   1. Re: problems with pd.genomewidesnp.6 (MacDonald, James)
   2. Re: Affymetrix Mouse Gene 1.0 ST - Number of probes
      (MacDonald, James)
   3. Limma question (asas asasa)
   4. Re: "reverse" a set of nucleotides: from reverse to direct
      sense (Jane Merlevede)

----------------------------------------------------------------------

Message: 1
Date: Wed, 21 Dec 2011 08:56:44 -0500
From: "MacDonald, James" <jmacdon at med.umich.edu>
To: Sebastian Thieme <thieme at mi.fu-berlin.de>
Cc: bioconductor at r-project.org
Subject: Re: [BioC] problems with pd.genomewidesnp.6
Message-ID: <4EF1E59C.9040901 at med.umich.edu>
Content-Type: text/plain; charset="iso-8859-1"; format="flowed"

Hi Sebastian,

On 12/20/11 6:01 PM, Sebastian Thieme wrote:
> Hi all,
>
> I have some problems with the pd.genomewidesnp.6 package and I hope
> someone can help me. The output of
> get(objects("package:pd.genomewidesnp.6")) is:
>
> #Class........: AffySNPCNVPDInfo
> #Manufacturer.: Affymetrix
> #Genome Build.: HG19
> #Chip Geometry: 2572 rows x  2680 columns
>
> I want to match the man_fsetid of each probe set to one gene. To do so, I look
> in the gene_assoc field and take the gene with the minimum distance to the
> respective probe set as the corresponding gene. My commands for getting the raw
> information are:
>
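> # con6: DBI connection to the pd.genomewidesnp.6 SQLite database
> snp.f <- dbGetQuery(con6, "select * from featureSet")
> snp.f <- snp.f[, c("fsetid","man_fsetid","chrom","physical_pos","strand","cytoband","gene_assoc")]
>
> cn.f <- dbGetQuery(con6, "select * from featureSetCNV")
> cn.f <- cn.f[, c("fsetid","man_fsetid","chrom","chrom_start","strand","cytoband","gene_assoc")]
> # rbind() matches data-frame columns by name, so give both tables the same position column
> names(cn.f)[names(cn.f) == "chrom_start"] <- "physical_pos"
>
> snp6.f <- rbind(snp.f, cn.f)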
>
> and then process the gene_assoc field. The problem is that gene_assoc
> contains genes which are not on the same chromosome as the respective
> probe sets, e.g.:
>
> fsetid man_fsetid chrom physical_pos strand cytoband
> 650443  CN_618877    12     93793083      -      q22
>                         gene_assoc
> ENST00000358888 // upstream // 315610 // Hs.112553 // RPL41 // 6171
> //ribosomal protein L41 /// ENST00000318066 // downstream // 8981 //
> Hs.524630 // UBE2N // 7334 // ubiquitin-conjugating enzyme E2N (UBC13
> homolog, yeast) /// NR_002212 // exon // 0 // --- // NUDT4P1 // 440672
> // nudix (nucleoside diphosphate linked moiety X)-type motif 4
> pseudogene 1 /// NM_199040 // CDS // 0 // Hs.506325 // NUDT4 // 11163
> // nudix (nucleoside diphosphate linked moiety X)-type motif 4
> ///NM_019094 // CDS // 0 // Hs.506325 // NUDT4 // 11163 // nudix
> (nucleoside diphosphate linked moiety X)-type motif 4
>
> The gene NUDT4P1 is annotated on chromosome 1, not 12, and this is only
> one case. Another example is:

In what build is that true? UCSC claims that NUDT4 and NUDT4P1 are 
overlapping, on chr12 (hg19).

Anyway, the larger point here is a discussion of what a SNP is, and how 
they are localized. Essentially, a SNP is a single base that has been 
found to vary with a certain frequency in a population. They are 
localized by the flanking sequence, which means that in the case of a 
pseudogene (which may or may not be on the same chromosome), you will 
see the same flanking sequence and cannot reliably say where the SNP is 
really located.

Since DNA chips work by binding to the SNP and its flanking sequence, 
you cannot say whether you have measured the gene, the pseudogene, or 
some combination thereof.

Listing all possibilities for the SNP location is therefore not a 
'problem', it just reflects our lack of precision.
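A minimal base-R sketch of the gene_assoc processing Sebastian describes, assuming the layout seen in the examples quoted above: records separated by "///", fields by "//", in the order transcript, relationship, distance, UniGene ID, symbol, Entrez Gene ID, description. The helper names parseGeneAssoc() and nearestGenes() are hypothetical. Because NUDT4P1 and NUDT4 both sit at distance 0 in the first example, a "single nearest gene" rule has to break a tie somehow; the sketch keeps every tied candidate, in line with the point that all possible locations are listed.

```r
# Hypothetical helpers; field order assumed from the gene_assoc examples in this thread.
parseGeneAssoc <- function(x) {
  # One record per candidate transcript, then split each record into trimmed fields.
  recs  <- strsplit(x, "///", fixed = TRUE)[[1]]
  parts <- lapply(strsplit(recs, "//", fixed = TRUE), trimws)
  field <- function(i) vapply(parts, function(p) p[i], character(1))
  data.frame(
    transcript   = field(1),
    relationship = field(2),
    distance     = suppressWarnings(as.numeric(field(3))),
    unigene      = field(4),
    symbol       = field(5),
    entrez       = field(6),
    description  = vapply(parts, function(p) paste(p[-(1:6)], collapse = " // "),
                          character(1)),
    stringsAsFactors = FALSE
  )
}

# Keep all candidates tied at the minimum distance instead of picking one arbitrarily.
nearestGenes <- function(x) {
  ga <- parseGeneAssoc(x)
  ga[which(ga$distance == min(ga$distance, na.rm = TRUE)), , drop = FALSE]
}

ga <- paste(
  "ENST00000358888 // upstream // 315610 // Hs.112553 // RPL41 // 6171 // ribosomal protein L41",
  "NR_002212 // exon // 0 // --- // NUDT4P1 // 440672 // nudix (nucleoside diphosphate linked moiety X)-type motif 4 pseudogene 1",
  "NM_199040 // CDS // 0 // Hs.506325 // NUDT4 // 11163 // nudix (nucleoside diphosphate linked moiety X)-type motif 4",
  sep = " /// ")
nearestGenes(ga)$symbol   # "NUDT4P1" "NUDT4"
```

Checking the chromosome of each candidate against the probe set's chrom column would need an external gene annotation and is not shown here.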

Best,

Jim

> fsetid    man_fsetid chrom physical_pos strand cytoband
> 186938 SNP_A-4227519    12     31784081      -   p11.21
>
>                                                       gene_assoc
> ENST00000294419 // upstream // 14576 // Hs.10862 // AK3L1 // 205 //
> adenylate kinase 3-like 1 /// ENST00000412352 // upstream // 16012 //
> Hs.585084 // C12orf72 // 254013 // chromosome 12 open reading frame 72
> /// NM_013410 // upstream // 14564 // Hs.10862 // AK3L1 // 205 //
> adenylate kinase 3-like 1 /// NM_001135864 // upstream // 16012 //
> Hs.585084 // C12orf72 // 254013 // chromosome 12 open reading frame 72
>
> AK3L1 is annotated on chromosome 9, not 12. The corresponding Ensembl
> ID (ENST00000294419) maps to AK4-201, which is annotated on
> chromosome 1. These are only two examples; there are many more. Can
> someone help?
>
>
> best regards
>
> Basti

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 

------------------------------

Message: 2
Date: Wed, 21 Dec 2011 09:00:18 -0500
From: "MacDonald, James" <jmacdon at med.umich.edu>
To: "Sophie LAMARRE [guest]" <guest at bioconductor.org>
Cc: bioconductor at r-project.org, sophie.lamarre at insa-toulouse.fr
Subject: Re: [BioC] Affymetrix Mouse Gene 1.0 ST - Number of probes
Message-ID: <4EF1E672.7000905 at med.umich.edu>
Content-Type: text/plain; charset="iso-8859-1"; format="flowed"

Hi Sophie,

On 12/21/11 5:38 AM, Sophie LAMARRE [guest] wrote:
> Hello,
>
> I work with the Affymetrix Mouse Gene 1.0 ST array.
>
> I used two methods to match my database with my probes. After RMA normalization, I compared the unique probes obtained with each method:
>
> ->  34,760 probes (control and main probes) with R/Bioconductor. I downloaded the Unsupported Mouse Gene 1.0 ST Array CDF (Technical documentation -> Library Files) from the Affymetrix website to get the CDF file and build my own CDF package.
> ->  35,556 probes (control and main probes) with Expression Console. I downloaded the Mouse Gene 1.0 ST Array, Analysis (Technical documentation -> Library Files) to get the files that Expression Console needs.
>
> =>  So I lost 796 probes. That's annoying!
>
> Next, when I kept only the main probes (after matching my database with the Affymetrix annotation file available on the Affymetrix website), I had:
> ->  28,104 probes with Bioconductor
> ->  28,856 probes with Expression Console
>
> =>  So 752 main probes are missing when I run the analysis with Bioconductor. I'm worried because I am sometimes asked not to summarize the probes, in which case I can't use Expression Console and have to use Bioconductor, so I lose a lot of probes.
>
> I asked Affymetrix support, and they answered:
>
> This difference can be due to a number of reasons.
>
> Firstly, the CDF file contains array layout information designed for 3' IVT array analysis, and is therefore not optimal for a WT array (the WT arrays use different library files, CLF and PGF). This is why it has an unsupported status (as seen in its name). This could explain the difference you see.
>
> Secondly, Bioconductor and Expression Console are different software, so the RMA algorithm may not be implemented identically. Things like background correction and filtering might differ between the two programs.
>
> What do you think of Affymetrix support's answer? Personally, I don't think that the summarization step (median polish) removes any probes. How would you explain the difference I found? What can I do to keep all the probes I need (the main probes)?

The short answer is that Affy technical support is correct. There are a 
host of problems associated with using the affy package to analyze WT 
arrays, which is why the oligo and xps packages exist. You will be 
better served by switching to either oligo or xps for the analysis of 
these data.
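A minimal sketch of the oligo route, assuming the Bioconductor packages oligo and pd.mogene.1.0.st.v1 are installed; the wrapper name rmaWT() and its directory argument are hypothetical. oligo takes the array layout from the pd.* annotation package rather than from the unsupported CDF, and target = "core" asks oligo's rma() for gene-level (transcript cluster) summaries. Whether the resulting row count matches Expression Console exactly is not established in this thread.

```r
# Sketch only: assumes the Bioconductor packages oligo and pd.mogene.1.0.st.v1
# are installed. rmaWT() is a hypothetical wrapper, not an oligo function.
rmaWT <- function(celDir, target = "core") {
  files <- list.files(celDir, pattern = "\\.cel$", ignore.case = TRUE,
                      full.names = TRUE)
  if (length(files) == 0) stop("No CEL files found in ", celDir)
  if (!requireNamespace("oligo", quietly = TRUE))
    stop("The Bioconductor package 'oligo' is required")
  raw <- oligo::read.celfiles(files)  # layout comes from the pd.* package, not a CDF
  oligo::rma(raw, target = target)    # background correction, quantile normalization, median polish
}

# Example (not run here):
# eset <- rmaWT("path/to/cel/files")
# dim(Biobase::exprs(eset))  # rows = summarized transcript clusters
```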

Best,

Jim

> Thank you,
>
> Sophie LAMARRE
> Biostatistician - Toulouse (FRANCE)
>
>   -- output of sessionInfo():
>
> R version 2.13.0 (2011-04-13)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
> [4] LC_NUMERIC=C                   LC_TIME=French_France.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] affy_1.30.0    Biobase_2.12.2
>
> loaded via a namespace (and not attached):
> [1] affyio_1.20.0         preprocessCore_1.14.0 tools_2.13.0

------------------------------

Message: 3
Date: Wed, 21 Dec 2011 17:09:29 +0200
From: asas asasa <asssaaaaf at gmail.com>
To: bioconductor at r-project.org
Subject: [BioC] Limma question
Message-ID:
	<CAEQ1HoK5WMHJ9EdK8xfadC52NO4crL8pHMRUgqow9DxZD5AM=w at mail.gmail.com>
Content-Type: text/plain

Hello limma people,

I use limma for separate-channel analysis of two-color data, but there
are some small problems:

1) My differential expression analysis works fine after background
correction (backgroundCorrect(...)), but when this step is omitted the
following error occurs:

Error in intraspotCorrelation(MA2, design) :
  Missing or infinite values found in M or A

So MA2$M and MA2$A contain NA values, but I don't understand why they appear
or how to deal with them to avoid this error. My raw data do not contain NA
values.
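One possible cause, offered as an assumption rather than a diagnosis from this thread: zero intensities, or zero and negative values produced by any plain background subtraction, become -Inf or NaN on the log2 scale, and intraspotCorrelation() reports exactly such "missing or infinite values". The base-R sketch below uses made-up intensities to show the effect; the floor-plus-offset fix is a hypothetical illustration, not limma's method (limma's backgroundCorrect() with method = "normexp" is designed to return positive corrected intensities).

```r
# Made-up intensities for four spots (illustration only, not data from the thread).
Rf <- c(1200, 150, 80,  0);  Rb <- c(100, 200, 50,  0)   # red foreground / background
Gf <- c( 900, 300, 60, 40);  Gb <- c(100, 100, 90, 30)   # green foreground / background

# M and A as defined for two-colour arrays (log2 scale).
toMA <- function(R, G) list(M = log2(R) - log2(G), A = (log2(R) + log2(G)) / 2)

# Plain subtraction: background >= foreground gives 0 or negative values -> -Inf / NaN.
sub <- suppressWarnings(toMA(Rf - Rb, Gf - Gb))
bad <- !is.finite(sub$M) | !is.finite(sub$A)
which(bad)   # spots 2, 3 and 4

# Hypothetical fix for illustration: floor at zero, then add a positive offset.
offset <- 50
fixed <- toMA(pmax(Rf - Rb, 0) + offset, pmax(Gf - Gb, 0) + offset)
all(is.finite(fixed$M) & is.finite(fixed$A))   # TRUE
```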

2) I need the normalized intensities of my microarray results so that they
can be exported to other programs. Is the following formula a correct way to
extract log intensities from MA data?

logR <- MA2$A + (0.5*MA2$M) # red = Cy5
logG <- MA2$A - (0.5*MA2$M) # green = Cy3
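With M = log2(R) - log2(G) and A = (log2(R) + log2(G))/2, solving for the two channels gives log2(R) = A + M/2 and log2(G) = A - M/2, which is the formula above. A quick base-R round-trip check on made-up intensities (limma also provides RG.MA() for converting an MAList back to unlogged intensities):

```r
# Made-up red (Cy5) and green (Cy3) intensities for three spots.
R <- c(1500, 220, 64)
G <- c( 750, 880, 64)

# Forward transform, as defined for limma MA values.
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

# Back-transform, using the formula from the question.
logR <- A + 0.5 * M
logG <- A - 0.5 * M

all.equal(logR, log2(R))   # TRUE
all.equal(logG, log2(G))   # TRUE
```

Applied to normalized M and A values, the same back-transform yields normalized log intensities.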

Thanks for your previous help,
Assaf

------------------------------

Message: 4
Date: Thu, 22 Dec 2011 11:46:53 +0100
From: Jane Merlevede <jane.merlevede at gmail.com>
To: Michael Lawrence <lawrence.michael at gene.com>
Cc: bioconductor at r-project.org
Subject: Re: [BioC] "reverse" a set of nucleotides: from reverse to
	direct	sense
Message-ID:
	<CADE5-OT-uz2yr7HpwFwUfQWpET5awn42=dN+HCQ1yPjJNVi3NA at mail.gmail.com>
Content-Type: text/plain

Thanks for your answer!
I'm interested in using the package that you developed, VariantAnnotation.
I will try it after installing R 2.14.
At the beginning of your vignette, you show your data; there is a column
"strand". I would like to know whether it takes the strand into account,
because that is what I need. I went through your paper and I haven't seen
that you consider both the direct and the reverse strand. Does your package
handle both strands? Or do I need to use the reverseComplement function first
and then use your method on the direct strand only?
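The reverseComplement() function suggested in Michael's reply quoted below is the Biostrings route (for example reverseComplement(DNAStringSet(x))). For readers without Biostrings, a minimal base-R sketch of the same operation applied to reverse-strand alleles. The column names (chr, start, end, ref, alt, strand) are hypothetical, and the sketch assumes start/end are already forward-strand reference coordinates, so only the allele strings need converting.

```r
# Complement each base, then reverse the string. Only A, C, G, T and N are
# complemented; any other character (e.g. "-" for indels) passes through unchanged.
revcomp <- function(x) {
  comp <- chartr("ACGTNacgtn", "TGCANtgcan", x)
  vapply(strsplit(comp, ""), function(b) paste(rev(b), collapse = ""),
         character(1), USE.NAMES = FALSE)
}

# Hypothetical variant table; alleles reported on the "-" strand are converted to "+".
variants <- data.frame(
  chr = c("chr1", "chr2"), start = c(1000, 2000), end = c(1000, 2001),
  ref = c("A", "AC"), alt = c("G", "-"), strand = c("+", "-"),
  stringsAsFactors = FALSE
)
minus <- variants$strand == "-"
variants$ref[minus] <- revcomp(variants$ref[minus])
variants$alt[minus] <- revcomp(variants$alt[minus])
variants$strand[minus] <- "+"
variants[, c("ref", "alt", "strand")]
```

For single-base alleles reversing changes nothing, so only the complement matters; for multi-base alleles both steps are needed.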

Jane Merlevède

2011/12/20 Michael Lawrence <lawrence.michael at gene.com>

> I think you are looking for the reverseComplement function in Biostrings.
> Also, the VariantAnnotation package provides much of the functionality of
> Annovar.
>
> Michael
>
> On Mon, Dec 19, 2011 at 2:13 AM, Jane Merlevede <jane.merlevede at gmail.com> wrote:
>
>> Hello,
>>
>> I am looking for "interesting" mutations among a set of mutations. To
>> reduce the number of mutations, I am using Annovar. This software takes as
>> input a file containing the chromosome, the wild-type and mutated
>> nucleotide(s), and the start and end positions of the variant(s).
>> It seems that this software uses only information from the "direct"
>> (forward) strand, but I also have information on the reverse strand.
>> I wrote an R script to "reverse" the mutated variants, but I was told that
>> there is probably a way to do this in Bioconductor.
>> I haven't found one yet, which is why I would like your help to know
>> whether it exists.
>>
>> Thanks in advance,
>> Jane

------------------------------

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor

End of Bioconductor Digest, Vol 106, Issue 21