[BioC] GenomicFeatures chromosome names matching

Wed Feb 8 20:36:07 CET 2012

Hi Yuval,

Sorry for the delay.

On 01/19/2012 07:33 AM, Yuval Itan wrote:
> Dear Herve,
>
> My name is Yuval, I am a postdoc at the Rockefeller University. I am trying to use Bioconductor to analyze my RNA-seq data, and I would be grateful for your advice, as my R level is a bit basic and I got stuck. I need to count the number of reads per gene. My fastq data was aligned to chromosomes named "1", "2", etc., while makeTranscriptDbFromUCSC returned names like "chr1", which made the overlap check impossible. Is there a way to convert the chromosome names returned by makeTranscriptDbFromUCSC (or, if available, those of the full genes) so that they do not include "chr" (or any other way to make them match)?
>

Please have a look at ?seqlevels

I would suggest that you use seqlevels() on your reads (you probably
have them in a GappedAlignments object), not on your transcripts (stored
in a TranscriptDb object). Also it's important to realize that it's not
enough to fix the chromosome names so that they match: the reference
genome used to align your fastq data must be the same as the reference
genome your annotations are based on (i.e. the genome used when
makeTranscriptDbFromUCSC() was called to make the TranscriptDb object).
Otherwise, even though technically you'll be able to do
findOverlaps()/countOverlaps(), the result you'll get could be
partially or totally meaningless.
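
For illustration, here is a minimal sketch of the renaming step. It
assumes a reasonably recent GenomicRanges installation and uses two
small made-up GRanges objects in place of your real reads and
transcripts. The object names (reads, genes) and all coordinates are
invented for the example; seqlevels() is used the same way on a
GappedAlignments object.

```r
library(GenomicRanges)

# Toy stand-in for the aligned reads, with names like "1", "2".
# (Hypothetical data, not taken from any real experiment.)
reads <- GRanges(seqnames = c("1", "2", "2"),
                 ranges = IRanges(start = c(100, 200, 5000), width = 50),
                 strand = "+")

# Toy stand-in for the annotations, with UCSC-style names "chr1", "chr2".
genes <- GRanges(seqnames = c("chr1", "chr2"),
                 ranges = IRanges(start = c(50, 150), end = c(500, 400)),
                 strand = "+")

# Before renaming, the two objects share no sequence names.
shared_before <- intersect(seqlevels(genes), seqlevels(reads))

# Rename the sequence levels of the reads (not of the annotations)
# by prepending "chr" to each existing name.
seqlevels(reads) <- paste0("chr", seqlevels(reads))

# The names now match, so reads can be counted per gene.
shared_after <- intersect(seqlevels(genes), seqlevels(reads))
counts <- countOverlaps(genes, reads)
```

In this toy data each of the two genes ends up overlapping exactly one
read. Keep in mind that the renaming only makes the names agree; it
does not check that both objects refer to the same genome assembly.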

Finally note that the bioconductor mailing list (cc'ed) is a better
place to ask this kind of question, as many subscribers there might
be able to help.

Cheers,
H.

> Many thanks,
>
> Yuval

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319