[BioC] HapMap gene list
noxyport at gmail.com
Wed Aug 4 19:41:38 CEST 2010
Hi,
I have a problem with the gene list (a GFF version 3 file) that HapMap
uses (ftp://ftp.ncbi.nlm.nih.gov/hapmap/gbrowse/2009-02_phaseII+III/gff/refGene_hg18_tests_11Apr07.gff.gz).
I tried loading the file into R and selecting all "mRNA" entries, but
something seems to go wrong:
> hapmap=read.table("refGene_hg18_tests_11Apr07.gff", header=F, sep="\t")
> nrow(hapmap)
[1] 171701
> hapmap2=hapmap[which(hapmap$V3=="mRNA"), ]
> nrow(hapmap2)
[1] 12718
> hapmap[(2210:2220), (1:3)]
V1 V2 V3
2210 chr1 UCSC_1 mRNA
2211 chr1 UCSC_1 five_prime_UTR
2212 chr1 UCSC_1 five_prime_UTR
2213 chr1 UCSC_1 CDS
2214 chr1 UCSC_1 CDS
2215 chr1 UCSC_1 CDS
2216 chr1 UCSC_1 CDS
2217 chr1 UCSC_1 CDS
2218 chr1 UCSC_1 CDS
2219 chr1 UCSC_1 CDS
2220 chr1 UCSC_1 CDS
>
Can anyone explain what is going wrong? I suspect the long descriptive
column (V9) is the cause, but I cannot see where it fails.
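One check that might narrow this down is sketched below. It rests on two assumptions that are not confirmed for the HapMap file: that the file is tab-delimited, and that V9 contains apostrophes or "#" characters, which read.table interprets as quote and comment characters by default. The sample lines are hypothetical stand-ins for the real file, not HapMap records.

```r
# Hypothetical GFF-like sample (NOT taken from the HapMap file). The last
# column contains an apostrophe and a '#', which read.table treats as a
# quote character and a comment character unless told otherwise.
gff_lines <- c(
  "##gff-version 3",
  "chr1\tUCSC_1\tmRNA\t100\t900\t.\t+\t.\tID=NM_1;Note=5' end",
  "chr1\tUCSC_1\tCDS\t150\t800\t.\t+\t0\tParent=NM_1",
  "chr1\tUCSC_1\tmRNA\t2000\t2900\t.\t-\t.\tID=NM_2;Note=3' end #2"
)
tmp <- tempfile(fileext = ".gff")
writeLines(gff_lines, tmp)

# Drop header/comment lines by hand, so comment handling can be disabled
# for the data rows themselves.
all_lines  <- readLines(tmp)
data_lines <- all_lines[!grepl("^#", all_lines)]

# quote = "" and comment.char = "" make read.table take every character
# of a field literally, so each input line yields exactly one row.
strict <- read.table(text = data_lines, header = FALSE, sep = "\t",
                     quote = "", comment.char = "",
                     stringsAsFactors = FALSE)

# If parsing is sound, the row count equals the number of data lines.
print(nrow(strict) == length(data_lines))
print(sum(strict$V3 == "mRNA"))
```

Running the same comparison on the real file (row count from read.table against the number of non-comment lines from readLines) would show whether rows are being merged or dropped during parsing.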
I admit this is probably not the best way to use this file, but I
cannot find any other source (RefSeq, UCSC) that contains the same
genomic regions for the genes as annotated in HapMap. Which NCBI 36
build did they use, and where can I download a gene file whose
chromosome, gene start, and gene stop match HapMap?
Thanks for your help!