[BioC] biomaRt: retrieve total chromosome lengths

Steffen Durinck durincks at mail.nih.gov
Mon Oct 30 21:16:52 CET 2006


Hi An,

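There is no way to retrieve the chromosome lengths with biomaRt when 
used with Ensembl.
The closest you'll get with biomaRt is to subtract the start position of 
the 'first' transcript from the end position of the 'last' transcript on 
each chromosome. That gives the span covered by annotated transcripts, 
not the full chromosome length.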
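
The subtraction described above can be sketched in R roughly as follows. 
This is only an illustration: the data frame `tx`, its column names and 
all coordinates are made up, standing in for whatever table of transcript 
positions a biomaRt query returns for you.

```r
# Toy transcript coordinates (made-up values, NOT real Ensembl data).
# The column names are hypothetical placeholders for the chromosome name,
# transcript start and transcript end columns of a query result.
tx <- data.frame(
  chromosome_name  = c("1", "1", "1", "2", "2"),
  transcript_start = c(1000, 5000, 90000, 200, 40000),
  transcript_end   = c(3000, 9000, 95000, 1500, 48000),
  stringsAsFactors = FALSE
)

# Start of the 'first' transcript and end of the 'last' transcript
# on each chromosome.
first_start <- tapply(tx$transcript_start, tx$chromosome_name, min)
last_end    <- tapply(tx$transcript_end,   tx$chromosome_name, max)

# Span covered by transcripts: an underestimate of the chromosome length,
# since sequence before the first and after the last transcript is ignored.
approx_span <- last_end - first_start
print(approx_span)
```

Taking only the maximum end position (without subtracting the first start) 
gets a little closer to the real length, but it still misses whatever 
sequence lies beyond the last transcript.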

If you want to use the Ensembl data to get this information (you'll need 
to do some browser clicking), you can select your species of interest at
http://www.ensembl.org/

for hsapiens:

http://www.ensembl.org/Homo_sapiens/index.html

then select a chromosome e.g.:

http://www.ensembl.org/Homo_sapiens/mapview?chr=1

and on that page you'll find the chromosome length.

Cheers,
Steffen

James W. MacDonald wrote:
> Hi An,
>
> De Bondt, An-7114 [PRDBE] wrote:
>   
>> Hi,
>>
>> How can I retrieve, for a certain organism (e.g. human), the total length of
>> each of its chromosomes using biomaRt?
>> 	library(biomaRt)
>> 	mart <- useMart("ensembl")
>> 	mart <- useDataset("hsapiens_gene_ensembl", mart)
>> 	chr.lengths <- ???
>>     
>
> Well, this doesn't agree exactly with what I see on this webpage:
>
> http://www.ornl.gov/sci/techresources/Human_Genome/posters/chromosome/faqs.shtml
>
> But it is pretty close. Of course I am finding the end of the 'last' 
> transcript on a given chromosome rather than the end of the chromosome 
> itself, so there will likely be differences. However, I don't see an 
> attribute that looks like it gives chromosomal information without first 
> being mapped through a gene, so I don't know if you can get exactly what 
> you want.
>
> If there is a way, Steffen Durinck will undoubtedly know what it is, but 
> I haven't seen a response from him as yet.
>
> Anyway, here is what I did.
>
>  > mart <- useMart("ensembl", "hsapiens_gene_ensembl")
> Checking attributes and filters ... ok
>  > a <- getBM("hsapiens_gene_ensembl_structure.transcript_chrom_end", 
> "chromosome_name", c(1:21, "x","y"), mart, output="list")
>  > sapply(a[[1]], max)
>          1         2         3         4         5
> 247197891 242713278 199439629 191246650 180727832
>          6         7         8         9        10
> 170735623 158630410 146252219 140191642 135347681
>         11        12        13        14        15
> 134361903 132289533 114110907 106354309 100334282
>         16        17        18        19        20
>   88771793  78646005  76106388  63802660  62429769
>         21         x         y
>   46935585 154908521  57767721
>
> Best,
>
> Jim
>
>
>   
>> Thanks in advance!
>> An
>
>
>   


-- 
Steffen Durinck, Ph.D.

Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/

Phone: 301-402-8103
Address:
Advanced Technology Center,
8717 Grovemont Circle
Gaithersburg, MD 20877


