[BioC] extract introns
Steve Lianoglou
mailinglist.honeypot at gmail.com
Sun Nov 13 16:29:15 CET 2011
Hi,
On Sun, Nov 13, 2011 at 9:46 AM, Yating Cheng <yating.cheng at charite.de> wrote:
> Hi, everyone,
>
> Now I am facing new problem.
>
> The following file I got from biomart. So, GB*** is ensemble_gene_id.and
> Group*** is chromosome_name.
>
> The first problem is that, in the file is exon_chrom_start and end, but
> should I need intron_start and end ? since I need intron sequences.
The `gaps` function will help with that ... however, if there are
intron start/end columns you can fetch from biomart, feel free to go
that way ...
> "ensembl_gene_id" "chromosome_name" "exon_chrom_start" "exon_chrom_end"
> "strand"
> "1" "GB17244" "Group5.28" 95590 95776 1
> "2" "GB17244" "Group5.28" 95865 96055 1
> "3" "GB17244" "Group5.28" 96189 96428 1
> "4" "GB17244" "Group5.28" 100525 100632 1
> "5" "GB17244" "Group5.28" 100689 100751 1
> "6" "GB17244" "Group5.28" 100835 101145 1
> "7" "GB17244" "Group5.28" 101265 101547 1
> "8" "GB17244" "Group5.28" 101632 101673 1
> "9" "GB12201" "Group13.1" 374539 374712 -1
> "10" "GB12201" "Group13.1" 366611 366778 -1
>
> And another prolem is in the script.
>
> library(Biostrings)
>
> library(GenomicRanges)
>
> source("http://ftp:/biocLite.R")
> biocLite("BSgenome.Amellifera.BeeBase.assembly4")
>
> library(BSgenome.Amellifera.BeeBase.assembly4)
> library(BSgenome.Amellifera.BeeBase.assembly4)
> #Loading required package: BSgenome
> #Warning message:
> #package 'BSgenome' was built under R version 2.13.2
>
> intron_ranges<-read.table("intron_ranges.txt")
>
> gr.exons <- with(intron_ranges, {
> GRanges(chromosome_name,
> IRanges(exon_chrom_start, exon_chrom_end),
> ifelse(strand == 1, '+', '-'))})
>
> intron.seqs <-getSeq(Amellifera, gr.exons, as.character=FALSE)
>
> #Error in .getOneSeqFromBSgenomeMultipleSequences(x, name, start, NA,
> width, : sequence Group5.28 not found
>
> My guess is that the chromosome names I input is not the same stored in
> the BS genome.
You don't have to guess, what is the output of
`seqlevels(Amellifera)`? You'll get a list of chromosome names there.
It's your job to translate your "Group5.28" names to the ones listed
in the Amellifera genome.
> But that is the only way I can get the IRanges. Or does
> anyone know how to get IRanges direct from BS genome.
You can extract one chromosome at a time from the BSgenome object,
then just query it w/ an IRanges object.
For example, in human:
R> library(BSgenome.Hsapiens.UCSC.hg19)
R> chr1 <- unmasked(Hsapiens$chr1)
R> some.seqs <- Views(chr1, IRanges(sample(249250600, 10), width=10))
R> some.seqs
some.seqs
Views on a 249250621-letter DNAString subject
subject: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
views:
start end width
[1] 12274027 12274036 10 [AAAGAGCAAC]
[2] 149890737 149890746 10 [AAAGCCATAA]
[3] 49569412 49569421 10 [AAATTTGGAT]
[4] 230817984 230817993 10 [TTAGATTTTG]
[5] 23827107 23827116 10 [TTTCCTGTAT]
[6] 126211483 126211492 10 [NNNNNNNNNN]
[7] 230032024 230032033 10 [GTGAATAATT]
[8] 182185491 182185500 10 [TGCATTTGTC]
[9] 229816410 229816419 10 [TTCCTTAATG]
[10] 99265270 99265279 10 [CACATTTTCT]
HTH,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
More information about the Bioconductor
mailing list