[BioC] extract introns
Yating Cheng
yating.cheng at charite.de
Tue Nov 15 23:57:31 CET 2011
Hi Steve,
I am working with Apis mellifera.
Actually I am also trying to use the alignment function to get the intron
sequence.which is much easier for me to understand. But there are some
problem with that also.
rm(list=ls(all=TRUE))
# set working directory
setwd("F:/Yating/introns")
# load Biostrings library
library(Biostrings)
# load data
genebody<-read.delim("genebody_5.txt",sep="\t",header=FALSE)
exons<-read.delim("exon_5_seperated.txt",sep="\t",header=FALSE)
# create object for storage of extracted introns
introns<-data.frame()
# loop over all genes
for(i in 1:length(genebody[,1])){
tmp.id<-as.vector(genebody[i,1]) # get gene id
tmp.subject<-as.vector(genebody[i,2]) # get gene sequence
tmp.exons<-exons[which(exons[,1]==tmp.id),] # get exons of the
selected genes
tmp.pattern<-as.vector(tmp.exons[,3]) # define exons as patterns
for alignment
tmp.align<-pairwiseAlignment(pattern=tmp.pattern,
subject=tmp.subject,type="local") # align all exons pairwise to gene
sequence
tmp.start<-start(subject(tmp.align)) # vector of all alignment
starts
tmp.end<-end(subject(tmp.align)) # vector of all alignment ends
for(ex in 1:(length(tmp.end)-1)){ # extract introns
tmp.intron<-substr(tmp.subject,tmp.end[ex]+1,tmp.start[ex+1]-1)
introns<-rbind(introns,cbind(tmp.id,tmp.end[ex]+1,tmp.start[ex+1]-1,tmp.intron))
}
}
}
# define useful names for columns
colnames(introns)<-c("gene.id","pos.start","pos.end","intron")
# write output
write.table(introns,"Introns.txt", sep="\t", quote=FALSE, row.names=FALSE)
The problem is the unvalid arguments in the substr....
Best Wishes
Yating
> HI Yating,
>
> On Tue, Nov 15, 2011 at 4:48 PM, Yating Cheng <yating.cheng at charite.de>
> wrote:
>> Hi, steve
>>
>> Thanks for your answer. About converting the names of chromosome, it is
>> kind of difficult.
>>
>> I need the intron sequences for every gene, but the chromosome names of
>> some genes I got from biomart are the same.
>
> Sorry, I'm confused.
>
> What organism are you working with?
>
> Can you construct a TranscriptDb for it using the GenomicFeatures
> package (you'll have to read through the GenomicFeatures vignette).
>
>> I am not sure about how can I get specific genes from BS genome, and
>> match
>> them to gene-IDs.
>
> BSgenome packages do not have gene specific information, they only
> contain sequence information for different organisms.
>
> The fact that you can use a BSgenome.* package for your organism of
> interest suggests that you should be able to build a TranscriptDb for
> it, so that's good news ...
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
More information about the Bioconductor
mailing list