[BioC] converting position from '-' strand to '+' strand
James W. MacDonald
jmacdon at med.umich.edu
Mon Apr 5 21:36:34 CEST 2010
Hi Tim,
Tim Smith wrote:
> Apologies if this seems like a trivial question.
>
> I wanted to have a consistent set of locations and wanted all the
> locations to begin from the 5' end. How can I convert locations that
> are given for the '-' strand? For example:
> -----------------------------
> library(biomaRt)
>
> mart.obj <- useMart(biomart = 'ensembl', dataset =
> 'hsapiens_gene_ensembl')
>
> atb <- c('ensembl_gene_id', 'chromosome_name', 'start_position',
> 'end_position', 'strand')
>
> mir.locs <- getBM(attributes=atb, filters="biotype", values="miRNA",
> mart=mart.obj)
> mir.locs[1:5,]
>   ensembl_gene_id chromosome_name start_position end_position strand
> 1 ENSG00000222732               5      171706206    171706319      1
> 2 ENSG00000207864               9       97847727     97847823      1
> 3 ENSG00000221173               9      129338809    129338909     -1
> 4 ENSG00000222961               5       32379501     32379581     -1
> 5 ENSG00000221058              18       51612956     51613026     -1
>
>
> ----------------------------
> Is there a quick way that I can convert the last 3 rows so that they
> reflect positions from the 5' strand?
> many thanks!
I'm confused. You want the positions to be from the 5' end regardless of
the strand? Wouldn't that make things less consistent? It seems
counterintuitive to be counting from different ends of the chromosome.
Anyway, if you really want something like that, you could use a
combination of tapply() and mapply().
## split data by chrom
mir.list <- tapply(1:dim(mir.locs)[1], mir.locs$chromosome_name,
function(x) mir.locs[x,])
## reverse counting for '-' strand
## first make a list containing the length of each chromosome in your
## output - I leave that to the reader to figure out how to do that. Let's
## say it is called 'len.list'
## now a little function to do the reversing
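little.fun <- function(df, len){
  ## keep the original coordinates, otherwise the second ifelse() would
  ## use the already-reversed start_position
  old.start <- df$start_position
  old.end <- df$end_position
  ## positions are 1-based, hence the + 1
  df$start_position <- ifelse(df$strand == -1, len - old.end + 1, old.start)
  df$end_position <- ifelse(df$strand == -1, len - old.start + 1, old.end)
  df
}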
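## and use mapply to do the work; this assumes len.list is named by
## chromosome, so it can be put in the same order as mir.list
mir.rev <- mapply(little.fun, mir.list, len.list[names(mir.list)],
                  SIMPLIFY = FALSE)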
Note that this isn't tested, but it should be pretty close.
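
For anyone who wants to try the idea end to end, here is a minimal self-contained sketch using only base R. It rebuilds the five rows from the question as a toy data frame, so no biomaRt query is needed. The chromosome lengths in 'chrom.sizes' are made-up placeholders, not real assembly values, and the names 'chrom.sizes' and 'flip.minus' are hypothetical. It also uses split() where the code above uses tapply(), which is a swap on my part and gives the same per-chromosome list.

```r
## Toy copy of the five rows shown in the question
mir.locs <- data.frame(
  ensembl_gene_id = c("ENSG00000222732", "ENSG00000207864",
                      "ENSG00000221173", "ENSG00000222961",
                      "ENSG00000221058"),
  chromosome_name = c("5", "9", "9", "5", "18"),
  start_position  = c(171706206, 97847727, 129338809, 32379501, 51612956),
  end_position    = c(171706319, 97847823, 129338909, 32379581, 51613026),
  strand          = c(1, 1, -1, -1, -1),
  stringsAsFactors = FALSE
)

## HYPOTHETICAL chromosome lengths: placeholders only, substitute the
## real lengths for the assembly you are working with
chrom.sizes <- data.frame(
  chrom  = c("5", "9", "18"),
  length = c(200000000, 150000000, 80000000),
  stringsAsFactors = FALSE
)

## named list of lengths, one element per chromosome
len.list <- as.list(setNames(chrom.sizes$length, chrom.sizes$chrom))

## split the data by chromosome
mir.list <- split(mir.locs, mir.locs$chromosome_name)

## reverse the counting for '-' strand rows (1-based coordinates)
flip.minus <- function(df, len) {
  minus <- df$strand == -1
  old.start <- df$start_position
  old.end <- df$end_position
  df$start_position <- ifelse(minus, len - old.end + 1, old.start)
  df$end_position <- ifelse(minus, len - old.start + 1, old.end)
  df
}

## pair each chromosome's rows with that chromosome's length
mir.rev <- mapply(flip.minus, mir.list, len.list[names(mir.list)],
                  SIMPLIFY = FALSE)

## back to a single data frame
mir.rev.df <- do.call(rbind, mir.rev)
```

Indexing 'len.list' by names(mir.list) matters because split() orders the chromosomes as character strings ("18", "5", "9"), which need not match the order in which the lengths were supplied. Rows on the '+' strand come back unchanged, and each feature keeps its width.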
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826