[BioC] converting position from '-' strand to '+' strand

Mon Apr 5 21:36:34 CEST 2010

Hi Tim,

Tim Smith wrote:
> Apologies if this seems like a trivial question.
> 
> I wanted to have a consistent set of locations and wanted all the
> locations to begin from the 5' end. How can I convert locations that
> are given for the '-' strand? For example: 
> -----------------------------
> library(biomaRt)
> 
> mart.obj <- useMart(biomart = 'ensembl', dataset =
> 'hsapiens_gene_ensembl')
> 
> atb <- c('ensembl_gene_id', 'chromosome_name', 'start_position',
> 'end_position', 'strand')
> 
> mir.locs <- getBM(attributes=atb, filters="biotype", values="miRNA",
> mart=mart.obj)
>
> mir.locs[1:5,]
>   ensembl_gene_id chromosome_name start_position end_position strand
> 1 ENSG00000222732               5      171706206    171706319      1 
> 2 ENSG00000207864               9       97847727     97847823      1 
> 3 ENSG00000221173               9      129338809    129338909     -1 
> 4 ENSG00000222961               5       32379501     32379581     -1 
> 5 ENSG00000221058              18       51612956     51613026     -1
> 
> 
> ----------------------------
> Is there a quick way that I can convert the last 3 rows so that they
> reflect positions from the 5' strand?
> many thanks!

I'm confused. You want the positions to be from the 5' end regardless of 
the strand? Wouldn't that make things less consistent? It seems 
counterintuitive to be counting from different ends of the chromosome.

Anyway, if you really want something like that, you could use a 
combination of tapply() and mapply().

## split data by chrom

mir.list <- tapply(1:dim(mir.locs)[1], mir.locs$chromosome_name,
		function(x) mir.locs[x,])

## reverse counting for '-' strand

## first make a list containing the length of each chromosome in your
## output, in the same order as the elements of mir.list. I leave it to
## the reader to figure out how to do that. Let's say it is called
## 'len.list'
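
A minimal sketch of one way to build 'len.list', assuming you already have
the chromosome lengths in a named vector. The name 'chrom.len' and the
lengths in it are placeholders made up for illustration, not real assembly
values; the five rows are the ones shown in the question. split() is used
here only to keep the sketch self-contained; it groups the rows by
chromosome in the same way.

```r
## toy copy of the five rows shown in the question
mir.locs <- data.frame(
	ensembl_gene_id = c("ENSG00000222732", "ENSG00000207864",
		"ENSG00000221173", "ENSG00000222961", "ENSG00000221058"),
	chromosome_name = c("5", "9", "9", "5", "18"),
	start_position = c(171706206, 97847727, 129338809, 32379501, 51612956),
	end_position = c(171706319, 97847823, 129338909, 32379581, 51613026),
	strand = c(1, 1, -1, -1, -1),
	stringsAsFactors = FALSE)

## PLACEHOLDER lengths (assumption, not real values); substitute the
## lengths for the assembly your positions come from
chrom.len <- c("5" = 200000000, "9" = 150000000, "18" = 80000000)

## one data frame per chromosome, as a named list
mir.list <- split(mir.locs, mir.locs$chromosome_name)

## look lengths up by name so len.list lines up with mir.list
len.list <- as.list(chrom.len[names(mir.list)])

## sanity check: same chromosomes, same order
stopifnot(identical(names(len.list), names(mir.list)))
```

Indexing by name rather than by position matters because the grouped list
is sorted by chromosome name as text, which is usually not the order in
which the lengths were typed in.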

## now a little function to do the reversing

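little.fun <- function(df, len){
	## keep the original coordinates so the second ifelse() does not
	## see an already-overwritten start_position
	minus <- df$strand == -1
	start <- df$start_position
	end <- df$end_position
	## Ensembl positions are 1-based, hence the + 1
	df$start_position <- ifelse(minus, len - end + 1, start)
	df$end_position <- ifelse(minus, len - start + 1, end)
	df
}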

## and use mapply to do the work

mir.rev <- mapply(little.fun, mir.list, len.list, SIMPLIFY = FALSE)

Note that this isn't tested, but it should be pretty close.
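
For comparison, here is a sketch of an alternative that skips the split
and looks up each row's chromosome length directly. It is not the
tapply()/mapply() route described above, just another way to get the same
flip. 'chrom.len' and its values are placeholders made up for
illustration; the rows are the five from the question. It assumes 1-based
inclusive positions, so a '-' strand position p on a chromosome of length
L becomes L - p + 1.

```r
## toy copy of the five rows shown in the question
mir.locs <- data.frame(
	ensembl_gene_id = c("ENSG00000222732", "ENSG00000207864",
		"ENSG00000221173", "ENSG00000222961", "ENSG00000221058"),
	chromosome_name = c("5", "9", "9", "5", "18"),
	start_position = c(171706206, 97847727, 129338809, 32379501, 51612956),
	end_position = c(171706319, 97847823, 129338909, 32379581, 51613026),
	strand = c(1, 1, -1, -1, -1),
	stringsAsFactors = FALSE)

## PLACEHOLDER lengths (assumption, not real values)
chrom.len <- c("5" = 200000000, "9" = 150000000, "18" = 80000000)

## length of the chromosome each row sits on
len <- chrom.len[as.character(mir.locs$chromosome_name)]
minus <- mir.locs$strand == -1

## compute both new columns from the ORIGINAL coordinates before
## assigning; start and end swap roles on the '-' strand
new.start <- ifelse(minus, len - mir.locs$end_position + 1,
		mir.locs$start_position)
new.end <- ifelse(minus, len - mir.locs$start_position + 1,
		mir.locs$end_position)

mir.rev <- transform(mir.locs, start_position = new.start,
		end_position = new.end)

## the flip must not change feature widths
stopifnot(all(mir.rev$end_position - mir.rev$start_position ==
		mir.locs$end_position - mir.locs$start_position))
```

The width check at the end is a cheap way to catch the start/end mix-up
that is easy to make when one column is overwritten before the other is
computed.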

Best,

Jim


-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826