[BioC] Rsamtools question
Valerie Obenchain
vobencha at fhcrc.org
Tue Oct 4 17:58:56 CEST 2011
Hi Alpesh,
One approach would be to
I. Read in snps with readGappedAlignments :
Convert your sam file to a bam using samtools from the command line,
samtools view -bS -o yourfile.bam yourfile.sam
and read into R with readGappedAlignments
library(GenomicFeatrures)
?readGappedAlignments
II. Get gene ID :
Use an org annotation package to get the gene id you are interested in.
There are many ID types available (Entrez, Ensembl, RefGene, etc.). You
will use this ID to identify the gene of interest in the TxDb package
(below).
Many org packages are available, see all starting with "org",
http://bioconductor.org/packages/devel/data/annotation/
For the human package,
library(org.Hs.eg.db)
?org.Hs.eg.db
III. Get the genomic coordinates for the gene of interest :
The TxDb packages have genomic coordinates as well as transcript, exon
and gene ID's. See packages starting with "TxDb",
http://bioconductor.org/packages/devel/data/annotation/
For example, the human hg19 known genes from UCSC are in the following
package. Since this is a UCSC package the geneID's will be RefGene ID's.
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
ls(2) # to see the name of the txdb object
If you want to make your own txdb see
?makeTranscriptDb
There are several functions available for organizing you data by exons,
transcripts or gene, see
?transcriptsBy
IV. Counting
You now have the genomic coordinates of your gene (from txdb package)
and your snps (read in with readGappedAlignments).
Several counting options are available,
library(GenomicRanges)
?summarizeOverlaps # avoids double counting, available in R-devel
version
?countOverlaps
?findOverlaps
Valerie
On 10/02/2011 08:28 AM, Alpesh Querer wrote:
> Hi All,
>
> I am a newbie with R and Bioconductor. I have .sam files and I want to count
> the number of reads mapping to particular SNP locations on the .fa gene.
> Also I want to avoid double counting the SNPs which are covered by the same
> 36bp read.
> Any help is highly appreciated.
>
> Thanks,
> Alpesh
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list