[BioC] Rsamtools question
Valerie Obenchain
vobencha at fhcrc.org
Tue Oct 4 18:11:20 CEST 2011
On 10/04/2011 08:58 AM, Valerie Obenchain wrote:
> Hi Alpesh,
>
> One approach would be to
>
> I. Read in snps with readGappedAlignments :
>
> Convert your sam file to a bam using samtools from the command line,
>
> samtools view -bS -o yourfile.bam yourfile.sam
or from within R -- Rsamtools::asBam
>
> and read into R with readGappedAlignments
>
> library(GenomicFeatrures)
> ?readGappedAlignments
>
>
> II. Get gene ID :
>
> Use an org annotation package to get the gene id you are interested
> in. There are many ID types available (Entrez, Ensembl, RefGene,
> etc.). You will use this ID to identify the gene of interest in the
> TxDb package (below).
> Many org packages are available, see all starting with "org",
>
> http://bioconductor.org/packages/devel/data/annotation/
>
> For the human package,
>
> library(org.Hs.eg.db)
> ?org.Hs.eg.db
>
>
> III. Get the genomic coordinates for the gene of interest :
> The TxDb packages have genomic coordinates as well as transcript, exon
> and gene ID's. See packages starting with "TxDb",
>
> http://bioconductor.org/packages/devel/data/annotation/
>
> For example, the human hg19 known genes from UCSC are in the following
> package. Since this is a UCSC package the geneID's will be RefGene ID's.
>
> library(GenomicFeatures)
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> ls(2) # to see the name of the txdb object
>
> If you want to make your own txdb see
>
> ?makeTranscriptDb
>
> There are several functions available for organizing you data by
> exons, transcripts or gene, see
>
> ?transcriptsBy
>
>
> IV. Counting
>
> You now have the genomic coordinates of your gene (from txdb package)
> and your snps (read in with readGappedAlignments).
> Several counting options are available,
>
> library(GenomicRanges)
> ?summarizeOverlaps # avoids double counting, available in R-devel
> version
> ?countOverlaps
> ?findOverlaps
also Rsamtools::applyPileups which gives bases over genomic coordinates
Valerie
>
>
> Valerie
>
>
>
> On 10/02/2011 08:28 AM, Alpesh Querer wrote:
>> Hi All,
>>
>> I am a newbie with R and Bioconductor. I have .sam files and I want
>> to count
>> the number of reads mapping to particular SNP locations on the .fa gene.
>> Also I want to avoid double counting the SNPs which are covered by
>> the same
>> 36bp read.
>> Any help is highly appreciated.
>>
>> Thanks,
>> Alpesh
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list