[Bioc-devel] Bioc needs better support for variants

Martin Morgan mtmorgan at fhcrc.org
Sat Apr 16 05:41:42 CEST 2011


On 04/15/2011 01:00 PM, Michael Lawrence wrote:
>
>
> On Fri, Apr 15, 2011 at 7:19 AM, Martin Morgan <mtmorgan at fhcrc.org
> <mailto:mtmorgan at fhcrc.org>> wrote:
>
>     On 04/15/2011 06:05 AM, Vincent Carey wrote:
>
>         I will comment on my limited view and progress. I need to work from
>         an exemplar. I committed cheung2010 in the experimental data
>         archive (devel only). This relates to PMID 20856902, genetics of
>         expression in immortalized B cells.
>
>         There are 147 individuals with HapMap phase 3 genotypes and hgfocus
>         arrays :-( but about 45 have RNA-seq data in GEO. FASTQ files are
>         available via the SRAtools fastq-dump, and you can get the SRA data
>         reasonably quickly using ascp. I will eventually make a sample from
>         their RNA-seq data available in this package to look at SNP-driven
>         allele-specific expression and other aspects of SNP-dependent
>         expression regulation.
>
>         There is probably DNA-seq data out there on these Coriell cell
>         lines, but for the moment I will be looking at the chip-based SNPs
>         and imputation on those. Better representations for 8 million SNPs
>         per sample would probably come in handy, but breaking them up by
>         chromosome in SnpMatrix instances is OK so far. I think we have to
>         recognize that in any of these paradigms discrete calls are often
>         not going to cut it, and uncertainty representations will be
>         important.
>
>         VCF representations of indels in 1000 Genomes are available, but I
>         don't know that we have good tools for importing and modeling them.
>         This is another exemplar that should be considered.
>
>         On Fri, Apr 15, 2011 at 7:16 AM, Michael Lawrence
>         <lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>> wrote:
>
>             Hi guys,
>
>             Congrats on the release. For this next one, one focus, in my
>             opinion, should be on analyzing variants in the context of
>             sequencing data. This includes infrastructure for things like
>             calling variants (in DNA and RNA), as well as determining their
>             effects (e.g., coding and splicing changes). It would be good if
>             we could come up with a plan. If we had one, we could commit
>             some resources here to the problem.
>
>             Is anyone willing to help out on this? What do you guys think?
>
>
>     We could certainly play a role in annotation of variants and support
>     for interfacing with established third-party formats, and obviously
>     also in the representation of variants, which overlaps with the
>     IRanges / Biostrings infrastructure.
>
>     Martin
>
>
>
> Great, this is in line with what I was thinking. We need a way to
> formally represent sets of variants, as well as transcripts and proteins
> (i.e., something based on a GRanges[List]). Then we can map between
> coordinate systems and request the consequences of mutations. I was
> looking at the Ensembl Variation Perl API; it might be good for
> inspiration.
>
> Is there somewhere like a wiki where we could start hashing this out?

Started a page here:

http://wiki.fhcrc.org/bioc/Variant_Calls

Individuals should be able to create their own accounts from the links at
the very top of the page.

Martin

>
> Michael
>
>
>
>             Thanks,
>             Michael
>
>
>             _______________________________________________
>             Bioc-devel at r-project.org <mailto:Bioc-devel at r-project.org>
>             mailing list
>             https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>
>
>     --
>     Computational Biology
>     Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
>     Location: M1-B861
>     Telephone: 206 667-2793
>
>




