[Bioc-devel] Bioc needs better support for variants
Martin Morgan
mtmorgan at fhcrc.org
Sat Apr 16 05:41:42 CEST 2011
On 04/15/2011 01:00 PM, Michael Lawrence wrote:
>
>
> On Fri, Apr 15, 2011 at 7:19 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
> On 04/15/2011 06:05 AM, Vincent Carey wrote:
>
> I will comment on my limited view and progress. I need to work from
> an exemplar. I committed cheung2010 to the experimental data archive
> (devel only). This relates to PMID 20856902, genetics of expression
> in immortalized B cells.
>
> There are 147 individuals with HapMap Phase 3 genotypes and hgfocus
> arrays :-( but about 45 have RNA-seq data in GEO. FASTQ files can be
> extracted with the SRA Toolkit's fastq-dump, and you can get the SRA
> data reasonably quickly using ascp. I will eventually make a sample
> from their RNA-seq data available in this package to look at
> SNP-driven allele-specific expression and other aspects of
> SNP-dependent expression regulation.
>
> Probably there is DNA-seq data out there on these Coriell cell lines,
> but for the moment I will be looking at the chip-based SNPs and
> imputation on those. Better representations for 8 million SNPs per
> sample would probably come in handy, but breaking them up by
> chromosome in SnpMatrix instances is OK so far. I think we have to
> recognize that in any of these paradigms discrete calls are often not
> going to cut it, and uncertainty representations will be important.
>
> VCF representations of indels in 1000 Genomes are available, but I
> don't know that we have good tools for importing and modeling those.
> That is another exemplar that should be considered.
>
> On Fri, Apr 15, 2011 at 7:16 AM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
> Hi guys,
>
> Congrats on the release. For this next one, one focus, in my
> opinion, should be on analyzing variants in the context of
> sequencing data. This includes infrastructure for things like
> calling variants (in DNA and RNA), as well as determining their
> effects (e.g., coding and splicing changes). It would be good if we
> could come up with a plan. If we had one, we could commit some
> resources here to the problem.
>
> Is anyone willing to help out on this? What do you guys think?
>
>
> We could certainly play a role in the annotation of variants and in
> support for interfacing with established third-party formats, and
> obviously also in the representation of variants, where it overlaps
> with the IRanges / Biostrings infrastructure.
>
> Martin
>
>
>
> Great, this is in line with what I was thinking. We need a way to
> formally represent sets of variants, as well as transcripts and proteins
> (i.e., something based on GRanges[List]). Then we can map between
> coordinate systems and request the consequences of mutations. I was
> looking at the Ensembl Variation Perl API; it might be good for
> inspiration.
>
> Is there somewhere like a wiki where we could start hashing this out?
Started a page here:
http://wiki.fhcrc.org/bioc/Variant_Calls
Individuals should be able to create their own accounts from the links
at the very top of the page.
Martin
>
> Michael
>
>
>
> Thanks,
> Michael
>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
>