[BioC] GWASTools suggestions: explicit interface for GenotypeReaders,

Stephanie M. Gogarten sdmorris at u.washington.edu
Fri Dec 20 00:45:31 CET 2013


Hi Karl,

I updated the documentation for the functions that assume chromosomes are in blocks (this had been on my to-do list for a while).  The other things are good ideas but we don't have time for them right now. If you wanted to work on them and send us some code, we'd be happy to incorporate it. 

Stephanie 

On 12/19/13 2:20 AM, Karl Forner wrote:
> Hello,
> 
> Explicit interface for GenotypeReaders
> --------------------------------------------------
> I am a big fan of the GenotypeData object architecture, which lets a single
> object type work with any representation or storage of the actual
> genotypes through its GenotypeReader concept.
> 
> But from what I've seen, the different readers merely follow a common
> interface that is not clearly defined.
> For example, the method hasVariable() is not available for
> MatrixGenotypeReader.
> When developing functions that take a GenotypeData as an argument, it is
> important to know which interface is safe to use.
> 
> I believe this is a very good case for an abstract class GenotypeReader
> that each specialized reader should derive from.
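
A minimal sketch of what such an abstract reader could look like with S4, in base R only. Every name here (AbstractGenotypeReader, ToyMatrixReader, snpIDs, scanIDs, genotypes, missingInterface) is hypothetical and chosen for illustration; none of it is the GWASTools API. The idea is that the interface becomes a fixed list of generics, and a small helper reports which of them a given reader class does not implement.

```r
library(methods)

# Hypothetical virtual base class: it cannot be instantiated, only extended.
setClass("AbstractGenotypeReader", representation("VIRTUAL"))

# The interface is a set of generics with no default method (illustrative names).
setGeneric("snpIDs", function(reader) standardGeneric("snpIDs"))
setGeneric("scanIDs", function(reader) standardGeneric("scanIDs"))
setGeneric("genotypes",
           function(reader, snp, scan) standardGeneric("genotypes"),
           signature = "reader")

# A toy in-memory reader that implements the whole interface.
setClass("ToyMatrixReader",
         contains = "AbstractGenotypeReader",
         slots = c(geno = "matrix", snpID = "integer", scanID = "integer"))

setMethod("snpIDs", "ToyMatrixReader", function(reader) reader@snpID)
setMethod("scanIDs", "ToyMatrixReader", function(reader) reader@scanID)
setMethod("genotypes", "ToyMatrixReader",
          function(reader, snp, scan) reader@geno[snp, scan, drop = FALSE])

# Names of the interface generics that a class does not define directly.
missingInterface <- function(className,
                             generics = c("snpIDs", "scanIDs", "genotypes")) {
  implemented <- vapply(generics,
                        function(g) existsMethod(g, className),
                        logical(1))
  generics[!implemented]
}

missingInterface("ToyMatrixReader")  # character(0): nothing is missing
```

A check like missingInterface() could run in a package's unit tests, so that a reader lacking part of the interface is caught before any function depends on it.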
> 
> 
> sorted GenotypeData
> ---------------------------
> I realized that some functions rely on the SNPs being sorted by chromosome.
> In assocTestRegression(), for instance, these lines of code give wrong
> results if the chromosomes are not sorted.
> 
>     chrom <- getChromosome(genoData)
>     unique_chrom <- unique(chrom)
>     nChromosomes <- max(chrom)
>     rle_chrom <- rle(chrom)
>     rle_chrom2 <- rep(0, nChromosomes)
>     rle_chrom2[unique_chrom] <- rle_chrom$lengths
> 
> 
> I think that either the function documentation should clearly state that
> it takes sorted genotype data as its argument, or a stronger assumption,
> that all GenotypeData must be sorted, should be enforced.
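
A minimal sketch of a guard for this assumption, using base R only. The helper names chromInBlocks and snpsPerChrom are hypothetical, and the sketch assumes integer chromosome codes starting at 1. As far as I can tell, the quoted snippet only needs each chromosome to occupy one contiguous block of SNPs; ascending order is not required, so the check below tests for blocks rather than for a full sort.

```r
# TRUE when each chromosome code occupies a single contiguous run of SNPs.
chromInBlocks <- function(chrom) {
  anyDuplicated(rle(chrom)$values) == 0L
}

# SNP counts per chromosome code, independent of the order of the SNPs.
snpsPerChrom <- function(chrom) {
  tabulate(chrom, nbins = max(chrom))
}

blocked   <- c(2L, 2L, 1L, 1L, 1L)  # contiguous blocks, not ascending
scattered <- c(1L, 1L, 2L, 2L, 1L)  # chromosome 1 appears in two runs

chromInBlocks(blocked)    # TRUE
chromInBlocks(scattered)  # FALSE
snpsPerChrom(scattered)   # 3 2
```

A function relying on blocks could call stopifnot(chromInBlocks(chrom)) up front, or use an order-independent count such as tabulate() and drop the assumption altogether.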
> 
> 
> subset Genotype Reader
> ---------------------------------
> A useful addition would be a SubsetGenotypeReader that takes as
> arguments an existing GenotypeReader instance and lists of snpIDs and
> scanIDs.
> This reader would act as a database view on the data, and would allow
> subsetted data to be used with all functions taking GenotypeData arguments.
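
A minimal sketch of the view idea with S4, in base R only. All names (ToyMatrixReader, SubsetReaderSketch, subsetReader, snpIDs, scanIDs, genotypes) are hypothetical and are not GWASTools functions. The view stores only the parent reader and the positions of the selected SNPs and scans; genotypes are fetched from the parent on demand, so nothing is copied.

```r
library(methods)

# Illustrative generics only; a real reader would use the package's accessors.
setGeneric("snpIDs", function(reader) standardGeneric("snpIDs"))
setGeneric("scanIDs", function(reader) standardGeneric("scanIDs"))
setGeneric("genotypes",
           function(reader, snp, scan) standardGeneric("genotypes"),
           signature = "reader")

# Toy parent reader backed by an in-memory matrix (SNPs in rows, scans in columns).
setClass("ToyMatrixReader",
         slots = c(geno = "matrix", snpID = "integer", scanID = "integer"))
setMethod("snpIDs", "ToyMatrixReader", function(reader) reader@snpID)
setMethod("scanIDs", "ToyMatrixReader", function(reader) reader@scanID)
setMethod("genotypes", "ToyMatrixReader",
          function(reader, snp, scan) reader@geno[snp, scan, drop = FALSE])

# The view: a parent reader plus positions of the selected SNPs and scans.
setClass("SubsetReaderSketch",
         slots = c(parent = "ANY", snpIdx = "integer", scanIdx = "integer"))

subsetReader <- function(parent, snpID, scanID) {
  snpIdx <- match(snpID, snpIDs(parent))
  scanIdx <- match(scanID, scanIDs(parent))
  if (anyNA(snpIdx) || anyNA(scanIdx)) stop("unknown snpID or scanID")
  new("SubsetReaderSketch", parent = parent, snpIdx = snpIdx, scanIdx = scanIdx)
}

setMethod("snpIDs", "SubsetReaderSketch",
          function(reader) snpIDs(reader@parent)[reader@snpIdx])
setMethod("scanIDs", "SubsetReaderSketch",
          function(reader) scanIDs(reader@parent)[reader@scanIdx])
# snp and scan are positions within the view; translate them to parent positions.
setMethod("genotypes", "SubsetReaderSketch",
          function(reader, snp, scan)
            genotypes(reader@parent, reader@snpIdx[snp], reader@scanIdx[scan]))

geno <- matrix(0:11 %% 3L, nrow = 4)
parent <- new("ToyMatrixReader", geno = geno, snpID = 101:104, scanID = 1:3)
view <- subsetReader(parent, snpID = c(104L, 102L), scanID = c(3L, 1L))
genotypes(view, 1:2, 1:2)  # same values as geno[c(4, 2), c(3, 1)]
```

Because the view answers the same generics as its parent, any function written against those generics would accept either one without change.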
> 
> 
> 
> These are suggestions, and I realize that implementing them requires work,
> but if you ever need it, I could contribute some code.
> 
> Thanks for your attention
> 
> Karl Forner
> 