[R] Reshaping genetic data from long to wide

Farrel Buchinsky fbuchins at wpahs.org
Thu Apr 6 18:27:08 CEST 2006


Bottom Line Up Front: How does one reshape genetic data from long to wide?

I currently have a lot of data: about 180 individuals (some
probands/patients, some parents, a few siblings) with SNP data from 6000
loci on each. The standard formats seem to be something along the lines of
Famid, pid, fatid, motid, affected, sex, locus1Allele1, locus1Allele2,
locus2Allele1, locus2Allele2, etc.

In other words, one human, one row. With multiple loci, the variables keep
piling up on the right. I shall refer to this orientation as "wide".

Given how big my dataset is, it is easier to manage the data in the database
in the "long" format. In this format I have a pedigree table with a
one-to-many relationship to the SNP table, which has the fields:
uniqueHumanID, Allele1, Allele2, locus

That makes for an incredibly long table.

The data are stored in a Sybase database that I communicate with through
ODBC using Microsoft Access. The RODBC package then reads the queries that I
have created in Microsoft Access. The only reason for using Microsoft Access
is that I have well over a decade's worth of experience using it at an
intermediate level.

With the magic of SQL I can mix and match these tables. But creating the
table that is 180 rows long and about 12010 variables wide is daunting.
Essentially, the 6000 SNPs mean each human has 12000 repeated measures
(6000 SNPs times 2 alleles).

I presume I would be able to use the reshape function in R:
"Reshape Grouped Data
Description
This function reshapes a data frame between 'wide' format with repeated
measurements in separate columns of the same record and 'long' format with
the repeated measurements in separate records. "
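
For concreteness, here is a minimal sketch of the kind of reshape() call I
have in mind, on a made-up three-person, two-locus example. The column names
follow the fields listed above; the IDs, locus names, allele values, and
pedigree codes are invented purely for illustration, and I have not tried
this at the scale of 6000 loci.

```r
# Toy stand-in for the long SNP table (values are invented).
snp_long <- data.frame(
  uniqueHumanID = rep(c("H1", "H2", "H3"), each = 2),
  locus         = rep(c("rs1", "rs2"), times = 3),
  Allele1       = c("A", "C", "A", "T", "G", "C"),
  Allele2       = c("G", "C", "A", "T", "G", "T"),
  stringsAsFactors = FALSE
)

# Long to wide: one row per human, and for every locus one Allele1 column
# and one Allele2 column (named Allele1_rs1, Allele2_rs1, ...).
snp_wide <- reshape(
  snp_long,
  idvar     = "uniqueHumanID",
  timevar   = "locus",
  v.names   = c("Allele1", "Allele2"),
  direction = "wide",
  sep       = "_"
)

# Toy pedigree table: H1 is a proband, H2 and H3 are the parents.
# The sex and affected codes here are an assumed coding, not a standard.
ped <- data.frame(
  Famid    = c(1, 1, 1),
  pid      = c("H1", "H2", "H3"),
  fatid    = c("H2", NA, NA),
  motid    = c("H3", NA, NA),
  affected = c(2, 1, 1),
  sex      = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Attach the pedigree columns on the left of the wide genotype columns.
ped_wide <- merge(ped, snp_wide, by.x = "pid", by.y = "uniqueHumanID")
```

With the real data the same call would have to produce roughly 12000 allele
columns, which is the part I am unsure about.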


But before I launch into this:

Is there a way that either the Warnes package (genetics) or the David
Clayton package can handle the data in the long form?
If not, do any of the packages reshape the data in a way that is pedigree-
and genotype-aware? The general R reshape function was not designed with
genetic data in mind.

Farrel Buchinsky, MD --- Mobile (412) 779-1073
Pediatric Otolaryngologist
Allegheny General Hospital
Pittsburgh, PA 

