[R-sig-genetics] R-sig-genetics Digest, Vol 8, Issue 2

Ross ross.lazarus at gmail.com
Fri Sep 10 20:51:42 CEST 2010


Hi, Fernando,

I think there are some QC tools in David Clayton's snpMatrix package,
but there's no single R package to do all the reports you need AFAIK.
For comprehensive reporting, if you don't mind not using R, one option
is to try the SNP/WGA tools in Galaxy - they do use R for graphics but
you don't need to install anything as it all works through an ordinary
web browser.

Essentially, if you have your genotype and pedigree data in Plink
style linkage format (separate map and ped files), the steps are
something like this:

1. Make yourself a new user account at the main Galaxy server
(http://usegalaxy.org) so your histories are preserved between logins.

2. From the analysis window, left (tool) pane, click the Get Data tool
group header to expand the group, then click the 'upload file' tool.
A form will appear in the center pane of your browser.

3. Change the file format (first field on the form) from Auto to
"lped" format, as autodetect won't work for these multi-part datatypes.

4. Make the 'ped' and 'map' file upload fields point to the right ped
and map files on your local machine, set the 'build' to hg18, change
the name to something informative about your data, then click execute.

5. After the data are uploaded to your history (should only take a
minute or two for a small file), you can select the SNP/WGA QC LD Plots
tool submenu in the tool pane and then click the QC tool. Another form
will open in the center pane. Your new dataset should be the only one
available in the drop-down list of files to process. Change the QC job
name to something meaningful and click 'execute'. For a small dataset,
the whole process should take a few minutes, but you can safely log out
and log back in later - your work will all be preserved.

6. The QC tool output (in the right side history pane) has an 'eye'
icon which you can click to open up the report in the center pane -
you should see HWE/missingness/Mendel and all sorts of other useful
plots, plus some tabular files containing summary details by
marker and by sample.
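
If you want to sanity-check your map and ped files before uploading, something like the rough Python sketch below might help. It is not what the Galaxy QC tool runs. It assumes the usual Plink text layout (4 whitespace-separated columns per map line; 6 leading columns per ped line followed by two alleles per marker, with "0" as the missing allele code), and the function names are made up for illustration.

```python
def parse_map(lines):
    """Return marker IDs from map lines: chrom, marker ID, genetic distance, bp position."""
    markers = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 map columns, got {len(fields)}")
        markers.append(fields[1])
    return markers


def parse_ped(lines, n_markers):
    """Return a list of ((family_id, individual_id), genotypes) pairs.

    Each ped line is assumed to hold family ID, individual ID, father ID,
    mother ID, sex and phenotype, then two alleles for every marker.
    """
    expected = 6 + 2 * n_markers
    samples = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != expected:
            raise ValueError(f"expected {expected} ped columns, got {len(fields)}")
        alleles = fields[6:]
        genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
        samples.append(((fields[0], fields[1]), genotypes))
    return samples


def marker_missingness(samples, markers, missing="0"):
    """Fraction of samples with a missing call at each marker."""
    rates = {}
    for j, marker in enumerate(markers):
        n_missing = sum(1 for _, genotypes in samples if missing in genotypes[j])
        rates[marker] = n_missing / len(samples)
    return rates
```

A column count that disagrees between the two files is the kind of problem worth catching before the upload rather than after.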

I'm happy to answer any questions you might have - I hope this helps
get you started.

There's a 'clean' tool you can use to remove markers and subjects that
fall below specific thresholds for QC measures and there's a TDT tool
you can use for analysis of family data.
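
To give a feel for what threshold-based cleaning means, here is a minimal Python sketch of the idea. It is not the Galaxy 'clean' tool itself. It assumes you already have, per marker, a missing-call rate and the three genotype counts; the HWE check is a plain 1-df chi-square test (not an exact test), and the thresholds and function names are purely illustrative, not the tool's defaults.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def clean_markers(stats, max_missing=0.05, min_hwe_p=1e-4):
    """Keep markers that pass both (illustrative) thresholds.

    stats maps marker ID -> (missing_rate, (n_aa, n_ab, n_bb)).
    """
    keep = []
    for marker, (missing_rate, counts) in stats.items():
        if missing_rate > max_missing:
            continue  # too many missing calls
        if hwe_chi2_pvalue(*counts) < min_hwe_p:
            continue  # strong departure from HWE
        keep.append(marker)
    return keep
```

For family data you would normally compute the HWE test on founders only, since related individuals are not independent observations.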


On Fri, Sep 10, 2010 at 6:00 AM,  <r-sig-genetics-request at r-project.org> wrote:
> Message: 1
> Date: Thu, 9 Sep 2010 10:54:23 -0500
> From: "GRIGNOLA, FERNANDO E [AG/1000]"
>        <fernando.e.grignola at monsanto.com>
> To: <r-sig-genetics at r-project.org>
> Subject: [R-sig-genetics] Available R packages to QC/summarize large
>        SNP datasets?
> Message-ID:
>        <2335BBF3D6579746A3165D5D7B95DF7018D1BB at NA1000EXM12.na.ds.monsanto.com>
>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi,
>
> I am looking for some guidance regarding available packages to do QC (e.g.
> parent-offspring inheritance checks) and to summarize large SNP datasets
> (e.g. 50K) for pedigreed populations (frequencies, linkage calculations,
> etc).
>
> I know of some packages that check for HW equilibrium and get frequencies
> as part of the data preparation for genome-wide association analyses, for
> example. However, I was wondering if somebody can point me to 1 or 2
> packages that mostly focus on data quality and summary statistics for
> large SNP datasets.
>
> Thanks in advance for your assistance,
>
>
>
> Fernando
>



-- 
Ross Lazarus MBBS MPH
Associate Professor, Harvard Medical School
Director of Bioinformatics, Channing Laboratory
181 Longwood Ave., Boston MA 02115, USA.
Tel: +1 617 505 4850


