[Bioc-devel] a day in the life of gwascat

Vincent Carey @tvjc @end|ng |rom ch@nn|ng@h@rv@rd@edu
Thu Apr 30 13:15:29 CEST 2020


Right, line 35265 of
http://www.ebi.ac.uk/gwas/api/search/downloads/alternative has an unclosed
double quote in a field (the one reading chr11:102751102"-?):

 35265 2019-04-10      30804558        Grove J 2019-02-25      Nat Genet
    www.ncbi.nlm.nih.gov/pubmed/30804558    Identification of common
genetic risk variants for autism spectrum disorder.    Autism spectrum
disorder        18,381 European ancestry cases, 27,969 European
ancestry controls       2,119 European ancestry cases, 142,379 European
ancestry controls                               Intergenic
                                            chr11:102751102"-?
chr11:102751102
0                       1       0.037   8E-6    5.096910013008056
        1.1641443       [NR]    Illumina [9112387] (imputed)    N       autism
spectrum disorder        http://www.ebi.ac.uk/efo/EFO_0003756
GCST007556      Genome-wide genotyping array
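
For anyone who wants to reproduce the effect without downloading 100MB,
here is a minimal sketch in base R. The file contents, column names and
variable names below are made up for illustration; only the quoted field
is taken from the record above. It first flags lines with an odd number
of double quotes, then compares the default read.delim() call with one
that turns quote handling off, as Martin suggested.

```r
# Toy stand-in for the catalog: a made-up 3-column TSV with 8 records.
tsv <- tempfile(fileext = ".tsv")
rows <- sprintf("%d\tchr1:%d\t0.0%d", 1:8, 100L * (1:8), 1:8)
# Record 6 gets a field with an unmatched double quote, as in the real file.
rows[6] <- "6\tchr11:102751102\"-?\t8E-6"
writeLines(c("id\tregion\tpvalue", rows), tsv)

# Step 1: flag lines containing an odd number of double quotes.
lines <- readLines(tsv)
n_quotes <- lengths(regmatches(lines, gregexpr("\"", lines, fixed = TRUE)))
suspect <- which(n_quotes %% 2 == 1)   # line numbers, counting the header

# Step 2: read with the default quote handling (quote = "\""), which lets
# the stray quote open a string that runs to the end of the file ...
bad <- suppressWarnings(read.delim(tsv, sep = "\t"))

# ... and with quoting disabled, so the quote is kept as ordinary data.
good <- read.delim(tsv, sep = "\t", quote = "", comment.char = "")

suspect      # the line holding the unmatched quote
nrow(bad)    # truncated relative to nrow(good)
nrow(good)   # one row per record
```

The odd-count check is only a heuristic: a line with two unrelated stray
quotes would pass it, so quote = "" is the safer setting when the file is
known not to use quoting at all.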

On Thu, Apr 30, 2020 at 6:59 AM Martin Morgan <mtmorgan.bioc using gmail.com>
wrote:

> I'd look instead at or around line 35264 for use of quotes, e.g., "3'
> DNA", and change the argument read.delim(quote = "") (though I never get
> that right so probably wrong again...). A comment character might also be a
> problem.
>
> If you point to the location of the file I could investigate further...
>
> Martin
>
> On 4/30/20, 6:55 AM, "Bioc-devel on behalf of Vincent Carey" <
> bioc-devel-bounces using r-project.org on behalf of stvjc using channing.harvard.edu>
> wrote:
>
>     The EBI GWAS catalog is large -- now the download is over 100MB for
>     179K associations.  A "bug" in the package was reported, so I
>     acquired the file by hand.
>
>     > nn = read.delim("gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv", sep="\t")
>
>     Warning message:
>     In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
>       EOF within quoted string
>
>     > dim(nn)
>
>     [1] 35264    38
>
>
>     The "bug" is the number 35264: far fewer rows than the 179K
>     associations in the file.
>
>
>     >
>
>     [1]+  Stopped                 R
>
>     %vjcair> wc gwas_cat*tsv
>
>       179365 13243516 120140148
>     gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv
>
>     %vjcair> vi gwas_cat*tsv
>
>     %vjcair> fg
>
>     R
>
>
>     > tail(nn)
>
>     Error: C stack usage  98161262 is too close to the limit
>
>
>     Maybe my R needs to be updated.
>
>     If I use data.table::fread to consume the tsv over HTTP all seems
>     well, and perhaps I will switch to that.
>
>     --
>     The information in this e-mail is intended only for the
> ...{{dropped:18}}
>
>     _______________________________________________
>     Bioc-devel using r-project.org mailing list
>     https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



