[Bioc-devel] a day in the life of gwascat
Vincent Carey
stvjc using channing.harvard.edu
Thu Apr 30 13:48:22 CEST 2020
This file trips up fread around record 170349, inconsistently; I haven't
figured that out yet.
Falling back to readLines and strsplit may be the ultimate solution.
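
A rough sketch of that fallback, to make the idea concrete. The helper name
read_tsv_by_hand and the toy input are made up for illustration (neither is
part of gwascat), and every column comes back as character, so this is a
starting point under those assumptions rather than a finished reader:

```r
# Hypothetical helper (not part of gwascat): split every line on tabs and
# treat quote characters as ordinary text.
read_tsv_by_hand <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]                 # drop blank lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  header <- fields[[1]]
  n <- length(header)
  body <- fields[-1]
  # strsplit() drops a trailing empty field, so short records are padded
  # with NA; records longer than the header are truncated and reported.
  long <- which(lengths(body) > n)
  if (length(long) > 0)
    warning("records with extra fields: ", paste(long, collapse = ", "))
  mat <- do.call(rbind, lapply(body, function(x) x[seq_len(n)]))
  out <- as.data.frame(mat, stringsAsFactors = FALSE)
  names(out) <- header
  out
}

# Toy input standing in for the catalog file; record 2 has an unclosed quote.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tsnp\tpvalue",
             "1\trs1\t0.01",
             "2\tchr11:102751102\"-?\t8E-6",
             "3\trs3\t0.2"), tmp)
toy <- read_tsv_by_hand(tmp)
dim(toy)
toy$snp[2]
```

Because nothing here interprets quotes, the stray quote is kept as part of
the field instead of swallowing the rest of the file.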
On Thu, Apr 30, 2020 at 7:15 AM Vincent Carey <stvjc using channing.harvard.edu>
wrote:
> right, line 35265 of
> http://www.ebi.ac.uk/gwas/api/search/downloads/alternative has an
> unclosed quote in a field.
>
> 35265 2019-04-10 30804558 Grove J 2019-02-25 Nat Genet
> www.ncbi.nlm.nih.gov/pubmed/30804558 Identification of
> common genetic risk variants for autism spectrum disorder. Autism
> spectrum disorder 18,381 European ancestry cases, 27,969
> European ancestry controls 2,119 European ancestry cases, 142,379
> European ancestry controls Intergenic
>
> chr11:102751102"-? chr11:102751102 0 1 0.037
> 8E-6 5.096910013008056 1.1641443 [NR] Illumina
> [9112387] (imputed) N autism spectrum disorder
> http://www.ebi.ac.uk/efo/EFO_0003756 GCST007556 Genome-wide
> genotyping array
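
One way to hunt for records like this without paging through the file is to
flag every line holding an odd number of double quotes. The helper name
odd_quote_lines and the toy file are assumptions for illustration only; a
line with two unrelated stray quotes would slip through, so treat it as a
first-pass check:

```r
# Hypothetical helper: line numbers whose count of double quotes is odd.
odd_quote_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  hits <- regmatches(lines, gregexpr("\"", lines, fixed = TRUE))
  which(lengths(hits) %% 2 == 1)
}

# Toy file: line 3 has an unclosed quote, line 4 a properly quoted field.
quote_demo <- tempfile(fileext = ".tsv")
writeLines(c("id\tsnp",
             "1\trs1",
             "2\tchr11:102751102\"-?",
             "3\t\"rs3\""), quote_demo)
bad_lines <- odd_quote_lines(quote_demo)
bad_lines
```

The numbers returned are file line numbers, header included, so they line up
with what an editor such as vi reports.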
>
> On Thu, Apr 30, 2020 at 6:59 AM Martin Morgan <mtmorgan.bioc using gmail.com>
> wrote:
>
>> I'd look instead at or around line 35264 for use of quotes, e.g., "3'
>> DNA", and change the argument read.delim(quote = "") (though I never get
>> that right so probably wrong again...). A comment character might also be a
>> problem.
>>
>> If you point to the location of the file I could investigate further...
>>
>> Martin
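
A small reproduction of the quote = "" suggestion, using a made-up 20-record
file rather than the real download (the file contents and object names are
assumptions for illustration). read.delim() already defaults to
comment.char = "", so that argument is spelled out only to make the intent
visible:

```r
# Toy file standing in for the catalog download: 20 records, record 10
# carries an unbalanced double quote like the one in the real file.
snp <- paste0("rs", 1:20)
snp[10] <- "chr11:102751102\"-?"
demo_tsv <- tempfile(fileext = ".tsv")
writeLines(c("id\tsnp\tpvalue",
             paste(1:20, snp, "0.05", sep = "\t")), demo_tsv)

# Default read.delim() treats " as a quote character, so the stray quote
# opens a quoted string that runs to the end of the file.
default_read <- suppressWarnings(read.delim(demo_tsv, sep = "\t"))

# With quoting switched off, the quote is read as ordinary text.
literal_read <- read.delim(demo_tsv, sep = "\t", quote = "",
                           comment.char = "", stringsAsFactors = FALSE)

nrow(default_read)
nrow(literal_read)
literal_read$snp[10]
```

With quoting disabled all 20 records come back; the default call stops short
of that, which matches the truncated row count reported for the real file.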
>>
>> On 4/30/20, 6:55 AM, "Bioc-devel on behalf of Vincent Carey" <
>> bioc-devel-bounces using r-project.org on behalf of stvjc using channing.harvard.edu>
>> wrote:
>>
>> The EBI GWAS catalog is large -- the download is now over 100MB for 179K
>> associations. A "bug" in the package was reported, so I acquired the file
>> by hand.
>>
>> > nn = read.delim("gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv",
>>     sep="\t")
>>
>> Warning message:
>> In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
>>   EOF within quoted string
>>
>> > dim(nn)
>> [1] 35264 38
>>
>>
>> The "bug" is the number 35264 ...
>>
>>
>> >
>>
>> [1]+ Stopped R
>>
>> %vjcair> wc gwas_cat*tsv
>>
>> 179365 13243516 120140148 gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv
>>
>> %vjcair> vi gwas_cat*tsv
>>
>> %vjcair> fg
>>
>> R
>>
>>
>> > tail(nn)
>>
>> Error: C stack usage 98161262 is too close to the limit
>>
>> Maybe my R needs to be updated.
>>
>> If I use data.table::fread to consume the tsv over HTTP all seems well,
>> and perhaps I will switch to that.
>>
>> --
>> The information in this e-mail is intended only for the
>> ...{{dropped:18}}
>>
>> _______________________________________________
>> Bioc-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>