[R-sig-Debian] read.csv fails in R console in Ubuntu terminal but works in RStudio after R 3.6.3 upgrade to R 4.0.2
David Winsemius
dw|n@em|u@ @end|ng |rom comc@@t@net
Thu Jul 16 04:15:02 CEST 2020
On 7/15/20 1:35 PM, Dirk Eddelbuettel wrote:
> On 15 July 2020 at 16:16, Sam H wrote:
> | I am trying to download some data using read.csv and it works perfectly in
> | RStudio and fails in the R console in the terminal in Ubuntu 18.04 after
> | upgrading from R 3.6.3 to 4.0.2. Before upgrading this worked in the R
> | console in the terminal also without any issues.
> |
> | Why would that be? How to fix this?
> |
> | Below please find R code output and sessionInfo().
> |
> | *Works in RStudio*
> |
> | > read.csv("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download", header=TRUE, as.is=TRUE, na="n/a")
>
> Ok, let's stop right here. First off, for good debugging it helps to separate
>
> - downloading a file via R from
> - reading a file
> - maybe varying the arguments you give there
>
> In my case this got easier. I clicked on the link (in Ubuntu 20.04) and it
> downloaded it. From there few problems. `read.csv()` just reads it:

In fact one can use the fread approach directly, rather than first using
your system or your browser to download the copy:

z <- data.table::fread("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download",
                       header=TRUE)
Downloaded 486840 bytes...
> str(z)
Classes ‘data.table’ and 'data.frame': 3631 obs. of 9 variables:
 $ Symbol       : chr "TXG" "YI" "PIH" "PIHPP" ...
 $ Name         : chr "10x Genomics, Inc." "111, Inc." "1347 Property Insurance Holdings, Inc." "1347 Property Insurance Holdings, Inc." ...
 $ LastSale     : chr "90.93" "6.31" "4.528" "24.35" ...
 $ MarketCap    : chr "$8.94B" "$519.69M" "$27.48M" "n/a" ...
 $ IPOyear      : chr "2019" "2018" "2014" "n/a" ...
 $ Sector       : chr "Capital Goods" "Health Care" "Finance" "Finance" ...
 $ industry     : chr "Biotechnology: Laboratory Analytical Instruments" "Medical/Nursing Services" "Property-Casualty Insurers" "Property-Casualty Insurers" ...
 $ Summary Quote: chr "https://old.nasdaq.com/symbol/txg" "https://old.nasdaq.com/symbol/yi" "https://old.nasdaq.com/symbol/pih" "https://old.nasdaq.com/symbol/pihpp" ...
 $ V9           : logi NA NA NA NA NA NA ...
 - attr(*, ".internal.selfref")=<externalptr>

I had earlier seen the original `read.csv` example hang in Ubuntu
18.04 using R 3.6.1. I get the same result in either a terminal-hosted R
session or an RStudio R session.
(It does leave hanging the question of why `read.csv` is failing.)
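
Following Dirk's suggestion to separate the download from the parsing, a minimal sketch might look like the one below. The helper name `fetch_then_read` and its arguments are my own illustration, not something from the original report; `download.file()` and `read.csv()` are base R. I have not confirmed that this resolves the hang, only that it lets each step fail on its own.

```r
# Hypothetical helper: fetch the URL to a temporary file first, then parse it,
# so a stalled transfer and a parsing problem can be told apart.
fetch_then_read <- function(url, method = "auto", ...) {
  dest <- tempfile(fileext = ".csv")
  on.exit(unlink(dest), add = TRUE)   # remove the temporary copy on exit

  # download.file() returns 0 (invisibly) on success
  status <- download.file(url, destfile = dest, method = method, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)

  # Remaining arguments (header, as.is, na.strings, ...) go to read.csv()
  read.csv(dest, ...)
}
```

Calling it on the NASDAQ URL with different `method` values (for example `"libcurl"`, or `"wget"` if that tool is installed) may show whether the hang is in the transfer or in the parsing.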
--
David.
>
> edd@rob:~/Downloads$ Rscript -e 'data.table::fread("companylist.csv", header=TRUE)'
> Symbol Name LastSale MarketCap IPOyear Sector industry Summary Quote V9
> 1: TXG 10x Genomics, Inc. 88.91 $8.75B 2019 Capital Goods Biotechnology: Laboratory Analytical Instruments https://old.nasdaq.com/symbol/txg NA
> 2: YI 111, Inc. 6.64 $546.87M 2018 Health Care Medical/Nursing Services https://old.nasdaq.com/symbol/yi NA
> 3: PIH 1347 Property Insurance Holdings, Inc. 4.528 $27.48M 2014 Finance Property-Casualty Insurers https://old.nasdaq.com/symbol/pih NA
> 4: PIHPP 1347 Property Insurance Holdings, Inc. 24.8631 n/a n/a Finance Property-Casualty Insurers https://old.nasdaq.com/symbol/pihpp NA
> 5: TURN 180 Degree Capital Corp. 1.67 $51.97M n/a Finance Finance/Investors Services https://old.nasdaq.com/symbol/turn NA
> ---
> 3622: ZS Zscaler, Inc. 122.43 $15.98B 2018 Technology EDP Services https://old.nasdaq.com/symbol/zs NA
> 3623: ZUMZ Zumiez Inc. 25.55 $649.76M 2005 Consumer Services Clothing/Shoe/Accessory Stores https://old.nasdaq.com/symbol/zumz NA
> 3624: ZYNE Zynerba Pharmaceuticals, Inc. 3.41 $85.08M 2015 Health Care Major Pharmaceuticals https://old.nasdaq.com/symbol/zyne NA
> 3625: ZYXI Zynex, Inc. 26.22 $870.31M n/a Health Care Biotechnology: Electromedical & Electrotherapeutic Apparatus https://old.nasdaq.com/symbol/zyxi NA
> 3626: ZNGA Zynga Inc. 9.82 $10.54B 2011 Technology EDP Services https://old.nasdaq.com/symbol/znga NA
> edd@rob:~/Downloads$
>
> For kicks, same with data.table:
>
> edd@rob:~/Downloads$ Rscript -e 'str(read.csv("companylist.csv"))'
> 'data.frame': 3626 obs. of 9 variables:
> $ Symbol : chr "TXG" "YI" "PIH" "PIHPP" ...
> $ Name : chr "10x Genomics, Inc." "111, Inc." "1347 Property Insurance Holdings, Inc." "1347 Property Insurance Holdings, Inc." ...
> $ LastSale : chr "88.91" "6.64" "4.528" "24.8631" ...
> $ MarketCap : chr "$8.75B" "$546.87M" "$27.48M" "n/a" ...
> $ IPOyear : chr "2019" "2018" "2014" "n/a" ...
> $ Sector : chr "Capital Goods" "Health Care" "Finance" "Finance" ...
> $ industry : chr "Biotechnology: Laboratory Analytical Instruments" "Medical/Nursing Services" "Property-Casualty Insurers" "Property-Casualty Insurers" ...
> $ Summary.Quote: chr "https://old.nasdaq.com/symbol/txg" "https://old.nasdaq.com/symbol/yi" "https://old.nasdaq.com/symbol/pih" "https://old.nasdaq.com/symbol/pihpp" ...
> $ X : logi NA NA NA NA NA NA ...
> edd@rob:~/Downloads$
>
> So in short, if you have a problem, it is not likely coming from the Ubuntu
> binary for R 4.0.2 which I am running here.
>
> Maybe start by downloading the file? You could have firewall or other
> issues. We can't tell. And we can't reproduce the issue.
>
> Good luck, Dirk
>