[R-sig-Debian] read.csv fails in R console in Ubuntu terminal but works in RStudio after R 3.6.3 upgrade to R 4.0.2

David Winsemius dw|n@em|u@ @end|ng |rom comc@@t@net
Thu Jul 16 04:15:02 CEST 2020


On 7/15/20 1:35 PM, Dirk Eddelbuettel wrote:
> On 15 July 2020 at 16:16, Sam H wrote:
> | I am trying to download some data using read.csv and it works perfectly in
> | RStudio and fails in the R console in the terminal in Ubuntu 18.04 after
> | upgrading from R 3.6.3 to 4.0.2. Before upgrading this worked in the R
> | console in the terminal also without any issues.
> |
> | Why would that be? How to fix this?
> |
> | Below please find R code output and sessionInfo().
> |
> | *Works in RStudio*
> |
> | > read.csv("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download", header=TRUE, as.is=TRUE, na="n/a")
>
> Ok, let's stop right here.  First off, for good debugging it helps to separate
>
>   - downloading a file via R from
>   - reading a file
>   - maybe varying the arguments you give there
>
> In my case this got easier. I clicked on the link (in Ubuntu 20.04) and it
> downloaded the file. From there, few problems. `data.table::fread()` just reads it:


In fact one can use the fread approach directly, rather than first using 
your system or your browser to download the copy:


> z <- data.table::fread("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=1&render=download", header=TRUE)
 Downloaded 486840 bytes...
> str(z)
Classes ‘data.table’ and 'data.frame':    3631 obs. of  9 variables:
 $ Symbol       : chr  "TXG" "YI" "PIH" "PIHPP" ...
 $ Name         : chr  "10x Genomics, Inc." "111, Inc." "1347 Property Insurance Holdings, Inc." "1347 Property Insurance Holdings, Inc." ...
 $ LastSale     : chr  "90.93" "6.31" "4.528" "24.35" ...
 $ MarketCap    : chr  "$8.94B" "$519.69M" "$27.48M" "n/a" ...
 $ IPOyear      : chr  "2019" "2018" "2014" "n/a" ...
 $ Sector       : chr  "Capital Goods" "Health Care" "Finance" "Finance" ...
 $ industry     : chr  "Biotechnology: Laboratory Analytical Instruments" "Medical/Nursing Services" "Property-Casualty Insurers" "Property-Casualty Insurers" ...
 $ Summary Quote: chr  "https://old.nasdaq.com/symbol/txg" "https://old.nasdaq.com/symbol/yi" "https://old.nasdaq.com/symbol/pih" "https://old.nasdaq.com/symbol/pihpp" ...
 $ V9           : logi  NA NA NA NA NA NA ...
 - attr(*, ".internal.selfref")=<externalptr>
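
Note that the call above did not pass the "n/a" marker that the original `read.csv` call used, which is presumably why `IPOyear` comes back as character. A possible refinement, offered only as a sketch: `read_listing` is a hypothetical helper name, and dropping the all-NA trailing column (`V9` here) is my own choice rather than anything the data source requires.

```r
# Hypothetical wrapper around data.table::fread (requires the data.table package).
read_listing <- function(input, ...) {
  # Treat the literal string "n/a" as missing, as the original read.csv call did.
  z <- data.table::fread(input, header = TRUE, na.strings = "n/a", ...)
  # Flag columns that contain nothing but NA (such as V9 in the output above).
  empty <- vapply(z, function(col) all(is.na(col)), logical(1))
  # Keep only the columns that carry data.
  z[, !empty, with = FALSE]
}
```

Calling `read_listing()` on the URL above, or on a downloaded copy of the file, should then give missing values as NA instead of the string "n/a". Columns such as `MarketCap` would still be character because of the "$" prefix and the "B"/"M" suffixes.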


I had earlier seen the original example hang in Ubuntu 18.04 using
R 3.6.1. I get the same result in either a terminal-hosted R session or an
RStudio R session.

(It does leave hanging the question of why `read.csv` is failing.)
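
One way to narrow that down, following Dirk's suggestion to separate the download from the reading, is sketched below in base R. `fetch_then_read` and its `timeout_sec` argument are hypothetical names of mine, and I am assuming that the `timeout` option bounds a stalled transfer as described in `?download.file`.

```r
# Hypothetical helper: download to a temporary file first, then parse it,
# so that a failure can be attributed to one step or the other.
fetch_then_read <- function(url, timeout_sec = 30, ...) {
  old <- options(timeout = timeout_sec)  # assumed to limit how long a stalled transfer hangs
  on.exit(options(old), add = TRUE)      # restore the previous setting on exit

  dest <- tempfile(fileext = ".csv")
  on.exit(unlink(dest), add = TRUE)      # remove the temporary copy on exit

  # Step 1: download only.
  status <- tryCatch(
    download.file(url, destfile = dest, quiet = TRUE),
    error = function(e) stop("download step failed: ", conditionMessage(e), call. = FALSE)
  )
  if (status != 0L) stop("download step returned status ", status, call. = FALSE)

  # Step 2: parse the local copy; extra arguments are passed on to read.csv.
  tryCatch(
    read.csv(dest, ...),
    error = function(e) stop("parse step failed: ", conditionMessage(e), call. = FALSE)
  )
}
```

With the URL from the original post this would be called as `fetch_then_read(url, header = TRUE, as.is = TRUE, na.strings = "n/a")`. If it stops with "download step failed", the problem is on the network side (firewall, proxy, download method) rather than in `read.csv` itself.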

-- 

David.

>
> edd using rob:~/Downloads$ Rscript -e 'data.table::fread("companylist.csv", header=TRUE)'
>        Symbol                                   Name LastSale MarketCap IPOyear            Sector                                                     industry                       Summary Quote V9
>     1:    TXG                     10x Genomics, Inc.    88.91    $8.75B    2019     Capital Goods             Biotechnology: Laboratory Analytical Instruments   https://old.nasdaq.com/symbol/txg NA
>     2:     YI                              111, Inc.     6.64  $546.87M    2018       Health Care                                     Medical/Nursing Services    https://old.nasdaq.com/symbol/yi NA
>     3:    PIH 1347 Property Insurance Holdings, Inc.    4.528   $27.48M    2014           Finance                                   Property-Casualty Insurers   https://old.nasdaq.com/symbol/pih NA
>     4:  PIHPP 1347 Property Insurance Holdings, Inc.  24.8631       n/a     n/a           Finance                                   Property-Casualty Insurers https://old.nasdaq.com/symbol/pihpp NA
>     5:   TURN               180 Degree Capital Corp.     1.67   $51.97M     n/a           Finance                                   Finance/Investors Services  https://old.nasdaq.com/symbol/turn NA
>    ---
> 3622:     ZS                          Zscaler, Inc.   122.43   $15.98B    2018        Technology                                                 EDP Services    https://old.nasdaq.com/symbol/zs NA
> 3623:   ZUMZ                            Zumiez Inc.    25.55  $649.76M    2005 Consumer Services                               Clothing/Shoe/Accessory Stores  https://old.nasdaq.com/symbol/zumz NA
> 3624:   ZYNE          Zynerba Pharmaceuticals, Inc.     3.41   $85.08M    2015       Health Care                                        Major Pharmaceuticals  https://old.nasdaq.com/symbol/zyne NA
> 3625:   ZYXI                            Zynex, Inc.    26.22  $870.31M     n/a       Health Care Biotechnology: Electromedical & Electrotherapeutic Apparatus  https://old.nasdaq.com/symbol/zyxi NA
> 3626:   ZNGA                             Zynga Inc.     9.82   $10.54B    2011        Technology                                                 EDP Services  https://old.nasdaq.com/symbol/znga NA
> edd using rob:~/Downloads$
>
> For kicks, same with read.csv:
>
> edd using rob:~/Downloads$ Rscript -e 'str(read.csv("companylist.csv"))'
> 'data.frame':   3626 obs. of  9 variables:
>   $ Symbol       : chr  "TXG" "YI" "PIH" "PIHPP" ...
>   $ Name         : chr  "10x Genomics, Inc." "111, Inc." "1347 Property Insurance Holdings, Inc." "1347 Property Insurance Holdings, Inc." ...
>   $ LastSale     : chr  "88.91" "6.64" "4.528" "24.8631" ...
>   $ MarketCap    : chr  "$8.75B" "$546.87M" "$27.48M" "n/a" ...
>   $ IPOyear      : chr  "2019" "2018" "2014" "n/a" ...
>   $ Sector       : chr  "Capital Goods" "Health Care" "Finance" "Finance" ...
>   $ industry     : chr  "Biotechnology: Laboratory Analytical Instruments" "Medical/Nursing Services" "Property-Casualty Insurers" "Property-Casualty Insurers" ...
>   $ Summary.Quote: chr  "https://old.nasdaq.com/symbol/txg" "https://old.nasdaq.com/symbol/yi" "https://old.nasdaq.com/symbol/pih" "https://old.nasdaq.com/symbol/pihpp" ...
>   $ X            : logi  NA NA NA NA NA NA ...
> edd using rob:~/Downloads$
>
> So in short, if you have a problem, it is not likely coming from the Ubuntu
> binary for R 4.0.2 which I am running here.
>
> Maybe start by downloading the file?  You could have firewall or other
> issues. We can't tell. And we can't reproduce the issue.
>
> Good luck,  Dirk
>
