[Rd] read.csv

peter dalgaard pd@|gd @end|ng |rom gm@||@com
Tue Apr 16 14:03:40 CEST 2024


Hum...

This boils down to

> as.numeric("1.23e")
[1] 1.23
> as.numeric("1.23e-")
[1] 1.23
> as.numeric("1.23e+")
[1] 1.23

which in turn comes from this code in src/main/util.c (function R_strtod)

    if (*p == 'e' || *p == 'E') {
        int expsign = 1;
        switch(*++p) {
        case '-': expsign = -1;
        case '+': p++;
        default: ;
        }
        for (n = 0; *p >= '0' && *p <= '9'; p++) n = (n < MAX_EXPONENT_PREFIX) ? n * 10 + (*p - '0') : n;
        expn += expsign * n;
    }

which sets the exponent to zero even if the for loop terminates immediately.  

This might qualify as a bug, as it differs from the C function strtod which accepts

"A sequence of digits, optionally containing a decimal-point character (.), optionally followed by an exponent part (an e or E character followed by an optional sign and a sequence of digits)."

[Of course, there would be nothing to stop e.g. "1433E1" from being converted to numeric.]

-pd


> On 16 Apr 2024, at 12:46 , jing hua zhao <jinghuazhao using hotmail.com> wrote:
> 
> Dear R-developers,
> 
> I came to a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E" but to save space I drop the quote so it becomes,
> 
> Gene,SNP,prot,log10p
> YWHAE,13:62129097_C_T,1433E,7.35
> YWHAE,4:72617557_T_TA,1433E,7.73
> 
> Both read.cv() and readr::read_csv() consider prot(ein) name as (possibly confused by scientific notation) numeric 1433 which only alerts me when I tried to combine data,
> 
> all_data <- data.frame()
> for (protein in proteins[1:7])
> {
>   cat(protein,":\n")
>   f <- paste0(protein,".csv")
>   if(file.exists(f))
>   {
>     p <- read.csv(f)
>     print(p)
>     if(nrow(p)>0) all_data  <- bind_rows(all_data,p)
>   }
> }
> 
> proteins[1:7]
> [1] "1433B" "1433E" "1433F" "1433G" "1433S" "1433T" "1433Z"
> 
> dplyr::bind_rows() failed to work due to incompatible types nevertheless rbind() went ahead without warnings.
> 
> Best wishes,
> 
> 
> Jing Hua
> 
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com



More information about the R-devel mailing list