[Rd] read.csv
peter dalgaard
pdalgd at gmail.com
Tue Apr 16 14:03:40 CEST 2024
Hum...
This boils down to
> as.numeric("1.23e")
[1] 1.23
> as.numeric("1.23e-")
[1] 1.23
> as.numeric("1.23e+")
[1] 1.23
which in turn comes from this code in src/main/util.c (function R_strtod)
    if (*p == 'e' || *p == 'E') {
        int expsign = 1;
        switch(*++p) {
        case '-': expsign = -1;
        case '+': p++;
        default: ;
        }
        for (n = 0; *p >= '0' && *p <= '9'; p++) n = (n < MAX_EXPONENT_PREFIX) ? n * 10 + (*p - '0') : n;
        expn += expsign * n;
    }
which leaves the exponent at zero when the for loop terminates immediately, i.e. when no digits follow the "e" (and optional sign), instead of treating the input as malformed.
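To see the effect in isolation, here is a minimal standalone sketch of just that branch. The helper name lenient_exponent and the value given to MAX_EXPONENT_PREFIX are assumptions made for illustration; this is not the actual R_strtod, which also handles the mantissa, hex input, and so on.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for illustration; util.c has its own definition. */
#define MAX_EXPONENT_PREFIX 9999

/*
 * Hypothetical helper: parse an exponent suffix starting at p in the
 * same way as the quoted branch. Stores the signed exponent in *expn
 * and returns a pointer to the first character that was not consumed.
 */
static const char *lenient_exponent(const char *p, int *expn)
{
    int n;

    assert(p != NULL && expn != NULL);
    *expn = 0;
    if (*p == 'e' || *p == 'E') {
        int expsign = 1;
        switch (*++p) {
        case '-': expsign = -1; /* fall through */
        case '+': p++;
        default: ;
        }
        /* With no digits the loop body never runs and n stays 0. */
        for (n = 0; *p >= '0' && *p <= '9'; p++)
            n = (n < MAX_EXPONENT_PREFIX) ? n * 10 + (*p - '0') : n;
        *expn += expsign * n;
    }
    return p;
}
```

Called on "e", "e-" or "E+", the helper consumes the whole suffix and reports an exponent of 0, so a caller that only checks for leftover characters sees nothing wrong with "1.23e" or "1433E".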
This might qualify as a bug, as it differs from the C function strtod, which accepts
"A sequence of digits, optionally containing a decimal-point character (.), optionally followed by an exponent part (an e or E character followed by an optional sign and a sequence of digits)."
[Of course, there would be nothing to stop e.g. "1433E1" from being converted to numeric.]
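For comparison, the C library behaviour can be checked with the end pointer that strtod reports. This is only a sketch: the helper name parses_completely is made up here, and it assumes the default "C" locale (strtod also accepts leading whitespace, hex, "inf" and "nan", which this check does not filter out).

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns 1 if the whole of s is consumed by
 * strtod, 0 otherwise. On success the value is stored in *out.
 */
static int parses_completely(const char *s, double *out)
{
    char *end;
    double v;

    assert(s != NULL && out != NULL);
    v = strtod(s, &end);
    if (end == s || *end != '\0')
        return 0;   /* nothing parsed, or trailing characters left over */
    *out = v;
    return 1;
}
```

With this check "1.23e", "1.23e-" and "1433E" are rejected, because strtod stops before the dangling "e" and leaves it unconsumed, while "1433E1" is still accepted as 14330.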
-pd
> On 16 Apr 2024, at 12:46 , jing hua zhao <jinghuazhao using hotmail.com> wrote:
>
> Dear R-developers,
>
> I came across a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E", but to save space I dropped the quotes so it becomes,
>
> Gene,SNP,prot,log10p
> YWHAE,13:62129097_C_T,1433E,7.35
> YWHAE,4:72617557_T_TA,1433E,7.73
>
> Both read.csv() and readr::read_csv() treat the prot(ein) name as the number 1433 (possibly confused by scientific notation), which I only noticed when I tried to combine data,
>
> all_data <- data.frame()
> for (protein in proteins[1:7])
> {
>   cat(protein,":\n")
>   f <- paste0(protein,".csv")
>   if(file.exists(f))
>   {
>     p <- read.csv(f)
>     print(p)
>     if(nrow(p)>0) all_data <- bind_rows(all_data,p)
>   }
> }
>
> proteins[1:7]
> [1] "1433B" "1433E" "1433F" "1433G" "1433S" "1433T" "1433Z"
>
> dplyr::bind_rows() failed to work due to incompatible types; nevertheless, rbind() went ahead without warnings.
>
> Best wishes,
>
>
> Jing Hua
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk Priv: PDalgd using gmail.com