[Rd] read.csv

Dirk Eddelbuettel edd at debian.org
Tue Apr 16 12:52:09 CEST 2024


On 16 April 2024 at 10:46, jing hua zhao wrote:
| Dear R-developers,
| 
| I came across a somewhat unexpected behaviour of read.csv() which is trivial but worth noting -- my data involves a protein named "1433E", but to save space I dropped the quotes, so it becomes,
| 
| Gene,SNP,prot,log10p
| YWHAE,13:62129097_C_T,1433E,7.35
| YWHAE,4:72617557_T_TA,1433E,7.73
| 
| Both read.csv() and readr::read_csv() treat the prot(ein) name as numeric 1433 (possibly confused by scientific notation), which I only noticed when I tried to combine the data,
| 
| all_data <- data.frame()
| for (protein in proteins[1:7])
| {
|    cat(protein,":\n")
|    f <- paste0(protein,".csv")
|    if(file.exists(f))
|    {
|      p <- read.csv(f)
|      print(p)
|      if(nrow(p)>0) all_data  <- bind_rows(all_data,p)
|    }
| }
| 
| proteins[1:7]
| [1] "1433B" "1433E" "1433F" "1433G" "1433S" "1433T" "1433Z"
| 
| dplyr::bind_rows() failed due to incompatible types; nevertheless, rbind() went ahead without warnings.

You may want to consider aiding read.csv() (and alternative reading
functions) by supplying column-type info instead of relying on heuristic
type guesses, which fail here due to the nature of your data.
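As a minimal sketch of what supplying column types could look like with base
R, the following writes the sample rows from the original post to a temporary
file and reads them back with the colClasses argument of read.csv(). The
temporary file and the variable names are assumptions made for illustration
only; the per-protein files in the original loop would be read the same way.

```r
# Sample data copied from the original post, written to a temporary file
# (the file location is hypothetical, chosen only for this sketch).
csv_lines <- c(
  "Gene,SNP,prot,log10p",
  "YWHAE,13:62129097_C_T,1433E,7.35",
  "YWHAE,4:72617557_T_TA,1433E,7.73"
)
f <- tempfile(fileext = ".csv")
writeLines(csv_lines, f)

# Declare the type of every column up front, so that no type guessing
# takes place for them. Names are matched against the header.
p <- read.csv(f, colClasses = c(Gene = "character", SNP = "character",
                                prot = "character", log10p = "numeric"))

# Inspect the resulting column types.
str(p)
```

readr::read_csv() offers a col_types argument for the same purpose.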

Other storage formats can store type info. That is generally safer and may be
an option too.
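To illustrate one such format, here is a small sketch using R's native RDS
serialization via saveRDS() and readRDS(). The choice of RDS is mine and only
one of several possibilities; the data frame is a hypothetical stand-in built
from the sample rows in the original post.

```r
# Hypothetical data frame mirroring the sample in the original post.
p <- data.frame(
  Gene   = c("YWHAE", "YWHAE"),
  SNP    = c("13:62129097_C_T", "4:72617557_T_TA"),
  prot   = c("1433E", "1433E"),
  log10p = c(7.35, 7.73),
  stringsAsFactors = FALSE
)

# RDS stores the column classes together with the values, so nothing
# has to be guessed when the file is read back.
f <- tempfile(fileext = ".rds")
saveRDS(p, f)
q <- readRDS(f)

# The round trip keeps prot as character rather than converting it.
stopifnot(isTRUE(all.equal(p, q)), is.character(q$prot))
```

The trade-off is that such files are no longer plain text that can be
inspected or edited by hand.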

I think this was more of an email for r-help than r-devel.

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org


