[Rd] read.csv

Kevin Coombes kev|n@r@coombe@ @end|ng |rom gm@||@com
Sat Apr 27 17:28:33 CEST 2024


I was horrified when I saw John Weinstein's article about Excel turning
gene names into dates. Mainly because I had been complaining about that
phenomenon for years, and it never remotely occurred to me that you could
get a publication out of it.

I eventually rectified the situation by publishing "Blasted Cell Line
Names", describing how to match different researchers' recording of the
names of cell lines, by applying techniques for DNA or protein sequence
alignment.

Best,
   Kevin

On Tue, Apr 16, 2024, 4:51 PM Reed A. Cartwright <racartwright using gmail.com>
wrote:

> Gene names being misinterpreted by spreadsheet software (read.csv is
> no different) is a classic issue in bioinformatics. It seems like
> every practitioner ends up encountering this issue in due time. E.g.
>
> https://pubmed.ncbi.nlm.nih.gov/15214961/
>
> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
>
> https://www.nature.com/articles/d41586-021-02211-4
>
>
> https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates
>
>
> On Tue, Apr 16, 2024 at 3:46 AM jing hua zhao <jinghuazhao using hotmail.com>
> wrote:
> >
> > Dear R-developers,
> >
> > I came to a somewhat unexpected behaviour of read.csv() which is trivial
> but worthwhile to note -- my data involves a protein named "1433E" but to
> save space I drop the quote so it becomes,
> >
> > Gene,SNP,prot,log10p
> > YWHAE,13:62129097_C_T,1433E,7.35
> > YWHAE,4:72617557_T_TA,1433E,7.73
> >
> > Both read.cv() and readr::read_csv() consider prot(ein) name as
> (possibly confused by scientific notation) numeric 1433 which only alerts
> me when I tried to combine data,
> >
> > all_data <- data.frame()
> > for (protein in proteins[1:7])
> > {
> >    cat(protein,":\n")
> >    f <- paste0(protein,".csv")
> >    if(file.exists(f))
> >    {
> >      p <- read.csv(f)
> >      print(p)
> >      if(nrow(p)>0) all_data  <- bind_rows(all_data,p)
> >    }
> > }
> >
> > proteins[1:7]
> > [1] "1433B" "1433E" "1433F" "1433G" "1433S" "1433T" "1433Z"
> >
> > dplyr::bind_rows() failed to work due to incompatible types nevertheless
> rbind() went ahead without warnings.
> >
> > Best wishes,
> >
> >
> > Jing Hua
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> >
> https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-devel__;!!IKRxdwAv5BmarQ!YJzURlAK1O3rlvXvq9xl99aUaYL5iKm9gnN5RBi-WJtWa5IEtodN3vaN9pCvRTZA23dZyfrVD7X8nlYUk7S1AK893A$
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

	[[alternative HTML version deleted]]



More information about the R-devel mailing list