[R] R - Reading a horizontally arranged csv file, replacing values and saving again

William Dunlap wdun|@p @end|ng |rom t|bco@com
Wed Oct 3 18:51:36 CEST 2018


Using read.table to read data with a variable number of entries per row is
probably a mistake - data.frames (which read.table returns) are not meant
for this sort of data.  You want to store the data in R as a list with
names.  E.g.,

readYourFormat <- function(connection) {
    # connection is a file name or connection object referring
    # to text of the form
    # Header A,Some text
    # Header B,<number>,<number>,<number>
    # Header C,<number>
    # Header D,<number>,<number>,<number>,<number>
    lines <- readLines(connection)
    lines <- lines[ nchar(lines)>0 ] # omit empty lines
    fields <- strsplit(lines, ",") # split lines by commas into fields
    headers <- vapply(fields, function(x)x[1], FUN.VALUE="") # first entry
is header
    names(fields) <- headers
    contents <- lapply(fields, function(x)x[-1]) # remaining entries are
contents
    contents[-1] <- lapply(contents[-1], as.numeric) # convert all but
first line's entries to numeric
    contents
}

After reading the data and fiddling with it (but keeping it a named list),
you can write it out in a similar format with

writeYourFormat <- function (data, connection = stdout())
{
    stopifnot(is.list(data), !is.null(names(data)))
    cat(file = connection, sep = "\n", paste(sep = ",", names(data),
        vapply(data, paste, collapse = ",", FUN.VALUE = "")))
}


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Oct 3, 2018 at 3:05 AM, Manoranjan Muthusamy <
ranjanmano167 using gmail.com> wrote:

> Thanks for the short but informative answer, Bill. But still, each row has
> four columns..right? Although the *NA* is replaced by a blank cell,
> because of the extra comma it still is a four column row. Is there any way
> to avoid/remove the extra comma when NA is replaced which will make it a
> three column row?
>
>
>
> On Mon, Oct 1, 2018 at 4:56 PM William Dunlap <wdunlap using tibco.com> wrote:
>
>> Read the help file for write.table
>>    ?write.table
>> and look at the descriptions of its arguments.
>>   > df <- data.frame(Text=c("Abe","Ben",NA,"David"), Age=c(19, NA, 12,
>> 10))
>>   > write.table(file=stdout(), t(df), sep=",")
>>   "V1","V2","V3","V4"
>>   "Text","Abe","Ben",NA,"David"
>>   "Age","19",NA,"12","10"
>>   > write.table(file=stdout(), t(df), quote=FALSE, na="", sep=",")
>>   V1,V2,V3,V4
>>   Text,Abe,Ben,,David
>>   Age,19,,12,10
>>
>>
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Mon, Oct 1, 2018 at 2:24 AM, Manoranjan Muthusamy <
>> ranjanmano167 using gmail.com> wrote:
>>
>>> I have a horizontally arranged CSV file
>>> <https://www.dropbox.com/s/0ukyuifvpq1olqk/samplefile.csv?dl=0> (
>>> samplefile.csv) with headers are in the first column. Also, each row has
>>> a
>>> different number of columns. I want to read this CSV file, replace one of
>>> the cell value and save again as a CSV file with the same format as the
>>> original file with exactly same number of columns and rows. It sounds
>>> like
>>> a simple task, but I am struggling to find a way. I tried to do this with
>>> the help of this
>>> <https://stackoverflow.com/questions/17288197/reading-a-
>>> csv-file-organized-horizontally>
>>>  and this
>>> <https://stackoverflow.com/questions/20308621/how-do-i-
>>> write-a-csv-file-in-r-where-my-input-is-written-to-the-file-as-row>
>>> posts,
>>> but still, I can't get the output the way I want. Can somebody help me
>>> here?
>>>
>>> My attempt using the answer in this post
>>> <https://stackoverflow.com/questions/17288197/reading-a-
>>> csv-file-organized-horizontally>
>>> to
>>> read the CSV file (samplefile.csv) gives me the following output where
>>> headers are kinda messed up and empty cells are replaced with NA values
>>> which is not what I want
>>>
>>> aaa <- read.tcsv("samplefile.csv")
>>> aaa
>>>
>>>      Header.1 Header.2..units. Header.3..units. Header.3..units..11
>>> Some text           0.0625                0             2648962
>>> <NA>           0.0625             1200             6647473        <NA>
>>>           0.0625             1380                  14        <NA>
>>>      0.2500             1500                  15        <NA>
>>> 0.6250             1620                 NA6        <NA>
>>> 1.3125             1740                 NA7        <NA>
>>> 2.4375             1860                 NA8        <NA>
>>> 3.5625             1980                 NA9        <NA>
>>> 4.6250             2100                 NA10       <NA>
>>> 5.0000             2220                 NA11       <NA>
>>> 5.0000             2340                 NA12       <NA>
>>> 4.6250             2460                 NA13       <NA>
>>> 3.5625             2580                 NA14       <NA>
>>> 2.4375             2700                 NA15       <NA>
>>> 1.3125             2820                 NA16       <NA>
>>> 0.6250             2940                 NA17       <NA>
>>> 0.2500             3060                 NA18       <NA>
>>> 0.0625             3180                 NA19       <NA>
>>> 0.0000             3300                 NA20       <NA>
>>> 0.0000            18000                 NA
>>>
>>> Also, I am not sure how to go back to original format when I save the
>>> file
>>> again after a modification (for example after replacing a cell value)
>>>
>>> I tried saving the file again by using t (transpose) as given below
>>>
>>> write.csv(t(aaa), file ="samplefile_e.csv", row.names=T)
>>>
>>> but still, there are following issues in the saved file
>>>
>>>    1. messed up headers
>>>    2. empty cells are replaced with NA
>>>    3. when I open the file in a text editor all the values are shown as
>>>    characters
>>>
>>> Thanks,
>>> Mano
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/
>>> posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>

	[[alternative HTML version deleted]]




More information about the R-help mailing list