[R] R - Reading a horizontally arranged csv file, replacing values and saving again
William Dunlap
wdun|@p @end|ng |rom t|bco@com
Wed Oct 3 18:51:36 CEST 2018
Using read.table to read data with a variable number of entries per row is
probably a mistake - data.frames (which read.table returns) are not meant
for this sort of data. You want to store the data in R as a list with
names. E.g.,
readYourFormat <- function(connection) {
# connection is a file name or connection object referring
# to text of the form
# Header A,Some text
# Header B,<number>,<number>,<number>
# Header C,<number>
# Header D,<number>,<number>,<number>,<number>
lines <- readLines(connection)
lines <- lines[ nchar(lines)>0 ] # omit empty lines
fields <- strsplit(lines, ",") # split lines by commas into fields
headers <- vapply(fields, function(x)x[1], FUN.VALUE="") # first entry
is header
names(fields) <- headers
contents <- lapply(fields, function(x)x[-1]) # remaining entries are
contents
contents[-1] <- lapply(contents[-1], as.numeric) # convert all but
first line's entries to numeric
contents
}
After reading the data and fiddling with it (but keeping it a named list),
you can write it out in a similar format with
writeYourFormat <- function (data, connection = stdout())
{
stopifnot(is.list(data), !is.null(names(data)))
cat(file = connection, sep = "\n", paste(sep = ",", names(data),
vapply(data, paste, collapse = ",", FUN.VALUE = "")))
}
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Wed, Oct 3, 2018 at 3:05 AM, Manoranjan Muthusamy <
ranjanmano167 using gmail.com> wrote:
> Thanks for the short but informative answer, Bill. But still, each row has
> four columns..right? Although the *NA* is replaced by a blank cell,
> because of the extra comma it still is a four column row. Is there any way
> to avoid/remove the extra comma when NA is replaced which will make it a
> three column row?
>
>
>
> On Mon, Oct 1, 2018 at 4:56 PM William Dunlap <wdunlap using tibco.com> wrote:
>
>> Read the help file for write.table
>> ?write.table
>> and look at the descriptions of its arguments.
>> > df <- data.frame(Text=c("Abe","Ben",NA,"David"), Age=c(19, NA, 12,
>> 10))
>> > write.table(file=stdout(), t(df), sep=",")
>> "V1","V2","V3","V4"
>> "Text","Abe","Ben",NA,"David"
>> "Age","19",NA,"12","10"
>> > write.table(file=stdout(), t(df), quote=FALSE, na="", sep=",")
>> V1,V2,V3,V4
>> Text,Abe,Ben,,David
>> Age,19,,12,10
>>
>>
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Mon, Oct 1, 2018 at 2:24 AM, Manoranjan Muthusamy <
>> ranjanmano167 using gmail.com> wrote:
>>
>>> I have a horizontally arranged CSV file
>>> <https://www.dropbox.com/s/0ukyuifvpq1olqk/samplefile.csv?dl=0> (
>>> samplefile.csv) with headers are in the first column. Also, each row has
>>> a
>>> different number of columns. I want to read this CSV file, replace one of
>>> the cell value and save again as a CSV file with the same format as the
>>> original file with exactly same number of columns and rows. It sounds
>>> like
>>> a simple task, but I am struggling to find a way. I tried to do this with
>>> the help of this
>>> <https://stackoverflow.com/questions/17288197/reading-a-
>>> csv-file-organized-horizontally>
>>> and this
>>> <https://stackoverflow.com/questions/20308621/how-do-i-
>>> write-a-csv-file-in-r-where-my-input-is-written-to-the-file-as-row>
>>> posts,
>>> but still, I can't get the output the way I want. Can somebody help me
>>> here?
>>>
>>> My attempt using the answer in this post
>>> <https://stackoverflow.com/questions/17288197/reading-a-
>>> csv-file-organized-horizontally>
>>> to
>>> read the CSV file (samplefile.csv) gives me the following output where
>>> headers are kinda messed up and empty cells are replaced with NA values
>>> which is not what I want
>>>
>>> aaa <- read.tcsv("samplefile.csv")
>>> aaa
>>>
>>> Header.1 Header.2..units. Header.3..units. Header.3..units..11
>>> Some text 0.0625 0 2648962
>>> <NA> 0.0625 1200 6647473 <NA>
>>> 0.0625 1380 14 <NA>
>>> 0.2500 1500 15 <NA>
>>> 0.6250 1620 NA6 <NA>
>>> 1.3125 1740 NA7 <NA>
>>> 2.4375 1860 NA8 <NA>
>>> 3.5625 1980 NA9 <NA>
>>> 4.6250 2100 NA10 <NA>
>>> 5.0000 2220 NA11 <NA>
>>> 5.0000 2340 NA12 <NA>
>>> 4.6250 2460 NA13 <NA>
>>> 3.5625 2580 NA14 <NA>
>>> 2.4375 2700 NA15 <NA>
>>> 1.3125 2820 NA16 <NA>
>>> 0.6250 2940 NA17 <NA>
>>> 0.2500 3060 NA18 <NA>
>>> 0.0625 3180 NA19 <NA>
>>> 0.0000 3300 NA20 <NA>
>>> 0.0000 18000 NA
>>>
>>> Also, I am not sure how to go back to original format when I save the
>>> file
>>> again after a modification (for example after replacing a cell value)
>>>
>>> I tried saving the file again by using t (transpose) as given below
>>>
>>> write.csv(t(aaa), file ="samplefile_e.csv", row.names=T)
>>>
>>> but still, there are following issues in the saved file
>>>
>>> 1. messed up headers
>>> 2. empty cells are replaced with NA
>>> 3. when I open the file in a text editor all the values are shown as
>>> characters
>>>
>>> Thanks,
>>> Mano
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/
>>> posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
[[alternative HTML version deleted]]
More information about the R-help
mailing list