[R] Parsing "back" to API structure
Eric Fail
eric.fail at gmx.us
Thu Sep 13 20:32:07 CEST 2012
Dear Jim,
Thank you for your response; I appreciate your effort!
It is close, I must admit. What I am looking for is an object
that is identical to 'RAW.API', or at least identical in structure (I guess
I do not need the
, "`Content-Type`" = structure(c("text/html", "utf-8"), .Names = c("", "charset"))
part).
When I inspect 'x.out' it also has the NAs. I've tried to fix
it, but I had to give up. It is strange, because getting there seems so
easy (warning: false logic!).
Here is what I got on my looong and alternative route, in the hope that
someone on the list might be able to help:
RAW.API <- structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n",
"`Content-Type`" = structure(c("text/html", "utf-8"), .Names =
c("","charset")))
# I used an alternative way of converting it to a data frame, to keep the
# leading 0 in the id variable
x <- read.table(file = textConnection(RAW.API ), header = TRUE, sep =
",", na.strings = "", stringsAsFactors = FALSE, colClasses ="character")
x
# now put it back into the same string; write.csv does quote alphanumerics
write.csv(x, textConnection('output', 'w'), row.names = FALSE)
unlockBinding("output", env = .GlobalEnv)
# fixes the problem with the header
output[1] <- gsub("\\\"", "", output[1])
# removes NAs
output <- gsub("NA", "\"\"", output)
# removes the " at the beginning of each line
output <- gsub("^\\\"", "", output)
# removes an " at the end of each line
output <- gsub("\\\"$", "", output)
# collapses the lines into one string and puts the " back in front of each data line
x.out <- paste(output, collapse = '\n\"')
# adds a line break at the end
x.out <- gsub("$", "\n", x.out)
# so much manual gsub ...
Any help would be very much appreciated.
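One possible shortcut for the gsub chain above, offered as a sketch rather than a definitive answer: turn the NAs back into empty strings in the data frame before writing, let write.table() quote only the columns that are quoted in RAW.API, and build the unquoted header line by hand. The names 'quoted', 'data.lines', 'header' and 'API.back' are made up for this illustration, and the sketch assumes that 'complete' is the only unquoted column, as in the sample data in this thread.

```r
# RAW.API exactly as posted in this thread.
RAW.API <- structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n",
  "`Content-Type`" = structure(c("text/html", "utf-8"), .Names = c("", "charset")))

# Read every column as character so the leading zero in 'id' survives.
x <- read.table(text = RAW.API, header = TRUE, sep = ",",
                na.strings = "", stringsAsFactors = FALSE,
                colClasses = "character")

# Turn NA back into empty strings, so write.table() emits "" instead of NA.
x[is.na(x)] <- ""

# Assumption: every column except 'complete' is quoted in the API format.
quoted <- which(names(x) != "complete")

# Write the data rows only; write.table() would quote the header,
# so the header line is pasted together separately.
data.lines <- capture.output(
  write.table(x, sep = ",", quote = quoted,
              row.names = FALSE, col.names = FALSE)
)
header <- paste(names(x), collapse = ",")

# One string, lines separated by "\n", with a trailing line break.
API.back <- paste0(paste(c(header, data.lines), collapse = "\n"), "\n")

# Compare against the plain string, ignoring the Content-Type attribute.
identical(API.back, as.vector(RAW.API))
```

as.vector() drops the Content-Type attribute before the comparison. If the attribute is needed on the result as well, attributes(API.back) <- attributes(RAW.API) should copy it over.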
On Wed, Sep 12, 2012 at 5:54 PM, jim holtman <jholtman at gmail.com> wrote:
> This is close, but it does quote the header names, but does produce
> the same dataframe when read back in:
>
>> RAW.API <- structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n", "`Content-Type`" = structure(c("text/html", "utf-8"), .Names = c("", "charset")))
>> x <- read.csv(textConnection(RAW.API), as.is = TRUE)
>> x
>   id     event_arm name        dob pushed_text pushed_calc complete
> 1  1 event_1_arm_1 John 1979-05-01                      NA        2
> 2  1 event_2_arm_1 John 2012-09-02         abc         123        1
> 3  1 event_3_arm_1 John 2012-09-10                      NA        2
> 4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
> 5  2 event_2_arm_1 Mary 1978-09-12                      NA        2
>>
>> # now put it back into the same string; write.csv does quote alphanumerics
>> write.csv(x, textConnection('output', 'w'), row.names = FALSE)
>> x.out <- paste(output, collapse = '\n')
>> # read it back in to show it is the same
>> x.in <- read.csv(textConnection(x.out), as.is = TRUE)
>> x.in
>   id     event_arm name        dob pushed_text pushed_calc complete
> 1  1 event_1_arm_1 John 1979-05-01                      NA        2
> 2  1 event_2_arm_1 John 2012-09-02         abc         123        1
> 3  1 event_3_arm_1 John 2012-09-10                      NA        2
> 4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
> 5  2 event_2_arm_1 Mary 1978-09-12                      NA        2
>>
>
>
> On Wed, Sep 12, 2012 at 8:21 PM, Eric Fail <eric.fail at gmx.us> wrote:
>> Dear R experts,
>>
>> I'm reading data from an online database via API and it gets delivered in this messy comma separated structure,
>>
>>> RAW.API <- structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n", "`Content-Type`" = structure(c("text/html", "utf-8"), .Names = c("", "charset")))
>>
>> I have this script that nicely parses it into a data frame,
>>
>>> (df <- read.table(file = textConnection(RAW.API), header = TRUE,
>> sep = ",", na.strings = "", stringsAsFactors = FALSE))
>>>   id     event_arm name        dob pushed_text pushed_calc complete
>>> 1  1 event_1_arm_1 John 1979-05-01        <NA>          NA        2
>>> 2  1 event_2_arm_1 John 2012-09-02         abc         123        1
>>> 3  1 event_3_arm_1 John 2012-09-10        <NA>          NA        2
>>> 4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
>>> 5  2 event_2_arm_1 Mary 1978-09-12        <NA>          NA        2
>>
>> I then do some calculations and write them to pushed_text and pushed_calc whereafter I need to format the data back to the messy comma separated structure it came in.
>>
>> I imagine something like this,
>>
>>> API.back <- `some magic command`(df, ...)
>>
>>> identical(RAW.API, API.back)
>>> [1] TRUE
>>
>> Some command that can format my data from the data frame I made, df, back to the structure that the raw API-object came in, RAW.API.
>>
>> Any help would be appreciated.
>>
>> Thanks for reading.
>>
>> Eric
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.