[R] Parsing "back" to API structure
Eric Fail
eric.fail at gmx.us
Tue Sep 18 02:42:32 CEST 2012
Problem solved by Josh O'Brien on Stack Overflow:
http://stackoverflow.com/questions/12393004/parsing-back-to-messy-api-strcuture/12435389#12435389
some_magic <- function(df) {
    ## Replace NA with "", converting column types as needed
    df[] <- lapply(df, function(X) {
        if(any(is.na(X))) {X[is.na(X)] <- ""; X} else {X}
    })
    ## Print integers in first column as 2-digit character strings
    ## (DO NOTE: Hardwiring the number of printed digits here is probably
    ## inadvisable, though needed to _exactly_ reconstitute RAW.API.)
    df[[1]] <- sprintf("%02.0f", df[[1]])
    ## Separately build header and table body, then suture them together
    l1 <- paste(names(df), collapse=",")
    l2 <- capture.output(write.table(df, sep=",", col.names=FALSE,
                                     row.names=FALSE))
    out <- paste0(c(l1, l2, ""), collapse="\n")
    ## Reattach attributes
    att <- list("`Content-Type`" = structure(c("text/html", "utf-8"),
                .Names = c("", "charset")))
    attributes(out) <- att
    out
}
identical(some_magic(df), RAW.API)
# [1] TRUE
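
A possible variation on the function above, offered as a sketch rather than as Josh O'Brien's method: the DO NOTE comment flags the hard-wired 2-digit width as fragile. If the id column is read as character in the first place (the colClasses idea from the earlier message in this thread, applied here to that one column only), the leading zeros survive the round trip and no sprintf() call is needed. The function name to_api_string is made up for illustration; RAW.API is the sample object from the thread, rebuilt line by line for readability.

```r
## Sample API response from the thread, rebuilt line by line
RAW.API <- structure(
  paste0(
    'id,event_arm,name,dob,pushed_text,pushed_calc,complete\n',
    '"01","event_1_arm_1","John","1979-05-01","","",2\n',
    '"01","event_2_arm_1","John","2012-09-02","abc","123",1\n',
    '"01","event_3_arm_1","John","2012-09-10","","",2\n',
    '"02","event_1_arm_1","Mary","1951-09-10","def","456",2\n',
    '"02","event_2_arm_1","Mary","1978-09-12","","",2\n'
  ),
  "`Content-Type`" = structure(c("text/html", "utf-8"),
                               .Names = c("", "charset"))
)

## Read only the id column as character so "01" keeps its leading zero;
## the remaining columns are still type-converted as usual
df <- read.table(text = RAW.API, header = TRUE, sep = ",",
                 na.strings = "", stringsAsFactors = FALSE,
                 colClasses = c(id = "character"))

## Hypothetical helper: same steps as some_magic(), minus the sprintf() call
to_api_string <- function(df) {
  ## Replace NA with ""; the guard leaves columns without NAs untouched
  df[] <- lapply(df, function(X) {
    if (any(is.na(X))) { X[is.na(X)] <- ""; X } else { X }
  })
  ## Unquoted header line, then the quoted table body
  header <- paste(names(df), collapse = ",")
  body <- capture.output(write.table(df, sep = ",", col.names = FALSE,
                                     row.names = FALSE))
  out <- paste0(c(header, body, ""), collapse = "\n")
  ## Reattach the attribute carried by the raw API object
  attributes(out) <- list("`Content-Type`" =
    structure(c("text/html", "utf-8"), .Names = c("", "charset")))
  out
}

## Compare the rebuilt string with the original object
identical(to_api_string(df), RAW.API)
```

The trade-off is that id is now a character column, so any numeric work on it needs an explicit as.integer() first.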
On Thu, Sep 13, 2012 at 11:32 AM, Eric Fail <eric.fail at gmx.us> wrote:
> Dear Jim,
>
> Thank you for your response; I appreciate your effort!
>
> It is close, I must admit that. What I am looking for is an object
> that is identical to 'RAW.API', or at least identical in structure (I guess
> I do not need the attribute part, i.e.
> "`Content-Type`" = structure(c("text/html", "utf-8"),
> .Names = c("", "charset"))).
>
> When I investigate 'x.out' it also has the NAs. I've tried to fix
> it, but I had to give up. It is strange because getting there seems so
> easy (warning: false logic!).
>
> Here is what I got on my long, alternative route, in the hope that
> someone on the list might be able to help:
>
> RAW.API <- structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n",
> "`Content-Type`" = structure(c("text/html", "utf-8"), .Names =
> c("","charset")))
>
> # I used an alternative way of converting it to a data frame to keep the
> # leading 0 in the id variable
> x <- read.table(file = textConnection(RAW.API ), header = TRUE, sep =
> ",", na.strings = "", stringsAsFactors = FALSE, colClasses ="character")
> x
>
> # now put it back into the same string; write.csv does quote alphanumerics
> write.csv(x, textConnection('output', 'w'), row.names = FALSE)
> unlockBinding("output", env = .GlobalEnv)
> # fixes the problem with the header
> output[1] <- gsub("\\\"", "", output[1])
> # removes NAs
> output <- gsub("NA", "\"\"", output)
> # removes the " at the beginning of each line
> output <- gsub("^\\\"", "", output)
> # removes the " at the end of each line
> output <- gsub("\\\"$", "", output)
> # same as before
> x.out <- paste(output, collapse = '\n\"')
> # adds a line break at the end
> x.out <- gsub("$", "\n", x.out)
>
> # so much manual gsub ...
>
> Any help would be very much appreciated.
>
> On Wed, Sep 12, 2012 at 5:54 PM, jim holtman <jholtman at gmail.com> wrote:
>> This is close: it quotes the header names, but it does produce
>> the same data frame when read back in:
>>
>>> RAW.API <- structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n", "`Content-Type`" = structure(c("text/html", "utf-8"), .Names = c("", "charset")))
>>> x <- read.csv(textConnection(RAW.API), as.is = TRUE)
>>> x
>>   id     event_arm name        dob pushed_text pushed_calc complete
>> 1  1 event_1_arm_1 John 1979-05-01                      NA        2
>> 2  1 event_2_arm_1 John 2012-09-02         abc         123        1
>> 3  1 event_3_arm_1 John 2012-09-10                      NA        2
>> 4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
>> 5  2 event_2_arm_1 Mary 1978-09-12                      NA        2
>>>
>>> # now put it back into the same string; write.csv does quote alphanumerics
>>> write.csv(x, textConnection('output', 'w'), row.names = FALSE)
>>> x.out <- paste(output, collapse = '\n')
>>> # read it back in to show it is the same
>>> x.in <- read.csv(textConnection(x.out), as.is = TRUE)
>>> x.in
>>   id     event_arm name        dob pushed_text pushed_calc complete
>> 1  1 event_1_arm_1 John 1979-05-01                      NA        2
>> 2  1 event_2_arm_1 John 2012-09-02         abc         123        1
>> 3  1 event_3_arm_1 John 2012-09-10                      NA        2
>> 4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
>> 5  2 event_2_arm_1 Mary 1978-09-12                      NA        2
>>>
>>
>>
>> On Wed, Sep 12, 2012 at 8:21 PM, Eric Fail <eric.fail at gmx.us> wrote:
>>> Dear R experts,
>>>
>>> I'm reading data from an online database via API and it gets delivered in this messy comma separated structure,
>>>
>>>> RAW.API <- structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n", "`Content-Type`" = structure(c("text/html", "utf-8"), .Names = c("", "charset")))
>>>
>>> I have this script that nicely parses it into a data frame,
>>>
>>>> (df <- read.table(file = textConnection(RAW.API), header = TRUE,
>>> sep = ",", na.strings = "", stringsAsFactors = FALSE))
>>>>   id     event_arm name        dob pushed_text pushed_calc complete
>>>> 1  1 event_1_arm_1 John 1979-05-01        <NA>          NA        2
>>>> 2  1 event_2_arm_1 John 2012-09-02         abc         123        1
>>>> 3  1 event_3_arm_1 John 2012-09-10        <NA>          NA        2
>>>> 4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
>>>> 5  2 event_2_arm_1 Mary 1978-09-12        <NA>          NA        2
>>>
>>> I then do some calculations and write them to pushed_text and pushed_calc, after which I need to format the data back to the messy comma-separated structure it came in.
>>>
>>> I imagine something like this,
>>>
>>>> API.back <- `some magic command`(df, ...)
>>>
>>>> identical(RAW.API, API.back)
>>>> [1] TRUE
>>>
>>> Some command that can format my data from the data frame I made, df, back to the structure that the raw API-object came in, RAW.API.
>>>
>>> Any help would be appreciated.
>>>
>>> Thanks for reading.
>>>
>>> Eric
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.