[R] how to import this dataframe into R

Gabor Grothendieck ggrothendieck at gmail.com
Sat Apr 24 21:40:24 CEST 2010


Define a function, dejunkify, which removes commas and parentheses and
converts columns 2 onwards to numeric.  Assuming that the summary lines
near the end of the file all start with B, use "B" as the comment
character so that read.table skips them, read in the remaining lines
and dejunkify the result after dropping the three columns that hold
only the "-" separators.  Then read in the lines again and extract
those that begin with B.  Using strapply in gsubfn, split each of them
into the required 4 fields (the label and three numbers) and dejunkify
that too.  If the actual data does not follow these rules you may need
to modify this somewhat.
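
As a quick check of the cleaning rule, here is a small sketch on a few tokens taken from the sample data. The vector names vals and cleaned are made up for illustration only.

```r
# Tokens as they appear once the sample lines are split on whitespace
vals <- c("7,002", "(33", "39)", "(", "-537")

# Drop commas and parentheses, then convert to numeric.
# A lone "(" becomes the empty string, which as.numeric turns into NA.
cleaned <- as.numeric(gsub("[,()]", "", vals))
cleaned
```

The empty ranges written as "( - )" in the data therefore come out as missing values rather than as errors.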

library(gsubfn)

# Strip commas and parentheses from columns 2 onwards, then convert to numeric
dejunkify <- function(x) {
	for(i in 2:ncol(x)) x[[i]] <- as.numeric(gsub("[,()]", "", x[[i]]))
	x
}

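# Read the daily rows; lines starting with "B" are treated as comments
DF <- read.table("myfile", skip = 1, comment.char = "B",
		na.strings = "n", as.is = TRUE)
# Columns 4, 8 and 12 hold only the "-" between the range limits
DF <- dejunkify(DF[-c(4, 8, 12)])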

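# Re-read the file and keep only the summary lines that start with "B"
Other <- grep("^B", readLines("myfile"), value = TRUE)
# Split each line into a label and its last three space-separated values
Other <- as.data.frame(strapply(Other, "(^.*) (\\S+) (\\S+) (\\S+)",
		c, simplify = rbind))
Other <- dejunkify(Other)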
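
For anyone without the file or gsubfn at hand, here is a rough self-contained sketch of the same two steps on a handful of lines copied from the sample data. It is a variant under stated assumptions, not the code above: Lines is a made-up stand-in for the file, read.table(text = ...) replaces reading from disk, and base R's regexec and regmatches are swapped in for strapply.

```r
# A few lines copied from the sample data, standing in for "myfile"
Lines <- c(
  "Date First     Second Third",
  "2/26/2010 0 ( - ) 0 ( - ) 7,002 (33 - 39)",
  "2/27/2010 n (0 - 0) n (0 - 0) n (0 - 0)",
  "3/1/2010 144 (95 - 152) 99 (65 - 71) 22,741 (31 - 56)",
  "Biweekly Total 1,425 402 296,085",
  "Brood-year Upper 90% Confidence Interval 6,333,255 541,552 12,058,021"
)

# Same cleaning function as in the message
dejunkify <- function(x) {
  for (i in 2:ncol(x)) x[[i]] <- as.numeric(gsub("[,()]", "", x[[i]]))
  x
}

# Daily rows: 13 whitespace-separated fields per line;
# fields 4, 8 and 12 are the "-" between the range limits
DF <- read.table(text = Lines, skip = 1, comment.char = "B",
                 na.strings = "n", as.is = TRUE)
DF <- dejunkify(DF[-c(4, 8, 12)])

# Summary rows: a free-text label followed by three space-separated values.
# Base R substitute for strapply: regexec captures the four groups and
# regmatches returns them (element 1 is the full match, so it is dropped).
Other <- grep("^B", Lines, value = TRUE)
m <- regmatches(Other, regexec("^(.*) (\\S+) (\\S+) (\\S+)$", Other))
Other <- as.data.frame(do.call(rbind, lapply(m, `[`, -1)),
                       stringsAsFactors = FALSE)
Other <- dejunkify(Other)

str(DF)
str(Other)
```

The resulting columns keep their default names (V1, V2, ...), so they would still need to be renamed before building the table.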



On Sat, Apr 24, 2010 at 1:40 PM, Felipe Carrillo
<mazatlanmexico at yahoo.com> wrote:
> Hi:
> I need help with a dataframe (see pic attached). It is a mix of dates and text.
> I want to create a table using either the latex function from Hmisc or xtable. I
> already know how to do this, but the problem is getting the dataframe into R.
> I don't have a reproducible example, but I am hoping that the pic attachment
> will make it to you. If someone is interested in helping with this task, I could
> send the Excel file off-list. Thanks
>
>
> Date First     Second Third
> 2/26/2010 0 ( - ) 0 ( - ) 7,002 (33 - 39)
> 2/27/2010 n (0 - 0) n (0 - 0) n (0 - 0)
> 2/28/2010 357 (123 - 123) 0 ( - ) 130,342 (29 - 57)
> 3/1/2010 144 (95 - 152) 99 (65 - 71) 22,741 (31 - 56)
> 3/2/2010 73 (126 - 152) 0 ( - ) 8,365 (31 - 53)
> 3/3/2010 43 (108 - 108) 86 (66 - 76) 5,962 (33 - 60)
> 3/4/2010 n (0 - 0) n (0 - 0) n (0 - 0)
> 3/5/2010 270 (101 - 140) 0 ( - ) 22,461 (30 - 61)
> 3/6/2010 121 (111 - 112) 40 (66 - 66) 12,485 (31 - 55)
> 3/7/2010 0 ( - ) 0 ( - ) 7,352 (31 - 56)
> 3/8/2010 34 (111 - 111) 33 (74 - 74) 2,908 (32 - 48)
> 3/9/2010 102 (111 - 140) 0 ( - ) 3,265 (27 - 48)
> 3/10/2010 0 ( - ) 35 (66 - 66) 1,993 (30 - 55)
> 3/11/2010 35 (125 - 125) 35 (70 - 70) 1,445 (33 - 62)
> Biweekly Lower 90% Confidence Interval -537 -549 35,097
> Biweekly Total 1,425 402 296,085
> Biweekly Upper 90% Confidence Interval 3,388 1,353 557,074
> Brood-year  Lower 90% Confidence Interval 2,578,499 74,306 2,249,920
> Brood Year Total 4,455,877 314,206 7,347,719
> Brood-year Upper 90% Confidence Interval 6,333,255 541,552 12,058,021
>
> Felipe D. Carrillo
> Supervisory Fishery Biologist
> Department of the Interior
> US Fish & Wildlife Service
> California, USA
>


