[R] Importing data from text file with mixed format
William Dunlap
wdunlap at tibco.com
Sun Oct 25 22:30:53 CET 2009
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of delnatan
> Sent: Saturday, October 24, 2009 8:32 PM
> To: r-help at r-project.org
> Subject: [R] Importing data from text file with mixed format
>
>
> Hi,
> I'm having difficulty importing my textfile that looks
> something like this:
>
> #begin text file
> Timepoint 1
> ObjectNumber Volume SurfaceArea
> 1 5.3 9.7
> 2 4.9 8.3
> 3 5.0 9.1
> 4 3.5 7.8
>
> Timepoint 2
> ObjectNumber Volume SurfaceArea
> 1 5.1 9.0
> 2 4.7 8.9
> 3 4.3 8.3
> 4 4.2 7.9
>
> ... #goes on to Timepoint 80
>
> How would I import this data into a list containing
> data.frame for each
> timepoint?
> I'd like my data to be organized like this:
>
> >myList
> [[1]]
>   ObjectNumber Volume SurfaceArea
> 1            1    5.3         9.7
> 2            2    4.9         8.3
> 3            3    5.0         9.1
> 4            4    3.5         7.8
>
> [[2]]
>   ObjectNumber Volume SurfaceArea
> 1            1    5.1         9.0
> 2            2    4.7         8.9
> 3            3    4.3         8.3
> 4            4    4.2         7.9
The following function reads that text file into a single data.frame
with an added Timepoint column, a format I usually find more
convenient. You can use split(data, data$Timepoint) to get the list
format you asked for. If you keep the one-data-frame format, you can
use the melt and cast functions from the reshape package to
rearrange it.
readMyData <- function (file) {
    # read every line in the file
    lines <- readLines(file)
    # drop empty lines
    lines <- grep("^[[:space:]]*$", lines, value=TRUE, invert=TRUE)
    # find and check header lines
    isHeaderLine <- regexpr("^ObjectNumber", lines) > 0
    if (sum(isHeaderLine)==0)
        stop("No header lines of form 'ObjectNumber ...'")
    if (length(u <- unique(lines[isHeaderLine]))>1)
        stop("Header lines vary: ", paste(sQuote(head(u)), collapse=", "))
    col.names <- strsplit(lines[which(isHeaderLine)[1]],
                          "[[:space:]]+")[[1]]
    # after making column names from header lines, drop header lines
    lines <- lines[!isHeaderLine]
    # process Timepoint lines
    isTimepointLine <- regexpr("^Timepoint", lines) > 0
    if (sum(isTimepointLine)==0)
        stop("No lines of form 'Timepoint <number>'")
    timepoints <- sub("^Timepoint[[:space:]]*", "",
                      lines[isTimepointLine])
    timepoints <- as.integer(timepoints)
    if (any(is.na(timepoints)))
        stop("Non-integer found in a Timepoint line: ",
             sQuote(lines[isTimepointLine][which(is.na(timepoints))[1]]))
    # number of data lines following each Timepoint line
    nRowsPerTimepoint <-
        diff(c(which(isTimepointLine), length(isTimepointLine)+1)) - 1
    # drop Timepoint lines. Remaining lines should be data lines
    lines <- lines[!isTimepointLine]
    # An error in read.table means there were lines we should have dropped
    con <- textConnection(lines)
    on.exit(close(con))  # do not leave the text connection open
    result <- read.table(con,
                         header=FALSE,
                         row.names=NULL,
                         col.names=col.names)
    # Add Timepoint column
    result$Timepoint <- rep(timepoints, nRowsPerTimepoint)
    result
}
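The row-count bookkeeping in the function can be checked in isolation.
The sketch below uses a made-up logical vector (not taken from a real
file) as a stand-in for isTimepointLine after the header lines have been
dropped, and shows how diff() recovers the number of data rows under
each Timepoint line.

```r
# Hypothetical stand-in for isTimepointLine: two Timepoint lines,
# followed by 4 and 3 data lines respectively.
isTimepointLine <- c(TRUE, FALSE, FALSE, FALSE, FALSE,
                     TRUE, FALSE, FALSE, FALSE)
# Positions of the Timepoint lines, plus one past the end as a sentinel
starts <- c(which(isTimepointLine), length(isTimepointLine) + 1)
# Gap between consecutive starts, minus the Timepoint line itself
nRowsPerTimepoint <- diff(starts) - 1
print(nRowsPerTimepoint)
# rep() expands one label per block into one label per data row
timepointColumn <- rep(c(1L, 2L), nRowsPerTimepoint)
print(timepointColumn)
```

Because the counts come from line positions, blocks with different
numbers of objects should be labelled correctly as well.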
E.g.,
> data <- readMyData("c:/temp/t.txt")
> data
  ObjectNumber Volume SurfaceArea Timepoint
1            1    5.3         9.7         1
2            2    4.9         8.3         1
3            3    5.0         9.1         1
4            4    3.5         7.8         1
5            1    5.1         9.0         2
6            2    4.7         8.9         2
7            3    4.3         8.3         2
8            4    4.2         7.9         2
> split(data, data$Timepoint)
$`1`
  ObjectNumber Volume SurfaceArea Timepoint
1            1    5.3         9.7         1
2            2    4.9         8.3         1
3            3    5.0         9.1         1
4            4    3.5         7.8         1

$`2`
  ObjectNumber Volume SurfaceArea Timepoint
5            1    5.1         9.0         2
6            2    4.7         8.9         2
7            3    4.3         8.3         2
8            4    4.2         7.9         2
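
To get exactly the layout asked for (an unnamed list whose elements have
no Timepoint column and row names restarting at 1), the split() result
can be post-processed. This is only a minimal sketch: the sample data is
typed in by hand rather than read from a file, pieces is a name made up
for illustration, and myList is the name used in the question.

```r
# Hand-typed copy of the two-timepoint example data
data <- data.frame(
    ObjectNumber = rep(1:4, times = 2),
    Volume       = c(5.3, 4.9, 5.0, 3.5, 5.1, 4.7, 4.3, 4.2),
    SurfaceArea  = c(9.7, 8.3, 9.1, 7.8, 9.0, 8.9, 8.3, 7.9),
    Timepoint    = rep(1:2, each = 4)
)
# Split on Timepoint but leave that column out of the pieces
pieces <- split(data[names(data) != "Timepoint"], data$Timepoint)
# Restart row names at 1 within each piece
pieces <- lapply(pieces, function(d) {
    rownames(d) <- NULL
    d
})
# Drop the "1", "2", ... names so the list prints as [[1]], [[2]], ...
myList <- unname(pieces)
print(myList[[2]])
```

Note that after unname() the position in the list matches the timepoint
number only if the timepoints run 1, 2, 3, ... without gaps; otherwise
keeping the names is probably safer.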
> library(reshape)
> mdata <- melt(data, id=c("ObjectNumber","Timepoint"))
> cast(mdata, Timepoint~variable, fun.aggregate=c,
+      subset=variable=="SurfaceArea")
  Timepoint SurfaceArea_X1 SurfaceArea_X2 SurfaceArea_X3 SurfaceArea_X4
1         1            9.7            8.3            9.1            7.8
2         2            9.0            8.9            8.3            7.9
> cast(mdata, ObjectNumber~variable, fun.aggregate=c,
+      subset=variable=="SurfaceArea")
  ObjectNumber SurfaceArea_X1 SurfaceArea_X2
1            1            9.7            9.0
2            2            8.3            8.9
3            3            9.1            8.3
4            4            7.8            7.9
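
A similar wide layout can also be produced without the reshape package.
The sketch below uses base R's tapply() and assumes, as in the example,
exactly one row per ObjectNumber/Timepoint pair; the data is typed in by
hand and wide is a name made up for illustration. The result is a
matrix rather than a data.frame.

```r
# Hand-typed copy of the two-timepoint example data
data <- data.frame(
    ObjectNumber = rep(1:4, times = 2),
    Volume       = c(5.3, 4.9, 5.0, 3.5, 5.1, 4.7, 4.3, 4.2),
    SurfaceArea  = c(9.7, 8.3, 9.1, 7.8, 9.0, 8.9, 8.3, 7.9),
    Timepoint    = rep(1:2, each = 4)
)
# One row per Timepoint, one column per ObjectNumber.
# Each cell holds a single value, so mean() just returns it.
wide <- with(data, tapply(SurfaceArea,
                          list(Timepoint = Timepoint,
                               ObjectNumber = ObjectNumber),
                          mean))
print(wide)
# Transpose for one row per ObjectNumber instead
print(t(wide))
```

A missing ObjectNumber/Timepoint combination would show up as NA in the
matrix.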
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
>
> -Daniel