[R] a simple problem
David Winsemius
dwinsemius at comcast.net
Fri Mar 4 17:03:58 CET 2011
On Mar 4, 2011, at 9:50 AM, Asan Ramzan wrote:
> Hello R-help
>
> I am working with large data tables that have the occasional label
> marking a particular time point in an experiment. E.g.:
>
> "Time (min)", "R1 R1", "R2 R1", "R3 R1", "R4 R1"
> .909, 1.117, 1.225, 1.048, 1.258
> 3.942, 1.113, 1.230, 1.049, 1.262
> 3.976, 1.105, 1.226, 1.051, 1.259
> 4.009, 1.114, 1.231, 1.053, 1.259
> 4.042, 1.107, 1.230, 1.048, 1.262
> 4.076, 1.108, 1.226, 1.045, 1.257
> 4.109, 1.109, 1.227, 1.047, 1.259
> 4.142, 1.108, 1.225, 1.052, 1.260
> 4.176, 1.105, 1.222, 1.046, 1.260
> 4.209, 1.106, 1.226, 1.050, 1.258
> 4.242, 1.105, 1.224, 1.047, 1.258
> 4.276, 1.104, 1.223, 1.048, 1.259
> 4.309, 1.106, 1.228, 1.050, 1.260
> 4.342, 1.103, 1.219, 1.049, 1.260
> 4.376, 1.107, 1.225, 1.052, 1.259
> 4.409, 1.105, 1.222, 1.047, 1.258
> 4.442, 1.106, 1.227, 1.048, 1.262
> 4.476, 1.105, 1.222, 1.049, 1.261
> 4.509, 1.102, 1.222, 1.047, 1.259
> 4.555, "Gly sar"
> 4.555, 1.107, 1.224, 1.048, 1.261
> 4.576, 1.109, 1.228, 1.053, 1.259
> 4.609, 1.103, 1.218, 1.046, 1.258
> 4.642, 1.105, 1.223, 1.048, 1.256
> 4.676, 1.108, 1.217, 1.048, 1.260
> 4.709, 1.124, 1.222, 1.047, 1.258
> When I try to read in the table, I get:
>> try<-read.table("200810_01.R",header=T,sep=",")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
> na.strings, :
> line 136 did not have 5 elements
>
> Is there any way to tell R to ignore these labels or, better
> still, interpret them as labels for particular time
> points, so that when a line graph is drawn it is annotated
> with these labels?
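The error means that read.table expected 5 fields on every line and found one with fewer. A minimal sketch of how such lines could be located before reading, using count.fields from the utils package; the four-line excerpt and the names sample_lines, n_fields and bad are made up for illustration:

```r
# Hypothetical excerpt standing in for the real file
sample_lines <- c(
  '"Time (min)", "R1 R1", "R2 R1", "R3 R1", "R4 R1"',
  '4.509, 1.102, 1.222, 1.047, 1.259',
  '4.555, "Gly sar"',
  '4.555, 1.107, 1.224, 1.048, 1.261'
)

# Number of comma-separated fields found on each line
n_fields <- count.fields(textConnection(sample_lines), sep = ",")

# Lines whose field count differs from the header's are the ones
# read.table complains about
bad <- which(n_fields != n_fields[1])
print(bad)
print(sample_lines[bad])
```

With the real data, the file name from the question ("200810_01.R") could be passed to count.fields in place of the text connection.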
Option 1:
Prepare your data properly with an editor.
Option 2:
You could read the file with readLines, identify the offending lines
with grep or grepl, then separate the offenders and non-offenders.
lines <- readLines(textConnection('"Time (min)", "R1 R1", "R2 R1", "R3 R1", "R4 R1"
.909, 1.117, 1.225, 1.048, 1.258
3.942, 1.113, 1.230, 1.049, 1.262
3.976, 1.105, 1.226, 1.051, 1.259
4.009, 1.114, 1.231, 1.053, 1.259
4.042, 1.107, 1.230, 1.048, 1.262
4.076, 1.108, 1.226, 1.045, 1.257
4.109, 1.109, 1.227, 1.047, 1.259
4.142, 1.108, 1.225, 1.052, 1.260
4.176, 1.105, 1.222, 1.046, 1.260
4.209, 1.106, 1.226, 1.050, 1.258
4.242, 1.105, 1.224, 1.047, 1.258
4.276, 1.104, 1.223, 1.048, 1.259
4.309, 1.106, 1.228, 1.050, 1.260
4.342, 1.103, 1.219, 1.049, 1.260
4.376, 1.107, 1.225, 1.052, 1.259
4.409, 1.105, 1.222, 1.047, 1.258
4.442, 1.106, 1.227, 1.048, 1.262
4.476, 1.105, 1.222, 1.049, 1.261
4.509, 1.102, 1.222, 1.047, 1.259
4.555, "Gly sar"
4.555, 1.107, 1.224, 1.048, 1.261
4.576, 1.109, 1.228, 1.053, 1.259
4.609, 1.103, 1.218, 1.046, 1.258
4.642, 1.105, 1.223, 1.048, 1.256
4.676, 1.108, 1.217, 1.048, 1.260
4.709, 1.124, 1.222, 1.047, 1.258'))
read.table(textConnection(
lines[ c(TRUE, !grepl("[[:alpha:]]", lines)[-1]) ]),
skip=1)
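# the quotes and spaces don't work well with R column naming conventions,
# so the header line is skipped; no sep="," was given, so the commas
# stay attached to the fields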
V1 V2 V3 V4 V5
1 .909, 1.117, 1.225, 1.048, 1.258
2 3.942, 1.113, 1.230, 1.049, 1.262
3 3.976, 1.105, 1.226, 1.051, 1.259
snipped
23 4.642, 1.105, 1.223, 1.048, 1.256
24 4.676, 1.108, 1.217, 1.048, 1.260
25 4.709, 1.124, 1.222, 1.047, 1.258
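So even more compact would be the following. The header line also contains
letters, so it is dropped without skip=1, and sep="," is added so the columns
are read as numbers rather than keeping their trailing commas:
read.table(textConnection(
lines[ !grepl("[[:alpha:]]", lines) ] ), sep=",")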
Using the non-negated grepl expression should get you all the "label"
lines, along with the header line, which also contains letters.
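A minimal sketch of how the label lines might then be used to annotate a line graph, as the question asks. The shortened data, the column names R1 to R4, the y-axis label and the variable names are illustrative assumptions, not taken from the original file:

```r
# Hypothetical excerpt; the real input would come from readLines()
lines <- c(
  '"Time (min)", "R1 R1", "R2 R1", "R3 R1", "R4 R1"',
  '4.476, 1.105, 1.222, 1.049, 1.261',
  '4.509, 1.102, 1.222, 1.047, 1.259',
  '4.555, "Gly sar"',
  '4.555, 1.107, 1.224, 1.048, 1.261',
  '4.576, 1.109, 1.228, 1.053, 1.259'
)

has_alpha <- grepl("[[:alpha:]]", lines)

# Numeric rows: every line without letters (this also drops the header)
dat <- read.table(textConnection(lines[!has_alpha]), sep = ",",
                  col.names = c("time", "R1", "R2", "R3", "R4"))

# Label rows: lines with letters, minus the header in position 1
label_lines <- lines[has_alpha][-1]
label_time  <- as.numeric(sub(",.*$", "", label_lines))       # first field
label_text  <- sub("^[^,]*,[[:space:]]*", "", label_lines)    # drop the time field
label_text  <- trimws(gsub('"', "", label_text))              # strip the quotes

# One line per series, with a dashed vertical line and text at each label
matplot(dat$time, as.matrix(dat[-1]), type = "l", lty = 1,
        xlab = "Time (min)", ylab = "Value")  # "Value" is a placeholder
abline(v = label_time, lty = 2)
text(label_time, par("usr")[4], label_text, pos = 1)
```

Keeping the labels in their own vectors, rather than as an extra column of the data, leaves the numeric table clean for plotting and analysis.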
David Winsemius, MD
Heritage Laboratories
West Hartford, CT