[R] grep help needed

Denis Chabot chabotd at globetrotter.net
Tue Jul 26 21:35:51 CEST 2005


Thanks for your help; the proposed solutions were much more elegant
than what I was attempting. I adopted a slight modification of Tom
Mulholland's solution, with a piece from John Fox's, but many of you
had very similar solutions.

require(maptools)

## read the example shapefile shipped with maptools
nc <- read.shape(system.file("shapes/sids.shp", package = "maptools")[1])
mappolys <- Map2poly(nc, as.character(nc$att.data$FIPSNO))

## keep only a few polygons, just to make the example smaller
selected.shapes <- which(nc$att.data$SID74 > 20)
submap <- subset(mappolys, nc$att.data$SID74 > 20)

## build a PBSmapping PolySet: one row per vertex, with a polygon ID
## (PID) and the vertex's position within its polygon (POS)
final.data <- NULL
for (j in 1:length(selected.shapes)) {
    temp.verts <- matrix(as.vector(submap[[j]]), ncol = 2)
    n <- nrow(temp.verts)
    temp.order <- 1:n
    temp.data <- cbind(rep(j, n), temp.order, temp.verts)
    final.data <- rbind(final.data, temp.data)
}
colnames(final.data) <- c("PID", "POS", "X", "Y")
final.data  # inspect the result

my.data <- as.data.frame(final.data)
class(my.data) <- c("PolySet", "data.frame")
attr(my.data, "projection") <- "LL"

## matching PolyData: the attribute rows for the selected polygons
meta <- nc$att.data[selected.shapes, ]
PID <- seq(1, length(submap))
meta.data <- cbind(PID, meta)
class(meta.data) <- c("PolyData", "data.frame")
attr(meta.data, "projection") <- "LL"

It would be nice if a variant of this were incorporated into
PBSmapping to make it easier to import data from shapefiles!
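The loop above generalizes readily. The helper below is only a sketch of
that idea (the name toPolySet is invented here, not part of any package):
it converts any list of two-column coordinate matrices, such as the one
Map2poly() returns, into a PolySet-style data frame, so it can be tried
on a toy list without maptools installed.

```r
## sketch of a reusable converter (toPolySet is a made-up name):
## takes a list of two-column coordinate matrices and builds a
## PBSmapping-style PolySet data frame (PID, POS, X, Y)
toPolySet <- function(polys) {
    pieces <- lapply(seq_along(polys), function(j) {
        verts <- matrix(as.vector(polys[[j]]), ncol = 2)
        data.frame(PID = j, POS = seq_len(nrow(verts)),
                   X = verts[, 1], Y = verts[, 2])
    })
    out <- do.call(rbind, pieces)
    class(out) <- c("PolySet", "data.frame")
    attr(out, "projection") <- "LL"
    out
}

## toy example: two small "polygons"
demo <- list(matrix(c(-56.0, -56.1, -56.2, 51.7, 51.8, 51.9), ncol = 2),
             matrix(c(-57.7, -57.8, 50.8, 50.9), ncol = 2))
ps <- toPolySet(demo)
ps
```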

Thanks again for your help,

Denis Chabot
On 2005-07-26 at 00:48, Mulholland, Tom wrote:

>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Denis Chabot
>> Sent: Tuesday, 26 July 2005 10:46 AM
>> To: R list
>> Subject: [R] grep help needed
>>
>>
>> Hi,
>>
>> In another thread ("PBSmapping and shapefiles") I asked for an easy
>> way to read "shapefiles" and transform them in data that PBSmapping
>> could use. One person is exploring some ways of doing this,
>> but it is
>> possible I'll have to do this "manually".
>>
>> With package "maptools" I am able to extract the information I need
>> from a shapefile but it is formatted like this:
>>
>> [[1]]
>>             [,1]     [,2]
>> [1,] -55.99805 51.68817
>> [2,] -56.00222 51.68911
>> [3,] -56.01694 51.68911
>> [4,] -56.03781 51.68606
>> [5,] -56.04639 51.68759
>> [6,] -56.04637 51.69445
>> [7,] -56.03777 51.70207
>> [8,] -56.02301 51.70892
>> [9,] -56.01317 51.71578
>> [10,] -56.00330 51.73481
>> [11,] -55.99805 51.73840
>> attr(,"pstart")
>> attr(,"pstart")$from
>> [1] 1
>>
>> attr(,"pstart")$to
>> [1] 11
>>
>> attr(,"nParts")
>> [1] 1
>> attr(,"shpID")
>> [1] NA
>>
>> [[2]]
>>            [,1]     [,2]
>> [1,] -57.76294 50.88770
>> [2,] -57.76292 50.88693
>> [3,] -57.76033 50.88163
>> [4,] -57.75668 50.88091
>> [5,] -57.75551 50.88169
>> [6,] -57.75562 50.88550
>> [7,] -57.75932 50.88775
>> [8,] -57.76294 50.88770
>> attr(,"pstart")
>> attr(,"pstart")$from
>> [1] 1
>>
>> attr(,"pstart")$to
>> [1] 8
>>
>> attr(,"nParts")
>> [1] 1
>> attr(,"shpID")
>> [1] NA
>>
>> I do not quite understand the structure of this data object (list of
>> lists I think)
>> but at this point I resorted to printing it on the console and
>> imported that text into Excel for further cleaning, which is easy
>> enough. I'd like to complete the process within R to save
>> time and to
>> circumvent Excel's limit of around 64000 lines. But I have a hard
>> time figuring out how to clean up this text in R.
>>
>> What I need to produce for PBSmapping is a file where each block of
>> coordinates shares one ID number, called PID, and a variable POS
>> indicates the position of each coordinate within a "shape".
>> All other
>> lines must disappear. So the above would become:
>>
>> PID POS X Y
>> 1 1 -55.99805 51.68817
>> 1 2 -56.00222 51.68911
>> 1 3 -56.01694 51.68911
>> 1 4 -56.03781 51.68606
>> 1 5 -56.04639 51.68759
>> 1 6 -56.04637 51.69445
>> 1 7 -56.03777 51.70207
>> 1 8 -56.02301 51.70892
>> 1 9 -56.01317 51.71578
>> 1 10 -56.00330 51.73481
>> 1 11 -55.99805 51.73840
>> 2 1 -57.76294 50.88770
>> 2 2 -57.76292 50.88693
>> 2 3 -57.76033 50.88163
>> 2 4 -57.75668 50.88091
>> 2 5 -57.75551 50.88169
>> 2 6 -57.75562 50.88550
>> 2 7 -57.75932 50.88775
>> 2 8 -57.76294 50.88770
>>
>> First I imported this text file into R:
>> test <- read.csv2("test file.txt", header=F, sep=";", colClasses = "character")
>>
>> I used sep=";" to insure there would be only one variable in this
>> file, as it contains no ";"
>>
>> To remove lines that do not contain coordinates, I used the
>> fact that
>> longitudes are expressed as negative numbers, so with my very
>> limited
>> knowledge of grep searches, I thought of this, which is probably not
>> the best way to go:
>>
>> a <- rep("-", length(test$V1))
>> b <- grep(a, test$V1)
>>
>> this gives me a warning ("Warning message:
>> the condition has length > 1 and only the first element will be used
>> in: if (is.na(pattern)) {"
>> but seems to do what I need anyway
>>
>> c <- seq(1, length(test$V1))
>> d <- c %in% b
>>
>> e <- test$V1[d]
>>
>> Partial victory, now I only have lines that look like
>> [1,] -57.76294 50.88770
>>
>> But I don't know how to go further: the number in square
>> brackets can
>> be used for variable POS, after removing the square brackets and the
>> comma, but this requires a better knowledge of grep than I have.
>> Furthermore, I don't know how to add a PID (polygon ID) variable,
>> i.e. all lines of a polygon must have the same ID, as in the example
>> above (i.e. each time POS == 1, a new polygon starts and PID
>> needs to
>> be incremented by 1, and PID is kept constant for lines where
>> POS != 1).
>>
>> Any help will be much appreciated.
>>
>> Sincerely,
>>
>> Denis Chabot
>
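A footnote on the grep() warning mentioned above: grep() expects a single
pattern string, not one pattern per line, so passing the vector a made R
use only its first element (hence the "condition has length > 1" warning).
Passing the pattern once, with fixed = TRUE so that "-" is matched
literally, avoids the warning; sub() can then pull the row number out of
the square brackets for POS. A minimal, self-contained illustration (the
sample lines are just a fragment of the pasted console output):

```r
## lines as they might come from the pasted console output
lines <- c("[[1]]",
           "            [,1]     [,2]",
           "[1,] -55.99805 51.68817",
           "[2,] -56.00222 51.68911",
           "attr(,\"pstart\")")

## one pattern for all lines; fixed = TRUE treats "-" literally
coord.rows <- grep("-", lines, fixed = TRUE)

## the number in "[1,]" etc. becomes the POS value
pos <- as.numeric(sub("^\\[([0-9]+),\\].*", "\\1", lines[coord.rows]))
lines[coord.rows]
pos
```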
More information about the R-help mailing list