[R] numerical data frame
Richard.Cotton at hsl.gov.uk
Mon Jan 7 11:07:19 CET 2008
> I've successfully imported my synteny data into R using the scan
> command. My results are shown below. My main problem is how to
> combine the column names with the data (splt); I tried cbind, but
> a warning message occurred. I have realized that the splt data
> have only 5 columns instead of 6. Please help me with this!
>
> I want my data to be numeric, with proper columns and column
> names, with CS replaced by 1 and CSO by 0, and with all
> punctuation and characters removed from the data.
> 1) for column names
>
> nms<-scan("C:/Users/user/Documents/cfa-1.txt",sep="\t",nlines=1,
> skip=10,what=character(0))
> Read 6 items
> > nms
> [1] "CS(O) id (number of marker/anchor) "
> [2] " Location(s) on reference "
> [3] "CS(O) size"
> [4] "CS(O) density on reference chromosome"
> [5] "Location(s) on tested "
> [6] "Breakpoints CS(O) locations (denstiy of marker/anchor)"
>
> 2) my data
>
> x<-scan("C:/Users/user/Documents/cfa-1.txt",sep="\n",skip=12,
> what=character(0))
> Read 21 items
> > splt<-strsplit(x,"\t")
> > splt
> [[1]]
> [1] "CS 1 (73): " " cfa1: [ 3251712 - 24126920 ] "
> [3] " 20875208 " " 3 "
> [5] " hsa18: [ 132170848 - 50139168 ] " "] 24126920, 24153560 [(8 ) "
> [[2]]
> [1] "CS 2 (3): " " cfa1: [ 24153560 - 24265894 ] "
> [3] " 112334 " " 27 "
> [5] " hsa18: [ 50105060 - 49934572 ] " "] 24265894, 24823786 [(7 ) "
> [[3]]
> [1] "CSO 3.1 (6): "
> [2] " cfa1: [ 24823786 - 27113036 ] "
> [3] " 2289250 "
> [4] " 3 "
> [5] " hsa18: [ 48121156 - 46579500 ]- Decreasing order - ] 27113036,
> 27418228 [ (13)"
> ...
You are probably better off using read.table or read.delim to get your
data into R, since you most likely want it in the form of a data frame
rather than a list.
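As a rough sketch of the read.delim route: the example below reads two of the rows shown in the question from an in-memory character vector, so it runs without the original file. The short column names are made up for illustration, and the commented-out call for the real file only assumes that the header sits on line 11, as the skip=10 in the scan call suggests.

```r
# Two rows copied from the output in the post, joined with tabs so that
# the example runs without the original file. The short column names
# are hypothetical, chosen only for this illustration.
sample_lines <- c(
  paste("id", "ref_loc", "size", "density", "tested_loc", "breakpoints",
        sep = "\t"),
  paste("CS 1 (73): ", " cfa1: [ 3251712 - 24126920 ] ", " 20875208 ", " 3 ",
        " hsa18: [ 132170848 - 50139168 ] ", "] 24126920, 24153560 [(8 ) ",
        sep = "\t"),
  paste("CS 2 (3): ", " cfa1: [ 24153560 - 24265894 ] ", " 112334 ", " 27 ",
        " hsa18: [ 50105060 - 49934572 ] ", "] 24265894, 24823786 [(7 ) ",
        sep = "\t")
)

# quote = "" turns off quote handling; strip.white = TRUE trims the
# padding around each field; stringsAsFactors = FALSE keeps text as text.
dat <- read.delim(text = sample_lines, quote = "", strip.white = TRUE,
                  stringsAsFactors = FALSE)

# With the real file, the call would presumably look like this
# (skip = 10 is an assumption taken from the scan call in the question):
# dat <- read.delim("C:/Users/user/Documents/cfa-1.txt", skip = 10,
#                   quote = "", strip.white = TRUE, stringsAsFactors = FALSE)

str(dat)
```

Columns that contain only digits, such as the size and density columns here, come back as numbers straight away; the location columns stay as character and still need cleaning.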
Otherwise, try this.
#Convert to matrix
datamat <- matrix(unlist(splt), ncol=6, byrow=TRUE)
#This will remove punctuation, but it looks like you want to do something
#more with some of the columns; I'm just not sure what it is.
nopunct <- gsub("[[:punct:]]", "", datamat)
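#Convert to a data frame, keeping the columns as character rather than
#factor (as.numeric on a factor returns level codes, not the values)
df <- as.data.frame(nopunct, stringsAsFactors=FALSE)
#Make column 3 numeric (you will probably want to do something like this
#for each one)
df[,3] <- as.numeric(df[,3])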
# Set column names
names(df) <- nms
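The matrix step above assumes every element of splt has 6 fields, but the third element in the question has only 5, apparently because its last two fields are not separated by a tab. One possible workaround, sketched below, is to pad short rows with NA before building the matrix. The sketch also derives the 1/0 coding for CS/CSO asked for in the question. The three rows imitate the structure of splt from the post (the exact text of the third row is a guess, since it is wrapped in the original), and the names n_col, padded, is_cs and size are made up for illustration.

```r
# Three rows imitating `splt` from the post; the third has only 5 fields.
splt <- list(
  c("CS 1 (73): ", " cfa1: [ 3251712 - 24126920 ] ", " 20875208 ", " 3 ",
    " hsa18: [ 132170848 - 50139168 ] ", "] 24126920, 24153560 [(8 ) "),
  c("CS 2 (3): ", " cfa1: [ 24153560 - 24265894 ] ", " 112334 ", " 27 ",
    " hsa18: [ 50105060 - 49934572 ] ", "] 24265894, 24823786 [(7 ) "),
  c("CSO 3.1 (6): ", " cfa1: [ 24823786 - 27113036 ] ", " 2289250 ", " 3 ",
    " hsa18: [ 48121156 - 46579500 ]- Decreasing order - ] 27113036, 27418228 [ (13)")
)

# Pad every row to 6 fields with NA so that unlist() cannot shift
# values from one row into the next.
n_col <- 6
padded <- lapply(splt, function(row) {
  c(row, rep(NA_character_, n_col - length(row)))
})
datamat <- matrix(unlist(padded), ncol = n_col, byrow = TRUE)

# 1 for rows labelled "CS", 0 for rows labelled "CSO".
is_cs <- ifelse(grepl("^CSO", datamat[, 1]), 0, 1)

# Column 3 holds plain integers padded with spaces; as.numeric
# ignores the surrounding whitespace.
size <- as.numeric(datamat[, 3])
```

Padding keeps the rows aligned but does not split the merged fifth field of the short row; that would need a separate rule for where the location ends and the breakpoints begin.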
Regards,
Richie.
Mathematical Sciences Unit
HSL