[R] replacing value in column of data frame
jim holtman
jholtman at gmail.com
Wed Jul 9 13:59:54 CEST 2008
Try this; what you want to do is to change the 'levels' of the factor.
> x <- factor(c(1:10,'x','y','xy'))
> str(x)
Factor w/ 13 levels "1","10","2","3",..: 1 3 4 5 6 7 8 9 10 2 ...
> x
[1] 1 2 3 4 5 6 7 8 9 10 x y xy
Levels: 1 10 2 3 4 5 6 7 8 9 x xy y
> # your error
> x[x == 'x'] <- 23
Warning message:
In `[<-.factor`(`*tmp*`, x == "x", value = 23) :
invalid factor level, NAs generated
> x
[1] 1 2 3 4 5 6 7 8 9 10 <NA> y xy
Levels: 1 10 2 3 4 5 6 7 8 9 x xy y
>
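The warning arises because a factor stores integer codes that index a fixed set of levels, so a value outside that set cannot be assigned and becomes NA. A small sketch of this follows; the names `f` and `g` are only for illustration and are not from the original post. It also shows one possible alternative, adding the new level before assigning.

```r
f <- factor(c("1", "2", "x"))
as.integer(f)   # the underlying integer codes
levels(f)       # the fixed set of labels those codes index

# "23" is not among the levels, so the assignment yields NA
# (suppressWarnings only hides the "invalid factor level" warning)
g <- f
suppressWarnings(g[g == "x"] <- "23")
is.na(g[3])

# adding the level first makes the same assignment valid;
# the now-unused level "x" stays until droplevels() removes it
g <- f
levels(g) <- c(levels(g), "23")
g[g == "x"] <- "23"
g <- droplevels(g)
```

Renaming the level directly is usually shorter than this add-then-assign route, since it changes every matching element in one step and leaves no unused level behind.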
> # work with the levels, which are what you want to change;
> # index levels(x) by level name, not by the data values
> x <- factor(c(1:10,'x','y','xy'))
> levels(x)[levels(x) == 'x'] <- '23'
> x
[1] 1 2 3 4 5 6 7 8 9 10 23 y xy
Levels: 1 10 2 3 4 5 6 7 8 9 23 xy y
>
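The same idea applied to a data frame column might look like the sketch below. The data frame `BAF` and its columns `Chr` and `Pos` are made-up stand-ins for the table read from BAF_all.txt, and Y and XY are left unmapped because the question only asks about X. It also converts through as.character() first, since as.numeric() applied directly to a factor returns the internal codes rather than the labels.

```r
# hypothetical stand-in for the data read with read.table()
BAF <- data.frame(Chr = factor(c("1", "2", "10", "X", "X", "Y")),
                  Pos = c(100, 200, 300, 400, 500, 600))

# rename the level, selecting by level name rather than by data values
levels(BAF$Chr)[levels(BAF$Chr) == "X"] <- "23"

# as.numeric() on a factor returns the internal codes, not the labels
codes <- as.numeric(BAF$Chr)

# go through character to get the chromosome numbers themselves;
# non-numeric labels such as "Y" become NA (with a warning)
chr_num <- suppressWarnings(as.numeric(as.character(BAF$Chr)))
```

The other columns stay numeric throughout, so no conversion to a matrix is needed.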
On Wed, Jul 9, 2008 at 5:13 AM, Booman, M <m.booman at path.umcg.nl> wrote:
> Dear all,
>
> Probably a very basic question but I need some help.
> I have a data frame (made by read.table from a text file) of microarray data, of which the first column is a factor and the rest of the columns are numeric.
> The factor column contains chromosome names, so values 1 through 22 plus X, Y and XY. The numeric columns contain positions or intensity measurements.
> What I need to do is change the X's in the first column to a value of 23.
>
> This is what I thought I would do:
>
> BAF_temp <- read.table("BAF_all.txt", sep="\t", header=T) #to read in the table
> BAF_temp[,1][BAF_temp[,1]=="X"] <- 23 #"in rows where the first column of BAF_temp is X, change the first column of BAF_temp to 23"
>
> However with this last line I get an error: "Invalid factor level, NAs generated in '[<-.factor'('*tmp*', BAF_temp[,1]=="X", value=23)"
>
> (I tested if my syntax for selecting the rows of chromosome X was correct by trying
> BAF_X <- BAF_temp[BAF_temp[,1]=="X",]
> which worked to give me a data frame with only the rows of the X chromosome.)
>
> I then thought it might work better if I changed the data frame to a matrix.
> When I change the BAF_temp data frame into a matrix (by BAF_matrix <- as.matrix(BAF_temp)), then the command I used above:
> BAF_temp[,1][BAF_temp[,1]=="X"] <- 23
> works fine and the end result is as I meant it to be, with all the X's changed into 23's.
> However, by using as.matrix all columns are changed to 'character' including the numeric measurements (I understand this is because one of the columns of the data frame is 'factor')
>
> I would like some help on what is the best option to solve this. I have thought of a few options myself and would like your comment/help:
> 1. Is there another syntax I can use on the data frame to change the X's to 23's, so I don't have to change the data frame into a matrix first?
>
> 2. I could change the data frame into a matrix and run the syntax as I described, resulting in all columns becoming 'character'; is there then an easy way to turn the columns with measurements (columns 2 and further) back into 'numeric' while leaving the first column with the chromosome numbers as 'character'?
>
> 3. I thought of using data.matrix(BAF_temp) and making use of the fact that the first column of factors would be changed to the underlying numbers (because X being the 23rd level in the list would automatically be changed to 23). However because the levels (chromosome names) of the factor column are ordered as "1", "10", "11", "12",....,"19", "2", "20", "21", "3", "4", etc. (I see this when using str(BAF_temp)), this results in chromosome 10 being changed into a value of 2, chromosome 11 into 3, chromosome 2 into 12 etc. For info: the chromosome names in the text file that is imported are ordered just 1, 2, 3, etc.
>
> If anyone has some tips for me I would greatly appreciate it.
>
> Best wishes,
> Marije
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?