[R] replacing a factor value in a data frame
Dave Roberts
droberts at montana.edu
Fri Oct 28 18:01:14 CEST 2005
Federico,
There doesn't appear to be an instance of the value you want to
change in your example, so I had to improvise. Part of the problem may
be that the data frame is composed of factors, and it's not possible to
set a factor to a value that isn't already in its set of possible
values, given by the levels() function. So if you want to change GC
to CG, but CG does not already exist in the set of possible values,
you'll have to add it first. E.g.
> tmp <- data
> levels(tmp[,30]) <- c(levels(data[,30]),'CG')
Then, if the problem only occurs in one column, it's an easy fix:
> tmp[data=='GC'] <- 'CG'
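To see why the level has to exist first, here is a minimal sketch on a small hypothetical factor `snp` (toy values, not your data). Assigning a value that isn't among levels() leaves NA behind, while extending the levels first makes the same assignment work:

```r
# Hypothetical toy genotype factor (not the poster's data)
snp <- factor(c("GC", "CC", NA, "GC"))
levels(snp)                      # "CC" "GC"

# "CG" is not a level: R warns "invalid factor level, NA generated"
# and stores NA in place of "CG"
bad <- snp
suppressWarnings(bad[bad == "GC"] <- "CG")
is.na(bad)                       # TRUE FALSE TRUE TRUE

# Extending the levels first lets the same assignment succeed
good <- snp
levels(good) <- c(levels(good), "CG")
good[good == "GC"] <- "CG"
as.character(good)               # "CG" "CC" NA "CG"
```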
If GC occurs in multiple columns, you'll either have to change the levels
for each column as I did just above, or work with one column at a time.
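A sketch of the first option, assuming you want every factor column treated the same way. replace_factor_value() is a hypothetical helper written for illustration; it adds the new level only to columns where the old value actually occurs, and treats NA entries as non-matches:

```r
# Hypothetical helper (not from the original reply): replace `from` with `to`
# in every factor column of `df`, extending that column's levels as needed
replace_factor_value <- function(df, from, to) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!is.factor(x)) next                 # leave non-factor columns alone
    hit <- !is.na(x) & x == from            # NA entries count as "no match"
    if (!any(hit)) next                     # value absent: levels untouched
    if (!(to %in% levels(x))) levels(x) <- c(levels(x), to)
    x[hit] <- to
    df[[col]] <- x
  }
  df
}

# Toy data standing in for the SNP columns
geno <- data.frame(
  V5 = factor(c("GG", "CC", "CC", NA)),
  V7 = factor(c("AC", "CC", NA, "AA"))
)
out <- replace_factor_value(geno, "CC", "XX")
as.character(out$V5)   # "GG" "XX" "XX" NA
as.character(out$V7)   # "AC" "XX" NA "AA"
```

Unlike the V5-only example that follows, this changes 'CC' in V7 as well. The old level stays in levels() unused; droplevels() will remove it if that matters.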
Since you don't have 30 columns in your example, let's pretend you want
to change all the instances of 'CC' in data$V5 to 'XX':
> tmp <- data
> levels(tmp$V5) <- c(levels(data$V5),'XX')
> tmp$V5[data$V5=='CC'] <- 'XX'
> tmp
V4 V5 V6 V7 V8 V9 V10
1 TT GG TT AC AG AG TT
2 AT XX TT AA AA AA TT
3 AT XX TT AC AA <NA> TT
4 TT XX TT AA AA AA TT
5 AT CG TT CC AA AA TT
6 TT XX TT AA AA AA TT
7 AT XX TT CC <NA> <NA> TT
8 TT XX TT AC AG AG TT
9 AT XX TT CC AG <NA> TT
10 TT XX TT CC GG GG TT
Notice that the instances of 'CC' in tmp$V7 did not change.
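One more thing about the error you got: data[,30]=="GC" is NA wherever column 30 is NA, and assigning into a data frame with a logical row index that contains NA stops with the "missing values are not allowed in subscripted as..." message you quoted. A minimal sketch on a hypothetical one-column data frame d (column "snp" stands in for column 30), using which() to keep only the TRUE positions:

```r
# Toy data frame (hypothetical); column "snp" stands in for column 30
d <- data.frame(snp = factor(c("CG", "GC", NA, "GC")))
idx <- d$snp == "GC"                 # FALSE TRUE NA TRUE

# A logical row index containing NA makes [<-.data.frame stop with an error
msg <- tryCatch({
  d[idx, "snp"] <- "CG"
  "no error"
}, error = function(e) conditionMessage(e))
msg                                  # error text about missing values

# which() keeps only the TRUE positions, so the NA row is simply skipped
d[which(idx), "snp"] <- "CG"
as.character(d$snp)                  # "CG" "CG" NA "CG"
```

Indexing the factor itself, as in tmp$V5[data$V5=='CC'] <- 'XX', tolerates NA in the logical index when a single replacement value is given; it is the row index of the data frame that can't contain NA.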
HTH, Dave Roberts
Federico Calboli wrote:
> Hi All,
>
> I have the following problem, that's driving me mad.
>
> I have a dataframe of factors, from a genetic scan of SNPs. I DO have
> NAs in the dataframe, which would look like:
>
> V4 V5 V6 V7 V8 V9 V10
> 1 TT GG TT AC AG AG TT
> 2 AT CC TT AA AA AA TT
> 3 AT CC TT AC AA <NA> TT
> 4 TT CC TT AA AA AA TT
> 5 AT CG TT CC AA AA TT
> 6 TT CC TT AA AA AA TT
> 7 AT CC TT CC <NA> <NA> TT
> 8 TT CC TT AC AG AG TT
> 9 AT CC TT CC AG <NA> TT
> 10 TT CC TT CC GG GG TT
>
>
> In the dataframe I have 1 column where one factor has been erroneously
> given alternative readings: CG and GC.
>
> I want to change the instances of GC to CG and I use the code:
>
> data[data[,30]=="GC", 30] = "CG"
>
> but get the error:
> Error in "[<-.data.frame"(`*tmp*`, all[, 30] == "GC", 30
> missing values are not allowed in subscripted as
>
> Any hints?
>
> Cheers,
>
> Federico
>
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
David W. Roberts office 406-994-4548
Professor and Head FAX 406-994-3190
Department of Ecology email droberts at montana.edu
Montana State University
Bozeman, MT 59717-3460