[R] Problem reading from a data frame
Marc Schwartz
marc_schwartz at comcast.net
Wed Jul 2 18:52:25 CEST 2008
Not likely the factor issue:
> x <- factor(c("MT2342", "MT0982", "MT2874"))
> x
[1] MT2342 MT0982 MT2874
Levels: MT0982 MT2342 MT2874
> gsub("[^0-9]", "", x)
[1] "2342" "0982" "2874"
gsub() and friends coerce to character internally already:
> gsub
function (pattern, replacement, x, ignore.case = FALSE, extended = TRUE,
    perl = FALSE, fixed = FALSE, useBytes = FALSE)
{
    if (!is.character(x))
        x <- as.character(x)
    .Internal(gsub(as.character(pattern), as.character(replacement),
        x, ignore.case, extended, perl, fixed, useBytes))
}
<environment: namespace:base>
More than likely what is happening is that 'PthwyGenes' is a single row
data frame:
> x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874")
> x
       A      B      C
1 MT2342 MT0982 MT2874
> str(x)
'data.frame':   1 obs. of  3 variables:
 $ A: Factor w/ 1 level "MT2342": 1
 $ B: Factor w/ 1 level "MT0982": 1
 $ C: Factor w/ 1 level "MT2874": 1
Thus, when the code for gsub() attempts to coerce its 'x' argument (here
the one-row data frame, which is a list of factors) to character, as per
documented behavior, you get the factors' internal integer codes, not
their labels, coerced to character:
> as.character(x[1, ])
[1] "1" "1" "1"
and then:
> gsub("[^0-9]", "", x[1, ])
[1] "1" "1" "1"
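A quick way to confirm this is to inspect what x[1, ] actually is. The
sketch below assumes factor columns, so it passes stringsAsFactors = TRUE
explicitly (newer versions of R no longer create factors by default);
'row1' is just an illustrative name:

```r
# Assumption: factor columns, forced here with stringsAsFactors = TRUE
x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                stringsAsFactors = TRUE)

# Extract the first row; the result keeps its data frame structure
row1 <- x[1, ]

is.data.frame(row1)   # TRUE: still a data frame
is.list(row1)         # TRUE: a data frame is a list of columns
is.character(row1)    # FALSE: so gsub() will call as.character() on it
sapply(row1, class)   # class of each column
```

Because is.character() is FALSE, gsub() falls into the as.character()
branch shown in its source above, and that coercion is applied to the
list as a whole rather than to each factor's labels.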
Thus, instead use:
> sapply(x[1, ], function(x) gsub("[^0-9]", "", x))
     A      B      C
"2342" "0982" "2874"
or, if you just need the vector returned without the column names:
> gsub("[^0-9]", "", unlist(x[1, ]))
[1] "2342" "0982" "2874"
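Two other ways to get the same result, offered as a sketch rather than
the only approach: convert each column to character before calling
gsub(), or keep the columns as character from the start. The names
'row_labels', 'ids1', 'ids2' and 'y' are illustrative, and
stringsAsFactors is set explicitly in both cases so the behavior does
not depend on the R version:

```r
# Assumption: factor columns, as in the example above
x <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                stringsAsFactors = TRUE)

# Option 1: convert each one-element column to character first,
# giving a named character vector, then strip the non-digits
row_labels <- vapply(x[1, ], as.character, character(1))
ids1 <- gsub("[^0-9]", "", row_labels)

# Option 2: avoid factors when the data frame is created
y <- data.frame(A = "MT2342", B = "MT0982", C = "MT2874",
                stringsAsFactors = FALSE)
ids2 <- gsub("[^0-9]", "", unlist(y[1, ]))

# unname() drops the column names so only the digits remain
unname(ids1)
unname(ids2)
```

Option 2 is usually the simpler choice if the columns are identifiers
and never need to be treated as categorical data.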
The key thing to remember is that a single row extracted from a data
frame is not a vector; it is still a data frame, that is, a list with one
element per column.
HTH,
Marc Schwartz
on 07/02/2008 10:51 AM jim holtman wrote:
> Seems to work fine for me:
>
>> x <- c("MT2342", "MT0982", "MT2874")
>> gsub("[^0-9]", "", x)
> [1] "2342" "0982" "2874"
>
> You might have 'factors' so you should use as.character to convert to
> character strings:
>
> gsub('[^0-9]','',as.character(PthwyGenes))
>
> On Wed, Jul 2, 2008 at 10:24 AM, <naw3 at duke.edu> wrote:
>> Hi,
>>
>> I have a data frame with strings that have two letters and four numbers. When I
>> store a whole row as a new vector and try to remove the preceding letters using
>> the gsub command, it returns characters of single numbers that have no relation
>> to the numbers in each string. I also noticed that when I view the new vector
>> before using gsub, it includes the original headers from the data frame. For
>> example,
>>
>> The original row will contain (i'm not showing the headers):
>>
>> MT2342 MT0982 MT2874
>>
>> and after I use the command, 'gsub('[^0-9]','',PthwyGenes),' I get:
>>
>> "6" "6" "8"
>>
>> and this result no longer has any headers.
>>
>> Does anyone know why this happens and how I can fix it?
>>
>> Thanks,
>> -Nina