[R] Any method to speed up this problem?
Marc Schwartz
marc_schwartz at me.com
Thu Jun 18 16:36:27 CEST 2009
On Jun 18, 2009, at 9:28 AM, njhuang86 wrote:
>
> Hi all,
>
> Suppose I have a vector like this:
>
> [1] "STAT1" "STAT1" "STAT1" "STAT1" "GAPDH" "GAPDH" "GAPDH" "ACTB" "ACTB"
> [10] "ACTB" "DDR1" "RFC2" "HSPA6" "PAX8" "GUCA1A" "UBE1L" "THRA" "PTPN21"
> [19] "CCL5" "CYP2E1" "STAT1" "THRA" "PAX8"
>
> I would like to produce a vector such that it has the same length as the
> one above but it tells me where the duplicates are. So essentially, if I
> could represent each gene symbol as a specific number, and have the
> duplicates be the same number, that would be ideal. Right now, I'm using
> the unique command along with two nested for loops to do the job... But
> it's really taking too long... Any suggestions would be greatly
> appreciated. Thank you!
Is this what you want?
> Vec
 [1] "STAT1"  "STAT1"  "STAT1"  "STAT1"  "GAPDH"  "GAPDH"  "GAPDH"
 [8] "ACTB"   "ACTB"   "ACTB"   "DDR1"   "RFC2"   "HSPA6"  "PAX8"
[15] "GUCA1A" "UBE1L"  "THRA"   "PTPN21" "CCL5"   "CYP2E1" "STAT1"
[22] "THRA"   "PAX8"
> as.numeric(factor(Vec))
 [1] 11 11 11 11  5  5  5  1  1  1  4 10  7  8  6 13 12  9  2  3 11 12
[23]  8
?
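
For what it's worth, the codes above come from the factor levels, which are
sorted alphabetically, so "ACTB" gets 1 and "UBE1L" gets 13. If numbering in
order of first appearance is preferred, or only the positions of the repeats
are needed, something along these lines may work. This is only a sketch: the
names ids_sorted, ids_first and dup_pos are made up for illustration and are
not from the original post.

```r
# Example data from the thread.
Vec <- c("STAT1", "STAT1", "STAT1", "STAT1", "GAPDH", "GAPDH", "GAPDH",
         "ACTB", "ACTB", "ACTB", "DDR1", "RFC2", "HSPA6", "PAX8",
         "GUCA1A", "UBE1L", "THRA", "PTPN21", "CCL5", "CYP2E1", "STAT1",
         "THRA", "PAX8")

# One integer per distinct symbol; codes follow the sorted factor levels.
ids_sorted <- as.integer(factor(Vec))

# One integer per distinct symbol; codes follow order of first appearance.
ids_first <- match(Vec, unique(Vec))

# Positions of repeats, i.e. every occurrence after the first one.
dup_pos <- which(duplicated(Vec))
```

All three calls are vectorized, so no explicit for loops are needed, which
should help with the speed problem on longer vectors.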
HTH,
Marc Schwartz