[R] Transform a data.frame with a "; "-separated column and another column into a new one with the same two columns but with repetitions

arun smartpink111 at yahoo.com
Fri Jul 4 16:15:49 CEST 2014



Hi,
Try:
dat1 <- read.table(text="'1 > TC' 'WC'
'2 > 0'  'Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy'
'3 > 0' 'Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied'
'4 > 2'    'Physics, Nuclear; Physics, Particles & Fields'
'5 > 0'    'Chemistry, Inorganic & Nuclear'
'6 > 2'    'Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering'", sep="", header=FALSE, stringsAsFactors=FALSE)

Using `cSplit()` from
https://gist.github.com/mrdwab/11380733
(it requires the data.table package):

library(data.table)
cSplit(dat1, "V2", ";", "long")
        V1                                     V2
 1: 1 > TC                                     WC
 2:  2 > 0          Instruments & Instrumentation
 3:  2 > 0           Nuclear Science & Technology
 4:  2 > 0            Physics, Particles & Fields
 5:  2 > 0                           Spectroscopy
 6:  3 > 0           Nanoscience & Nanotechnology
 7:  3 > 0   Materials Science, Multidisciplinary
 8:  3 > 0                       Physics, Applied
 9:  4 > 2                       Physics, Nuclear
10:  4 > 2            Physics, Particles & Fields
11:  5 > 0         Chemistry, Inorganic & Nuclear
12:  6 > 2                    Chemistry, Physical
13:  6 > 2   Materials Science, Multidisciplinary
14:  6 > 2 Metallurgy & Metallurgical Engineering
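If you would rather not depend on the gist, a minimal base-R sketch can do the same reshaping with strsplit(). The trick that keeps track of TC is rep(): each V1 value is repeated once per piece produced by the split. This assumes R >= 3.2.0 (for trimws() and lengths()); the object names `pieces` and `long` are only illustrative, and the data are re-read so the snippet runs on its own.

```r
# Same two-column data as above (V1 = TC label, V2 = "; "-separated WC list).
dat1 <- read.table(text = "'1 > TC' 'WC'
'2 > 0' 'Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy'
'3 > 0' 'Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied'
'4 > 2' 'Physics, Nuclear; Physics, Particles & Fields'
'5 > 0' 'Chemistry, Inorganic & Nuclear'
'6 > 2' 'Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering'",
  sep = "", header = FALSE, stringsAsFactors = FALSE)

# Split V2 on ";" and strip the spaces around each piece (assumes R >= 3.2.0).
pieces <- lapply(strsplit(dat1$V2, ";", fixed = TRUE), trimws)

# Repeat each V1 value once per piece, so every WC entry keeps its TC label.
long <- data.frame(V1 = rep(dat1$V1, lengths(pieces)),
                   V2 = unlist(pieces),
                   stringsAsFactors = FALSE)
long
```

The result has one row per WC entry, in the same order as the cSplit() output.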



A.K.


On Friday, July 4, 2014 9:53 AM, João Azevedo Patrício <joao.patricio at gmx.pt> wrote:
Hi,

I've been trying to solve this issue but with no success.

I have some data like this:

1 > TC    WC
2 > 0    Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy
3 > 0    Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied
4 > 2    Physics, Nuclear; Physics, Particles & Fields
5 > 0    Chemistry, Inorganic & Nuclear
6 > 2    Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering

And I need to have this:

1 > TC    WC
2 > 0    Instruments & Instrumentation
2 > 0    Nuclear Science & Technology
2 > 0    Physics, Particles & Fields
2 > 0    Spectroscopy
3 > 0    Nanoscience & Nanotechnology
3 > 0    Materials Science, Multidisciplinary
3 > 0    Physics, Applied
4 > 2    Physics, Nuclear
4 > 2    Physics, Particles & Fields
5 > 0    Chemistry, Inorganic & Nuclear
6 > 2    Chemistry, Physical
6 > 2    Materials Science, Multidisciplinary
6 > 2    Metallurgy & Metallurgical Engineering

That is, I need to repeat each row once for every element in WC, keeping
the same value in TC. The goal is to compute the total (sum) of TC for
each WC, when a row lists several WC entries.
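A minimal base-R sketch of that final step (summing TC per WC). It rests on a hypothetical reading of the first column: the number after ">" is taken to be the TC value and the number before it a row index, and the "1 > TC" / "WC" header row is dropped. The names `dat`, `long` and `tc_by_wc` are illustrative, and trimws()/lengths() need R >= 3.2.0.

```r
# Question data without the "1 > TC" / "WC" header row.
dat <- data.frame(
  V1 = c("2 > 0", "3 > 0", "4 > 2", "5 > 0", "6 > 2"),
  V2 = c("Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy",
         "Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied",
         "Physics, Nuclear; Physics, Particles & Fields",
         "Chemistry, Inorganic & Nuclear",
         "Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering"),
  stringsAsFactors = FALSE)

# ASSUMPTION: TC is the number after ">" (the number before it is a row index).
dat$TC <- as.numeric(sub(".*>\\s*", "", dat$V1))

# One row per WC entry, each carrying the TC value of the row it came from.
wc <- lapply(strsplit(dat$V2, ";", fixed = TRUE), trimws)
long <- data.frame(WC = unlist(wc), TC = rep(dat$TC, lengths(wc)),
                   stringsAsFactors = FALSE)

# Total TC per WC category.
tc_by_wc <- aggregate(TC ~ WC, data = long, FUN = sum)
tc_by_wc
```

aggregate() with a formula returns a data frame with one row per distinct WC, sorted alphabetically; tapply(long$TC, long$WC, sum) gives the same totals as a named vector.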

I've tried to split the column using strsplit(), but then I cannot keep
track of TC.

Thanks in advance.
-- 
João Azevedo Patrício
Tel.: +31 91 400 53 63
Portugal
@ http://tripaforra.bl.ee

"Take 2 seconds to think before you act"

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


