[R] Transform a data.frame with "; " sep column and another one into a new one with the same two columns but with repetitions
João Azevedo Patrício
joao.patricio at gmx.pt
Mon Jul 7 11:49:46 CEST 2014
On 05-07-2014 03:35, John McKown wrote:
> On Fri, Jul 4, 2014 at 7:50 AM, João Azevedo Patrício
> <joao.patricio gmx.pt> wrote:
>> Hi,
>>
>> I've been trying to solve this issue but with no success.
>>
>> I have some data like this:
>>
>> 1 > TC WC
>> 2 > 0 Instruments & Instrumentation; Nuclear Science & Technology;
>> Physics, Particles & Fields; Spectroscopy
>> 3 > 0 Nanoscience & Nanotechnology; Materials Science, Multidisciplinary;
>> Physics, Applied
>> 4 > 2 Physics, Nuclear; Physics, Particles & Fields
>> 5 > 0 Chemistry, Inorganic & Nuclear
>> 6 > 2 Chemistry, Physical; Materials Science, Multidisciplinary;
>> Metallurgy & Metallurgical Engineering
>>
>> And I need to have this:
>>
>> 1 > TC WC
>> 2 > 0 Instruments & Instrumentation
>> 2 > 0 Nuclear Science & Technology
>> 2 > 0 Physics, Particles & Fields
>> 2 > 0 Spectroscopy
>> 3 > 0 Nanoscience & Nanotechnology
>> 3 > 0 Materials Science, Multidisciplinary
>> 3 > 0 Physics, Applied
>> 4 > 2 Physics, Nuclear
>> 4 > 2 Physics, Particles & Fields
>> 5 > 0 Chemistry, Inorganic & Nuclear
>> 6 > 2 Chemistry, Physical
>> 6 > 2 Materials Science, Multidisciplinary
>> 6 > 2 Metallurgy & Metallurgical Engineering
>>
>> This means repeating the row for each element in WC while keeping the same
>> value in TC. The goal is to check how many TC (sum) there are by WC, when WC
>> holds multiple values.
>>
>> I've tried to separate the column using strsplit but then I cannot keep
>> track of TC.
>>
>> thanks in advance.
>> --
>> João Azevedo Patrício
> Best that I've come up with, which seems to give the result desired
> from the example data given.
>
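> splitAtSemiColon <- function(input) {
>   # split each WC entry on ';' (as.character() in case WC is a factor)
>   z <- strsplit(as.character(input$WC), ';')
>   # repeat each TC once per piece so the two columns stay aligned
>   result <- data.frame(TC = rep(input$TC, sapply(z, length)), WC = unlist(z))
>   return(result)
> }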
>
> flatted.data <- splitAtSemiColon(original.data);
>
> <transcript>
>> print(original.data,right=FALSE)
> TC
> 1 0
> 2 0
> 3 2
> 4 0
> 5 2
> WC
> 1 Instruments & Instrumentation; Nuclear Science & Technology;
> Physics, Particles & Fields; Spectroscopy
> 2 Nanoscience & Nanotechnology; Materials Science, Multidisciplinary;
> Physics, Applied
> 3 Physics, Nuclear; Physics, Particles & Fields
> 4 Chemistry, Inorganic & Nuclear
> 5 Chemistry, Physical; Materials Science, Multidisciplinary;
> Metallurgy & Metallurgical Engineering
>>> print(splitAtSemiColon,right=FALSE);
> function(x) {
> z=strsplit(x$WC,';');
> result3=data.frame(TC=rep(x$TC,sapply(z,length)),WC=unlist(z));
> return(result3);
> }
>> print(splitAtSemiColon(original.data),right=FALSE);
> TC WC
> 1 0 Instruments & Instrumentation
> 2 0 Nuclear Science & Technology
> 3 0 Physics, Particles & Fields
> 4 0 Spectroscopy
> 5 0 Nanoscience & Nanotechnology
> 6 0 Materials Science, Multidisciplinary
> 7 0 Physics, Applied
> 8 2 Physics, Nuclear
> 9 2 Physics, Particles & Fields
> 10 0 Chemistry, Inorganic & Nuclear
> 11 2 Chemistry, Physical
> 12 2 Materials Science, Multidisciplinary
> 13 2 Metallurgy & Metallurgical Engineering
>
> Note that I still have a problem in that the WC data can have leading
> and/or trailing blanks due to the way that strsplit works. The easiest
> way to fix this is to use the str_trim() function from the stringr
> package.
>
>
Yes, I also have that problem. I tried to work it out using "sub" but it
didn't work at all.
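
For what it's worth, here is a minimal base-R sketch of the split plus the trimming step, followed by the sum of TC per WC that was the original goal. One possible reason "sub" appeared not to work is that sub() replaces only the first match, so a pattern covering both ends strips just one of them; gsub() replaces every match. The names splitAndTrim, flat and totals are made up for illustration, and the example rows are copied from the data in the first message.

```r
# Three rows taken from the example data in the thread.
original.data <- data.frame(
  TC = c(0, 2, 0),
  WC = c("Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy",
         "Physics, Nuclear; Physics, Particles & Fields",
         "Chemistry, Inorganic & Nuclear"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per ';'-separated WC entry.
splitAndTrim <- function(input) {
  # as.character() guards against WC having been read in as a factor
  z <- strsplit(as.character(input$WC), ";", fixed = TRUE)
  # gsub() strips blanks at BOTH ends; sub() would stop after the first match
  pieces <- gsub("^\\s+|\\s+$", "", unlist(z))
  # repeat each TC once per piece so it stays aligned with its WC entries
  data.frame(TC = rep(input$TC, sapply(z, length)),
             WC = pieces,
             stringsAsFactors = FALSE)
}

flat <- splitAndTrim(original.data)

# Sum of TC for every distinct WC category.
totals <- aggregate(TC ~ WC, data = flat, FUN = sum)
print(totals)
```

Trimming before aggregate() matters: otherwise " Physics, Particles & Fields" and "Physics, Particles & Fields" would be counted as two different categories.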
--
João Azevedo Patrício
Tel.: +31 91 400 53 63
Portugal
@ http://tripaforra.bl.ee
"Take 2 seconds to think before you act"