[R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display

Sven E. Templer sven.templer at gmail.com
Fri Dec 19 10:13:55 CET 2014


Another solution:

CaseID <- c("1015285", "1005317", "1012281", "1015285", "1015285", "1007183",
"1008833", "1015315", "1015322", "1015285")
Primary.Viol.Type <- c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours",
"RK.Records_CL",
"OT.Overtime", "OT.Overtime", "OT.Overtime", "V.Poster_Other",
"V.Poster_Other")

library(reshape2)
dcast(data.frame(CaseID, Primary.Viol.Type), CaseID~Primary.Viol.Type, length)

# result:

Using Primary.Viol.Type as value column: use value.var to override.
   CaseID AS.Age HS.Hours OT.Overtime RK.Records_CL V.Poster_Other
1 1005317      0        1           0             0              0
2 1007183      0        0           1             0              0
3 1008833      0        0           1             0              0
4 1012281      0        1           0             0              0
5 1015285      1        1           0             1              1
6 1015315      0        0           1             0              0
7 1015322      0        0           0             0              1
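
If the layout from the original question is wanted (a 1 where a violation
type was recorded, NA elsewhere, and all 14 type columns present even when
unused), a base R sketch along the lines of the table() approach quoted
below might look like this. The names 'counts' and 'wide' are only
illustrative, and the level list is the one from the thread with "CaseID"
left out:

```r
CaseID <- c("1015285", "1005317", "1012281", "1015285", "1015285",
            "1007183", "1008833", "1015315", "1015322", "1015285")
Primary.Viol.Type <- c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours",
                       "RK.Records_CL", "OT.Overtime", "OT.Overtime",
                       "OT.Overtime", "V.Poster_Other", "V.Poster_Other")
PViol.Type <- c("BW.BackWages", "LD.Liquid_Damages", "MW.Minimum_Wage",
                "OT.Overtime", "RK.Records_FLSA", "V.Poster_Other",
                "AS.Age", "BW.WHMIS_BackWages", "HS.Hours",
                "OA.HazOccupationAg", "ON.HazOccupationNonAg",
                "R3.Reg3AgeOccupation", "RK.Records_CL", "V.Other")

# Cross-tabulate; fixing the factor levels keeps all 14 type columns
counts <- table(CaseID, factor(Primary.Viol.Type, levels = PViol.Type))

# as.data.frame.matrix() keeps the wide layout of the table
wide <- as.data.frame.matrix(counts)

# Turn counts into presence flags: 0 becomes NA, anything else becomes 1
wide[wide == 0] <- NA
wide[!is.na(wide)] <- 1

# Move the row names into a regular CaseID column (illustrative choice)
wide <- data.frame(CaseID = rownames(wide), wide, row.names = NULL)
```

Here 'wide' should have one row per unique CaseID and 15 columns.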


best, s.

On 19 December 2014 at 06:35, Chel Hee Lee <chl948 at mail.usask.ca> wrote:
> Please take a look at my code again.  The error message says that the object
> 'Primary.Viol.Type' was not found.  Have you created an object named
> 'Primary.Viol.Type'?  It will work if you replace 'Primary.Viol.Type' with
> 'PViol.Type.Per.Case.Original$Primary.Viol.Type' in the call to 'factor()'.
> I hope this helps.
>
> Chel Hee Lee
>
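
A minimal sketch of that fix, assuming a small stand-in data frame instead
of the imported CSV and a shortened level list (both are illustrative
only). The point is that the column is reached through the data frame with
`$` rather than as a free-standing object:

```r
# Stand-in for the imported data; the real data would come from read.csv()
PViol.Type.Per.Case.Original <- data.frame(
  CaseID = c("1015285", "1005317", "1015285"),
  Primary.Viol.Type = c("AS.Age", "HS.Hours", "HS.Hours"),
  stringsAsFactors = FALSE
)

# Shortened, hypothetical level list (no "CaseID" entry)
PViol.Type <- c("AS.Age", "HS.Hours", "OT.Overtime")

# The column lives inside the data frame, so refer to it with `$`
PViol.Type.Per.Case.Original$Primary.Viol.Type <-
  factor(PViol.Type.Per.Case.Original$Primary.Viol.Type, levels = PViol.Type)

# Same split/table steps as in the quoted code
tmp <- split(PViol.Type.Per.Case.Original,
             PViol.Type.Per.Case.Original$CaseID)
ans <- ifelse(do.call(rbind, lapply(tmp, function(x)
  table(x$Primary.Viol.Type))), 1, NA)
```
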
> On 12/18/2014 08:57 PM, Crombie, Burnette N wrote:
>>
>> Chel, your solution is fantastic on the dataset I submitted in my question
>> but it is not working when I import my real dataset into R.  Do I need to
>> vectorize the columns in my real dataset after importing?  I tried a few
>> things (###) but not making progress:
>>
>> MERGE_PViol.Detail.Per.Case <-
>> read.csv("~/FOIA_FLSA/MERGE_PViol.Detail.Per.Case_for_rtf10.csv",
>> stringsAsFactors=TRUE)
>>
>> ### select only certain columns
>> PViol.Type.Per.Case.Original <- MERGE_PViol.Detail.Per.Case[,c("CaseID",
>> "Primary.Viol.Type")]
>>
>> ### write.csv(PViol.Type.Per.Case,file="PViol.Type.Per.Case.Select.csv")
>> ### PViol.Type.Per.Case.Original <-
>> ###   read.csv("~/FOIA_FLSA/PViol.Type.Per.Case.Select.csv")
>> ### PViol.Type.Per.Case.Original$X <- NULL
>> ### PViol.Type.Per.Case.Original[] <- lapply(PViol.Type.Per.Case.Original,
>> ###   as.character)
>>
>> PViol.Type <- c("CaseID",
>>                  "BW.BackWages",
>>                  "LD.Liquid_Damages",
>>                  "MW.Minimum_Wage",
>>                  "OT.Overtime",
>>                  "RK.Records_FLSA",
>>                  "V.Poster_Other",
>>                  "AS.Age",
>>                  "BW.WHMIS_BackWages",
>>                  "HS.Hours",
>>                  "OA.HazOccupationAg",
>>                  "ON.HazOccupationNonAg",
>>                  "R3.Reg3AgeOccupation",
>>                  "RK.Records_CL",
>>                  "V.Other")
>>
>> PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>> factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>>
>> ### Error in factor(Primary.Viol.Type, levels = PViol.Type, labels =
>> ### PViol.Type) :  object 'Primary.Viol.Type' not found
>>
>> tmp <-
>> split(PViol.Type.Per.Case.Original,PViol.Type.Per.Case.Original$CaseID)
>> ans <- ifelse(do.call(rbind, lapply(tmp,
>> function(x)table(x$Primary.Viol.Type))), 1, NA)
>>
>>
>>
>> -----Original Message-----
>> From: Crombie, Burnette N
>> Sent: Thursday, December 18, 2014 3:01 PM
>> To: 'Chel Hee Lee'
>> Subject: RE: [R] Make 2nd col of 2-col df into header row of same df then
>> adjust col1 data display
>>
>> Thanks for taking the time to review this, Chel.  I've got to step away
>> from my desk, but will reply more substantially as soon as possible. -- BNC
>>
>> -----Original Message-----
>> From: Chel Hee Lee [mailto:chl948 at mail.usask.ca]
>> Sent: Thursday, December 18, 2014 2:43 PM
>> To: Jeff Newmiller; Crombie, Burnette N
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df then
>> adjust col1 data display
>>
>> I like the approach presented by Jeff Newmiller in the previous post
>> (I really like his way).  As he suggested, it would be good to start
>> with 'factor' since you have all values of 'Primary.Viol.Type'.
>> You may try the 'split()' function to create the table that you wish to
>> build.  Please see the code below (I hope this helps):
>>
>>   > PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>>   +   factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>>   > tmp <- split(PViol.Type.Per.Case.Original,
>>   +              PViol.Type.Per.Case.Original$CaseID)
>>   > ans <- ifelse(do.call(rbind, lapply(tmp, function(x)
>>   +   table(x$Primary.Viol.Type))), 1, NA)
>>   > ans
>>           CaseID BW.BackWages LD.Liquid_Damages MW.Minimum_Wage OT.Overtime
>> 1005317     NA           NA                NA              NA          NA
>> 1007183     NA           NA                NA              NA           1
>> 1008833     NA           NA                NA              NA           1
>> 1012281     NA           NA                NA              NA          NA
>> 1015285     NA           NA                NA              NA          NA
>> 1015315     NA           NA                NA              NA           1
>> 1015322     NA           NA                NA              NA          NA
>>           RK.Records_FLSA V.Poster_Other AS.Age BW.WHMIS_BackWages HS.Hours
>> 1005317              NA             NA     NA                 NA        1
>> 1007183              NA             NA     NA                 NA       NA
>> 1008833              NA             NA     NA                 NA       NA
>> 1012281              NA             NA     NA                 NA        1
>> 1015285              NA              1      1                 NA        1
>> 1015315              NA             NA     NA                 NA       NA
>> 1015322              NA              1     NA                 NA       NA
>>           OA.HazOccupationAg ON.HazOccupationNonAg R3.Reg3AgeOccupation
>> 1005317                 NA                    NA                   NA
>> 1007183                 NA                    NA                   NA
>> 1008833                 NA                    NA                   NA
>> 1012281                 NA                    NA                   NA
>> 1015285                 NA                    NA                   NA
>> 1015315                 NA                    NA                   NA
>> 1015322                 NA                    NA                   NA
>>           RK.Records_CL V.Other
>> 1005317            NA      NA
>> 1007183            NA      NA
>> 1008833            NA      NA
>> 1012281            NA      NA
>> 1015285             1      NA
>> 1015315            NA      NA
>> 1015322            NA      NA
>>   >
>>
>> Chel Hee Lee
>>
>> On 12/18/2014 10:02 AM, Jeff Newmiller wrote:
>>>
>>> No guarantees on "best"... but one way using base R could be:
>>>
>>> # Note that "CaseID" is actually not a valid PViol.Type as you had it
>>> PViol.Type <- c( "BW.BackWages"
>>>                  , "LD.Liquid_Damages"
>>>                  , "MW.Minimum_Wage"
>>>                  , "OT.Overtime"
>>>                  , "RK.Records_FLSA"
>>>                  , "V.Poster_Other"
>>>                  , "AS.Age"
>>>                  , "BW.WHMIS_BackWages"
>>>                  , "HS.Hours"
>>>                  , "OA.HazOccupationAg"
>>>                  , "ON.HazOccupationNonAg"
>>>                  , "R3.Reg3AgeOccupation"
>>>                  , "RK.Records_CL"
>>>                  , "V.Other" )
>>>
>>> # explicitly specifying all levels to the factor ensures a complete
>>> # set of column outputs regardless of what is in the input
>>> PViol.Type.Per.Case.Original <-
>>>       data.frame( CaseID
>>>                 , Primary.Viol.Type=factor( Primary.Viol.Type
>>>                                           , levels=PViol.Type ) )
>>>
>>> tmp <- table( PViol.Type.Per.Case.Original )
>>> ans <- data.frame( CaseID=rownames( tmp )
>>>                    , as.data.frame( ifelse( 0==tmp, NA, 1 ) )
>>>                    )
>>>
>>>
>>> On Wed, 17 Dec 2014, bcrombie wrote:
>>>
>>>> # I have a dataframe that contains 2 columns:
>>>> CaseID  <- c('1015285',
>>>> '1005317',
>>>> '1012281',
>>>> '1015285',
>>>> '1015285',
>>>> '1007183',
>>>> '1008833',
>>>> '1015315',
>>>> '1015322',
>>>> '1015285')
>>>>
>>>> Primary.Viol.Type <- c('AS.Age',
>>>> 'HS.Hours',
>>>> 'HS.Hours',
>>>> 'HS.Hours',
>>>> 'RK.Records_CL',
>>>> 'OT.Overtime',
>>>> 'OT.Overtime',
>>>> 'OT.Overtime',
>>>> 'V.Poster_Other',
>>>> 'V.Poster_Other')
>>>>
>>>> PViol.Type.Per.Case.Original <- data.frame(CaseID,Primary.Viol.Type)
>>>>
>>>> # CaseIDs can be repeated because there can be up to 14
>>>> # Primary.Viol.Types per CaseID.
>>>>
>>>> # I want to transform this dataframe into one that has 15 columns,
>>>> # where the first column is CaseID, and the rest are the 14 primary
>>>> # viol. types.  The CaseID column will contain a list of the unique
>>>> # CaseIDs (no replicates), and for each of their rows there will be
>>>> # a '1' under each column corresponding to a primary violation type
>>>> # recorded for that CaseID.  So, technically, there could be zero to
>>>> # 14 '1's in a CaseID's row.
>>>>
>>>> # For example, the row for CaseID '1015285' above would have a '1'
>>>> # under 'AS.Age', 'HS.Hours', 'RK.Records_CL', and 'V.Poster_Other',
>>>> # but have "NA" under the rest of the columns.
>>>>
>>>> PViol.Type <- c("CaseID",
>>>>                 "BW.BackWages",
>>>>            "LD.Liquid_Damages",
>>>>            "MW.Minimum_Wage",
>>>>            "OT.Overtime",
>>>>            "RK.Records_FLSA",
>>>>            "V.Poster_Other",
>>>>            "AS.Age",
>>>>            "BW.WHMIS_BackWages",
>>>>            "HS.Hours",
>>>>            "OA.HazOccupationAg",
>>>>            "ON.HazOccupationNonAg",
>>>>            "R3.Reg3AgeOccupation",
>>>>            "RK.Records_CL",
>>>>            "V.Other")
>>>>
>>>> PViol.Type.Columns <- t(data.frame(PViol.Type))
>>>>
>>>> # What is the best way to do this in R?
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://r.789695.n4.nabble.com/Make-2nd-col-of-2-col-df-into-header-ro
>>>> w-of-same-df-then-adjust-col1-data-display-tp4700878.html
>>>>
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>> ---------------------------------------------------------------------------
>>> Jeff Newmiller                        The     .....       .....  Go
>>> Live...
>>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live
>>> Go...
>>>                                         Live:   OO#.. Dead: OO#..
>>> Playing
>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>>> /Software/Embedded Controllers)               .OO#.       .OO#.
>>> rocks...1k
>>>
