[R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display
Sven E. Templer
sven.templer at gmail.com
Fri Dec 19 10:13:55 CET 2014
Another solution:
CaseID <- c("1015285", "1005317", "1012281", "1015285", "1015285", "1007183",
"1008833", "1015315", "1015322", "1015285")
Primary.Viol.Type <- c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours",
"RK.Records_CL",
"OT.Overtime", "OT.Overtime", "OT.Overtime", "V.Poster_Other",
"V.Poster_Other")
library(reshape2)
dcast(data.frame(CaseID, Primary.Viol.Type), CaseID~Primary.Viol.Type, length)
# result:
Using Primary.Viol.Type as value column: use value.var to override.
   CaseID AS.Age HS.Hours OT.Overtime RK.Records_CL V.Poster_Other
1 1005317      0        1           0             0              0
2 1007183      0        0           1             0              0
3 1008833      0        0           1             0              0
4 1012281      0        1           0             0              0
5 1015285      1        1           0             1              1
6 1015315      0        0           1             0              0
7 1015322      0        0           0             0              1
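
If reshape2 is not at hand, much the same table can be sketched in base R, with 1/NA indicators instead of counts as asked for in the original question. This is only a sketch: it assumes the example vectors from above and the 14 violation types listed further down in the thread, and the names all.types, counts and wide are made up for the illustration.

```r
# Same example data as above
CaseID <- c("1015285", "1005317", "1012281", "1015285", "1015285", "1007183",
            "1008833", "1015315", "1015322", "1015285")
Primary.Viol.Type <- c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours",
                       "RK.Records_CL", "OT.Overtime", "OT.Overtime",
                       "OT.Overtime", "V.Poster_Other", "V.Poster_Other")

# All 14 violation types (hypothetical name; "CaseID" is deliberately left out)
all.types <- c("BW.BackWages", "LD.Liquid_Damages", "MW.Minimum_Wage",
               "OT.Overtime", "RK.Records_FLSA", "V.Poster_Other", "AS.Age",
               "BW.WHMIS_BackWages", "HS.Hours", "OA.HazOccupationAg",
               "ON.HazOccupationNonAg", "R3.Reg3AgeOccupation",
               "RK.Records_CL", "V.Other")

# Cross-tabulate; fixing the factor levels keeps types that never occur
counts <- table(CaseID, factor(Primary.Viol.Type, levels = all.types))

# Turn the counts into a wide data frame, then into 1/NA indicators
wide <- as.data.frame.matrix(counts)
wide[] <- lapply(wide, function(x) ifelse(x > 0, 1, NA))

# Move the case IDs from the row names into a proper first column
wide <- data.frame(CaseID = rownames(wide), wide, row.names = NULL)
```

Declaring every level up front is what keeps the unused violation types as all-NA columns, so the result always has the same 15 columns whatever the input contains.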
best, s.
On 19 December 2014 at 06:35, Chel Hee Lee <chl948 at mail.usask.ca> wrote:
> Please take another look at my code. The error message says that the object
> 'Primary.Viol.Type' was not found. Did you ever create an object named
> 'Primary.Viol.Type'? The code will work if you replace 'Primary.Viol.Type'
> with 'PViol.Type.Per.Case.Original$Primary.Viol.Type' in the call to
> 'factor()'. I hope this helps.
>
> Chel Hee Lee
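
To illustrate the point about the missing object: a column of a data frame is not visible as a free-standing variable, so it has to be referenced through the data frame. A minimal sketch, assuming a small made-up data frame df and level vector lv in place of the imported data:

```r
# Small stand-in for the imported data (hypothetical values)
df <- data.frame(CaseID = c("1005317", "1007183"),
                 Primary.Viol.Type = c("HS.Hours", "OT.Overtime"),
                 stringsAsFactors = FALSE)
lv <- c("OT.Overtime", "HS.Hours", "V.Other")

# Calling factor(Primary.Viol.Type, levels = lv) fails with "object not found"
# unless a separate variable of that name happens to exist outside df.
# Referring to the column through the data frame works:
df$Primary.Viol.Type <- factor(df$Primary.Viol.Type, levels = lv)

levels(df$Primary.Viol.Type)
```

The example posted to the list only ran because Primary.Viol.Type had also been created as a stand-alone vector before the data frame was built.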
>
> On 12/18/2014 08:57 PM, Crombie, Burnette N wrote:
>>
>> Chel, your solution is fantastic on the dataset I submitted in my question
>> but it is not working when I import my real dataset into R. Do I need to
>> vectorize the columns in my real dataset after importing? I tried a few
>> things (###) but not making progress:
>>
>> MERGE_PViol.Detail.Per.Case <-
>> read.csv("~/FOIA_FLSA/MERGE_PViol.Detail.Per.Case_for_rtf10.csv",
>> stringsAsFactors=TRUE)
>>
>> ### select only certain columns
>> PViol.Type.Per.Case.Original <- MERGE_PViol.Detail.Per.Case[,c("CaseID",
>> "Primary.Viol.Type")]
>>
>> ### write.csv(PViol.Type.Per.Case,file="PViol.Type.Per.Case.Select.csv")
>> ### PViol.Type.Per.Case.Original <-
>> read.csv("~/FOIA_FLSA/PViol.Type.Per.Case.Select.csv")
>> ### PViol.Type.Per.Case.Original$X <- NULL
>> ###PViol.Type.Per.Case.Original[] <- lapply(PViol.Type.Per.Case.Original,
>> as.character)
>>
>> PViol.Type <- c("CaseID",
>> "BW.BackWages",
>> "LD.Liquid_Damages",
>> "MW.Minimum_Wage",
>> "OT.Overtime",
>> "RK.Records_FLSA",
>> "V.Poster_Other",
>> "AS.Age",
>> "BW.WHMIS_BackWages",
>> "HS.Hours",
>> "OA.HazOccupationAg",
>> "ON.HazOccupationNonAg",
>> "R3.Reg3AgeOccupation",
>> "RK.Records_CL",
>> "V.Other")
>>
>> PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>> factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>>
>> ### Error in factor(Primary.Viol.Type, levels = PViol.Type, labels =
>> PViol.Type) : object 'Primary.Viol.Type' not found
>>
>> tmp <-
>> split(PViol.Type.Per.Case.Original,PViol.Type.Per.Case.Original$CaseID)
>> ans <- ifelse(do.call(rbind, lapply(tmp,
>> function(x)table(x$Primary.Viol.Type))), 1, NA)
>>
>>
>>
>> -----Original Message-----
>> From: Crombie, Burnette N
>> Sent: Thursday, December 18, 2014 3:01 PM
>> To: 'Chel Hee Lee'
>> Subject: RE: [R] Make 2nd col of 2-col df into header row of same df then
>> adjust col1 data display
>>
>> Thanks for taking the time to review this, Chel. I've got to step away
>> from my desk, but will reply more substantially as soon as possible. -- BNC
>>
>> -----Original Message-----
>> From: Chel Hee Lee [mailto:chl948 at mail.usask.ca]
>> Sent: Thursday, December 18, 2014 2:43 PM
>> To: Jeff Newmiller; Crombie, Burnette N
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df then
>> adjust col1 data display
>>
>> I like the approach presented by Jeff Newmiller in the previous post (I
>> really like his way). As he suggested, it would be good to start with
>> 'factor' since you know all the possible values of 'Primary.Viol.Type'.
>> You could also try the 'split()' function to create the table that you
>> wish to build. Please see below (I hope this helps):
>>
>> > PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>>     factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>> > tmp <- split(PViol.Type.Per.Case.Original,
>>     PViol.Type.Per.Case.Original$CaseID)
>> > ans <- ifelse(do.call(rbind, lapply(tmp, function(x)
>>     table(x$Primary.Viol.Type))), 1, NA)
>> > ans
>>         CaseID BW.BackWages LD.Liquid_Damages MW.Minimum_Wage OT.Overtime
>> 1005317     NA           NA                NA              NA          NA
>> 1007183     NA           NA                NA              NA           1
>> 1008833     NA           NA                NA              NA           1
>> 1012281     NA           NA                NA              NA          NA
>> 1015285     NA           NA                NA              NA          NA
>> 1015315     NA           NA                NA              NA           1
>> 1015322     NA           NA                NA              NA          NA
>>         RK.Records_FLSA V.Poster_Other AS.Age BW.WHMIS_BackWages HS.Hours
>> 1005317              NA             NA     NA                 NA        1
>> 1007183              NA             NA     NA                 NA       NA
>> 1008833              NA             NA     NA                 NA       NA
>> 1012281              NA             NA     NA                 NA        1
>> 1015285              NA              1      1                 NA        1
>> 1015315              NA             NA     NA                 NA       NA
>> 1015322              NA              1     NA                 NA       NA
>>         OA.HazOccupationAg ON.HazOccupationNonAg R3.Reg3AgeOccupation
>> 1005317                 NA                    NA                   NA
>> 1007183                 NA                    NA                   NA
>> 1008833                 NA                    NA                   NA
>> 1012281                 NA                    NA                   NA
>> 1015285                 NA                    NA                   NA
>> 1015315                 NA                    NA                   NA
>> 1015322                 NA                    NA                   NA
>>         RK.Records_CL V.Other
>> 1005317            NA      NA
>> 1007183            NA      NA
>> 1008833            NA      NA
>> 1012281            NA      NA
>> 1015285             1      NA
>> 1015315            NA      NA
>> 1015322            NA      NA
>> >
>>
>> Chel Hee Lee
>>
>> On 12/18/2014 10:02 AM, Jeff Newmiller wrote:
>>>
>>> No guarantees on "best"... but one way using base R could be:
>>>
>>> # Note that "CaseID" is actually not a valid PViol.Type as you had it
>>> PViol.Type <- c( "BW.BackWages"
>>> , "LD.Liquid_Damages"
>>> , "MW.Minimum_Wage"
>>> , "OT.Overtime"
>>> , "RK.Records_FLSA"
>>> , "V.Poster_Other"
>>> , "AS.Age"
>>> , "BW.WHMIS_BackWages"
>>> , "HS.Hours"
>>> , "OA.HazOccupationAg"
>>> , "ON.HazOccupationNonAg"
>>> , "R3.Reg3AgeOccupation"
>>> , "RK.Records_CL"
>>> , "V.Other" )
>>>
>>> # explicitly specifying all levels to the factor ensures a complete
>>> # set of column outputs regardless of what is in the input
>>> PViol.Type.Per.Case.Original <-
>>> data.frame( CaseID
>>> , Primary.Viol.Type=factor( Primary.Viol.Type
>>> , levels=PViol.Type ) )
>>>
>>> tmp <- table( PViol.Type.Per.Case.Original )
>>> ans <- data.frame( CaseID=rownames( tmp )
>>> , as.data.frame( ifelse( 0==tmp, NA, 1 ) )
>>> )
>>>
>>>
>>> On Wed, 17 Dec 2014, bcrombie wrote:
>>>
>>>> # I have a dataframe that contains 2 columns:
>>>> CaseID <- c('1015285',
>>>> '1005317',
>>>> '1012281',
>>>> '1015285',
>>>> '1015285',
>>>> '1007183',
>>>> '1008833',
>>>> '1015315',
>>>> '1015322',
>>>> '1015285')
>>>>
>>>> Primary.Viol.Type <- c('AS.Age',
>>>> 'HS.Hours',
>>>> 'HS.Hours',
>>>> 'HS.Hours',
>>>> 'RK.Records_CL',
>>>> 'OT.Overtime',
>>>> 'OT.Overtime',
>>>> 'OT.Overtime',
>>>> 'V.Poster_Other',
>>>> 'V.Poster_Other')
>>>>
>>>> PViol.Type.Per.Case.Original <- data.frame(CaseID,Primary.Viol.Type)
>>>>
>>>> # CaseID's can be repeated because there can be up to 14
>>>> Primary.Viol.Type's per CaseID.
>>>>
>>>> # I want to transform this dataframe into one that has 15 columns,
>>>> where the first column is CaseID, and the rest are the 14 primary
>>>> viol. types. The CaseID column will contain a list of the unique
>>>> CaseID's (no replicates) and for each of their rows, there will be a
>>>> '1' under a column corresponding to a primary violation type recorded
>>>> for that CaseID. So, technically, there could be zero to 14 '1's in a
>>>> CaseID's row.
>>>>
>>>> # For example, the row for CaseID '1015285' above would have a '1'
>>>> under 'AS.Age', 'HS.Hours', 'RK.Records_CL', and 'V.Poster_Other',
>>>> but have "NA" under the rest of the columns.
>>>>
>>>> PViol.Type <- c("CaseID",
>>>> "BW.BackWages",
>>>> "LD.Liquid_Damages",
>>>> "MW.Minimum_Wage",
>>>> "OT.Overtime",
>>>> "RK.Records_FLSA",
>>>> "V.Poster_Other",
>>>> "AS.Age",
>>>> "BW.WHMIS_BackWages",
>>>> "HS.Hours",
>>>> "OA.HazOccupationAg",
>>>> "ON.HazOccupationNonAg",
>>>> "R3.Reg3AgeOccupation",
>>>> "RK.Records_CL",
>>>> "V.Other")
>>>>
>>>> PViol.Type.Columns <- t(data.frame(PViol.Type))
>>>>
>>>> # What is the best way to do this in R?
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://r.789695.n4.nabble.com/Make-2nd-col-of-2-col-df-into-header-row-of-same-df-then-adjust-col1-data-display-tp4700878.html
>>>>
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>> ---------------------------------------------------------------------------
>>> Jeff Newmiller
>>> DCN:<jdnewmil at dcn.davis.ca.us>
>>> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
>>>
>
>