[R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display
Boris Steipe
boris.steipe at utoronto.ca
Thu Dec 18 16:59:57 CET 2014
"Make a table that looks like..." sounds like a use case that would benefit from some reflection.
Anyway, at least don't put your IDs *in* the "table".
# Your data
CaseID <- c('1015285',
'1005317',
'1012281',
'1015285',
'1015285',
'1007183',
'1008833',
'1015315',
'1015322',
'1015285')
Primary.Viol.Type <- c('AS.Age',
'HS.Hours',
'HS.Hours',
'HS.Hours',
'RK.Records_CL',
'OT.Overtime',
'OT.Overtime',
'OT.Overtime',
'V.Poster_Other',
'V.Poster_Other')
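# the code
uID <- unique(CaseID)             # row names: one row per case
uVT <- unique(Primary.Viol.Type)  # column names: one column per violation type
m <- matrix(NA, nrow=length(uID), ncol=length(uVT), dimnames=list(uID, uVT))
for (i in seq_along(CaseID)) {
    m[CaseID[i], Primary.Viol.Type[i]] <- 1   # mark this (case, type) pair
}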
# the result
        AS.Age HS.Hours RK.Records_CL OT.Overtime V.Poster_Other
1015285      1        1             1          NA              1
1005317     NA        1            NA          NA             NA
1012281     NA        1            NA          NA             NA
1007183     NA       NA            NA           1             NA
1008833     NA       NA            NA           1             NA
1015315     NA       NA            NA           1             NA
1015322     NA       NA            NA          NA              1
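The loop is fine at this size. For larger data, here is a minimal sketch of a vectorized alternative, restating the same CaseID and Primary.Viol.Type vectors so the snippet runs on its own. R can index a matrix that has dimnames with a two-column character matrix of (row name, column name) pairs, so every cell is set in a single assignment. The second matrix, lg (a name chosen here for illustration), is the TRUE/FALSE variant, which avoids NA and makes counting per row straightforward.

```r
# Data from the thread
CaseID <- c('1015285', '1005317', '1012281', '1015285', '1015285',
            '1007183', '1008833', '1015315', '1015322', '1015285')
Primary.Viol.Type <- c('AS.Age', 'HS.Hours', 'HS.Hours', 'HS.Hours',
                       'RK.Records_CL', 'OT.Overtime', 'OT.Overtime',
                       'OT.Overtime', 'V.Poster_Other', 'V.Poster_Other')

uID <- unique(CaseID)
uVT <- unique(Primary.Viol.Type)

# 1/NA grid: each row of cbind(...) is a (row name, column name) pair,
# so all pairs are marked in one assignment instead of a loop
m <- matrix(NA, nrow = length(uID), ncol = length(uVT),
            dimnames = list(uID, uVT))
m[cbind(CaseID, Primary.Viol.Type)] <- 1

# TRUE/FALSE variant of the same grid (lg is an illustrative name)
lg <- matrix(FALSE, nrow = length(uID), ncol = length(uVT),
             dimnames = list(uID, uVT))
lg[cbind(CaseID, Primary.Viol.Type)] <- TRUE

print(m)
rowSums(lg)   # number of violation types recorded per case
```

Duplicate (case, type) pairs are harmless here: the same cell is simply set again.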
B.
On Dec 18, 2014, at 8:09 AM, Crombie, Burnette N <bcrombie at utk.edu> wrote:
> I want to achieve a table that looks like a grid of 1's for all cases in a survey. I'm an R beginner and don't have a clue how to do all the things you just suggested. I really appreciate the time you took to explain all of those options, though. -- BNC
>
> -----Original Message-----
> From: Boris Steipe [mailto:boris.steipe at utoronto.ca]
> Sent: Thursday, December 18, 2014 5:29 AM
> To: Crombie, Burnette N
> Cc: r-help at r-project.org
> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display
>
> What you are describing sounds like a very spreadsheet-y thing.
>
> - The information is already IN your dataframe, and easy to get out by subsetting. Depending on your use case, that may actually be the "best".
>
> - If the number of CaseIDs is large, I would use a hash of lists (if the data is sparse), or a hash of named vectors if it's not sparse. Lookup in a hash is O(1) on average, so that may be the best. (Cf. package hash, and the explanations there.)
>
> - If it must be the spreadsheet-y thing, you could make a matrix with rownames and colnames taken from unique() of the respective dataframe columns. Instead of 1 and NA, I would probably use TRUE/FALSE.
>
> - If it takes less time to wait for the results than to look up how vectorized indexing works, you can write a simple loop to populate your matrix. Otherwise, a vectorized assignment is much faster (apply() on its own is not necessarily faster than a loop).
>
> - You could even use a loop to build the data structure, checking before every cbind() whether the value in column 1 already exists in the table - but that's terrible and would make a kitten die somewhere on every iteration.
>
> All of these are possible, and you haven't told us enough about what you want to achieve to figure out what the "best" is. If you choose one of the options and need help with the code, let us know.
>
> Cheers,
> B.
>
> On Dec 17, 2014, at 10:15 PM, bcrombie <bcrombie at utk.edu> wrote:
>
>> # I have a dataframe that contains 2 columns:
>> CaseID <- c('1015285',
>> '1005317',
>> '1012281',
>> '1015285',
>> '1015285',
>> '1007183',
>> '1008833',
>> '1015315',
>> '1015322',
>> '1015285')
>>
>> Primary.Viol.Type <- c('AS.Age',
>> 'HS.Hours',
>> 'HS.Hours',
>> 'HS.Hours',
>> 'RK.Records_CL',
>> 'OT.Overtime',
>> 'OT.Overtime',
>> 'OT.Overtime',
>> 'V.Poster_Other',
>> 'V.Poster_Other')
>>
>> PViol.Type.Per.Case.Original <- data.frame(CaseID,Primary.Viol.Type)
>>
>> # CaseIDs can be repeated because there can be up to 14
>> # Primary.Viol.Types per CaseID.
>>
>> # I want to transform this dataframe into one that has 15 columns,
>> # where the first column is CaseID and the rest are the 14 primary
>> # violation types. The CaseID column will contain the unique
>> # CaseIDs (no duplicates), and each row will have a "1" under every
>> # column corresponding to a primary violation type recorded for that
>> # CaseID. So, technically, there could be zero to 14 "1"s in a CaseID's row.
>>
>> # For example, the row for CaseID '1015285' above would have a "1"
>> # under "AS.Age", "HS.Hours", "RK.Records_CL", and "V.Poster_Other", but "NA"
>> # under the rest of the columns.
>>
>> PViol.Type <- c("CaseID",
>> "BW.BackWages",
>> "LD.Liquid_Damages",
>> "MW.Minimum_Wage",
>> "OT.Overtime",
>> "RK.Records_FLSA",
>> "V.Poster_Other",
>> "AS.Age",
>> "BW.WHMIS_BackWages",
>> "HS.Hours",
>> "OA.HazOccupationAg",
>> "ON.HazOccupationNonAg",
>> "R3.Reg3AgeOccupation",
>> "RK.Records_CL",
>> "V.Other")
>>
>> PViol.Type.Columns <- t(data.frame(PViol.Type))
>>
>> # What is the best way to do this in R?
>
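For the question as originally posed (CaseID as the first column, then all 14 violation types, with 1 or NA), a minimal base-R sketch is below. It assumes every recorded type appears in the PViol.Type list from the post; all.types, counts, grid and viol.by.case are names chosen here for illustration. The last two lines sketch the hash option from the thread with a plain R environment (list2env() with hash = TRUE) instead of the hash package, so treat that part as a substitute technique, not that package's API.

```r
# Data from the original post
CaseID <- c('1015285', '1005317', '1012281', '1015285', '1015285',
            '1007183', '1008833', '1015315', '1015322', '1015285')
Primary.Viol.Type <- c('AS.Age', 'HS.Hours', 'HS.Hours', 'HS.Hours',
                       'RK.Records_CL', 'OT.Overtime', 'OT.Overtime',
                       'OT.Overtime', 'V.Poster_Other', 'V.Poster_Other')

# The 14 violation types from the post (PViol.Type without "CaseID")
all.types <- c("BW.BackWages", "LD.Liquid_Damages", "MW.Minimum_Wage",
               "OT.Overtime", "RK.Records_FLSA", "V.Poster_Other", "AS.Age",
               "BW.WHMIS_BackWages", "HS.Hours", "OA.HazOccupationAg",
               "ON.HazOccupationNonAg", "R3.Reg3AgeOccupation",
               "RK.Records_CL", "V.Other")

# Count (CaseID, type) pairs. Explicit factor levels make all 14 types
# appear as columns, including types that never occur in the data.
counts <- unclass(table(factor(CaseID, levels = unique(CaseID)),
                        factor(Primary.Viol.Type, levels = all.types)))

# 1 where a type was recorded for a case, NA elsewhere
grid <- ifelse(counts > 0, 1, NA)

# 15-column data frame: CaseID first, then one column per violation type
PViol.Type.Per.Case <- data.frame(CaseID = rownames(grid), grid,
                                  row.names = NULL)

# Hash-style alternative: one vector of types per CaseID, stored in a
# hashed environment and looked up by CaseID
viol.by.case <- list2env(split(Primary.Viol.Type, CaseID), hash = TRUE)
viol.by.case[["1015285"]]
```

A value of Primary.Viol.Type that is not in all.types becomes NA in the factor and is silently left out of the table, so it may be worth checking setdiff(Primary.Viol.Type, all.types) first.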