[R] reshape: melt and cast
stephen sefick
ssefick at gmail.com
Tue Sep 1 17:46:04 CEST 2015
You are welcome.
On Tue, Sep 1, 2015 at 10:44 AM, Matthew Pickard <
matthew.david.pickard at gmail.com> wrote:
> Yep, that works. Thanks, Stephen. I should have drawn the parallel with
> Excel Pivot tables sooner.
>
> On Tue, Sep 1, 2015 at 9:36 AM, stephen sefick <ssefick at gmail.com> wrote:
>
>> I would make this minimal. In other words, use an example data set, dput,
>> and use output of dput in a block of reproducible code. I don't understand
>> exactly what you want, but does sum work? If there is more than one record
>> for a given set of factors the sum is the sum of the counts. If only one
>> record, then the sum is the same as the original number.
>>
>> On Tue, Sep 1, 2015 at 10:00 AM, Matthew Pickard <
>> matthew.david.pickard at gmail.com> wrote:
>>
>>> Thanks, Stephen. I've looked into the fun.aggregate argument. I don't
>>> want to aggregate, so I thought leaving it blank (allowing it to default to
>>> NULL) would do that.
>>>
>>>
>>> Here's a corrected post (with further explanation):
>>>
>>> Hi,
>>>
>>> I have data that looks like this:
>>>
>>> >dput(head(ratings))
>>> structure(list(QCode = structure(c(5L, 7L, 5L, 7L, 5L, 7L), .Label =
>>> c("APPEAR",
>>> "FEAR", "FUN", "GRAT", "GUILT", "Joy", "LOVE", "UNGRAT"), class =
>>> "factor"),
>>> PID = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label = c("1123",
>>> "1136", "1137", "1142", "1146", "1147", "1148", "1149", "1152",
>>> "1153", "1154", "1156", "1158", "1161", "1164", "1179", "1182",
>>> "1183", "1191", "1196", "1197", "1198", "1199", "1200", "1201",
>>> "1203", "1205", "1207", "1208", "1209", "1214", "1216", "1219",
>>> "1220", "1222", "1223", "1224", "1225", "1226", "1229", "1236",
>>> "1237", "1238", "1240", "1241", "1243", "1245", "1246", "1248",
>>> "1254", "1255", "1256", "1257", "1260", "1262", "1264", "1268",
>>> "1270", "1272", "1278", "1279", "1280", "1282", "1283", "1287",
>>> "1288", "1292", "1293", "1297", "1310", "1311", "1315", "1329",
>>> "1332", "1333", "1343", "1346", "1347", "1352", "1354", "1355",
>>> "1356", "1360", "1368", "1369", "1370", "1378", "1398", "1400",
>>> "1403", "1404", "1411", "1412", "1420", "1421", "1423", "1424",
>>> "1426", "1428", "1432", "1433", "1435", "1436", "1438", "1439",
>>> "1440", "1441", "1443", "1444", "1446", "1447", "1448", "1449",
>>> "1450", "1453", "1454", "1456", "1459", "1460", "1461", "1462",
>>> "1463", "1468", "1471", "1475", "1478", "1481", "1482", "1487",
>>> "1488", "1490", "1493", "1495", "1497", "1503", "1504", "1508",
>>> "1509", "1511", "1513", "1514", "1515", "1522", "1524", "1525",
>>> "1526", "1527", "1528", "1529", "1532", "1534", "1536", "1538",
>>> "1539", "1540", "1543", "1550", "1551", "1552", "1554", "1555",
>>> "1556", "1558", "1559"), class = "factor"), RaterName =
>>> structure(c(1L,
>>> 1L, 1L, 1L, 1L, 1L), .Label = c("cwormhoudt", "zspeidel"), class =
>>> "factor"),
>>> SI1 = c(2L, 1L, 1L, 1L, 2L, 1L), SI2 = c(2L, 2L, 2L, 2L,
>>> 2L, 3L), SI3 = c(3L, 3L, 3L, 3L, 2L, 4L), SI4 = c(1L, 2L,
>>> 1L, 1L, 1L, 1L), SI5 = c(1L, 1L, 1L, 1L, 1L, 1L), SI6 = c(1L,
>>> 1L, 1L, 1L, 1L, 1L), SI7 = c(3L, 1L, 2L, 1L, 2L, 1L), SI8 = c(3L,
>>> 1L, 3L, 1L, 3L, 1L), SI9 = c(3L, 1L, 2L, 1L, 1L, 1L), SI10 = c(2L,
>>> 1L, 2L, 1L, 2L, 1L), SI11 = c(1L, 3L, 1L, 2L, 1L, 4L)), .Names =
>>> c("QCode",
>>> "PID", "RaterName", "SI1", "SI2", "SI3", "SI4", "SI5", "SI6",
>>> "SI7", "SI8", "SI9", "SI10", "SI11"), row.names = c(NA, 6L), class =
>>> "data.frame")
>>>
>>>
>>> > dput(tail(ratings))
>>> structure(list(QCode = structure(c(3L, 3L, 3L, 1L, 1L, 3L), .Label =
>>> c("APPEAR",
>>> "FEAR", "FUN", "GRAT", "GUILT", "Joy", "LOVE", "UNGRAT"), class =
>>> "factor"),
>>> PID = structure(c(161L, 162L, 163L, 163L, 164L, 164L), .Label =
>>> c("1123",
>>> "1136", "1137", "1142", "1146", "1147", "1148", "1149", "1152",
>>> "1153", "1154", "1156", "1158", "1161", "1164", "1179", "1182",
>>> "1183", "1191", "1196", "1197", "1198", "1199", "1200", "1201",
>>> "1203", "1205", "1207", "1208", "1209", "1214", "1216", "1219",
>>> "1220", "1222", "1223", "1224", "1225", "1226", "1229", "1236",
>>> "1237", "1238", "1240", "1241", "1243", "1245", "1246", "1248",
>>> "1254", "1255", "1256", "1257", "1260", "1262", "1264", "1268",
>>> "1270", "1272", "1278", "1279", "1280", "1282", "1283", "1287",
>>> "1288", "1292", "1293", "1297", "1310", "1311", "1315", "1329",
>>> "1332", "1333", "1343", "1346", "1347", "1352", "1354", "1355",
>>> "1356", "1360", "1368", "1369", "1370", "1378", "1398", "1400",
>>> "1403", "1404", "1411", "1412", "1420", "1421", "1423", "1424",
>>> "1426", "1428", "1432", "1433", "1435", "1436", "1438", "1439",
>>> "1440", "1441", "1443", "1444", "1446", "1447", "1448", "1449",
>>> "1450", "1453", "1454", "1456", "1459", "1460", "1461", "1462",
>>> "1463", "1468", "1471", "1475", "1478", "1481", "1482", "1487",
>>> "1488", "1490", "1493", "1495", "1497", "1503", "1504", "1508",
>>> "1509", "1511", "1513", "1514", "1515", "1522", "1524", "1525",
>>> "1526", "1527", "1528", "1529", "1532", "1534", "1536", "1538",
>>> "1539", "1540", "1543", "1550", "1551", "1552", "1554", "1555",
>>> "1556", "1558", "1559"), class = "factor"), RaterName =
>>> structure(c(2L,
>>> 2L, 2L, 2L, 2L, 2L), .Label = c("cwormhoudt", "zspeidel"), class =
>>> "factor"),
>>> SI1 = c(1L, 1L, 1L, 1L, 1L, 1L), SI2 = c(3L, 2L, 2L, 3L,
>>> 3L, 2L), SI3 = c(3L, 2L, 3L, 3L, 3L, 2L), SI4 = c(1L, 1L,
>>> 1L, 1L, 1L, 1L), SI5 = c(1L, 1L, 1L, 1L, 1L, 1L), SI6 = c(1L,
>>> 1L, 1L, 1L, 1L, 1L), SI7 = c(1L, 1L, 1L, 2L, 2L, 1L), SI8 = c(1L,
>>> 1L, 1L, 1L, 1L, 1L), SI9 = c(1L, 1L, 1L, 1L, 1L, 1L), SI10 = c(1L,
>>> 1L, 1L, 2L, 2L, 1L), SI11 = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names =
>>> c("QCode",
>>> "PID", "RaterName", "SI1", "SI2", "SI3", "SI4", "SI5", "SI6",
>>> "SI7", "SI8", "SI9", "SI10", "SI11"), row.names = 2456:2461, class =
>>> "data.frame")
>>>
>>>
>>> I am trying to use the melt and cast functions to re-arrange to have
>>> column names QCode, PID, sItem, cwormhoudt, zpeidel. Under each of the
>>> last two columns I want the values that correspond to each of RaterNames.
>>>
>>> So, I melt the data like this:
>>>
>>> mratings = melt(ratings, variable_name="sItem")
>>>
>>> Then cast the data like this:
>>>
>>> > outData = cast(mratings, QCode + PID + sItem ~ RaterName)
>>> Aggregation requires fun.aggregate: length used as default
>>>
>>> But the value columns appear to be displaying counts and not the
>>> original values.
>>>
>>> > dput(head(outData))
>>> structure(list(QCode = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label =
>>> c("APPEAR",
>>> "FEAR", "FUN", "GRAT", "GUILT", "Joy", "LOVE", "UNGRAT"), class =
>>> "factor"),
>>> PID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1123",
>>> "1136", "1137", "1142", "1146", "1147", "1148", "1149", "1152",
>>> "1153", "1154", "1156", "1158", "1161", "1164", "1179", "1182",
>>> "1183", "1191", "1196", "1197", "1198", "1199", "1200", "1201",
>>> "1203", "1205", "1207", "1208", "1209", "1214", "1216", "1219",
>>> "1220", "1222", "1223", "1224", "1225", "1226", "1229", "1236",
>>> "1237", "1238", "1240", "1241", "1243", "1245", "1246", "1248",
>>> "1254", "1255", "1256", "1257", "1260", "1262", "1264", "1268",
>>> "1270", "1272", "1278", "1279", "1280", "1282", "1283", "1287",
>>> "1288", "1292", "1293", "1297", "1310", "1311", "1315", "1329",
>>> "1332", "1333", "1343", "1346", "1347", "1352", "1354", "1355",
>>> "1356", "1360", "1368", "1369", "1370", "1378", "1398", "1400",
>>> "1403", "1404", "1411", "1412", "1420", "1421", "1423", "1424",
>>> "1426", "1428", "1432", "1433", "1435", "1436", "1438", "1439",
>>> "1440", "1441", "1443", "1444", "1446", "1447", "1448", "1449",
>>> "1450", "1453", "1454", "1456", "1459", "1460", "1461", "1462",
>>> "1463", "1468", "1471", "1475", "1478", "1481", "1482", "1487",
>>> "1488", "1490", "1493", "1495", "1497", "1503", "1504", "1508",
>>> "1509", "1511", "1513", "1514", "1515", "1522", "1524", "1525",
>>> "1526", "1527", "1528", "1529", "1532", "1534", "1536", "1538",
>>> "1539", "1540", "1543", "1550", "1551", "1552", "1554", "1555",
>>> "1556", "1558", "1559"), class = "factor"), sItem = structure(1:6,
>>> .Label = c("SI1",
>>> "SI2", "SI3", "SI4", "SI5", "SI6", "SI7", "SI8", "SI9", "SI10",
>>> "SI11"), class = "factor"), cwormhoudt = c(1L, 1L, 1L, 1L,
>>> 1L, 1L), zspeidel = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("QCode",
>>> "PID", "sItem", "cwormhoudt", "zspeidel"), row.names = c(NA,
>>> 6L), class = "data.frame")
>>>
>>> The correct output would have 3s, 4s, and 5s in the RaterName columns:
>>>
>>> > which(outData$zpeidel==3)
>>> integer(0)
>>>
>>> I don't want it to aggregate the data, I simply want it to insert the
>>> original values into the two RaterName columns. In cast, the fun.aggregate
>>> defaults to NULL, which I assumed was no function. But the output says it
>>> uses length as default.
>>>
>>> How to I prevent cast from aggregating the data according to counts? Am
>>> I doing something wrong?
>>>
>>> Thanks in advance.
>>>
>>>
>>> On Mon, Aug 31, 2015 at 11:32 PM, stephen sefick <ssefick at gmail.com>
>>> wrote:
>>>
>>>> This is very hard to read. Please use dput to provide data. I believe
>>>> the answer is in the manual. Look at the aggregation function argument.
>>>>
>>>> Please excuse my brevity; this message was sent from my telephone.
>>>> On Sep 1, 2015 12:11 AM, "Matt Pickard" <
>>>> matthew.david.pickard at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have data that looks like this:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *> head(ratings) QCode PID RaterName SI1 SI2 SI3 SI4 SI5 SI6 SI7
>>>>> SI8 SI9
>>>>> SI10 SI111 GUILT 1123 cwormhoudt 2 2 3 1 1 1 3 3 3
>>>>> 2 12 LOVE 1123 cwormhoudt 1 2 3 2 1 1 1 1 1 1
>>>>> 33 GUILT 1136 cwormhoudt 1 2 3 1 1 1 2 3 2 2 14
>>>>> LOVE 1136 cwormhoudt 1 2 3 1 1 1 1 1 1 1 25
>>>>> GUILT
>>>>> 1137 cwormhoudt 2 2 2 1 1 1 2 3 1 2 16 LOVE
>>>>> 1137
>>>>> cwormhoudt 1 3 4 1 1 1 1 1 1 1 4*
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *> tail(ratings) QCode PID RaterName SI1 SI2 SI3 SI4 SI5 SI6 SI7
>>>>> SI8
>>>>> SI9 SI10 SI112456 FUN 1555 zspeidel 1 3 3 1 1 1 1 1
>>>>> 1 1 12457 FUN 1556 zspeidel 1 2 2 1 1 1 1 1
>>>>> 1 1 12458 FUN 1558 zspeidel 1 2 3 1 1 1 1 1
>>>>> 1 1 12459 APPEAR 1558 zspeidel 1 3 3 1 1 1 2 1
>>>>> 1 2 12460 APPEAR 1559 zspeidel 1 3 3 1 1 1 2 1
>>>>> 1 2 12461 FUN 1559 zspeidel 1 2 2 1 1 1 1 1
>>>>> 1 1 1*
>>>>> I am trying to use the melt and cast functions to re-arrange it to look
>>>>> like this:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> * QCode PID sItem cwormhoudt zspeidel1 APPEAR 1123 SI1
>>>>> 1 12 APPEAR 1123 SI2 4 13 APPEAR 1123
>>>>> SI3 1 24 APPEAR 1123 SI4 3 15 APPEAR
>>>>> 1123 SI5 1 16 APPEAR 1123 SI6 1 3*
>>>>> So, I melt the data like this:
>>>>>
>>>>>
>>>>>
>>>>> *mratings = melt(ratings, variable_name="sItem")*
>>>>> Then cast the data like this:
>>>>>
>>>>>
>>>>> *> outData = cast(mratings, QCode + PID + sItem ~ RaterName)Aggregation
>>>>> requires fun.aggregate: length used as default*
>>>>>
>>>>> But the value columns appear to be displaying counts and not the
>>>>> original
>>>>> values:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *> head(outData) QCode PID sItem cwormhoudt zspeidel1 APPEAR 1123
>>>>> SI1 1 12 APPEAR 1123 SI2 1 13 APPEAR
>>>>> 1123 SI3 1 14 APPEAR 1123 SI4 1 15
>>>>> APPEAR 1123 SI5 1 16 APPEAR 1123 SI6 1
>>>>> 1> which(outData$zpeidel==3)integer(0)*
>>>>> How to I prevent cast from aggregating the data according to counts?
>>>>> Am I
>>>>> doing something wrong?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> MP
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>
>>
>>
>> --
>> Stephen Sefick
>> **************************************************
>> Auburn University
>> Biological Sciences
>> 331 Funchess Hall
>> Auburn, Alabama
>> 36849
>> **************************************************
>> sas0025 at auburn.edu
>> http://www.auburn.edu/~sas0025
>> **************************************************
>>
>> Let's not spend our time and resources thinking about things that are so
>> little or so large that all they really do for us is puff us up and make us
>> feel like gods. We are mammals, and have not exhausted the annoying little
>> problems of being mammals.
>>
>> -K. Mullis
>>
>> "A big computer, a complex algorithm and a long time does not equal
>> science."
>>
>> -Robert Gentleman
>>
>>
>
--
Stephen Sefick
**************************************************
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama
36849
**************************************************
sas0025 at auburn.edu
http://www.auburn.edu/~sas0025
**************************************************
Let's not spend our time and resources thinking about things that are so
little or so large that all they really do for us is puff us up and make us
feel like gods. We are mammals, and have not exhausted the annoying little
problems of being mammals.
-K. Mullis
"A big computer, a complex algorithm and a long time does not equal
science."
-Robert Gentleman
[[alternative HTML version deleted]]
More information about the R-help
mailing list