[R] what is wrong with this dataset?
Carl Witthoft
carl at witthoft.com
Thu Nov 24 03:08:11 CET 2011
As the Kroger Data Munger Guru would say, "What is the problem you
are trying to solve?"
The datasets look just fine from a structural point of view. What do you
want to do and what is wrong with the results you get?
<quote>
From: Kaiyin Zhong <kindlychung_at_gmail.com>
Date: Thu, 24 Nov 2011 09:39:20 +0800
> d = data.frame(gender=rep(c('f','m'), 5), pos=rep(c('worker', 'manager',
'speaker', 'sales', 'investor'), 2), lot1=rnorm(10), lot2=rnorm(10))
> d
   gender      pos       lot1       lot2
1       f   worker  1.1035316  0.8710510
2       m  manager -0.4824027 -0.2595865
3       f  speaker  0.8933589 -0.5966119
4       m    sales  0.4489920  0.4971199
5       f investor  0.9246900 -0.7531117
6       m   worker  0.2777642 -0.3338369
7       f  manager -1.0890828  0.7073686
8       m  speaker -1.3045821  0.4373199
9       f    sales  0.3092965 -2.6441382
10      m investor -0.5770073 -1.5200347
> cast(melt(d))
Using gender, pos as id variables
   gender      pos       lot1       lot2
1       f investor  0.9246900 -0.7531117
2       f  manager -1.0890828  0.7073686
3       f    sales  0.3092965 -2.6441382
4       f  speaker  0.8933589 -0.5966119
5       f   worker  1.1035316  0.8710510
6       m investor -0.5770073 -1.5200347
7       m  manager -0.4824027 -0.2595865
8       m    sales  0.4489920  0.4971199
9       m  speaker -1.3045821  0.4373199
10      m   worker  0.2777642 -0.3338369
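The first cast() call needed no aggregation because each gender/pos pair occurs exactly once in d: rep() recycles a length-2 and a length-5 vector over 10 rows, and since 10 is the least common multiple of 2 and 5, all ten combinations are distinct. A small base-R check of that, sketched without the reshape package (lot1 and lot2 are left out because only the id columns decide whether cast() has to aggregate):

```r
# Re-create only the id columns of the posted data frame d
d <- data.frame(gender = rep(c("f", "m"), 5),
                pos = rep(c("worker", "manager", "speaker", "sales", "investor"), 2))

# anyDuplicated() gives the index of the first repeated row, or 0 if none
dup <- anyDuplicated(d[, c("gender", "pos")])
print(dup)

# Cross-tabulate the pairs: every cell should be exactly 1
tab <- table(d$gender, d$pos)
print(tab)
```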
> dataset = read.csv('datalist.csv')
> dataset
   Gender     Title Category Salary
1       M   Manager        3  27000
2       F   Manager        2  22500
3       M Sales Rep        1  18000
4       M Sales Rep        3  27000
5       F   Manager        3  27000
6       M Secretary        4  31500
7       M Sales Rep        2  22500
8       M Secretary        2  22500
9       M    Worker        4  40500
10      M   Manager        4  37100
11      F Secretary        2  22500
12      F   Manager        3  27000
13      M    Worker        2  20000
14      M   Manager        4  32000
15      F Sales Rep        2  22900
16      M Sales Rep        3  27000
17      F Sales Rep        2  22500
18      M   Manager        1  18000
19      M Secretary        3  27000
20      F Sales Rep        3  27000
21      M Secretary        4  31500
22      M    Worker        2  22500
23      M   Manager        2  22500
24      M    Worker        4  40500
25      M    Worker        4  37100
26      F Secretary        2  22500
27      F   Manager        3  27000
28      M    Worker        2  20000
29      M   Manager        4  32000
30      F Sales Rep        2  22900
> cast(melt(dataset))
Using Gender, Title as id variables
Aggregation requires fun.aggregate: length used as default
  Gender     Title Category Salary
1      F   Manager        4      4
2      F Sales Rep        4      4
3      F Secretary        2      2
4      M   Manager        6      6
5      M Sales Rep        4      4
6      M Secretary        4      4
7      M    Worker        6      6
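Here the Gender/Title pairs are not unique: 30 rows collapse into 7 pairs, so each cell of the cast holds several Category and Salary values and cast() falls back to length, as its message says. The numbers in that table are therefore row counts per pair, not summaries of the values. The sketch below reproduces those counts in base R and then averages instead; it assumes datalist.csv holds exactly the 30 rows printed above. With reshape loaded, cast(melt(dataset), fun.aggregate = mean) should be the analogous call.

```r
# Rebuild the posted dataset from its printed rows (assumed to match datalist.csv)
dataset <- read.csv(text = "Gender,Title,Category,Salary
M,Manager,3,27000
F,Manager,2,22500
M,Sales Rep,1,18000
M,Sales Rep,3,27000
F,Manager,3,27000
M,Secretary,4,31500
M,Sales Rep,2,22500
M,Secretary,2,22500
M,Worker,4,40500
M,Manager,4,37100
F,Secretary,2,22500
F,Manager,3,27000
M,Worker,2,20000
M,Manager,4,32000
F,Sales Rep,2,22900
M,Sales Rep,3,27000
F,Sales Rep,2,22500
M,Manager,1,18000
M,Secretary,3,27000
F,Sales Rep,3,27000
M,Secretary,4,31500
M,Worker,2,22500
M,Manager,2,22500
M,Worker,4,40500
M,Worker,4,37100
F,Secretary,2,22500
F,Manager,3,27000
M,Worker,2,20000
M,Manager,4,32000
F,Sales Rep,2,22900")

# 30 rows but only 7 distinct id pairs, so cells must be aggregated
n_pairs <- nrow(unique(dataset[, c("Gender", "Title")]))
print(n_pairs)

# Rows per pair: the numbers cast() printed under both Category and Salary
# (table() also lists the empty F/Worker cell as 0; cast() omits it)
counts <- table(dataset$Gender, dataset$Title)
print(counts)

# An explicit summary instead of the default length: mean per pair
means <- aggregate(cbind(Category, Salary) ~ Gender + Title,
                   data = dataset, FUN = mean)
print(means)
```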
The content of datalist.xls is here:
http://paste.pound-python.org/show/15098/
--
Sent from my Cray XK6
"Pendeo-navem mei anguillae plena est."