[R] data frame is killing me! help

Petr PIKAL petr.pikal at precheza.cz
Mon Oct 26 08:49:02 CET 2009


Hi

> data(gasoline)
> str(gasoline)
'data.frame':   60 obs. of  2 variables:
 $ octane: num  85.3 85.2 88.5 83.4 87.9 ...
 $ NIR   : AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr  "1" "2" "3" "4" ...
  .. ..$ : chr  "900 nm" "902 nm" "904 nm" "906 nm" ...
> str(gasoline$NIR)
 AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:60] "1" "2" "3" "4" ...
  ..$ : chr [1:401] "900 nm" "902 nm" "904 nm" "906 nm" ...
> is.matrix(gasoline$NIR)
[1] TRUE

So the second element of the gasoline data frame is a matrix.
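To make the point concrete, here is a minimal sketch in base R with a small made-up matrix (the object names m, split_df and kept_df are only for illustration). It contrasts what data.frame() does with a matrix when it is passed bare and when it is wrapped in I():

```r
# A small matrix with named columns (made-up values, for illustration only)
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

# Without I(), data.frame() splits the matrix into one column per matrix column
split_df <- data.frame(id = 1:3, m)

# With I(), the matrix is stored whole as a single data frame column
kept_df <- data.frame(id = 1:3, m = I(m))

names(split_df)       # "id" "a" "b"
names(kept_df)        # "id" "m"
is.matrix(kept_df$m)  # TRUE
dim(kept_df$m)        # 3 2
```

The second form has the same shape as gasoline: few variables, one of which is itself a matrix.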

> ?AsIs

> df<-data.frame(x=1:5, I(matrix(rnorm(10), 5,2)))
> df
  x matrix.rnorm.10...5..2..1 matrix.rnorm.10...5..2..2
1 1              0.187703....              0.213312....
2 2              -0.66264....              -0.47941....
3 3              -0.82334....              -0.04324....
4 4              -0.37255....              0.883027....
5 5              -0.28700....              -1.03431....
> str(df)
'data.frame':   5 obs. of  2 variables:
 $ x                      : int  1 2 3 4 5
 $ matrix.rnorm.10...5..2.: AsIs [1:5, 1:2] 0.187703.... -0.66264.... -0.82334.... -0.37255.... -0.28700.... ...
> 
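The same idea can probably be applied to your own data. Below is a rough sketch, not a tested recipe for your file: raw, y, X and mydata are hypothetical names, the values are random, and I use six predictors where you have about 600. It assumes your data are already in R as an ordinary data frame with the response in a column called y.

```r
set.seed(42)

# Hypothetical stand-in for the poster's data: a response y plus six predictors
# (the real data would have about 600 predictor columns)
raw <- data.frame(y = rnorm(20), matrix(rnorm(120), nrow = 20))

# Gather every column except the response into one numeric matrix
X <- as.matrix(raw[, names(raw) != "y"])

# Build a two-column data frame; I() keeps X as a single matrix column named NIR
mydata <- data.frame(y = raw$y, NIR = I(X))
str(mydata)

# The matrix column can be used as a single term in a formula
fit_lm <- lm(y ~ NIR, data = mydata)
length(coef(fit_lm))  # intercept plus one coefficient per matrix column

# The same formula shape should work with plsr() if the pls package is installed
if (requireNamespace("pls", quietly = TRUE)) {
  fit <- pls::plsr(y ~ NIR, data = mydata, ncomp = 3)
  summary(fit)
}
```

The name NIR is arbitrary here. What matters is that the name used in the formula matches the name of the matrix column in the data frame.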

Regards
Petr

r-help-bounces at r-project.org wrote on 23.10.2009 18:43:56:

> 
> I have read that one. I want to use this method on my data, but I do not
> know how to put my data into R.
> 
> James W. MacDonald wrote:
> > 
> > 
> > 
> > bbslover wrote:
> >> 
> >> 
> >> Steve Lianoglou-6 wrote:
> >>> Hi,
> >>>
> >>> On Oct 22, 2009, at 2:35 PM, bbslover wrote:
> >>>
> >>>> Usage
> >>>> data(gasoline)
> >>>> Format
> >>>> A data frame with 60 observations on the following 2 variables.
> >>>> octane
> >>>> a numeric vector. The octane number.
> >>>> NIR
> >>>> a matrix with 401 columns. The NIR spectrum
> >>>>
> >>>> and when I look at the gasoline data I see this:
> >>>>   NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR.1694 nm NIR.1696 nm NIR.1698 nm NIR.1700 nm
> >>>> 1    1.242645    1.250789    1.246626    1.250985    1.264189    1.244678    1.245913    1.221135
> >>>> 2    1.189116    1.223242    1.253306    1.282889    1.215065    1.225211    1.227985    1.198851
> >>>> 3    1.198287    1.237383    1.260979    1.276677    1.218871    1.223132    1.230321    1.208742
> >>>> 4    1.201066    1.233299    1.262966    1.272709    1.211068    1.215044    1.232655    1.206696
> >>>> 5    1.259616    1.273713    1.296524    1.299507    1.226448    1.230718    1.232864    1.202926
> >>>> 6     1.24109    1.262138    1.288401    1.291118    1.229769    1.227615     1.22763    1.207576
> >>>> 7    1.245143    1.265648    1.274731    1.292441    1.218317    1.218147    1.222273    1.200446
> >>>> 8    1.222581    1.245782     1.26002    1.290305    1.221264    1.220265    1.227947    1.188174
> >>>> 9    1.234969    1.251559    1.272416    1.287405    1.211995    1.213263    1.215883    1.196102
> >>>>
> >>>> Look at the column names: NIR.1686 nm, NIR.1688 nm, NIR.1690 nm,
> >>>> NIR.1692 nm, NIR.1694 nm, NIR.1696 nm, NIR.1698 nm, NIR.1700 nm.
> >>>>
> >>>> How can I add the letters NIR to my variable names? My 600 independent
> >>>> variables do not have NIR as a prefix, but it seems to be needed to fit
> >>>> the plsr model, for example aa=plsr(y~NIR, data=data ,....). If the
> >>>> prefix NIR is necessary, how can I add it?
> >>> I'm not really sure that I'm getting you, but if your problem is that
> >>> the column names of your data.frame don't match the variable names
> >>> you'd like to use in your formula, just change the colnames of your
> >>> data.frame to match your formula.
> >>>
> >>> BTW - I have no idea where to get this gasoline data set, so I'm just
> >>> imagining:
> >>>
> >>> eg.
> >>> colnames(gasoline) <- c('put', 'the', 'variable', 'names', 'that', 
> >>> 'you', 'want', 'here')
> >>>
> >>> -steve
> >>>
> >>> --
> >>> Steve Lianoglou
> >>> Graduate Student: Computational Systems Biology
> >>>    |  Memorial Sloan-Kettering Cancer Center
> >>>    |  Weill Medical College of Cornell University
> >>> Contact Info: http://cbio.mskcc.org/~lianos/contact
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>>
> >> 
> >> Thank you. But there are so many independent variables that it is not
> >> easy to name them one by one. Is there some better way?
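On the narrow question of renaming many columns without typing each name: names can be assigned as a whole vector, so a prefix can be pasted onto all of them in one step. The following is only a sketch with made-up data (raw, is_pred and X are hypothetical names). Note that renaming by itself does not create an object called NIR for a formula such as y ~ NIR; the renamed columns still have to be gathered into one matrix, which the last line does.

```r
# Hypothetical wide data frame: a response y and five predictors X1..X5
raw <- data.frame(y = 1:4, matrix(0, nrow = 4, ncol = 5))
names(raw)  # "y" "X1" "X2" "X3" "X4" "X5"

# Rename all predictor columns in one step instead of typing each name
is_pred <- names(raw) != "y"
names(raw)[is_pred] <- paste0("NIR.", seq_len(sum(is_pred)))
names(raw)  # "y" "NIR.1" "NIR.2" "NIR.3" "NIR.4" "NIR.5"

# Select every column carrying the prefix and gather them into one matrix
X <- as.matrix(raw[, grepl("^NIR\\.", names(raw))])
dim(X)  # 4 5
```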
> > 
> > You don't need to identify anything. What you need to do is read the
> > help page for the function you want to use, so you (at the very least)
> > know how to use the function.
> > 
> >  > library(pls)
> >  > data(gasoline)
> >  > fit <- plsr(octane~NIR, data=gasoline, validation = "CV")
> >  > summary(fit)
> > Data:    X dimension: 60 401
> >    Y dimension: 60 1
> > Fit method: kernelpls
> > Number of components considered: 53
> > 
> > VALIDATION: RMSEP
> > Cross-validated using 10 random segments.
> >        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
> > CV           1.543    1.372   0.3827   0.2522   0.2347   0.2455   0.2281
> > adjCV        1.543    1.367   0.3740   0.2497   0.2360   0.2407   0.2243
> >        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
> > CV      0.2311   0.2352   0.2455    0.2534    0.2737    0.2814    0.2832
> > adjCV   0.2257   0.2303   0.2395    0.2473    0.2646    0.2705    0.2726
> >        14 comps  15 comps  16 comps  17 comps  18 comps  19 comps  20 comps
> > CV       0.2913    0.2932    0.2985    0.3137    0.3289    0.3323    0.3391
> > adjCV    0.2808    0.2821    0.2863    0.3008    0.3141    0.3172    0.3228
> >        21 comps  22 comps  23 comps  24 comps  25 comps  26 comps  27 comps
> > CV       0.3476    0.3384    0.3316    0.3213    0.3155    0.3118    0.3062
> > adjCV    0.3307    0.3217    0.3154    0.3057    0.3002    0.2964    0.2908
> >        28 comps  29 comps  30 comps  31 comps  32 comps  33 comps  34 comps
> > CV       0.3033    0.3034    0.3074    0.3083    0.3094    0.3087    0.3105
> > adjCV    0.2881    0.2881    0.2917    0.2926    0.2936    0.2929    0.2946
> >        35 comps  36 comps  37 comps  38 comps  39 comps  40 comps  41 comps
> > CV       0.3108    0.3106    0.3105    0.3104    0.3104    0.3105    0.3105
> > adjCV    0.2949    0.2947    0.2946    0.2945    0.2945    0.2945    0.2946
> >        42 comps  43 comps  44 comps  45 comps  46 comps  47 comps  48 comps
> > CV       0.3105    0.3105    0.3105    0.3105    0.3105    0.3105    0.3105
> > adjCV    0.2946    0.2946    0.2946    0.2946    0.2946    0.2946    0.2946
> >        49 comps  50 comps  51 comps  52 comps  53 comps
> > CV       0.3105    0.3105    0.3105    0.3105    0.3105
> > adjCV    0.2946    0.2946    0.2946    0.2946    0.2946
> > 
> > TRAINING: % variance explained
> >         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
> > X         70.97    78.56    86.15     95.4    96.12    96.97    97.32     98.1
> > octane    31.90    94.66    97.71     98.0    98.68    98.93    99.06     99.1
> >         9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
> > X         98.32     98.71     98.84     99.00     99.21     99.46     99.52
> > octane    99.20     99.24     99.36     99.44     99.49     99.51     99.58
> >         16 comps  17 comps  18 comps  19 comps  20 comps  21 comps  22 comps
> > X          99.57     99.64     99.68     99.76     99.78     99.82     99.84
> > octane     99.65     99.69     99.78     99.81     99.86     99.89     99.92
> >         23 comps  24 comps  25 comps  26 comps  27 comps  28 comps  29 comps
> > X          99.88     99.91     99.92     99.93     99.94     99.95     99.96
> > octane     99.93     99.94     99.95     99.97     99.98     99.99     99.99
> >         30 comps  31 comps  32 comps  33 comps  34 comps  35 comps  36 comps
> > X          99.96     99.97     99.97     99.98     99.98     99.98     99.98
> > octane     99.99    100.00    100.00    100.00    100.00    100.00    100.00
> >         37 comps  38 comps  39 comps  40 comps  41 comps  42 comps  43 comps
> > X          99.99     99.99     99.99     99.99       100       100       100
> > octane    100.00    100.00    100.00    100.00       100       100       100
> >         44 comps  45 comps  46 comps  47 comps  48 comps  49 comps  50 comps
> > X            100       100       100       100       100       100       100
> > octane       100       100       100       100       100       100       100
> >         51 comps  52 comps  53 comps
> > X            100       100       100
> > octane       100       100       100
> > 
> > 
> >> 
> >> 
> > 
> > -- 
> > James W. MacDonald, M.S.
> > Biostatistician
> > Douglas Lab
> > University of Michigan
> > Department of Human Genetics
> > 5912 Buhl
> > 1241 E. Catherine St.
> > Ann Arbor MI 48109-5618
> > 734-615-7826
> > 
> > 
> > 
> 



