[R] data frame is killing me! help

Mon Oct 26 17:44:53 CET 2009

Thank you, Petr.
That is a good, clear answer.

Thanks!

Petr Pikal wrote:
> 
> Hi
> 
>> data(gasoline)
>> str(gasoline)
> 'data.frame':   60 obs. of  2 variables:
>  $ octane: num  85.3 85.2 88.5 83.4 87.9 ...
>  $ NIR   : AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
>   ..- attr(*, "dimnames")=List of 2
>   .. ..$ : chr  "1" "2" "3" "4" ...
>   .. ..$ : chr  "900 nm" "902 nm" "904 nm" "906 nm" ...
>> str(gasoline$NIR)
>  AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : chr [1:60] "1" "2" "3" "4" ...
>   ..$ : chr [1:401] "900 nm" "902 nm" "904 nm" "906 nm" ...
>> is.matrix(gasoline$NIR)
> [1] TRUE
> 
> So the second element of the gasoline data frame is a matrix.
> 
>> ?AsIs
> 
>> df<-data.frame(x=1:5, I(matrix(rnorm(10), 5,2)))
>> df
>   x matrix.rnorm.10...5..2..1 matrix.rnorm.10...5..2..2
> 1 1              0.187703....              0.213312....
> 2 2              -0.66264....              -0.47941....
> 3 3              -0.82334....              -0.04324....
> 4 4              -0.37255....              0.883027....
> 5 5              -0.28700....              -1.03431....
>> str(df)
> 'data.frame':   5 obs. of  2 variables:
>  $ x                      : int  1 2 3 4 5
>  $ matrix.rnorm.10...5..2.: AsIs [1:5, 1:2] 0.187703.... -0.66264.... -0.82334.... -0.37255.... -0.28700.... ...
>> 
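
Applying the same idea to your own data might look like the sketch below. It assumes you already have a numeric response vector and a numeric predictor matrix with one row per sample; X, y, mydata, and the predictor names v1..v6 are made up for illustration and are not from this thread. Naming the argument (NIR = I(X)) gives the matrix column a proper name, instead of the generated name seen in the str() output above.

```r
# Hypothetical stand-in for your own data: 10 samples, 6 predictors.
set.seed(1)
X <- matrix(rnorm(60), nrow = 10, ncol = 6)
colnames(X) <- paste0("v", 1:6)   # made-up predictor names
y <- rnorm(10)                    # made-up response

# I() keeps X as one matrix-valued column; the argument name
# becomes the column name used in formulas.
mydata <- data.frame(y = y, NIR = I(X))

names(mydata)           # "y" "NIR"
is.matrix(mydata$NIR)   # TRUE
dim(mydata$NIR)         # 10 6
```

A formula such as y ~ NIR can then refer to the whole matrix as a single term, which is how the gasoline data set is laid out.
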
> 
> Regards
> Petr
> 
> r-help-bounces at r-project.org wrote on 23.10.2009 18:43:56:
> 
>> 
>> I have read that one. I want to apply this method to my own data, but I
>> do not know how to get my data into R.
>> 
>> James W. MacDonald wrote:
>> > 
>> > 
>> > 
>> > bbslover wrote:
>> >> 
>> >> 
>> >> Steve Lianoglou-6 wrote:
>> >>> Hi,
>> >>>
>> >>> On Oct 22, 2009, at 2:35 PM, bbslover wrote:
>> >>>
>> >>>> Usage
>> >>>> data(gasoline)
>> >>>> Format
>> >>>> A data frame with 60 observations on the following 2 variables.
>> >>>> octane
>> >>>> a numeric vector. The octane number.
>> >>>> NIR
>> >>>> a matrix with 401 columns. The NIR spectrum
>> >>>>
>> >>>> and when I print the gasoline data I see the following:
>> >>>>   NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR.1694 nm NIR.1696 nm NIR.1698 nm NIR.1700 nm
>> >>>> 1    1.242645    1.250789    1.246626    1.250985    1.264189    1.244678    1.245913    1.221135
>> >>>> 2    1.189116    1.223242    1.253306    1.282889    1.215065    1.225211    1.227985    1.198851
>> >>>> 3    1.198287    1.237383    1.260979    1.276677    1.218871    1.223132    1.230321    1.208742
>> >>>> 4    1.201066    1.233299    1.262966    1.272709    1.211068    1.215044    1.232655    1.206696
>> >>>> 5    1.259616    1.273713    1.296524    1.299507    1.226448    1.230718    1.232864    1.202926
>> >>>> 6     1.24109    1.262138    1.288401    1.291118    1.229769    1.227615     1.22763    1.207576
>> >>>> 7    1.245143    1.265648    1.274731    1.292441    1.218317    1.218147    1.222273    1.200446
>> >>>> 8    1.222581    1.245782     1.26002    1.290305    1.221264    1.220265    1.227947    1.188174
>> >>>> 9    1.234969    1.251559    1.272416    1.287405    1.211995    1.213263    1.215883    1.196102
>> >>>>
>> >>>> Look at these column names: NIR.1686 nm, NIR.1688 nm, NIR.1690 nm,
>> >>>> NIR.1692 nm, NIR.1694 nm, NIR.1696 nm, NIR.1698 nm, NIR.1700 nm.
>> >>>>
>> >>>> How can I add the prefix NIR to my variables? My 600 independent
>> >>>> variables do not have NIR as a prefix, but it seems to be needed to
>> >>>> fit the plsr model, for example aa=plsr(y~NIR, data=data ,....).
>> >>>> How can I do this?
>> >>> I'm not really sure I follow you, but if your problem is that
>> >>> the column names of your data.frame don't match the variable names 
>> >>> you'd like to use in your formula, just change the colnames of your 
>> >>> data.frame to match your formula.
>> >>>
>> >>> BTW - I have no idea where to get this gasoline data set, so I'm just
>> >>> imagining:
>> >>>
>> >>> eg.
>> >>> colnames(gasoline) <- c('put', 'the', 'variable', 'names', 'that', 
>> >>> 'you', 'want', 'here')
>> >>>
>> >>> -steve
>> >>>
>> >>> --
>> >>> Steve Lianoglou
>> >>> Graduate Student: Computational Systems Biology
>> >>>    |  Memorial Sloan-Kettering Cancer Center
>> >>>    |  Weill Medical College of Cornell University
>> >>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>> >>>
>> >>> ______________________________________________
>> >>> R-help at r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> PLEASE do read the posting guide
>> >>> http://www.R-project.org/posting-guide.html
>> >>> and provide commented, minimal, self-contained, reproducible code.
>> >>>
>> >>>
>> >> 
>> >> Thank you. But there are so many independent variables that it is not
>> >> easy to name them one by one. Is there a better way?
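
One way to avoid naming the predictors individually is sketched below, under the assumption that the data arrive as a flat table with the response in the first column (for example from read.csv()). The object names flat and mydata, the sizes, and ncomp = 3 are placeholders, not taken from this thread. The plsr() call is guarded so the sketch also runs when the pls package is not installed.

```r
# Hypothetical flat table: first column is the response,
# the remaining columns are the predictors.
set.seed(2)
flat <- data.frame(y = rnorm(30), matrix(rnorm(30 * 5), nrow = 30))

# Select all predictors by position; no need to list their names.
X <- as.matrix(flat[, -1])

# Store the whole matrix as a single column called NIR.
mydata <- data.frame(y = flat$y)
mydata$NIR <- X

str(mydata)

# With the pls package available, the model call from this thread
# carries over unchanged (ncomp = 3 is an arbitrary choice).
if (requireNamespace("pls", quietly = TRUE)) {
  fit <- pls::plsr(y ~ NIR, data = mydata, ncomp = 3, validation = "CV")
}
```

The name NIR is not special; any column name works as long as the formula uses the same one.
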
>> > 
>> > You don't need to identify anything. What you need to do is read the 
>> > help page for the function you want to use, so you (at the very least) 
>> > know how to use the function.
>> > 
>> >  > library(pls)
>> >  > data(gasoline)
>> >  > fit <- plsr(octane~NIR, data=gasoline, validation = "CV")
>> >  > summary(fit)
>> > Data:    X dimension: 60 401
>> >    Y dimension: 60 1
>> > Fit method: kernelpls
>> > Number of components considered: 53
>> > 
>> > VALIDATION: RMSEP
>> > Cross-validated using 10 random segments.
>> >         (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
>> > CV           1.543    1.372   0.3827   0.2522   0.2347   0.2455   0.2281
>> > adjCV        1.543    1.367   0.3740   0.2497   0.2360   0.2407   0.2243
>> >         7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
>> > CV      0.2311   0.2352   0.2455    0.2534    0.2737    0.2814    0.2832
>> > adjCV   0.2257   0.2303   0.2395    0.2473    0.2646    0.2705    0.2726
>> >         14 comps  15 comps  16 comps  17 comps  18 comps  19 comps  20 comps
>> > CV       0.2913    0.2932    0.2985    0.3137    0.3289    0.3323    0.3391
>> > adjCV    0.2808    0.2821    0.2863    0.3008    0.3141    0.3172    0.3228
>> >         21 comps  22 comps  23 comps  24 comps  25 comps  26 comps  27 comps
>> > CV       0.3476    0.3384    0.3316    0.3213    0.3155    0.3118    0.3062
>> > adjCV    0.3307    0.3217    0.3154    0.3057    0.3002    0.2964    0.2908
>> >         28 comps  29 comps  30 comps  31 comps  32 comps  33 comps  34 comps
>> > CV       0.3033    0.3034    0.3074    0.3083    0.3094    0.3087    0.3105
>> > adjCV    0.2881    0.2881    0.2917    0.2926    0.2936    0.2929    0.2946
>> >         35 comps  36 comps  37 comps  38 comps  39 comps  40 comps  41 comps
>> > CV       0.3108    0.3106    0.3105    0.3104    0.3104    0.3105    0.3105
>> > adjCV    0.2949    0.2947    0.2946    0.2945    0.2945    0.2945    0.2946
>> >         42 comps  43 comps  44 comps  45 comps  46 comps  47 comps  48 comps
>> > CV       0.3105    0.3105    0.3105    0.3105    0.3105    0.3105    0.3105
>> > adjCV    0.2946    0.2946    0.2946    0.2946    0.2946    0.2946    0.2946
>> >         49 comps  50 comps  51 comps  52 comps  53 comps
>> > CV       0.3105    0.3105    0.3105    0.3105    0.3105
>> > adjCV    0.2946    0.2946    0.2946    0.2946    0.2946
>> > 
>> > TRAINING: % variance explained
>> >          1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
>> > X         70.97    78.56    86.15     95.4    96.12    96.97    97.32     98.1
>> > octane    31.90    94.66    97.71     98.0    98.68    98.93    99.06     99.1
>> >          9 comps  10 comps  11 comps  12 comps  13 comps  14 comps  15 comps
>> > X         98.32     98.71     98.84     99.00     99.21     99.46     99.52
>> > octane    99.20     99.24     99.36     99.44     99.49     99.51     99.58
>> >          16 comps  17 comps  18 comps  19 comps  20 comps  21 comps  22 comps
>> > X          99.57     99.64     99.68     99.76     99.78     99.82     99.84
>> > octane     99.65     99.69     99.78     99.81     99.86     99.89     99.92
>> >          23 comps  24 comps  25 comps  26 comps  27 comps  28 comps  29 comps
>> > X          99.88     99.91     99.92     99.93     99.94     99.95     99.96
>> > octane     99.93     99.94     99.95     99.97     99.98     99.99     99.99
>> >          30 comps  31 comps  32 comps  33 comps  34 comps  35 comps  36 comps
>> > X          99.96     99.97     99.97     99.98     99.98     99.98     99.98
>> > octane     99.99    100.00    100.00    100.00    100.00    100.00    100.00
>> >          37 comps  38 comps  39 comps  40 comps  41 comps  42 comps  43 comps
>> > X          99.99     99.99     99.99     99.99       100       100       100
>> > octane    100.00    100.00    100.00    100.00       100       100       100
>> >          44 comps  45 comps  46 comps  47 comps  48 comps  49 comps  50 comps
>> > X            100       100       100       100       100       100       100
>> > octane       100       100       100       100       100       100       100
>> >          51 comps  52 comps  53 comps
>> > X            100       100       100
>> > octane       100       100       100
>> > 
>> > 
>> >> 
>> >> 
>> > 
>> > -- 
>> > James W. MacDonald, M.S.
>> > Biostatistician
>> > Douglas Lab
>> > University of Michigan
>> > Department of Human Genetics
>> > 5912 Buhl
>> > 1241 E. Catherine St.
>> > Ann Arbor MI 48109-5618
>> > 734-615-7826
>> > 
>> > 
>> > 
>> 
>> -- 
>> View this message in context:
>> http://www.nabble.com/data-frame-is-killing-me%21-help-tp26015079p26029667.html
>> Sent from the R help mailing list archive at Nabble.com.
>> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/data-frame-is-killing-me%21-help-tp26015079p26063206.html
Sent from the R help mailing list archive at Nabble.com.