[R] data frame is killing me! help
Petr PIKAL
petr.pikal at precheza.cz
Mon Oct 26 08:49:02 CET 2009
Hi
> data(gasoline)
> str(gasoline)
'data.frame': 60 obs. of 2 variables:
$ octane: num 85.3 85.2 88.5 83.4 87.9 ...
$ NIR : AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr "1" "2" "3" "4" ...
.. ..$ : chr "900 nm" "902 nm" "904 nm" "906 nm" ...
> str(gasoline$NIR)
AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:60] "1" "2" "3" "4" ...
..$ : chr [1:401] "900 nm" "902 nm" "904 nm" "906 nm" ...
> is.matrix(gasoline$NIR)
[1] TRUE
So the second element of the gasoline data frame is a matrix.
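To make that concrete, here is a small sketch with made-up numbers (the object names toy and spectra and the values are only for illustration, they are not part of the gasoline data). It shows that a matrix stored in a data frame counts as one variable, however many columns the matrix itself has:

```r
# Illustrative data only: 4 samples measured at 3 hypothetical wavelengths.
spectra <- matrix(seq_len(12) / 10, nrow = 4,
                  dimnames = list(NULL, c("900 nm", "902 nm", "904 nm")))

# I() keeps the matrix as a single column instead of splitting it up.
toy <- data.frame(octane = c(85.3, 85.2, 88.5, 83.4), NIR = I(spectra))

ncol(toy)            # 2: the whole matrix is one variable
dim(toy$NIR)         # 4 3
is.matrix(toy$NIR)   # TRUE
toy$NIR[, "902 nm"]  # a single wavelength taken out of the matrix column
```

This is why a formula such as octane ~ NIR can refer to all the spectral columns at once.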
> ?AsIs
> df<-data.frame(x=1:5, I(matrix(rnorm(10), 5,2)))
> df
x matrix.rnorm.10...5..2..1 matrix.rnorm.10...5..2..2
1 1 0.187703.... 0.213312....
2 2 -0.66264.... -0.47941....
3 3 -0.82334.... -0.04324....
4 4 -0.37255.... 0.883027....
5 5 -0.28700.... -1.03431....
> str(df)
'data.frame': 5 obs. of 2 variables:
$ x : int 1 2 3 4 5
$ matrix.rnorm.10...5..2.: AsIs [1:5, 1:2] 0.187703.... -0.66264.... -0.82334.... -0.37255.... -0.28700.... ...
>
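For your own data the same idea might look roughly like the sketch below. It assumes (I have not seen your data) that you already have a data frame with the response in a column called y and the predictors in the remaining columns, for example after reading a file with read.table() or read.csv(). The names wide, mydata and X are hypothetical, and random numbers stand in for your values:

```r
# Hypothetical stand-in for your own table: a response column 'y' followed
# by the predictor columns (you mention about 600; only 6 are used here).
set.seed(1)
wide <- data.frame(y = rnorm(10), matrix(rnorm(60), nrow = 10))

# Put all predictor columns into ONE matrix column. The name is not special:
# whatever name you choose here is the name you use in the formula.
mydata <- data.frame(y = wide$y, X = I(as.matrix(wide[, -1])))
str(mydata)

# With the pls package installed, the model call would then look like:
# library(pls)
# fit <- plsr(y ~ X, data = mydata, validation = "CV")
```

So there is no need to rename 600 columns one by one; the prefix NIR in the gasoline example is simply the name of the matrix column.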
Regards
Petr
r-help-bounces at r-project.org wrote on 23.10.2009 18:43:56:
>
> I have read that one. I want to use this method on my data, but I do not
> know how to put my data into R.
>
> James W. MacDonald wrote:
> >
> >
> >
> > bbslover wrote:
> >>
> >>
> >> Steve Lianoglou-6 wrote:
> >>> Hi,
> >>>
> >>> On Oct 22, 2009, at 2:35 PM, bbslover wrote:
> >>>
> >>>> Usage
> >>>> data(gasoline)
> >>>> Format
> >>>> A data frame with 60 observations on the following 2 variables.
> >>>> octane
> >>>> a numeric vector. The octane number.
> >>>> NIR
> >>>> a matrix with 401 columns. The NIR spectrum
> >>>>
> >>>> and when I look at the gasoline data I see the following:
> >>>>   NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR.1694 nm NIR.1696 nm NIR.1698 nm NIR.1700 nm
> >>>> 1    1.242645    1.250789    1.246626    1.250985    1.264189    1.244678    1.245913    1.221135
> >>>> 2    1.189116    1.223242    1.253306    1.282889    1.215065    1.225211    1.227985    1.198851
> >>>> 3    1.198287    1.237383    1.260979    1.276677    1.218871    1.223132    1.230321    1.208742
> >>>> 4    1.201066    1.233299    1.262966    1.272709    1.211068    1.215044    1.232655    1.206696
> >>>> 5    1.259616    1.273713    1.296524    1.299507    1.226448    1.230718    1.232864    1.202926
> >>>> 6     1.24109    1.262138    1.288401    1.291118    1.229769    1.227615     1.22763    1.207576
> >>>> 7    1.245143    1.265648    1.274731    1.292441    1.218317    1.218147    1.222273    1.200446
> >>>> 8    1.222581    1.245782     1.26002    1.290305    1.221264    1.220265    1.227947    1.188174
> >>>> 9    1.234969    1.251559    1.272416    1.287405    1.211995    1.213263    1.215883    1.196102
> >>>>
> >>>> Look at these column names: NIR.1686 nm, NIR.1688 nm, NIR.1690 nm,
> >>>> NIR.1692 nm, NIR.1694 nm, NIR.1696 nm, NIR.1698 nm, NIR.1700 nm
> >>>>
> >>>> How can I add the letters NIR to my variables? My 600 independent
> >>>> variables do not have NIR as a prefix, but it is needed to fit the
> >>>> plsr model, for example aa=plsr(y~NIR, data=data ,....). The prefix
> >>>> NIR is necessary, so how can I deal with this?
> >>> I'm not really sure that I'm getting you, but if your problem is that
> >>> the column names of your data.frame don't match the variable names
> >>> you'd like to use in your formula, just change the colnames of your
> >>> data.frame to match your formula.
> >>>
> >>> BTW - I have no idea where to get this gasoline data set, so I'm just
> >>> imagining:
> >>>
> >>> eg.
> >>> colnames(gasoline) <- c('put', 'the', 'variable', 'names', 'that',
> >>> 'you', 'want', 'here')
> >>>
> >>> -steve
> >>>
> >>> --
> >>> Steve Lianoglou
> >>> Graduate Student: Computational Systems Biology
> >>> | Memorial Sloan-Kettering Cancer Center
> >>> | Weill Medical College of Cornell University
> >>> Contact Info: http://cbio.mskcc.org/~lianos/contact
> >>>
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>>
> >>
> >> Thank you, but there are so many independent variables that it is not
> >> easy to identify them one by one. Is there a better way?
> >
> > You don't need to identify anything. What you need to do is read the
> > help page for the function you want to use, so you (at the very least)
> > know how to use the function.
> >
> > > library(pls)
> > > data(gasoline)
> > > fit <- plsr(octane~NIR, data=gasoline, validation = "CV")
> > > summary(fit)
> > Data: X dimension: 60 401
> > Y dimension: 60 1
> > Fit method: kernelpls
> > Number of components considered: 53
> >
> > VALIDATION: RMSEP
> > Cross-validated using 10 random segments.
> >        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
> > CV           1.543    1.372   0.3827   0.2522   0.2347   0.2455   0.2281
> > adjCV        1.543    1.367   0.3740   0.2497   0.2360   0.2407   0.2243
> >          7 comps  8 comps  9 comps 10 comps 11 comps 12 comps 13 comps
> > CV        0.2311   0.2352   0.2455   0.2534   0.2737   0.2814   0.2832
> > adjCV     0.2257   0.2303   0.2395   0.2473   0.2646   0.2705   0.2726
> >         14 comps 15 comps 16 comps 17 comps 18 comps 19 comps 20 comps
> > CV        0.2913   0.2932   0.2985   0.3137   0.3289   0.3323   0.3391
> > adjCV     0.2808   0.2821   0.2863   0.3008   0.3141   0.3172   0.3228
> >         21 comps 22 comps 23 comps 24 comps 25 comps 26 comps 27 comps
> > CV        0.3476   0.3384   0.3316   0.3213   0.3155   0.3118   0.3062
> > adjCV     0.3307   0.3217   0.3154   0.3057   0.3002   0.2964   0.2908
> >         28 comps 29 comps 30 comps 31 comps 32 comps 33 comps 34 comps
> > CV        0.3033   0.3034   0.3074   0.3083   0.3094   0.3087   0.3105
> > adjCV     0.2881   0.2881   0.2917   0.2926   0.2936   0.2929   0.2946
> >         35 comps 36 comps 37 comps 38 comps 39 comps 40 comps 41 comps
> > CV        0.3108   0.3106   0.3105   0.3104   0.3104   0.3105   0.3105
> > adjCV     0.2949   0.2947   0.2946   0.2945   0.2945   0.2945   0.2946
> >         42 comps 43 comps 44 comps 45 comps 46 comps 47 comps 48 comps
> > CV        0.3105   0.3105   0.3105   0.3105   0.3105   0.3105   0.3105
> > adjCV     0.2946   0.2946   0.2946   0.2946   0.2946   0.2946   0.2946
> >         49 comps 50 comps 51 comps 52 comps 53 comps
> > CV        0.3105   0.3105   0.3105   0.3105   0.3105
> > adjCV     0.2946   0.2946   0.2946   0.2946   0.2946
> >
> > TRAINING: % variance explained
> >          1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
> > X          70.97    78.56    86.15     95.4    96.12    96.97    97.32     98.1
> > octane     31.90    94.66    97.71     98.0    98.68    98.93    99.06     99.1
> >          9 comps 10 comps 11 comps 12 comps 13 comps 14 comps 15 comps
> > X          98.32    98.71    98.84    99.00    99.21    99.46    99.52
> > octane     99.20    99.24    99.36    99.44    99.49    99.51    99.58
> >         16 comps 17 comps 18 comps 19 comps 20 comps 21 comps 22 comps
> > X          99.57    99.64    99.68    99.76    99.78    99.82    99.84
> > octane     99.65    99.69    99.78    99.81    99.86    99.89    99.92
> >         23 comps 24 comps 25 comps 26 comps 27 comps 28 comps 29 comps
> > X          99.88    99.91    99.92    99.93    99.94    99.95    99.96
> > octane     99.93    99.94    99.95    99.97    99.98    99.99    99.99
> >         30 comps 31 comps 32 comps 33 comps 34 comps 35 comps 36 comps
> > X          99.96    99.97    99.97    99.98    99.98    99.98    99.98
> > octane     99.99   100.00   100.00   100.00   100.00   100.00   100.00
> >         37 comps 38 comps 39 comps 40 comps 41 comps 42 comps 43 comps
> > X          99.99    99.99    99.99    99.99      100      100      100
> > octane    100.00   100.00   100.00   100.00      100      100      100
> >         44 comps 45 comps 46 comps 47 comps 48 comps 49 comps 50 comps
> > X            100      100      100      100      100      100      100
> > octane       100      100      100      100      100      100      100
> >         51 comps 52 comps 53 comps
> > X            100      100      100
> > octane       100      100      100
> >
> >
> >>
> >>
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > Douglas Lab
> > University of Michigan
> > Department of Human Genetics
> > 5912 Buhl
> > 1241 E. Catherine St.
> > Ann Arbor MI 48109-5618
> > 734-615-7826
> >
> >
> >
>
>