[BioC] [R] help with linear model

Petr PIKAL petr.pikal at precheza.cz
Mon Oct 26 12:31:27 CET 2009


r-help-bounces at r-project.org wrote on 26.10.2009 11:31:26:

> Thank you all for your replies. I had tried transposing my data before,
> but I did not mention it because I was getting the same error. In the
> present case, though, it worked because I put
> >lm1=lm(norm~.,data=t(data))
> instead of
> >lm1=lm(fm1, data=t(data))
> where fm1=norm~cols...

There should not be any difference. I suspect that your formula definition 
has superfluous quotation marks, and/or that t(data) changes the names: you 
expect e.g. 206427_s_at, but that cannot be a valid name.

Look at

head(t(data))

to see how the names are changed. You need to adapt your formula to those 
names.
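As a rough illustration of the naming issue (a sketch only; the objects m, d1 and d2 are made up here, and the numbers are a small subset of the posted table), the names that end up in the data depend on how the transposed matrix is turned into a data frame, and the formula has to follow them:

```r
# Toy subset of the posted data: rows are variables, columns are samples.
m <- rbind(
  norm          = c(0.897, 0.590, 0.683, 0.949),
  "206427_s_at" = c(5.387205, 6.036506, 8.824783, 10.864122)
)

d1 <- data.frame(t(m))      # default check.names = TRUE repairs the names
d2 <- as.data.frame(t(m))   # keeps the column names of the matrix

names(d1)   # "norm" "X206427_s_at"
names(d2)   # "norm" "206427_s_at"

# The formula must use whichever names the data actually carry.
fit1 <- lm(norm ~ X206427_s_at, data = d1)
fit2 <- lm(norm ~ `206427_s_at`, data = d2)   # backticks, not double quotes
```

Both fits describe the same model; only the spelling of the variable name differs. A name that starts with a digit has to be wrapped in backticks inside a formula, whereas a double-quoted string is not a variable name at all.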

Regards
Petr



> I actually didn't know that there was such a difference between norm~cols
> and norm~.
> I wonder why...
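A possible explanation, sketched under the assumption that cols is the character vector of probe names shown in the original question: in norm ~ cols the formula refers to a single variable that is literally called cols, whereas the dot in norm ~ . stands for all other columns of the data. If a formula is to be built from a vector of names, the names have to be pasted into it first (the two-probe vector below is only an example):

```r
cols <- c("206427_s_at", "205338_s_at")

# Here 'cols' is one variable name, not a list of predictors.
bad <- norm ~ cols
all.vars(bad)   # "norm" "cols"

# Build the formula text from the names; backticks protect names that
# start with a digit.
fm <- as.formula(paste("norm ~", paste0("`", cols, "`", collapse = " + ")))
all.vars(fm)    # "norm" "206427_s_at" "205338_s_at"
```

With the first form, lm() finds the vector cols itself and compares its length with the number of observations, which would be consistent with the "variable lengths differ (found for 'cols')" message.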
> 
> Thank you all again!
> Best,
> Eleni
> 
> On Mon, Oct 26, 2009 at 12:24 PM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> 
> > Hi
> >
> >
> > r-help-bounces at r-project.org wrote on 26.10.2009 10:48:51:
> >
> > > Dear list,
> > >
> > > I have been trying for a week to fit a simple linear model to my data. I
> > > have looked into the previous posts but I haven't found anything relevant to
> > > my problem. I guess it is something simple...I just cannot see it.
> > > I have the following data frame, named "data", which is a subset of a
> > > microarray experiment. The columns are the samples and the rows are the
> > > probes. I bound in a first row, called "norm", which represents the
> > > estimated output. I want to create a linear model which shows the
> > > relationship between the gene expressions (rows) and the output (norm).
> > >
> > >  *data*
> > >             GSM276723.CEL GSM276724.CEL GSM276725.CEL GSM276726.CEL
> > > norm             0.897000      0.590000      0.683000      0.949000
> > > 206427_s_at      5.387205      6.036506      8.824783     10.864122
> > > 205338_s_at      6.454779     13.143095      6.123212     12.726562
> > > 209848_s_at      6.703062      7.783330     12.175654      9.339651
> > > 205694_at        5.894131      5.794516     12.876555     11.534664
> > > 201909_at       12.616538     12.913255     12.275182     12.767743
> > > 208894_at       13.049286      9.317874     12.873516     13.527182
> > > 216512_s_at      6.324789     12.783791      6.216932     12.013404
> > > 205337_at        6.175940     12.158796      6.117519     12.041078
> > > 201850_at        6.633013      6.465900      6.535434      7.749985
> > > 210982_s_at     12.444791      8.597388     12.197696     12.963449
> > >             GSM276727.CEL GSM276728.CEL GSM276729.CEL GSM276731.CEL
> > > norm             0.302000      0.597000      0.270000      0.530000
> > > 206427_s_at      5.690357      8.014055     13.034753      5.493977
> > > 205338_s_at      5.757048      7.706341     13.258410      5.562588
> > > 209848_s_at      6.461028      7.036515     13.633649      5.874098
> > > 205694_at        5.519552      5.297107      6.498811      5.146150
> > > 201909_at       12.814454     11.592632      6.594229      6.650796
> > > 208894_at       13.835359     13.028096      5.839909      6.045578
> > > 216512_s_at      6.033096      7.273650     12.669054      5.946932
> > > 205337_at        5.879028      7.381713     12.633829      5.379559
> > > 201850_at        9.684397      6.560014      8.523229      6.573052
> > > 210982_s_at     13.342729     12.470517      5.903681      5.658115
> > >             GSM276732.CEL GSM276735.CEL GSM276736.CEL GSM276737.CEL
> > > norm              0.43400      0.647000      0.113000      1.000000
> > > 206427_s_at      12.80257      5.645002      6.519554     13.572480
> > > 205338_s_at      13.38057      5.804107     11.090690     14.024922
> > > 209848_s_at      13.27718      6.490851      9.784199     14.101162
> > > 205694_at        11.37717      5.802105      7.944963     14.060492
> > > 201909_at        13.24126     12.263899     12.578315      6.443491
> > > 208894_at        12.29916      7.563361      9.971493      7.094214
> > > 216512_s_at      13.00303      5.905789     10.512761     13.647573
> > > 205337_at        12.63560      5.430138     10.707242     13.020312
> > > 201850_at        12.71874      6.275480      6.987962     12.354580
> > > 210982_s_at      11.53559      7.225199      9.322706      6.617615
> > >             GSM276738.CEL GSM276739.CEL GSM276740.CEL GSM276742.CEL
> > > norm              0.35700      0.967000      0.823000      1.000000
> > > 206427_s_at      13.33764     13.607918     13.190551     12.387189
> > > 205338_s_at      13.65492     12.812950     12.237476     12.912605
> > > 209848_s_at      13.48525     13.435389     13.851347     12.540495
> > > 205694_at         7.70928     10.045331     13.391456     11.103841
> > > 201909_at        12.47093     11.937344      6.631023      7.160071
> > > 208894_at        12.20508      8.892181      6.478889      5.927860
> > > 216512_s_at      13.42313     12.151691     11.620552     12.341763
> > > 205337_at        12.67544     12.036528     11.641203     12.275845
> > > 201850_at        11.85481     13.172666     12.964316     12.156142
> > > 210982_s_at      11.49940      8.380404      6.121762      5.921634
> > >             GSM276743.CEL GSM276744.CEL GSM276745.CEL GSM276747.CEL
> > > norm             0.899000      0.927000      0.754000      0.437000
> > > 206427_s_at     12.665097     12.604673     11.446630     13.000295
> > > 205338_s_at     13.261141     12.448096     13.185698     12.510952
> > > 209848_s_at     13.396711     13.882529     13.040600     12.984137
> > > 205694_at       10.888474      7.094063      8.630120     12.321685
> > > 201909_at       12.100560      6.666787     12.330600      6.572282
> > > 208894_at        7.741437      8.348155     10.106442      6.009902
> > > 216512_s_at     12.830373     11.504074     12.300163     11.525958
> > > 205337_at       12.264569     11.676281     11.940917     11.618351
> > > 201850_at       11.055564     12.202366      7.327056     12.853055
> > > 210982_s_at      7.285289      8.129298      9.577032      5.924993
> > >             GSM276748.CEL GSM276752.CEL GSM276754.CEL GSM276756.CEL
> > > norm             0.321000      0.620000      0.155000      0.946000
> > > 206427_s_at      9.081283     11.446978      8.191261     13.192507
> > > 205338_s_at     13.737773     13.698520     12.983830     10.948681
> > > 209848_s_at     13.234025     12.956672     10.644642     13.176656
> > > 205694_at        7.953865      7.397013      7.170732     13.618932
> > > 201909_at       12.533684      7.049442      6.804030      7.135974
> > > 208894_at       11.868729      8.558455      6.629858      6.850639
> > > 216512_s_at     13.589290     12.781853     12.060414     10.143297
> > > 205337_at       13.084386     12.442617     12.104849     10.364035
> > > 201850_at        6.615453      8.104145      7.058739      6.514298
> > > 210982_s_at     11.058085      7.891520      6.516261      6.532226
> > >             GSM276758.CEL GSM276759.CEL
> > > norm             0.767000      0.218000
> > > 206427_s_at      5.742074     11.232337
> > > 205338_s_at      6.375289     13.406557
> > > 209848_s_at      6.226996      6.835458
> > > 205694_at        5.864042     11.218719
> > > 201909_at        6.907489      7.316435
> > > 208894_at       12.596987     12.408412
> > > 216512_s_at      6.308256     12.318892
> > > 205337_at        6.063775     12.389912
> > > 201850_at        6.816491      6.602764
> > > 210982_s_at     11.985288     11.853911
> > >
> > > What I did is the following:
> > > >fm1=as.formula((norm) ~ "206427_s_at" + "205338_s_at" + "209848_s_at" +
> > > "205694_at" + "201909_at" + "208894_at" + "216512_s_at" + "205337_at" +
> > > "201850_at" + "210982_s_at")
> > > >lm1=lm(fm1,data1new)
> > >
> > > And I receive the following error:
> > > Error in terms.formula(formula, data = data) :
> > >   invalid model formula in ExtractVars
> > >
> > >
> > > I have also tried:
> > > >cols=rownames(data3)  %%%%Where data3 is the same data frame as data
> > > above, but without the "norm" row bound in yet
> > > thus: > cols
> > >  [1] "206427_s_at" "205338_s_at" "209848_s_at" "205694_at"   "201909_at"
> > >  [6] "208894_at"   "216512_s_at" "205337_at"   "201850_at"   "210982_s_at"
> > >
> > > > lm1=lm(fm1,data1new)
> > >
> > > and in this case I receive the following error:
> > > Error in model.frame.default(formula = fm1, data = data1new,
> > > drop.unused.levels = TRUE) :
> > > variable lengths differ (found for 'cols')
> > >
> > > Could anyone help me with this?
> >
> > The usual expectation is that data are arranged columnwise: each column is a
> > variable and each row is an observation. So you should transform your data
> > to this form, e.g. by
> >
> > t(yourdata).
> >
> > Another issue can be whether your data are really numeric, which you can
> > test by
> >
> > str(yourdata)
> >
> > which shows the structure of your data.
> > If everything is OK, then
> >
> > lm(norm ~ . , data = data1new) should produce a linear model of norm on all
> > other columns in the data frame data1new.
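The three steps above (transpose, check the structure, fit) might look roughly like this on a small subset of the posted table. This is only a sketch: the objects m and d are invented for the example, only two probes and six samples are used, and as.data.frame() is added because lm() expects a data frame rather than a matrix.

```r
# Rows are variables and columns are samples, as in the posted data.
m <- rbind(
  norm          = c(0.897, 0.590, 0.683, 0.949, 0.302, 0.597),
  "206427_s_at" = c(5.387205, 6.036506, 8.824783, 10.864122, 5.690357, 8.014055),
  "205338_s_at" = c(6.454779, 13.143095, 6.123212, 12.726562, 5.757048, 7.706341)
)
colnames(m) <- paste0("GSM2767", 23:28, ".CEL")

# Transpose so that each column is a variable and each row an observation.
d <- as.data.frame(t(m))

# Every column should be reported as numeric.
str(d)

# The dot stands for all columns of d other than norm.
fit <- lm(norm ~ ., data = d)
coef(fit)
```

With the full table the same call would use all ten probes as predictors; whether such a model is sensible for 38 samples is a separate question.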
> >
> > Regards
> > Petr
> >
> > >
> > > Thank you very much in advance,
> > > Eleni
> > >
> > >    [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> 


