[R] matlab/gauss code in R

Sebastian Kruk residuo.solow at gmail.com
Mon Jun 25 01:27:20 CEST 2007


Hi all!

I would like to port some MATLAB or GAUSS code to R.

Could you help me?

Bye,

Sebastián.

2007/6/23, r-help-request en stat.math.ethz.ch <r-help-request en stat.math.ethz.ch>:
> Send R-help mailing list submissions to
>        r-help en stat.math.ethz.ch
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        https://stat.ethz.ch/mailman/listinfo/r-help
> or, via email, send a message with subject or body 'help' to
>        r-help-request en stat.math.ethz.ch
>
> You can reach the person managing the list at
>        r-help-owner en stat.math.ethz.ch
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of R-help digest..."
>
>
> Today's Topics:
>
>   1. what is "better" when combining data frames? merge vs. rbind
>      & cbind (Thomas Pujol)
>   2. extract index during execution of sapply (Christian Bieli)
>   3. multiple return (Manuele Pesenti)
>   4. Re: what is "better" when combining data frames? merge vs.
>      rbind &   cbind (Duncan Murdoch)
>   5. Re: multiple return (Mahbub Latif)
>   6. Re: how to create cumulative histogram from two   independent
>      variables? (Jim Lemon)
>   7. Re: Adding exponents (superscript format) to a plot (John Kane)
>   8. Re: using lme on multiple datasets in one shot (Douglas Bates)
>   9. Re: abline plots at wrong abscissae after boxplot (S Ellison)
>  10. Boxplot issues (S Ellison)
>  11. Re: help on the use of ldBand (Frank E Harrell Jr)
>  12. Re: multiple return (ONKELINX, Thierry)
>  13. Re: extract index during execution of sapply (Martin Morgan)
>  14. vectorize a function (Robin Hankin)
>  15. Re: Tools For Preparing Data For Analysis (Kevin E. Thorpe)
>  16. Re: Data consistency checks in functions (Kuhn, Max)
>  17. Re: vectorize a function (Christos Hatzis)
>  18. Re: extract index during execution of sapply (Ben Bolker)
>  19. Re: extract index during execution of sapply (Thomas Lumley)
>  20. fitCopula (Oden, Kevin)
>  21. how to ave this? (Weiwei Shi)
>  22. fitCopula (Oden, Kevin)
>  23. Re: how to ave this? (Weiwei Shi)
>  24. Re: Result depends on order of factors in unbalanced designs
>      (lme, anova)? (Prof Brian Ripley)
>  25. Re: Tools For Preparing Data For Analysis (Christophe Pallier)
>  26. Re: interpretation of F-statistics in GAMs (Simon Wood)
>  27. Re: Boxplot issues (Martin Maechler)
>  28. Re: Overlaying lattice graphs (continued) (Sébastien)
>  29. Matrix library, CHOLMOD error: problem too large (Jose Quesada )
>  30. Re: Matrix library, CHOLMOD error: problem too large
>      (Duncan Murdoch)
>  31. Imputing missing values in time series (Horace Tso)
>  32. Re: Imputing missing values in time series (Erik Iverson)
>  33. Re: Imputing missing values in time series (Horace Tso)
>  34. Re: Switching X-axis and Y-axis for histogram (hadley wickham)
>  35. Re: Imputing missing values in time series (Leeds, Mark (IED))
>  36. Re: Imputing missing values in time series (Horace Tso)
>  37. Re: Overlaying lattice graphs (continued) (hadley wickham)
>  38. Re: Overlaying lattice graphs (continued) (Deepayan Sarkar)
>  39. Re: Stacked barchart color (hadley wickham)
>  40. Re: Visualize quartiles of plot line (hadley wickham)
>  41. "heatmap" color still a spectrum for binary outcomes?
>      (Patrick Ayscue)
>  42. Barchart legend position (Spilak,Jacqueline [Edm])
>  43. Lattice: hiding only some strips (Michael Hoffman)
>  44. Re: Overlaying lattice graphs (continued) (Sébastien)
>  45. Bayesian Networks (Mario.Carvalho.Fernandes en bpi.pt)
>  46. connecting to running process possible? (Charles Cosse)
>  47. Re: Overlaying lattice graphs (continued) (hadley wickham)
>  48. RServe (java2R) question (Guanrao Chen)
>  49. Re: Lattice: hiding only some strips (Deepayan Sarkar)
>  50. interaction contrast (szhan en uoguelph.ca)
>  51. Re: Barchart legend position (Deepayan Sarkar)
>  52. How to run "mathematica" or "c" programs in R? (Zhang Jian)
>  53. Re: Imputing missing values in time series (Horace Tso)
>  54. Asteriscs in a plot to represent approximate size of p-values
>      (Judith Flores)
>  55. merging more than two data frames (Andrew Yee)
>  56. speed issues / pros & cons: dataframe vs. matrix (Thomas Pujol)
>  57. Re: Matrix *package*, CHOLMOD error: problem too large
>      (Martin Maechler)
>  58. Re: speed issues / pros & cons: dataframe vs. matrix
>      (Duncan Murdoch)
>  59. Re: Lattice: hiding only some strips (Michael Hoffman)
>  60. Re: Lattice: hiding only some strips (deepayan.sarkar en gmail.com)
>  61. Re: how to ave this? (Mike Meredith)
>  62. Re: how to ave this? (Dimitris Rizopoulos)
>  63. Re: merging more than two data frames (Mark Wardle)
>  64. latex of ftable (Hmisc?) (Dieter Menne)
>  65. Re: latex of ftable (Hmisc?) (Chuck Cleland)
>  66. Re: Data consistency checks in functions (Mike Meredith)
>  67. Re: latex of ftable (Hmisc?) (Dieter Menne)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 21 Jun 2007 12:36:16 -0700 (PDT)
> From: Thomas Pujol <thomas.pujol en yahoo.com>
> Subject: [R] what is "better" when combining data frames? merge vs.
>        rbind & cbind
> To: r-help en stat.math.ethz.ch
> Message-ID: <510958.56904.qm en web59308.mail.re1.yahoo.com>
> Content-Type: text/plain
>
> I often need to "combine" data frames, sometimes "vertically" and other times "horizontally".
>
> When it "better" to use merge? When is it better to use rbind or cbind?
>
> Are there clear pros and cons of each approach?
>
>
> ---------------------------------
>
>        [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 22 Jun 2007 12:31:39 +0200
> From: Christian Bieli <christian.bieli en unibas.ch>
> Subject: [R] extract index during execution of sapply
> To: R help list <r-help en stat.math.ethz.ch>
> Message-ID: <467BA50B.60408 en unibas.ch>
> Content-Type: text/plain; charset=ISO-8859-15; format=flowed
>
> Hi there
> During execution of sapply I want to extract the number of times the
> function given to sapply has been executed. I came up with:
>
> mylist <- list(a=3,b=6,c=9)
> sapply(mylist,function(x)as.numeric(gsub("[^0-9]","",deparse(substitute(x)))))
>
> This works fine, but looks quite ugly. I'm sure that there's a more
> elegant way to do this.
>
> Any suggestion?
>
> Christian
>
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 22 Jun 2007 12:37:37 +0200
> From: Manuele Pesenti <amicogodzilla en bruttocarattere.org>
> Subject: [R] multiple return
> To: r-help en stat.math.ethz.ch
> Message-ID: <200706221237.37479.amicogodzilla en bruttocarattere.org>
> Content-Type: text/plain;  charset="us-ascii"
>
> Dear User,
> what's the correct way to obtain a multiple return from a function?
>
> for example creating the simple function:
>
> somma <- function (a, b) {
>  c <- a+b
>  return (a, b, c)
> }
>
> when I call it, it runs but returns the following output:
>
> > somma(5, 7)
> $a
> [1] 5
>
> $b
> [1] 7
>
> $c
> [1] 12
>
> Warning message:
> return multi-argomento sono deprecati in: return(a, b, c)
>
> i.e. multi-return is deprecated...
>
> thanks a lot
> best regards
>        Manuele
>
> --
> Manuele Pesenti
>        manuele en inventati.org
>        amicogodzilla en jabber.linux.it
>        http://mpesenti.polito.it
>
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 22 Jun 2007 06:40:23 -0400
> From: Duncan Murdoch <murdoch en stats.uwo.ca>
> Subject: Re: [R] what is "better" when combining data frames? merge
>        vs. rbind &     cbind
> To: Thomas Pujol <thomas.pujol en yahoo.com>
> Cc: r-help en stat.math.ethz.ch
> Message-ID: <467BA717.7080202 en stats.uwo.ca>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 21/06/2007 3:36 PM, Thomas Pujol wrote:
> > I often need to "combine" data frames, sometimes "vertically" and other times "horizontally".
> >
> > When it "better" to use merge? When is it better to use rbind or cbind?
> >
> > Are there clear pros and cons of each approach?
>
> If rbind or cbind work, use them.  They are much simpler, but much less
> flexible.
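>
> A minimal sketch of the difference (the data frames d1 and d2 below are
> made up for illustration):
>
> d1 <- data.frame(id = 1:3, x = c(10, 20, 30))
> d2 <- data.frame(id = 2:4, y = c("a", "b", "c"))
> rbind(d1, d1)       # stack rows; both objects need the same columns
> cbind(d1, z = 7:9)  # add columns; rows are matched purely by position
> merge(d1, d2)       # join on the shared "id" column (see all= for keeping unmatched rows)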
>
> Duncan Murdoch
>
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 22 Jun 2007 17:20:01 +0600
> From: "Mahbub Latif" <mlatif en isrt.ac.bd>
> Subject: Re: [R] multiple return
> To: amicogodzilla en bruttocarattere.org
> Cc: r-help en stat.math.ethz.ch
> Message-ID:
>        <5faba43d0706220420m6f34f831l7353b680925dae34 en mail.gmail.com>
> Content-Type: text/plain
>
> one way --
>
>
> somma <- function (a, b) {
>  c <- a+b
>  return (list(a=a, b=b, c=c))
> }
>
> Mahbub.
>
> On 6/22/07, Manuele Pesenti <amicogodzilla en bruttocarattere.org> wrote:
> >
> > Dear User,
> > what's the correct way to obtain a multiple return from a function?
> >
> > for example creating the simple function:
> >
> > somma <- function (a, b) {
> >   c <- a+b
> >   return (a, b, c)
> > }
> >
> > when I call it, it runs but returns the following output:
> >
> > > somma(5, 7)
> > $a
> > [1] 5
> >
> > $b
> > [1] 7
> >
> > $c
> > [1] 12
> >
> > Warning message:
> > return multi-argomento sono deprecati in: return(a, b, c)
> >
> > i.e. multi-return is deprecated...
> >
> > thanks a lot
> > best regards
> >         Manuele
> >
> > --
> > Manuele Pesenti
> >         manuele en inventati.org
> >         amicogodzilla en jabber.linux.it
> >         http://mpesenti.polito.it
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> A H M Mahbub Latif, PhD
> Assistant Professor
> Applied Statistics
> Institute of Statistical Research and Training
> University of Dhaka, Dhaka 1000, Bangladesh
> web : http://www.isrt.ac.bd/mlatif
> ----
> Computers are like air conditioners: They stop working properly if you open
> windows.
>
>        [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 6
> Date: Fri, 22 Jun 2007 21:36:29 +1000
> From: Jim Lemon <jim en bitwrit.com.au>
> Subject: Re: [R] how to create cumulative histogram from two
>        independent     variables?
> To: Jose Borreguero <borreguero en gmail.com>, R-help en stat.math.ethz.ch
> Message-ID: <467BB43D.3060004 en bitwrit.com.au>
> Content-Type: text/plain; charset=us-ascii; format=flowed
>
> Jose Borreguero wrote:
> > Hi all,
> > I am a complete newbie to R. Can anybody jump-start me with any clues as to
> > how I can get a cumulative histogram from two independent variables,
> > cumhist(X,Y) ?
> > -jose
> >
> Hi Jose,
>
> Is this something like what you want?
>
> var1<-sample(1:10,100,TRUE)
> var2<-sample(1:10,100,TRUE)
> barplot(rbind(hist(var1,plot=FALSE)$counts,hist(var2,plot=FALSE)$counts))
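>
> If truly cumulative counts are wanted, one possible variation (a sketch)
> is to run cumsum() over the counts first:
>
> b1<-hist(var1,plot=FALSE)$counts
> b2<-hist(var2,plot=FALSE)$counts
> # assumes hist() chose the same breaks for both variables
> barplot(rbind(cumsum(b1),cumsum(b2)),beside=TRUE)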
>
> Jim
>
>
>
> ------------------------------
>
> Message: 7
> Date: Fri, 22 Jun 2007 07:44:23 -0400 (EDT)
> From: John Kane <jrkrideau en yahoo.ca>
> Subject: Re: [R] Adding exponents (superscript format) to a plot
> To: Judith Flores <juryef en yahoo.com>, RHelp <r-help en stat.math.ethz.ch>
> Message-ID: <466583.57100.qm en web32802.mail.mud.yahoo.com>
> Content-Type: text/plain; charset=iso-8859-1
>
> # Using expression to add superscripts to the labels
>
> vec=c(1,10,100,1000,10000,100000,1000000,10000000)
>  plot(vec,vec,log="xy", axes=F)
>  axis(1, at=10^c(0,2,4,6), labels=expression(1, 10^2,
> 10^4, 10^6))
>  axis(2, at=10^c(0,2,4,6), labels=expression(1, 10^2,
> 10^4, 10^6), las=1)
>  box()
>
> --- Judith Flores <juryef en yahoo.com> wrote:
>
> > Hi,
> >
> >    I need to add exponents to a label in one of the
> > axes of a plot, how can I do this?
> >
> > Thank you,
> >
> > Judith
> >
> >
> >
> >
> ____________________________________________________________________________________
> > Food fight? Enjoy some healthy debate
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> > reproducible code.
> >
>
>
>
> ------------------------------
>
> Message: 8
> Date: Fri, 22 Jun 2007 06:45:26 -0500
> From: "Douglas Bates" <bates en stat.wisc.edu>
> Subject: Re: [R] using lme on multiple datasets in one shot
> To: "maitra en iastate.edu" <maitra en iastate.edu>
> Cc: R-help <r-help en stat.math.ethz.ch>
> Message-ID:
>        <40e66e0b0706220445j5b5d74d7nb46237f83f0b8432 en mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 6/21/07, maitra en iastate.edu <maitra en iastate.edu> wrote:
> > Dear list,
>
> > I would like to do a huge number of lme's using the same design matrix
> > (and fixed and random effects). Is it possible to do this efficiently?
> > Doing otherwise is not an option for my example.
>
> > Basically, I am wanting to do the following which is possible using lm:
>
> > X <- matrix(rnorm(50),10,5)
> > Y <- matrix(rnorm(50),10,5)
> > lm(Y~X)
>
> > with lme. Any suggestions?
>
> It would not be easy to do this.  Neither lme nor lmer were designed
> to make this easy to do.  There is a better chance of accomplishing
> this by creating a custom function based on the current lmer but the
> modifications required are non-trivial.
>
> This is a reasonable thing to want to accomplish and I will add it to
> the "To Do" list for lmer.  However it is not something I will be able
> to get to soon.
>
>
>
> ------------------------------
>
> Message: 9
> Date: Fri, 22 Jun 2007 12:49:38 +0100
> From: "S Ellison" <S.Ellison en lgc.co.uk>
> Subject: Re: [R] abline plots at wrong abscissae after boxplot
> To: <r-help en stat.math.ethz.ch>, <bwilfley en tripleringtech.com>
> Message-ID: <s67bc579.034 en tedmail2.lgc.co.uk>
> Content-Type: text/plain; charset=US-ASCII
>
> Boxplot positions and labels are not the same thing.
> You have groups 'called' "2", "3", "4". As factors - which is what boxplot will turn them into -  they will be treated as arbitrary labels and _numbered_ 1:3 (try as.numeric(factor(x))).
>
> So your lm() used 2:4, but your plot (and abline) uses 1:3 for positions and "2" - "4" as labels.
>
> The best option used to be to plot the boxes at positions 2:4. Look at the at= parameter in boxplot.
> But that is now of little help because there is no way of overriding xlim, leaving you no alternative but to reformulate your model with an offset or something.
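>
> A small sketch of that re-parameterisation (the data here are invented;
> g takes the values 2:4, so box position 1:3 corresponds to g - 1):
>
> g <- rep(2:4, each = 10)
> y <- g + rnorm(30)
> boxplot(y ~ g)              # boxes drawn at 1:3, labelled "2", "3", "4"
> fit <- lm(y ~ g)
> b <- coef(fit)
> abline(b[1] + b[2], b[2])   # line re-expressed in box positions (g - 1)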
>
> I will take up the boxplot xlim issue separately on R-dev; it's not the only such issue.
>
> Steve Ellison.
>
> >>> "Brian Wilfley" <bwilfley en tripleringtech.com> 21/06/2007 22:44:17 >>>
> I'm trying to add lines to a plot originally made with "boxplot", but
> the lines appear in the wrong place.
>
> *******************************************************************
> This email and any attachments are confidential. Any use, co...{{dropped}}
>
>
>
> ------------------------------
>
> Message: 10
> Date: Fri, 22 Jun 2007 13:02:20 +0100
> From: "S Ellison" <S.Ellison en lgc.co.uk>
> Subject: [R] Boxplot issues
> To: <r-help en stat.math.ethz.ch>
> Message-ID: <s67bc871.078 en tedmail2.lgc.co.uk>
> Content-Type: text/plain; charset=US-ASCII
>
> Boxplot and bxp seem to have changed behaviour a bit of late (R 2.4.1). Or maybe I am mis-remembering.
>
> An annoying feature is that while at=3:6 will work, there is no way of overriding the default xlim of 0.5 to n+0.5. That prevents plotting boxes on, for example, interval scales - a useful thing to do at times. I really can see no good reason for bxp to hard-code the xlim=c(0.5, n+0.5) in the function body; it should be a parameter default conditional on horizontal=, not hard coded.
>
> Also, boxplot does not drop empty groups. I'm sure it used to. I know it is good to be able to see where a factor level is unpopulated, but it's a nuisance with fractional factorials and some nested or survey problems when many are known to be missing and are of no interest. Irrespective of whether my memory is correct, the option would be useful. How hard can it be to add a 'drop.empty=F' default to boxplot to allow it to switch?
>
> Obviously, these are things I can fix locally. But who 'owns' boxplot so I can provide suggested code to them for later releases?
>
> Steve Ellison
>
>
>
> *******************************************************************
> This email and any attachments are confidential. Any use, co...{{dropped}}
>
>
>
> ------------------------------
>
> Message: 11
> Date: Fri, 22 Jun 2007 07:35:51 -0500
> From: Frank E Harrell Jr <f.harrell en vanderbilt.edu>
> Subject: Re: [R] help on the use of ldBand
> To: Tomas Goicoa <tomas.goicoa en unavarra.es>
> Cc: r-help en stat.math.ethz.ch
> Message-ID: <467BC227.8000701 en vanderbilt.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Tomas Goicoa wrote:
> >   Hi R Users,
> >
> >   I am trying to use the ldBand package. Together
> > with the package, I have downloaded the ld98
> > program (version for windows) as indicated in the
> > help page on ldBand. I did it, but obtained an
> > error message "Error in (head + 1):length(w) :
> > Argument NA/NaN" when I copied the help examples,
> > so it seems that the connection between R and ld98
> > is not working properly on my computer.  Did I put
> > ld98.exe in the wrong place? If so,  where should
> > I put it? Thanks a lot in advance,
> >
> >
> >   Berta Ibañez Beroiz
>
> Do you mean the Hmisc package?  Do you mean the ldBands function?  Did
> you put ld98.exe in a place that is in your system path as specified in
> the ldBands help file?  And please read the posting guide referenced below.
>
> Frank
>
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> Frank E Harrell Jr   Professor and Chair           School of Medicine
>                      Department of Biostatistics   Vanderbilt University
>
>
>
> ------------------------------
>
> Message: 12
> Date: Fri, 22 Jun 2007 13:27:35 +0200
> From: "ONKELINX, Thierry" <Thierry.ONKELINX en inbo.be>
> Subject: Re: [R] multiple return
> To: <amicogodzilla en bruttocarattere.org>, <r-help en stat.math.ethz.ch>
> Message-ID:
>        <2E9C414912813E4EB981326983E0A104031A9629 en inboexch.inbo.be>
> Content-Type: text/plain;       charset="us-ascii"
>
> Put the return values in a vector or list
>
> somma <- function (a, b) {
>   c <- a+b
>   return (c(a = a, b = b, c = c))
> }
>
> somma(5,7)
>  a  b  c
>  5  7 12
>
>
> somma <- function (a, b) {
>   c <- a+b
>   return (list(a = a, b = b, c = c))
> }
>
> somma(5,7)
> $a
> [1] 5
>
> $b
> [1] 7
>
> $c
> [1] 12
>
> Cheers,
>
> Thierry
> ------------------------------------------------------------------------
> ----
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
> and Forest
> Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
> methodology and quality assurance
> Gaverstraat 4
> 9500 Geraardsbergen
> Belgium
> tel. + 32 54/436 185
> Thierry.Onkelinx en inbo.be
> www.inbo.be
>
> Do not put your faith in what statistics say until you have carefully
> considered what they do not say.  ~William W. Watt
> A statistical analysis, properly conducted, is a delicate dissection of
> uncertainties, a surgery of suppositions. ~M.J.Moroney
>
>
>
> > -----Original Message-----
> > From: r-help-bounces en stat.math.ethz.ch
> > [mailto:r-help-bounces en stat.math.ethz.ch] On Behalf Of Manuele Pesenti
> > Sent: Friday, 22 June 2007 12:38
> > To: r-help en stat.math.ethz.ch
> > Subject: [R] multiple return
> >
> > Dear User,
> > what's the correct way to obtain a multiple return from a function?
> >
> > for example creating the simple function:
> >
> > somma <- function (a, b) {
> >   c <- a+b
> >   return (a, b, c)
> > }
> >
> > when I call it, it runs but returns the following output:
> >
> > > somma(5, 7)
> > $a
> > [1] 5
> >
> > $b
> > [1] 7
> >
> > $c
> > [1] 12
> >
> > Warning message:
> > return multi-argomento sono deprecati in: return(a, b, c)
> >
> > i.e. multi-return is deprecated...
> >
> > thanks a lot
> > best regards
> >       Manuele
> >
> > --
> > Manuele Pesenti
> >       manuele en inventati.org
> >       amicogodzilla en jabber.linux.it
> >       http://mpesenti.polito.it
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> ------------------------------
>
> Message: 13
> Date: Fri, 22 Jun 2007 06:20:27 -0700
> From: Martin Morgan <mtmorgan en fhcrc.org>
> Subject: Re: [R] extract index during execution of sapply
> To: Christian Bieli <christian.bieli en unibas.ch>
> Cc: R help list <r-help en stat.math.ethz.ch>
> Message-ID: <6phodj85dno.fsf en gopher4.fhcrc.org>
> Content-Type: text/plain; charset=us-ascii
>
> Christian,
>
> A favorite of mine is to use lexical scope and a 'factory' model:
>
> > fun_factory <- function() {
> +     i <- 0                  # 'state' variable(s), unique to each fun_factory
> +     function(x) {           # fun_factory return value; used as sapply FUN
> +         i <<- i + 1         # <<- assignment finds i
> +         x^i                 # return value of sapply FUN
> +     }
> + }
> >
> > sapply(1:10, fun_factory())
>  [1]           1           4          27         256        3125       46656
>  [7]      823543    16777216   387420489 10000000000
>
>
> Christian Bieli <christian.bieli en unibas.ch> writes:
>
> > Hi there
> > During execution of sapply I want to extract the number of times the
> > function given to sapply has been executed. I came up with:
> >
> > mylist <- list(a=3,b=6,c=9)
> > sapply(mylist,function(x)as.numeric(gsub("[^0-9]","",deparse(substitute(x)))))
> >
> > This works fine, but looks quite ugly. I'm sure that there's a more
> > elegant way to do this.
> >
> > Any suggestion?
> >
> > Christian
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Martin Morgan
> Bioconductor / Computational Biology
> http://bioconductor.org
>
>
>
> ------------------------------
>
> Message: 14
> Date: Fri, 22 Jun 2007 14:28:07 +0100
> From: Robin Hankin <r.hankin en noc.soton.ac.uk>
> Subject: [R] vectorize a function
> To: RHelp help <r-help en stat.math.ethz.ch>
> Message-ID: <F77DEE3F-E5AA-4A9B-A722-18F7DA006F46 en noc.soton.ac.uk>
> Content-Type: text/plain; charset=US-ASCII; format=flowed
>
> Hello everyone
>
> suppose I have an integer vector "a" of length "n" and
> a symmetric matrix "M" of size n-by-n.
>
> Vector "a" describes a partition of a set of "n" elements
> and matrix M describes a penalty function: row i column
> j represents the penalty if element i and element j
> are in the same partition.
>
> Toy example follows; the real case is much larger
> and I need to evaluate my penalty function many times.
>
> If a <- c(1,1,2,1,3)  then elements 1,2,4 are in the
> same partition; element 3 is in a partition on its own
> and element 5 is in a partition on its own.
>
> The total penalty  can be described by the following (ugly)
> function:
>
> f <- function(a,M){
>   out <- 0
>   for(i in unique(a)){
>     out <- out + sum(M[which(a==i),which(a==i)])
>   }
>   return(out)
> }
>
>
> so with
>
> M <- matrix(rpois(25,3),5,5)
> M <- M+t(M)
> diag(M) <- 0
> a <- c(1,2,1,1,3)
>
> f(a,M) gives the total penalty.
>
>
> QUESTION:  how to rewrite f() so that it has no loop?
>
>
>
>
>
>
> --
> Robin Hankin
> Uncertainty Analyst
> National Oceanography Centre, Southampton
> European Way, Southampton SO14 3ZH, UK
>  tel  023-8059-7743
>
>
>
> ------------------------------
>
> Message: 15
> Date: Fri, 22 Jun 2007 09:56:20 -0400
> From: "Kevin E. Thorpe" <kevin.thorpe en utoronto.ca>
> Subject: Re: [R] Tools For Preparing Data For Analysis
> To: r-help en stat.math.ethz.ch
> Message-ID: <467BD504.4040705 en utoronto.ca>
> Content-Type: text/plain; charset=ISO-8859-1
>
> I am posting to this thread that has been quiet for some time because I
> remembered the following question.
>
> Christophe Pallier wrote:
> > Hi,
> >
> > Can you provide examples of data formats that are problematic to read and
> > clean with R ?
>
> Today I had a data manipulation problem that I don't know how to do in R
> so I solved it with perl.  Since I'm always interested in learning more
> about complex data manipulation in R I am posting my problem in the
> hopes of receiving some hints for doing this in R.
>
> If anyone has nothing better to do than play with other people's data,
> I would be happy to send the raw files off-list.
>
> Background:
>
> I have been given data that contains two measurements of left
> ventricular ejection fraction.  One of the methods is echocardiogram
> which sometimes gives a true quantitative value and other times a
> semi-quantitative value.  The desire is to compare echo with the
> other method (MUGA).  In most cases, patients had either quantitative
> or semi-quantitative.  Some patients had both.  The data came
> to me in excel files with, basically, no patient identifiers to link
> the "both" with the semi-quantitative patients (the "both" patients
> were in multiple data sets).
>
> What I wanted to do was extract from the semi-quantitative data file
> those patients with only semi-quantitative.  All I have to link with
> are the semi-quantitative echo and the MUGA and these pairs of values
> are not unique.
>
> To make this more concrete, here are some portions of the raw data.
>
> "Both"
>
> "ID NUM","ECHO","MUGA","Semiquant","Quant"
> "B",12,37,10,12
> "D",13,13,10,13
> "E",13,26,10,15
> "F",13,31,10,13
> "H",15,15,10,15
> "I",15,21,10,15
> "J",15,22,10,15
> "K",17,22,10,17
> "N",17.5,4,10,17.5
> "P",18,25,10,18
> "R",19,25,10,19
>
> Semi-quantitative
>
> "echo","muga","quant"
> 10,20,0      <-- keep
> 10,20,0      <-- keep
> 10,21,0      <-- remove
> 10,21,0      <-- keep
> 10,24,0      <-- keep
> 10,25,0      <-- remove
> 10,25,0      <-- remove
> 10,25,0      <-- keep
>
> Here is the perl program I wrote for this.
>
> #!/usr/bin/perl
>
> open(BOTH, "quant_qual_echo.csv") || die "Can't open quant_qual_echo.csv";
> # Discard first row;
> $_ = <BOTH>;
> while(<BOTH>) {
>    chomp;
>    ($id, $e, $m, $sq, $qu) = split(/,/);
>    $both{$sq,$m}++;
> }
> close(BOTH);
>
> open(OUT, "> qual_echo_only.csv") || die "Can't open qual_echo_only.csv";
> print OUT "pid,echo,muga,quant\n";
> $pid = 2001;
>
> open(QUAL, "qual_echo.csv") || die "Can't open qual_echo.csv";
> # Discard first row
> $_ = <QUAL>;
> while(<QUAL>) {
>    chomp;
>    ($echo, $muga, $quant) = split(/,/);
>    if ($both{$echo,$muga} > 0) {
>        $both{$echo,$muga}--;
>    }
>    else {
>        print OUT "$pid,$echo,$muga,$quant\n";
>        $pid++;
>    }
> }
> close(QUAL);
> close(OUT);
>
> open(OUT, "> both_echo.csv") || die "Can't open both_echo.csv";
> print OUT "pid,echo,muga,quant\n";
> $pid = 3001;
>
> open(BOTH, "quant_qual_echo.csv") || die "Can't open quant_qual_echo.csv";
> # Discard first row;
> $_ = <BOTH>;
> while(<BOTH>) {
>    chomp;
>    ($id, $e, $m, $sq, $qu) = split(/,/);
>    print OUT "$pid,$sq,$m,0\n";
>    print OUT "$pid,$qu,$m,1\n";
>    $pid++;
> }
> close(BOTH);
> close(OUT);
>
>
> --
> Kevin E. Thorpe
> Biostatistician/Trialist, Knowledge Translation Program
> Assistant Professor, Department of Public Health Sciences
> Faculty of Medicine, University of Toronto
> email: kevin.thorpe en utoronto.ca  Tel: 416.864.5776  Fax: 416.864.6057
>
>
>
> ------------------------------
>
> Message: 16
> Date: Fri, 22 Jun 2007 09:58:39 -0400
> From: "Kuhn, Max" <Max.Kuhn en pfizer.com>
> Subject: Re: [R] Data consistency checks in functions
> To: "Anup Nandialath" <anup_nandialath en yahoo.com>,
>        <r-help en stat.math.ethz.ch>
> Message-ID:
>        <71257D09F114DA4A8E134DEAC70F25D308B9745C en groamrexm03.amer.pfizer.com>
> Content-Type: text/plain;       charset="us-ascii"
>
> Anup,
>
> There are two ways to pass arguments to functions in R: as named
> arguments or by position*.
>
> Users *can* supply arguments that are inconsistent with the order that
> you specify in the function definition, but only if they are used as
> named arguments:
>
>   myfun(X = someMatrix, values = aVector, theta = whatever)
>
> If the arguments are passed by position (as in myfun(beta, val1)), R
> will assume that the first argument is theta, the second is X, etc since
> that is how the function is defined.
>
> My suggestion would be to leave these arguments without defaults and put
> a lot of checks in the function (using is.matrix, is.vector and a few
> that check the content of the data in those objects).
>
>
> * You can also mix the two:
>
>   foo(data, outcome, start = rep(0, 3))
>
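> A rough sketch of what such checks could look like (the body below is
> invented for illustration; missing() is one way to skip X entirely when
> values is supplied):
>
>   myfun <- function(theta, X, values)
>   {
>     if (missing(values)) {
>       if (!is.matrix(X)) stop("X must be a matrix when values is not given")
>       values <- rowSums(X)   # placeholder: however values is derived from X
>     }
>     if (!is.vector(values)) stop("values must be a vector")
>     # ... compute the statistic from theta and values ...
>   }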
>
>
> Max
>
> -----Original Message-----
> From: r-help-bounces en stat.math.ethz.ch
> [mailto:r-help-bounces en stat.math.ethz.ch] On Behalf Of Anup Nandialath
> Sent: Friday, June 22, 2007 12:19 AM
> To: r-help en stat.math.ethz.ch
> Subject: [R] Data consistency checks in functions
>
> Dear friends,
>
> I'm writing a function with three arguments
>
> myfun <- function(theta, X, values)
>
> {
> ....
> ....
> }
>
> in this function, I'm trying to write consistency checks. In order to
> compute the statistic of interest I only need theta and values. The idea
> of having X in there is that, if values is not provided by the user,
> then values is computed from X.
>
> my problem is with writing these consistency checks. For instance, if I
> say
>
> output <- myfun(beta, val1), how do I ensure that R reads this as
> passing arguments to "theta" and "values". In other words is it possible
> to bypass X completely if values is provided. Also how is it possible
> for R to recognize the second argument as being values and not X. This
> is important because X is a matrix and values is a vector. Therefore any
> checks using the dimensions of either one will land in trouble if it
> does not correctly capture that.
>
> Thanks in advance
> Sincerely
>
> Anup
>
>
> ---------------------------------
>
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help en stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> ----------------------------------------------------------------------
> LEGAL NOTICE\ Unless expressly stated otherwise, this messag...{{dropped}}
>
>
>
> ------------------------------
>
> Message: 17
> Date: Fri, 22 Jun 2007 10:17:24 -0400
> From: "Christos Hatzis" <christos en nuverabio.com>
> Subject: Re: [R] vectorize a function
> To: "'Robin Hankin'" <r.hankin en noc.soton.ac.uk>,        "'RHelp help'"
>        <r-help en stat.math.ethz.ch>
> Message-ID:
>        <001b01c7b4d8$0e8129f0$0e010a0a en headquarters.silicoinsights>
> Content-Type: text/plain;       charset="us-ascii"
>
> How about:
>
> sum(sapply(unique(a), function(x) {b <- which(a==x); sum(M[b, b])}))
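>
> An alternative, fully vectorised sketch (same idea: sum every M[i, j]
> for which a[i] == a[j]):
>
> sum(M[outer(a, a, "==")])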
>
> HTH
> -Christos
>
> Christos Hatzis, Ph.D.
> Nuvera Biosciences, Inc.
> 400 West Cummings Park
> Suite 5350
> Woburn, MA 01801
> Tel: 781-938-3830
> www.nuverabio.com
>
>
> > -----Original Message-----
> > From: r-help-bounces en stat.math.ethz.ch
> > [mailto:r-help-bounces en stat.math.ethz.ch] On Behalf Of Robin Hankin
> > Sent: Friday, June 22, 2007 9:28 AM
> > To: RHelp help
> > Subject: [R] vectorize a function
> >
> > Hello everyone
> >
> > suppose I have an integer vector "a" of length "n" and a
> > symmetric matrix "M" of size n-by-n.
> >
> > Vector "a" describes a partition of a set of "n" elements and
> > matrix M describes a penalty function: row i column j
> > represents the penalty if element i and element j are in the
> > same partition.
> >
> > Toy example follows; the real case is much larger and I need
> > to evaluate my penalty function many times.
> >
> > If a <- c(1,1,2,1,3)  then elements 1,2,4 are in the same
> > partition; element 3 is in a partition on its own and element
> > 5 is in a partition on its own.
> >
> > The total penalty  can be described by the following (ugly)
> > function:
> >
> > f <- function(a,M){
> >    out <- 0
> >    for(i in unique(a)){
> >      out <- out + sum(M[which(a==i),which(a==i)])
> >    }
> >    return(out)
> > }
> >
> >
> > so with
> >
> > M <- matrix(rpois(25,3),5,5)
> > M <- M+t(M)
> > diag(M) <- 0
> > a <- c(1,2,1,1,3)
> >
> > f(a,M) gives the total penalty.
> >
> >
> > QUESTION:  how to rewrite f() so that it has no loop?
> >
> >
> >
> >
> >
> >
> > --
> > Robin Hankin
> > Uncertainty Analyst
> > National Oceanography Centre, Southampton European Way,
> > Southampton SO14 3ZH, UK
> >   tel  023-8059-7743
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
>
>
> ------------------------------
>
> Message: 18
> Date: Fri, 22 Jun 2007 13:23:08 +0000 (UTC)
> From: Ben Bolker <bolker en ufl.edu>
> Subject: Re: [R] extract index during execution of sapply
> To: r-help en stat.math.ethz.ch
> Message-ID: <loom.20070622T151700-639 en post.gmane.org>
> Content-Type: text/plain; charset=us-ascii
>
> Christian Bieli <christian.bieli <at> unibas.ch> writes:
>
> >
> > Hi there
> > During execution of sapply I want to extract the number of times the
> > function given to sapply has been executed. I came up with:
> >
> > mylist <- list(a=3,b=6,c=9)
> > sapply(mylist,function(x)as.numeric(gsub("[^0-9]","",deparse(substitute(x)))))
> >
> > This works fine, but looks quite ugly. I'm sure that there's a more
> > elegant way to do this.
> >
> > Any suggestion?
> >
> > Christian
> >
>
>   I would love to have an answer to this -- when I run
> into this kind of problem I usually end up using mapply:
> e.g., suppose I have
>
> mylist <- replicate(5,list(x=runif(10),y=runif(10)),simplify=FALSE)
>
> and I want to plot each element in a different color.  I'd like
> to be able to do
>
> plot(0:1,0:1,type="n")
> lapply(mylist,plot,col=i)
>
> but instead I do
>
> mapply(function(x,i) points(x,col=i),mylist,1:5)
>
> would it be too ugly to have a special variable called INDEX
> that could be used within an sapply/lapply statement?
>
>
>
> ------------------------------
>
> Message: 19
> Date: Fri, 22 Jun 2007 07:45:14 -0700 (PDT)
> From: Thomas Lumley <tlumley en u.washington.edu>
> Subject: Re: [R] extract index during execution of sapply
> To: Ben Bolker <bolker en ufl.edu>
> Cc: r-help en stat.math.ethz.ch
> Message-ID:
>        <Pine.LNX.4.64.0706220732470.20743 en homer22.u.washington.edu>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> On Fri, 22 Jun 2007, Ben Bolker wrote:
>
> > Christian Bieli <christian.bieli <at> unibas.ch> writes:
> >
> >>
> >> Hi there
> >> During execution of sapply I want to extract the number of times the
> >> function given to sapply has been executed. I came up with:
> >>
> >> mylist <- list(a=3,b=6,c=9)
> >> sapply(mylist,function(x)as.numeric(gsub("[^0-9]","",deparse(substitute(x)))))
> >>
> >> This works fine, but looks quite ugly. I'm sure that there's a more
> >> elegant way to do this.
> >>
> >> Any suggestion?
> >>
> >> Christian
> >>
> >
> >   I would love to have an answer to this -- when I run
> > into this kind of problem I usually end up using mapply:
> > e.g., suppose I have
> >
> > mylist <- replicate(5,list(x=runif(10),y=runif(10)),simplify=FALSE)
> >
> > and I want to plot each element in a different color.  I'd like
> > to be able to do
> >
> > plot(0:1,0:1,type="n")
> > lapply(mylist,plot,col=i)
> >
> > but instead I do
> >
> > mapply(function(x,i) points(x,col=i),mylist,1:5)
> >
> > would it be too ugly to have a special variable called INDEX
> > that could be used within an sapply/lapply statement?
> >
>
> There are two distinct suggestions here: a variable that says *how many*
> times the function has been called, and a variable that say *which
> element* is currently being operated on.   The first seems undesirable as
> order of evaluation really should not matter in the apply functions.
>
> The second makes more sense but is still a little tricky. AFAICS there is
> no way for lapply() to find out whether FUN will accept an argument INDEX
> without an "unused argument(s)" error, so it can't just be passed as an
> argument.  This suggests having yet another apply function, that would
> assume an INDEX argument and might be written
>   yapply<-function(X,FUN, ...) {
>        index<-seq(length.out=length(X))
>         mapply(FUN,X,INDEX=index,MoreArgs=list(...))
>        }
>
> However, I think it would be preferable in many cases for INDEX to be
> names(X) if it exists, rather than 1:n.  In any case, it is easy  to write
> the function.
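>
> For example, with the list from the original post (a usage sketch; FUN
> must accept an INDEX argument):
>
>   mylist <- list(a=3,b=6,c=9)
>   yapply(mylist, function(x, INDEX) paste(INDEX, x))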
>
>        -thomas
>
>
>
> ------------------------------
>
> Message: 20
> Date: Fri, 22 Jun 2007 10:49:08 -0400
> From: "Oden, Kevin" <kevin.oden en wachovia.com>
> Subject: [R] fitCopula
> To: <r-help en stat.math.ethz.ch>
> Message-ID:
>        <E3A68C90920A014CBB128279519B1B35042FEB87 en M1WACA0030I001.cibna.msds.wachovia.net>
>
> Content-Type: text/plain
>
> I am using R 2.5.0 on Windows XP and trying to fit a copula.  I see that the
> following code works for some users; however, my code fails in the call
> to chol().   Any suggestions?
>
>
>
> >  mycop <- tCopula(param=0.5, dim=8, dispstr="ex", df=5)
>
> >  x <- rcopula(mycop, 1000)
>
> >  myfit <- fitCopula(x, mycop, c(0.6, 10), optim.control=list(trace=1),
> method="Nelder-Mead")
>
>  Nelder-Mead direct search function minimizer
>
> function value for initial parameters = -1747.582044
>
>  Scaled convergence tolerance is 2.6041e-05
>
> Stepsize computed as 1.000000
>
> Error in chol(x, pivot = FALSE) : the leading minor of order 2 is not
> positive definite
>
>
>
> Kevin D. Oden
>
> e: kevin.oden en wachovia.com <mailto:kevin.oden en wachovia.com>
>
>
>
>
>        [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 21
> Date: Fri, 22 Jun 2007 10:58:35 -0400
> From: "Weiwei Shi" <helprhelp en gmail.com>
> Subject: [R] how to ave this?
> To: "R Help" <R-help en stat.math.ethz.ch>
> Message-ID:
>        <cdf817830706220758r10e93178x971a53e574e9488d en mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi,
>
> I have a list that looks like this:
> [[1]]
>             fc          tt
> 50   0.07526882 0.000000000
> 100  0.09289617 0.000000000
> 150  0.12359551 0.000000000
>
> [[2]]
>             fc          tt
> 50   0.02040816 0.000000000
> 100  0.03626943 0.005025126
> 150  0.05263158 0.010101010
>
> and I am wondering how to "average" it so that I have one matrix t0 at
> the end, and t0[1,1] = (0.075..+0.0204..)/2
>
> Thanks,
>
> --
> Weiwei Shi, Ph.D
> Research Scientist
> GeneGO, Inc.
>
> "Did you always know?"
> "No, I did not. But I believed..."
> ---Matrix III
>
>
>
> ------------------------------
>
> Message: 22
> Date: Fri, 22 Jun 2007 11:04:15 -0400
> From: "Oden, Kevin" <kevin.oden en wachovia.com>
> Subject: [R] fitCopula
> To: <r-help en stat.math.ethz.ch>
> Message-ID:
>        <E3A68C90920A014CBB128279519B1B35042FEB88 en M1WACA0030I001.cibna.msds.wachovia.net>
>
> Content-Type: text/plain
>
>  I am using R 2.5.0 on Windows XP and trying to fit a copula.  I see that the
> following code works for some users; however, my code fails in the call
> to chol().   Any suggestions?
>
>
>
> >  mycop <- tCopula(param=0.5, dim=8, dispstr="ex", df=5)
>
> >  x <- rcopula(mycop, 1000)
>
> >  myfit <- fitCopula(x, mycop, c(0.6, 10), optim.control=list(trace=1),
> method="Nelder-Mead")
>
>  Nelder-Mead direct search function minimizer
>
> function value for initial parameters = -1747.582044
>
>  Scaled convergence tolerance is 2.6041e-05
>
> Stepsize computed as 1.000000
>
> Error in chol(x, pivot = FALSE) : the leading minor of order 2 is not
> positive definite
>
>
>
> Kevin D. Oden
>
> e: kevin.oden en wachovia.com <mailto:kevin.oden en wachovia.com>
>
>
>
>
>        [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 23
> Date: Fri, 22 Jun 2007 11:20:19 -0400
> From: "Weiwei Shi" <helprhelp en gmail.com>
> Subject: Re: [R] how to ave this?
> To: "R Help" <R-help en stat.math.ethz.ch>
> Message-ID:
>        <cdf817830706220820k7db2f82dv3e2a2e7d7a39ff69 en mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> one of my approaches is:
>
> x0 = sapply(mylist, cbind)
>
> and manipulate from x0 (x0[1:(nrow(x0)/2), ] corresponds to fc and the
> lower part is tt).
>
> but it is not a neat way.
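>
> A shorter way to get the element-wise average (a sketch, assuming every
> element of mylist is a matrix with the same dimensions):
>
> t0 <- mylist[[1]]
> for (m in mylist[-1]) t0 <- t0 + m
> t0 <- t0/length(mylist)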
>
>
> On 6/22/07, Weiwei Shi <helprhelp en gmail.com> wrote:
> > Hi,
> >
> > I have a list that looks like this:
> > [[1]]
> >              fc          tt
> > 50   0.07526882 0.000000000
> > 100  0.09289617 0.000000000
> > 150  0.12359551 0.000000000
> >
> > [[2]]
> >              fc          tt
> > 50   0.02040816 0.000000000
> > 100  0.03626943 0.005025126
> > 150  0.05263158 0.010101010
> >
> > and I am wondering how to "average" it so that I have one matrix t0 at
> > the end, and t0[1,1] = (0.075..+0.0204..)/2
> >
> > Thanks,
> >
> > --
> > Weiwei Shi, Ph.D
> > Research Scientist
> > GeneGO, Inc.
> >
> > "Did you always know?"
> > "No, I did not. But I believed..."
> > ---Matrix III
> >
>
>
> --
> Weiwei Shi, Ph.D
> Research Scientist
> GeneGO, Inc.
>
> "Did you always know?"
> "No, I did not. But I believed..."
> ---Matrix III
>
>
>
> ------------------------------
>
> Message: 24
> Date: Fri, 22 Jun 2007 16:26:57 +0100 (BST)
> From: Prof Brian Ripley <ripley en stats.ox.ac.uk>
> Subject: Re: [R] Result depends on order of factors in unbalanced
>        designs (lme, anova)?
> To: Karl Knoblick <karlknoblich en yahoo.de>
> Cc: r-help en stat.math.ethz.ch
> Message-ID: <Pine.LNX.4.64.0706221617190.11817 en gannet.stats.ox.ac.uk>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> 'anova' is rather a misnomer here.  In terms of the description in
> ?anova.lme, you have
>
>      When only one fitted model object is present, a data frame with
>      the sums of squares, numerator degrees of freedom, denominator
>      degrees of freedom, F-values, and P-values for Wald tests for the
>      terms in the model ...
>
> but there are no 'sums of squares' shown.  However, the crucial part of
> that help page is
>
>     type: an optional character string specifying the type of sum of
>           squares to be used in F-tests for the terms in the model. If
>           '"sequential"', the sequential sum of squares obtained by
>           including the terms in the order they appear in the model is
>           used; else, if '"marginal"', the marginal sum of squares
>           obtained by deleting a term from the model at a time is used.
>           This argument is only used when a single fitted object is
>           passed to the function. Partial matching of arguments is
>           used, so only the first character needs to be provided.
>           Defaults to '"sequential"'.
>
> so these are sequential fits (just like anova for an lm fit), and yes,
> sequential fits do in general depend on the sequence of terms.
>
> The issues of interpretation are exactly those of unbalanced linear
> models, and you will find advice on that in many places, e.g. in MASS.
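>
> Concretely, using the type argument quoted above (a sketch based on the
> fits in the original post), marginal tests should not depend on the
> ordering of the terms:
>
> anova(fit1, type = "marginal")
> anova(fit2, type = "marginal")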
>
>
> On Thu, 21 Jun 2007, Karl Knoblick wrote:
>
> > Dear R-Community!
> >
> > For example I have a study with 4 treatment groups (10 subjects per group) and 4 visits. Additionally, the gender is taken into account. I think - and hope this is a good idea (!) - these data can be analysed using lme as below.
> >
> > In a balanced design everything is fine, but in an unbalanced design there are differences depending on fitting y~visit*treat*gender or y~gender*visit*treat - at least with anova (see example). Does this make sense? Which ordering might be the correct one?
> >
> > Here the example script:
> > library(nlme)
> > set.seed(123)
> > # Random generation of data:
> > NSubj<-40 # No. of subjects
> > set.seed(1234)
> > id<-factor(rep(c(1:NSubj),4)) # ID of subjects
> > treat<-factor(rep(rep(1:4,each=5),4)) # Treatment 4 Levels
> > gender<-factor(rep(rep(1:2, each=20),4))
> > visit<-factor(rep(1:4, each=NSubj))
> > y<-runif(4*NSubj) # Results
> > # Add effects
> > y<-y+0.01*as.integer(visit)
> > y<-y+0.02*as.integer(gender)
> > y<-y+0.024*as.integer(treat)
> > df<-data.frame(id, treat, gender, visit, y)
> > # groupedData object for lme
> > gdat<-groupedData(y ~ visit|id, data=df)
> > # fits - different ordering of factors
> > fit1<-lme(y ~ visit*treat*gender, data=gdat, random = ~visit|id)
> > anova(fit1)
> > fit2<-lme(y ~ gender*treat*visit, data=gdat, random = ~visit|id)
> > anova(fit2)
> > # Result: identical (balanced design so far), ok
> > # Now change gender of subject 1
> > gdat$gender[c(1,41,81,121)]<-2
> > # once more, fits with different ordering of factors
> > fit1<-lme(y ~ visit*treat*gender, data=gdat, random = ~visit|id)
> > anova(fit1)
> > fit2<-lme(y ~ gender*treat*visit, data=gdat, random = ~visit|id)
> > anova(fit2)
> > # Result: There are differences!!
> >
> > I hope somebody can help or give me advice on how to interpret these results correctly, or how to avoid this problem! Is there a better way to analyse these data than lme?
> >
> > Thanks!
> > Karl
>
> --
> Brian D. Ripley,                  ripley en stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
>
>
> ------------------------------
>
> Message: 25
> Date: Fri, 22 Jun 2007 17:27:42 +0200
> From: "Christophe Pallier" <christophe en pallier.org>
> Subject: Re: [R] Tools For Preparing Data For Analysis
> To: "Kevin E. Thorpe" <kevin.thorpe en utoronto.ca>
> Cc: r-help en stat.math.ethz.ch
> Message-ID:
>        <dea6cb960706220827y4c281c31t493106eeaffedd4b en mail.gmail.com>
> Content-Type: text/plain
>
> If I understand correctly (from your Perl script)
>
> 1. you count the number of occurrences of each "(echo, muga)" pair in the
> first file.
>
> 2. you remove from the second file the lines that correspond to these
> occurrences.
>
> If this is indeed your aim, here's a solution in R:
>
> cumcount <- function(x) {
>  y <- numeric(length(x))
>  for (i in 1:length(y)) {
>     y[i] = sum(x[1:i] == x[i])
>  }
>  y
> }
>
> both <- read.csv('both_echo.csv')
> v <- table(paste(both$echo, "_", both$muga, sep=""))
>
> semi <- read.csv('qual_echo.csv')
> s <- paste(semi$echo, "_", semi$muga, sep="")
> cs = cumcount(s)
> count = v[s]
> count[is.na(count)]=0
>
> semi2 <- data.frame(semi, s, cs, count, keep = cs > count)
>
> > semi2
>  echo muga quant     s cs count  keep
> 1   10   20     0 10_20  1     0  TRUE
> 2   10   20     0 10_20  2     0  TRUE
> 3   10   21     0 10_21  1     1 FALSE
> 4   10   21     0 10_21  2     1  TRUE
> 5   10   24     0 10_24  1     0  TRUE
> 6   10   25     0 10_25  1     2 FALSE
> 7   10   25     0 10_25  2     2 FALSE
> 8   10   25     0 10_25  3     2  TRUE
>
>
> My code is not very readable...
> Yet, the 'trick' of using a helper function like 'cumcount' might be
> instructive.
>
> Christophe Pallier
>
>
> On 6/22/07, Kevin E. Thorpe <kevin.thorpe en utoronto.ca> wrote:
> >
> > I am posting to this thread that has been quiet for some time because I
> > remembered the following question.
> >
> > Christophe Pallier wrote:
> > > Hi,
> > >
> > > Can you provide examples of data formats that are problematic to read
> > and
> > > clean with R ?
> >
> > Today I had a data manipulation problem that I don't know how to do in R
> > so I solved it with perl.  Since I'm always interested in learning more
> > about complex data manipulation in R I am posting my problem in the
> > hopes of receiving some hints for doing this in R.
> >
> > If anyone has nothing better to do than play with other people's data,
> > I would be happy to send the raw files off-list.
> >
> > Background:
> >
> > I have been given data that contains two measurements of left
> > ventricular ejection fraction.  One of the methods is echocardiogram
> > which sometimes gives a true quantitative value and other times a
> > semi-quantitative value.  The desire is to compare echo with the
> > other method (MUGA).  In most cases, patients had either quantitative
> > or semi-quantitative.  Some patients had both.  The data came
> > to me in excel files with, basically, no patient identifiers to link
> > the "both" with the semi-quantitative patients (the "both" patients
> > were in multiple data sets).
> >
> > What I wanted to do was extract from the semi-quantitative data file
> > those patients with only semi-quantitative.  All I have to link with
> > are the semi-quantitative echo and the MUGA and these pairs of values
> > are not unique.
> >
> > To make this more concrete, here are some portions of the raw data.
> >
> > "Both"
> >
> > "ID NUM","ECHO","MUGA","Semiquant","Quant"
> > "B",12,37,10,12
> > "D",13,13,10,13
> > "E",13,26,10,15
> > "F",13,31,10,13
> > "H",15,15,10,15
> > "I",15,21,10,15
> > "J",15,22,10,15
> > "K",17,22,10,17
> > "N",17.5,4,10,17.5
> > "P",18,25,10,18
> > "R",19,25,10,19
> >
> > Semi-quantitative
> >
> > "echo","muga","quant"
> > 10,20,0      <-- keep
> > 10,20,0      <-- keep
> > 10,21,0      <-- remove
> > 10,21,0      <-- keep
> > 10,24,0      <-- keep
> > 10,25,0      <-- remove
> > 10,25,0      <-- remove
> > 10,25,0      <-- keep
> >
> > Here is the perl program I wrote for this.
> >
> > #!/usr/bin/perl
> >
> > open(BOTH, "quant_qual_echo.csv") || die "Can't open quant_qual_echo.csv";
> > # Discard first row;
> > $_ = <BOTH>;
> > while(<BOTH>) {
> >     chomp;
> >     ($id, $e, $m, $sq, $qu) = split(/,/);
> >     $both{$sq,$m}++;
> > }
> > close(BOTH);
> >
> > open(OUT, "> qual_echo_only.csv") || die "Can't open qual_echo_only.csv";
> > print OUT "pid,echo,muga,quant\n";
> > $pid = 2001;
> >
> > open(QUAL, "qual_echo.csv") || die "Can't open qual_echo.csv";
> > # Discard first row
> > $_ = <QUAL>;
> > while(<QUAL>) {
> >     chomp;
> >     ($echo, $muga, $quant) = split(/,/);
> >     if ($both{$echo,$muga} > 0) {
> >         $both{$echo,$muga}--;
> >     }
> >     else {
> >         print OUT "$pid,$echo,$muga,$quant\n";
> >         $pid++;
> >     }
> > }
> > close(QUAL);
> > close(OUT);
> >
> > open(OUT, "> both_echo.csv") || die "Can't open both_echo.csv";
> > print OUT "pid,echo,muga,quant\n";
> > $pid = 3001;
> >
> > open(BOTH, "quant_qual_echo.csv") || die "Can't open quant_qual_echo.csv";
> > # Discard first row;
> > $_ = <BOTH>;
> > while(<BOTH>) {
> >     chomp;
> >     ($id, $e, $m, $sq, $qu) = split(/,/);
> >     print OUT "$pid,$sq,$m,0\n";
> >     print OUT "$pid,$qu,$m,1\n";
> >     $pid++;
> > }
> > close(BOTH);
> > close(OUT);
> >
> >
> > --
> > Kevin E. Thorpe
> > Biostatistician/Trialist, Knowledge Translation Program
> > Assistant Professor, Department of Public Health Sciences
> > Faculty of Medicine, University of Toronto
> > email: kevin.thorpe en utoronto.ca  Tel: 416.864.5776  Fax: 416.864.6057
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Christophe Pallier (http://www.pallier.org)
>
>        [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 26
> Date: Fri, 22 Jun 2007 16:30:44 +0100
> From: Simon Wood <s.wood en bath.ac.uk>
> Subject: Re: [R] interpretation of F-statistics in GAMs
> To: r-help en stat.math.ethz.ch
> Message-ID: <200706221630.44317.s.wood en bath.ac.uk>
> Content-Type: text/plain;  charset="iso-8859-1"
>
> On Friday 15 June 2007 08:06, robert.ptacnik en niva.no wrote:
> > dear listers,
> > I use gam (from mgcv) for evaluation of shape and strength of relationships
> > between a response variable and several predictors.
> > How can I interpret the 'F' values given in the GAM summary? Is it
> > appropriate to treat them in a similar manner as the T-statistics in a
> > linear model, i.e. larger values mean that this variable has a stronger
> > impact than a variable with smaller F?
> - I'd be a bit cautious about this (even for T-statistics and linear models
> it's not quite clear to me what `impact' means if judged this way). These gam
> F statistics are only meant to provide a rough and ready means of judging
> approximate significance of terms, and I'm unsure about interpreting a
> comparison of such F ratios: for example the F statistics can be based on
> different numbers of degrees of freedom, depending on the term concerned...
>
> > When I run my analysis for two different response variables (but identical
> > predictors), is there a way to compare the F values among tests (like to
> > standardize them by the sum of F within each test?) I append two summaries
> > below.
> - Again, I don't really know how this would work. I'd be more inclined to
> compare the plotted terms and associated CIs (and maybe the p-values),
> especially if you are using GAMs in a quite exploratory way (e.g. if the
> assumption of an additive structure is really a convenience, rather than
> being something that is suggested by the underlying science).
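>
> For instance, the smooths and their confidence bands can be inspected
> side by side (a sketch; m1 and m2 stand for the two fitted gam objects):
>
> plot(m1, pages = 1)   # all smooth terms of model 1 on one page, with CIs
> plot(m2, pages = 1)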
>
> best,
> Simon
>
> >
> >
> > ### example 1 ###
> >
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > dep[sel, i] ~ s(date, k = 3) + s(depth, k = kn) + s(temp, k = kn) +
> >     s(light, k = kn) + s(PO4, k = kn) + s(DIN, k = kn) + s(prop.agpla,
> >     k = kn)
> >
> > Parametric coefficients:
> >             Estimate Std. Error t value Pr(>|t|)
> > (Intercept)   5.1048     0.0384   132.9   <2e-16 ***
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > Approximate significance of smooth terms:
> >                 edf Est.rank      F  p-value
> > s(date)       1.669        2 12.161 1.07e-05 ***
> > s(depth)      1.671        2 36.125 4.85e-14 ***
> > s(temp)       1.927        2  6.686  0.00156 **
> > s(light)      1.886        2 12.604 7.20e-06 ***
> > s(PO4)        1.676        2  3.237  0.04143 *
> > s(DIN)        1.000        1 38.428 3.41e-09 ***
> > s(prop.agpla) 1.405        2 15.987 3.79e-07 ***
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > R-sq.(adj) =  0.687   Deviance explained = 70.5%
> > GCV score = 0.31995   Scale est. = 0.30076   n = 204
> >
> > ### example 2 ###
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > dep[sel, i] ~ s(date, k = 3) + s(depth, k = kn) + s(temp, k = kn) +
> >     s(light, k = kn) + s(PO4, k = kn) + s(DIN, k = kn) + s(prop.agpla,
> >     k = kn)
> >
> > Parametric coefficients:
> >             Estimate Std. Error t value Pr(>|t|)
> > (Intercept)  7.13588    0.05549   128.6   <2e-16 ***
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > Approximate significance of smooth terms:
> >                 edf Est.rank      F  p-value
> > s(date)       1.944        2 15.997 3.67e-07 ***
> > s(depth)      1.876        2 25.427 1.52e-10 ***
> > s(temp)       1.000        1  2.866   0.0921 .
> > s(light)      1.751        2  4.212   0.0162 *
> > s(PO4)        1.950        2 10.632 4.14e-05 ***
> > s(DIN)        1.805        2 10.745 3.73e-05 ***
> > s(prop.agpla) 1.715        2  2.674   0.0715 .
> > ---
> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> >  R-sq.(adj) =  0.479   Deviance explained = 50.9%
> > GCV score = 0.6863   Scale est. = 0.64348   n = 209
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html and provide commented, minimal,
> > self-contained, reproducible code.
>
> --
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283
>
>
>
> ------------------------------
>
> Message: 27
> Date: Fri, 22 Jun 2007 17:53:01 +0200
> From: Martin Maechler <maechler en stat.math.ethz.ch>
> Subject: Re: [R] Boxplot issues
> To: "S Ellison" <S.Ellison en lgc.co.uk>
> Cc: r-help en stat.math.ethz.ch
> Message-ID: <18043.61533.242252.779598 en stat.math.ethz.ch>
> Content-Type: text/plain; charset=us-ascii
>
> >>>>> "SE" == S Ellison <S.Ellison en lgc.co.uk>
> >>>>>     on Fri, 22 Jun 2007 13:02:20 +0100 writes:
>
>    SE> Boxplot and bxp seem to have changed behaviour a bit of late (R 2.4.1). Or maybe I am mis-remembering.
>    SE> An annoying feature is that while at=3:6 will work, there is no way of overriding the default xlim of 0.5 to n+0.5. That prevents plotting boxes on, for example, interval scales - a useful thing to do at times. I really can see no good reason for bxp to hard-code the xlim=c(0.5, n+0.5) in the function body; it should be a parameter default conditional on horizontal=, not hard coded.
>
>    SE> Also, boxplot does not drop empty groups. I'm sure it used to. I know it is good to be able to see where a factor level is unpopulated, but it's a nuisance with fractional factorials and some nested or survey problems when many are known to be missing and are of no interest. Irrespective of whether my memory is correct, the option would be useful. How hard can it be to add a 'drop.empty=F' default to boxplot to allow it to switch?
>
>    SE> Obviously, these are things I can fix locally. But who 'owns' boxplot so I can provide suggested code to them for later releases?
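> A rough sketch of the empty-group workaround in the meantime (assuming 'y' is
> the response and 'g' the grouping factor; refitting the factor drops its
> unused levels):
>
>    g2 <- factor(g)      # drops the empty levels of g
>    boxplot(y ~ g2)      # only the populated groups are plotted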
>
>
> Legally speaking, I think that's a hard question, and the answer
> may even depend on the country in which it is asked.
> I would like to say it is owned by the R Foundation.
>
> Suggested improvements of the R "base code" should be made and
> discussed on the R-devel mailing list. That's exactly the main
> purpose of that list.
> Such proposals typically make it into the code base
> if they are convincing and you provide code improvements that
> persuade at least one member of R Core that the changes are worth
> the time to implement, document, *and* test.
>
> Also, as on R-help, it helps to work with small reproducible
> (ideally "cut-n-pastable") R code examples.
>
> Regards,
> Martin Maechler
>
>    SE> Steve Ellison
>
>
>
> ------------------------------
>
> Message: 28
> Date: Fri, 22 Jun 2007 12:19:35 -0400
> From: Sébastien <pomchip en free.fr>
> Subject: Re: [R] Overlaying lattice graphs (continued)
> To: Deepayan Sarkar <deepayan.sarkar en gmail.com>
> Cc: R-help <r-help en stat.math.ethz.ch>
> Message-ID: <467BF697.8050203 en free.fr>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Hi Deepayan,
>
> The following code creates a dummy dataset with a structure similar to that
> of my usual datasets. I did not try to implement the changes proposed by
> Hadley, hoping that a solution can be found using the original dataset.
>
> ######### My code
>
> # Creating dataset
>
> library(lattice)
>
> nPts <- 10       # number of time points
> nInd <- 6        # number of individuals
> nModel <- 3      # number of models
>
> TimePts <- rep(1:nPts, nInd*nModel)                        # creates the "Time" column
> Coef <- rep(rnorm(nInd, 0.1, 0.01), nModel, each = nPts)   # coefficients for generating the observations
> Obs <- 10*exp(-Coef*TimePts)                               # creates the observations
>
> Pred <- numeric(nPts*nInd*nModel)                          # creates the predicted values
> for (i in 1:60){
> Pred[i] <- jitter(10*exp(-Coef[i]*TimePts[i]))
> Pred[i+60] <- jitter(5)
> Pred[i+120] <- jitter(10 - Coef[i+120]*TimePts[i])
> }
>
> colPlot <- rep(1, nPts*nInd*nModel)                          # creates the "Plot" column
> colModel <- gl(nModel, nPts*nInd, labels = c("A","B","C"))   # creates the "Model" column
> colID <- gl(nInd, nPts, nPts*nInd*nModel)                    # creates the "ID" column
>
> mydata <- data.frame(colPlot, colModel, colID, TimePts, Obs, Pred)   # creates the dataset
> names(mydata) <- c("Plot","Model","Individuals","Time","Observed","Predicted")
>
> # Plotting as indicated by Deepayan
>
> xyplot(Observed + Predicted ~ Time | Individuals + Model,
>      data = mydata,
>      panel = panel.superpose.2, type = c("p", "l"),
>      layout = c(0, nlevels(mydata$Individuals))) #,
>      #<...>)
>
> ####### End of code
>
> This code is not exactly what I am looking for, although it is pretty
> close. In the present case, I would like to have a Trellis plot with 6
> panels (one for each individual), where the Observations and the
> Predicted are plotted as symbols and lines, respectively. All three
> models should be plotted on the same panel. Unfortunately, it looks to
> me as if 3 successive xyplots are created by the code above but only the
> last one remains displayed. I tried to play with
> panel.superpose, panel.superpose.2 and type, without much success.
>
> I also tried the following code, which creates 18 panels and distinguishes
> all (Individuals, Model) pairs... so, not what I want.
>
> xyplot(Observed + Predicted ~ Time | Individuals+Model, data = mydata,
>     type = c("p", "l"), distribute.type = TRUE)
>
> Sebastien
>
>
> Deepayan Sarkar wrote:
> > On 6/21/07, Sébastien <pomchip en free.fr> wrote:
> >> Hi Hadley,
> >>
> >> Hopefully, my dataset won't be too hard to changed. Can I modify the
> >> aspect of each group using your code (symbols for observed and lines for
> >> predicted)?
> >>
> >> Sebastien
> >>
> >> hadley wickham wrote:
> >> > Hi Sebastian,
> >> >
> >> > I think you need to rearrange your data a bit.  Firstly, you need to
> >> > put observed on the same footing as the different models, so you would
> >> > have a new column in your data called value (previously observed and
> >> > predicted) and a new model type ("observed").  Then you could do:
> >
> > Yes, and ?make.groups (and reshape of course) could help with that.
> > This might not be strictly necessary though.
> >
> > However, I'm finding your pseudo-code confusing. Could you create a
> > small example data set that can be used to try out some real code?
> > Just from your description, I would have suggested something like
> >
> > xyplot(Observed + Predicted ~ Time | Individuals + Model,
> >       data = mydata,
> >       panel = panel.superpose.2, type = c("p", "l"),
> >       layout = c(0, nlevels(mydata$Individuals)),
> >       <...>)
> >
> > If all you want is to plot one page at a time, there are easier ways
> > to do that.
> >
> > -Deepayan
> >
> >> >
> >> > xyplot(value ~ time | individuals, data=mydata, group=model)
> >> >
> >> > Hadley
> >> >
> >> >
> >> > On 6/21/07, Sébastien <pomchip en free.fr> wrote:
> >> >> Dear R Users,
> >> >>
> >> >> I recently posted an email on this list  about the use of
> >> data.frame and
> >> >> overlaying multiple plots. Deepayan kindly indicated to me the
> >> >> panel.superposition command which worked perfectly in the context
> >> of the
> >> >> example I gave.
> >> >> I'd like to go a little bit further on this topic using a more
> >> complex
> >> >> dataset structure (actually the one I want to work on).
> >> >>
> >> >>  >mydata
> >> >>    Plot Model Individuals Time Observed Predicted
> >> >> 1     1     A           1 0.05       10      10.2
> >> >> 2     1     A           1 0.10       20      19.5
> >> >> etc...
> >> >> 10    1     B           1 0.05       10       9.8
> >> >> 11    1     B           1 0.10       20      20.2
> >> >> etc...
> >> >>
> >> >> There are p "levels" in mydata$Plot, m in mydata$Model, n in
> >> >> mydata$Individuals and t in mydata$Time (Note that I probably use the
> >> >> word levels improperly as all columns are not factors). Basically,
> >> this
> >> >> dataset summarizes the t measurements obtained in n individuals as
> >> well
> >> >> as the predicted values from m different modeling approaches
> >> (applied to
> >> >> all individuals). Therefore, the observations are repeated m times in
> >> >> the Observed column, while the predictions appear only once for a
> >> >> given model and a given individual.
> >> >>
> >> >> What I want to write is an R batch file creating a Trellis graph,
> >> where
> >> >> each panel corresponds to one individual and contains the
> >> observations
> >> >> (as scatterplot) plus the predicted values for all models (as
> >> lines of
> >> >> different colors)... $Plot is just a token: it might be used to not
> >> >> overload graphs in case there are too many tested models. The fun
> >> part
> >> >> is that the values of p, m, n and t might vary from one dataset to
> >> the
> >> >> other, so everything has to be coded dynamically.
> >> >>
> >> >> For the plotting part I was thinking about having a loop in my code
> >> >> containing something like that:
> >> >>
> >> >> for (i in 1:nlevels(mydata$Model)) {
> >> >>
> >> >> subdata <- subset(mydata, mydata$Model == levels(mydata$Model)[i])
> >> >> xyplot(Observed + Predicted ~ Time | Individuals, data =
> >> >> subdata)       # plus additional formatting code
> >> >>
> >> >> }
> >> >>
> >> >> Unfortunately, this code simply creates a new Trellis plot instead of
> >> >> adding the models one by one to the panels. Any idea or link to a
> >> >> useful command will be welcome.
> >> >>
> >> >> Sebastien
> >> >>
> >> >> ______________________________________________
> >> >> R-help en stat.math.ethz.ch mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >> >
> >> >
> >>
> >> ______________________________________________
> >> R-help en stat.math.ethz.ch mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
>
>
>
> ------------------------------
>
> Message: 29
> Date: Fri, 22 Jun 2007 19:26:56 +0200
> From: "Jose Quesada " <quesada en gmail.com>
> Subject: [R] Matrix library, CHOLMOD error: problem too large
> To: "r-help en lists.r-project.org" <r-help en stat.math.ethz.ch>
> Message-ID: <op.tub2q6d64hcap5 en delllap.ugr.es>
> Content-Type: text/plain; format=flowed; delsp=yes;
>        charset=iso-8859-15
>
>
> I have a pretty large sparse matrix of integers:
> > dim(tasa)
> [1] 91650 37651
>
> I need to add one to it in order to take logs, but I'm getting the
> following error:
>
> > tasa  = log(tasa + 1)
> CHOLMOD error: problem too large
> Error in asMethod(object) : Cholmod error `problem too large'
>
> I have 2 GB of RAM, and the current workspace is barely 300 MB.
> Is there any workaround for this? Does anyone have any experience with this error?
>
> Thanks,
> -Jose
>
> --
> Jose Quesada, PhD.
> http://www.andrew.cmu.edu/~jquesada
>
>
>
> ------------------------------
>
> Message: 30
> Date: Fri, 22 Jun 2007 14:04:03 -0400
> From: Duncan Murdoch <murdoch en stats.uwo.ca>
> Subject: Re: [R] Matrix library, CHOLMOD error: problem too large
> To: Jose Quesada <quesada en gmail.com>
> Cc: "r-help en lists.r-project.org" <r-help en stat.math.ethz.ch>
> Message-ID: <467C0F13.4060004 en stats.uwo.ca>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 6/22/2007 1:26 PM, Jose Quesada wrote:
> > I have a pretty large sparse matrix of integers:
> >> dim(tasa)
> > [1] 91650 37651
> >
> > I need to add one to it in order to take logs, but I'm getting the
> > following error:
> >
> >> tasa  = log(tasa + 1)
> > CHOLMOD error: problem too large
> > Error in asMethod(object) : Cholmod error `problem too large'
> >
> > I have 2 Gb of RAM, and the current workspace is barely 300mb.
> > Is there any workaround to this? Anyone has any experience with this error?
> >
>
> If tasa is sparse, then tasa+1 will not be sparse, so that's likely your
> problem.  You might have better luck with
>
> log1p(tasa)
>
> if the authors of the Matrix package have written a method for log1p();
> if not, you'll probably have to do it yourself.
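> A rough sketch of the do-it-yourself route (assuming 'tasa' is stored as a
> dgCMatrix from the Matrix package; shown here on a small made-up matrix):
> since log1p(0) == 0, it is enough to transform only the stored non-zero
> entries, which keeps the result sparse.
>
> library(Matrix)
> m <- Matrix(c(0, 3, 0, 0, 0, 7, 2, 0, 0), 3, 3, sparse = TRUE)  # toy example
> m@x <- log1p(m@x)   # log(x + 1) applied to the non-zeros only; zeros stay zero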
>
> Duncan Murdoch
>
>
>
> ------------------------------
>
> Message: 31
> Date: Fri, 22 Jun 2007 11:44:52 -0700
> From: "Horace Tso" <Horace.Tso en pgn.com>
> Subject: [R] Imputing missing values in time series
> To: <r-help en stat.math.ethz.ch>
> Message-ID: <467BB6340200006500006924 en pgn.com>
> Content-Type: text/plain; charset=ISO-8859-15
>
> Folks,
>
> This must be a rather common problem with real life time series data
> but I don't see anything in the archive about how to deal with it. I
> have a time series of natural gas prices by flow date. Since gas is not
> traded on weekends and holidays, I have a lot of missing values,
>
> FDate   Price
> 11/1/2006       6.28
> 11/2/2006       6.58
> 11/3/2006       6.586
> 11/4/2006       6.716
> 11/5/2006       NA
> 11/6/2006       NA
> 11/7/2006       6.262
> 11/8/2006       6.27
> 11/9/2006       6.696
> 11/10/2006      6.729
> 11/11/2006      6.487
> 11/12/2006      NA
> 11/13/2006      NA
> 11/14/2006      6.725
> 11/15/2006      6.844
> 11/16/2006      6.907
>
> What I would like to do is to fill the NAs with the price from the
> previous date (gas used during holidays is purchased from the week
> before). Though really simple, I wonder if there is a function to perform
> this task. Some of the imputation functions I'm aware of (eg. impute,
> transcan in Hmisc) seem to deal with completely different problems.
>
> 2.5.0/Windows XP
>
> Thanks in advance.
>
> HT
>
>
>
> ------------------------------
>
> Message: 32
> Date: Fri, 22 Jun 2007 14:01:52 -0500
> From: Erik Iverson <iverson en biostat.wisc.edu>
> Subject: Re: [R] Imputing missing values in time series
> To: Horace Tso <Horace.Tso en pgn.com>
> Cc: r-help en stat.math.ethz.ch
> Message-ID: <467C1CA0.1080606 en biostat.wisc.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> I think my example should work for you, but I couldn't think of a way to
> do this without an iterative while loop.
>
> test <- c(1,2,3,NA,4,NA,NA,5,NA,6,7,NA)
>
> while(any(is.na(test)))
> test[is.na(test)] <- test[which(is.na(test))-1]
>
>  test
>  [1] 1 2 3 3 4 4 4 5 5 6 7 7
>
> Horace Tso wrote:
> > Folks,
> >
> > This must be a rather common problem with real life time series data
> > but I don't see anything in the archive about how to deal with it. I
> > have a time series of natural gas prices by flow date. Since gas is not
> > traded on weekends and holidays, I have a lot of missing values,
> >
> > FDate Price
> > 11/1/2006     6.28
> > 11/2/2006     6.58
> > 11/3/2006     6.586
> > 11/4/2006     6.716
> > 11/5/2006     NA
> > 11/6/2006     NA
> > 11/7/2006     6.262
> > 11/8/2006     6.27
> > 11/9/2006     6.696
> > 11/10/2006    6.729
> > 11/11/2006    6.487
> > 11/12/2006    NA
> > 11/13/2006    NA
> > 11/14/2006    6.725
> > 11/15/2006    6.844
> > 11/16/2006    6.907
> >
> > What I would like to do is to fill the NAs with the price from the
> > previous date * gas used during holidays is purchased from the week
> > before. Though real simple, I wonder if there is a function to perform
> > this task. Some of the imputation functions I'm aware of (eg. impute,
> > transcan in Hmisc) seem to deal with completely different problems.
> >
> > 2.5.0/Windows XP
> >
> > Thanks in advance.
> >
> > HT
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> ------------------------------
>
> Message: 33
> Date: Fri, 22 Jun 2007 12:08:26 -0700
> From: "Horace Tso" <Horace.Tso en pgn.com>
> Subject: Re: [R] Imputing missing values in time series
> To: "Erik Iverson" <iverson en biostat.wisc.edu>
> Cc: r-help en stat.math.ethz.ch
> Message-ID: <467BBBBA0200006500006928 en pgn.com>
> Content-Type: text/plain; charset=US-ASCII
>
> Erik, indeed it gets the work done. I was hoping to avoid the dreaded looping, though.....
>
> Thanks.
>
> Horace
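> A loop-free sketch of the same fill, for what it's worth (using the 'test'
> vector from Erik's example): index the observed values by a running count of
> the non-NA entries seen so far.
>
> idx <- cumsum(!is.na(test))                     # non-NA values seen so far
> filled <- c(NA, test[!is.na(test)])[idx + 1]    # a leading NA would stay NA
> filled
>  [1] 1 2 3 3 4 4 4 5 5 6 7 7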
>
> >>> Erik Iverson <iverson en biostat.wisc.edu> 6/22/2007 12:01 PM >>>
> I think my example should work for you, but I couldn't think of a way to
> do this without an interative while loop.
>
> test <- c(1,2,3,NA,4,NA,NA,5,NA,6,7,NA)
>
> while(any(is.na(test)))
> test[is.na(test)] <- test[which(is.na(test))-1]
>
>  test
>  [1] 1 2 3 3 4 4 4 5 5 6 7 7
>
> Horace Tso wrote:
> > Folks,
> >
> > This must be a rather common problem with real life time series data
> > but I don't see anything in the archive about how to deal with it. I
> > have a time series of natural gas prices by flow date. Since gas is not
> > traded on weekends and holidays, I have a lot of missing values,
> >
> > FDate Price
> > 11/1/2006     6.28
> > 11/2/2006     6.58
> > 11/3/2006     6.586
> > 11/4/2006     6.716
> > 11/5/2006     NA
> > 11/6/2006     NA
> > 11/7/2006     6.262
> > 11/8/2006     6.27
> > 11/9/2006     6.696
> > 11/10/2006    6.729
> > 11/11/2006    6.487
> > 11/12/2006    NA
> > 11/13/2006    NA
> > 11/14/2006    6.725
> > 11/15/2006    6.844
> > 11/16/2006    6.907
> >
> > What I would like to do is to fill the NAs with the price from the
> > previous date * gas used during holidays is purchased from the week
> > before. Though real simple, I wonder if there is a function to perform
> > this task. Some of the imputation functions I'm aware of (eg. impute,
> > transcan in Hmisc) seem to deal with completely different problems.
> >
> > 2.5.0/Windows XP
> >
> > Thanks in advance.
> >
> > HT
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> ------------------------------
>
> Message: 34
> Date: Fri, 22 Jun 2007 21:13:09 +0200
> From: "hadley wickham" <h.wickham en gmail.com>
> Subject: Re: [R] Switching X-axis and Y-axis for histogram
> To: "Donghui Feng" <donghui.feng en gmail.com>
> Cc: r-help en stat.math.ethz.ch
> Message-ID:
>        <f8e6ff050706221213j3dc25aeaw6c6d4df7a4511e06 en mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> It's trivial to do this with ggplot2 (http://had.co.nz):
>
> qplot(rating, data=movies, geom="histogram") + coord_flip()
> qplot(rating, data=movies, geom="histogram", binwidth=0.1) + coord_flip()
>
> Hadley
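> As a base-R alternative covering both questions (a sketch on made-up data;
> the breaks vector controls the bin step):
>
> x <- runif(100, 0, 10)                                  # made-up data
> h <- hist(x, breaks = seq(0, 10, by = 1), plot = FALSE) # bins in steps of 1
> barplot(h$counts, horiz = TRUE, names.arg = h$mids)     # frequencies run sideways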
>
> On 6/22/07, Donghui Feng <donghui.feng en gmail.com> wrote:
> > Dear all,
> >
> > I'm creating a histogram with the function hist(). But
> > right now what I get is the usual column representation.
> > I'm wondering if I could switch the X-axis and Y-axis and
> > get a row representation of the frequencies?
> >
> > One more question: can I define the step of each axis
> > for the histogram?
> >
> > Thanks so much!
> >
> > Donghui
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> ------------------------------
>
> Message: 35
> Date: Fri, 22 Jun 2007 15:16:09 -0400
> From: "Leeds, Mark \(IED\)" <Mark.Leeds en morganstanley.com>
> Subject: Re: [R] Imputing missing values in time series
> To: "Erik Iverson" <iverson en biostat.wisc.edu>,  "Horace Tso"
>        <Horace.Tso en pgn.com>
> Cc: r-help en stat.math.ethz.ch
> Message-ID:
>        <D3AEEDA31E57474B840BEBC25A8A834401957418 en NYWEXMB23.msad.ms.com>
> Content-Type: text/plain;       charset="us-ascii"
>
> I have a function that does this type of thing, but it works off a pure
> vector so it would have to be modified.
> If you make your object a zoo object, then that object has many functions
> associated with it, and na.locf would
> do what you need, I think.
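> A minimal sketch of that route (assuming the zoo package is installed; the
> dates and prices are a small subset of the series in the original post):
>
> library(zoo)
> p <- zoo(c(6.28, 6.58, NA, NA, 6.262),
>          as.Date(c("2006-11-01", "2006-11-02", "2006-11-05",
>                    "2006-11-06", "2006-11-07")))
> na.locf(p)   # carries the last observed price forward over the NAs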
>
>
> -----Original Message-----
> From: r-help-bounces en stat.math.ethz.ch
> [mailto:r-help-bounces en stat.math.ethz.ch] On Behalf Of Erik Iverson
> Sent: Friday, June 22, 2007 3:02 PM
> To: Horace Tso
> Cc: r-help en stat.math.ethz.ch
> Subject: Re: [R] Imputing missing values in time series
>
> I think my example should work for you, but I couldn't think of a way to
> do this without an interative while loop.
>
> test <- c(1,2,3,NA,4,NA,NA,5,NA,6,7,NA)
>
> while(any(is.na(test)))
> test[is.na(test)] <- test[which(is.na(test))-1]
>
>  test
>  [1] 1 2 3 3 4 4 4 5 5 6 7 7
>
> Horace Tso wrote:
> > Folks,
> >
> > This must be a rather common problem with real life time series data
> > but I don't see anything in the archive about how to deal with it. I
> > have a time series of natural gas prices by flow date. Since gas is
> > not traded on weekends and holidays, I have a lot of missing values,
> >
> > FDate Price
> > 11/1/2006     6.28
> > 11/2/2006     6.58
> > 11/3/2006     6.586
> > 11/4/2006     6.716
> > 11/5/2006     NA
> > 11/6/2006     NA
> > 11/7/2006     6.262
> > 11/8/2006     6.27
> > 11/9/2006     6.696
> > 11/10/2006    6.729
> > 11/11/2006    6.487
> > 11/12/2006    NA
> > 11/13/2006    NA
> > 11/14/2006    6.725
> > 11/15/2006    6.844
> > 11/16/2006    6.907
> >
> > What I would like to do is to fill the NAs with the price from the
> > previous date * gas used during holidays is purchased from the week
> > before. Though real simple, I wonder if there is a function to perform
>
> > this task. Some of the imputation functions I'm aware of (eg. impute,
> > transcan in Hmisc) seem to deal with completely different problems.
> >
> > 2.5.0/Windows XP
> >
> > Thanks in advance.
> >
> > HT
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help en stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> --------------------------------------------------------
>
> This is not an offer (or solicitation of an offer) to buy/se...{{dropped}}
>
>
>
> ------------------------------
>
> Message: 36
> Date: Fri, 22 Jun 2007 12:21:40 -0700
> From: "Horace Tso" <Horace.Tso en pgn.com>
> Subject: Re: [R] Imputing missing values in time series
> To: "Erik Iverson" <iverson en biostat.wisc.edu>,  "Mark (IED) Leeds"
>        <Mark.Leeds en morganstanley.com>
> Cc: r-help en stat.math.ethz.ch
> Message-ID: <467BBED40200006500006931 en pgn.com>
> Content-Type: text/plain; charset=US-ASCII
>
> Mark, thanks for the tips. I thought you financial folks must have run into things like these before. Just wonder why this problem wasn't asked more often on this list.
>
> H.
>
>
> >>> "Leeds, Mark (IED)" <Mark.Leeds en morganstanley.com> 6/22/2007 12:16 PM >>>
> I have a function that does this type of thing, but it works off a pure
> vector so it would have to be modified.
> If you make your object a zoo object, then that object has many functions
> associated with it, and na.locf would
> do what you need, I think.
>
>
> -----Original Message-----
> From: r-help-bounces en stat.math.ethz.ch
> [mailto:r-help-bounces en stat.math.ethz.ch] On Behalf Of Erik Iverson
> Sent: Friday, June 22, 2007 3:02 PM
> To: Horace Tso
> Cc: r-help en stat.math.ethz.ch
> Subject: Re: [R] Imputing missing values in time series
>
> I think my example should work for you, but I couldn't think of a way to
> do this without an interative while loop.
>
> test <- c(1,2,3,NA,4,NA,NA,5,NA,6,7,NA)
>
> while(any(is.na(test)))
> test[is.na(test)] <- test[which(is.na(test))-1]
>
>  test
>  [1] 1 2 3 3 4 4 4 5 5 6 7 7
>
> Horace Tso wrote:
> > Folks,
> >
> > This must be a rather common problem with real life time series data
> > but I don't see anything in the archive about how to deal with it. I
> > have a time series of natural gas prices by flow date. Since gas is
> > not traded on weekends and holidays, I have a lot of missing values,
> >
> > FDate Price
> > 11/1/2006     6.28
> > 11/2/2006     6.58
> > 11/3/2006     6.586
> > 11/4/2006     6.716
> > 11/5/2006     NA
> > 11/6/2006     NA
> > 11/7/2006     6.262
> > 11/8/2006     6.27
> > 11/9/2006     6.696
> > 11/10/2006    6.729
> > 11/11/2006    6.487
> > 11/12/2006    NA
> > 11/13/2006    NA
> > 11/14/2006    6.725
> > 11/15/2006    6.844
> > 11/16/2006    6.907
> >
> > What I would like to do is to fill the NAs with the price from the
> > previous date * gas used during holidays is purchased from the week
> > before. Though real simple, I wonder if there is a function to perform
>
> > this task. Some of the imputation functions I'm aware of (eg. impute,
> > transcan in Hmisc) seem to deal with completely different problems.
> >
> > 2.5.0/Windows XP
> >
> > Thanks in advance.
> >
> > HT
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help en stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> --------------------------------------------------------
>
> This is not an offer (or solicitation of an offer) to buy/se...{{dropped}}
>
>
>
> ------------------------------
>
> Message: 37
> Date: Fri, 22 Jun 2007 21:40:26 +0200
> From: "hadley wickham" <h.wickham en gmail.com>
> Subject: Re: [R] Overlaying lattice graphs (continued)
> To: " S?bastien " <pomchip en free.fr>
> Cc: R-help <r-help en stat.math.ethz.ch>
> Message-ID:
>        <f8e6ff050706221240m240391dfrf83f5eb6d47e1766 en mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Sebastian,
>
> I think the following does what you want:
>
> library(ggplot2)
> names(mydata) <- tolower(names(mydata))
>
> obs <- rename(subset(mydata, model=="A", -predicted), c("observed" = "value"))
> obs$model <- factor("observed")
> pred <- rename(mydata[, -5], c("predicted" = "value"))
> all <- rbind(obs, pred)
>
> ggplot(all, aes(x = time, y = value, colour = model)) +
> geom_point(data = subset(all, model == "observed")) +   # observations as symbols
> geom_line(data = subset(all, model != "observed")) +    # model predictions as lines
> facet_grid(. ~ individuals)
>
> Hadley
>
> On 6/22/07, Sébastien <pomchip en free.fr> wrote:
> > Hi Deepayan,
> >
> > The following code creates a dummy dataset which has the same similar as
> > my usual datasets. I did not try to implement the changes proposed by
> > Hadley, hoping that a solution can be found using the original dataset.
> >
> > ######### My code
> >
> > # Creating dataset
> >
> > nPts<-10            # number of time points
> > nInd<-6              # number of individuals
> > nModel<-3         # number of models
> >
> > TimePts<-rep(1:nPts,nInd*nModel)                                    #
> > creates the "Time" column
> > Coef<-rep(rnorm(6,0.1,0.01),each=nPts,nModel)             # Creates a
> > vector of coefficients for generating the observations
> > Obs<-10*exp(-Coef*TimePts)                                         #
> > creates the observations
> >
> > for (i in 1:60){
> > Pred[i]<-jitter(10*exp(-Coef[i]*TimePts[i]))
> > Pred[i+60]<-jitter(5)
> > Pred[i+120]<-jitter(10-Coef[i+120]*TimePts[i])
> > }
> >                   # creates the predicted values
> >
> > colPlot<-rep(1,nPts*nInd*nModel)
> >     # creates the "Plot" column
> > colModel<-gl(nModel,nPts*nInd,labels=c("A","B","C"))             #
> > creates the "Model" column
> > colID<-gl(nInd,nPts,nPts*nInd*nModel)
> >       # creates the "ID" column
> >
> > mydata<-data.frame(colPlot,colModel,colID,TimePts,Obs,Pred)
> >               # creates the dataset
> > names(mydata)<-c("Plot","Model","Individuals","Time","Observed","Predicted")
> >
> > # Plotting as indicated by Deepayan
> >
> >
> > xyplot(Observed + Predicted ~ Time | Individuals + Model,
> >       data = mydata,
> >       panel = panel.superpose.2, type = c("p", "l"),
> >       layout = c(0, nlevels(mydata$Individuals))) #,
> >       #<...>)
> >
> > ####### End of code
> >
> > This codes is not exactly what I am looking for, although it is pretty
> > close. In the present case, I would like to have a Trellis plot with 6
> > panels (one for each individual), where the Observations and the
> > Predicted are plotted as symbols and lines, respectively. All three
> > models should be plotted on the same panel. Unfortunately, it looks to
> > me as 3 successives xyplots are created by the code above but only the
> > last one remains displayed. I tried to play with
> > panel.superpose,panel.superpose.2 and type, without much success.
> >
> > I also tried the following code that creates 18 panels and distinguish
> > all (Individuals,Model) couples... so, not what I want.
> >
> > xyplot(Observed + Predicted ~ Time | Individuals+Model, data = mydata,
> >      type = c("p", "l"), distribute.type = TRUE)
> >
> > Sebastien
> >
> >
> > Deepayan Sarkar wrote:
> > > On 6/21/07, Sébastien <pomchip en free.fr> wrote:
> > >> Hi Hadley,
> > >>
> > >> Hopefully, my dataset won't be too hard to changed. Can I modify the
> > >> aspect of each group using your code (symbols for observed and lines for
> > >> predicted)?
> > >>
> > >> Sebastien
> > >>
> > >> hadley wickham wrote:
> > >> > Hi Sebastian,
> > >> >
> > >> > I think you need to rearrange your data a bit.  Firstly, you need to
> > >> > put observed on the same footing as the different models, so you would
> > >> > have a new column in your data called value (previously observed and
> > >> > predicted) and a new model type ("observed").  Then you could do:
> > >
> > > Yes, and ?make.groups (and reshape of course) could help with that.
> > > This might not be strictly necessary though.
> > >
> > > However, I'm finding your pseudo-code confusing. Could you create a
> > > small example data set that can be used to try out some real code?
> > > Just from your description, I would have suggested something like
> > >
> > > xyplot(Observed + Predicted ~ Time | Individuals + Model,
> > >       data = mydata,
> > >       panel = panel.superpose.2, type = c("p", "l"),
> > >       layout = c(0, nlevels(mydata$Individuals)),
> > >       <...>)
> > >
> > > If all you want is to plot one page at a time, there are easier ways
> > > to do that.
> > >
> > > -Deepayan
> > >
> > >> >
> > >> > xyplot(value ~ time | individuals, data=mydata, group=model)
> > >> >
> > >> > Hadley
> > >> >
> > >> >
> > >> > On 6/21/07, Sébastien <pomchip en free.fr> wrote:
> > >> >> Dear R Users,
> > >> >>
> > >> >> I recently posted an email on this list  about the use of
> > >> data.frame and
> > >> >> overlaying multiple plots. Deepayan kindly indicated to me the
> > >> >> panel.superposition command which worked perfectly in the context
> > >> of the
> > >> >> example I gave.
> > >> >> I'd like to go a little bit further on this topic using a more
> > >> complex
> > >> >> dataset structure (actually the one I want to work on).
> > >> >>
> > >> >>  >mydata
> > >> >>    Plot Model Individuals Time Observed Predicted
> > >> >> 1     1     A           1 0.05       10      10.2
> > >> >> 2     1     A           1 0.10       20      19.5
> > >> >> etc...
> > >> >> 10    1     B           1 0.05       10       9.8
> > >> >> 11    1     B           1 0.10       20      20.2
> > >> >> etc...
> > >> >>
> > >> >> There are p "levels" in mydata$Plot, m in mydata$Model, n in
> > >> >> mydata$Individuals and t in mydata$Time (Note that I probably use the
> > >> >> word levels improperly as all columns are not factors). Basically,
> > >> this
> > >> >> dataset summarizes the t measurements obtained in n individuals as
> > >> well
> > >> >> as the predicted values from m different modeling approaches
> > >> (applied to
> > >> >> all individuals). Therefore, the observations are repeated m times in
> > >> >> the Observed columns, while the predictions appears only once for a
> > >> >> given model an a given individual.
> > >> >>
> > >> >> What I want to write is a R batch file creating a Trellis graph,
> > >> where
> > >> >> each panel corresponds to one individual and contains the
> > >> observations
> > >> >> (as scatterplot) plus the predicted values for all models (as
> > >> lines of
> > >> >> different colors)... $Plot is just a token: it might be used to not
> > >> >> overload graphs in case there are too many tested models. The fun
> > >> part
> > >> >> is that the values of p, m, n and t might vary from one dataset to
> > >> the
> > >> >> other, so everything has to be coded dynamically.
> > >> >>
> > >> >> For the plotting part I was thinking about having a loop in my code
> > >> >> containing something like that:
> > >> >>
> > >> >> for (i in 1:nlevels(mydata$Model)) {
> > >> >>
> > >> >> subdata<-subset(mydata,mydata$Model=level(mydata$Model)[i])
> > >> >> xyplot(subset(Observed + Predicted ~ Time | Individuals, data =
> > >> >> subdata)       #plus additionnal formatting code
> > >> >>
> > >> >> }
> > >> >>
> > >> >> Unfortunately, this code simply creates a new Trellis plot instead of
> > >> >> adding the model one by one on the panels. Any idea or link to a
> > >> useful
> > >> >> command will wellcome.
> > >> >>
> > >> >> Sebastien
> > >> >>
> > >> >> ______________________________________________
> > >> >> R-help en stat.math.ethz.ch mailing list
> > >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> >> PLEASE do read the posting guide
> > >> >> http://www.R-project.org/posting-guide.html
> > >> >> and provide commented, minimal, self-contained, reproducible code.
> > >> >>
> > >> >
> > >> >
> > >>
> > >> ______________________________________________
> > >> R-help en stat.math.ethz.ch mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > >>
> >
>
>
>
> ------------------------------
>
> Message: 38
> Date: Fri, 22 Jun 2007 12:47:51 -0700
> From: "Deepayan Sarkar" <deepayan.sarkar en gmail.com>
> Subject: Re: [R] Overlaying lattice graphs (continued)
> To: " S?bastien " <pomchip en free.fr>
> Cc: R-help <r-help en stat.math.ethz.ch>
> Message-ID:
>        <eb555e660706221247v59cf4f4cpdaee99e0a1372c84 en mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 6/22/07, Sébastien <pomchip en free.fr> wrote:
> > Hi Deepayan,
> >
> > The following code creates a dummy dataset which has the same similar as
> > my usual datasets. I did not try to implement the changes proposed by
> > Hadley, hoping that a solution can be found using the original dataset.
> >
> > ######### My code
> >
> > # Creating dataset
> >
> > nPts<-10            # number of time points
> > nInd<-6              # number of individuals
> > nModel<-3         # number of models
> >
> > TimePts<-rep(1:nPts,nInd*nModel)                                    #
> > creates the "Time" column
> > Coef<-rep(rnorm(6,0.1,0.01),each=nPts,nModel)             # Creates a
> > vector of coefficients for generating the observations
> > Obs<-10*exp(-Coef*TimePts)                                         #
> > creates the observations
> >
> > for (i in 1:60){
> > Pred[i]<-jitter(10*exp(-Coef[i]*TimePts[i]))
> > Pred[i+60]<-jitter(5)
> > Pred[i+120]<-jitter(10-Coef[i+120]*TimePts[i])
> > }
> >                   # creates the predicted values
> >
> > colPlot<-rep(1,nPts*nInd*nModel)
> >     # creates the "Plot" column
> > colModel<-gl(nModel,nPts*nInd,labels=c("A","B","C"))             #
> > creates the "Model" column
> > colID<-gl(nInd,nPts,nPts*nInd*nModel)
> >       # creates the "ID" column
> >
> > mydata<-data.frame(colPlot,colModel,colID,TimePts,Obs,Pred)
> >               # creates the dataset
> > names(mydata)<-c("Plot","Model","Individuals","Time","Observed","Predicted")
>
> The way you have structured your data makes no sense to me. In
> particular, your 'Observed' data is the same set of 60 numbers
> repeated 3 times, and this is not reflected in the data structure at
> all. What would you want to happen if the numbers were not repeated?
> Would you always plot the first 60, or would you plot all of them?
>
> If I understand what you are trying to do, this might be a more
> transparent approach:
>
>
> nPts<-10   # number of time points
> nInd<-6    # number of individuals
>
> TimePts <- rep(1:nPts, nInd)
> Coef <- rep(rnorm(6,0.1,0.01), each = nPts)
> Obs <- 10 * exp(-Coef * TimePts)
> colID <- gl(nInd, nPts)
>
> mydata <- data.frame(Time = TimePts, Observed = Obs, Individuals = colID)
>
> fmA <- lm(Observed ~ Time, mydata)
> fmB <- lm(Observed ~ poly(Time, 2), mydata)
> fmC <- lm(Observed ~ poly(Time, 2) * Individuals, mydata)
>
> mydata$PredA <- predict(fmA)
> mydata$PredB <- predict(fmB)
> mydata$PredC <- predict(fmC)
>
> xyplot(Observed + PredA + PredB + PredC ~ Time | Individuals,
>       data = mydata,
>       type = c("p", "l", "l", "l"),
>       distribute.type = TRUE)
>
> -Deepayan
>
>
>
> ------------------------------
>
> Message: 39
> Date: Fri, 22 Jun 2007 21:55:52 +0200
> From: "hadley wickham" <h.wickham en gmail.com>
> Subject: Re: [R] Stacked barchart color
> To: owenman <solberg en speakeasy.net>
> Cc: r-help en stat.math.ethz.ch
> Message-ID:
>        <f8e6ff050706221255j37e15382n4179d5b276486c0b en mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Owen,
>
> The bars should be stacked in the order specified by the factor.  Try
> using factor(..., levels=...) to explicitly order them the way you
> want.  If that doesn't work, please provide a small replicable example
> and I'll look into it.
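> A sketch of what factor(..., levels = ...) looks like here, shown on the
> column that is mapped to fill in the 'y' data frame quoted below (the
> particular level order is only an illustration):
>
> y$color <- factor(y$color, levels = c("gray65", "gray45", "gray35", "gray1"))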
>
> Hadley
>
> On 6/18/07, owenman <solberg en speakeasy.net> wrote:
> >
> > Hi Hadley,
> > Great, I am starting to get it.  It's working for me, but there is one more
> > thing I am having trouble with.  The ordering of the stacked bars seems to
> > be dictated by the name of the color, I guess because of the fill=color
> > argument in aes().  In other words, if I set up my colors like this:
> > y$color = c("gray1","gray35","gray45","gray65")  the bars get stacked in the
> > opposite order than if I set up the colors like this:  y$color =
> > c("gray65","gray45","gray35","gray1").  How can I control the order of the
> > bars independent of the name of the colors?   Thanks so much in advance!
> > Really neat package you've made.
> >
> > FYI, my plot command now looks like this:
> >
> > p = ggplot(y, aes(x=locus, y=Freq, fill=color))
> > p = p + geom_bar(position="fill")
> > p = p + scale_fill_identity(labels=levels(y$Fnd), grob="tile", name="Fnd
> > Results")
> > p = p + coord_flip()
> >
> > And the data table is similar as before:
> >
> > > y
> >       Fnd locus        Freq  color
> > 1  signeg  DPB1 0.013071895  gray1
> > 2     neg  DPB1 0.581699346 gray35
> > 3     pos  DPB1 0.379084967 gray45
> > 4  sigpos  DPB1 0.026143791 gray65
> > 5  signeg  DPA1 0.068181818  gray1
> > 6     neg  DPA1 0.659090909 gray35
> > 7     pos  DPA1 0.250000000 gray45
> > 8  sigpos  DPA1 0.022727273 gray65
> >
> >
> >
> > hadley wrote:
> > >
> > > Hi Owen,
> > >
> > > The identity scale won't create a legend, unless you tell it what
> > > labels it should use - there's an example at
> > > http://had.co.nz/ggplot2/scale_identity.html.  Otherwise, if you have
> > > a continuous scale and you want something that works in black and
> > > white, p + scale_fill_gradient(low="white", high="black") might be
> > > easier.
> > >
> > > Hadley
> > >
> > >
> > >>
> > >> > y$color = factor(y$Fnd)
> > >> > y$color = c("black","darkgray","lightgray","white")
> > >> > y
> > >>       Fnd locus        Freq color
> > >> 1  signeg     A 0.087248322     black
> > >> 2     neg     A 0.711409396  darkgray
> > >> 3     pos     A 0.201342282 lightgray
> > >> 4  sigpos     A 0.000000000     white
> > >> 5  signeg     C 0.320754717     black
> > >> 6     neg     C 0.603773585  darkgray
> > >> 7     pos     C 0.075471698 lightgray
> > >> 8  sigpos     C 0.000000000     white
> > >> 9  signeg     B 0.157534247     black
> > >> 10    neg     B 0.732876712  darkgray
> > >> 11    pos     B 0.109589041 lightgray
> > >> 12 sigpos     B 0.000000000     white
> > >>
> > >> > p = ggplot(y, aes(x=locus, y=Freq, fill=color)) +
> > >> > geom_bar(position="fill") + scale_fill_identity()
> > >> > p
> > >>
> > >>
> > >>
> > >>
> > >> hadley wrote:
> > >> >
> > >> >
> > >> > Hi Dieter,
> > >> >
> > >> > You can do this with ggplot2 (http://had.co.nz/ggplot2) as follows:
> > >> >
> > >> > library(ggplot2)
> > >> >
> > >> > barley1 <- subset(barley, site=="Grand Rapids" & variety %in%
> > >> > c("Velvet","Peatland"))
> > >> > barley1[] <- lapply(barley1, "[", drop=TRUE)
> > >> >
> > >> > qplot(variety, yield, data=barley1, geom="bar", stat="identity",
> > >> > fill=factor(year))
> > >> >
> > >> > barley1$fill <- c("red","green","blue","gray")
> > >> > qplot(variety, yield, data=barley1, geom="bar", stat="identity",
> > >> > fill=fill) + scale_fill_identity()
> > >> >
> > >> > See http://had.co.nz/ggplot2/scale_identity.html and
> > >> > http://had.co.nz/ggplot2/position_stack.html for more details.
> > >> >
> > >> > Hadley
> > >> >
> > >> >
> > >>
> > >>
> > >> --
> > >> View this message in context:
> > >> http://www.nabble.com/Stacked-barchart-color-tf3909162.html#a11149419
> > >> Sent from the R help mailing list archive at Nabble.com.
> > >>
> > >>
> > >> ______________________________________________
> > >> R-help en stat.math.ethz.ch mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > >>
> > >
> > > ______________________________________________
> > > R-help en stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> >
> >
> > --
> > View this message in context: http://www.nabble.com/Stacked-barchart-color-tf3909162.html#a11182581
> >
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help en stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> ------------------------------
>
> Message: 40
> Date: Fri, 22 Jun 2007 22:07:37 +0200
> From: "hadley wickham" <h.wickham en gmail.com>
> Subject: Re: [R] Visualize quartiles of plot line
> To: "Arne Brutschy" <abr-r-project en xylon.de>
> Cc: R-help en stat.math.ethz.ch
> Message-ID:
>        <f8e6ff050706221307s72ebc36v31537365bc7ff667 en mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 6/17/07, Arne Brutschy <abr-r-project en xylon.de> wrote:
> > Hi,
> >
> > thanks for your tips - all of them worked. After a bit of fiddling, I
> > managed to get what I wanted.
>
> Glad to hear it.
>
> > hadley wickham wrote:
> > h> You might want to read the introductory chapters in the ggplot book,
> > h> available from http://had.co.nz/ggplot2, which will give you more of a
> > h> background.  Please let me know places where you think the
> > h> documentation is inconsistent so I can try and make them better.
> > I already did. :) A general problem: the examples are nice and easy to
> > get, but it's often hard to apply them to my own specific problem.
> > It's more a problem of the general structure: what has to go where.
> > Most of the methods use qplot, but what do I have to do if I'm
> > trying to create a more complex plot? Hmm, it's hard to describe.
> >
> > Example: I know how I set the title when using qplot (qplot(....,
> > main="asdf")). Where do I have to put it when I'm using ggplot? Stuff
> > like this is unclear...
>
> p <- ggplot(...) + ...
> p$title <- "Title goes here"
>
> It is admittedly hard to figure this out from the current documentation, though.
>
> > A more general problem is that the manual pages are very, eh,
> > minimally documented. The overall reference page is good and nicely
> > structured. But the big idea is sort of missing. All components are
> > linked, but the basics like layout, ggplot, aes etc. are harder to find
> > - and their help pages are the shortest. Especially the small details
> > are hard to figure out. Lists of attributes etc.
>
> Yes, that's definitely something I'm working on for the book.
> Unfortunately, I don't have
>  that much time and it is a lot of work.  Every comment helps though.
>
> > Hmm, I know this is not really helpful. I can't describe my problems
> > properly, I guess. Perhaps the documentation simply has to improve
> > based on users' questions. :\
> >
> > How old is this package? I think it's really, really great, but are
> > there many users? Is there an additional mailinglist or forum where I
> > can get more information?
>
> It's pretty young still, although the precursor ggplot package has
> been around for about a year.  I really have no idea how many users
> there are.  For questions, either email me or R-help.
>
> > Some more questions:
> >
> > Why doesn't ggplot2 work with layout()? I'm using viewport now, which
> > works fine for me, but there should be a note in the docs perhaps.
>
> Because it works with the grid drawing package - see the last chapter
> in the ggplot book for some details on how to use the grid
> equivalents.
>
> > How do I change the legend? The auto-creation of it might be nice,
> > but I want a) to add a title, b) change the order to ascending, and c)
> > add a short description like:
> >
> >   DeltaConfig
> >   [ ] 0 best
> >   [ ]
> >   [ ] 5
> >   [ ]
> >   [ ]10 worst
> >
> > I don't know if this is possible, but it would be nice to explain what
> > the colors/values might mean if it's not clear from the beginning
> > (like diamonds.size). The only thing I found was the attribute
> > legend.justification in ggopt, which isn't fully documented.
>
> The legends aren't very customisable at the moment - look at the
> examples for the scale functions to see what you can do.  You can set
> the name of the title easily, and you can change the labels by
> changing the levels of the factors, or by setting the breaks argument.  I
> agree there could be more options.  If you could provide me with a
> picture of what you want, I'll add it to my to do list to think about.
>
> > Additionally, how can I change the order of the facets? I currently
> > have a plot with a smoother for each model (all in the same plot),
> > which sorts the models like this: dyn, dl4, dl3. Below that, I have a
> > facet with point plots for each model, which sorts them the other way
> > round, which is a bit confusing.
>
> Again, change the order of the underlying factor.
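> For instance (a sketch; 'mydata' and 'model' are placeholders for your data
> frame and the faceting variable, and the level order is taken from your
> description):
>
> mydata$model <- factor(mydata$model, levels = c("dl3", "dl4", "dyn"))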
>
> > BTW, what's the "strip" and the associated attributes?
>
> The strip is the label associated with the facet.
>
> > Again, I think this package is great - nice work! All the above isn't
> > meant as general criticism, but is being said in order to improve the
> > documentation.
>
> I do appreciate your comments and they definitely help me to make a
> better product.
>
> Thanks,
>
> Hadley
>
>
>
> ------------------------------
>
> Message: 41
> Date: Fri, 22 Jun 2007 16:14:52 -0400
> From: "Patrick Ayscue" <payscue en gmail.com>
> Subject: [R] "heatmap" color still a spectrum for binary outcomes?
> To: r-help en stat.math.ethz.ch
> Message-ID:
>        <ea3a49790706221314n270f8036i72fb4356c579303 en mail.gmail.com>
> Content-Type: text/plain
>
> I have a matrix of a time-series binary response variable for around 200
> individuals that I would like to display.  I am approaching success using the
> "heatmap" function in the "stats" package without dendrograms; however, I am
> running into trouble in that the colors get lighter with more positive
> outcomes in a column (time point).  Is there a way to make the colors
> uniform irrespective of the number of similar values in the column, or is this
> part of the heatmap function?
>
> Other suggestions for representing the data graphically are certainly
> welcome as well.
>
> Thanks,
> Patrick
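> A possible sketch (assuming the lightening comes from heatmap()'s default
> row scaling; 'm' below is a made-up 0/1 matrix):
>
> m <- matrix(rbinom(200 * 10, 1, 0.3), nrow = 200)  # 200 individuals, 10 time points
> heatmap(m, Rowv = NA, Colv = NA, scale = "none",   # no dendrograms, no rescaling
>         col = c("white", "black"))                 # 0 = white, 1 = black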
>
>        [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 42
> Date: Fri, 22 Jun 2007 14:19:43 -0600
> From: "Spilak,Jacqueline [Edm]" <Jacqueline.Spilak en EC.gc.ca>
> Subject: [R] Barchart legend position
> To: <r-help en stat.math.ethz.ch>
> Message-ID:
>        <4A6AB38B55B49C44A22E021A83CBEDDB015EB9A1 en sr-pnr-exch3.prairie.int.ec.gc.ca>
>
> Content-Type: text/plain
>
> I am using barchart to make charts for some data with a lot more
> functions and labels and such in the command.
>
> barchart(Freq ~ factor(HH), data = dataset1, group= year)
>
> So I have my data grouped by year and I get a legend at the top of the
> graph, which is great because I need the legend for the different years,
> but it is in a weird spot.  So how can I manipulate the legend, i.e. move
> it, shrink it, do anything with it?  I have searched the help archives
> and found nothing, and I have looked up the legend section in ?barchart,
> but that has not helped or I am doing something wrong.  Any help is
> greatly appreciated.
> Jacquie
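> One hedged sketch of moving and shrinking a group legend in lattice (keeping
> the column names from the call above, and assuming the legend comes from
> auto.key):
>
> library(lattice)
> barchart(Freq ~ factor(HH), data = dataset1, groups = year,
>          auto.key = list(space = "right", columns = 1, cex = 0.8))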
>
>        [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 43
> Date: Fri, 22 Jun 2007 17:25:14 +0100
> From: Michael Hoffman <b3i4old02 en sneakemail.com>
> Subject: [R] Lattice: hiding only some strips
> To: r-help en stat.math.ethz.ch
> Message-ID: <f5gt5i$f6k$1 en sea.gmane.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> I am using R 2.4.0 and lattice to produce some xyplots conditioned on a
> factor and a shingle. The shingle merely chops up the data along the
> x-axis, so it is easy to identify which part of the shingle a panel is
> in by looking at the x-axis markings. I only want to have a strip at the
> top for the factor.
>
> Is this possible? I looked into calculateGridLayout() and it seems to me
> that there isn't an easy way to do it without rewriting that function
> (and others).
>
> Many thanks
> --
> Michael Hoffman
>
>
>
> ------------------------------
>
> Message: 44
> Date: Fri, 22 Jun 2007 16:42:13 -0400
> From: Sébastien <pomchip en free.fr>
> Subject: Re: [R] Overlaying lattice graphs (continued)
> To: hadley wickham <h.wickham en gmail.com>
> Cc: R-help <r-help en stat.math.ethz.ch>
> Message-ID: <467C3425.1030000 en free.fr>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hadley,
>
> I have some trouble running your code with ggplot version 0.4.1. Is the
> ggplot2 package mandatory?
>
> Sebastien
>
> hadley wickham wrote:
> > Hi Sebastian,
> >
> > I think the following does what you want:
> >
> > library(ggplot2)
> > names(mydata) <- tolower(names(mydata))
> >
> > obs <- rename(subset(mydata, model=="A", -predicted), c("observed" =
> > "value"))
> > obs$model <- factor("observed")
> > pred <- rename(mydata[, -5], c("predicted" = "value")...
>
[Message truncated]



More information about the R-help mailing list