analogue Change Log

See the git commit log for a full Changelog:

https://github.com/gavinsimpson/analogue/commits/master

From 0.17-0 this Changelog is provided for archival purposes only. See the git log and file `NEWS` from now on for changes to the package.

Version 0.17-0

* Released to CRAN.

Version 0.16-4

* Specifically import all non-base functions into the package NAMESPACE.

Version 0.16-3

* Update reference material containing new version of analogue.

Version 0.16-2

* Incorrect link to GPL. Reported by Brian Ripley.

Version 0.16-1

* Fix testthat usage. Reported by Hadley Wickham.
* Stratiplot: now handles a matrix as argument `x` via an S3 method for class `"matrix"`.
* tran: add a `"none"` transformation which just returns its input. This is useful if you are writing code and comparing transformations and need to temporarily turn them off.

Version 0.16-0

* Released to CRAN 18th November 2014.

Version 0.15-0

* tran: now has `method = "colCenter"` for centering by columns. Also gains:
    `method = "log1p"` for accurate computation of log(1 + x) where |x| << 1 using `log1p()`.
    `method = "expm1"` for accurate computation of exp(x) - 1 where |x| << 1 using `expm1()`.
* plot.sppResponse: now documented.
* Stratiplot: new argument `yticks` allows user-specified tick locations on the y axis (age/depth).
* Plot3d, plot3d.prcurve: These functions are now removed from analogue. `plot3d.prcurve` is available in the new analogueExtra package.

Version 0.14-0

* Released to CRAN 28th August 2014.

Version 0.13-6

* Plot3d: new name for the previous `plot3d.prcurve()` method. Renamed to avoid having to export a method from the namespace, which allows **rgl** to be relegated to Suggests.
* plot3d.prcurve: deprecated this S3 method and removed **rgl** from the package dependencies. **rgl** is now in Suggests, which means it is not needed to install the package. This seems to be causing some problems for Mac OS users.
* evenSample: new function returns the number of samples per gradient segment. Has a `plot()` method.
* Pollen, Biome, Location, Climate: data sets from the North American Modern Pollen Database updated to version 1.7.3.
* Imports: lattice moved from Depends to Imports. analogue now exports the generics densityplot, dotplot, and histogram imported from the lattice namespace.

Version 0.13-5

* n2: new utility function to compute Hill's N2 for sites or species.
* optima: added bootstrap estimates of species WA optima.
* roc: fix a bug in the computation of AUC from the U statistic.

Version 0.13-4 (13 Feb 2014)

* crossval.pcr: Fixed a number of bugs in the method for PCR related to k-fold cross validation, which were causing errors. Fix the verbose printing of the progress bar, which would reset between repeats in k-fold CV.
* predict.pcr: would set argument `ncomp` incorrectly (in the wrong form) if not supplied.
* performance.crossval: A new method for objects of class `"crossval"`.
* ChiSquare: Wasn't returning the list it created including transformation parameters.

Version 0.13-3 (Opened 11 Feb 2014)

* timetrack: A number of additions and improvements:
    o New `predict()` method allows additional passive points to be located in the timetrack space.
    o New `points()` method to allow drawing of points for training or passive samples on an existing plot.
    o The `plot()` method can now suppress plotting of all points, for a clean canvas with axes/labelling ready to accept additional plotting function calls.
  These changes were made following a query by Andrew Medeiros.

Version 0.13-2 (Opened 1 Jan 2014)

* prcurve: uses `dev.hold()` & `dev.flush()` to smooth graphics flicker during fitting with `plotit = TRUE`.

Version 0.13-1 (Opened 24 December 2013)

* plot3d.prcurve: was not using the `data` & `ordination` objects stored within the fitted `prcurve` object.
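
Note on `n2` (version 0.13-5): Hill's N2 is the inverse of the sum of squared proportions, so it can be read as an effective number of occurrences (or of species). The sketch below is a minimal base R illustration of that definition only. It is not the package's implementation, and the helper name `hill_n2` and the toy abundances are hypothetical.

```r
# Hypothetical helper (not analogue's n2()): Hill's N2 for each column of x.
hill_n2 <- function(x) {
  x <- as.matrix(x)
  p <- sweep(x, 2, colSums(x), "/")  # proportions within each column
  1 / colSums(p^2)                   # inverse of the sum of squared proportions
}

# Toy data: one taxon spread evenly over four sites, one found at a single site.
abund <- cbind(even = c(10, 10, 10, 10), patchy = c(40, 0, 0, 0))
hill_n2(abund)  # even = 4, patchy = 1
```

Applying the same helper to `t(abund)` gives the per-site version, assuming rows are sites and columns are taxa.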

Version 0.13-0 (Development branch opened 16 December 2013)

* predict.pcr: internal function was calling `fitPCR()` via `analogue:::fitPCR()`, which is not required nor was it intended. Reported by Brian Ripley.
* predict.prcurve: new function to predict locations on the curve for new observations on the same set of variables. Useful for adding passive species.
* fitted.prcurve: new function to return the fitted locations on the principal curve or the fitted values of the response.

Version 0.12-0

* Released to CRAN December 13th 2013.

Version 0.11-99

* Preparing for the 0.12-0 release.
* NEWS: analogue now has an `inst/NEWS.Rd` file highlighting the major changes in the upcoming 0.12-0 release.

Version 0.11-6

* distance3: removed - redundant attempt to improve `distance()`.
* distance:
    `newDistance()` -> `distance()`
    `distance()` -> `oldDistance()`
  This implements the change suggested in 0.11-5. `distance()` is now using the compiled C versions of the dissimilarity code.
* oldDistance: (was `distance()`) fixed a bug in the x-only case where `method = "kendall"`.
* Vignette: Updated some details regarding C versions of dissimilarity coefs.
* performance: a tweak to the print method to zap values that are effectively 0. Only affects vectors of performance statistics not data frames of stats.
* predict.pcr: Apply transformation function and perform predictions for LOO, n k-fold, and bootstrap predictions.
* crossval.pcr: leave-one-out CV was incorrectly averaging over components. Now does bootstrap and n k-fold CV.

Version 0.11-5

* newDistance: (yet another) new distance() replacement to interface with fast C versions of the dissimilarity code. This one *will* replace `distance()` in version 0.12-0, whereupon it will be renamed to `distance()` and the current `distance()` will be renamed to `oldDistance()`.
* Unit tests: started adding unit tests using the **testthat** package. Hence **analogue** now `Suggests: testthat`.

Version 0.11-4

* logitreg: fitting is now possible using Firth's bias reduction technique via the brglm package. Changes to the labels in the output of the `summary` method.
* plot3d.prcurve: dynamic 3D plot of the data in PCA space with the fitted principal curve superimposed.
* smoothGAM: new smooth function plugin for `prcurve`. Allows fitting principal curves via individual GAM models using `mgcv::gam` as the engine. The main advantage is that data sets with non-Gaussian errors can now be handled more appropriately, such as handling count data correctly.
  `smoothGAM` is much slower than `smoothSpline` currently, although there is potential for speeding this up via pre-computing some of the GAM terms. As an example, the Abernethy Forest example in `?prcurve` takes ~10 seconds on my 2-year old laptop.
* prcurve: improvements to the printed output during fitting (i.e. with `trace = TRUE`); a progress bar is now displayed during initial estimates of smooth complexities. Residual variance printed to fewer decimal places.
  Now returns components `ordination` and `data`, the PCA ordination (resulting from `rda()`) and the original data used to fit the curve, respectively. This simplifies the `plot` and `lines` methods for example.
* residuals.prcurve: new `residuals` method for principal curves extracting or computing various types of residual for a fitted curve.
* plot.prcurve, lines.prcurve: much improved following the addition of new components in the object returned by `prcurve`. No longer need to supply the original data used to fit the curve. The scaling to use for the plot can now be specified via new argument `scaling`. This makes the `lines` method more broadly useful.
* chooseTaxa: would drop empty dimensions if conditions resulted in just a single taxon being selected. Reported by Michael Burstert.
  The warning about `NA`s was also being issued even if `na.rm = TRUE` was used. Now fixed.
* Streamlined some of the documentation to avoid running the same code many times.
* plot.sppResponse: accepts a logical vector for argument `which`.
* wa: now warns if species with no information are removed from the analysis, which proceeds as it always has.

Version 0.11-3

* chooseTaxa: new argument `value` controls whether the data for the selected species or a logical vector indicating which columns (species) met the selection criteria is returned.
  New argument `na.rm = FALSE` to control whether or not `NA`s are excluded from the calculation of abundance and occurrence. Suggested by Michael Burstert.
* scores.prcurve: now preserves the rownames on the `lambda` component.
* sppResponse: new generic function for species responses along gradients. A method for objects of class "prcurve" is the only method currently available. A `plot()` method for `sppResponse()` objects is also provided.
* rankDC: new function to compute the rank correlation between gradient distances (e.g. environmental variables) and distances in species composition. Has both base and Lattice graphics plot methods (the latter via `dotplot()`).
* Stratiplot: new arguments `labelAt` and `labelRot` allow the placement and rotation of variable labels to be controlled, when not using a strip.

Version 0.11-2

* timetrack: plot method now allows plotting of "lc" or "wa" site scores for the base ordination. The latter is the default to maintain backwards compatibility.
  The `formula` argument was not well implemented; using it would mean that `X`, the main species data, would not be transformed, and you couldn't use direct variables as these would not be found.
  Now `formula` takes a one-sided formula describing the constraints. Variables will be looked up inside the object passed to `env`. As such, `env` needs to be a data frame or an object accepted as the `data` argument in `model.frame()`.
  The `fitted` method has changed slightly. The `type` argument has been renamed `which`.
* scores: new methods for objects of class "timetrack" and class "prcurve".
* analog: gains a method for objects of class "distance".
* distance: gains a new attribute ('type') which contains an indicator of whether the distance matrix is symmetric (computed on a single matrix) or asymmetric (dissimilarities between samples of two matrices).
* distanceX: experimental replacement for distance() which uses fast C code for computing dissimilarities via a .Call interface based on base::dist().
  Currently only the single matrix code has an R interface and `method = "mixed"` is not hooked up at the moment. It is intended that the next version of analogue (0.12-x) will have this version of the code replace the old mainly R-based distance().
* predict.wa: deshrinking step sometimes produced a 1 column matrix, which would result in an error. This empty dimension is now dropped.

Version 0.11-1

* scores.prcurve: new function to extract "axis" scores for samples on the fitted principal curve.
* lines.prcurve: new lower-level plotting function allows a fitted principal curve to be added to an existing PCA plot.
* prcurve, smoothSpline: `prcurve` now returns a component `smooths`, a list containing the fitted smoothers, one per variable. As a result `smoothSpline` now also returns the fitted `smooth.spline` model.
* gradientDist: the "prcurve" method was ordering the samples such that they were smooth. No need for the `order` argument now either.
* Namespace: simplified the import from lattice. Added more imports from grid as this package is no longer in Dependencies.
* Dependencies: grid moved to Imports:
* distx.c: Uninitialised variable use reported on MacOS X under the clang compiler. Reported by Brian Ripley.
* Maintainer: Updated the maintainer email address.

Version 0.11-0

* timetrack: fitted method gains argument `choices` with default `1:2` for extracting the ordinary or passive samples scores on `choices` axes.
  `plot` method can now draw the passive samples as a line, points or both.
  The `plot` method also gains an argument `order` that can be used to reorder the passive samples into correct temporal ordering.
* Dependencies: MASS and mgcv moved from Depends: to Imports: reflecting the relatively light use of functions from these packages in analogue. The functions needed are imported in analogue's NAMESPACE file.
* Tests: code in wa() generates slightly different results under R 3.0.0 (to be). Differences are in the 8th or 9th decimal place so irrelevant, but the reference output for the examples was being checked to that level of precision. For tests only, the Example in ?wa uses options(digits = 5).

Version 0.10-0

* Release: Version 0.9-11 plus a minor documentation fix was released to CRAN 2 Jan 2013.

Version 0.9-11

* Was loading compiled code both via the package namespace and a .onAttach() call. The latter was removed. Reported by Kurt Hornik.

Version 0.9-10

* stdError: Several changes and enhancements:
    Calculation assumed weights summed to 1. New formula as described in Simpson (2012) is now used. (Reported by Steve Juggins)
    Now have a choice whether to use the weighted SD or not. For predictions based on the mean of the k-closest analogues it would be odd to then use a weighted SD to compute the standard error.
    Gained a print method.
* caterpillarPlot: can now be called by the shorter name caterpillar().
* plot.logitreg: bug would cause plotting to fail if you plotted a single model.

Version 0.9-9

* prcurve: added a print method.
* Stratiplot: y-axis padded by 4% of range as per base graphics default behaviour.
* plot.wa: don't rerun example(wa) just to plot.
* caterpillarPlot: tolerance ranges for taxa drawn with lwd = 2.

Version 0.9-8

* panel.Stratiplot: `type = "h"` now drawn using `lwd = 3` and with `lineend = "butt"`. The line width can be controlled by new argument `lwd.h`.

Version 0.9-7

* Stratiplot: In 0.9-5 a change was made to stop stripping NAs when using the formula interface. This means a way of dealing with NAs in the default plotting method is required, especially when plotting using lines or polygons, as Lattice will honour the NA and draw lines and polygons with gaps.
  Stratiplot.default gains argument `na.action` which defaults to `"na.omit"`. Note this is different to the formula method where we *want* NAs to propagate through to the default method for plotting.
  This change allows for datasets collected on different sediment intervals from the same core to be combined in a single diagram.

Version 0.9-6

* caterpillarPlot: now no longer draws a box round the plot. Bug fixed so that the data.frame method correctly identifies the `env` argument to label the plot with. Updated help file.

Version 0.9-5

* Stratiplot: the formula interface was stripping rows with NAs. This wasn't intended but was the result of not implementing all of the standard non-standard evaluation idiom used by many of R's modelling functions. This is now fixed and the default for the `na.action` argument has been changed to `"na.pass"`. The default `Stratiplot()` method was working as expected.
* Vignette: typo fixed (reported by Marta Rufio).

Version 0.9-4

* logitreg: the returned object has changed. The list of logistic regression models is now returned as component `models`.
  New methods `fitted()` and `predict()` provided for objects of class "logitreg" compute the fitted probabilities for the training set samples and for new (e.g. fossil) samples respectively. The probabilities are in respect to the analogue-ness of samples to the groups in the training set (e.g. vegetation biomes in the case of pollen data).
  These changes allow an analysis similar in spirit to that of Gavin et al (2003, Quaternary Research 60; 356--367) in their Figure 8. Here though logistic regression fits are used rather than the ROC method they employ.
  Similar methods could be provided for objects fitted by `roc()` but require a little more thought about how to model the likelihood ratios derived from the ROC curves.

Version 0.9-3

* wa: deshrinking via a monotonic cubic regression spline is now available via `deshrink = "monotonic"`. This uses functions from the *mgcv* package of Simon Wood and as a result, *analogue* now Depends on that package too. The exact nature of the dependency may change before 0.10 is released.
  This idea goes back to ter Braak & Juggins (1993; Hydrobiologia *269/270*, 485--502) and Marchetto (1994; Journal of Paleolimnology *12*, 155--162), but the implementation here uses monotonic constraints after Wood (1994; SIAM Journal on Scientific Computing *15*(5), 1126--1133) and follows Steve Juggins' implementation, borrowing code from `?pcls` in *mgcv*.
* predict.wa: example was enclosed in \dontrun{} without reason. This example is now run.

Version 0.9-2

* wa: small tolerances can now be replaced by the mean tolerance of the set of tolerances that are not small.
* splitSample: several bug fixes and sanity checks.

Version 0.9-1

* splitSample: new function to sample a test set from across an environmental gradient by breaking gradient into a series of chunks and sampling approximately equally from within each chunk.

Version 0.9-0

* caterpillarPlot: new function that draws a caterpillar plot of species WA optima and tolerances. Methods for data frames and wa() fits are available alongside the default method.

Version 0.8-2

* Dependencies: analogue now requires R >= 2.15.0.
* Replaced remaining instances of `.Internal`; now use `.colSums` and `.rowSums` from R 2.15.0.
* Deleted jss.bst from inst/doc.
* Vignettes: these are now in vignettes not inst/docs.

Version 0.8-1

* cma: if cutoff meant that all analogues returned for all sites, code would return an array instead of the usual list. This is now fixed in all methods.
* Replaced instances of `.Internal(sample(....))` with `sample.int(....)` at request of Brian Ripley.

Version 0.8-0

* Updated Example test checks and packaged for release to CRAN Jan 11, 2012.

Version 0.7-7

* mat: new argument `kmax` can be used to limit the number of analogues considered as models when fitting MAT transfer functions. By default, `mat()` considers models with 1 through to n-1 analogues (n = number of sites). `kmax` can control this upper limit which will speed up fitting models, especially for large training sets.
  Invariably one wouldn't want to average over entire training sets to produce predictions, or even over large numbers of analogues. As such I may set an upper limit for the default value of `kmax` before this is released to CRAN.
* cumWmean, cummean: as a result of the above addition of `kmax`, these two functions now take a `kmax` argument also. The default behaviour is unchanged however.
* chooseTaxa: `type = "OR"` was not working due to a typo. It returned the same as `type = "AND"`.

Version 0.7-6

* Stratiplot: Handling of absolute data types was broken. Fix applied that should allow this to work if there are only absolute scale variables or a mix of relative and absolute data. All relative data should be unaffected.
* panel.Stratiplot: gains arguments `gridh` and `gridv` which control the number of horizontal and vertical grid lines used on each panel. These correspond to the `h` and `v` arguments of `panel.grid` in the Lattice package. The default is `-1` for both, which attempts to align the grid lines with the tick marks.

Version 0.7-5

* weightedCor: implements one of the tests from Telford & Birks (2011, QSR) based on the weighted correlation of WA optima and constrained ordination species scores. Has a plot method.
* rdaFit: Non-user (currently) function that implements RDA without all of the overhead of vegan::rda.
  As such it doesn't compute PCA axes and does not return all the components described by ?cca.object in package vegan. This function is used principally in weightedCor(). Has a scores() method.
  rdaFit() is not documented as the exact details of the function and its capabilities remain to be determined.

Version 0.7-4

* gradientDist: new function to extract locations along an ordination axis. Methods for prcurve() and cca().
* varExpl: new function to extract the amount of variance explained by ordination axes. Currently methods for prcurve() and cca() are available.
* Namespace: analogue now has an explicit name space in preparation for R 2.14.0-to-be. Hence analogue now depends on Vegan >= 1.17-12.

Version 0.7-3

* pcr: coef(), fitted(), residuals(), eigenvals(), performance(), and screeplot() methods added.

Version 0.7-2

* pcr: new function pcr() performs principal components regression. Designed to allow transformations in the spirit of Legendre & Gallagher (2001) that allow PCA to be usefully applied to species data.

Version 0.7-1

* crossval: new function to perform leave-one-out, k-fold, n k-fold, and bootstrap cross-validation on transfer function models. A method for wa() models is provided.
* tests: package now has a test that the examples continue to return correct output.

Version 0.7-0

* timetrack: new function to passively project sediment core samples within an ordination of training or reference set samples. Both unconstrained and constrained ordinations are supported using the Vegan package. 'fitted' and 'plot' methods are available.
* prcurve: new function to fit principal curves to sediment core samples. A 'plot' method is also provided. The function uses functionality from the princurve package, which is now a dependency.
  Several support functions are also provided; 'smoothSpline' is a wrapper to 'smooth.spline' for fitting splines to individual species in order to fit the principal curve.
  'initCurve' implements several methods for initialising the principal curve.
* Stratiplot: if 'zones' are supplied, a legend on the right-hand side of the diagram can be drawn by setting argument 'drawLegend' to TRUE (the default). Currently, only simple blocks that demarcate the zone boundaries are drawn and labelled using argument 'zoneNames'.
  First attempt to allow both relative (percentages or proportions) and absolute variables, or mixtures thereof, in a single plot. The user is free to specify which variables should be treated as relative or absolute, and variables marked as absolute will be drawn with fixed-width panels, the size of which can be controlled via argument 'absoluteSize' (default is 0.5 * largest panel width). Consider this functionality unstable at the moment.
* residLen: was not 'join'-ing the training set and passive data correctly and would fail if species were found in one but not the other data set.
* tran: improvements to the underlying code.
* distance: resilience to NA in "gower", "alt.gower", "mixed".
* cma: added methods for 'mat' and 'predict.mat' objects. These allow you to retrieve the k-closest analogues for training set and prediction data respectively.
* dissimilarities: new method for 'mat' objects.
* datasets: package datasets have been resaved with optimal compression determined via resaveRdaFiles(). This has reduced the package tarball size considerably. As a result, however, analogue now requires R version 2.10.0 or later.
* predict.wa: bug in bootstrap and k-fold CV methods when tolerance down-weighting was used.
* fixUpTol: erroneous error criterion would cause CV of WA models with tolerance down-weighting to stop with an error.
* waFit: new function that encapsulates the main WA computations. This is currently used by wa(), with the intention of being used in all functions that compute WA transfer function models.
* Examples: Streamlined some further examples to use Imbrie & Kipp data set, and to not re-run the same code again. Improves package check times by a second or two on my PC.

Version 0.6-26

* abernethy: New data set containing the classic Abernethy Forest data of Birks and Mathewes (1978).
* Stratiplot: Preserves the names component as far as is possible, even to the extent of processing the names after the manipulations arising from the formula interface.
  Bug in padding of the y-axis now fixed; default is to add 1% of the y-axis range to the y-axis limits specified.
  Bug in computing length of variable labels when 'strip = FALSE' now fixed.
* panel.Stratiplot: Add capability to draw zones on stratigraphic plots via new argument 'zones' which takes the numeric levels of the zone boundaries on the scale of the plot y-axis. How the zone markers are drawn can be controlled via several graphical parameters. See ?panel.Stratiplot.
* chooseTaxa: Explicitly preserves row and column names.
* DESCRIPTION: prematurely added princurve as a dependency in previous version.

Version 0.6-25

* chooseTaxa: new function to select species on basis of number of occurrences and maximum abundance. Function is an S3 generic with a default method.

Version 0.6-24

* Dependencies: package now depends on package 'grid'.
* Stratiplot: gains ability to draw variable labels above the plot panels so that the plots conform to common standards. If you prefer the 'strips' of Lattice plots, set 'strip = TRUE' to get the old behaviour.
  Stratiplot was fixing the min(ylim) value at 0 and contained redundant calls to set the y-axis limits. The behaviour has been rationalised and a new 'ylim' argument added. The default behaviour uses the range of the y-data for 'ylim'.
* panel.Stratiplot: fix warning messages (from Grid) due to inappropriate colour specification. Reference lines in Stratiplot now plot correctly again.
* plot.roc: was resetting the plotting region at the end of plotting even when there was no need to do so.
* residuals: Residuals were defined as \hat{x}_i - x_i to match fitted vs. observed scatterplots. Definition of residuals in wa() and related functions has been changed to the more common definition of x_i - \hat{x}_i. Reported by Andreas Plank and Steve Juggins.
* plot.wa: Following changed definition of residuals, plot.wa() now plots observed values on the y-axis and fitted values on the x-axis for 'which = 1'.
* summary.predict.mat: print method was incorrectly extracting the model estimates for training set samples.
* predict.wa: fix minor bug with CV when tolerance DW was used.
* Package: reduced package check time in examples, by using the Imbrie & Kipp data.

Version 0.6-23

* tran: 'rootroot' transformation was same as 'cuberoot' from changes made in r140. Now fixed.
* wa: 'formula' method was not passing tolerance-related arguments to default method. As such, the newer code to handle small tolerance values was not being invoked when using the formula interface to wa(). Code was also tidied a bit.
* fixUpTol: was inappropriately matching one of its arguments.
* roc: In some circumstances, the generation of the points at which the ROC curve was evaluated resulted in more points than the other statistics. Fixed to use the points established by 'table', used to generate these other statistics. This affected the plot method.
* plot.roc: Allow user to specify the line types used to draw the plots. Two line types can be specified for plots comparing analogue with non-analogue statistics.
  New argument 'abline.lty', which defaults to "dashed", controls plotting of the optimal ROC dissimilarity threshold.
* plot.minDC: argument 'lty.quantile' was not being used by the graphical function that drew the quantiles of the pairwise D[ij]s.
* plot.bayesF: New argument 'abline.lty', which defaults to "dashed", controls plotting of the optimal ROC dissimilarity threshold.
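
Note on the residual definition changed in 0.6-24: the two conventions differ only in sign, which a few made-up numbers make concrete. This is a base R sketch with hypothetical values; it does not call wa() or any other package function.

```r
# Made-up observed and fitted environmental values for three samples.
observed <- c(10.2, 12.5, 9.8)
fitted_x <- c(9.9, 13.0, 10.1)

res_old <- fitted_x - observed  # earlier definition: \hat{x}_i - x_i
res_new <- observed - fitted_x  # current definition: x_i - \hat{x}_i

# The two sets of residuals are mirror images of each other.
all.equal(res_new, -res_old)
```

Under the current definition a positive residual means the model under-predicted the observed value.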

Version 0.6-22

* tran.formula: bug in my implementation of the standard non-standard evaluation technique used within the formula method for 'tran'. Diagnosis and fix by Prof. Brian Ripley.

Version 0.6-21

* distx.c, distxy.c: Warning due to non ISO C-compliant 'mistake' in experimental code for Gower's Mixed coefficient.

Version 0.6-20

* Stratiplot: was not respecting the sort variable under certain conditions.
* panel.Stratiplot: typo in Rd file was causing warnings in R version 2.10.0-beta.

Version 0.6-19

* join: gains ability to return the inner join of the supplied data frames. This is the intersection of the set of variables in the supplied data frames, the set of variables common to the supplied data frames.

Version 0.6-18

* join: new arguments 'type' and 'value'. 'type' controls which join is performed. Options are (currently) "outer" (default) and "left". The left join is used to prepare two or more data sets for ordinating the first and subsequently passively projecting the other data sets into this ordination. The outer join is used to prepare data for transfer functions such as MAT and WA. 'value' allows the user to supply a numeric value to be used to replace 'NA's.

Version 0.6-17

* predict.wa: deshrinking method was not being honoured when expanding predictions.
* Stratiplot: gains option ('rev') to reverse the y-axis limits.
  Can now also sort/order the columns of 'x' (the species or variables) as weighted averages of 'y' (to emphasise the change in composition along 'y'), or using a supplied variable. The latter is useful if you want to sort the variables by their optima with an environmental variable.
  Now also provides a guess as to the y-axis label if none is supplied.

Version 0.6-16

* roc: For large problems the calculation of AUC and its standard error could overflow the largest number R currently handles.
  roc() now has two new arguments, 'thin' and 'max.len', which allow the number of points on the ROC curve to be thinned to a smaller number, which should allow the computations to be performed. The original problem was reported by Diana Stralberg.

Version 0.6-15

* tran: new 'formula' method allows simple selection or exclusion of variables from the set to be 'tran'sformed.

Version 0.6-14

* tran: new transformation and standardization methods for the power and 4th root transformation, the log ratio transformation for compositional data, plus row (sample) centring.

Version 0.6-13

* predict.wa: Now handles WA with tolerance down-weighting for bootstrap CV and benefits from the changes introduced in previous version.

Version 0.6-12

* deshrink, deshrinkPred: New utility functions for deshrinking WA estimates. These replace the '*.deshrink' and 'deshrink.pred' internal functions used to this end to date. This provides a more extensible solution.

Version 0.6-11

* tran: now converts input data to matrix using 'data.matrix' which deals with factor variables appropriately.
* predict.wa, wa, WATpred: Now use faster C code for computing predictions from WA models with tolerance down-weighting.

Version 0.6-10

* predict.wa: Predictions with tolerance DW now works for CV = "LOO".
* wa.formula: Argument list updated to match wa.default.
* fixUpTol: New internal utility function that encapsulates code to modify working tolerances within WA model fitting.
* w.tol: Internal function w.tol now uses a faster C version of the code.

Version 0.6-9 (Closed Sun 7 June 2009)

* predict.wa: Predictions without CV can now be made for WA models fitted using tolerance DW.
* wa: Now returns the options for tweaking tolerances as part of model object in component 'options.tol', which is a named list.
* Utility functions: WApred() and WATpred() internal functions for predictions using WA or WA with tolerance DW.
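
Note on deshrinking (see `deshrink` under version 0.6-12): weighted averaging pulls estimates towards the mean, and a deshrinking regression stretches them back out. The sketch below shows the textbook inverse and classical variants in base R. The helper `deshrink_sketch`, its arguments and the toy values are hypothetical, and it makes no claim about the arguments or internals of the package's own deshrink().

```r
# Hypothetical helper: textbook inverse and classical deshrinking.
# env     = observed environmental values for the training samples
# wa_init = initial (shrunken) weighted-average estimates for the same samples
deshrink_sketch <- function(env, wa_init, type = c("inverse", "classical")) {
  type <- match.arg(type)
  if (type == "inverse") {
    b <- coef(lm(env ~ wa_init))      # regress observed on initial estimates
    unname(b[1] + b[2] * wa_init)
  } else {
    b <- coef(lm(wa_init ~ env))      # regress initial estimates on observed
    unname((wa_init - b[1]) / b[2])   # invert the fitted line
  }
}

# Toy values in which the initial estimates span a narrower range than env.
env     <- c(2, 4, 6, 8, 10)
wa_init <- c(4.9, 5.4, 6.1, 6.5, 7.1)

inv <- deshrink_sketch(env, wa_init, "inverse")
cla <- deshrink_sketch(env, wa_init, "classical")
range(wa_init); range(inv); range(cla)
```

Both variants widen the range of the initial estimates; the classical form stretches at least as far as the inverse form, since the ratio of their spreads is 1 / r^2.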

Version 0.6-8 (Closed Mon 5 May 2009)

* residLen: new function to compute squared residual length diagnostic for passive samples in a constrained ordination. Used as a test of whether core samples are well fitted in a transfer function model. Several utility functions to compute fitted values from an ordination and corresponding residual lengths are provided, which may be useful for authors of other functions.
  'plot' and 'hist' methods produce density plots and histograms for 'residLen' objects using base graphics.
  'densityplot' and 'histogram' methods produce the same plots for 'residLen' objects using Lattice graphics.
* stdError: new function to compute the weighted standard deviation of the environment values for the k closest analogues in MAT models. This can be used as an uncertainty measure for MAT fitted values or transfer function predictions. Methods are available for 'mat' and 'predict.mat'.
* predict.mat: now returns the dissimilarity matrix between the training set samples and samples in 'newdata'.
* getK: new method for 'predict.mat'.
* CITATION: updated as per request from Kurt Hornik and CRAN.

Version 0.6-7 (Closed Mon 13 Apr 2009)

* optima, tolerance: new methods to coerce objects of these classes to data frames.
* distance: method = "kendall" was incorrectly computing the min of the x and y components in the dissimilarity.

Version 0.6-6 (Released to CRAN: Wed 25th Feb 2009)

* optima, tolerance: New print methods for both functions. Returned objects now have additional attributes.
* join: new methods for 'head' and 'tail' to return the first/last few rows from each of the joined data sets. Handles cases where 'split = FALSE' by calling the 'data.frame' method.

Version 0.6-5

* wa: now computes tolerances and can perform tolerance downweighting in WA transfer functions. Contains several options to manage working tolerances used in the WA computations, including how to deal with species that have very small (narrow) tolerances.
  The actual tolerances and working values are returned from wa().
* optima, tolerance: two new user visible functions to compute weighted average optima and tolerance ranges from species abundances and associated environmental data.

Version 0.6-4

* Datasets: Version 1.7 of the North American Modern Pollen Database has been added to 'analogue'. The data are contained in four datasets: Pollen, Biome, Climate and Location, containing the pollen counts on 134 taxa, vegetation classification, 32 climatic variables and location (latitude/longitude) respectively on 4833 sampling locations in North America and Greenland.
* plot.logitreg: adjusted the correction to the degrees of freedom in the calculation of the confidence intervals.
* roc: now returns the observed prior probability that two samples are analogues for each group. Also returns the index of the point along the ROC curve where the slope of the curve is maximal (the point corresponding to the optimal dissimilarity).
* bayesF: now returns the posterior probabilities as well as posterior odds of true analogues and true non-analogues for points along the ROC curve. Documentation of the object returned from 'bayesF' has been updated to match the changes introduced in version 0.6-0.
* wa: documentation for wa did not state that the 'tol.dw' argument was currently ignored. Tolerance down weighting is not currently implemented in wa and the documentation now states this clearly. Reported by Andreas Plank (R-Forge Bug ID 287).

Version 0.6-3

* logitreg: new function to evaluate the probability that two samples are analogues conditional upon the dissimilarity between the two samples. Essentially fits logistic regression models to the data used to produce the statistics drawn on a ROC curve. Methods for 'summary' and 'plot' are currently available.
* analog: was converting 'x' and 'y' objects to matrices before calling distance(). This broke handling of factor variables in 'method = "mixed"' with distance().
* distance: objects created by distance() now have an explicit class "distance", and inherit from class "matrix".
* roc: component 'statistics' has reordered columns.
* plot.roc: superficial changes to ordering of plot components.
* Depends: Package now depends on MASS. The dependency on brglm is no longer needed.

Version 0.6-2

* Stratiplot: new graphics function for plotting stratigraphic diagrams, with 'default' and 'formula' methods. Uses the Lattice package for plotting.
* panel.Stratiplot: lattice panel function for drawing stratigraphic diagrams.
* panel.Loess: modified version of standard lattice panel function 'panel.loess' for drawing LOESS smooths on stratigraphic diagrams.
* Documentation: fixes and tweaks to several Rd files to fix parse errors caught with the new Rd parser coming in R 2.9.0.

Version 0.6-1

* ImbrieKipp: made the training set environment and sediment core data set names easier to manage. The three environmental variables are now in separate data sets ('SumSST', 'WinSST', and 'Salinity') as named, numeric vectors of the same name as the data sets.
* mat: example now uses the ImbrieKipp data, resulting in a large speed-up.
* Requires: package now depends on package 'brglm' for use in modelling probability of analogue or not. For future 'logitReg()' function.

Version 0.6-0

* roc: new version of roc, which correctly computes the no-analogue part of the ROC analysis. Now roc returns information on individual groups as well as an overall or combined ROC curve for the data. The number of close analogues to use in computing the ROC curve can now also be specified. These changes have altered bayesF() and the plot methods for bayesF and roc. bayesF now computes Bayes factors for all groups as well as for the overall ROC analysis. The plot method for bayesF will now plot the Bayes factors for all groups or for a single, named group. plot.roc has been updated to work with the new roc object, and by default, the plots refer to the overall ROC curve.
  Which group is plotted is controlled by new argument 'group'. There is now a summary method for roc that displays summary data for the individual ROC curves.
* fuse: new function to fuse (combine) two or more dissimilarity objects.
* ImbrieKipp: New data sets containing the classic Imbrie and Kipp (1971) training set.
* tran: tran was clobbering dimnames. These are now preserved.
* .first.lib: package startup now uses packageStartupMessage() to display the startup message.
* distance: speed up in calculating range and maximum statistics for those dissimilarity coefficients that incorporate these terms. distance() now also returns the dissimilarity coefficient used as attribute "method".
* mat: was converting 'x' to matrix too early, which upset some of the DC methods. mat also now passes arguments in '...' on to distance. This allows additional options required for some dissimilarity coefficients to be provided.
* print.mat, summary.mat: quantiles of dissimilarities are now much more efficiently calculated.
* plot.mcarlo: now works correctly for both types of plot, and computes ranges so that histogram and density estimates fit into plotting region.

Version 0.5-3

* tran: new function to apply common transformations and standardizations applicable to palaeoecological data.
* predict.wa: added k-fold ("nfold") cross-validation.

Version 0.5-2

* wa: classical deshrinking did not work, but returned the original 'env' variable. The current implementation is a bit inelegant.
* wa: implemented deshrink = "none" just for comparison and for connoisseurs.
* wa: deshrink = "expanded" is now public and user-callable.
* join: now checks for inherits(foo, "data.frame") to confirm if all objects to join are (or inherit from) data frames. This allows join to work on objects of class "join" when split = FALSE is used.

Version 0.5-1

* New developer: Jari Oksanen has joined the analogue team!
* predict.wa: was not returning some attributes of the WA model fitted.
  This was causing some print and other methods to fail.
* expand.deshrink: implemented simple expansion of variances as a deshrinking method a bit like in vegan:::wascores. The function has a similar API to other deshrinking functions: takes only WA and obs values as input, and returns expanded scores and two linear coefficients to perform the deshrinking. Slope is given by the expansion ratio and intercept is defined so that the line goes through the (mean(x), mean(y)) point. The vegan function equalizes weighted variances, but this function only uses simple variances: incorporating weights would mean changing the call API. At the moment the function is not yet used anywhere, but just sits there waiting for possible use.
* wa, mat models: residuals are now calculated as predicted - observed. This reverses the sign from the previous version. There was inconsistency in the way residuals were being calculated in MAT models and help functions. Now resolved.
* plot.mat: now plots the absolute value of the average or maximum bias statistics, rather than the actual value. This ensures that the "optimal" model is the one with the lowest value on the plot.
* Internal: The way deshrinking was handled internally has been substantially streamlined, via the *.deshrink and deshrink.pred internal functions.

Version 0.5-0

* wa: new function wa() with default and formula interfaces for fitting Weighted Averaging transfer function models. plot, fitted, residuals, coef, minDC, performance (see below), predict and bootstrap methods are provided.
* performance: new extractor function to retrieve model performance statistics. Currently, methods provided for wa, predict.wa, and bootstrap.wa objects.
* reconPlot: new method for predict.wa objects.
* RMSEP: new method for bootstrap.wa objects.
* Vignette: analogue now has a vignette covering the analogue methods implemented in the package. This is based on the paper Simpson G.L. (2007) Analogue Methods in Palaeoecology: Using the analogue Package.
  Journal of Statistical Software, 22(2), 1--29.
* plot.minDC: Bug in drawing the axis for the quantiles.

Version 0.4-4

* Updated the Version: field in DESCRIPTION to meet new standards introduced in R 2.6.0 for licence files. Reported by Kurt Hornik.
* join() now returns an object of class "join" or c("join", "data.frame") depending on argument split.
* distance() is now generic and has a new method for objects that inherit from class "join".

Version 0.4-3

* distance() would work even if factors in x and y had different levels. This would result in incorrect dissimilarities for method = "mixed". distance() now issues an error if one or more factors have different levels in x and y. Use join() to get correct factors and levels. Reported by Birgit Lemcke.
* join() was not correctly merging data frames with factors. Factors were converted to internal values, not levels, via sapply(). Now uses data.frame(lapply(...)) to keep factors intact.
* distance() was not setting the row / column names in the case where both x and y were supplied.
* distance() was incorrectly trying to set row / column names in the case where a single dissimilarity was being calculated.
* Documentation fixes.

Version 0.4-2

* New fitted method for bootstrap.mat. Returns the bootstrap fitted values for the training set.
* getK<- changed to setK<- as this makes much more sense. The extractor function getK remains the same.
* Fixed a couple of bugs in residuals.bootstrap.mat and print.residuals.bootstrap.mat that affected how the results were printed. Now does what it was supposed to do.
* Fixed minor bug in the code that updated the call in analog.
* Added automagical printing of version number on loading of the package.
* Numerous documentation tweaks and updates have been applied, which simplify package checking and which provide better documentation of certain complex returned objects.

Version 0.4-1

* Fixed silly bug in RMSEP.bootstrap.mat.
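
The join() factor bug described under Version 0.4-3 comes from a general base R behaviour that is easy to reproduce. The sketch below uses a made-up data frame, not code from join(): sapply() simplifies a mixed set of columns to a numeric matrix, replacing factor levels with their internal integer codes, whereas data.frame(lapply(...)) keeps each column's class.

```r
# Made-up example data: one numeric column and one factor column
df <- data.frame(count = c(3, 0, 7),
                 lake  = factor(c("acid", "neutral", "acid")))

# sapply() simplifies to a numeric matrix; the factor is reduced
# to its internal integer codes and the levels are lost
codes <- sapply(df, function(x) x)

# lapply() returns a list of unchanged columns, so rebuilding the
# data frame from it keeps 'lake' as a factor with its levels
kept <- data.frame(lapply(df, function(x) x))

str(codes)
str(kept)
```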
Version 0.4-0

* Changed the components of returned objects from mat, bootstrap.mat, predict.mat. This has knock-on effects for several other functions. These have been updated to work with the new objects/components.
* Speeded up bootstrap and predict.mat considerably.
* Speeded up distance for some coefficients and where 'y' is missing, by using dist() and vegdist() from package 'vegan'. Dependency now on 'vegan'.
* k() and k()<- renamed to getK() and getK()<-.
* getK.bootstrap.mat is now able to extract the k for the model or the predictions. In either case, the bootstrap or the model k can be selected. See ?getK.
* New argument 'split' in join(), defaults to TRUE. join can now unsplit the merged data sets back into individual data frames, though now with common columns (i.e. species).
* Bug in cummean() and cumWmean() meant a site could be selected as an analogue for itself; now fixed.
* Bug in mcarlo.mat and mcarlo.analog meant it was not reading the stored dissimilarity method correctly.
* maxBias() speeded up through use of tapply() instead of aggregate(). Results in speed ups for mat() and bootstrap().
* Screeplot() renamed screeplot(). Now works off the screeplot generic function in R >= 2.5.0.
* screeplot method for bootstrapped models now draws lines in different colours.
* As a result of the adoption of screeplot(), analogue now depends on R >= 2.5.0.
* cma.analog and its print and summary methods changed so that they return an object even if all samples have no close modern analogues.
* New RMSEP method for mat objects. Returns the LOO CV RMSEP for a MAT model.
* Fixed minor bug in analog.default and how it recorded the call.

Version 0.3-4

* New roc method for "analog" objects.
* New mcarlo method for "analog" objects.
* mat() now has a formula method and interface.
* cma() is now more efficient, but does not return the same object components as before.
  $distances and $samples have been replaced by $close, a list of the close modern analogues for each fossil sample, with each component a named vector of close modern analogues and their distances.
* A much changed reconPlot(), with a now-working default method that is used by other reconPlot methods. reconPlot.predict.mat updated to reflect changes.
* Reverted the class of bootstrap() to "bootstrap.mat".
* Removed Encoding: UTF-8 from package DESCRIPTION file.
* Cut down some of the examples as they now take a while to run with the larger data sets and because a vignette is in the works they no longer need to be so comprehensive.

Version 0.3-3

* Updated the example data sets to more complete versions. See ?rlgh, ?swapdiat and ?swappH for more details.
* Changes to predict.mat to return minimum DC's and quantiles of training set DC's.
* Minor tweaks to plot.mat - now displays a bit more info such as 'k' for chosen model and whether it is weighted or not.
* New function minDC() with print and plot methods, for extracting and plotting minimum dissimilarity for fossil samples. A default method and methods for classes "predict.mat" and "analog" are provided.
* New function RMSEP for extracting or calculating RMSEP for transfer functions.
* Modified output from print.analog, print.cma to be more compact (former) and more descriptive (latter).
* cma() now returns the number of analogues per sample as close or closer than argument "cutoff". cma() also now automatically determines "cutoff" if none supplied.
* plot.cma() was plotting quantile lines for all x$quants whether they were greater than x$cutoff or not. Fixed to plot only x$quants <= x$cutoff. A check is made to determine if any(x$quants <= x$cutoff), and plotting of the quantile lines is suppressed if FALSE.
* If 'y' was missing from distance() it was checking for and deleting any species (columns) that were all zero.
* plot.mat() was not using the stored value of k in its plots.
  Now that k() can change the stored value, plot.mat should use this rather than calculate its own k.
* plot.roc() was not drawing 'which = 3' correctly.
* Fixed up the citation file.

Version 0.3-2

* New method for Screeplot for objects of class "bootstrap". Plots apparent and bootstrap statistics in screeplot format.
* Began to generalise bootstrap. bootstrap.mat now returns an object of class "bootstrap". print, summary, residuals and print.summary methods for "bootstrap.mat" have been changed to methods for "bootstrap". This is all in preparation for adding other transfer function models to analogue in later versions, for which bootstrapping is also used. WARNING: the object returned from bootstrap.mat has changed subtly and will change periodically as new transfer function models are added to allow for differences between models. The ultimate aim is to have a reasonably generic object "bootstrap" regardless of the transfer function model used.

Version 0.3-1 - The New Year edition

* Added 'stats' and 'graphics' to Depends: in the DESCRIPTION. Requested by the CRAN Maintainers.
* New generic functions 'k' and 'k<-' for extracting and replacing the number of analogues stored in models. Currently for 'mat' objects only.
* New dissimilarity coefficient in distance(), for Gower's general coefficient of similarity (expressed as a distance/dissimilarity) for mixed mode data, including factors. Use method = "mixed".
* Realised that there were a number of different variants on Gower's coefficient out there. To be consistent with package 'vegan', method = "gower" now computes the same coefficient as vegan. The alternative formulation used in Version 0.3-0 and earlier is now available as method = "alt.gower".
* distance() now works with missing values for methods "gower", "alt.gower" and "mixed" only.
* Renamed ToDo file to TODO, and updated the information enclosed.
* Add acknowledgments file THANKS.
* Numerous documentation fixes.
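
As background to the "mixed" method added in Version 0.3-1, here is a minimal sketch of Gower's general coefficient for two samples with quantitative and factor variables, skipping variables where either value is missing. The function `gower_mixed`, its arguments, and the example data are hypothetical. The sketch reports 1 - S as the dissimilarity; several variants of the coefficient exist (as the entry above notes), so this is not necessarily what distance() computes.

```r
# Hypothetical helper, not part of analogue: Gower's general coefficient
# for two samples, returned as the dissimilarity 1 - S.
# a, b:   one-row data frames with identical columns
# ranges: named numeric vector giving the range of each quantitative column
gower_mixed <- function(a, b, ranges) {
  s <- w <- numeric(ncol(a))
  for (k in seq_along(a)) {
    x <- a[[k]]
    y <- b[[k]]
    if (is.na(x) || is.na(y)) next      # weight stays 0: variable is skipped
    w[k] <- 1
    if (is.factor(x)) {
      # factors score 1 on a match, 0 otherwise
      s[k] <- as.numeric(as.character(x) == as.character(y))
    } else {
      # quantitative variables score 1 minus the range-scaled difference
      s[k] <- 1 - abs(x - y) / ranges[[names(a)[k]]]
    }
  }
  1 - sum(w * s) / sum(w)
}

# Made-up example data with one missing value
df <- data.frame(ph    = c(5, 7, 6),
                 depth = c(10, NA, 30),
                 lake  = factor(c("a", "b", "a")))
ranges <- c(ph = 2, depth = 20)

d13 <- gower_mixed(df[1, ], df[3, ], ranges)
d12 <- gower_mixed(df[1, ], df[2, ], ranges)   # 'depth' skipped (NA)
```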
Version 0.3-0

* First version released to CRAN.
* Minor documentation fixes prior to release.
* Fixed CITATION file, which had old package name. A hangover from version 0.1-5.

Version 0.2-7

* Added new function bayesF() to calculate Bayes factors, or likelihood ratios from the results of roc(). Includes simple print and plot methods, the latter being used in plot.roc to provide a 5th plot of roc results.
* Added a new plot to plot.roc() - showing the probability of analogue (A+). This is now the 4th plot drawn by default, replacing the likelihood ratio plots, which are harder to interpret.
* Documentation tweaks to many functions.
* Removed attributes from returned objects of functions analog(), cma(), mat(). Former attributes are returned as part of the returned object now. Updated all functions that made use of these attributes.
* The analog method of cma() has new argument "prob"; a vector of probabilities with values in [0,1], for which quantiles of the distribution of training set dissimilarities will be calculated.
* plot.cma() has new arguments; "draw.quant", "col.quant" and "lty.quant". These determine whether quantile lines are drawn on the stripchart, and the colour and line type used if they are drawn.
* Restored dimnames to some elements of the returned object from bootstrap().
* Streamlined print.summary.cma(), which now uses print.cma() instead of duplicating code.
* Fixed print.summary.predict.mat to return the training set assessment.
* Fixed print.predict.mat - wasn't displaying the bootstrap k.
* Altered summary.analog and its print method. Summary no longer uses attributes to store information that is subsequently printed.
* Added a package overview help page - access using: package?analogue

Version 0.2-6

* Added new dissimilarity method "gower", for Gower's coefficient. Note this version does not implement the mixed version of Gower's coefficient.
  A future version of distance() will include method "gowerMixed" for the mixed data version (i.e. for mixed +/-, factor and quantitative data).

Version 0.2-5

* Completely rewrote the mat method for roc(). Based on Programmer's Niche article by T. Lumley in R News (Vol. 4(1) 33--36). Uses the optimisations in the article to calculate the ROC curve itself. Now much faster, and produces a more compact return object than before.
* Added a 4th plot to plot.roc(), which draws two definitions of the slope of the ROC curve as likelihood ratios.
* Added documentation for plot method of roc(), including descriptions of what each plot shows.
* New function reconPlot with default and predict.mat methods. Draws stratigraphic plots of reconstructions, with or without error bars.
* mcarlo() and its 'default' and 'mat' methods have been largely re-written to make them more efficient. mcarlo.mat() now accesses data from the 'mat' object and calls mcarlo.default(), so only one set of calculations now needs to be maintained.
* New arguments "diag" and "is.dcmat" for mcarlo().
* Added new dissimilarity methods "manhattan", and "kendall" to calculate the Manhattan metric and Kendall's coefficient, respectively, in distance().
* 'method = "information"' was not working correctly if p_{ij} or p_{ik} were zero.
* Minor fix to distance(), allows 'method = "chi.distance"' to work now. Minor tweaks to documentation to add equation for chi^2 distance metric. Still some equations need adding in correct notation.
* Minor updates to documentation and code for analog(), mat() and mcarlo() to reflect additional dissimilarity coefficients now available in distance().
* Fixed some formatting issues in bootstrap.Rd and updated the documentation of the returned object to match code changes in previous versions.
* predict.mat was defaulting to doing bootstrap predictions, which can be time consuming. Default is now to return normal predictions.
  Updates to the example for predict.mat to reflect this change.
* Updated the documentation for predict.mat of the returned object to match code changes in previous versions.
* General update of all documentation pages.

Version 0.2-4

* Reverted the changes to fitted.mat and residuals.mat as these functions no longer worked like similar methods for other classes in R.
* Altered plot.mat to use fitted and residuals methods for mat. Simplified extractions to generate one of the plots considerably. Also reverted changes imposed by fiddling with predict/fitted earlier.
* Minor tweak to distance() to allow it to calculate dissimilarity between two individual samples only. For use in mcarlo() for simulation/permutation of dissimilarities.
* New function mcarlo(), with default and "mat" methods. Experimental functions for simulating dissimilarities in order to determine critical values for various coefficients for use in identifying analogues.
* New function roc(), with default and "mat" methods. Fits Receiver Operator Characteristic (ROC) curves following the framework of Wahl (2005) to identify the critical values of dissimilarity values. Also has a plot method for drawing the actual ROC curves.

Version 0.2-3

* Some issues with predict.mat() and print method associated with fixes for 0.2-2 ironed out. Others remain to be fixed - especially when not bootstrapping; need a consistent object representation.
* fitted.mat now returns fitted values for all possible k-closest analogues. The kth model that minimises the RMSE (Apparent) is returned if a user-supplied k is not given.
* residuals.mat now returns residuals for all possible k-closest analogues. The kth model that minimises the RMSE (Apparent) is returned if a user-supplied k is not given.
* predict.mat and its print and summary methods now work again properly after changes made in 0.2-2.
* summary.mat updated to work with new extractor functions.
* plot.mat updated to work with new extractor functions.
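
The idea behind "fitted values for all possible k-closest analogues" (Version 0.2-3) can be illustrated with a small base R sketch. Everything here is an assumption made for illustration: simulated data, Euclidean distance via dist(), unweighted means, and each sample excluded from its own set of analogues. It is not the package's implementation of fitted.mat.

```r
set.seed(1)
spp <- matrix(runif(40), nrow = 8)   # simulated: 8 samples x 5 "species"
env <- rnorm(8)                      # simulated environmental variable
n <- nrow(spp)

d <- as.matrix(dist(spp))            # Euclidean here; any dissimilarity would do
diag(d) <- NA                        # a sample may not be its own analogue

# For each sample, order the other samples from closest to farthest and
# take the running mean of their env values: row k holds the fitted
# values based on the k closest analogues.
fits <- sapply(seq_len(n), function(i) {
  ord <- order(d[i, ], na.last = NA)   # drops the NA on the diagonal
  cumsum(env[ord]) / seq_along(ord)
})

# RMSE for every k, and the k that minimises it
rmse <- sqrt(rowMeans(sweep(fits, 2, env)^2))
k_best <- which.min(rmse)
```

Here `fits` has one row per k (1 to n - 1) and one column per sample, so choosing a different k afterwards only means reading a different row.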
Version 0.2-2

* bootstrap.mat(), predict.mat() and print and summary methods now fixed to return stats for all k-closest models. Needs docs for bootstrap.mat() updating; currently the reconstructions are commented out.
* join() was dropping the rownames of the joined objects. FIXED

Version 0.2-1

* New function plot.cma() to plot results of a call to cma(). Uses stripchart() currently. Needs to be made more robust and adaptable to larger sample sizes.

Version 0.2-0

* Minor documentation tweaks. Release 0.2-0 ready.

Version 0.1-9

* Added new function residuals.bootstrap.mat() and print method.
* predict.mat() now doesn't set k to be the model with lowest RMSE. If missing(k) in predict.mat(), k is set to NULL and bootstrap.mat will choose k giving lowest RMSEP assessed by bootstrap. If not using bootstrap resampling in predict.mat(), k is still set to the model with lowest RMSE if not supplied.

Version 0.1-8

* Fixed a little bug in predictions for new samples in bootstrap.mat() - was dropping the closest analogue. Uses the newly fixed cumWmean() and cummean() functions and argument "drop = FALSE".
* Fixed up bootstrap.mat() to have a cleaner return object that is easier to maintain and IMHO use.
* bootstrap.mat() now uses new code to evaluate predictions for new samples for all k, to match the previous changes to bootstrap.mat(). Removed extraneous code from previous versions.
* summary.bootstrap.mat() and summary.predict.mat() updated to refer to the new returned object from bootstrap.mat().
* Updated documentation for bootstrap() and predict.mat() and fixed up examples.
* Removed old file analogy-internal.Rd - hangover from older package.

Version 0.1-7

* bootstrap.mat now uses the new code to return all values. The swap example is taking c. 18 secs to run on my laptop (1.8 GHz P3m), with 1000 bootstraps. Not too bad. Final code tidy required then release as Version 0.2-0.
Version 0.1-6

* Prepared ground work for bootstrap.mat to bootstrap for all k, not just user supplied k. Allows you to choose size of MAT model based on bootstrap RMSEP and other stats. Code works in bootstrap.mat() with argument 'boot.train = TRUE'; it just needs the returned object simplifying, removal of old code that duplicates one set of calcs, and methods written to display/plot the results of bootstrap on the training set.
* cumWmean() and cummean() adapted for use in bootstrap.mat() for choosing k. New argument 'drop = TRUE'; controls whether spurious zero distance is ignored or not in calculating cumulative stats. Needed for bootstrapping training set for all k.

Version 0.1-5

* Changed package name to analogue

Version 0.1-4

* Added new distance/dissimilarity coefficient to calculate Chi squared distance, sensu Lebart & Fenelon (1971) [Statistique et informatique appliquees. Dunod, Paris, 426 pp], the distance preserved in correspondence analysis. To use this, use: method = "chi.distance".

Version 0.1-3

* Data set rlgh was incorrectly saved.

Version 0.1-2

* Fixed a serious bug in join(), where rows were getting dropped if they had exactly the same counts in them. Solution provided by Sundar Dorai-Raj - see source for join() for further details.
* join() now accepts any number of data frames as input, not just two as originally. This is as a result of the fix to join() above.
* Updated all examples using join() to match new arguments of join().

Version 0.1-1

* First Development Release