[Bioc-devel] NEWS - July 2007
jmacdon at med.umich.edu
Sun Aug 12 04:01:52 CEST 2007
Removed R_affx_cdf_deprecated.cpp, wasn't needed and it gave
errors on R-2.6.0 due to CHAR changes
o Optimized writeCdfHeader() for memory. For a CDF with
  1,200,000+ units just writing the unit names would consume
  1-1.5GB RAM. Now it writes unit names in chunks keeping the
  memory overhead around 100-200MB.
o Made convertCdf() more memory efficient.
o BUG FIX: The error message in isCelFile() when the file was
  not found was broken.
o Updated to v1.9.2 on BioC devel.
Version: 1.8.1 [2007-07-26]
o Now affxparser install on OSX
Version: 1.7.6 [2007-03-28] (never committed until v1.9.2)
o Modified findCdf() such that it is possible to set an
  alternative function for how CDFs are located.
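The chunked writing of unit names mentioned for writeCdfHeader() above
can be illustrated with a small sketch. This is Python, not the
package's own R/C code, and the function name, the chunk size and the
one-name-per-line output format are assumptions made only for the
illustration. The point is that at most one chunk of names is held in
memory at a time.

```python
import io
from itertools import islice


def write_names_in_chunks(names, out, chunk_size=1000):
    """Write names to `out`, one per line, holding at most
    `chunk_size` names in memory at any time (hypothetical helper)."""
    it = iter(names)
    written = 0
    while True:
        # Pull the next chunk lazily from the iterator.
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        out.write("".join(name + "\n" for name in chunk))
        written += len(chunk)
    return written


# Usage: the generator never materialises all names at once.
buf = io.StringIO()
n_written = write_names_in_chunks(
    (f"unit{i}" for i in range(2500)), buf, chunk_size=1000
)
```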
Yeast ORF IDs were being processed using a logically flawed
mechanism which sometimes would toss out part of the
ID. This is now fixed.
Seth noticed some bad looking bugs, which I have fixed here
and then tested to make sure that the code was not dependent
on any of these bugs to run...
Added lines for new Agilent and GO pkgs.
I believe that this adds the rest of the supported rat,
mouse and human Agilent packages. Next I will add other
I also updated the line/db for the modified GO.sqlite
db. This GO.sqlite file is now an order of magnitude smaller
than it was before.
Added paths for arabidopsis builds.
These appeared to be partially finished. So I
have added code to finish the sqlite portion.
Since they are available now, I have added them
Removed ALLENTREZID, ENTREZID and ENTREZID2GO maps from GO_DB
schema. Version bump.
pkg template names now have the ".DB" suffix instead of just
A few more edits needed on the GO.DB man pages (only
strictly required edits for now, these man pages will really
need to be revisited at some point...)
started fixing createAnnObjs.YEAST2_DB()
Finished fixing createAnnObjs.YEAST2_DB(). Version bump.
added a "Proposal for schema improvements" section
Fixed createAnnObjs.AG_DB(). Version bump
added pkg template for KEGG + a new entry in the master
index file (inst/extdata/ANNDBPKG-INDEX.TXT)
Added missing packages.
Added many new packages for custom arrays. Wherever
possible, these packages have also been labeled for
some rearrangement to the master index file
Added support for the KEGG_DB schema. Version bump.
chipName entry not needed for Qiagen chip
Fixed createAnnObjs.YEAST_DB(). Version bump.
more polishing to the YEAST_DB section
createAnnObjs.YEAST_DB() so now the COMMON2SYSTEMATIC map
uses the right table.
Fixed the ENZYME and ENZYME2PROBE maps for AG_DB
schema. Version bump.
Added missing ACCNUM map for AG_DB schema.
Added the YEAST2.DB pkg template for YEAST_DB-based pkgs
(annotations for yeast chips). Version bump
Added the AG.DB pkg template for AG_DB-based pkgs
(annotations for Arabidopsis chips). Version bump.
Changed package names. New files are also ready.
To coincide with changing all the pkg names, I have also
pushed over pkgs that *have* the new names in them so that
things should hopefully line up without incident.
Propagated new db schema names all over the
place. Reorganized the master index file. Version bump.
Added support for new RODENTCHIP_DB schema.
Added the species field for each species-based pkg to the
Added support for new HUMAN_DB and RODENT_DB
schemas. Version bump.
Some adjustments to the reverse maps created by
createAnnObjs.HUMAN_DB() and createAnnObjs.RODENT_DB().
PFAM and PROSITE maps (IpiAnnDbMap objects) are not
Added man pages to the KEGG.DB template (taken and adapted
Forgot to put the man page for the MAPCOUNTS map in the
KEGG.DB template (this page is inaccurate and will need to
be fixed in _all_ templates).
Forgot to put the man pages for "db_conn" and "db_file" in
the KEGG.DB template. Fixed example for "db_conn" man page
in the GO.DB template.
One more thing to do when we rework our DB schemas.
Changed expected name of the eg map files.
Just changed the expected name of the eg mapping files to
match what is being now produced by the pipeline...
Work on NChannelSet, have to fix the problem with default
"numberofgraphs" and "on" for AffyBatch.
problem of "con" and "numberofgraphs" fixed
problem with "man" fixed
Modifications on the way of splitting the density plots and
on the option to log transform the data.
Mix between boxplots and density plots figure numbers and
outfile argument is now named prefix
subdirectory issues solved, add spatial effect plot for
Add an index in the beginning of the report. Add the qcstat
plot from simpleaffy. Add precisions in the help and the
Small modifications on the background and foreground
modified the file handling code
- getArrayData() can return log-ratios ("M"), average
  log-intensities ("A") or residuals ("residR") or ("residG")
  by changing the which argument.
- plotting functions (boxplotBeads, imageplot) able to plot bead
  residuals, log-ratios or ave log-ratios
- new function beadResids() calculates residuals for each bead
  on an array.
- readIllumina() gives more meaningful warning messages when no
  files are found.
- readIllumina(): default textType argument changed from ".csv"
  to ".txt"
new function readBGX() to read in Illumina's .bgx files for
expression BeadArrays
- fixed typo in readIllumina()
fixed typo in readBGX.Rd file
createBeadSummaryData() modified to handle version 2
BeadChips with 2 strips per array. Human6 arrays have all
the beads spread across the 2 strips, rather than
different beads on each strip as per version 1.
- bug fixes: beadResids() and getArrayData()
- removed extra bracket in boxplotBeads.Rd documentation
modified C code for findBeadStatus to handle case when mad=0
normaliseIllumina, offers normalisation for ExpressionSetIllumina
- readBeadSummaryData, default options changed to handle
BeadStudio 3 output (skip=8, added 'quote=""' to read.table
caused R to hang for some BeadStudio version 3 output))
sep arguments to be correct for version 3 output, etc.)
- vsn2 used in normalizeSingleArray, instead of vsn
- displayTIFFImage and plotCoord removed
- plotMA and plotXY updated to now check for and exclude missing
values from plotting
- createBeadSummaryData now checks that the arrays it combines
(when 'imagesPerArray=2') are the right ones. New argument
'what', which allows log-ratios (M), average intensities (A), red
(R), green (G) or red and green (RG) intensities to be
summarised. Default 'imagesPerArray' set to 1 instead of 2.
- getArrayData: typos in error messages fixed. New 'which' option
to get residuals from log-ratios.
- readIllumina: default background correction option changed to
- imageplot: changed default nrow and ncol to 100 (was 18 and 2
- beadResids() simplified after changes to getArrayData
- updated man pages
lost serial argument to bgx - not very useful
Changed IACT and MCSE to muiact and mumcse
changed a cat() to warning(), renamed basepath to rundir,
renamed dirname to inputdir, changed default rundir to
. instead of inside tempdir() to avoid losing valuable runs
on R quit
cleanup + fixed bug where I thought I was using a 0-indexed var
instead of a 1-indexed var. all C code is 0-indexed now
Fixed bug with normgenes parameter to analysis functions
Additional accessors for eSet
* fData, fvarMetadata, fvarLabels access underlying data,
varMetadata, and varLabels for featureData
* Remove duplicated documentation and code of varMetadata
* Minor tweak to remove "" from eSet 'show'
Harmonize dimnames of ExpressionSet assayData elements, if
* 'harmonize' means to ensure that all dimnames are
consistent with names from phenoData, featureData
* Only possible if elements differ with NULL dimnames, not
if elements have different dimnames; in this case, signal an
error about conflicting dimnames
selectChannels was not copying all appropriate data
Documentation tidy on ExpressionSet
Fix bug in combine method when no rows are shared
Thanks to Laurent Gautier for the bug report and patch.
Revised patch on combine data.frame's
Row.names can be stored as integer, but 'merge' reports them
as character; force to integer if both incoming data frames
have integer row names.
Added comment to (reverted) switch(), where ordered=,BLAH
applies identity requirement BLAH both to ordered factors
and as default (for non-factors).
Revised unit test name to reflect underlying issue.
More lenient constraints to 'combine' data frames
Change 'identical' to 'all.equal(...,
check.attributes=FALSE)' so that row names on incoming data
frames can be stored differently (e.g., as character and
fixups and improvements to peek
Fixed getFeature function when using MySQL and the query
contains a chromosome name; Fixed error in Rnw file that was
caused by renaming of an attribute group in the Ensembl
Pattern size limit bumped from 10000 to 20000 letters for
Boyer-Moore. Version bump.
Added new BStringPartialMatches class. No version bump.
Added the "lcprefix", "lcsuffix" and "lcsubstr" new generics
(methods not implemented yet). No version bump.
Implemented "lcprefix", "lcsuffix" and added
"pmatchPattern". No version bump yet.
Reimplemented "lcprefix" and "lcsuffix" in C (this boosts
"pmatchPattern"). Version bump.
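For readers unfamiliar with the terms, the idea behind "lcprefix" and
"lcsuffix" (longest common prefix and suffix of two sequences) can be
sketched as follows. This is a plain Python illustration, not the
Biostrings C implementation, and returning a length rather than the
matching substring is an assumption of this sketch.

```python
def lcprefix(s1, s2):
    """Length of the longest common prefix of s1 and s2."""
    n = 0
    for a, b in zip(s1, s2):
        if a != b:
            break
        n += 1
    return n


def lcsuffix(s1, s2):
    """Length of the longest common suffix, computed on the reversed strings."""
    return lcprefix(s1[::-1], s2[::-1])
```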
use _Biostrings_ prefix instead of Biostrings_ for C visible
symbols that are not .Call entry points
started using Biostrings_ prefix for all .Call entry points
Revisited the match reporting mechanism shared by the
various matching algos
More refactoring of the match reporting mechanism.
Added a C routine for normalizing the views of a
BStringViews object and used it to boost the "masking a
BString object by content" operation.
Made matchPattern() and countPattern() work on a
BStringViews subject. Added some examples to matchPattern()
doc. Version bump.
Got rid of some [TODO]s in the documentation.
added some examples
- Replaced internal helper function isLooseNumeric() by
isNumericOrNAs() with better semantics.
- Improved the "letter" generic and documented it.
- Got rid of some TODOs in BStringViews-constructors.Rd.
Improved examples in man/BSgenome-class.Rd. Version bump.
reworked the slots of the "BSgenome" class in order to
provide more information about the provenance of the genome
Added missing accessor methods for the "BSgenome"
class. All BSgenome slots can now be accessed (read only)
via a dedicated accessor method. Documented these new
adjust how ties are handled.
Deprecate condGeneIdUniverse, add 'cond' arg to
geneIdUniverse now does the right thing given a result
object; that is, a conditional gene ID univ. is returned for
the results from a conditional test. There is a new
argument, cond, which defaults to TRUE. When called on a
non-conditional result, cond is ignored.
giving min.expected=NULL prevents column removal in
na.rm set to TRUE when finding quantile values for
Bug fix to remove duplicated mcrs in the output.
Added: feature extraction -- hull, texture (Haralick), edge
features. Corrected: tile rewritten in C, hundreds of times
faster; stackObjects got 'rotate' argument to allow for
rotational alignment of objects while stacking. Multiple bug
fixes. New recompiled DLL. Hopefully corrected encoding in
moments.Rd (that BioC check was complaining about)
Done extraction of features: moments, hull, Haralick, edge,
Zernike moments! All new bug fixes.
optimized Zernike code, 6 times faster on N = 12
Completed a full set of feature extraction routines (more
might still be added in the future), for images with zero
objects matrices are returned instead of NULL, updated man
pages for feature extraction routines, optimized
performance. New dll
R CMD check bug fix (missing link + possibly uninitialized
value in C -- was Ok)
minor bug fix in stackObjects: failed if there was 1 single
object because subsetting generated a vector from a matrix
that did not have dims
different bug fixes in feature extraction, small bugs mostly
concerning images with no objects or 1 object only
manual 'ext' selection in 'stackObjects', 'combine' on a
more options in stackObjects: added index to stack only
updated Windows DLL for the Windows build that takes into
account all recent changes in the C code
background reset to black on image 'rotate', need to
provide mechanism for specifying the background in future
workaround in man pages for bug in 2.6 rev 42284, result
> a <- Image(0, c(2,2))
> class(a+a) 
bug fixes in write.image: quality and file names
small bug fix in 'image' correcting wrong aspect ratio
Zernike pseudo moments; added Arith methods to comply with
added rat database handling code
added exons.in.range and transcripts.in.range and did a bit
of code refactoring
updated plot functions to allow line type to be specified
better representation for exons with missing data in
specify lty for exon edges
explore and analyze *omics data with R and GGobi
Untested fixes for NA testing of doubles
When testing whether a double value is NA, use ISNA()
instead of equality testing against NA_REAL. At least on
some platforms, the equality test will not work. This is
only an issue for doubles.
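The reason an equality test is unreliable here is that R's NA for
doubles is stored as a NaN value, and under IEEE 754 a NaN never
compares equal to anything, itself included. A minimal Python sketch of
the same pitfall follows; it only mirrors the idea and is not the R C
API, where ISNA() is the appropriate check. The names are made up for
the illustration.

```python
import math

# Stand-in for a missing double; R's NA_REAL is likewise a NaN value.
MISSING = float("nan")


def is_missing_by_equality(x):
    """Broken check: NaN == NaN is always False under IEEE 754."""
    return x == MISSING


def is_missing(x):
    """Working check: use a dedicated predicate instead of ==."""
    return math.isnan(x)
```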
Modified filter constructors to accept parentId.
added a slightly friendlier error message
Extract parentId's from gatingML gates and store them in
parentId of filter objects.
Added a resolveParents function that combines related
(parent-child) gates for a given list of gates.
added compensation examples as a prelude to implementation
of code to read them
a start on parsing compensation ML
I copied the code for drawing rectangleGate and adapted it
Fixed bug in rowpAUCs that led to subtle inaccuracies of
fixed a bug that was triggered on 64 bit machines
updated the data set, fixed a few bugs in man pages and
Require R 2.5.0 and recent version of Biobase
Fixed a bug with 'xlim', 'ylim' in smoothScatter: they are
now also propagated to the 'image' function
(1) added Armitage.R and Armitage.Rd to test linear trend;
included the phrase 'Armitage.R' to the file 'DESCRIPTION';
added function names 'Armitage', 'Armitage.default',
'ArmitageTest', 'genotypeCoding', 'genotypeCoding.default'
to the file 'NAMESPACE'.
(2) fixed two bugs in 'convert.cpp' which caused error and
warning messages when using Rdev to compile GeneticsBase.
The first bug is the unused variable 'maxlen'. The second
bug is caused by recent changes in R related to CHARSXPs.
plot_EvG modified to have better x axis
fixed bug in plot functions
put in some code for leaves etc
did the inEdges thing
more shell-like runfile.sh in unit test
* setName, setIdentifier no longer required; use NA as
* default NullCollection rather than AdHocCollection
Correctly copy setName, rather than setIdentifier, when
Add GOCollection functionality
- GeneSet(GOCollection(ids, evidenceCode=codes)) consults GO
for appropriate EntrezIds
- Also bug fixes in mapIdentifiers
toBroadSet bug fixes
- Insist on BroadCollection
- use accessor for BroadCollection
- use con=stdout() as default connection
- GOCollection error message
- GeneColorSet constructor guessing phenotypes better
- GeneSet show method
Added vignette sketch
- Documentation tidy (incomplete) on GOCollection
Quieten mapIdentifiers, geneIdType<-
* added verbose option
* use "*.db" for AnnotationDbi packages
* setIdentifier=.uniqueIdentifier() as default for
Generalize accessor construction methods to accept 'where'
modified functions to be less verbose, and modified silcheck
and msscheck to have more similar arguments
22 July 2007: limma 2.11.9
- improvements to the numeric computations of
dnormexp.saddle(), which is used by
- new function normexp.m2loglik.saddle(), which is the same
as normexp.m2loglik() but using the saddle-point
approximation.
- normexp.fit() has a new argument 'methods'
- default for n.pts in normexp.fit() changed to NULL,
meaning use all the points. The rule used to choose the
quantiles if n.pts is improved to give more nearly unbiased
9 July 2007: limma 2.11.8
- contrasts.fit() now warns if row names of contrast matrix
don't match column names of contrasts.
- plotMA3by2 has a new argument 'device' to specify the
Added new importance measures. Changed logic.fs to logicFS,
and logic.vim to vim.logicFS. New vignette will follow
after another major update.
Add the correction of the STDEV column values of the
BeadStudio output file (transfer the standard error of the
mean as the standard deviation)
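As a reminder of the relationship involved, the standard error of the
mean and the standard deviation are linked by SD = SEM * sqrt(n). The
sketch below is Python rather than the package's R code, and it assumes
that n is the number of beads behind each summarised value, which this
entry does not state.

```python
import math


def sem_to_sd(sem, n_beads):
    """Convert a standard error of the mean to a standard deviation.

    Assumes `n_beads` observations went into the mean (hypothetical
    argument name; the entry above does not say where n comes from).
    """
    if n_beads < 1:
        raise ValueError("n_beads must be at least 1")
    return sem * math.sqrt(n_beads)
```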
Mfuzz option added to see.genes
Fixed bug in getProbeDataAffy, and converted 'what' list to
contain integers and characters instead of 'numeric' and
'character', which doesn't do anything.
Add revcompDNA and revcompRNA functions
These are implemented in C and return the reverse complement
for a given RNA or DNA sequence.
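What a reverse complement computes can be shown in a few lines. This is
an illustrative Python sketch, not the C code added in this commit; the
function names are lower-case stand-ins, and leaving characters outside
A/C/G/T(U) unchanged is an assumption of the sketch.

```python
# Translation tables for complementing bases (upper and lower case).
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
_RNA_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def revcomp_dna(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def revcomp_rna(seq):
    """Same as revcomp_dna but pairing A with U instead of T."""
    return seq.translate(_RNA_COMPLEMENT)[::-1]
```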
countbases returns a matrix of base counts and works for
either DNA or RNA based on the 'dna' argument.
adding predicted accuracy to crlmm
controlling memory usage
Major restyling of oneChannelGUI:
Reorganization of functions in the available menus. Loading
of exon .CEL files, via APT tools, is now free of bugs.
DABG p-value calculation is now possible via APT tools.
MiDAS alternative splicing p-value calculation and filtering
are now free of bugs.
Loading of GEO Matrix Series files is now possible.
RankProd methods were graphically interfaced. Adding
annotation for gene-level core exon data, this will remain
until Bioconductor annotation libs will be available. Major
revision of oneChannelGUI vignette, an annotation vignette
Massive error polishing. Vignette update.
Improving error messages. Fixing some bugs in the
classification and filtering menu
Exon data analysis: Splice Index is now calculated using APT
tools together with MiDAS p-values, since it is very efficient.
Rank product method (RankProd package) was adapted for the
identification of alternative splicing events. Filters
based on MiDAS and rank product methods were implemented to
allow the selection of top ranking alternative splicing
events. Adding two modules for meta-analysis: merging up
to 3 data sets to the data loaded in oneChannelGUI in
NormalizedAffyData; evaluation of the integrative correlation
coefficient as implemented in metaArray package.
Fix typos in the manual files
Check for valid paths at start of makePdInfoPackage
Rework runfile.sh for more shell-like script
minor bugfix in makePdInfoPackage-methods, createPackage
call used ... inappropriately
minor change to add biocViews as a constant in template
DESCRIPTION. should propagate from seed but didn't see
immediately how to do that
minor failure in size:k:enz string construction fixed
took out Rdev as default R command, replaced with R
Adding prototype for SNP6 platform
I have fixed a bug in the code...should not be dividing by 2
I have modified genBPGraph either to return a directed or
undirected graph now
fixed bug in batch import of multiple FCS files
fixed bug in subsetting of cytoSets
fix very minor ties issue. Also note that current functions
do not handle matrices with NA values (to be fixed in the
a little more code cleaning
now handle NA values. Also how the quantiles are estimated
when data is missing or target length differs from matrix
dimensions has been altered.
DEResult - new class. Used for results of a differential
expression analysis. Has methods show, statistic, FC,
statisticDescription, DEMethod, pLikeValues, topGenes,
topGeneIDs, numberOfProbesets, numberOfGenes, numberOfContrasts
mgmos, mmgmos, justmgMOS and justmmgMOS - bugfix and changed
defaults. Will now create an exprReslt object. gsnorm parameter
now has median as default, i.e. median-scaling normalisation will
be applied. Use none to specify no normalisation.
mmgmos - new input parameter. New input parameter addConstant is
an experimental feature. Using the default value gives identical
results to previous version of mmgmos.
pumaDE - changed return value. Now returns an object of class
calculateLimma - changed return value. Now returns an object of
calculateFC - new function. Calculates differentially expressed
genes using fold change.
calculateTtest - new function. Calculates differentially
expressed genes using standard t-tests.
calcAUC - new function. Calculates area under an ROC curve
numFP - new function. Calculates number of false positives for a
given proportion of true positives
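For orientation, the area under a ROC curve can be computed from ranked
scores with the trapezoid rule. The sketch below is Python and is not
the calcAUC implementation; the function names, the argument layout
(scores plus 0/1 truth labels) and the decision to ignore tied scores
are all assumptions of this illustration.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, walking down the scores from high to low.

    `labels` holds 1 for true positives and 0 for true negatives.
    Tied scores are not treated specially in this sketch.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoid-rule area under a curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```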
removeUninformativeFactors - new function. Remove uninformative
factors from the phenotype data of an ExpressionSet.
createDesignMatrix - various changes. Now removes uninformative
factors, ensures there is at least one factor. Can now handle
createContrastMatrix - various changes. Now removes uninformative
factors. Can now handle ExpressionSetIllumina objects. Now also
creates X_vs_other contrasts for factors with 3 or more levels,
in addition to previous contrasts created.
pumaPlots - bugfix. Replaced die() with stop()
plotROC - new arguments. includedProbesets, yaxisStat, xaxisStat,
downsampling, showLegend, showAUC
plot.pumaPCARes - new
... - various changes. New parameter cl for a snow package
cluster object. Can now handle ExpressionSetIllumina objects.
No longer relies on se.exprs method.
... Ensures there is at least one factor.
pumaPCARes - removed plot method. This is now handled by
zzz.R - change to library.dynam call.
... parameter sorted to indicate whether
results returned should be sorted by PPLR or not. Also now
calculates means of different conditions rather than sums, to
account for unbalanced natures of 1 vs others contrasts.
Vignette updated to use DEResult class and methods, and includes
details of 1 vs others contrasts.
Added quantsmooth.seg for analysis of long sequences
Some simple docbook support, principally in Xweave
added a script for visualizing enrichment-transcript
relations as a graph using Rgraphviz and pointed to that
script in man pages
added one new visualization type (plus one minor variation
thereof) for chipAlongChrom
added a small convenience script for generating a targets
text file from the file SampleKey.txt provided by NimbleGen
* change the structure of interactors matrix: remove column
"IntAct ID" and use "IntAct ID" as row names
* use "IntAct ID" to reference interactors in the
interaction (or complex) list; "UniProt ID" is used
* improve the extraction of "IntAct ID": only consider
"secondaryRef" element before, now consider "primaryRef" as
improve the show method
I have added a vignette which will double for the supp mat
of the paper.
I have updated the SuppMat.Rnw file
I have made some minor modifications to the code as well as
the beginnings of a new function
I have written a new function to take XML files and generate
I have fixed a bug in the vignette
I have updated the parser and added a new function
I have fixed a bug in list2Matrix
I have updated the package so that intactGraph and
intactHyperGraph are superclasses to graphNEL and hypergraph
I have updated the parser functions
A package for refdb-based bibliography management
I have moved list2Matrix to Rintact and changed the
Added findDelta, a function for computing the number of
genes called differentially expressed for a given FDR, and
Freedom for chisqClass and qvalue.cal. Necessary because of
upcoming changes in package logicFS.
Now a better version of the fold change is available (see
use.dm). However, old version is still default.
Functions and classes for DNA copy number profiling of
deleted a few commented lines in AllClasses.R
removed fData and fData<- methods as these are now in
Biobase. allowed x.axis to be suppressed by argument xaxt=n
removed fData help file
changed generic for plotSnp
removed generic definitions for fData
corrected unmatched braces in help files chromosome and
added useLayout argument to plotSnp
added two arguments to plotCytoband: outer and cytobandAxis
move packages in Depends to Suggests, updated show methods
changed copyNumber method for SnpQSets to
applied a bugfix patch from James Bullard
Variational Bayesian Multinomial Probit Regression with
Gaussian Process Priors. (Neural Computation 18,
much improved vignette "incremental.Rnw"
some updates to the write up
Now the optimisation is done with respect to parameters a,b:
h(x) = asinh((x+a)/b) instead of h(x) = asinh(a+b*x) as in
vsn 1.X and 2.X. I hope that this will better the
performance in cases where the multiplicative error
dominates over the additive.
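The two forms describe the same family of curves for b != 0; only the
parameters being optimised differ. Writing (x+a)/b = a/b + (1/b)*x gives
the mapping between them, which the following Python sketch checks
numerically. It is not vsn code, and the function names are made up for
the illustration.

```python
import math


def h_new(x, a, b):
    """New parametrisation: asinh((x + a) / b)."""
    return math.asinh((x + a) / b)


def h_old(x, a, b):
    """Old parametrisation: asinh(a + b * x)."""
    return math.asinh(a + b * x)


def new_to_old(a, b):
    """Map new-style (a, b) to the equivalent old-style (a, b)."""
    return a / b, 1.0 / b


# Usage: both forms agree once the parameters are converted.
a_new, b_new = 2.0, 4.0
a_old, b_old = new_to_old(a_new, b_new)
```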
Also the vignette "incremental.Rnw" with the likelihood
combinations is now much more detailed.
Fixed some bugs and typos so now the package passes R CMD
check and the vignette looks OK.
Generalized the vignette to a new and better
Oh boy, lots of changes. The optimisation is now done for
the parametrisation f(b)*x+a, and f() can be chosen quite
freely. Still needs to be tested more.
added the function scalingFactorTransformation
The "test" scripts now all work satisfactorily. Still need
to update the vignette and convergence2.Rnw
I have updated the vignette to reflect latest changes
Updated sagmbSimulateData - is now a bit less extreme.
Now it passes R CMD check: Fixed man pages to reflect recent
changes in parameter names and semantics. Now no longer
depend on limma (but use RGList via Namespace)
started adding justvsn method for NChannelSet. Not done
yet. Hope to continue tonight from home.
Added a justvsn method for NChannelSet.
Increased buffer size to 100000 in findMZBoxes() to handle
also files with a vast number of peaks
Fixed bug in MSW.getRidge() to catch empty ridgeLists
- The scale on which the peak was localised is also returned
- additional logical argument fitgauss, gaussian fits are no
  longer mandatory
- Integration method can be chosen: descent on the mexican
  hat filtered data or on the real data. Method 2 is honest,
  while method 1 (default) is more robust to noise.
- runs
James W. MacDonald, MS
UMCCC cDNA and Affymetrix Core
University of Michigan
1500 E Medical Center Drive
Ann Arbor MI 48109