[BioC] FlowCore/FlowViz issues

Roger Leigh rleigh at codelibre.net
Tue Mar 1 16:36:23 CET 2011


On Tue, Sep 14, 2010 at 03:18:31PM -0700, Nishant Gopalakrishnan wrote:
> Hi Roger,
>
> Answers to your questions are posted below.

Dear Nishant,

Many thanks for your very helpful answers.  Many apologies for the
delay in replying--I got sidetracked writing up my thesis and didn't
have time to get back to this until recently.

> 1) Gating with an ellipsoidGate
>
> Is there any explanation regarding how to construct a covariance matrix
> from the actual
> dimension I want ?. Is there an alternative constructor to create a gate
> from real
> dimensions?
>
> Ans:
> As mentioned by Josef, the decision to represent gates using covariance
> matrix
> was made because the standards for definition of gates for flow
> cytometry (gatingML specification)
> decided to make use of covariance matrices as input.
>
> I do agree that an alternative constructor that takes in the half axes,
> center
> point etc would be more intuitive to use. I will try to add a method for
> that.
> Until then please make use of the spreadsheet provided by Josef.

This is the helper function I used; feel free to adopt it if you
haven't written your own yet:

# Covariance matrix in Mahalanobis space (required for ellipsoidGate)
cov.matrix <- function (a, b, angle) {
  theta <- angle * (pi/180)

  c1 <- ((cos(theta)^2)/a^2) + ((sin(theta)^2)/b^2)
  c2 <- sin(theta) * cos(theta) * ((1/a^2) - (1/b^2))
  c3 <- ((sin(theta)^2)/a^2) + ((cos(theta)^2)/b^2)

  m1 <- matrix(c(c1, c2, c2, c3), byrow=TRUE, ncol=2)
  m2 <- solve(m1)

  m2
}

# Convenience constructure for ellipsoidGate
ellipsoidGate.simple <- function (distances, mean, rotation=0, parameters,
                                  filterId="defaultEllipsoidGate") {
  cov <- cov.matrix(distances[1], distances[2], rotation)
  rownames(cov) <- parameters
  colnames(cov) <- parameters

  gate <- cells <- ellipsoidGate(filterId=filterId, .gate=cov, mean=mean)

  gate
}

> 2) Plotting on a log scale
> When plotting, I have the log values on the axes.  However, is it
> possible to either:
> a) plot on logarithmic axes with the appropriate log binning, o
> b) change the axis labels to be a log scale (even though the
>                axes are linear)?
> I can do this with regular R plots, but I'm not sure if it's possible
> with lattice
> graphics.
>
> Ans:
> Yes it is possible with lattice graphics as well.
> Please have a look at figure 8.4 and 8.5 and the accompanying code for
> generating the plots from Deepayans book at
> http://lmdvr.r-forge.r-project.org/
> That should probably get close to what you are looking for.

Yes, that was almost exactly what I needed.  I also looked at the
implementation of these examples in latticeExtra.  Unfortunately, I
couldn't get these to work directly for some reason, but they did
when I removed some of the checks.  These are the functions I'm
currently using for log axes on log scales in xyplots and densityplots:

logTicks <- function (lim, loc = c(1, 5)) {
     ii <- floor(log10(range(lim))) + c(-1, 2)
     main <- 10^(ii[1]:ii[2])
     r <- as.numeric(outer(loc, main, "*"))
     r[lim[1] <= r & r <= lim[2]]
}

l.yscale.components.log10ticks <-
function (lim, logsc = FALSE, at = NULL, ...)
{
    ans <- yscale.components.default(lim = lim, logsc = logsc,
        at = at, ...)
    logbase <- 10
    tick.at <- logTicks(logbase^lim, loc = 1:9)
    tick.at.major <- logTicks(logbase^lim, loc = 1)
    major <- tick.at %in% tick.at.major
    ans$left$ticks$at <- log(tick.at, logbase)
    ans$left$ticks$tck <- ifelse(major, 1, 0.5)
    ans$left$labels$at <- log(tick.at, logbase)
    ans$left$labels$labels <- as.character(tick.at)
    ans$left$labels$labels[!major] <- ""
    ans$left$labels$check.overlap <- FALSE
    ans
}

l.xscale.components.log10ticks <-
function (lim, logsc = FALSE, at = NULL, ...)
{
    ans <- xscale.components.default(lim = lim, logsc = logsc,
        at = at, ...)
    logbase <- 10
    tick.at <- logTicks(logbase^lim, loc = 1:9)
    tick.at.major <- logTicks(logbase^lim, loc = 1)
    major <- tick.at %in% tick.at.major
    ans$bottom$ticks$at <- log(tick.at, logbase)
    ans$bottom$ticks$tck <- ifelse(major, 1, 0.5)
    ans$bottom$labels$at <- log(tick.at, logbase)
    ans$bottom$labels$labels <- as.character(tick.at)
    ans$bottom$labels$labels[!major] <- ""
    ans$bottom$labels$check.overlap <- FALSE
    ans
}

> 3) Plotting with lattice/cairo_pdf is broken
>
> try the above statements, but include
> print(xyplot(`SS.Log` ~ `FS.Lin`, set, filter=cells, xlab="FS",
>         ylab=expression(log[10]~(SS))))
> instead and see if it works.
> You need to call print on trellis objects to get it to work.

This works just fine with the addition of print, thanks.

> 4) Overlaying density plots
> By default, density plots are stacked down the y-axis with a user-
> specified overlap.  Is is possible to directly overlay one-
> dimensional histograms on top of each other to allow direct
> comparison?
>
> Ans:
> I do not think that there is an existing function available in flowViz
> that let
> you stack them up.
>
> If you want to use lattice, you will have to reshape your data a bit to
> get it to work.
>
> nms <- sampleNames(dat)
> k <- lapply(nms, function(x){
>          nr <- nrow(exprs(dat[[x]]))
>          data.frame("FSC-H" = exprs(dat[[x]])[,"FSC-H"], samp =rep(x,=
 nr),
>              check.names =FALSE)
>         })
> df <- do.call(rbind, k)
> densityplot( ~ `FSC-H`, df, groups = samp)
>
> Here is a simple way of doing that without lattice
> data(GvHD)
> dat <- GvHD[1:3]
> plot(density(exprs(dat[[1]])[,"FSC-H"], n =512), col= col[1])
> col <- c("#FF0000", "#00FF00", "#0000FF")
> for (i in 1:3)
>  polygon(density(exprs(dat[[i]])[,"FSC-H"], n =512), col = col[i] )

Thanks, I'll give some of these approaches a try; for now I'm using
densityplots with zero overlap.

> 5) Updating phenoData
> I need to add certain information to my flowSet, so I'm currently
> doing this by getting the phenoData, adding the missing bits and
> setting it back:
>
> sampleNames(flowset) <- c("unstained", "isotype", "ng2")
>
> pd <- pData(phenoData(flowset))
> pd["isotype"] <- c(FALSE, TRUE, FALSE)
> pd["description"] <- c("Unstained", "Isotype", "NG2")
>
> pData(phenoData(flowset)) <- pd
> varMetadata(phenoData(set))["labelDescription"] <- c("Name", "Isotype",
>         "Description")
>
> This is quite long-winded.  Is there a convenience function present
> (or planned?) to add a single parameter which could update the
> phenodata and varmetadata at the same time?  It would ensure they
> never get out of sync.
>
> Ans: I am not quite sure I understand your questions here .
> There are update method for pData, phenoData, varLabels, varMetadata etc
> that directly work with flowSets. Please see if these methods or their
> update methods (<-) are what you are looking for .

Sorry, I probably didn't describe this too clearly.  If I have some
descriptive metadata I wish to add to a flowSet, for example

isotypecontrol <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
treatment <- c("c1", "c1", "c2", "c2", "c2", "c2"

What would be the recommended way to append these two extra descriptive
items to the phenodata slots?  The existing methods would appear to
require multiple calls to append the two columns to the pData and then
to update the varMetadata.  e.g.

pData(set)["isotype"] <- isotypecontrol
pData(set)["treatment"] <- treatment
varMetadata(set)["isotype","labelDescription"] <- "Isotype Control"
varMetadata(set)["treatment","labelDescription"] <- "Treatment"

is quite long-winded, and if you fail to update the labelDescription
you get errors.  Is there a convenience function or method to add
everything in one go to ensure that you always have internal
consistency in the flowSet phenoData/varMetadata? e.g.

addphenoData("isotype", isotypecontrol, description="Isotype Control")

> There's a convenience each_col to apply a function to each column in the
> flowSet, but is it possible to just access a single column, or set of
> specified columns?  I wrote this little helper:
>
> capply <- function (col, func) {
>       function(x) { func(x at exprs[,col]) }
>   }
>
>   which can then be used like so:
>
>   fsApply(flowset, capply("FL.1.Log", median))
>   fsApply(flowset, capply("FL.1.Log", sd))
>
>
>    to get the median and sd for a single channel.  However, it
>   doesn't work for mean, and I'm not sure why:
>
>   fsApply(set2, capply("FL.1.Log", mean))
>   Error in FUN(if (use.exprs) exprs(y) else y, ...) :
>     could not find function "func"
>
>     If I call it directly, it works just fine:
>
>     mean(flowset[[1]]@exprs[,"FL.1.Log"])
>     [1] 0.4648052
>
>     It's not clear do me what the difference is here between the
>     two forms.  Is there a built-in or better way to achieve the
>     same thing?
>
>  ANS: If you are looking for median and median statistics for a flowSet
>  wont the summary method give you that ??
>  data(GvHD)
>  summary(GvHD)

It does, but it requires a number of steps to get to the data you
want, and it's only a subset of the data I really need.  There's no
SD or %CV for example, or % falling in different gates compared to
an unfiltered dataset (which requires comparing the filtered and
unfiltered sets).

>  If you want to modify your code to get it to work, try this
>  fsApply(GvHD, function(x){
>            median(exprs(x)[,"FSC-H"])
>          })

This also works.  But it's still rather long-winded, especially when
you need to do it 10 times!  I think I'll
have to write a number of wrapper functions to make the analysis
look readable (and also more robust since it will reduce the chance
of mistakes!).

> 7) Getting at filter summary data
>
> > > positive <- filter(flowset, pos)
> > > summary(positive)
> filter summary for frame 'isotype'
>  positive+: 111 of 9186 events (1.21%)
>
>  filter summary for frame 'pos'
>   positive+: 12070 of 13246 events (91.12%)
>
>   Is it possible to actually get at the raw data here?  i.e.
>   the raw event number and/or the percentages.  I'm currently
>   doing it the "hard way", by:
>
>   newset <- Subset(flowset, pos)
>   fsApply(newset, capply("FL.1.Log", length)) / fsApply(flowset,
>           capply("FL.1.Log", length)) * 100
>
>   But there must be an easier and more efficient way of extracting this
>   information!  Looking at the filter object, I can only extract the
>   strings as printed.
>
>   Ans:
> 
>   Perhaps this example gives you what you are looking for.
>   dat <- read.FCS(system.file("extdata","0877408774.B08",
>      package="flowCore"))
>   rg <- rectangleGate(filterId="myRectGate", list("FSC-H"=c(200, 600),
>      "SSC-H"=c(0, 400)))
>   res <- filter(dat, rg)
>
>   ##-------------
>   tmp <- as(res, "logical")
>   inside <- sum(tmp)
>   totalCount <- length(tmp)
>   pr <- inside*100/totalCount

Ah, this does work just fine thanks.  However, given that this value
is summarised for you:

summary(f[[1]])
CellGate+: 18662 of 23673 events (78.83%)

having a method to directly extract it rather than having to
reimplement the wheel (and having knowledge of the object's internals)
would, I think, be a very useful improvement.  Just the event numbers
would be sufficient.


Another issue I've found in the interim:

xyplot(..., smooth=FALSE, outline=TRUE)
prints the outline inconsistently.  If I, for example, do an
xyplot on a whole flowSet, I often see the first plot missing
the outline completely, while all the following plots have
an outline.  I also see this issue when doing a single plot
but not consistently.  So far, it's always just the first plot,
so it looks like a bug somewhere, but I haven't found what
triggers it.


Kind regards,
Roger Leigh

--
Centre for Immunology and Infection                 Tel:  +44 (0)1904 328912
Department of Biology and HYMS                      Fax:  + 44 (0)1904 328844
University of York
PO Box 373
YORK  YO10 5YW



More information about the Bioconductor mailing list