[R] fda modeling
Spencer Graves
spencer.graves at structuremonitoring.com
Mon May 28 19:14:05 CEST 2012
Hi, Troels:
I'm still trying to understand the structure of your data.
Please check the discussion below. If what I suggest is correct, it
should make the analysis much more routine and therefore easier
requiring less time to analyze.
On 5/21/2012 1:33 PM, Troels Ring wrote:
> Dear friends - We have 25 rats, 14 of these subjected to partial
> removal of kidney tissue, 11 to sham operation, and then followed for
> 6 weeks. So far we have data on 26 urine metabolites measured by NMR 7
> times during the observation.
So you collected urine samples at 7 different times on each rat
throughout the experiment, separated out 26 different metabolites and
measured each of those 7 using Nuclear Magnetic Resonance (NMR)? What
were the ages of the rats at the time of the operation and at the times
that each of the 7 urine samples were collected? In particular, were
the 7 urine samples equally spaced? If yes, that could simplify the
analysis. The greater the time differences between samples and between
rats, the more difficult the analysis potentially.
What were the ages of the 25 rats? Were they all from the same
litter? If no, how were they related? The worst possible case is that
you have 14 from one litter and 11 from another. If that's the case,
then any difference you see between the two groups could be a litter
effect. If they are one rat from each of 25 litters, that would
simplify the statistical analyses. Scientifically, the best might be to
have at 4 or 5 rats from each of 6 litters, assigned with at least 2
rats to experiment and 2 to control from each litter. You probably
don't have that, but the litter effect is likely to be important and
that needs to be part of the analysis, I think.
> I have smoothed the measurements by b.splines in fda including a
> roughness penalty, and inspecting the mean curves for nephrectomized
> and sham animals indicate differences for several of the metabolites.
> Now the real idea is to use the NMR measurements to understand what
> goes on in the kidneys since we know the partial removal of kidney
> tissue will result in progressive damage in the kidneys - the nature
> of that is what we want to understand. We have a blood sample from the
> rats just prior to sacrifice, and the creatinine concentration there
> is a good proxy for "renal function".
So you have one measure of creatinine for each rat measured just
prior to sacrifice?
> So the course of concentrations of the metabolites are thought to be
> valuable in understanding the physiology. Some of these are thought to
> be correlated. We have two groups where sham animals have better renal
> function than partially nephrectomized, but there is variation in both
> groups which is also interesting - some animals progress more rapidly
> after "the same operation" than others - we would like to know why.
> The data are available (eventually - the resulting blood tests still
> are missing) if anyone would like to have a look but the main issue is
> if it is at all feasible to make fda work on such a problem.
I suggest you forget about fda at least initially and start with
simpler, more traditional tools. Later, you may or may not want to
return to fda. I suggest you proceed as follows:
I. DATA CLEANING: Make normal probability plots of
everything: I'd start with making one normal probability plot for each
of the 26 metabolites. Normally distributed data with approximately the
same mean and standard deviation will look approximately like a straight
line. The scientist's dream with this is the image of two lines with a
gap in the middle, with the two lines corresponding exactly to the two
groups (nephrectomized vs. controls). It's more likely that you will
see mostly one distribution with a few observations away from a
moderately straight line in the middle. If you see this, you should
check the records and samples for the deviant observations to see if you
can find, e.g., a data entry error or a problem with mishandling a
sample. If you can't fix any observation that way, you should replace
the numbers with NA (not available = missing). Another possibility is
you see several little clusters corresponding to the litters. Or you
might see curvature to the line; with curvature, if all the numbers are
positive, you should try normal plots of the logarithms. If that helps
straighten out the lines, you should analyze the logarithms not the raw
numbers. I usually do this with something like qqnorm(x, datax=TRUE).
The use of "datax" means that with one or more outliers, the slope of
the center portion will be closer to 45 degrees and therefore more
easily processed with the naked eye.
II. UNIVARIATE ANALYSES: After data cleaning, I'd then
use something like lme{nlme} to analyze each response variable
(metabolite or creatinine) separately. I recommend lme, because it is
exactly what is needed for this kind of thing AND there is a great book
available to describe how to do it: Pinheiro and Bates (2000)
Mixed-Effects Models with S and S-PLUS (Springer). This book has
companion script files in the nmle package (similar to those with fda),
which are quite valuable for understanding the book, because there were
some changes in nlme, so in a very few cases, the code in the book
doesn't work in R, but the code in the companion script file does.
There is better software available today and there may be better books,
but for what you have, I would probably not mess with anything else.
The techniques described early in this book should help you analyze
between treatment and between litter effects for all the different
variables AND the impact of one or more variables on others.
III. MULTIVARIATE ANALYSES:
(a) Analyzing the variations over time of each
metabolite for each rat in each litter and treatment group can be
challenging. In essence, you have 26 time series of length 7 for each
rat. This is very much the problem that pushed Doug Bates into studying
mixed-effects models: Earlier, he had studied nonlinear modeling, as
documented in Bates and Watts (1988) Nonlinear Regression & Its
Applications (Wiley). Many of his datasets were metabolites collected
over time like the data you have. The second half of Pinheiro and Bates
describes how to model mixed effects within nonlinear models like you
have. If you can NOT get simple linear or nonlinear models for each
metabolite over time and the fda models provide something useful, you
might look at the fdaMixed package available from CRAN. I haven't used
it, but the name makes it sound like it might help you.
(b) You probably will also want to do either
principle components or factor analysis of the 26 different
metabolites: The first few principle components or factors will likely
represent the major modes of behavior among the metabolites. This could
reduce the analysis from 26 different matabolites to 2 or 5 different
primary modes of variation in the biochemistry -- possibly clustering
the metabolites to simplify the analysis and strengthen the
interpretation. Then take the principle components or factors back to
nlme to complete the analysis.
I apologize for encouraging you too much to study fda
techniques. The above describes a standard analysis protocol that has
been used with great success by many people. Many of my data analysis
failures involved jumping straight to a multivariate analysis before
doing the simple things first ;-)
Hope this helps.
Spencer Graves
> Best wishes
> Troels Ring,
> Nephrology
> Aalborg, Denmark
>
More information about the R-help
mailing list