[R-sig-phylo] fitContinuous in geiger
Annemarie Verkerk
annemarie.verkerk at mpi.nl
Wed May 18 09:50:35 CEST 2011
Hi all,
I’m having some trouble with the function fitContinuous in the geiger
library. I'm using fitContinuous to estimate lambda as an indication
of the presence of phylogenetic signal. As a side note, I'm doing this
with language data: the trees are language trees based on shared
vocabulary, and the trait whose lambda I'm trying to estimate is a
linguistic typological trait. Another side note is that I have similar
problems in BayesTraits, but no problems using phylosignal in picante
for estimating lambda.
At the moment, there are 14 taxa in my sample. I have a tree set of 1000
trees. The first data set + trees are attached. My data values are all
values between 0 and 1, basically things like '0.326547'. (This is
because they come from a principal components analysis; they are scores
on the first principal component that explains about 80% of the
variation.) I've been using values rounded to two decimal places just
for ease of use, so '0.33'. However, the results that I get are
strange.
My first dataset looks like this:
language value
t1 0.32
t4 0.52
t6 0.95
t9 0.75
t10 0.77
t12 0.46
t14 0.61
t2 0.35
t3 0.29
t5 0.25
t7 0.89
t8 0.88
t11 0.79
t13 0.35
Then I do the fitContinuous analysis over my sample of trees (1000
trees) and these are my scores:
median of lambda:
[1] 1
mean of lambda:
[1] 0.9999985
sd of lambda:
[1] 4.60849e-05
So: almost all values of lambda are 1.
median of log-likelihood:
[1] 5.206887
mean of log-likelihood:
[1] 5.210839
sd of log-likelihood:
[1] 0.4215943
The log-likelihood is positive? That is very strange…? These results
basically make it seem as if the algorithm has crashed.
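A note on the sign: for continuous data a log-likelihood is a sum of
log densities, not log probabilities, and a density can exceed 1 when
the spread of the values is small. A positive log-likelihood is
therefore not by itself a sign of a crash. A minimal Python sketch,
assuming a plain univariate normal density rather than the tree-based
model that fitContinuous fits (the name `normal_log_density` is
illustrative only):

```python
import math

def normal_log_density(x, mean, sd):
    """Log of the normal probability density at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

# With a small standard deviation the density at the mean exceeds 1,
# so its logarithm is positive.
narrow = normal_log_density(0.5, mean=0.5, sd=0.1)

# The same point on a 100-fold larger scale has a density below 1,
# so its logarithm is negative.
wide = normal_log_density(50.0, mean=50.0, sd=10.0)

print(narrow > 0, wide < 0)
```

The two values differ by ln(100), which is the per-observation effect
of multiplying the data by 100 in this simplified setting.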
Then I multiply my values by 100:
language value
t1 32
t4 52
t6 95
t9 75
t10 77
t12 46
t14 61
t2 35
t3 29
t5 25
t7 89
t8 88
t11 79
t13 35
results:
median lambda:
[1] 0.9874361
mean lambda:
[1] 0.9839095
sd lambda:
[1] 0.01622255
median log-likelihood:
[1] -65.66331
mean log-likelihood:
[1] -65.73675
sd log-likelihood:
[1] 1.716778
Now the number of lambda scores of '1' is lower, although they are not
gone yet: there are still around 200-300 instances of '1'. The
log-likelihood is now about -65, so at least it's negative.
When I multiply my original data points by 1000, this is my data set:
language value
t1 320
t4 520
t6 950
t9 750
t10 770
t12 460
t14 610
t2 350
t3 290
t5 250
t7 890
t8 880
t11 790
t13 350
results:
median lambda:
[1] 0.8640076
mean lambda:
[1] 0.8561964
sd lambda:
[1] 0.05001523
median log-likelihood:
[1] -2055.763
mean log-likelihood:
[1] -2067.052
sd log-likelihood:
[1] 213.44
There are no more lambda scores of '1' in the results, but the
log-likelihood is a really large negative number, and I'm not sure what
that would mean in this context.
So, even though the relative range of variation stays exactly the same
under these multiplications, there are quite important differences
between the results these three sets of data give me. It was suggested
to me that the algorithm might be doing something to my data values,
for instance capping them, rounding them off or ignoring certain
decimals, and that this might be the reason for the different results.
Would anyone have any idea why this happens and how I can deal with it
in an informative way?
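On the effect of rescaling: under a Gaussian model whose variance is
re-estimated freely, multiplying every value by a constant c should
shift the maximized log-likelihood by exactly -n*ln(c) and leave
scale-free parameters unchanged. The sketch below checks this in Python
for the 14 values of the first data set, assuming independent normal
observations (no tree at all) purely for illustration;
`max_normal_loglik` is a hypothetical helper, not part of geiger.

```python
import math

# The 14 trait values from the first data set in this message.
values = [0.32, 0.52, 0.95, 0.75, 0.77, 0.46, 0.61,
          0.35, 0.29, 0.25, 0.89, 0.88, 0.79, 0.35]

def max_normal_loglik(xs):
    """Maximized log-likelihood of independent normal data.

    Assumption: no phylogeny is used; mean and variance are set to
    their maximum-likelihood estimates.
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # ML estimate (divides by n)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

base = max_normal_loglik(values)
for c in (100, 1000):
    scaled = max_normal_loglik([c * x for x in values])
    expected = base - len(values) * math.log(c)
    # The two printed numbers should agree up to floating-point error.
    print(c, round(scaled, 3), round(expected, 3))
```

If the same relation holds for the tree-based likelihood, the expected
shift for 14 taxa would be about 64.5 for a factor of 100 and about
96.7 for a factor of 1000. The reported medians drop by more than that,
and lambda changes as well, which may point to the optimization (for
example parameter bounds or convergence) rather than to rounding of the
data values. That is only a guess and would need checking.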
Thanks so much for any help that you might be able to offer,
Annemarie Verkerk
--
Annemarie Verkerk, MA
Evolutionary Processes in Language and Culture (PhD student)
Max Planck Institute for Psycholinguistics
P.O. Box 310, 6500AH Nijmegen, The Netherlands
+31 (0)24 3521 185
http://www.mpi.nl/research/research-projects/evolutionary-processes
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: data.csv
URL: <https://stat.ethz.ch/pipermail/r-sig-phylo/attachments/20110518/86e211ec/attachment.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: trees.trees
URL: <https://stat.ethz.ch/pipermail/r-sig-phylo/attachments/20110518/86e211ec/attachment-0001.pl>