[R-sig-phylo] fitContinuous in geiger

Annemarie Verkerk annemarie.verkerk at mpi.nl
Wed May 18 09:50:35 CEST 2011


Hi all,

I'm having some trouble with the function fitContinuous in the geiger 
package. I'm using fitContinuous to estimate lambda as an indication of 
phylogenetic signal. As a side note, I'm doing this with language data: 
the trees are language trees based on shared vocabulary, and the trait 
I'm estimating lambda for is a linguistic typological feature. Another 
side note is that I have similar problems in BayesTraits, but none when 
measuring phylogenetic signal with phylosignal in picante.

At the moment, there are 14 taxa in my sample, and I have a tree set of 
1000 trees. The first data set and the trees are attached. My data values 
all lie between 0 and 1, e.g. '0.326547'. (This is because they come from 
a principal components analysis; they are scores on the first principal 
component, which explains about 80% of the variation.) For ease of use 
I've been rounding the values to two decimal places, so '0.33'. However, 
the results that I get are strange.

My first dataset looks like this:

language value
t1 0.32
t4 0.52
t6 0.95
t9 0.75
t10 0.77
t12 0.46
t14 0.61
t2 0.35
t3 0.29
t5 0.25
t7 0.89
t8 0.88
t11 0.79
t13 0.35

I then run fitContinuous over my sample of 1000 trees, and these are my 
scores:

median of lambda:
[1] 1
mean of lambda:
[1] 0.9999985
sd of lambda:
[1] 4.60849e-05

So: almost all values of lambda are 1.

median of log-likelihood:
[1] 5.206887
mean of log-likelihood:
[1] 5.210839
sd of log-likelihood:
[1] 0.4215943

The log-likelihood is positive? That seems very strange. These results 
basically make it look as if the algorithm has crashed.
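
For a continuous trait the likelihood is a product of probability densities, not probabilities, and a density can exceed 1 when the data are tightly spread, so a positive log-likelihood is not by itself a sign of a crash. The Python sketch below illustrates this on the first data set, under a deliberate simplification: it fits an i.i.d. normal model and ignores the tree entirely, so it is not the model fitContinuous uses, and the function name is hypothetical. With the ML variance of about 0.058, the fitted normal density peaks at about 1.65, and the maximised log-likelihood comes out slightly above 0.

```python
import math

# Trait values from the first data set in the post (original 0-1 scale).
values = [0.32, 0.52, 0.95, 0.75, 0.77, 0.46, 0.61,
          0.35, 0.29, 0.25, 0.89, 0.88, 0.79, 0.35]

def iid_normal_max_loglik(x):
    """Maximised log-likelihood of x under an i.i.d. normal model.

    Simplification (hypothetical helper): the tree is ignored; this only
    shows that a log-likelihood built from densities can be positive.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # ML (not unbiased) variance
    # Plugging the ML estimates back in gives -n/2 * (log(2*pi*var) + 1).
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

ll = iid_normal_max_loglik(values)
print(ll)  # slightly above 0 on the original 0-1 scale

# The same values on a 0-100 scale give a negative log-likelihood.
ll_100 = iid_normal_max_loglik([100 * v for v in values])
print(ll_100)
```

The sign of a log-likelihood for continuous data therefore depends on the units of measurement and carries no meaning on its own.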

Then I multiply my values by 100:

language value
t1 32
t4 52
t6 95
t9 75
t10 77
t12 46
t14 61
t2 35
t3 29
t5 25
t7 89
t8 88
t11 79
t13 35

results:

median lambda:
[1] 0.9874361
mean lambda:
[1] 0.9839095
sd lambda:
[1] 0.01622255

median log-likelihood:
[1] -65.66331
mean log-likelihood:
[1] -65.73675
sd log-likelihood:
[1] 1.716778

Now there are fewer lambda scores of exactly '1', although they have not 
disappeared: there are still around 200-300 of them among the 1000 trees. 
The log-likelihood is now about -65, so at least it's negative.

When I multiply my original data points by 1000, this is my data set:

language value
t1 320
t4 520
t6 950
t9 750
t10 770
t12 460
t14 610
t2 350
t3 290
t5 250
t7 890
t8 880
t11 790
t13 350

results:

median lambda:
[1] 0.8640076
mean lambda:
[1] 0.8561964
sd lambda:
[1] 0.05001523

median log-likelihood:
[1] -2055.763
mean log-likelihood:
[1] -2067.052
sd log-likelihood:
[1] 213.44

There are no more lambda scores of '1', but the log-likelihood is now a 
very large negative number, and I'm not sure what that means in this 
context.

So, even though the relative variation between the values stays exactly 
the same under these multiplications, the three data sets give me quite 
different results. It was suggested to me that the algorithm might be 
doing something to my data values, for instance capping them, rounding 
them off or ignoring certain decimals, and that this might be the reason 
for the different results. Does anyone have an idea why this happens and 
how I can deal with it in an informative way?

Thanks so much for any help that you might be able to offer,
Annemarie Verkerk


-- 
Annemarie Verkerk, MA
Evolutionary Processes in Language and Culture (PhD student)
Max Planck Institute for Psycholinguistics
P.O. Box 310, 6500AH Nijmegen, The Netherlands
+31 (0)24 3521 185
http://www.mpi.nl/research/research-projects/evolutionary-processes

-------------- next part --------------
Name: data.csv
URL: <https://stat.ethz.ch/pipermail/r-sig-phylo/attachments/20110518/86e211ec/attachment.pl>
-------------- next part --------------
Name: trees.trees
URL: <https://stat.ethz.ch/pipermail/r-sig-phylo/attachments/20110518/86e211ec/attachment-0001.pl>

