On the last point: as the log-likelihood is only defined up to a constant
(which depends on the dominating measure and hence on the data), this is
more a question of different programmers legitimately using different
definitions. In this case the problem is a method inherited from glm that
is inappropriate in R (though not in S), and I will fix it for R.
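
A minimal sketch of why the additive constant matters, assuming a Gaussian
model with sigma^2 at its MLE (RSS/n). The residuals below are made up for
illustration, and the helpers `gaussian_loglik` and `aic` are hypothetical.
This does not reproduce R's `logLik` methods. Dropping the -n/2*log(2*pi)
term is a legitimate convention, and AIC differences are unchanged as long
as both models use the same convention. Mixing conventions shifts the
difference by n*log(2*pi).

```python
import math

def gaussian_loglik(residuals, include_constant=True):
    """Maximised Gaussian log-likelihood for a fit with these residuals,
    with sigma^2 replaced by its MLE RSS/n. The -n/2*log(2*pi) term is
    optional: dropping it changes the value but not the maximiser."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    ll = -0.5 * n * (math.log(sigma2) + 1.0)
    if include_constant:
        ll -= 0.5 * n * math.log(2.0 * math.pi)
    return ll

def aic(loglik, k):
    """AIC = -2 log L + 2k for a model with k estimated parameters."""
    return -2.0 * loglik + 2.0 * k

# Hypothetical residuals from two fits to the same 5 observations.
res_a = [0.5, -0.3, 0.1, -0.4, 0.2]   # model A: k = 2
res_b = [0.6, -0.5, 0.3, -0.2, 0.1]   # model B: k = 3
n = len(res_a)

# Same convention for both models: the constant cancels in the difference.
delta_full = aic(gaussian_loglik(res_a), 2) - aic(gaussian_loglik(res_b), 3)
delta_kernel = (aic(gaussian_loglik(res_a, False), 2)
                - aic(gaussian_loglik(res_b, False), 3))

# Mixed conventions: the difference is shifted by n*log(2*pi).
mixed = aic(gaussian_loglik(res_a), 2) - aic(gaussian_loglik(res_b, False), 3)

print(delta_full, delta_kernel, mixed)
```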

On the first point: Akaike only defined AIC for a nested series of models.
AIC can be used to compare non-nested models, but

1) The theory relies on assumptions that may well not hold: for
example, when comparing lme models with lm or gee models, the MLEs
under one model can correspond to a point on the boundary of the
other model's parameter space.

2) The sampling variability of the difference in AIC is large in the
non-nested case.
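
A rough simulation sketch of point 2, under assumptions of my own choosing
rather than anything in the original: two non-nested Gaussian linear models,
y ~ x1 and y ~ x2, fitted to data generated as y = x1 + x2 + noise. Neither
model is true, and both are equally good in expectation. The helper names
are hypothetical. The printed mean and standard deviation of
AIC(y ~ x1) - AIC(y ~ x2) across replicates give a feel for how much the
difference varies from sample to sample.

```python
import math
import random
import statistics

def ols_rss(x, y):
    """Residual sum of squares for the simple regression y ~ 1 + x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def gaussian_aic(rss, n, k):
    """AIC of a Gaussian linear model with sigma^2 at its MLE RSS/n."""
    ll = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1.0)
    return -2.0 * ll + 2.0 * k

def simulate_delta_aic(n=50, reps=1000, seed=1):
    """AIC(y ~ x1) - AIC(y ~ x2) over independent simulated data sets."""
    rng = random.Random(seed)
    deltas = []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [rng.gauss(0, 1) for _ in range(n)]
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        # Both models have k = 3: intercept, slope and sigma^2.
        deltas.append(gaussian_aic(ols_rss(x1, y), n, 3)
                      - gaussian_aic(ols_rss(x2, y), n, 3))
    return deltas

deltas = simulate_delta_aic()
print(statistics.fmean(deltas), statistics.stdev(deltas))
```

Because the two models are symmetric in this setup, the mean should be near
zero. The spread shows how often a single sample would favour one model by a
seemingly convincing margin.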

