[R] Recursive partitioning algorithms in R vs. alia
jude.ryan at ubs.com
Tue Jun 23 00:56:52 CEST 2009
I have used all 3 packages for decision trees (SAS/EM, CART and R). As
another user on the list commented, the algorithms CART uses are
proprietary. I also understand that, because they are proprietary, the
decision tree you get from SAS is based on a "slightly different"
algorithm so as not to violate copyright laws. When I first started
using R (rpart), I benchmarked its results against Salford Systems CART
on the particular problem I was working on at the time. From what I
recall, R gave me an identical tree, with the splitting values differing
only in the 2nd or 3rd decimal place. I did not have SAS/EM at that
company, so I could not benchmark it. Salford Systems CART does offer
additional splitting criteria such as "twoing", but these may only be
of value for certain types of problems. The splitting criteria found in
R are good enough.
I do have SAS/EM now, but I prefer R because R can be programmed and
SAS/EM cannot. This may not matter much for decision trees, but it does
for neural networks. For example, since there are no variable selection
methods for neural networks, I may want to build hundreds of them with
different predictors and different numbers of neurons. I can do this
easily in R but not in SAS/EM. SAS/EM does have a variable selection
node, but it is independent of the neural network node, so, from what I
understand, you have to select the variables first and then pass them
to the neural network node.
In general, you get "prettier" tree output from CART and SAS/EM.
However, there are packages in R that give prettier output than rpart
does. One GUI worth exploring that works with R is Rattle. It builds
trees, neural networks, boosted models, etc., and lets you see the
generated R code as well.
In terms of handling large volumes of data, SAS/EM is probably the best.
However, if you have a 64-bit operating system with lots of RAM and use
random sampling, R should suffice. Unless you really need extra features
such as pretty output and variable importance, it is debatable whether
they are worth the huge cost of those products. With R you can do what
you want, which is to build a good tree. From what I have read, variable
importance measures can be biased because they are affected by factors
such as multicollinearity and variables with many categories, so their
usefulness is questionable (although end users may love them).
SAS/EM is by far the most expensive product, and Salford Systems CART is
pretty expensive as well. So, depending on your needs, R may be good
enough or even the best choice, because you can program it and the
latest methodologies tend to be implemented in R first. For comparisons
of the programming capabilities of SAS (macros) versus R, you may want
to look at what Frank Harrell and Terry Therneau (who wrote rpart) have
to say. Both are experts in SAS and R.
Hope this helps.
Jude
Carlos wrote:
Dear R-helpers,
I had a conversation with a guy working in a "business intelligence"
department at a major Spanish bank. They rely on recursive partitioning
methods to rank customers according to certain criteria.
They use both SAS EM and Salford Systems' CART. I have used the rpart
package in the past, but I could not provide any kind of feature
comparison, as I have no access to an installation of either of those
proprietary products.
Does anybody have experience with them? Is there any public benchmark
available? Is there any very good, albeit purely technical, reason to
pay for hefty software licences? How do the algorithms implemented in
rpart compare to those in SAS and/or CART?
Best regards,
Carlos J. Gil Bellosta
http://www.datanalytics.com
___________________________________________
Jude Ryan
Director, Client Analytical Services
Strategy & Business Development
UBS Financial Services Inc.
1200 Harbor Boulevard, 4th Floor
Weehawken, NJ 07086-6791
Tel. 201-352-1935
Fax 201-272-2914
Email: jude.ryan at ubs.com