[R] Finding dependencies and clusters in live survey data with a mix of independent variable types

sweepingplains at gmail.com
Thu May 8 08:59:40 CEST 2008


I have a set of live data on customer satisfaction and desires from a
live e-commerce site. There are only 311 survey responses to
approximately 154 questions. A large fraction of the questions have
numerical answers (e.g. on a scale of 1 to 10, how satisfied are you
with our service; how many months have you been a customer; how old
are you; how many computers do you own). A second large fraction have
binary answers (e.g. do you own an iPod; do you think blogging will be
more or less popular in 5 years' time than it is now; do you use
online video sites). The remaining questions have multinomial answers
(e.g. from which of these sources did you first find out about this
site; which of these most closely describes the industry you are in).

I am mostly interested in finding subsets of customers for whom some
subset of survey answers correlates most strongly with their answer to
the question "On a scale of 1 to 10, how would you rate our overall
service?" I am also interested in identifying market segments of
like-minded individuals with similar interests and views, and in
finding out what they, as a group, most want from the service in the
future.

I know how to perform multiple linear regression in R, but I am not
sure how to:
1. handle the binary and multinomial variables as independent
variables
2. find a set of canonical independent variables which, in
combination, correlate most closely with the "overall service rating"
data
3. find market segments in the data by looking for clusters of like
interests and views
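
To make points 1 and 2 concrete, here is a minimal sketch of the kind
of thing I have in mind. It runs on simulated data, and every column
name (months_customer, owns_ipod, source, overall) is hypothetical
rather than taken from the real survey. It relies on lm() expanding
factor columns into 0/1 dummy variables, and it uses step() as just
one possible, AIC-based way of narrowing down the predictors; I do not
know whether that is the best choice for this problem.

```r
# Simulated stand-in for the survey; all column names are hypothetical.
set.seed(1)
n <- 311
survey <- data.frame(
  months_customer = rpois(n, 12),                                # numeric
  owns_ipod = factor(sample(c("no", "yes"), n, replace = TRUE)), # binary
  source    = factor(sample(c("search", "friend", "advert"), n,
                            replace = TRUE))                     # multinomial
)
# Simulated rating (not restricted to the 1-10 scale of the real question).
survey$overall <- 5 + 0.1 * survey$months_customer +
  1.0 * (survey$owns_ipod == "yes") + rnorm(n)

# Point 1: factors enter the formula directly; lm() builds dummy columns
# using treatment contrasts (first level of each factor is the baseline).
fit_full <- lm(overall ~ months_customer + owns_ipod + source, data = survey)
dummies  <- colnames(model.matrix(fit_full))

# Point 2 (one possible approach): AIC-based stepwise selection.
fit_step <- step(fit_full, direction = "both", trace = 0)
kept     <- attr(terms(fit_step), "term.labels")
```

Here summary(fit_step) would show the coefficients of whichever terms
survive the selection. With 154 questions and only 311 responses, I
assume any such selection would need to be checked against
overfitting.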

Are any of the above suitable for analysis in R? If so, are there
example programs that achieve similar things which I can study as
guides?
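
For point 3, the following is a rough sketch of what I imagine, again
on simulated data with hypothetical column names (age, uses_video,
industry). It assumes the cluster package is installed, uses Gower
dissimilarity from daisy() so that numeric and categorical answers can
be combined, and picks k = 3 segments arbitrarily for illustration. I
am not certain this is an appropriate method for survey data like
mine.

```r
library(cluster)  # recommended package, normally installed alongside R

# Simulated mixed-type answers; all column names are hypothetical.
set.seed(2)
n <- 311
answers <- data.frame(
  age        = round(runif(n, 18, 70)),                            # numeric
  uses_video = factor(sample(c("no", "yes"), n, replace = TRUE)),  # binary
  industry   = factor(sample(c("retail", "tech", "other"), n,
                             replace = TRUE))                      # multinomial
)

# Gower dissimilarity copes with numeric and factor columns together.
d <- daisy(answers, metric = "gower")

# Partitioning around medoids on the dissimilarities; k = 3 is arbitrary.
seg <- pam(d, k = 3, diss = TRUE)

# Number of respondents assigned to each segment.
segment_sizes <- table(seg$clustering)
```

Each respondent's segment is in seg$clustering, so the answers could
then be summarised per segment to see what each group wants.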

Thanks in advance for your contemplation.

Charlie


