[R] Re: How to use R to perform prediction based on history data

Petr PIKAL petr.pikal at precheza.cz
Tue Aug 18 15:27:45 CEST 2009


Hi

r-help-bounces at r-project.org wrote on 15.08.2009 04:27:39:

> Say I have a csv file in which each row contains several fields, one of
> which says whether the row is a success.
> 
> In the history data, I have all the fields, including the result of whether
> it is a success. In the future data, I only have the fields without the result.
> 
> For example:
> 
> history data:
> 
> Field1  Field2  Field3  Field4  ResultField
> 1231    CA      TRUE    443     TRUE
> 23231   NC      TRUE    123     FALSE
> 1231    CA      FALSE   243     TRUE
> 
> The future data:
> Field1  Field2  Field3  Field4
> 23231   NC      TRUE    123
> 
> 
> 
> I am a newbie in R and statistics; I just feel R could have some
> mechanism to give the probability of success based on the history data.
> 
> I tried to read in the csv data and call "factor" on the list,
> but I am seeing an error message:
> Error in sort.list(unique.default(x), na.last = TRUE) :
> 
> Any ideas are highly welcome.

Well, the first idea is that you could buy or borrow a book on 
introductory statistics and look into the R intro documents (they are in the 
doc folder of your R installation). There are also introductory statistics 
books that use R as the programming language.

If you do not know much about statistics and R, you could possibly toss a 
coin and fill in your ResultField accordingly (it could be quicker and maybe 
more foolproof :-).

I would not call myself an expert in statistics, but with data like that you 
could try

?lm, ?glm or another modelling procedure that has a predict method for future 
data, and/or a tree-based package like ??mvpart
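
As a rough illustration of the glm plus predict route, here is a minimal
sketch. It is not a recommendation for your real data. The history data are
simulated, the file name in the comment is hypothetical, Field1 is left out
on the assumption that it is only an identifier, and a binomial family is
assumed because ResultField is TRUE/FALSE.

```r
# Hypothetical history data, invented for illustration. In practice it would
# come from something like read.csv("history.csv") (file name is an assumption).
set.seed(1)
n <- 200
history <- data.frame(
  Field2 = factor(sample(c("CA", "NC"), n, replace = TRUE)),
  Field3 = sample(c(TRUE, FALSE), n, replace = TRUE),
  Field4 = round(runif(n, 100, 500))
)

# Simulated outcome, only so that the example has something to fit
p <- plogis(-2 + 0.005 * history$Field4 + 0.8 * history$Field3)
history$ResultField <- runif(n) < p

# factor() is meant for a single column (a vector), not a whole data frame
str(history$Field2)

# Logistic regression: binomial family for a TRUE/FALSE outcome
fit <- glm(ResultField ~ Field2 + Field3 + Field4,
           data = history, family = binomial)

# Future data: the same fields, but no ResultField
future <- data.frame(
  Field2 = factor("NC", levels = levels(history$Field2)),
  Field3 = TRUE,
  Field4 = 123
)

# type = "response" returns the estimated probability of success
prob <- predict(fit, newdata = future, type = "response")
print(prob)
```

With only a handful of history rows, as in the example data above, such a
model cannot be estimated sensibly, so this needs a reasonable number of
records. For a tree-based alternative, rpart() from the recommended rpart
package also has a predict method.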

Regards
Petr





> 
> Thanks in advance.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



