[R] Help needed! Pre-processing the dataset before splitting - model building - model tuning - performance evaluation
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Wed Sep 25 10:00:34 CEST 2024
At 06:04 on 24/09/2024, Bekzod Akhmuratov wrote:
> The link below points to the dataset in question. I want to split it into
> training and test sets, use the training set to build and tune the model,
> and use the test set to evaluate performance. Before doing that, I want to
> make sure the original dataset has no noise, no collinearity to address,
> and no major outliers, transforming the data with techniques like Box-Cox
> and using VIF to eliminate highly correlated predictors where needed.
>
> https://www.kaggle.com/datasets/joaofilipemarques/google-advanced-data-analytics-waze-user-data
>
> When I fit a regression model to the original dataset in Minitab, I get
> the attached result for the residuals. They do not look normal. Does this
> mean there is high correlation, or that the relationship between the
> response and the predictors is nonlinear? How should I approach this? What
> would my strategy be in Python, Minitab, and R? An explanation for all
> three would be appreciated, if possible.
Hello,
R-Help is a list for questions and answers about R code, not for
suggesting analysis strategies. That said, I suggest you read the Python
notebook at the bottom of the Kaggle page; it addresses your main
question. If you have trouble translating the Python code to R, ask us
more specific questions about those difficulties.
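
As a possible starting point for such a translation, here is a minimal R
sketch of the workflow described in the question: split, check VIF,
estimate a Box-Cox transformation, then evaluate on the test set. It is
only an illustration under stated assumptions: the data are simulated
(not the Waze dataset), the variable names y, x1, x2, x3 are made up, and
the 80/20 split and the choice to drop x2 are arbitrary. VIF is computed
in base R as the diagonal of the inverse predictor correlation matrix,
and boxcox() comes from the MASS package.

```r
# Minimal sketch on simulated data; all variable names are hypothetical.
library(MASS)  # provides boxcox()

set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # deliberately almost collinear with x1
x3 <- rnorm(n)
y  <- exp(1 + 0.5 * x1 + 0.3 * x3 + rnorm(n, sd = 0.2))  # positive, skewed
dat <- data.frame(y, x1, x2, x3)

# 1. Split first, so that later choices are based on the training set only.
train_idx <- sample(n, size = round(0.8 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# 2. Collinearity: the VIFs are the diagonal of the inverse of the
#    predictor correlation matrix.
vif <- diag(solve(cor(train[, c("x1", "x2", "x3")])))
print(round(vif, 1))  # x1 and x2 should stand out; here x2 is dropped

# 3. Box-Cox: pick the lambda that maximises the profile log-likelihood.
bc <- boxcox(y ~ x1 + x3, data = train, plotit = FALSE)
lambda <- bc$x[which.max(bc$y)]

# Helper (hypothetical name) applying the Box-Cox transform for a given lambda.
bc_transform <- function(v, lambda) {
  if (abs(lambda) < 1e-8) log(v) else (v^lambda - 1) / lambda
}
train$y_bc <- bc_transform(train$y, lambda)
test$y_bc  <- bc_transform(test$y, lambda)  # same lambda, estimated on train

# 4. Refit on the transformed response and check residual normality.
fit <- lm(y_bc ~ x1 + x3, data = train)
print(shapiro.test(residuals(fit)))

# 5. Evaluate on the held-out test set (RMSE on the transformed scale).
rmse <- sqrt(mean((test$y_bc - predict(fit, newdata = test))^2))
print(rmse)
```

With real data, qqnorm(residuals(fit)) and plot(fit) give a more
informative picture of the residuals than a single test.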
Hope this helps,
Rui Barradas