[R] Help needed in data cleaning

Web Web webweb8537 at gmail.com
Fri Dec 18 20:14:46 CET 2015


Hello,
I need some help with data cleaning in R. My CSV file looks as follows.

"id","gender","age","category1","category2","category3","category4","category5","category6","category7","category8","category9","category10"
1,"Male",22,"movies","music","travel","cloths","grocery",,,,,
2,"Male",28,"travel","books","movies",,,,,,,
3,"Female",27,"rent","fuel","grocery","cloths",,,,,,
4,"Female",22,"rent","grocery","travel","movies","cloths",,,,,
5,"Female",22,"rent","online-shopping","utiliy",,,,,,,

I need to reformat as follows.

id  gender  age  category         rank
1   Male    22   movies           1
1   Male    22   music            2
1   Male    22   travel           3
1   Male    22   cloths           4
1   Male    22   grocery          5
1   Male    22   books            NA
1   Male    22   rent             NA
1   Male    22   fuel             NA
1   Male    22   utility          NA
1   Male    22   online-shopping  NA
...
5   Female  22   movies           NA
5   Female  22   music            NA
5   Female  22   travel           NA
5   Female  22   cloths           NA
5   Female  22   grocery          NA
5   Female  22   books            NA
5   Female  22   rent             1
5   Female  22   fuel             NA
5   Female  22   utility          NA
5   Female  22   online-shopping  2

So far, my efforts are as follows.

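library(reshape2)  # provides melt()
library(sqldf)     # provides sqldf()

mini <- read.csv("~/MS/coding/mini.csv", header=FALSE)
mini_clean <- mini[-1,]  # drop the header row that was read in as data
df_mini <- melt(mini_clean, id.vars=c("V1","V2","V3"))
sqldf('select * from df_mini order by  "V1"')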

Now I want to know what is the best way to fill all missing categories for
all users.
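
One possible way to do this, sketched in base R only, is to build every (user, category) pair and left-join the observed ranks onto it, so that pairs a user never listed come out with rank NA. The sample below is an inline copy of the data above, trimmed to five category columns (the remaining ones are empty) and with "utiliy" spelled "utility" as in the desired output. Both changes are assumptions for illustration, as are the names csv_text, long, grid and result.

```r
# Inline copy of the sample data (assumption: trimmed to 5 category columns,
# "utiliy" normalised to "utility").
csv_text <- '"id","gender","age","category1","category2","category3","category4","category5"
1,"Male",22,"movies","music","travel","cloths","grocery"
2,"Male",28,"travel","books","movies",,
3,"Female",27,"rent","fuel","grocery","cloths",
4,"Female",22,"rent","grocery","travel","movies","cloths"
5,"Female",22,"rent","online-shopping","utility",,'

# Read with the header row as column names; empty fields become NA.
mini <- read.csv(text = csv_text, header = TRUE, na.strings = "",
                 stringsAsFactors = FALSE)

cat_cols <- grep("^category", names(mini), value = TRUE)

# Long format: one row per (id, category), where rank is the column position.
# unlist() stacks the columns one after another, hence times= for id
# and each= for rank.
long <- data.frame(
  id       = rep(mini$id, times = length(cat_cols)),
  category = unlist(mini[cat_cols], use.names = FALSE),
  rank     = rep(seq_along(cat_cols), each = nrow(mini)),
  stringsAsFactors = FALSE
)
long <- long[!is.na(long$category), ]

# Every user crossed with every category seen anywhere in the data
# (merge() with by = NULL returns the Cartesian product).
users <- mini[c("id", "gender", "age")]
all_categories <- data.frame(category = sort(unique(long$category)),
                             stringsAsFactors = FALSE)
grid <- merge(users, all_categories, by = NULL)

# Left join: pairs that were never observed keep rank = NA.
result <- merge(grid, long, by = c("id", "category"), all.x = TRUE)
result <- result[order(result$id, result$rank),
                 c("id", "gender", "age", "category", "rank")]
rownames(result) <- NULL
print(result)
```

With the real file, replacing the text = csv_text argument with the file path should be the only change needed, since the category columns are picked up by name.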

Thanks
Nash
