[R] data after write() is off by 1 ?
Brian Feeny
bfeeny at mac.com
Tue Nov 20 20:45:50 CET 2012
A follow-up to my own post: I believe I have figured this out, but please correct me if I should be doing something different:
> prediction.out <- levels(prediction)[prediction]
> write(prediction.out, file="prediction.csv")
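This gives me the correct values: write() had been writing the factor's internal integer codes (1 to 10) rather than its level labels ("0" to "9").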
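
For anyone who wants to reproduce this without the full dataset, here is a minimal sketch. The five-element factor and the temporary file are made-up stand-ins for the real 28000 predictions and prediction.csv. As far as I can tell, as.character() returns the same labels as the levels() indexing above.

```r
# Toy stand-in for the predict() output: a factor whose levels are "0".."9".
prediction <- factor(c("2", "0", "9", "9", "3"), levels = as.character(0:9))

# A factor stores 1-based integer codes that index into levels(prediction).
codes  <- as.integer(prediction)    # positions in levels(): 3 1 10 10 4
labels <- as.character(prediction)  # the labels themselves: "2" "0" "9" "9" "3"

# levels(x)[x] and as.character(x) should agree.
stopifnot(identical(labels, levels(prediction)[prediction]))

# Write one label per line to a temporary file (hypothetical path).
out_file <- tempfile(fileext = ".csv")
write(labels, file = out_file, ncolumns = 1)
print(readLines(out_file))
```

If the digits are needed as numbers rather than strings, as.numeric(as.character(prediction)) should do it; as.numeric(prediction) on its own returns the codes again.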
Brian
On Nov 20, 2012, at 2:30 PM, Brian Feeny wrote:
> I am new to R, so I am sure I am making a simple mistake. I am including complete information in hopes
> someone can help me.
>
> Basically, my data in R looks good, but when I write it to a file, every value is off by 1.
>
> Here is my flow:
>
>> str(prediction)
> Factor w/ 10 levels "0","1","2","3",..: 3 1 10 10 4 8 1 4 1 4 ...
> - attr(*, "names")= chr [1:28000] "1" "2" "3" "4" ...
>> print(prediction)
>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>  2  0  9  9  3  7  0  3  0  3  5  7  4  0  4  3  3  1  9  0  9  1  1
>
> OK, so it shows my values are 2, 0, 9, 9, 3, etc.
>
> # I write my file out
> write(prediction, file="prediction.csv")
>
> # look at the first 10 lines (5 values per line)
> $ head -10 prediction.csv
> 3 1 10 10 4
> 8 1 4 1 4
> 6 8 5 1 5
> 4 4 2 10 1
> 10 2 2 6 8
> 5 3 8 5 8
> 8 6 5 3 7
> 3 6 6 2 7
> 8 8 5 10 9
> 8 9 3 7 8
>
> The complete work of what I did was as follows:
>
> # First I load in a dataset, label the first column as a factor
>> dataset <- read.csv('train.csv',head=TRUE)
>> dataset$label <- as.factor(dataset$label)
>
> # it has 42000 obs. of 785 variables
>> str(dataset)
> 'data.frame': 42000 obs. of 785 variables:
> $ label : Factor w/ 10 levels "0","1","2","3",..: 2 1 2 5 1 1 8 4 6 4 ...
> $ pixel0 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ pixel1 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ pixel2 : int 0 0 0 0 0 0 0 0 0 0 ...
> [list output truncated]
>
> # I make a sampling testset and trainset
>> index <- 1:nrow(dataset)
>> testindex <- sample(index, trunc(length(index)*30/100))
>> testset <- dataset[testindex,]
>> trainset <- dataset[-testindex,]
>
> # build model, predict, view (svm() is assumed here to come from the e1071 package)
>> library(e1071)
>> model <- svm(label~., data = trainset, type="C-classification", kernel="radial", gamma=0.0000001, cost=16)
>> prediction <- predict(model, testset)
>> tab <- table(pred = prediction, true = testset[,1])
>> tab
>     true
> pred    0    1    2    3    4    5    6    7    8    9
>    0 1210    0    3    1    0    5    7    2    5    8
>    1    0 1415    2    0    2    1    0    7    5    0
>    2    0    2 1127   12    3    0    2    7    2    0
>    3    0    0    7 1296    0   10    0    2   15    6
>    4    1    1    8    2 1201    2    4    3    5   16
>    5    3    1    0   13    0 1100    3    1    2    3
>    6    3    0    3    0    5    9 1263    0    1    0
>    7    0    2    9    6    6    1    0 1296    1   13
>    8    3    5    7   11    1    2    0    2 1190    4
>    9    1    1    2    3   17    2    0    4    4 1190
>
>
> OK, everything looks great up to this point, so I try to apply my model to a "real" testset, which is in the same format as my previous
> dataset, except that it does not have the label/factor column, so it's 28000 obs. of 784 variables:
>
>> testset <- read.csv('test.csv',head=TRUE)
>> str(testset)
> 'data.frame': 28000 obs. of 784 variables:
> $ pixel0 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ pixel1 : int 0 0 0 0 0 0 0 0 0 0 ...
> $ pixel2 : int 0 0 0 0 0 0 0 0 0 0 ...
> [list output truncated]
>
>> prediction <- predict(model, testset)
>> summary(prediction)
>    0    1    2    3    4    5    6    7    8    9
> 2780 3204 2824 2767 2771 2516 2744 2898 2736 2760
>> print(prediction)
>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>  2  0  9  9  3  7  0  3  0  3  5  7  4  0  4  3  3  1  9  0  9  1  1
> 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
>  5  7  4  2  7  4  7  7  5  4  2  6  2  5  5  1  6  7  7  4  9  8  7
> [list output truncated]
>
>> write(prediction, file="prediction.csv")
> $ head -10 prediction.csv
> 3 1 10 10 4
> 8 1 4 1 4
> 6 8 5 1 5
> 4 4 2 10 1
> 10 2 2 6 8
> 5 3 8 5 8
> 8 6 5 3 7
> 3 6 6 2 7
> 8 8 5 10 9
> 8 9 3 7 8
>
>
> I am obviously making a mistake. Everything is off by a value of 1.
>
>
> Can someone tell me what I am doing wrong?
>
> Brian
>
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.