[R] data after write() is off by 1 ?

Tue Nov 20 20:45:50 CET 2012

Following up on my own post: I believe I have figured this out, but please correct me if I should be doing something different.

> prediction.out <- levels(prediction)[prediction]
> write(prediction.out, file="prediction.csv")

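This writes the correct values. write() was emitting the factor's internal integer codes (1 to 10) rather than its labels ("0" to "9"), and indexing levels(prediction) by the factor recovers the labels.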
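
For illustration, here is a minimal sketch of the difference between a factor's codes and its labels. It uses a small hand-made factor in place of the real SVM output; the names codes, labels, digits and out_file, and the temporary file, are assumptions made for the example and do not appear in the original session.

```r
# Toy stand-in (assumption) for the SVM output: a factor with levels "0".."9".
prediction <- factor(c("2", "0", "9", "9", "3"), levels = as.character(0:9))

# A factor stores integer codes, i.e. positions in levels(), not the labels.
codes  <- as.integer(prediction)     # the numbers that ended up in the file
labels <- as.character(prediction)   # the labels shown by print()

# as.character() gives the same result as levels(prediction)[prediction].
stopifnot(identical(labels, levels(prediction)[prediction]))

# If numbers rather than strings are wanted, convert the labels, not the factor.
digits <- as.numeric(labels)

# Write the labels to a temporary file (hypothetical path), one per line.
out_file <- tempfile(fileext = ".csv")
write(labels, file = out_file, ncolumns = 1)
```

Converting with as.character() before writing is equivalent to the levels() indexing above; whether one value per line is the right layout depends on what will read the file.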

Brian

On Nov 20, 2012, at 2:30 PM, Brian Feeny wrote:

> I am new to R, so I am sure I am making a simple mistake.  I am including complete information in hopes
> someone can help me.
> 
> Basically my data in R looks good, I write it to a file, and every value is off by 1.
> 
> Here is my flow:
> 
>> str(prediction)
> Factor w/ 10 levels "0","1","2","3",..: 3 1 10 10 4 8 1 4 1 4 ...
> - attr(*, "names")= chr [1:28000] "1" "2" "3" "4" ...
>> print(prediction)
>    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23 
>    2     0     9     9     3     7     0     3     0     3     5     7     4     0     4     3     3     1     9     0     9     1     1 
> 
> ok, so it shows my values are 2, 0, 9, 9, 3 etc
> 
> # I write my file out
> write(prediction, file="prediction.csv")
> 
> # look at the first 10 values
> $ head -10 prediction.csv 
> 3 1 10 10 4
> 8 1 4 1 4
> 6 8 5 1 5
> 4 4 2 10 1
> 10 2 2 6 8
> 5 3 8 5 8
> 8 6 5 3 7
> 3 6 6 2 7
> 8 8 5 10 9
> 8 9 3 7 8
> 
> The complete work of what I did was as follows:
> 
> # First I load in a dataset, label the first column as a factor
>> dataset <- read.csv('train.csv',head=TRUE)
>> dataset$label <- as.factor(dataset$label)
> 
> # it has 42000 obs. 785 variables
>> str(dataset)
> 'data.frame':	42000 obs. of  785 variables:
> $ label   : Factor w/ 10 levels "0","1","2","3",..: 2 1 2 5 1 1 8 4 6 4 ...
> $ pixel0  : int  0 0 0 0 0 0 0 0 0 0 ...
> $ pixel1  : int  0 0 0 0 0 0 0 0 0 0 ...
> $ pixel2  : int  0 0 0 0 0 0 0 0 0 0 ...
>  [list output truncated]
> 
> # I make a sampling testset and trainset
>> index <- 1:nrow(dataset)
>> testindex <- sample(index, trunc(length(index)*30/100))
>> testset <- dataset[testindex,]
>> trainset <- dataset[-testindex,]
> 
> # build model, predict, view
>> model  <- svm(label~., data = trainset, type="C-classification", kernel="radial", gamma=0.0000001, cost=16)
>> prediction <- predict(model, testset)
>> tab <- table(pred = prediction, true = testset[,1])
>    true
> pred    0    1    2    3    4    5    6    7    8    9
>   0 1210    0    3    1    0    5    7    2    5    8
>   1    0 1415    2    0    2    1    0    7    5    0
>   2    0    2 1127   12    3    0    2    7    2    0
>   3    0    0    7 1296    0   10    0    2   15    6
>   4    1    1    8    2 1201    2    4    3    5   16
>   5    3    1    0   13    0 1100    3    1    2    3
>   6    3    0    3    0    5    9 1263    0    1    0
>   7    0    2    9    6    6    1    0 1296    1   13
>   8    3    5    7   11    1    2    0    2 1190    4
>   9    1    1    2    3   17    2    0    4    4 1190
> 
> 
> OK, everything looks great up to this point, so I try to apply my model to a "real" test set, which has the same format as my previous
> dataset, except that it does not have the label/factor column, so it is 28000 obs. of 784 variables:
> 
>> testset <- read.csv('test.csv',head=TRUE)
>> str(testset)
> 'data.frame':	28000 obs. of  784 variables:
> $ pixel0  : int  0 0 0 0 0 0 0 0 0 0 ...
> $ pixel1  : int  0 0 0 0 0 0 0 0 0 0 ...
> $ pixel2  : int  0 0 0 0 0 0 0 0 0 0 ...
>  [list output truncated]
> 
>> prediction <- predict(model, testset)
>> summary(prediction)
>   0    1    2    3    4    5    6    7    8    9 
> 2780 3204 2824 2767 2771 2516 2744 2898 2736 2760 
>> print(prediction)
>    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23 
>    2     0     9     9     3     7     0     3     0     3     5     7     4     0     4     3     3     1     9     0     9     1     1 
>   24    25    26    27    28    29    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46 
>    5     7     4     2     7     4     7     7     5     4     2     6     2     5     5     1     6     7     7     4     9     8     7 
>  [list output truncated]
> 
>> write(prediction, file="prediction.csv")
> $ head -10 prediction.csv 
> 3 1 10 10 4
> 8 1 4 1 4
> 6 8 5 1 5
> 4 4 2 10 1
> 10 2 2 6 8
> 5 3 8 5 8
> 8 6 5 3 7
> 3 6 6 2 7
> 8 8 5 10 9
> 8 9 3 7 8
> 
> 
> I am obviously making a mistake.  Everything is off by a value of 1.
> 
> 
> Can someone tell me what I am doing wrong?
> 
> Brian