[R] Problems with reading data by readWorksheetFromFile of XLConnect Package

David Winsemius dwinsemius at comcast.net
Fri May 3 17:54:51 CEST 2013


On May 2, 2013, at 11:00 PM, jpm miao wrote:

> Hi Anthony,
> 
>   Thank you very much. It works very well. However, after this line
> 
>> temp <- sapply( temp , as.numeric )
> 
>   the data becomes a series of numbers instead of a matrix. Is there any
> way to keep it a matrix?

Perhaps (assuming this were a data.frame to be coerced:

temp <- matrix( sapply( temp , as.numeric ), dim(temp)[1]) 

But the persistence of the "-"'s is puzzling. You should (as always) have posted the output from dput(temp).



  Thanks,
> 
> Miao
> 
> 
> 
> 
>> temp<-readWorksheetFromFile("130502temp.xlsx", sheet=1, header=FALSE,
> startRow=2, endRow= 11, startCol=2, endCol=5)
>> temp <- sapply( temp , function( x ) gsub( ',' , '' , x ) )
>> temp
>      Col1     Col2   Col3    Col4
> [1,] "647853" "1413" "57662" "27897"
> [2,] "491400" "1365" "40919" "20411"
> [3,] "38604"  "-"    "5505"  "985"
> [4,] "576"    "-"    "20"    "54"
> [5,] "80845"  "21"   "10211" "4494"
> [6,] "36428"  "27"   "1007"  "1953"
> [7,] "269915" "587"  "32988" "12779"
> [8,] "224494" "-"    "30554" "9184"
> [9,] "11858"  "587"  "-"     "686"
> [10,] "3742"   "-"    "81"    "415"
>> temp <- sapply( temp , as.numeric )
> Warning messages:
> 1: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion
> 2: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion
> 3: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion
> 4: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion
> 5: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion
>> temp
> 647853 491400  38604    576  80845  36428 269915
> 647853 491400  38604    576  80845  36428 269915
> 224494  11858   3742   1413   1365      -      -
> 224494  11858   3742   1413   1365     NA     NA
>    21     27    587      -    587      -  57662
>    21     27    587     NA    587     NA  57662
> 40919   5505     20  10211   1007  32988  30554
> 40919   5505     20  10211   1007  32988  30554
>     -     81  27897  20411    985     54   4494
>    NA     81  27897  20411    985     54   4494
>  1953  12779   9184    686    415
>  1953  12779   9184    686    415
>> temp[ is.na( temp ) ] <- 0
>> temp
> 647853 491400  38604    576  80845  36428 269915
> 647853 491400  38604    576  80845  36428 269915
> 224494  11858   3742   1413   1365      -      -
> 224494  11858   3742   1413   1365      0      0
>    21     27    587      -    587      -  57662
>    21     27    587      0    587      0  57662
> 40919   5505     20  10211   1007  32988  30554
> 40919   5505     20  10211   1007  32988  30554
>     -     81  27897  20411    985     54   4494
>     0     81  27897  20411    985     54   4494
>  1953  12779   9184    686    415
>  1953  12779   9184    686    415
> 
> 
> 2013/5/2 Anthony Damico <ajdamico at gmail.com>
> 
>> try adding colTypes = 'numeric' to your readWorkSheetFromFile() call
>> 
>> 
>> 
>> if that doesn't work, try a few other steps
>> 
>> 
>> # view what data types your file is being read in as
>> sapply( temp , class )
>> 
>> 
>> # convert all fields to character if they're factor variables.. but i
>> don't think you need this, readWorksheet defaults to `character`
>> temp <- sapply( temp , as.character )
>> 
>> 
>> # you can also convert a subset like this
>> temp[ , c( 1 , 3:4 ) ] <- sapply( temp[ , c( 1 , 3:4 ) ] , as.character )
>> 
>> 
>> 
>> # remove commas from character strings
>> temp <- sapply( temp , function( x ) gsub( ',' , '' , x ) )
>> 
>> # convert all fields to numeric
>> temp <- sapply( temp , as.numeric )
>> 
>> # convert all NA fields to zeroes if you prefer
>> temp[ is.na( temp ) ] <- 0
>> 
>> 
>> 
>> 
>> 
>> On Wed, May 1, 2013 at 11:55 PM, jpm miao <miaojpm at gmail.com> wrote:
>> 
>>> Hi,
>>> 
>>>   Attached are two datasheet to be read.
>>>   My raw data "130502temp.xlsx" contains numbers with ' symbols, and they
>>> can't be read as numbers. Even if I copy and paste as numbers to form a
>>> new
>>> file "130502temp_number1.xlsx", they could not be read smoothly.
>>> 
>>>   1. How can I read the datasheet as numbers?
>>>   2. How can I treat the notation "-" as (1) "NA" or (2) zero?
>>> 
>>>   Thanks,
>>> 
>>> Miao
>>> 
>>> 
>>> 
>>> 
>>>> temp<-readWorksheetFromFile("130502temp.xlsx", sheet=1, header=FALSE,
>>> startRow=2, endRow= 11, startCol=2, endCol=5)
>>> 
>>>> temp
>>> 
>>>      Col1  Col2   Col3   Col4
>>> 
>>> 1  647,853 1,413 57,662 27,897
>>> 
>>> 2  491,400 1,365 40,919 20,411
>>> 
>>> 3   38,604     -  5,505    985
>>> 
>>> 4      576     -     20     54
>>> 
>>> 5   80,845    21 10,211  4,494
>>> 
>>> 6   36,428    27  1,007  1,953
>>> 
>>> 7  269,915   587 32,988 12,779
>>> 
>>> 8  224,494     - 30,554  9,184
>>> 
>>> 9   11,858   587      -    686
>>> 
>>> 10   3,742     -     81    415
>>> 
>>>> temp[2,2]
>>> 
>>> [1] "1,365"
>>> 
>>>> temp[2,2]+3
>>> 
>>> Error in temp[2, 2] + 3 : non-numeric argument to binary operator
>>> 
>>>> temp_num<-readWorksheetFromFile("130502temp_number1.xlsx", sheet=1,
>>> header=FALSE, startRow=2, endRow= 11, startCol=2, endCol=5)
>>> 
>>>> temp_num[2,2]
>>> 
>>> [1] "1,365"
>>> 
>>>> temp_num[2,2]+3
>>> 
>>> Error in temp_num[2, 2] + 3 : non-numeric argument to binary operator
>>> 
>>>> as.numeric(temp_num[2,2])+3
>>> 
>>> [1] NA
>>> 
>>> Warning message:
>>> 
>>> NAs introduced by coercion
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>>> 
>> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA



More information about the R-help mailing list