[R] Warning message: NAs introduced by coercion

David L Carlson dcarlson at tamu.edu
Wed Jan 9 20:23:26 CET 2019


Now you have passed the function a numeric matrix that contains a column of missing values (the former marker column). No wonder you do not get any results. 
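A minimal base-R sketch of a check that could run before snpgdsCreateGeno(), assuming SNPs are in rows and samples in columns (snpfirstdim = TRUE). The small matrix is hypothetical toy data, not the poster's file; it mimics the coerced marker column, which is entirely NA. The sketch finds columns with no observed values, drops them, and keeps sample.id aligned with the columns that remain.

```r
# Hypothetical toy data: SNPs in rows, samples in columns (snpfirstdim = TRUE).
# The "marker" column mimics the column that became all NA after coercion.
genod <- cbind(marker = NA_real_,
               X88 = c(0, 2, 0, 0, 3, 0),
               X9  = c(3, 0, 0, 0, 3, 0),
               X17 = c(3, 3, 0, 3, 3, 0),
               X25 = c(3, 0, 0, 0, 3, 0))
sample.id <- colnames(genod)

# A column in which every value is missing carries no genotype information.
all_na <- colSums(is.na(genod)) == nrow(genod)
print(sample.id[all_na])

# Drop those columns and keep sample.id aligned with the remaining columns.
genod <- genod[, !all_na, drop = FALSE]
sample.id <- sample.id[!all_na]

# Any remaining NA is recoded as 3 (missing), following the
# "0,1,2 with 3 as missing" convention used in this thread.
genod[is.na(genod)] <- 3
stopifnot(is.matrix(genod), is.numeric(genod))
```

The cleaned genod and sample.id would then be the genmat and sample.id arguments in the snpgdsCreateGeno() call shown in the thread.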

David C

-----Original Message-----
From: N Meriam [mailto:meriam.nef using gmail.com] 
Sent: Tuesday, January 8, 2019 3:44 PM
To: David L Carlson <dcarlson using tamu.edu>
Cc: Michael Dewey <lists using dewey.myzen.co.uk>; r-help using r-project.org
Subject: Re: [R] Warning message: NAs introduced by coercion

Yes, sorry. I attached the file once again.
Well, still getting the same warning.

> class(genod) <- "numeric"
Warning message:
In class(genod) <- "numeric" : NAs introduced by coercion
> class(genod)
[1] "matrix"

Then, I run the following code and it gives this:

> filn <-"simTunesian.gds"
> snpgdsCreateGeno(filn, genmat = genod,
+                  sample.id = sample.id, snp.id = snp.id,
+                  snp.chromosome = snp.chromosome,
+                  snp.position = snp.position,
+                  snp.allele = snp.allele, snpfirstdim=TRUE)
> # calculate similarity matrix
> # Open the GDS file
> (genofile <- snpgdsOpen(filn))
File: C:\Users\DELL\Documents\TEST\simTunesian.gds (1.4M)
+    [  ] *
|--+ sample.id   { Str8 363 ZIP_ra(42.5%), 755B }
|--+ snp.id   { Int32 15752 ZIP_ra(35.1%), 21.6K }
|--+ snp.position   { Int32 15752 ZIP_ra(34.7%), 21.3K }
|--+ snp.chromosome   { Float64 15752 ZIP_ra(0.18%), 230B }
|--+ snp.allele   { Str8 15752 ZIP_ra(0.16%), 108B }
\--+ genotype   { Bit2 15752x363, 1.4M } *
> ibs <- snpgdsIBS(genofile, remove.monosnp = FALSE, num.thread=1)
Identity-By-State (IBS) analysis on genotypes:
Excluding 0 SNP on non-autosomes
Working space: 363 samples, 15,752 SNPs
    using 1 (CPU) core
IBS:    the sum of all selected genotypes (0,1,2) = 3658952
Tue Jan 08 15:38:00 2019    (internal increment: 42880)
[==================================================] 100%, completed in 0s
Tue Jan 08 15:38:00 2019    Done.
> # maximum similarity value
> max(ibs$ibs)
[1] NaN
> # minimum similarity value
> min(ibs$ibs)
[1] NaN

As you can see, I can't continue my analysis (heat map plot,
clustering with hclust) because values are NaN.
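One way to see where the NaN values come from is sketched below in base R. The 4 x 4 matrix is hypothetical and stands in for ibs$ibs; it assumes one sample (here named marker, as in the thread) has no usable genotypes. Because max() and min() return NaN whenever the input contains NaN, the sketch removes samples whose similarities are all missing before computing the range and running hclust().

```r
# Hypothetical similarity matrix standing in for ibs$ibs: the "marker"
# sample has no usable genotypes, so all of its similarities are NaN.
ids <- c("X88", "X9", "X17", "marker")
ibs_mat <- matrix(c(1.0, 0.8, 0.6, NaN,
                    0.8, 1.0, 0.7, NaN,
                    0.6, 0.7, 1.0, NaN,
                    NaN, NaN, NaN, NaN),
                  nrow = 4, dimnames = list(ids, ids))

max(ibs_mat)   # NaN: a single NaN makes max() return NaN

# Find samples whose similarities are all missing and drop them.
bad <- rowSums(is.na(ibs_mat)) == ncol(ibs_mat)
keep <- !bad
ibs_clean <- ibs_mat[keep, keep, drop = FALSE]

max(ibs_clean)
min(ibs_clean)

# Cluster on 1 - similarity; plot(hc) or heatmap(ibs_clean) could follow.
hc <- hclust(as.dist(1 - ibs_clean))
```

Passing na.rm = TRUE to max() and min() would hide the symptom rather than fix it; removing the all-missing sample (or not creating it in the first place) keeps the heat map and clustering meaningful. With the real result, the sample names would presumably come from ibs$sample.id rather than from dimnames.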


On Tue, Jan 8, 2019 at 2:01 PM David L Carlson <dcarlson using tamu.edu> wrote:
>
> Your attached file is not a .csv file, since the fields are not separated by commas (just rename mydata.csv to mydata.txt).
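A minimal sketch of reading such a file, assuming the columns are separated by whitespace (spaces or tabs). The three-row excerpt is hypothetical and is supplied through text = so the example runs without the attachment; with the real file, text = txt would be replaced by file = "mydata.txt".

```r
# Hypothetical excerpt of a whitespace-separated genotype file.
txt <- "
marker X88 X9 X17 X25
100023173|F|0-47:G>A-47:G>A 0 NA NA NA
1043336|F|0-7:A>G-7:A>G 1 0 NA 0
1212218|F|0-49:A>G-49:A>G 0 0 0 0
"

# sep = "" (the read.table default) splits on any run of spaces or tabs;
# stringsAsFactors = FALSE keeps the marker IDs as character, not factor.
myd <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(myd)
```

If the file turns out to be strictly tab-separated, read.delim() is the equivalent shortcut.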
>
> The command "genod2 <- as.matrix(genod)" created a character matrix from the data frame genod. When you try to force genod2 to numeric, the marker column becomes all NA, which is probably not what you want.
>
> The error occurs because you passed genod (a data frame) to the snpgdsCreateGeno() function instead of genod2 (the matrix you created from genod).
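A sketch of building the genotype matrix so that is.matrix(genmat) is TRUE and no NA is introduced, assuming the layout shown in the thread (a character marker column followed by one column per sample). The data frame is hypothetical. Keeping the marker IDs as snp.id is an illustrative choice; whether snpgdsCreateGeno() accepts character SNP IDs should be checked in the SNPRelate documentation. The recoding mirrors the thread's own code (NA to 3, 1 to 2).

```r
# Hypothetical data frame shaped like the thread's data: a character
# "marker" column followed by one genotype column per sample.
genod <- data.frame(
  marker = c("100023173|F|0-47:G>A-47:G>A", "1043336|F|0-7:A>G-7:A>G"),
  X88 = c(0, 1), X9 = c(NA, 0), X17 = c(NA, 0), X25 = c(NA, 0),
  stringsAsFactors = FALSE
)

# Keep the marker names as identifiers instead of coercing them to numbers.
snp.id <- genod$marker
sample.id <- setdiff(names(genod), "marker")

# Convert only the sample columns, so as.matrix() returns a numeric matrix.
genmat <- as.matrix(genod[, sample.id])
genmat[is.na(genmat)] <- 3   # 3 = missing, as in the thread
genmat[genmat == 1] <- 2     # same recoding as the thread's code
stopifnot(is.matrix(genmat), is.numeric(genmat))

# genmat (not the data frame) is what would go to
# snpgdsCreateGeno(filn, genmat = genmat, sample.id = sample.id, ...,
#                  snpfirstdim = TRUE)
```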
>
> ------------------------------------
> David L. Carlson
> Department of Anthropology
> Texas A&M University
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces using r-project.org] On Behalf Of N Meriam
> Sent: Tuesday, January 8, 2019 1:38 PM
> To: Michael Dewey <lists using dewey.myzen.co.uk>
> Cc: r-help using r-project.org
> Subject: Re: [R] Warning message: NAs introduced by coercion
>
> Here's a portion of what my data looks like (text file format attached).
> When running in R, it gives me this:
>
> > df4 <- read.csv(file = "mydata.csv", header = TRUE)
> > require(SNPRelate)
> > library(gdsfmt)
> > myd <- df4
> > myd <- df4
> > names(myd)[-1]
> [1] "marker" "X88"    "X9"     "X17"    "X25"
> > myd[,1]
> [1]  3  4  5  6  8 10
> # the data must be 0,1,2 with 3 as missing so you have r
> > sample.id <- names(myd)[-1]
> > snp.id <- myd[,1]
> > snp.position <- 1:length(snp.id) # not needed for ibs
> > snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
> > snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs
> # genotype data must have - in 3
> > genod <- myd[,-1]
> > genod[is.na(genod)] <- 3
> > genod[genod=="0"] <- 0
> > genod[genod=="1"] <- 2
> > genod2 <- as.matrix(genod)
> > head(genod2)
>      marker                        X88 X9  X17 X25
> [1,] "100023173|F|0-47:G>A-47:G>A" "0" "3" "3" "3"
> [2,] "1043336|F|0-7:A>G-7:A>G"     "2" "0" "3" "0"
> [3,] "1212218|F|0-49:A>G-49:A>G"   "0" "0" "0" "0"
> [4,] "1019554|F|0-14:T>C-14:T>C"   "0" "0" "3" "0"
> [5,] "100024550|F|0-16:G>A-16:G>A" "3" "3" "3" "3"
> [6,] "1106702|F|0-8:C>A-8:C>A"     "0" "0" "0" "0"
> > class(genod2) <- "numeric"
> Warning message:
> In class(genod2) <- "numeric" : NAs introduced by coercion
> > head(genod2)
>      marker X88 X9 X17 X25
> [1,]     NA   0  3   3   3
> [2,]     NA   2  0   3   0
> [3,]     NA   0  0   0   0
> [4,]     NA   0  0   3   0
> [5,]     NA   3  3   3   3
> [6,]     NA   0  0   0   0
> > class(genod2) <- "numeric"
> > class(genod2)
> [1] "matrix"
> # read data
> > filn <-"simTunesian.gds"
> > snpgdsCreateGeno(filn, genmat = genod,
> +                  sample.id = sample.id, snp.id = snp.id,
> +                  snp.chromosome = snp.chromosome,
> +                  snp.position = snp.position,
> +                  snp.allele = snp.allele, snpfirstdim=TRUE)
> Error in snpgdsCreateGeno(filn, genmat = genod, sample.id = sample.id,
>  :   is.matrix(genmat) is not TRUE
>
> I can't find a solution to my problem... My guess is that the problem
> comes from converting the factor column 'marker' to numeric.
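The guess can be checked directly. A base-R sketch with two hypothetical marker IDs shows the two usual outcomes: as.numeric() on a factor returns its internal level codes, and as.numeric() on the underlying strings returns NA, which is exactly what triggers the "NAs introduced by coercion" warning. Extracting the leading number before the first | is shown only as a hypothetical option; the marker names can simply stay character.

```r
# Hypothetical marker IDs read as a factor (read.csv() did this by default
# for character columns before R 4.0.0).
marker <- factor(c("100023173|F|0-47:G>A-47:G>A", "1043336|F|0-7:A>G-7:A>G"))

# Level codes, not values taken from the marker names.
codes <- as.numeric(marker)

# NA for every ID; without suppressWarnings() this emits the same
# "NAs introduced by coercion" warning seen in the thread.
as_num <- suppressWarnings(as.numeric(as.character(marker)))

# Hypothetical alternative: keep only the part before the first "|".
ids <- as.numeric(sub("\\|.*$", "", as.character(marker)))
```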
>
> Best,
> Meriam
>
> On Tue, Jan 8, 2019 at 11:28 AM Michael Dewey <lists using dewey.myzen.co.uk> wrote:
> >
> > Dear Meriam
> >
> > Your csv file did not come through, as attachments are stripped unless they
> > are of certain types, and your post is very hard to read since you are posting in
> > HTML. Try renaming the file to ????.txt and set your mailer to send
> > plain text; then people may be able to help you better.
> >
> > Michael
> >
> > On 08/01/2019 15:35, N Meriam wrote:
> > > I see...
> > > Here's a portion of what my data looks like (csv file attached).
> > > I run again and here are the results:
> > >
> > > df4 <- read.csv(file = "mydata.csv", header = TRUE)
> > >
> > >> require(SNPRelate)
> > >> library(gdsfmt)
> > >> myd <- df4
> > >> myd <- df4
> > >> names(myd)[-1]
> > > [1] "marker" "X88"    "X9"     "X17"    "X25"
> > >
> > >> myd[,1]
> > > [1]  3  4  5  6  8 10
> > >
> > >> # the data must be 0,1,2 with 3 as missing so you have r
> > >> sample.id <- names(myd)[-1]
> > >> snp.id <- myd[,1]
> > >> snp.position <- 1:length(snp.id) # not needed for ibs
> > >> snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
> > >> snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs
> > >> # genotype data must have - in 3
> > >> genod <- myd[,-1]
> > >> genod[is.na(genod)] <- 3
> > >> genod[genod=="0"] <- 0
> > >> genod[genod=="1"] <- 2
> > >
> > >> genod2 <- as.matrix(genod)
> > >> head(genod2)
> > >      marker                        X88 X9  X17 X25
> > > [1,] "100023173|F|0-47:G>A-47:G>A" "0" "3" "3" "3"
> > > [2,] "1043336|F|0-7:A>G-7:A>G"     "2" "0" "3" "0"
> > > [3,] "1212218|F|0-49:A>G-49:A>G"   "0" "0" "0" "0"
> > > [4,] "1019554|F|0-14:T>C-14:T>C"   "0" "0" "3" "0"
> > > [5,] "100024550|F|0-16:G>A-16:G>A" "3" "3" "3" "3"
> > > [6,] "1106702|F|0-8:C>A-8:C>A"     "0" "0" "0" "0"
> > >
> > >> class(genod2) <- "numeric"
> > > Warning message:
> > > In class(genod2) <- "numeric" : NAs introduced by coercion
> > >> head(genod2)
> > >
> > >   marker X88 X9 X17 X25
> > > [1,]     NA   0  3   3   3
> > > [2,]     NA   2  0   3   0
> > > [3,]     NA   0  0   0   0
> > > [4,]     NA   0  0   3   0
> > > [5,]     NA   3  3   3   3
> > > [6,]     NA   0  0   0   0
> > >
> > >> class(genod2) <- "numeric"
> > >> class(genod2)
> > > [1] "matrix"
> > >
> > >> # read data
> > >> filn <-"simTunesian.gds"
> > >> snpgdsCreateGeno(filn, genmat = genod,
> > > +                  sample.id = sample.id, snp.id = snp.id,
> > > +                  snp.chromosome = snp.chromosome,
> > > +                  snp.position = snp.position,
> > > +                  snp.allele = snp.allele, snpfirstdim=TRUE)
> > > Error in snpgdsCreateGeno(filn, genmat = genod, sample.id = sample.id,  :
> > >    is.matrix(genmat) is not TRUE
> > >
> > > Thanks,
> > > Meriam
> > >
> > > On Tue, Jan 8, 2019 at 9:02 AM PIKAL Petr <petr.pikal using precheza.cz> wrote:
> > >
> > >> Hi
> > >>
> > >> see in line
> > >>
> > >>> -----Original Message-----
> > >>> From: R-help <r-help-bounces using r-project.org> On Behalf Of N Meriam
> > >>> Sent: Tuesday, January 8, 2019 3:08 PM
> > >>> To: r-help using r-project.org
> > >>> Subject: [R] Warning message: NAs introduced by coercion
> > >>>
> > >>> Dear all,
> > >>>
> > >>> I have a .csv file called df4 (15752 obs. of 264 variables).
> > >>> I applied this code but couldn't continue with further analyses because a
> > >>> warning message keeps coming up. Next, I want to determine the max and min
> > >>> similarity values, draw a heat map, cluster, etc.
> > >>>
> > >>>> require(SNPRelate)
> > >>>> library(gdsfmt)
> > >>>> myd <- read.csv(file = "df4.csv", header = TRUE)
> > >>>> names(myd)[-1]
> > >>>> myd[,1]
> > >>>> myd[1:10, 1:10]
> > >>>> # the data must be 0,1,2 with 3 as missing so you have r
> > >>>> sample.id <- names(myd)[-1]
> > >>>> snp.id <- myd[,1]
> > >>>> snp.position <- 1:length(snp.id) # not needed for ibs
> > >>>> snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
> > >>>> snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs
> > >>>> # genotype data must have - in 3
> > >>>> genod <- myd[,-1]
> > >>>> genod[is.na(genod)] <- 3
> > >>>> genod[genod=="0"] <- 0
> > >>>> genod[genod=="1"] <- 2
> > >>>> genod[1:10,1:10]
> > >>>> genod <- as.matrix(genod)
> > >>
> > >> A matrix can hold only one type of data, so you probably converted it to
> > >> character with this construction.
> > >>
> > >>>> class(genod) <- "numeric"
> > >>
> > >> This tries to convert every value to a number; any value that cannot be
> > >> converted is set to NA.
> > >>
> > >> Something like this:
> > >>
> > >>> head(iris)
> > >>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> > >> 1          5.1         3.5          1.4         0.2  setosa
> > >> 2          4.9         3.0          1.4         0.2  setosa
> > >> 3          4.7         3.2          1.3         0.2  setosa
> > >> 4          4.6         3.1          1.5         0.2  setosa
> > >> 5          5.0         3.6          1.4         0.2  setosa
> > >> 6          5.4         3.9          1.7         0.4  setosa
> > >>> ir <-head(iris)
> > >>> irm <- as.matrix(ir)
> > >>> head(irm)
> > >>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> > >> 1 "5.1"        "3.5"       "1.4"        "0.2"       "setosa"
> > >> 2 "4.9"        "3.0"       "1.4"        "0.2"       "setosa"
> > >> 3 "4.7"        "3.2"       "1.3"        "0.2"       "setosa"
> > >> 4 "4.6"        "3.1"       "1.5"        "0.2"       "setosa"
> > >> 5 "5.0"        "3.6"       "1.4"        "0.2"       "setosa"
> > >> 6 "5.4"        "3.9"       "1.7"        "0.4"       "setosa"
> > >>> class(irm) <- "numeric"
> > >> Warning message:
> > >> In class(irm) <- "numeric" : NAs introduced by coercion
> > >>> head(irm)
> > >>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> > >> 1          5.1         3.5          1.4         0.2      NA
> > >> 2          4.9         3.0          1.4         0.2      NA
> > >> 3          4.7         3.2          1.3         0.2      NA
> > >> 4          4.6         3.1          1.5         0.2      NA
> > >> 5          5.0         3.6          1.4         0.2      NA
> > >> 6          5.4         3.9          1.7         0.4      NA
> > >>>
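Following on from this example, a minimal base-R sketch that selects only the numeric columns before calling as.matrix(), so the result stays numeric and no NA is introduced. It uses the built-in iris data, as above.

```r
ir <- head(iris)

# TRUE for each column that is numeric; Species (a factor) is excluded.
num_cols <- vapply(ir, is.numeric, logical(1))

# A data frame with only numeric columns converts to a numeric matrix.
irm <- as.matrix(ir[, num_cols])
stopifnot(is.numeric(irm), !anyNA(irm))
```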
> > >>
> > >> Cheers
> > >> Petr
> > >>
> > >>
> > >>>
> > >>>
> > >>> Warning message:
> > >>> In class(genod) <- "numeric" : NAs introduced by coercion
> > >>>
> > >>> Should I illustrate this in more detail so I can be more specific?
> > >>> Please let me know.
> > >>>
> > >>> I would appreciate your help.
> > >>> Thanks,
> > >>> Meriam
> > >>>
> > >>>
> > >>> ______________________________________________
> > >>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >>
> > >>
> > >
> >
> > --
> > Michael
> > http://www.dewey.myzen.co.uk/home.html
>
>
>
> --
> Meriam Nefzaoui
> MSc. in Plant Breeding and Genetics
> Universidade Federal Rural de Pernambuco (UFRPE) - Recife, Brazil



-- 
Meriam Nefzaoui
MSc. in Plant Breeding and Genetics
Universidade Federal Rural de Pernambuco (UFRPE) - Recife, Brazil

