[R-es] Problems with H2O

Jesús Para Fernández j.para.fernandez en hotmail.com
Sun Aug 20 10:45:05 CEST 2017


No luck, I tried it and even that does not help... although I think it is doing something right, since it computes the distance incredibly fast (although there is no noticeable difference between running it locally and on an EC2 instance...)


What seems odd to me is that writing each result to a file is faster than storing it in an H2O matrix. That is, this code runs faster:



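# seq_len() visits every row of dos; for(i in nrow(dos)) would run only once, with i = nrow(dos)
for(i in seq_len(nrow(dos))){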

h2o.exportFile(h2o.distance(uno,dos[i,]),paste0("/home/jesus/datos",i,".csv"))

}


than this one:


matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))

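# seq_len() visits every row of dos; for(i in nrow(dos)) would run only once, with i = nrow(dos)
for(i in seq_len(nrow(dos))){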

matriz[,i]<-h2o.distance(uno,dos[i,])

}




At first glance this does not make much sense: writing to disk takes less time than writing into a matrix (although the matrix weighs 2.7 GB...).
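
A side note on row loops in R, as a minimal base R sketch (the data frame `df` and the counters are made up for illustration): `for (i in nrow(df))` iterates over the single value `nrow(df)`, so the body runs once, whereas `seq_len(nrow(df))` visits every row.

```r
# Hypothetical 5-row data frame, only for illustration
df <- data.frame(x = rnorm(5))

# Loop over the single value nrow(df): the body runs once, with i = 5
runs_scalar <- 0
for (i in nrow(df)) runs_scalar <- runs_scalar + 1

# Loop over 1, 2, ..., nrow(df): the body runs once per row
runs_all <- 0
for (i in seq_len(nrow(df))) runs_all <- runs_all + 1

print(c(scalar = runs_scalar, all = runs_all))
```

Any timing comparison between the two variants is only meaningful if both loops cover the same number of rows.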





________________________________
From: Jesús Para Fernández <j.para.fernandez en hotmail.com>
Sent: Saturday, 19 August 2017 21:26
To: r-help-es en r-project.org
Subject: Problems with H2O


Hi


I am using H2O locally and also on an Amazon EC2 instance, but I must surely have something misconfigured.

To start it, I do the following:

conexion<-h2o.init()


This starts the cluster with the maximum number of cores and memory allowed.
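
For reference, the resources can also be requested explicitly instead of relying on the defaults. A minimal sketch, assuming the `nthreads` and `max_mem_size` arguments of `h2o.init()`; the helper name `start_h2o` and the "4g" value are only illustrative and should be adapted to the machine:

```r
# Hypothetical helper: start (or connect to) a local H2O cluster with explicit resources.
# nthreads = -1 asks H2O to use all available cores; max_mem_size caps the JVM heap.
start_h2o <- function(threads = -1, memory = "4g") {
  h2o::h2o.init(nthreads = threads, max_mem_size = memory)
}

# Usage (requires the h2o package and Java):
# conexion <- start_h2o()
```

The session log further down reports 1.71 GB of total cluster memory, which is worth comparing with the size of the frames being built.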


Once this is done, I want to compute the distance between two data.frames:


uno<-data.frame(matrix(rnorm(300000),ncol=10))

dos<-data.frame(matrix(rnorm(500),ncol=10))

uno<-as.h2o(uno)

dos<-as.h2o(dos)

matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))

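# seq_len() visits every row of dos; for(i in nrow(dos)) would run only once, with i = nrow(dos)
for(i in seq_len(nrow(dos))){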

matriz[,i]<-h2o.distance(uno,dos[i,])

}



While it runs, htop shows that of the 4 cores on my PC, or the 16 on the Amazon EC2 instance, only one is used; what is more, on EC2 it takes longer to run than on the PC.

So I think it is not parallelizing properly. Has this happened to anyone?
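
One thing that may be worth trying, as a sketch rather than a confirmed fix: `h2o.distance()` computes pairwise distances between all rows of two frames, so a single call could replace the row-by-row loop and leave the whole computation to the cluster instead of issuing one small request per row. The wrapper name `all_distances` and the explicit `measure = "l2"` are assumptions for illustration.

```r
# Hypothetical wrapper: all pairwise distances in one call.
# x and y are H2OFrames with the same numeric columns; the result is expected
# to be an H2OFrame with nrow(x) rows and nrow(y) columns.
all_distances <- function(x, y, measure = "l2") {
  h2o::h2o.distance(x, y, measure = measure)
}

# Usage (requires a running H2O cluster):
# matriz <- all_distances(uno, dos)
# dim(matriz)
```

Whether this actually uses more cores would still need to be checked with htop.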


If I run h2o.clusterStatus() it reports that everything is OK

R version 3.4.1 (2017-06-30) -- "Single Candle"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Workspace loaded from ~/.RData]

> library(h2o)

----------------------------------------------------------------------

Your next step is to start H2O:
    > h2o.init()

For H2O package documentation, ask for help:
    > ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai

----------------------------------------------------------------------


Attaching package: ‘h2o’

The following objects are masked from ‘package:stats’:

    cor, sd, var

The following objects are masked from ‘package:base’:

    ||, &&, %*%, apply, as.factor, as.numeric, colnames, colnames<-, ifelse, %in%,
    is.character, is.factor, is.numeric, log, log10, log1p, log2, round, signif, trunc

> h2o.init()

H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /tmp/RtmpbuV3iD/h2o_jesus_started_from_r.out
    /tmp/RtmpbuV3iD/h2o_jesus_started_from_r.err

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

Starting H2O JVM and connecting: .. Connection successful!

R is connected to the H2O cluster:
    H2O cluster uptime:         1 seconds 905 milliseconds
    H2O cluster version:        3.10.5.3
    H2O cluster version age:    1 month and 20 days
    H2O cluster name:           H2O_started_from_R_jesus_rqh095
    H2O cluster total nodes:    1
    H2O cluster total memory:   1.71 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  4
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    R Version:                  R version 3.4.1 (2017-06-30)

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  679385 36.3    1168576 62.5   940480 50.3
Vcells 1138497  8.7    1920143 14.7  1532430 11.7
> rm(list=ls())
> datos<-read.table("/home/jesus/master/datos/datos-balanceado/datos-100/datos.csv",header=T,dec=".",sep=",")
> uno<-datos[datos$InspectionReport == "ACCEPTED",]
> dos<-datos[datos$InspectionReport != "ACCEPTED",]
> uno$InspectionReport<-NULL
> dos$InspectionReport<-NULL
> uno2<-as.h2o(uno)
  |=======================================================================================| 100%
> dos2<-as.h2o(dos)
  |=======================================================================================| 100%
> h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
Error in !missing(row) && !(base::is.character(row)) :
  object 'i' not found
> i<-1
> h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
  |=======================================================================================| 100%
> t=Sys.time()
> for(i in 1:10){
+  h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
+
+ }

ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = http://localhost:54321/3/Frames/RTMP_sid_8e47_4/export)

water.exceptions.H2OIllegalArgumentException
 [1] "water.exceptions.H2OIllegalArgumentException: Illegal argument: /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv of function: exportFrame: File /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv already exists!"
 [2] "    water.fvec.Frame.export(Frame.java:1370)"
 [3] "    water.api.FramesHandler.export(FramesHandler.java:258)"
 [4] "    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"
 [5] "    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)"
 [6] "    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"
 [7] "    java.lang.reflect.Method.invoke(Method.java:498)"
 [8] "    water.api.Handler.handle(Handler.java:63)"
 [9] "    water.api.RequestServer.serve(RequestServer.java:448)"
[10] "    water.api.RequestServer.doGeneric(RequestServer.java:297)"
[11] "    water.api.RequestServer.doPost(RequestServer.java:223)"
[12] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:755)"
[13] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:848)"
[14] "    org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"
[15] "    org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)"
[16] "    org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)"
[17] "    org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)"
[18] "    org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)"
[19] "    org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)"
[20] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
[21] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
[22] "    water.JettyHTTPD$LoginHandler.handle(JettyHTTPD.java:183)"
[23] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)"
[24] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)"
[25] "    org.eclipse.jetty.server.Server.handle(Server.java:370)"
[26] "    org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)"
[27] "    org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)"
[28] "    org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)"
[29] "    org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)"
[30] "    org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)"
[31] "    org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)"
[32] "    org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)"
[33] "    org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)"
[34] "    org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)"
[35] "    org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)"
[36] "    java.lang.Thread.run(Thread.java:748)"

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  :


ERROR MESSAGE:

Illegal argument: /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv of function: exportFrame: File /home/jesus/master/datos/datos-balanceado/matriz-overlapping/k1.csv already exists!

>
> print(Sys.time()-t)
Time difference of 0.218374 secs
> ?h2o.exportFile
> t=Sys.time()
> for(i in 1:10){
+  h2o.exportFile(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"),force=T)
+
+ }
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
  |=======================================================================================| 100%
>
> print(Sys.time()-t)
Time difference of 11.31977 secs
> matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
  |=======================================================================================| 100%
> t=Sys.time()
> for(i in 1:10){
+
+  matriz[,j]<-h2o.distance(uno2,dos2[i,])
+ }
Error in !allCol && is.na(col) : object 'j' not found
>
> print(Sys.time()-t)
Time difference of 0.006168127 secs
> t=Sys.time()
> for(i in 1:10){
+
+  matriz[,i]<-h2o.distance(uno2,dos2[i,])
+ }
>
> print(Sys.time()-t)
Time difference of 30.33803 secs
> 10/30
[1] 0.3333333
> 30/10*nrow(dos)
[1] 16068
> 30/10*nrow(dos)
[1] 16068
> 30/10*nrow(dos)/3600
[1] 4.463333
> library(data.table)
data.table 1.10.4
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
  Release notes, videos and slides: http://r-datatable.com

Attaching package: ‘data.table’

The following objects are masked from ‘package:h2o’:

    hour, month, week, year

> t=Sys.time()
> for(i in 1:10){
+ fwrite(h2o.distance(uno2,dos2[i,]),paste0("/home/jesus/master/datos/datos-balanceado/matriz-overlapping/k",i,".csv"))
+
+ }
Error: is.list(x) is not TRUE
> print(Sys.time()-t)
Time difference of 0.1015148 secs
> ?fwrite
> matriz<-h2o.createFrame(rows=nrow(uno),cols=nrow(dos))
  |=======================================================================================| 100%
> t=Sys.time()
> for(i in 1:10){
+
+  matriz[,i]<-h2o.distance(uno2,dos2[i,])
+ }
>
> print(Sys.time()-t)
Time difference of 28.89684 secs
> 30/10
[1] 3
> 30/10*nrow(dos)
[1] 16068
> 30/10*nrow(dos)/3600
[1] 4.463333
> t=Sys.time()
> for(i in 1:50){
+
+  matriz[,i]<-h2o.distance(uno2,dos2[i,])
+ }
>
> print(Sys.time()-t)
Time difference of 2.506209 mins
> 2*60+30
[1] 150
> 150/50
[1] 3
> h2o.cluster_sizes()
Error in .model.parts(object) :
  argument "object" is missing, with no default
> h2o.clusterStatus()
Version: 3.10.5.3
Cluster name: H2O_started_from_R_jesus_rqh095
Cluster size: 1
Cluster is locked

                        h2o healthy   last_ping num_cpus sys_load mem_value_size   free_mem
1 localhost/127.0.0.1:54321    TRUE 1.50317e+12        4     0.55      707369984 1129209856
  pojo_mem swap_mem  free_disk    max_disk  pid num_keys tcps_active open_fds rpcs_active
1        0        0 6571425792 20121124864 8553    16417           0       38           0
> ?h2o.init()
> h2o.clusterIsUp()
[1] TRUE
> h2o.clusterInfo()
R is connected to the H2O cluster:
    H2O cluster uptime:         1 hours 31 minutes
    H2O cluster version:        3.10.5.3
    H2O cluster version age:    1 month and 20 days
    H2O cluster name:           H2O_started_from_R_jesus_rqh095
    H2O cluster total nodes:    1
    H2O cluster total memory:   1.05 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  4
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    R Version:                  R version 3.4.1 (2017-06-30)
> h2o.cluster_sizes(dos2)
Error in .model.parts(object) :
  trying to get slot "model" of an object (class "H2OFrame") that is not an S4 object
> h2o.clusterInfo()
R is connected to the H2O cluster:
    H2O cluster uptime:         1 hours 37 minutes
    H2O cluster version:        3.10.5.3
    H2O cluster version age:    1 month and 20 days
    H2O cluster name:           H2O_started_from_R_jesus_rqh095
    H2O cluster total nodes:    1
    H2O cluster total memory:   1.05 GB
    H2O cluster total cores:    4
    H2O cluster allowed cores:  4
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    R Version:                  R version 3.4.1 (2017-06-30)
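
The back-of-the-envelope estimate in the session above (roughly 30 seconds for 10 columns, and 5356 rows in `dos`, as implied by 30/10*nrow(dos) = 16068) can be written as a small helper. This is only a sketch; the function name `estimate_hours` is made up for illustration and it assumes the time per column stays constant.

```r
# Extrapolate total run time in hours from a timed sample of iterations.
estimate_hours <- function(elapsed_secs, n_done, n_total) {
  secs_per_item <- elapsed_secs / n_done  # seconds per column in the sample
  secs_per_item * n_total / 3600          # total seconds converted to hours
}

# About 30 s for 10 columns, 5356 columns in total
print(estimate_hours(30, 10, 5356))
```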



Thanks

Jesus





