[R-sig-Geo] randomForests for mapping vegetation
Tim Howard
tghoward at gw.dec.state.ny.us
Wed Jan 11 14:07:50 CET 2006
Tom,
We are using randomForest to generate predictive models (30-meter pixels on a 22578 x 17160 grid) with about 36 environmental variables. I must admit we have not yet had the energy to do it completely within R or to call R from ArcGIS.
We do our data prep and attributing in ArcGIS, then dump it all out to ASCII. We bring the attributed presence and absence points into R and build our random forest. With the fitted RF model, we then open connections to the ASCII grids for all 36 environmental layers, read them in piece by piece (each one is about 2GB), run an RF prediction on each piece, and write the predictions out to an ASCII file.
We then import the prediction layer into ArcGIS. Not pretty by any means, but it does work.
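The piece-by-piece loop described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not our actual R script: it assumes every layer is an Arc ASCII grid with the usual six header lines and identical dimensions, and the names predict_in_blocks, predict_cell and rows_per_block are hypothetical. In R the same pattern would read from open connections and call predict() on the fitted model for each block.

```python
import itertools

HEADER_LINES = 6  # ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value


def read_header(handle):
    """Read the six 'key value' header lines; return (raw lines, parsed dict)."""
    raw = [handle.readline() for _ in range(HEADER_LINES)]
    parsed = {}
    for line in raw:
        key, value = line.split()
        parsed[key.lower()] = value
    return raw, parsed


def predict_in_blocks(layer_paths, out_path, predict_cell, rows_per_block=100):
    """Stream all layers in lockstep and predict one block of rows at a time.

    predict_cell is a hypothetical stand-in for the fitted model: it takes a
    tuple with one value per layer and returns a single number.
    """
    handles = [open(path) for path in layer_paths]
    try:
        headers = [read_header(handle) for handle in handles]
        first_raw, first = headers[0]
        # All layers must describe the same grid, or rows will not line up.
        for _, parsed in headers[1:]:
            if (parsed["ncols"], parsed["nrows"]) != (first["ncols"], first["nrows"]):
                raise ValueError("layers do not share the same dimensions")
        nodata = float(first["nodata_value"])

        with open(out_path, "w") as out:
            out.writelines(first_raw)  # the prediction grid reuses the header
            while True:
                # Pull the next block of rows from every layer.
                block = [list(itertools.islice(h, rows_per_block)) for h in handles]
                if not block[0]:
                    break  # end of the grids
                for row_lines in zip(*block):
                    if not row_lines[0].strip():
                        continue  # skip a trailing blank line
                    layers = [[float(v) for v in line.split()] for line in row_lines]
                    predicted = []
                    for cell in zip(*layers):  # one value per layer for this pixel
                        if nodata in cell:
                            predicted.append(nodata)  # propagate missing data
                        else:
                            predicted.append(predict_cell(cell))
                    # "g" keeps about six significant digits per value.
                    out.write(" ".join(format(v, "g") for v in predicted) + "\n")
    finally:
        for handle in handles:
            handle.close()
```

Only rows_per_block rows from each layer are held in memory at any time, so the block size can be tuned to the machine rather than to the size of the grids.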
Sincerely,
Tim
------------------------------
Message: 2
Date: Wed, 11 Jan 2006 11:24:51 +0100
From: "Edzer J. Pebesma" <e.pebesma at geog.uu.nl>
Subject: Re: [R-sig-Geo] randomForests for mapping vegetation
To: "Miewald, Tom" <TMiewald at sanborn.com>
Cc: r-sig-geo at stat.math.ethz.ch
Message-ID: <43C4DCF3.1070705 at geog.uu.nl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Tom, a possibility is to stay in R and use rgdal.
rgdal can open raster maps (and I'm sure Landsat images) directly,
and read in parts of them, i.e. it doesn't read the full
map at once. You'd have to loop over the full map, read
a part, predict with randomForest's predict method,
write the predicted values out, and go to the next part.
Support for sp classes is under development, in a
package called spGDAL, which is available as source
code from CVS on SourceForge:
export CVSROOT=:pserver:anonymous@cvs.sf.net:/cvsroot/r-spatial
cvs co spGDAL
spGDAL has support for writing a gdal map, but I'm not
sure whether it supports writing segments of
a gdal map. It should; please keep us updated on your mileage.
--
Edzer
Miewald, Tom wrote:
>Hello all,
>
>I am new to this list and wondering whether anyone has any experience (or ideas) for how to implement vegetation mapping using the randomForest package from R. The model produced from randomForest would be used to map vegetation from Landsat (30 x 30 meter pixels) for relatively large areas (> 10 million hectares, so a lot of pixels). There are ~15 explanatory data sets (imagery, DEMs, precip, etc). My main question concerns how to use the output from randomForest to predict vegetation over such an area. I have seen some literature out there using GRASS. I would rather not go down that road because I already have enough software packages. Are there any possibilities for using ArcGIS connectivity to enable the prediction of vegetation? Any input would be appreciated. Thanks!
>Tom
>
>CONFIDENTIALITY AND DISCLAIMER: This message and any attachments hereto are intended only for the use of the addressee(s) and may be legally privileged and/or confidential. Any dissemination, distribution, printing, forwarding, or any method of copying of this message or any attachment hereto, and/or the taking of any action in reliance on the information herein or in any attachment hereto is strictly prohibited except by the original intended recipient. If you have received this communication in error, please immediately notify the sender, and permanently delete this message and any attachment hereto from your computer or storage system, and destroy any printout thereof. Although reasonable precautions have been taken to ensure no viruses are present in this message or any attachment hereto, The Sanborn Map Company, Inc. takes no responsibility and has no liability for any virus which may be transferred via this message or any attachment hereto.
>
>(svr28)
>
>_______________________________________________
>R-sig-Geo mailing list
>R-sig-Geo at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
>
------------------------------
Message: 3
Date: Wed, 11 Jan 2006 11:28:19 +0100 (CET)
From: Roger Bivand <Roger.Bivand at nhh.no>
Subject: Re: [R-sig-Geo] randomForests for mapping vegetation
To: "Miewald, Tom" <TMiewald at sanborn.com>
Cc: r-sig-geo at stat.math.ethz.ch
Message-ID: <Pine.LNX.4.44.0601111110230.30033-100000 at reclus.nhh.no>
Content-Type: TEXT/PLAIN; charset=US-ASCII
On Tue, 10 Jan 2006, Miewald, Tom wrote:
>
> Hello all,
>
> I am new to this list and wondering whether anyone has any experience
> (or ideas) for how to implement vegetation mapping using the
> randomForest package from R. The model produced from randomForest
> would be used to map vegetation from Landsat (30 x 30 meter pixels) for
> relatively large areas (> 10 million hectares, so a lot of pixels).
> There are ~15 explanatory data sets (imagery, DEMs, precip, etc). My
> main question concerns how to use the output from randomForest to
> predict vegetation over such an area. I have seen some literature out
> there using GRASS. I would rather not go down that road because I
> already have enough software packages. Are there any possibilities for
> using ArcGIS connectivity to enable the prediction of vegetation? Any
> input would be appreciated. Thanks!
Have a look at:
http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Drafts/FurlanelloEtAl.pdf
which is fairly close to your description, although not such a large
number of pixels, and does use R/GRASS integration.
My guess would be that you should tile the region for prediction into
subregions, and patch them back together in ArcGIS. You could do it by
writing out Arc ASCII grids using the write.asciigrid() function in
the maptools package. If this is going to be heavyweight production,
with the whole process repeated many times, then using the Rcom
interface from VBA in ArcGIS might also be possible. We are also
looking at writing geotiffs from rgdal, so you should be able to find
a suitable route from the subregional predictions within R back to
rasters in ArcGIS.
Examples using VBA are shown here:
http://perso.univ-lr.fr/csaintje/Recherche/RArcgis/index.html
and a nice interface using Python from ArcGIS then Rcom:
http://www.nicholas.duke.edu/geospatial/software/
Most of the hard work will be in getting things to work once; from there
it'll get easier. Consider save()ing the fitted RF model, so that to make
predictions, you only need to load() it and then predict() on newdata
(grids of RHS variables) for the current subregion. This also parallelises
nicely, so you could pass subregions and the fitted model off to worker
processes to do the predictions, but getting that working will take
substantial time. By the way, OS X and Linux memory management will be
better than Windows, so on Windows, go for smaller subregions.
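As a rough aid for choosing the subregion size, the tiling can be sketched as below. This is only an illustration in plain Python under stated assumptions: tile_windows and tile_bytes are hypothetical helper names, the grid dimensions in the usage lines are made up, and 8 bytes per value assumes the predictors are held as double-precision numbers, as R does for numeric data.

```python
def tile_windows(nrows, ncols, tile_rows, tile_cols):
    """Yield (row_start, row_stop, col_start, col_stop) windows covering the grid.

    Stops are exclusive; windows on the bottom and right edges are clipped.
    """
    for row_start in range(0, nrows, tile_rows):
        for col_start in range(0, ncols, tile_cols):
            yield (
                row_start,
                min(row_start + tile_rows, nrows),
                col_start,
                min(col_start + tile_cols, ncols),
            )


def tile_bytes(window, n_layers, bytes_per_value=8):
    """Approximate memory needed to hold every predictor layer for one window."""
    row_start, row_stop, col_start, col_stop = window
    return (row_stop - row_start) * (col_stop - col_start) * n_layers * bytes_per_value


if __name__ == "__main__":
    # Illustrative numbers only: a 10000 x 12000 grid with 15 predictor layers.
    windows = list(tile_windows(10000, 12000, tile_rows=1000, tile_cols=1000))
    largest = max(tile_bytes(w, n_layers=15) for w in windows)
    print(len(windows), "tiles; largest needs about", largest // 2**20, "MiB")
```

Each window is independent of the others, so the same list can be worked through in a loop on one machine or split among several processes that each load the saved model. On Windows, reducing tile_rows and tile_cols is the obvious knob.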
Roger
> Tom
>
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
------------------------------
End of R-sig-Geo Digest, Vol 29, Issue 5
****************************************
Timothy G. Howard, Ecologist and Program Scientist
New York Natural Heritage Program
625 Broadway, 5th floor
Albany, NY 12233-4757
(518) 402-8945
facsimile (518) 402-8925