<p>This post outlines how to use the <code class="highlighter-rouge">glcm</code> package to calculate image textures
that are direction invariant (calculated over “all directions”). This feature
is only available in <code class="highlighter-rouge">glcm</code> versions >= 1.0.</p>
<h2 id="getting-started">Getting started</h2>
<p>First install the latest version of <code class="highlighter-rouge">glcm</code>, along with the <code class="highlighter-rouge">raster</code> package, which
is also needed for this example:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">install.packages</span><span class="p">(</span><span class="s2">"glcm"</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Installing package into 'C:/Users/azvoleff/R/win-library/3.1'
## (as 'lib' is unspecified)</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## package 'glcm' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\azvoleff\AppData\Local\Temp\Rtmp2j8wNL\downloaded_packages</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">glcm</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">raster</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Loading required package: sp</code></pre></figure>
<h2 id="calculating-rotationally-invariant-textures">Calculating rotationally invariant textures</h2>
<p><code class="highlighter-rouge">glcm</code> supports calculating GLCMs using multiple shift values. If multiple
shifts are supplied, <code class="highlighter-rouge">glcm</code> will calculate each texture statistic using each of
the specified shifts, and return the mean value of the texture for each pixel.<br />
In general, I have not found large differences in calculated image textures
when comparing GLCM textures calculated using a single shift versus calculating
rotationally invariant textures. However this may not be the case for images
with strongly directional textures.</p>
<p>To compare the two approaches on a sample cropped out of a Landsat scene, use the L5TSR_1986
sample image included in the <code class="highlighter-rouge">glcm</code> package. This is a section of a 1986
Landsat 5 image preprocessed to surface reflectance. The image is from the
<a href="http://www.teamnetwork.org/network/sites/volc%C3%A1n-barva">Volcán Barva TEAM
site</a>.</p>
<p>When <code class="highlighter-rouge">glcm</code> is run without specifing a shift, the default shift (1, 1) is used
(90 degrees), with a window size of 3 pixels x 3 pixels. Below is an example
from running <code class="highlighter-rouge">glcm</code> with the default parameters:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">test_rast</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="p">(</span><span class="n">L</span><span class="m">5</span><span class="n">TSR_1986</span><span class="p">,</span><span class="w"> </span><span class="n">layer</span><span class="o">=</span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">tex_shift1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">glcm</span><span class="p">(</span><span class="n">test_rast</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">tex_shift1</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-10-21-glcm-rotation-invariant/glcm_90deg_shift-1.png" alt="center" /></p>
<p>To calculate rotationally invariant GLCM textures (over “all directions” in the
terminology of commonly used remote sensing software), use: <code class="highlighter-rouge">shift=list(c(0,1),
c(1,1), c(1,0), c(1,-1))</code>. This will calculate the average GLCM texture using
shifts of 0 degrees, 45 degrees, 90 degrees, and 135 degrees:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">tex_all_dir</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">glcm</span><span class="p">(</span><span class="n">test_rast</span><span class="p">,</span><span class="w"> </span><span class="n">shift</span><span class="o">=</span><span class="nf">list</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">),</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">-1</span><span class="p">)))</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">tex_all_dir</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-10-21-glcm-rotation-invariant/glcm_all_directions-1.png" alt="center" /></p>
<p>To compare the difference between these textures, subtract the textures
calculated with a 90 degree shift from those calculated using multiple shifts,
normalize by the multiple-shift textures, and plot the result:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">((</span><span class="n">tex_all_dir</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">tex_shift1</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">tex_all_dir</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-10-21-glcm-rotation-invariant/glcm_all_directions_vs_90deg-1.png" alt="center" /></p>
<h2 id="computation-time">Computation time</h2>
<p>First look at the time difference for calculating a GLCM with only one shift
versus calculating a rotationally invariant form:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">microbenchmark</span><span class="p">)</span><span class="w">
</span><span class="n">glcm_one_dir</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">glcm</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">glcm_all_dir</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">glcm</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">shift</span><span class="o">=</span><span class="nf">list</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">),</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">),</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">-1</span><span class="p">)))</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">microbenchmark</span><span class="p">(</span><span class="n">glcm_one_dir</span><span class="p">(</span><span class="n">test_rast</span><span class="p">),</span><span class="w"> </span><span class="n">glcm_all_dir</span><span class="p">(</span><span class="n">test_rast</span><span class="p">),</span><span class="w"> </span><span class="n">times</span><span class="o">=</span><span class="m">5</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Unit: seconds
## expr min lq mean median uq
## glcm_one_dir(test_rast) 1.090759 1.117674 1.141704 1.146656 1.154196
## glcm_all_dir(test_rast) 4.090347 4.108833 4.189145 4.116991 4.164241
## max neval
## 1.199236 5
## 4.465313 5</code></pre></figure>
<p>As seen above, there is a performance penalty for calculating rotationally
invariant textures (not surprisingly, as four times as many GLCM calculations
are involved).</p>
<p>Prior to having the ability to use multiple shifts built into <code class="highlighter-rouge">glcm</code>, it was
still possible to calculate rotationally invariant textures using the <code class="highlighter-rouge">glcm</code>
function. However, the calculation had to be done manually, using an approach
similar to what I do below with <code class="highlighter-rouge">glcm_all_dir_manual</code>. How much faster is it to
perform the averaging directly in <code class="highlighter-rouge">glcm</code>?</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">glcm_all_dir_manual</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">text_0deg</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">glcm</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">shift</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="n">text_45deg</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">glcm</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">shift</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="n">text_90deg</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">glcm</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">shift</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">))</span><span class="w">
</span><span class="n">text_135deg</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">glcm</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">shift</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">-1</span><span class="p">))</span><span class="w">
</span><span class="n">overlay</span><span class="p">(</span><span class="n">text_0deg</span><span class="p">,</span><span class="w"> </span><span class="n">text_45deg</span><span class="p">,</span><span class="w"> </span><span class="n">text_90deg</span><span class="p">,</span><span class="w"> </span><span class="n">text_135deg</span><span class="p">,</span><span class="w">
</span><span class="n">fun</span><span class="o">=</span><span class="k">function</span><span class="p">(</span><span class="n">w</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">z</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nf">return</span><span class="p">((</span><span class="n">w</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">z</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">4</span><span class="p">)</span><span class="w">
</span><span class="p">})</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">tex_all_dir_manual</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">glcm_all_dir_manual</span><span class="p">(</span><span class="n">test_rast</span><span class="p">)</span><span class="w">
</span><span class="c1"># Check that the textures match</span><span class="w">
</span><span class="n">table</span><span class="p">(</span><span class="n">getValues</span><span class="p">(</span><span class="n">tex_all_dir_manual</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">getValues</span><span class="p">(</span><span class="n">tex_all_dir</span><span class="p">))</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">##
## TRUE
## 273488</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">microbenchmark</span><span class="p">(</span><span class="n">glcm_all_dir_manual</span><span class="p">(</span><span class="n">test_rast</span><span class="p">),</span><span class="w"> </span><span class="n">glcm_all_dir</span><span class="p">(</span><span class="n">test_rast</span><span class="p">),</span><span class="w">
</span><span class="n">times</span><span class="o">=</span><span class="m">5</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Unit: seconds
## expr min lq mean median
## glcm_all_dir_manual(test_rast) 4.493543 4.501502 4.678655 4.656803
## glcm_all_dir(test_rast) 4.134398 4.190528 4.267464 4.225021
## uq max neval
## 4.809615 4.931815 5
## 4.304919 4.482455 5</code></pre></figure>
<p>The time difference isn’t that great, but the need for repeated calls to <code class="highlighter-rouge">glcm</code>
(and the need for multiple read/writes to disk for large files) could lead to a
more substantial advantage for the direct approach with <code class="highlighter-rouge">glcm</code> than is apparent
in this simple example. Of course, the manual approach does give more
flexibility if you need to apply other processing (scaling, etc.) to the
textures.</p>
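<p>As one sketch of that flexibility (not part of <code class="highlighter-rouge">glcm</code> itself), the manual approach could form a weighted, rather than simple, average across the four directions. The <code class="highlighter-rouge">weights</code> argument here is hypothetical; with equal weights it reduces to the simple mean used above:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r">glcm_all_dir_weighted &lt;- function(x, weights=c(1, 1, 1, 1)) {
    # Calculate textures separately for each of the four directions
    texs &lt;- list(glcm(x, shift=c(0, 1)),
                 glcm(x, shift=c(1, 1)),
                 glcm(x, shift=c(1, 0)),
                 glcm(x, shift=c(1, -1)))
    # Normalize the weights, then form the weighted average
    weights &lt;- weights / sum(weights)
    Reduce(`+`, mapply(function(tex, w) tex * w, texs, weights,
                       SIMPLIFY=FALSE))
}
tex_weighted &lt;- glcm_all_dir_weighted(test_rast)</code></pre></figure>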
<p><a href="http://www.azvoleff.com/articles/glcm-rotation-invariant/">Calculating rotation invariant GLCM textures</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on October 21, 2014.</p>http://www.azvoleff.com/articles/glcm-1-0-released2014-09-26T00:00:00+00:002014-09-26T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<p>I have released to CRAN<a href="http://cran.r-project.org/web/packages/glcm">
version 1.0 of the “glcm” R package</a> for calculating image texture measures
from grey-level co-occurrence matrices (GLCMs). Type:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">install.packages</span><span class="p">(</span><span class="s2">"glcm"</span><span class="p">)</span></code></pre></figure>
<p>at your R command prompt to download the latest CRAN release. This version
contains several new features, most importantly the ability to calculate
<a href="/articles/glcm-rotation-invariant">rotation invariant textures</a>, and to
automatically handle images that cannot fit in memory (using features from the
excellent <a href="http://cran.r-project.org/web/packages/raster/index.html">raster</a>
package).</p>
<p><a href="http://www.azvoleff.com/articles/glcm-1-0-released/">glcm 1.0 released</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on September 26, 2014.</p>http://www.azvoleff.com/articles/glcm-0-3-2-released2014-07-31T00:00:00+00:002014-07-31T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<p>I just released to CRAN<a href="http://cran.r-project.org/web/packages/glcm"> a
new version of the “glcm” R package</a> for calculating image texture measures
from grey-level co-occurrence matrices (GLCMs).</p>
<p>Version 0.3.2 fixes a minor bug in the projection assigned to the test image
included in <code class="highlighter-rouge">glcm</code>. The 1.0 release of <code class="highlighter-rouge">glcm</code>, which will support parallel
computation of GLCMs and computation of GLCMs over all directions, will be
coming soon - stay tuned. Type</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">install.packages</span><span class="p">(</span><span class="s2">"glcm"</span><span class="p">)</span></code></pre></figure>
<p>at your R command prompt to download the latest CRAN release. See the
<a href="http://cran.r-project.org/web/packages/glcm/NEWS">NEWS</a> file for more details.</p>
<p><a href="http://www.azvoleff.com/articles/glcm-0-3-2-released/">glcm 0.3.2 released</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on July 31, 2014.</p>http://www.azvoleff.com/articles/filtering-landsat-with-teamlucc2014-05-05T00:00:00+00:002014-05-05T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<h2 id="overview">Overview</h2>
<p>This post outlines how to use the <code class="highlighter-rouge">teamlucc</code> package to filter the Landsat
imagery available in the archives for a particular study site. <code class="highlighter-rouge">teamlucc</code>
includes several functions to make plots to assist with selecting anniversary
dates (or near anniversary dates…) when multiple Landsat path/rows are needed
to cover a single site. The <code class="highlighter-rouge">teamlucc</code> package also has functions to output an
order text file in the proper format for the USGS ESPA system and to
automatically download and verify the completed portions of a USGS ESPA order.</p>
<h2 id="getting-started">Getting started</h2>
<p>First load the <code class="highlighter-rouge">devtools</code> package, used for installing <code class="highlighter-rouge">teamlucc</code>. Install the
<code class="highlighter-rouge">devtools</code> package if it is not already installed:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">require</span><span class="p">(</span><span class="n">devtools</span><span class="p">))</span><span class="w"> </span><span class="n">install.packages</span><span class="p">(</span><span class="s1">'devtools'</span><span class="p">)</span></code></pre></figure>
<p>Now load the teamlucc package, using <code class="highlighter-rouge">devtools</code> to install it from github if it
is not yet installed. Also load the <code class="highlighter-rouge">rgdal</code> package needed for reading/writing
shapefiles:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">require</span><span class="p">(</span><span class="n">teamlucc</span><span class="p">))</span><span class="w"> </span><span class="n">install_github</span><span class="p">(</span><span class="s1">'azvoleff/teamlucc'</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Loading required package: teamlucc
## Loading required package: Rcpp
## Loading required package: raster
## Loading required package: sp</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning: replacing previous import by 'raster::buffer' when loading
## 'teamlucc'</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning: replacing previous import by 'raster::interpolate' when loading
## 'teamlucc'</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning: replacing previous import by 'raster::rotated' when loading
## 'teamlucc'</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">rgdal</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## rgdal: version: 0.9-1, (SVN revision 518)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.11.0, released 2014/04/16
## Path to GDAL shared files: C:/Users/azvoleff/R/win-library/3.1/rgdal/gdal
## GDAL does not use iconv for recoding strings.
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: C:/Users/azvoleff/R/win-library/3.1/rgdal/proj</code></pre></figure>
<h2 id="downloading-list-of-available-landsat-scenes">Downloading list of available Landsat scenes</h2>
<p>Start by setting up an account on <a href="http://earthexplorer.usgs.gov">USGS Earth
Explorer</a>. After setting up an account, login to
the system, and upload a shapefile or draw a polygon to indicate your area of
interest. After doing so, click the “Data Sets >>” button on the lower left of
the screen. For this example, we will be using Landsat CDR Surface Reflectance
imagery. Click the “+” next to “Landsat CDR”, and then click the checkboxes for
both “Land Surface Reflectance - L7 ETM+” and also “Land Surface Reflectance -
L4-5 TM”:</p>
<p><img src="/content/2014-05-05-filtering-landsat-with-teamlucc/ee_cdr_checkboxes.png" alt="center" /></p>
<p>Now click the “Results >>” button.</p>
<p>You will now see results for one of the two (Landsat 4-5 and Landsat 7)
datasets you selected. Look under “Data Set” on the left side of the screen,
and it will say “Land Surface Reflectance - L7 ETM+” if it is displaying the
Landsat 7 CDR dataset. You will need to separately export the Landsat 7 and
Landsat 4-5 scene lists.</p>
<p>To export the first scene list: from the “Search Results” page, download a CSV
file of ALL available Landsat imagery for the search area. To do this, click
the “Click here to export your results >>” text near the top right of the
screen.</p>
<p><img src="/content/2014-05-05-filtering-landsat-with-teamlucc/ee_export_button.png" alt="center" /></p>
<p>In the “Metadata Export” box, choose “Non-Limited Results” for “Export Type”.
For “Format” choose “CSV”:</p>
<p><img src="/content/2014-05-05-filtering-landsat-with-teamlucc/ee_export_screen.png" alt="center" /></p>
<p>A window will come up saying “Your export file is being generated.” Click
“OK”. Repeat the same process for the other CDR dataset, by changing the data
set you have selected, and again clicking the “Click here to export your
results >>” text.</p>
<p>You will receive two emails at your USGS Earth Explorer registered email
address, each with a link to a zipfile containing one of the scene lists.
Download these zipfiles and use them for the next step (or use the example
data below instead).</p>
<h1 id="selecting-landsat-scenes-to-download">Selecting Landsat scenes to download</h1>
<p>To follow along with this analysis, <a href="/content/2014-05-05-filtering-landsat-with-teamlucc/NAK_data.zip">download this
zipfile</a>
with a shapefile of the Zone of Interaction (ZOI) of the TEAM site in <a href="http://www.teamnetwork.org/network/sites/nam-kading-0">Nam Kading
National Protected Area</a>
in Lao PDR. The zipfile also includes scene lists from EarthExplorer of all
available Landsat scenes (as of April 23, 2014) for this site.</p>
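<p>Assuming the zipfile has been downloaded to the current working directory (the filename below matches the download link), it can be extracted from within R:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"># Extract the example shapefile and scene lists into the working directory
unzip('NAK_data.zip')</code></pre></figure>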
<p>First, read in the Landsat scene lists downloaded from USGS EarthExplorer,
using the <code class="highlighter-rouge">ee_read</code> function in <code class="highlighter-rouge">teamlucc</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">l</span><span class="m">7</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ee_read</span><span class="p">(</span><span class="s1">'NAK_L7_20140423_scenelist.csv'</span><span class="p">)</span><span class="w">
</span><span class="n">l</span><span class="m">45</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ee_read</span><span class="p">(</span><span class="s1">'NAK_L4-5_20140423_scenelist.csv'</span><span class="p">)</span></code></pre></figure>
<p>Now, merge the Landsats 4-5 and Landsat 7 scene lists so they can be analyzed
together:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">l</span><span class="m">457</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">merge</span><span class="p">(</span><span class="n">l</span><span class="m">7</span><span class="p">,</span><span class="w"> </span><span class="n">l</span><span class="m">45</span><span class="p">,</span><span class="w"> </span><span class="n">all</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span></code></pre></figure>
<p>Selecting Landsat images, particularly for an area covered by multiple Landsat
path/rows, can be difficult. The <code class="highlighter-rouge">wrspathrow</code> package (available on CRAN) is
helpful for producing a quick visualization of the number of Landsat scenes
(path/row(s)) needed to cover an area. For example, the below code reads in the
ZOI shapefile for Nam Kading, and plots the Landsat path/rows needed to cover
the ZOI. This plot uses the path/row polygons included in the <code class="highlighter-rouge">wrspathrow</code>
package, and includes text labels for each path and row:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">wrspathrow</span><span class="p">)</span><span class="w">
</span><span class="n">NAK_zoi</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readOGR</span><span class="p">(</span><span class="s1">'.'</span><span class="p">,</span><span class="w"> </span><span class="s1">'ZOI_NAK_2012_EEsimple'</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "ZOI_NAK_2012_EEsimple"
## with 1 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">NAK_pathrows</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pathrow_num</span><span class="p">(</span><span class="n">NAK_zoi</span><span class="p">,</span><span class="w"> </span><span class="n">as_polys</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">NAK_pathrows</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">NAK_zoi</span><span class="p">,</span><span class="w"> </span><span class="n">add</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">lty</span><span class="o">=</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="s2">"#00ff0050"</span><span class="p">)</span><span class="w">
</span><span class="n">text</span><span class="p">(</span><span class="n">coordinates</span><span class="p">(</span><span class="n">NAK_pathrows</span><span class="p">),</span><span class="w"> </span><span class="n">labels</span><span class="o">=</span><span class="n">paste</span><span class="p">(</span><span class="n">NAK_pathrows</span><span class="o">$</span><span class="n">PATH</span><span class="p">,</span><span class="w">
</span><span class="n">NAK_pathrows</span><span class="o">$</span><span class="n">ROW</span><span class="p">,</span><span class="w"> </span><span class="n">sep</span><span class="o">=</span><span class="s1">', '</span><span class="p">))</span></code></pre></figure>
<p><img src="/content/2014-05-05-filtering-landsat-with-teamlucc/pathrow_versus_zoi-1.png" title="center" alt="center" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>From the above plot, we can see that it takes scenes from three Landsat
path/rows to cover the entire Nam Kading ZOI. Suppose we need images from
within a particular period, covering the entire area of that ZOI. The
<code class="highlighter-rouge">ee_plot</code> function in the <code class="highlighter-rouge">teamlucc</code> package can make a plot of the
available imagery within a given time period, color coded by sensor and percent
cloud cover. To use this function, first specify a start and end date:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">start_date</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">as.Date</span><span class="p">(</span><span class="s1">'1995/1/1'</span><span class="p">)</span><span class="w">
</span><span class="n">end_date</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">as.Date</span><span class="p">(</span><span class="s1">'2000/1/1'</span><span class="p">)</span></code></pre></figure>
<p>Now use <code class="highlighter-rouge">ee_plot</code> to plot the available imagery from within that time period:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ee_plot</span><span class="p">(</span><span class="n">l</span><span class="m">457</span><span class="p">,</span><span class="w"> </span><span class="n">start_date</span><span class="p">,</span><span class="w"> </span><span class="n">end_date</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-05-05-filtering-landsat-with-teamlucc/ee_plot-1.png" title="center" alt="center" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>Each box in this plot represents a single Landsat scene from the Landsat
archive. The color of each box indicates the path and row of that scene, while
the outline color of each box indicates the sensor (Landsat 5 versus Landsat
7 for example). The shading of a particular box (light or dark) indicates
the cloud cover of that scene (lighter colors correspond to scenes with greater
cloud cover).</p>
<p>From this plot, we can see that within the 1995 to 2000 time
period, January 1996 is the only month in which images are available for all
three path/rows needed to cover the Nam Kading ZOI. Further, from the shading
of the boxes in that month, we can tell that these images are almost cloud
free, and are from Landsat 5 (indicated by the green outlines).</p>
<p>Note that <code class="highlighter-rouge">ee_plot</code> will by default only plot imagery where greater than 70% of
the image is unobscured by clouds (i.e. less than 30% cloud cover). This default
can be changed by supplying the <code class="highlighter-rouge">min_clear</code> parameter to <code class="highlighter-rouge">ee_plot</code>.</p>
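<p>For example, to tighten the default and plot only scenes that are at least
90% clear, a call might look like the sketch below. Whether
<code class="highlighter-rouge">min_clear</code> expects a fraction or a percentage is an assumption here; see
<code class="highlighter-rouge">?ee_plot</code> for the expected form:</p>

```r
# Sketch: plot only scenes with at least 90% clear (cloud-free) pixels.
# Whether min_clear takes a fraction (.9) or a percent (90) is an
# assumption - check ?ee_plot before relying on this.
ee_plot(l457, start_date, end_date, min_clear=.9)
```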
<p>When dealing with a large number of scenes, a different type of plot can be
helpful. The <code class="highlighter-rouge">normalize</code> argument to <code class="highlighter-rouge">ee_plot</code> tells <code class="highlighter-rouge">ee_plot</code> to calculate the
best (lowest cloud cover) image for each path and row for each month. Then,
<code class="highlighter-rouge">ee_plot</code> sums across all path/rows and plots the results:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ee_plot</span><span class="p">(</span><span class="n">l</span><span class="m">457</span><span class="p">,</span><span class="w"> </span><span class="n">start_date</span><span class="p">,</span><span class="w"> </span><span class="n">end_date</span><span class="p">,</span><span class="w"> </span><span class="n">normalize</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-05-05-filtering-landsat-with-teamlucc/ee_plot_normalized-1.png" title="center" alt="center" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>This type of plot is helpful in visualizing the periods in which the greatest
proportion of a site can be covered by cloud-free (or nearly cloud-free)
imagery.</p>
<h1 id="downloading-landsat-scenes-using-espa">Downloading Landsat scenes using ESPA</h1>
<p><code class="highlighter-rouge">teamlucc</code> also facilitates placing orders for imagery using the <a href="https://espa.cr.usgs.gov">USGS ESPA
system</a>. The ESPA system accepts scenes lists as a
text file. To output a scene list for upload to ESPA, use the <code class="highlighter-rouge">espa_scenelist</code>
function, specifying the start and end dates needed, and the name of the output
file:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">espa_scenelist</span><span class="p">(</span><span class="n">l</span><span class="m">457</span><span class="p">,</span><span class="w"> </span><span class="n">as.Date</span><span class="p">(</span><span class="s1">'1996/1/1'</span><span class="p">),</span><span class="w"> </span><span class="n">as.Date</span><span class="p">(</span><span class="s1">'1996/12/31'</span><span class="p">),</span><span class="w">
</span><span class="s1">'NAK_ESPA_scenelist_1996.txt'</span><span class="p">)</span></code></pre></figure>
<p>The above line of code will save a text file named
<code class="highlighter-rouge">NAK_ESPA_scenelist_1996.txt</code> in your current working directory. To place an
order, log in to the ESPA system, and upload the text file.</p>
<p>After receiving an email from the ESPA system notifying you that your order
has been processed, download the order from within R using the <code class="highlighter-rouge">espa_download</code> function
by specifying 1) the email address you used to place the order, 2) the ESPA
order ID number (included in the email from ESPA), and 3) the output folder in
which to save the downloaded images. <code class="highlighter-rouge">espa_download</code> will first check within
the specified output folder to see if each image already exists, and will not
re-download existing files unless the existing files do not match the files
available on ESPA.</p>
<p><strong>Note the below code is not working as of 7/1/2014 due to changes in the ESPA
download system. I will update this post when the code is working again.</strong></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">espa_download</span><span class="p">(</span><span class="s1">'azvoleff@example.com'</span><span class="p">,</span><span class="w"> </span><span class="s1">'272014-114611'</span><span class="p">,</span><span class="w"> </span><span class="s1">'D:/ESPA_Downloads'</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Error in espa_download("azvoleff@example.com", "272014-114611", "D:/ESPA_Downloads"): Due to changes in the ESPA system, espa_download is not working as of 7/1/2014</code></pre></figure>
<!-- Note that the above warning occurs since I supplied a non-existent output
folder for this example - if the function were run on a real order it would
print status messages showing the results of the download. -->
<p><a href="http://www.azvoleff.com/articles/filtering-landsat-with-teamlucc/">Filtering available Landsat scenes with teamlucc</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on May 05, 2014.</p>http://www.azvoleff.com/articles/glcm-0-3-1-released2014-04-24T00:00:00+00:002014-04-24T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<p>I just released to CRAN<a href="http://cran.r-project.org/web/packages/glcm"> a
new version of the “glcm” R package</a> for calculating image texture measures
from grey-level co-occurrence matrices (GLCMs). Type</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">install.packages</span><span class="p">(</span><span class="s2">"glcm"</span><span class="p">)</span></code></pre></figure>
<p>at your R command prompt to download the latest CRAN release.</p>
<p>This version fixes a bug in handling window sizes other than the default 3x3
window size, adds additional test cases, and performs more validation on user
input to the <code class="highlighter-rouge">glcm</code> function. See the
<a href="http://cran.r-project.org/web/packages/glcm/NEWS">NEWS</a> file for more details.</p>
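<p>With the fix, textures can now be computed reliably over window sizes other
than the default 3x3. The sketch below assumes the sample image
<code class="highlighter-rouge">L5TSR_1986</code> that ships with <code class="highlighter-rouge">glcm</code>; see <code class="highlighter-rouge">?glcm</code> for the full list of
arguments:</p>

```r
library(glcm)
library(raster)
# Load the sample Landsat 5 image assumed to ship with the glcm package
data(L5TSR_1986)
# Calculate textures over a 5x5 window (non-default window sizes were
# affected by the bug fixed in this release)
textures <- glcm(raster(L5TSR_1986, layer=1), window=c(5, 5))
plot(textures$glcm_mean)
```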
<p><a href="http://www.azvoleff.com/articles/glcm-0-3-1-released/">glcm 0.3.1 released</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on April 24, 2014.</p>http://www.azvoleff.com/articles/preprocessing-imagery-with-teamlucc2014-11-13T00:00:00-00:002014-04-16T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<h2 id="overview">Overview</h2>
<p>This post outlines how to use the <code class="highlighter-rouge">teamlucc</code> package to preprocess imagery from
the <a href="http://landsat.usgs.gov/CDR_LSR.php">Landsat Surface Reflectance Climate Data Record
(CDR)</a> archive. The <code class="highlighter-rouge">teamlucc</code> package
supports a number of common preprocessing steps, including:</p>
<ul>
<li>Conversion of CDR files to any GDAL supported file format</li>
<li>Cropping Landsat tiles to a given area of interest (AOI)</li>
<li>Mosaicking and cropping of DEM tiles (such as ASTER or SRTM) to a given AOI
or Landsat path/row</li>
<li>Topographic correction of CDR scenes</li>
</ul>
<h2 id="getting-started">Getting started</h2>
<p>First load the <code class="highlighter-rouge">teamlucc</code> package, and the <code class="highlighter-rouge">rgdal</code> package:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">teamlucc</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Loading required package: Rcpp
## Loading required package: raster
## Loading required package: sp</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning: replacing previous import by 'raster::buffer' when loading
## 'teamlucc'</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning: replacing previous import by 'raster::interpolate' when loading
## 'teamlucc'</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning: replacing previous import by 'raster::rotated' when loading
## 'teamlucc'</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">rgdal</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## rgdal: version: 0.9-1, (SVN revision 518)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.11.0, released 2014/04/16
## Path to GDAL shared files: C:/Users/azvoleff/R/win-library/3.1/rgdal/gdal
## GDAL does not use iconv for recoding strings.
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: C:/Users/azvoleff/R/win-library/3.1/rgdal/proj</code></pre></figure>
<h2 id="dem-setup">DEM setup</h2>
<p>Before CDR surface reflectance images can be topographically corrected with
<code class="highlighter-rouge">teamlucc</code>, a digital elevation model (DEM) needs to be obtained that is in the
same resolution and coordinate system as the CDR imagery, and slope and aspect
need to be calculated. The <code class="highlighter-rouge">auto_setup_dem</code> function in the <code class="highlighter-rouge">teamlucc</code> package
facilitates this task. There are many options you can supply to
<code class="highlighter-rouge">auto_setup_dem</code> - see <code class="highlighter-rouge">?auto_setup_dem</code> for more information on these options.
If you do not want to topographically correct your imagery, you can skip this
step and move ahead to the next section.</p>
<p><code class="highlighter-rouge">auto_setup_dem</code> assembles a DEM to cover a given area of interest (AOI), and
can mosaic multiple DEM files if needed to cover an AOI. To use
<code class="highlighter-rouge">auto_setup_dem</code>, the user must first define the area of interest with an AOI
polygon. If you have a shapefile of an area of interest, load it into R using
the <code class="highlighter-rouge">readOGR</code> command. The <code class="highlighter-rouge">readOGR</code> command needs the folder the shapefile is
in (in this example the current working directory, specified by <code class="highlighter-rouge">.</code>) as the
first parameter, and the filename of the shapefile (without the “.shp”
extension) as the second parameter (<code class="highlighter-rouge">PA_VB</code> in this example). This example
uses a shapefile of the boundary of the Braulio Carrillo National Park, in
which the <a href="http://www.teamnetwork.org/network/sites/volc%C3%A1n-barva">Volcán Barva TEAM
site</a> is located:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">aoi</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readOGR</span><span class="p">(</span><span class="s1">'.'</span><span class="p">,</span><span class="w"> </span><span class="s1">'PA_VB'</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "PA_VB"
## with 5 features and 8 fields
## Feature type: wkbPolygon with 2 dimensions</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">aoi</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-preprocessing-imagery-with-teamlucc/VB_aoi_original-1.png" title="center" alt="center" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>As seen in the above plot, there are a number of adjoining polygons in this
shapefile. The functions in <code class="highlighter-rouge">teamlucc</code> expect the AOI to be of length one. So
calculate the convex hull of the AOI using the functions in the <code class="highlighter-rouge">rgeos</code>
package:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">rgeos</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## rgeos version: 0.3-8, (SVN revision 460)
## GEOS runtime version: 3.4.2-CAPI-1.8.2 r3921
## Polygon checking: TRUE</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">aoi</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gConvexHull</span><span class="p">(</span><span class="n">aoi</span><span class="p">)</span></code></pre></figure>
<p>The AOI is now a single polygon:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">aoi</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-preprocessing-imagery-with-teamlucc/VB_aoi_convex_hull-1.png" title="center" alt="center" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>If you do not have an AOI, but know the Landsat path and row you want to work
with, an alternative is to define the AOI based on the path and row, using the
<code class="highlighter-rouge">wrspathrow</code> R package, and supplying the desired path and row numbers. Here
<code class="highlighter-rouge">127</code> is the WRS-2 path number, and <code class="highlighter-rouge">47</code> is the WRS-2 row number for the
path/row at the center of the above AOI:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">wrspathrow</span><span class="p">)</span><span class="w">
</span><span class="n">aoi_from_pathrow</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">pathrow_poly</span><span class="p">(</span><span class="m">127</span><span class="p">,</span><span class="w"> </span><span class="m">47</span><span class="p">)</span></code></pre></figure>
<p>In addition to the AOI, <code class="highlighter-rouge">auto_setup_dem</code> needs to know the location and spatial
extents of the DEM files you have available on your machine. This list can be
assembled automatically using the <code class="highlighter-rouge">get_extent_polys</code> function in <code class="highlighter-rouge">teamlucc</code>.<br />
See <code class="highlighter-rouge">?get_extent_polys</code> for more information. <strong>The below code will fail on your
machine because you will not have the proper DEMs for this example. Download
the proper DEMs for the area in which you are working and place them in a
folder on your machine if you want to test this function.</strong></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">dem_files</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dir</span><span class="p">(</span><span class="s1">'H:/Data/CGIAR_SRTM/Tiles'</span><span class="p">,</span><span class="w"> </span><span class="n">pattern</span><span class="o">=</span><span class="s1">'.tif$'</span><span class="p">,</span><span class="w"> </span><span class="n">full.names</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">dems</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">lapply</span><span class="p">(</span><span class="n">dem_files</span><span class="p">,</span><span class="w"> </span><span class="n">raster</span><span class="p">)</span><span class="w">
</span><span class="n">dem_extents</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">get_extent_polys</span><span class="p">(</span><span class="n">dems</span><span class="p">)</span></code></pre></figure>
<p>For flexibility, <code class="highlighter-rouge">auto_setup_dem</code> can optionally crop each output DEM to the
spatial extent of the supplied AOI. If <code class="highlighter-rouge">crop_to_aoi=TRUE</code>, then
<code class="highlighter-rouge">auto_setup_dem</code> will crop the DEMs to the spatial extent of the supplied AOI.
If <code class="highlighter-rouge">crop_to_aoi=FALSE</code>, then <code class="highlighter-rouge">auto_setup_dem</code> will crop the DEMs to the extent
of the Landsat path/rows needed to cover the AOI.</p>
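<p>For instance, to keep the full path/row extents rather than cropping to the
AOI itself, set the flag to <code class="highlighter-rouge">FALSE</code> (a sketch using the inputs defined
above, with output to the current working directory):</p>

```r
# Crop output DEMs to the Landsat path/row extents covering the AOI,
# rather than to the AOI itself
auto_setup_dem(aoi, '.', dem_extents, crop_to_aoi=FALSE)
```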
<p>Lastly, <code class="highlighter-rouge">auto_setup_dem</code> needs to know where to save its output. For this
example, save the output to the current working directory <code class="highlighter-rouge">.</code>. Now that all of
the essential inputs are defined, <code class="highlighter-rouge">auto_setup_dem</code> can be called:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">auto_setup_dem</span><span class="p">(</span><span class="n">aoi</span><span class="p">,</span><span class="w"> </span><span class="s1">'.'</span><span class="p">,</span><span class="w"> </span><span class="n">dem_extents</span><span class="p">,</span><span class="w"> </span><span class="n">crop_to_aoi</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## [1] "2014-11-13 16:49:49: started \"Processing DEMS for 1 path/rows\""
## [1] "2014-11-13 16:49:50: started \"Processing 1 of 1: 015-053\""
## [1] "2014-11-13 16:50:00: finished \"Processing 1 of 1: 015-053\" (10.047s elapsed)"
## [1] "2014-11-13 16:50:00: finished \"Processing DEMS for 1 path/rows\" (10.285s elapsed)"</code></pre></figure>
<p>The result will be a mosaicked DEM, in the current working directory, that can
be used for topographically correcting imagery with the
<code class="highlighter-rouge">auto_preprocess_landsat</code> function.</p>
<h2 id="preprocessing">Preprocessing</h2>
<p>Images are delivered from the CDR archive in ENVI, GeoTIFF, or
Hierarchical Data Format (HDF). The <code class="highlighter-rouge">teamlucc</code> package will, by
default, convert HDF or ENVI format images to a GeoTIFF format, as these image
files can be easily read in R and in most commonly used remote sensing and GIS
software packages. This example assumes you want GeoTIFF output - see the help
files for <code class="highlighter-rouge">teamlucc</code> for other output options.</p>
<p>First you will need to acquire a time series of CDR imagery for your site. The
<code class="highlighter-rouge">espa_download</code> function can facilitate downloading files from an ESPA order.
Also see the post on <a href="/articles/filtering-landsat-with-teamlucc">Filtering and downloading Landsat
scenes</a> for more details on how
<code class="highlighter-rouge">teamlucc</code> can help with image acquisition.</p>
<p>Once you have downloaded your imagery from ESPA, I recommend you put all of the
zip files from your download in a single folder. You can then use the
<code class="highlighter-rouge">teamlucc</code> <code class="highlighter-rouge">espa_extract</code> function to automate extracting your CDR image files,
including placing the extracted files in consistently named output folders.<br />
If, for example, your files are located in <code class="highlighter-rouge">espa_downloads</code> and you want to
extract them to <code class="highlighter-rouge">espa_extracts</code>, run the following (<strong>this block, and the next
block of code, will both fail on your computer since you do not have the
required imagery - this is only an example, download your own imagery to follow
along</strong>):</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">download_folder</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s1">'espa_downloads'</span><span class="w">
</span><span class="n">extract_folder</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s1">'espa_extracts'</span><span class="w">
</span><span class="n">espa_extract</span><span class="p">(</span><span class="n">download_folder</span><span class="p">,</span><span class="w"> </span><span class="n">extract_folder</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## 1 of 2. Extracting LT50150532000044-SC20140514195145.tar.gz to espa_extracts/015-053_2000-044_LT5
## 2 of 2. Extracting LT50150532001014-SC20140514195632.tar.gz to espa_extracts/015-053_2001-014_LT5</code></pre></figure>
<p>Now that the CDR format image files are extracted, you are ready to run
<code class="highlighter-rouge">auto_preprocess_landsat</code>. As with <code class="highlighter-rouge">auto_setup_dem</code>, there are many options you
can supply to <code class="highlighter-rouge">auto_preprocess_landsat</code> - see <code class="highlighter-rouge">?auto_preprocess_landsat</code>. The
below is a simple example of how to call <code class="highlighter-rouge">auto_preprocess_landsat</code>.</p>
<p>The <code class="highlighter-rouge">image_dirs</code> line below is just a fancy way of finding all the folders
located in <code class="highlighter-rouge">extract_folder</code> that contain CDR imagery. You could just as easily
specify these folders individually as a list of strings, like: <code class="highlighter-rouge">image_dirs
<- c('C:/folder1', 'C:/folder2')</code> if you had two CDR Landsat scenes located in
<code class="highlighter-rouge">C:/folder1</code> and <code class="highlighter-rouge">C:/folder2</code>, respectively.</p>
<p>The <code class="highlighter-rouge">prefix</code> parameter specifies a string that will be used in naming files
output by <code class="highlighter-rouge">auto_preprocess_landsat</code>. For <code class="highlighter-rouge">prefix</code> I suggest you use a short (2
or 3 character) site name or site code that is meaningful to you.</p>
<p>There are several other options we provide below to <code class="highlighter-rouge">auto_preprocess_landsat</code>.<br />
<code class="highlighter-rouge">tc=TRUE</code> tells <code class="highlighter-rouge">auto_preprocess_landsat</code> to perform topographic correction.<br />
Because of this, we also need to specify <code class="highlighter-rouge">dem_path</code> (where the DEM files
preprocessed by <code class="highlighter-rouge">auto_setup_dem</code> are located), so that the DEM files for this
scene can be found. Here we set <code class="highlighter-rouge">dem_path='.'</code> as the DEM is in our current
working directory. We supply an AOI (same AOI as above) to crop the output
images. <code class="highlighter-rouge">verbose=TRUE</code> indicates that we want detailed progress messages to
print while the script is running. The output of <code class="highlighter-rouge">auto_preprocess_landsat</code> is a
<code class="highlighter-rouge">data.frame</code> with a list of the preprocessed files and their file formats.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">image_dirs</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">dir</span><span class="p">(</span><span class="s1">'espa_extracts'</span><span class="p">,</span><span class="w">
</span><span class="n">pattern</span><span class="o">=</span><span class="s1">'^[0-9]{3}-[0-9]{3}_[0-9]{4}-[0-9]{3}_((LT[45])|(LE7))$'</span><span class="p">,</span><span class="w">
</span><span class="n">full.names</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">filelist</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">auto_preprocess_landsat</span><span class="p">(</span><span class="n">image_dirs</span><span class="p">,</span><span class="w"> </span><span class="n">prefix</span><span class="o">=</span><span class="s1">'VB'</span><span class="p">,</span><span class="w"> </span><span class="n">tc</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span><span class="w">
</span><span class="n">dem_path</span><span class="o">=</span><span class="s1">'.'</span><span class="p">,</span><span class="w"> </span><span class="n">aoi</span><span class="o">=</span><span class="n">aoi</span><span class="p">,</span><span class="w"> </span><span class="n">verbose</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning: executing %dopar% sequentially: no parallel backend registered</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## [1] "2014-11-13 16:50:52: started \"Preprocessing 015-053_2000-044_L5TSR\""
## [1] "2014-11-13 16:50:52: started \"cropping and reprojecting\""</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning in build_mask_vrt(file_base, mask_vrt_file, file_format): Using
## "fmask_band" instead of newer "cfmask_band" band name</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## [1] "2014-11-13 16:51:03: finished \"cropping and reprojecting\" (10.787s elapsed)"
## [1] "2014-11-13 16:51:03: started \"topocorr\""
## [1] "2014-11-13 16:53:25: finished \"topocorr\" (142.071s (~2.37 minutes) elapsed)"
## [1] "2014-11-13 16:53:25: started \"writing data\""
## [1] "2014-11-13 16:53:29: finished \"writing data\" (4.052s elapsed)"
## [1] "2014-11-13 16:53:29: finished \"Preprocessing 015-053_2000-044_L5TSR\" (156.919s (~2.62 minutes) elapsed)"
## [1] "2014-11-13 16:53:30: started \"Preprocessing 015-053_2001-014_L5TSR\""
## [1] "2014-11-13 16:53:30: started \"cropping and reprojecting\""</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning in build_mask_vrt(file_base, mask_vrt_file, file_format): Using
## "fmask_band" instead of newer "cfmask_band" band name</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## [1] "2014-11-13 16:53:44: finished \"cropping and reprojecting\" (14.161s elapsed)"
## [1] "2014-11-13 16:53:44: started \"topocorr\""
## [1] "2014-11-13 16:55:55: finished \"topocorr\" (131.256s (~2.19 minutes) elapsed)"
## [1] "2014-11-13 16:55:55: started \"writing data\""
## [1] "2014-11-13 16:55:59: finished \"writing data\" (4.236s elapsed)"
## [1] "2014-11-13 16:55:59: finished \"Preprocessing 015-053_2001-014_L5TSR\" (149.662s (~2.49 minutes) elapsed)"</code></pre></figure>
<p>The result is two cropped, reprojected, and topographically corrected Landsat
images covering the specified AOI. One image from 2000:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ls_2000</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">brick</span><span class="p">(</span><span class="s1">'espa_extracts/015-053_2000-044_LT5/VB_015-053_2000-044_L5TSR_tc.tif'</span><span class="p">)</span><span class="w">
</span><span class="n">ls_2000</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">linear_stretch</span><span class="p">(</span><span class="n">ls_2000</span><span class="p">,</span><span class="w"> </span><span class="n">pct</span><span class="o">=</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">max_val</span><span class="o">=</span><span class="m">255</span><span class="p">)</span><span class="w">
</span><span class="n">browse_image</span><span class="p">(</span><span class="n">ls_2000</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="o">=</span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="n">g</span><span class="o">=</span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="o">=</span><span class="m">2</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-preprocessing-imagery-with-teamlucc/plot_2000_landsat-1.png" title="center" alt="center" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>And one image from 2001:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">ls_2001</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">brick</span><span class="p">(</span><span class="s1">'espa_extracts/015-053_2001-014_LT5/VB_015-053_2001-014_L5TSR_tc.tif'</span><span class="p">)</span><span class="w">
</span><span class="n">ls_2001</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">linear_stretch</span><span class="p">(</span><span class="n">ls_2001</span><span class="p">,</span><span class="w"> </span><span class="n">pct</span><span class="o">=</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">max_val</span><span class="o">=</span><span class="m">255</span><span class="p">)</span><span class="w">
</span><span class="n">browse_image</span><span class="p">(</span><span class="n">ls_2001</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="o">=</span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="n">g</span><span class="o">=</span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="o">=</span><span class="m">2</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-preprocessing-imagery-with-teamlucc/plot_2001_landsat-1.png" title="center" alt="center" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>There is a fair amount of cloud cover in the 2000 image. See the post on <a href="/articles/cloud-removal-with-teamlucc">cloud
removal</a> for one means of addressing
this issue.</p>
<p><a href="http://www.azvoleff.com/articles/preprocessing-imagery-with-teamlucc/">Landsat Surface Reflectance CDR preprocessing with teamlucc</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on April 16, 2014.</p>http://www.azvoleff.com/articles/cloud-removal-with-teamlucc2014-05-09T00:00:00-00:002014-04-16T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<h2 id="overview">Overview</h2>
<p>This post outlines how to use the <code class="highlighter-rouge">teamlucc</code> package to remove thick clouds
from Landsat imagery using the Neighborhood Similar Pixel Interpolator (NSPI)
algorithm by <a href="http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6095313">Zhu et
al.</a><sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>. <code class="highlighter-rouge">teamlucc</code>
includes the original <a href="http://www.exelisvis.com/ProductsServices/IDL.aspx">IDL</a> code by <a href="http://geography.osu.edu/grads/xzhu/">Xiaolin
Zhu</a> (modified slightly so it can be called from R), as well as a native
R/C++ implementation of the NSPI algorithm. Thanks to Xiaolin for permission
to redistribute his code along with the <code class="highlighter-rouge">teamlucc</code> package.</p>
<h2 id="getting-started">Getting started</h2>
<p>First load the <code class="highlighter-rouge">teamlucc</code> package, and the <code class="highlighter-rouge">SDMTools</code> package, which we will
use later:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">teamlucc</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Loading required package: Rcpp
## Loading required package: raster
## Loading required package: sp</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning: replacing previous import by 'raster::buffer' when loading
## 'teamlucc'</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning: replacing previous import by 'raster::interpolate' when loading
## 'teamlucc'</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning: replacing previous import by 'raster::rotated' when loading
## 'teamlucc'</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">SDMTools</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">##
## Attaching package: 'SDMTools'
##
## The following object is masked from 'package:teamlucc':
##
## accuracy
##
## The following object is masked from 'package:raster':
##
## distance</code></pre></figure>
<p>If <code class="highlighter-rouge">teamlucc</code> is not installed, install it using <code class="highlighter-rouge">devtools</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">require</span><span class="p">(</span><span class="n">teamlucc</span><span class="p">))</span><span class="w"> </span><span class="n">install_github</span><span class="p">(</span><span class="s1">'azvoleff/teamlucc'</span><span class="p">)</span></code></pre></figure>
<p>First I will cover how to cloud fill a single clouded image using a single
clear (or partially clouded) image. Skip to the end to see how to automate the
cloud fill process using <code class="highlighter-rouge">teamlucc</code>.</p>
<h2 id="cloud-fill-a-single-clouded-image-with-a-single-clear-image">Cloud fill a single clouded image with a single clear image</h2>
<p>This example will use a portion of a 1986 Landsat 5 scene from Volcan Barva,
Costa Rica (a <a href="http://www.teamnetwork.org/network/sites/volc%C3%A1n-barva">TEAM
Network</a> monitoring
site). The scene is WRS-2 path 15, row 53. Particularly in the tropics, it can
sometimes be difficult to find a Landsat image that is cloud-free. Cloud
filling can offer a solution to this problem when multiple Landsat
scenes of an area of interest, taken together, offer a
cloud-free (or nearly cloud-free) view of that area. Throughout this post I will
refer to the “base” and the “fill” images. The “base” image is a cloudy image
that will be filled using images (the “fill” images) of the same area that were
captured on different dates.</p>
<p>While it is sometimes possible to find a cloud-free scene from a different
part of the year that can be used to fill in a cloudy scene from an earlier or
later base date, it is often the case that both the fill and the base image
will have clouds. We therefore need cloud masks marking the clouded areas in
both the base and the fill image. Without a cloud mask for the fill image, we
could inadvertently fill clouded areas in the base image with pixels that are
also cloudy in the fill image.</p>
<p>The base (cloudy) image for this example is from January 5, 1986, and the fill
image is from January 21, 1986. The images are surface reflectance images from
the <a href="http://landsat.usgs.gov/CDR_LSR.php">Landsat Surface Reflectance Climate Data Record
(CDR)</a>, which also include cloud masks
constructed with the <a href="https://code.google.com/p/fmask">Function of Mask (fmask)
algorithm</a><sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>. Both of these images have
significant cloud cover, and some areas are cloudy in both images. This
example will show the ability of the cloud fill algorithms to function even in
difficult circumstances.</p>
<p>To follow along with this analysis, <a href="/content/2014-04-16-cloud-removal-with-teamlucc/2014-04-16-cloud-removal-with-teamlucc.zip">download these
files</a>.<br />
Note that the original CDR reflectance images have been rescaled to range
between 0 and 255 in the files supplied here (this rescaling is not required
prior to performing cloud fill - I just did it to make the file sizes smaller
so they could be more easily hosted on this site).</p>
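<p>For reference, a linear rescaling like the one applied to these files can be sketched in base R. The <code class="highlighter-rouge">rescale_dn</code> function below is a hypothetical helper for illustration only - it is not part of <code class="highlighter-rouge">teamlucc</code>, and not necessarily the exact transformation used to produce the example files:</p>

```r
# Linearly rescale values (e.g. surface reflectance) to the 0-255 range.
# Illustrative sketch only - not the exact code used on the example files.
rescale_dn <- function(x, new_min = 0, new_max = 255) {
  old_min <- min(x, na.rm = TRUE)
  old_max <- max(x, na.rm = TRUE)
  scaled <- (x - old_min) / (old_max - old_min) * (new_max - new_min) + new_min
  round(scaled)
}

refl <- c(120, 850, 2300, 4100)  # made-up example reflectance values
rescale_dn(refl)
```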
<h3 id="load-input-data">Load input data</h3>
<p>First load the base and fill images into R:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">base</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">brick</span><span class="p">(</span><span class="s1">'vb_1986_005_b234.tif'</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## rgdal: version: 0.9-1, (SVN revision 518)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.11.0, released 2014/04/16
## Path to GDAL shared files: C:/Users/azvoleff/R/win-library/3.1/rgdal/gdal
## GDAL does not use iconv for recoding strings.
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: C:/Users/azvoleff/R/win-library/3.1/rgdal/proj</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">fill</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">brick</span><span class="p">(</span><span class="s1">'vb_1986_021_b234.tif'</span><span class="p">)</span></code></pre></figure>
<p>Notice the cloud cover in the base image:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plotRGB</span><span class="p">(</span><span class="n">base</span><span class="p">,</span><span class="w"> </span><span class="n">stretch</span><span class="o">=</span><span class="s1">'lin'</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-cloud-removal-with-teamlucc/base_image-1.png" title="Base image" alt="Base image" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>The fill image also has cloud cover, but less than the base image - there are
areas of the fill that can be used to fill clouded pixels in the base image:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plotRGB</span><span class="p">(</span><span class="n">fill</span><span class="p">,</span><span class="w"> </span><span class="n">stretch</span><span class="o">=</span><span class="s1">'lin'</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-cloud-removal-with-teamlucc/fill_image-1.png" title="Fill image" alt="Fill image" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<h3 id="topographic-correction">Topographic correction</h3>
<p>In mountainous areas, topographic correction should be performed prior to cloud
fill<sup id="fnref:1:1"><a href="#fn:1" class="footnote">1</a></sup>. <code class="highlighter-rouge">teamlucc</code> supports performing topographic correction using
algorithms derived from those in the <a href="http://cran.r-project.org/web/packages/landsat/index.html"><code class="highlighter-rouge">landsat</code>
package</a> by Sarah
Goslee<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup>.</p>
<p>To perform topographic correction, use the <code class="highlighter-rouge">topographic_corr</code> function in
<code class="highlighter-rouge">teamlucc</code>. First load the slope and aspect rasters:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">slp_asp</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">brick</span><span class="p">(</span><span class="s1">'vb_slp_asp.tif'</span><span class="p">)</span></code></pre></figure>
<p>Now call the <code class="highlighter-rouge">topographic_corr</code> function twice, to topographically correct both
the base and fill image. Note that the sun angle elevation and sun azimuth
(both in degrees) must be supplied - values for these parameters can be found
in the metadata accompanying your imagery. See <code class="highlighter-rouge">?topographic_corr</code> for more
information. <code class="highlighter-rouge">DN_min</code> and <code class="highlighter-rouge">DN_max</code> can be used to ensure that invalid values
are not generated by the topographic correction routine (which can sometimes be
a problem in very heavily shadowed areas, or in very bright areas, such as
clouds).</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">base_tc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">topographic_corr</span><span class="p">(</span><span class="n">base</span><span class="p">,</span><span class="w"> </span><span class="n">slp_asp</span><span class="p">,</span><span class="w"> </span><span class="n">sunelev</span><span class="o">=</span><span class="m">90-47.34</span><span class="p">,</span><span class="w"> </span><span class="n">sunazimuth</span><span class="o">=</span><span class="m">134.04</span><span class="p">,</span><span class="w">
</span><span class="n">DN_min</span><span class="o">=</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">DN_max</span><span class="o">=</span><span class="m">255</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning: executing %dopar% sequentially: no parallel backend registered</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">fill_tc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">topographic_corr</span><span class="p">(</span><span class="n">fill</span><span class="p">,</span><span class="w"> </span><span class="n">slp_asp</span><span class="p">,</span><span class="w"> </span><span class="n">sunelev</span><span class="o">=</span><span class="m">90-46.80</span><span class="p">,</span><span class="w"> </span><span class="n">sunazimuth</span><span class="o">=</span><span class="m">129.88</span><span class="p">,</span><span class="w">
</span><span class="n">DN_min</span><span class="o">=</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">DN_max</span><span class="o">=</span><span class="m">255</span><span class="p">)</span></code></pre></figure>
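<p>As a quick check on the sun angle inputs, note that the values above come from the solar zenith angles in the scene metadata: sun elevation is simply 90 degrees minus the zenith angle (the variable names below are my own, used only for illustration):</p>

```r
# Sun elevation is 90 degrees minus the solar zenith angle reported in the
# Landsat metadata. The zenith values below are the ones used in the two
# topographic_corr calls above.
sun_zenith_base <- 47.34
sun_zenith_fill <- 46.80
sunelev_base <- 90 - sun_zenith_base  # matches sunelev=90-47.34 above
sunelev_fill <- 90 - sun_zenith_fill  # matches sunelev=90-46.80 above
```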
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plotRGB</span><span class="p">(</span><span class="n">base_tc</span><span class="p">,</span><span class="w"> </span><span class="n">stretch</span><span class="o">=</span><span class="s1">'lin'</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-cloud-removal-with-teamlucc/base_tc-1.png" title="Base image after topographic correction" alt="Base image after topographic correction" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plotRGB</span><span class="p">(</span><span class="n">fill_tc</span><span class="p">,</span><span class="w"> </span><span class="n">stretch</span><span class="o">=</span><span class="s1">'lin'</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-cloud-removal-with-teamlucc/fill_tc-1.png" title="Fill image after topographic correction" alt="Fill image after topographic correction" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<h3 id="construct-cloud-masks">Construct cloud masks</h3>
<p>The fmask band from the CDR imagery uses the following codes:</p>
<table>
<thead>
<tr>
<th>Pixel type</th>
<th style="text-align: center">Code</th>
</tr>
</thead>
<tbody>
<tr>
<td>Clear land</td>
<td style="text-align: center">0</td>
</tr>
<tr>
<td>Clear water</td>
<td style="text-align: center">1</td>
</tr>
<tr>
<td>Cloud shadow</td>
<td style="text-align: center">2</td>
</tr>
<tr>
<td>Snow</td>
<td style="text-align: center">3</td>
</tr>
<tr>
<td>Cloud</td>
<td style="text-align: center">4</td>
</tr>
<tr>
<td>No observation</td>
<td style="text-align: center">255</td>
</tr>
</tbody>
</table>
<p>We need to construct a mask in which all pixels that are cloud (code 4)
or cloud shadow (code 2) are equal to 1, and pixels in all other areas
are equal to zero. This is easy using raster algebra from the R <code class="highlighter-rouge">raster</code>
package. First load the masks:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">base_fmask</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="p">(</span><span class="s1">'vb_1986_005_fmask.tif'</span><span class="p">)</span><span class="w">
</span><span class="n">fill_fmask</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="p">(</span><span class="s1">'vb_1986_021_fmask.tif'</span><span class="p">)</span></code></pre></figure>
<p>Now do the raster algebra, masking out clouds and cloud shadows, and setting
missing values in both images to NAs in the masks:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Set mask to 1 in clouds and shadow areas</span><span class="w">
</span><span class="n">base_cloud_mask</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="p">(</span><span class="n">base_fmask</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="p">(</span><span class="n">base_fmask</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">4</span><span class="p">)</span><span class="w">
</span><span class="n">fill_cloud_mask</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="p">(</span><span class="n">fill_fmask</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="p">(</span><span class="n">fill_fmask</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">4</span><span class="p">)</span><span class="w">
</span><span class="c1"># Set mask to NA in background areas</span><span class="w">
</span><span class="n">base_cloud_mask</span><span class="p">[</span><span class="n">base_fmask</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">255</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span><span class="n">fill_cloud_mask</span><span class="p">[</span><span class="n">fill_fmask</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">255</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span><span class="c1"># Set mask to NA in other NA areas in imagery (NAs can result from topographic </span><span class="w">
</span><span class="c1"># correction, generally in very dark areas or areas of very high slope)</span><span class="w">
</span><span class="n">base_cloud_mask</span><span class="p">[</span><span class="nf">is.na</span><span class="p">(</span><span class="n">base_tc</span><span class="p">[[</span><span class="m">1</span><span class="p">]])]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span><span class="n">fill_cloud_mask</span><span class="p">[</span><span class="nf">is.na</span><span class="p">(</span><span class="n">fill_tc</span><span class="p">[[</span><span class="m">1</span><span class="p">]])]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NA</span></code></pre></figure>
<p>Plot these masks to double-check they align with the clouds in the images we
viewed earlier:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">base_cloud_mask</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-cloud-removal-with-teamlucc/base_cloud_mask-1.png" title="Base image cloud mask" alt="Base image cloud mask" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">fill_cloud_mask</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-cloud-removal-with-teamlucc/fill_cloud_mask-1.png" title="Fill image cloud mask" alt="Fill image cloud mask" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>Now use these two masks to mask out the clouds in the fill and base images, by
setting clouded areas to zero (as the <code class="highlighter-rouge">cloud_remove</code> code treats pixels with
zero values as “background”):</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">base_tc</span><span class="p">[</span><span class="n">base_cloud_mask</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="n">fill_tc</span><span class="p">[</span><span class="n">fill_cloud_mask</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0</span></code></pre></figure>
<p>The cloud mask for the base image must be constructed so that each cloud has
its own unique integer code, with codes starting from 1. This process can be
automated using the <code class="highlighter-rouge">ConnCompLabel</code> function from the <code class="highlighter-rouge">SDMTools</code> package.<br />
However, because there are clouds in our fill image as well as in our base
image, we need to modify the <code class="highlighter-rouge">base_cloud_mask</code> slightly to account for this.
First, code all pixels in <code class="highlighter-rouge">base_cloud_mask</code> that are clouded in
<code class="highlighter-rouge">fill_cloud_mask</code> with <code class="highlighter-rouge">NA</code>s. This will tell the <code class="highlighter-rouge">ConnCompLabel</code> function not
to label these pixels as clouds (because they are also clouded in the fill
image, we cannot perform cloud fill on these pixels).</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Set clouds in fill image to NA in base mask:</span><span class="w">
</span><span class="n">base_cloud_mask</span><span class="p">[</span><span class="n">fill_cloud_mask</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span><span class="c1"># Set missing values in fill image to NA in base mask:</span><span class="w">
</span><span class="n">base_cloud_mask</span><span class="p">[</span><span class="nf">is.na</span><span class="p">(</span><span class="n">fill_cloud_mask</span><span class="p">)]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NA</span></code></pre></figure>
<p>Now run <code class="highlighter-rouge">ConnCompLabel</code>, and set the output datatype to <code class="highlighter-rouge">INT2S</code> (meaning the
data in <code class="highlighter-rouge">base_cloud_mask</code> can range from -32768 to 32767). That said, please
don’t try to run cloud fill with 32,767 clouds in your image :).</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">base_cloud_mask</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ConnCompLabel</span><span class="p">(</span><span class="n">base_cloud_mask</span><span class="p">)</span></code></pre></figure>
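<p>To see what <code class="highlighter-rouge">ConnCompLabel</code> does, here is a simplified base R sketch of 4-neighbor connected-component labeling on a toy mask (<code class="highlighter-rouge">label_clumps</code> is a hypothetical illustration, not the <code class="highlighter-rouge">SDMTools</code> implementation):</p>

```r
# Toy 4-neighbor connected-component labeling, illustrating the behavior of
# SDMTools::ConnCompLabel on a small cloud mask. Cells equal to 1 are cloud,
# 0 is clear, and NA is background. Illustrative sketch only.
label_clumps <- function(mask) {
  labels <- matrix(0L, nrow(mask), ncol(mask))
  labels[is.na(mask)] <- NA
  next_label <- 0L
  for (i in seq_len(nrow(mask))) {
    for (j in seq_len(ncol(mask))) {
      if (isTRUE(mask[i, j] == 1) && labels[i, j] == 0L) {
        next_label <- next_label + 1L
        queue <- list(c(i, j))  # flood fill from this unlabeled cloud cell
        while (length(queue) > 0) {
          cell <- queue[[1]]; queue <- queue[-1]
          ci <- cell[1]; cj <- cell[2]
          if (ci < 1 || cj < 1 || ci > nrow(mask) || cj > ncol(mask)) next
          if (!isTRUE(mask[ci, cj] == 1) || labels[ci, cj] != 0L) next
          labels[ci, cj] <- next_label
          queue <- c(queue, list(c(ci - 1, cj), c(ci + 1, cj),
                                 c(ci, cj - 1), c(ci, cj + 1)))
        }
      }
    }
  }
  labels
}

m <- matrix(c(1, 1, 0, 0,
              0, 0, 0, NA,
              0, 1, 1, NA), nrow = 3, byrow = TRUE)
label_clumps(m)  # two clouds: labels 1 (top left) and 2 (bottom center)
```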
<p>The final <code class="highlighter-rouge">base_cloud_mask</code> is now coded as:</p>
<table>
<thead>
<tr>
<th>Pixel type</th>
<th style="text-align: center">Code</th>
</tr>
</thead>
<tbody>
<tr>
<td>Background in <code class="highlighter-rouge">fill</code> or <code class="highlighter-rouge">base</code></td>
<td style="text-align: center">NA</td>
</tr>
<tr>
<td>Clouded in <code class="highlighter-rouge">fill</code></td>
<td style="text-align: center">-1</td>
</tr>
<tr>
<td>Clear in <code class="highlighter-rouge">base</code>, clear in <code class="highlighter-rouge">fill</code></td>
<td style="text-align: center">0</td>
</tr>
<tr>
<td>Clouded in <code class="highlighter-rouge">base</code></td>
<td style="text-align: center">1 … n</td>
</tr>
</tbody>
</table>
<p>where n is the number of clouds in the image:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">base_cloud_mask</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-cloud-removal-with-teamlucc/base_cloud_mask_recoded-1.png" title="Final base image cloud mask" alt="Final base image cloud mask" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<h3 id="fill-clouds">Fill clouds</h3>
<p>For this simple example, we will directly use the <code class="highlighter-rouge">cloud_remove</code> function in
<code class="highlighter-rouge">teamlucc</code>. This function has a number of input parameters that can be supplied
(see <code class="highlighter-rouge">?cloud_remove</code>). Two important ones to note are <code class="highlighter-rouge">DN_min</code> and <code class="highlighter-rouge">DN_max</code>.<br />
These are the minimum and maximum valid values, respectively, that a pixel in
the image can take on. These limits are used to ignore unrealistic predictions
that may arise in the cloud fill routine. For the base and fill images we are
working with here, these values are 0 and 255, for min and max, respectively.<br />
Set these parameters to appropriate values as necessary for the images you are
working with.</p>
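<p>As a rough illustration of how such limits can be applied, out-of-range values can simply be discarded. The <code class="highlighter-rouge">clamp_dn</code> helper below is hypothetical - it only sketches the idea, and is not the actual check used inside <code class="highlighter-rouge">cloud_remove</code>:</p>

```r
# Hypothetical sketch of range screening with DN_min/DN_max: predicted
# values outside the valid range are set to NA rather than kept.
clamp_dn <- function(x, DN_min = 0, DN_max = 255) {
  x[x < DN_min | x > DN_max] <- NA
  x
}

preds <- c(-12, 0, 131, 255, 301)  # made-up predicted pixel values
clamp_dn(preds)
```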
<p>There are four different cloud fill algorithms that can be used from
<code class="highlighter-rouge">teamlucc</code>. Two require an IDL installation, while the other two are native to
R (though the <code class="highlighter-rouge">teamlucc</code> algorithm is coded in C++ for speed).<br />
The R-based <code class="highlighter-rouge">teamlucc</code> algorithm is a bit more flexible than the IDL algorithms, and
is designed to handle images in which both the base and fill image have
clouds. The <code class="highlighter-rouge">algorithm</code> parameter to <code class="highlighter-rouge">cloud_remove</code> determines which cloud
fill algorithm is used:</p>
<table>
<thead>
<tr>
<th style="text-align: center">Algorithm</th>
<th style="text-align: center">Requires IDL license?</th>
<th style="text-align: center">Algorithm used by <code class="highlighter-rouge">cloud_remove</code></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><code class="highlighter-rouge">CLOUD_REMOVE</code></td>
<td style="text-align: center">Yes</td>
<td style="text-align: center">CLOUD_REMOVE<sup id="fnref:1:2"><a href="#fn:1" class="footnote">1</a></sup></td>
</tr>
<tr>
<td style="text-align: center"><code class="highlighter-rouge">CLOUD_REMOVE_FAST</code></td>
<td style="text-align: center">Yes</td>
<td style="text-align: center">CLOUD_REMOVE_FAST<sup id="fnref:1:3"><a href="#fn:1" class="footnote">1</a></sup></td>
</tr>
<tr>
<td style="text-align: center"><code class="highlighter-rouge">teamlucc</code></td>
<td style="text-align: center">No</td>
<td style="text-align: center"><code class="highlighter-rouge">teamlucc</code> fill algorithm</td>
</tr>
<tr>
<td style="text-align: center"><code class="highlighter-rouge">simple</code></td>
<td style="text-align: center">No</td>
<td style="text-align: center">simple linear model algorithm</td>
</tr>
</tbody>
</table>
<p>First I will review the two IDL-based algorithms, then I will discuss the two
R-based algorithms.</p>
<h4 id="cloud-removal-using-idl-code">Cloud removal using IDL code</h4>
<p>If run with <code class="highlighter-rouge">algorithm="CLOUD_REMOVE"</code> (the default), <code class="highlighter-rouge">cloud_remove</code> runs an
IDL script provided by <a href="http://geography.osu.edu/grads/xzhu/">Xiaolin Zhu</a>. For
R to be able to run this script it must know the path to IDL on your machine.
For Windows users, this means the path to “idl.exe”. To specify this path you
will need to provide the <code class="highlighter-rouge">idl</code> parameter to the <code class="highlighter-rouge">cloud_remove</code> script. The
default value (<code class="highlighter-rouge">C:/Program Files/Exelis/IDL83/bin/bin.x86_64/idl.exe</code>) may or
may not work on your machine. I recommend you set the IDL path at the beginning
of your scripts:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">idl_path</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"C:/Program Files/Exelis/IDL83/bin/bin.x86_64/idl.exe"</span></code></pre></figure>
<p>An optional <code class="highlighter-rouge">out_name</code> parameter can be supplied to <code class="highlighter-rouge">cloud_remove</code> to specify
the filename for the output file. If not supplied, R will save the filled image
as an R object pointing to a temporary file.</p>
<p>To run the cloud removal routine, call the <code class="highlighter-rouge">cloud_remove</code> function with the
appropriate parameters. Note that this computation may take some time (it takes
around 1.5 hours on a 2.9Ghz Core-i7 3520M laptop).</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Takes 2-3 hours on a 2.9Ghz Core-i7 3520M laptop</span><span class="w">
</span><span class="n">start_time</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Sys.time</span><span class="p">()</span><span class="w">
</span><span class="c1"># Ensure dataType is properly set prior to handing off to IDL</span><span class="w">
</span><span class="n">dataType</span><span class="p">(</span><span class="n">base_cloud_mask</span><span class="p">)</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s1">'INT2S'</span><span class="w">
</span><span class="n">filled_cr</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cloud_remove</span><span class="p">(</span><span class="n">base_tc</span><span class="p">,</span><span class="w"> </span><span class="n">fill_tc</span><span class="p">,</span><span class="w"> </span><span class="n">base_cloud_mask</span><span class="p">,</span><span class="w">
</span><span class="n">algorithm</span><span class="o">=</span><span class="s2">"CLOUD_REMOVE"</span><span class="p">,</span><span class="w"> </span><span class="n">DN_min</span><span class="o">=</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">DN_max</span><span class="o">=</span><span class="m">255</span><span class="p">,</span><span class="w">
</span><span class="n">idl</span><span class="o">=</span><span class="n">idl_path</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Loading required package: ncdf</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">Sys.time</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">start_time</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Time difference of 1.637638 hours</code></pre></figure>
<p>Use <code class="highlighter-rouge">plotRGB</code> to check the output. Note that IDL does not properly code missing
values in the output - prior to plotting or working with the data be sure to
set any pixels with values less than <code class="highlighter-rouge">DN_min</code> (here <code class="highlighter-rouge">DN_min</code> is zero) to <code class="highlighter-rouge">NA</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">filled_cr</span><span class="p">[</span><span class="n">filled_cr</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="m">0</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span><span class="n">plotRGB</span><span class="p">(</span><span class="n">filled_cr</span><span class="p">,</span><span class="w"> </span><span class="n">stretch</span><span class="o">=</span><span class="s2">"lin"</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-cloud-removal-with-teamlucc/cloud_remove_cr_plot-1.png" title="center" alt="center" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>The default cloud fill approach can take a considerable amount of time to run.<br />
There is an alternative approach that can take considerably less time to run,
with similar results. This option can be enabled by supplying the
<code class="highlighter-rouge">algorithm="CLOUD_REMOVE_FAST"</code> parameter to <code class="highlighter-rouge">cloud_remove</code>.</p>
<p>The “fast” version of the algorithm makes some simplifications to improve
running time. Specifically, rather than follow the precise algorithm as
outlined by Zhu et al.<sup id="fnref:1:4"><a href="#fn:1" class="footnote">1</a></sup>, the “fast” routine uses k-means clustering to
divide the image into the number of classes specified by the <code class="highlighter-rouge">num_class</code>
parameter. The script then constructs a linear model of the temporal change in
reflectance for each class within the neighborhood of a given cloud. This
“temporal” adjustment is complemented by a “spatial” adjustment that considers
the change in reflectance in a small neighborhood around each clouded pixel.
For each clouded pixel, a weighted combination of the predicted fill values
from the spatial and temporal models determines the final predicted value for
that pixel. This version of the algorithm takes less than a minute to run on
the same machine as used above:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Takes 2-3 minutes on a 2.9Ghz Core-i7 3520M laptop</span><span class="w">
</span><span class="n">start_time</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Sys.time</span><span class="p">()</span><span class="w">
</span><span class="c1"># Ensure dataType is properly set prior to handing off to IDL</span><span class="w">
</span><span class="n">dataType</span><span class="p">(</span><span class="n">base_cloud_mask</span><span class="p">)</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s1">'INT2S'</span><span class="w">
</span><span class="n">filled_crf</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cloud_remove</span><span class="p">(</span><span class="n">base_tc</span><span class="p">,</span><span class="w"> </span><span class="n">fill_tc</span><span class="p">,</span><span class="w"> </span><span class="n">base_cloud_mask</span><span class="p">,</span><span class="w">
</span><span class="n">algorithm</span><span class="o">=</span><span class="s2">"CLOUD_REMOVE_FAST"</span><span class="p">,</span><span class="w"> </span><span class="n">DN_min</span><span class="o">=</span><span class="m">0</span><span class="p">,</span><span class="w">
</span><span class="n">DN_max</span><span class="o">=</span><span class="m">255</span><span class="p">,</span><span class="w"> </span><span class="n">idl</span><span class="o">=</span><span class="n">idl_path</span><span class="p">)</span><span class="w">
</span><span class="n">Sys.time</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">start_time</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Time difference of 55.48255 secs</code></pre></figure>
<p>Use <code class="highlighter-rouge">plotRGB</code> to check the output:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">filled_crf</span><span class="p">[</span><span class="n">filled_crf</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="m">0</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span><span class="n">plotRGB</span><span class="p">(</span><span class="n">filled_crf</span><span class="p">,</span><span class="w"> </span><span class="n">stretch</span><span class="o">=</span><span class="s1">'lin'</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-cloud-removal-with-teamlucc/cloud_remove_crf_plot-1.png" title="center" alt="center" style="display:block;margin-left:auto;margin-right:auto;" /></p>
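<p>The weighted spatial/temporal combination described above can be sketched in a
few lines of R. This is a toy illustration of the idea only, not the teamlucc
implementation, and every object below is invented for the example:</p>

```r
# Toy sketch (NOT the teamlucc code) of combining spatial and temporal
# predictions for three hypothetical clouded pixels
pred_temporal <- c(120, 95, 140)   # fill values from the class-wise temporal model
pred_spatial  <- c(114, 101, 150)  # fill values from the local spatial adjustment
w             <- c(0.75, 0.5, 0.6) # hypothetical per-pixel weights on the temporal model

# The final fill value is a weighted combination of the two predictions
fill <- w * pred_temporal + (1 - w) * pred_spatial
stopifnot(all.equal(fill, c(118.5, 98, 144)))
fill
```

<p>In the real algorithm the weights are derived per pixel from the data; here they
are fixed constants purely to show the combination step.</p>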
<h4 id="cloud-removal-using-native-r-code">Cloud removal using native R code</h4>
<p>If you do not have IDL on your machine, there is a C++ implementation of the
NSPI cloud fill algorithm that will run directly in R, as well as a “simple”
cloud fill algorithm that performs a naive fill using linear models fit within
the neighborhood of each cloud. To run the R version of the NSPI
algorithm, call the <code class="highlighter-rouge">cloud_remove</code> function with the same parameters as above,
but specify <code class="highlighter-rouge">algorithm="teamlucc"</code>. This function also has a <code class="highlighter-rouge">verbose=TRUE</code>
option to tell <code class="highlighter-rouge">cloud_remove</code> to print progress statements as it is running
(this option is not available with the IDL scripts shown above). This version
is nearly identical to the IDL algorithm called with the
<code class="highlighter-rouge">algorithm="CLOUD_REMOVE"</code> option, but it takes much less time to run (only 3-4 minutes on my machine).</p>
<p>Note that when <code class="highlighter-rouge">cloud_remove</code> is run with <code class="highlighter-rouge">algorithm="teamlucc"</code> and
<code class="highlighter-rouge">verbose=TRUE</code>, there will be a large number of status messages printed to the
screen. For the purposes of this demo (so that the webpage is not unnecessarily
long), I have not used the <code class="highlighter-rouge">verbose=TRUE</code> argument, but I recommend using it if
you try this command yourself.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Takes 4-5 minutes on a 2.9Ghz Core-i7 3520M laptop</span><span class="w">
</span><span class="n">start_time</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Sys.time</span><span class="p">()</span><span class="w">
</span><span class="n">filled_tl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cloud_remove</span><span class="p">(</span><span class="n">base_tc</span><span class="p">,</span><span class="w"> </span><span class="n">fill_tc</span><span class="p">,</span><span class="w"> </span><span class="n">base_cloud_mask</span><span class="p">,</span><span class="w"> </span><span class="n">DN_min</span><span class="o">=</span><span class="m">0</span><span class="p">,</span><span class="w">
</span><span class="n">DN_max</span><span class="o">=</span><span class="m">255</span><span class="p">,</span><span class="w"> </span><span class="n">algorithm</span><span class="o">=</span><span class="s2">"teamlucc"</span><span class="p">)</span><span class="w">
</span><span class="n">Sys.time</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">start_time</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Time difference of 3.320532 mins</code></pre></figure>
<p>View the results with <code class="highlighter-rouge">plotRGB</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plotRGB</span><span class="p">(</span><span class="n">filled_tl</span><span class="p">,</span><span class="w"> </span><span class="n">stretch</span><span class="o">=</span><span class="s1">'lin'</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-cloud-removal-with-teamlucc/cloud_remove_tl_plot-1.png" title="center" alt="center" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>The fastest cloud fill option is to run <code class="highlighter-rouge">cloud_remove</code> with
<code class="highlighter-rouge">algorithm="simple"</code>. This uses a simple cloud fill approach in which the value
of each clouded pixel is calculated using a linear model. The script develops a
separate linear model (with slope and intercept) for each band and each cloud.
For each cloud, and each image band, the script finds all pixels clear in both
the cloudy and fill images, and calculates a regression model in which pixel
values in the fill image are the independent variable, and pixel values in the
clouded image are the dependent variable. The script then uses this model to
predict pixel values for each band in each cloud in the clouded image. For
example:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Takes 2-5 seconds on a 2.9Ghz Core-i7 3520M laptop</span><span class="w">
</span><span class="n">start_time</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">Sys.time</span><span class="p">()</span><span class="w">
</span><span class="n">filled_simple</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cloud_remove</span><span class="p">(</span><span class="n">base_tc</span><span class="p">,</span><span class="w"> </span><span class="n">fill_tc</span><span class="p">,</span><span class="w"> </span><span class="n">base_cloud_mask</span><span class="p">,</span><span class="w"> </span><span class="n">DN_min</span><span class="o">=</span><span class="m">0</span><span class="p">,</span><span class="w">
</span><span class="n">DN_max</span><span class="o">=</span><span class="m">255</span><span class="p">,</span><span class="w"> </span><span class="n">algorithm</span><span class="o">=</span><span class="s2">"simple"</span><span class="p">)</span><span class="w">
</span><span class="n">Sys.time</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">start_time</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Time difference of 0.6300631 secs</code></pre></figure>
<p>View the results with <code class="highlighter-rouge">plotRGB</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plotRGB</span><span class="p">(</span><span class="n">filled_simple</span><span class="p">,</span><span class="w"> </span><span class="n">stretch</span><span class="o">=</span><span class="s1">'lin'</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-cloud-removal-with-teamlucc/cloud_remove_simple_plot-1.png" title="center" alt="center" style="display:block;margin-left:auto;margin-right:auto;" /></p>
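<p>The per-band regression underlying the “simple” algorithm can be sketched as
follows. This is a toy, single-band illustration of the approach described
above, not the teamlucc implementation; all objects are invented for the
example:</p>

```r
set.seed(42)
# Toy data: a fill image band, and a clouded base image band that is roughly a
# linear function of the fill image (as the "simple" algorithm assumes)
fill_band <- runif(200, 0, 255)
base_band <- 0.9 * fill_band + 10 + rnorm(200, sd=2)
clouded <- seq_along(base_band) %in% sample(200, 40)  # TRUE where base is clouded
base_band[clouded] <- NA

# Regress base on fill using pixels clear in both images (fill image is the
# independent variable, clouded image the dependent variable)...
clear <- !clouded
fit <- lm(base ~ fill, data=data.frame(base=base_band[clear], fill=fill_band[clear]))
# ...then predict fill values for the clouded pixels in the base image
base_band[clouded] <- predict(fit, newdata=data.frame(fill=fill_band[clouded]))
coef(fit)  # intercept near 10, slope near 0.9
```

<p>The actual function fits one such model per band and per cloud, rather than the
single model shown here.</p>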
<h3 id="compare-all-four-fill-algorithms">Compare all four fill algorithms</h3>
<p>To plot the results of all four fill algorithms, make a layer stack of the
first band of all four images, then plot:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">filled_comp</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">stack</span><span class="p">(</span><span class="n">filled_cr</span><span class="p">[[</span><span class="m">1</span><span class="p">]],</span><span class="w"> </span><span class="n">filled_crf</span><span class="p">[[</span><span class="m">1</span><span class="p">]],</span><span class="w"> </span><span class="n">filled_tl</span><span class="p">[[</span><span class="m">1</span><span class="p">]],</span><span class="w">
</span><span class="n">filled_simple</span><span class="p">[[</span><span class="m">1</span><span class="p">]])</span><span class="w">
</span><span class="nf">names</span><span class="p">(</span><span class="n">filled_comp</span><span class="p">)</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s1">'CLOUD_REMOVE'</span><span class="p">,</span><span class="w"> </span><span class="s1">'CLOUD_REMOVE_FAST'</span><span class="p">,</span><span class="w"> </span><span class="s1">'teamlucc'</span><span class="p">,</span><span class="w">
</span><span class="s1">'simple'</span><span class="p">)</span><span class="w">
</span><span class="n">filled_comp</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">linear_stretch</span><span class="p">(</span><span class="n">filled_comp</span><span class="p">,</span><span class="w"> </span><span class="n">pct</span><span class="o">=</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">max_val</span><span class="o">=</span><span class="m">255</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">filled_comp</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-04-16-cloud-removal-with-teamlucc/cloud_remove_comparison_plot-1.png" title="center" alt="center" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<h2 id="automated-cloud-fill-from-an-image-time-series">Automated cloud fill from an image time series</h2>
<p>The <code class="highlighter-rouge">teamlucc</code> package also includes functions for automated cloud filling from
an image time series. Automatic cloud filling is performed using the
<code class="highlighter-rouge">auto_cloud_fill</code> function. This function automates the majority of the cloud
filling process. As multiple images are required to demonstrate this process,
the imagery for this portion of the example is not available for
download from this site. I suggest you download the appropriate imagery for a
particular study site and preprocess the imagery using the <code class="highlighter-rouge">auto_setup_dem</code> and
<code class="highlighter-rouge">auto_preprocess_landsat</code> functions in the <code class="highlighter-rouge">teamlucc</code> package so that you can
follow along with this example. The <code class="highlighter-rouge">auto_preprocess_landsat</code> function will
also perform topographic correction, which is necessary prior to cloud filling
images in mountainous areas.</p>
<p>The <code class="highlighter-rouge">auto_cloud_fill</code> function allows an analyst to automatically construct a
cloud-filled image after specifying: <code class="highlighter-rouge">data_dir</code> (a folder of Landsat
images), <code class="highlighter-rouge">wrspath</code> and <code class="highlighter-rouge">wrsrow</code> (the WRS-2 path/row to use), and <code class="highlighter-rouge">start_date</code>
and <code class="highlighter-rouge">end_date</code> (a start and end date limiting the images to use in the
algorithm). The analyst can also optionally specify a <code class="highlighter-rouge">base_date</code>, and the
<code class="highlighter-rouge">auto_cloud_fill</code> function will automatically pick the image closest to that
date to use as the base image (otherwise <code class="highlighter-rouge">auto_cloud_fill</code> will automatically
pick the image with the least cloud cover as the base image).</p>
<p>As the <code class="highlighter-rouge">auto_cloud_fill</code> function automatically chooses images for inclusion in
the cloud fill process, it relies on having images stored on disk in a
particular way, and currently only supports cloud fill for Landsat CDR surface
reflectance images. To ensure that images are correctly stored on your hard
disk, use the <code class="highlighter-rouge">auto_preprocess_landsat</code> function to extract the original
Landsat CDR hdf files from the USGS archive. The <code class="highlighter-rouge">auto_preprocess_landsat</code>
function will ensure that images are extracted and renamed properly so that
they can be used with the <code class="highlighter-rouge">auto_cloud_fill</code> function.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># start_time <- Sys.time()</span><span class="w">
</span><span class="c1"># start_date <- as.Date('1986-01-01')</span><span class="w">
</span><span class="c1"># end_date <- as.Date('1987-01-01')</span><span class="w">
</span><span class="c1"># filled_image <- auto_cloud_fill("C:/Data/LEDAPS_imagery", wrspath=230, </span><span class="w">
</span><span class="c1"># wrsrow=62, start_date=start_date,</span><span class="w">
</span><span class="c1"># end_date=end_date)</span><span class="w">
</span><span class="c1"># Sys.time() - start_time</span></code></pre></figure>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Zhu, X., Gao, F., Liu, D., Chen, J., 2012. A modified neighborhood similar
pixel interpolator approach for removing thick clouds in Landsat images.
Geoscience and Remote Sensing Letters, IEEE 9, 521–525.
doi:10.1109/LGRS.2011.2173290 <a href="#fnref:1" class="reversefootnote">↩</a> <a href="#fnref:1:1" class="reversefootnote">↩<sup>2</sup></a> <a href="#fnref:1:2" class="reversefootnote">↩<sup>3</sup></a> <a href="#fnref:1:3" class="reversefootnote">↩<sup>4</sup></a> <a href="#fnref:1:4" class="reversefootnote">↩<sup>5</sup></a></p>
</li>
<li id="fn:2">
<p>Zhu, Z. and Woodcock, C. E., Object-based cloud and cloud shadow detection
in Landsat imagery, Remote Sensing of Environment (2012),
doi:10.1016/j.rse.2011.10.028 <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p><a href="http://www.jstatsoft.org/v43/i04/.">Sarah C. Goslee (2011). Analyzing Remote Sensing Data in R: The landsat
Package. Journal of Statistical Software, 43(4),
1-25.</a> <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<p><a href="http://www.azvoleff.com/articles/cloud-removal-with-teamlucc/">Cloud removal with teamlucc</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on April 16, 2014.</p>http://www.azvoleff.com/articles/gfcanalysis-1-0-on-cran2014-04-01T00:00:00+00:002014-04-01T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<p>Version 1.0 of the <code class="highlighter-rouge">gfcanalysis</code> R package for working with the Hansen et al.<br />
2013<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> <a href="http://earthenginepartners.appspot.com/science-2013-global-forest">Global Forest Change
dataset</a> is
now on CRAN. See the <a href="http://cran.r-project.org/web/packages/gfcanalysis/index.html"><code class="highlighter-rouge">gfcanalysis</code> page on
CRAN</a> for more
information.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Hansen, M. C., P. V. Potapov, R. Moore, M. Hancher, S. A. Turubanova, A.
Tyukavina, D. Thau, S. V. Stehman, S. J. Goetz, T. R. Loveland, A. Kommareddy,
A. Egorov, L. Chini, C. O. Justice, and J. R. G. Townshend. 2013.
High-Resolution Global Maps of 21st-Century Forest Cover Change. Science 342,
(15 November): 850–853. Data available on-line from:
http://earthenginepartners.appspot.com/science-2013-global-forest. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<p><a href="http://www.azvoleff.com/articles/gfcanalysis-1-0-on-cran/">gfcanalysis 1.0 released</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on April 01, 2014.</p>http://www.azvoleff.com/articles/analyzing-forest-change-with-gfcanalysis2014-03-25T00:00:00+00:002014-03-25T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<h2 id="overview">Overview</h2>
<p>The <code class="highlighter-rouge">gfcanalysis</code> R package facilitates simple analyses using the Hansen et
al. 2013<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> <a href="http://earthenginepartners.appspot.com/science-2013-global-forest">Global Forest Change
dataset</a>.
The package was written to analyze forest change within the Zone of
Interaction surrounding each of the forest monitoring sites of the <a href="http://www.teamnetwork.org">Tropical
Ecology Assessment and Monitoring (TEAM) Network</a>.</p>
<p>If you need help with any of the functions in the package, see the help files
for more information. For example, type <code class="highlighter-rouge">?download_tiles</code> in R to see the help
file for the <code class="highlighter-rouge">download_tiles</code> function.</p>
<h2 id="getting-started">Getting started</h2>
<p>This post will outline an analysis using the <code class="highlighter-rouge">gfcanalysis</code> package. Note that
as the computations are intensive, some parts of this analysis may take a
while to run (about 30 minutes total for all of the code outlined here). If
you do not already have the GFC product data downloaded on your computer,
downloading the dataset will also take some time (though this process is
automated by <code class="highlighter-rouge">gfcanalysis</code>).</p>
<p>To get started, first install the <code class="highlighter-rouge">gfcanalysis</code> package from CRAN. Also install
the <code class="highlighter-rouge">rgdal</code> package needed for reading/writing shapefiles if you do not already
have it.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">require</span><span class="p">(</span><span class="n">gfcanalysis</span><span class="p">))</span><span class="w"> </span><span class="n">install.packages</span><span class="p">(</span><span class="s1">'gfcanalysis'</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Loading required package: gfcanalysis
## Loading required package: raster
## Loading required package: sp</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">require</span><span class="p">(</span><span class="n">rgdal</span><span class="p">))</span><span class="w"> </span><span class="n">install.packages</span><span class="p">(</span><span class="s1">'rgdal'</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Loading required package: rgdal
## rgdal: version: 0.8-16, (SVN revision 498)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.10.1, released 2013/08/26
## Path to GDAL shared files: C:/Users/azvoleff/R/win-library/3.2/rgdal/gdal
## GDAL does not use iconv for recoding strings.
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: C:/Users/azvoleff/R/win-library/3.2/rgdal/proj</code></pre></figure>
<p>Indicate where we want to save GFC tiles downloaded from Google. For any given
AOI, the script will first check to see if these tiles are available locally
(in the folder below) before downloading them from the server, so I recommend
storing ALL of your GFC tiles in the same folder. For this example we will use
“.” - the current working directory of the R session.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">output_folder</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"."</span></code></pre></figure>
<p>Set the threshold for forest/non-forest based on the treecover2000 layer in
the GFC product:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">forest_threshold</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">90</span></code></pre></figure>
<h2 id="downloading-data-from-google-server-for-a-given-aoi">Downloading data from Google server for a given AOI</h2>
<p>Load an area of interest. For this example we use a shapefile of the Zone of
Interaction (ZOI) of the <a href="http://www.teamnetwork.org">TEAM Network</a> site in
<a href="http://www.teamnetwork.org/network/sites/nam-kading-0">Nam Kading National Protected Area,
Laos</a>. Notice that first
we specify the folder the shapefile is in (here it is a “.” indicating the
current working directory), and then the name of the shapefile without the
“.shp”. To follow along with this example, <a href="/content/2014-03-25-analyzing-forest-change-with-gfcanalysis/ZOI_NAK_2012_EEsimple.zip">download this shapefile of the
ZOI</a>.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">aoi</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readOGR</span><span class="p">(</span><span class="s1">'.'</span><span class="p">,</span><span class="w"> </span><span class="s1">'ZOI_NAK_2012_EEsimple'</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "ZOI_NAK_2012_EEsimple"
## with 1 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions</code></pre></figure>
<p>Calculate the tiles needed to cover the AOI:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">tiles</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">calc_gfc_tiles</span><span class="p">(</span><span class="n">aoi</span><span class="p">)</span><span class="w">
</span><span class="n">print</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">tiles</span><span class="p">))</span><span class="w"> </span><span class="c1"># Number of tiles needed to cover AOI</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## [1] 1</code></pre></figure>
<p>To check the overlap between the tiles and the aoi, you can make a plot of the
needed tiles and the AOI using R’s plotting functions:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">tiles</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">aoi</span><span class="p">,</span><span class="w"> </span><span class="n">add</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">lty</span><span class="o">=</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="s2">"#00ff0050"</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-03-25-analyzing-forest-change-with-gfcanalysis/tiles_versus_aoi.png" alt="center" /></p>
<p>Now, check to see if these tiles are already present locally, and download them
if they are not. By default the “first” and “last” composite surface
reflectance images are not downloaded. To also download these images specify
<code class="highlighter-rouge">first_and_last=TRUE</code>.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">download_tiles</span><span class="p">(</span><span class="n">tiles</span><span class="p">,</span><span class="w"> </span><span class="n">output_folder</span><span class="p">,</span><span class="w"> </span><span class="n">first_and_last</span><span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## 1 tiles to download/check.
## 0 file(s) succeeded, 5 file(s) skipped, 0 file(s) failed.</code></pre></figure>
<h2 id="performing-thresholding-and-calculating-basic-statistics">Performing thresholding and calculating basic statistics</h2>
<p>Extract the GFC data for this AOI from the downloaded GFC tiles, mosaicing
multiple tiles as necessary (if needed to cover the AOI). Save this extract in
GeoTiff format in the current working directory (can also save as ENVI format,
Erdas format, etc.)</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">gfc_extract</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">extract_gfc</span><span class="p">(</span><span class="n">aoi</span><span class="p">,</span><span class="w"> </span><span class="n">output_folder</span><span class="p">,</span><span class="w"> </span><span class="n">filename</span><span class="o">=</span><span class="s2">"NAK_GFC_extract.tif"</span><span class="p">)</span></code></pre></figure>
<p>The extracted dataset has 5 layers (not yet thresholded):</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">gfc_extract</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## class : RasterBrick
## dimensions : 4358, 4761, 20748438, 5 (nrow, ncol, ncell, nlayers)
## resolution : 0.0002778, 0.0002778 (x, y)
## extent : 103.5, 104.8, 17.83, 19.04 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## data source : C:\Users\azvoleff\Code\Misc\azvoleff.github.io\Rmd\2014-03-25-analyzing-forest-change-with-gfcanalysis\NAK_GFC_extract.tif
## names : treecover2000, loss, gain, lossyear, datamask
## min values : 0, 0, 0, 0, 1
## max values : 100, 1, 1, 12, 2</code></pre></figure>
<p>Threshold the GFC data based on a specified percent cover threshold (0-100),
and save the thresholded layers to a GeoTiff:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">gfc_thresholded</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">threshold_gfc</span><span class="p">(</span><span class="n">gfc_extract</span><span class="p">,</span><span class="w"> </span><span class="n">forest_threshold</span><span class="o">=</span><span class="n">forest_threshold</span><span class="p">,</span><span class="w">
</span><span class="n">filename</span><span class="o">=</span><span class="s2">"NAK_GFC_extract_thresholded.tif"</span><span class="p">)</span></code></pre></figure>
<h2 id="coding-of-the-thresholded-output">Coding of the thresholded output</h2>
<p>The thresholded dataset has 5 layers:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">gfc_thresholded</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## class : RasterBrick
## dimensions : 4358, 4761, 20748438, 5 (nrow, ncol, ncell, nlayers)
## resolution : 0.0002778, 0.0002778 (x, y)
## extent : 103.5, 104.8, 17.83, 19.04 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## data source : C:\Users\azvoleff\Code\Misc\azvoleff.github.io\Rmd\2014-03-25-analyzing-forest-change-with-gfcanalysis\NAK_GFC_extract_thresholded.tif
## names : forest2000, lossyear, gain, lossgain, datamask
## min values : 0, 0, 0, 0, 1
## max values : 1, 12, 1, 1, 2</code></pre></figure>
<p>The output is coded using the following coding scheme:</p>
<h3 id="band-1-forest2000">Band 1 (forest2000)</h3>
<p>Based on the provided <code class="highlighter-rouge">forest_threshold</code>. Pixels with percent canopy cover
greater than <code class="highlighter-rouge">forest_threshold</code> are coded as forest.</p>
<table>
<thead>
<tr>
<th style="text-align: left">Cover in 2000</th>
<th style="text-align: center">Code</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">Non-forest</td>
<td style="text-align: center">0</td>
</tr>
<tr>
<td style="text-align: left">Forest</td>
<td style="text-align: center">1</td>
</tr>
</tbody>
</table>
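<p>The coding of this band follows directly from the threshold rule; a minimal
sketch using toy canopy cover values (this illustrates the rule only, not the
threshold_gfc internals):</p>

```r
# Toy percent canopy cover values as in the treecover2000 layer
treecover2000 <- c(0, 45, 90, 95, 100)
forest_threshold <- 90
# Pixels with cover greater than the threshold are coded as forest (1)
forest2000 <- as.integer(treecover2000 > forest_threshold)
stopifnot(identical(forest2000, c(0L, 0L, 0L, 1L, 1L)))
forest2000
```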
<h3 id="band-2-lossyear">Band 2 (lossyear)</h3>
<p>Note that lossyear is zero for pixels that were not forested in 2000.</p>
<table>
<thead>
<tr>
<th style="text-align: left">Year of loss</th>
<th style="text-align: center">Code</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">No loss</td>
<td style="text-align: center">0</td>
</tr>
<tr>
<td style="text-align: left">Loss in 2001</td>
<td style="text-align: center">1</td>
</tr>
<tr>
<td style="text-align: left">Loss in 2002</td>
<td style="text-align: center">2</td>
</tr>
<tr>
<td style="text-align: left">Loss in 2003</td>
<td style="text-align: center">3</td>
</tr>
<tr>
<td style="text-align: left">Loss in 2004</td>
<td style="text-align: center">4</td>
</tr>
<tr>
<td style="text-align: left">Loss in 2005</td>
<td style="text-align: center">5</td>
</tr>
<tr>
<td style="text-align: left">Loss in 2006</td>
<td style="text-align: center">6</td>
</tr>
<tr>
<td style="text-align: left">Loss in 2007</td>
<td style="text-align: center">7</td>
</tr>
<tr>
<td style="text-align: left">Loss in 2008</td>
<td style="text-align: center">8</td>
</tr>
<tr>
<td style="text-align: left">Loss in 2009</td>
<td style="text-align: center">9</td>
</tr>
<tr>
<td style="text-align: left">Loss in 2010</td>
<td style="text-align: center">10</td>
</tr>
<tr>
<td style="text-align: left">Loss in 2011</td>
<td style="text-align: center">11</td>
</tr>
<tr>
<td style="text-align: left">Loss in 2012</td>
<td style="text-align: center">12</td>
</tr>
</tbody>
</table>
<h3 id="band-3-gain">Band 3 (gain)</h3>
<p>Note that gain is zero for pixels that were forested in 2000.</p>
<table>
<thead>
<tr>
<th style="text-align: left">Change</th>
<th style="text-align: center">Code</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">No gain</td>
<td style="text-align: center">0</td>
</tr>
<tr>
<td style="text-align: left">Gain</td>
<td style="text-align: center">1</td>
</tr>
</tbody>
</table>
<h3 id="band-4-lossgain">Band 4 (lossgain)</h3>
<p>Note that loss and gain are difficult to interpret from the thresholded
product, as the original GFC product does not indicate the sequence of loss and
gain (loss then gain, or gain then loss). The product also does not indicate
the level of canopy cover reached prior to loss (for gain then loss pixels) or
after loss (for loss then gain pixels). The layer
is calculated here as: <code class="highlighter-rouge">lossgain <- gain & (lossyear != 0)</code>, where <code class="highlighter-rouge">lossyear</code>
and <code class="highlighter-rouge">gain</code> are the original GFC lossyear and gain layers, respectively.</p>
<table>
<thead>
<tr>
<th style="text-align: left">Change</th>
<th style="text-align: center">Code</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">No loss and gain</td>
<td style="text-align: center">0</td>
</tr>
<tr>
<td style="text-align: left">Loss and gain</td>
<td style="text-align: center">1</td>
</tr>
</tbody>
</table>
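<p>The <code class="highlighter-rouge">lossgain</code> formula above can be sketched directly with the <code class="highlighter-rouge">raster</code> package (assuming <code class="highlighter-rouge">gain</code> and <code class="highlighter-rouge">lossyear</code> are <code class="highlighter-rouge">RasterLayer</code> objects holding the original GFC bands; the object names are assumptions):</p>

```r
library(raster)

# Sketch: reproduce the lossgain band from the original GFC layers.
# `gain` and `lossyear` are assumed to be RasterLayers from the GFC product.
lossgain <- overlay(gain, lossyear,
                    fun=function(g, ly) g & (ly != 0))
```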
<h3 id="band-5-datamask">Band 5 (datamask)</h3>
<table>
<thead>
<tr>
<th style="text-align: left">Class</th>
<th style="text-align: center">Code</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">No data</td>
<td style="text-align: center">0</td>
</tr>
<tr>
<td style="text-align: left">Land</td>
<td style="text-align: center">1</td>
</tr>
<tr>
<td style="text-align: left">Water</td>
<td style="text-align: center">2</td>
</tr>
</tbody>
</table>
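<p>For example, the datamask band can be used to screen out water and no-data pixels before calculating statistics (the <code class="highlighter-rouge">datamask</code> layer name is an assumption):</p>

```r
library(raster)

# Sketch: keep only land pixels (datamask code 1), setting water (code 2)
# and no-data (code 0) pixels to NA across all bands.
land <- gfc_thresholded[["datamask"]] == 1
gfc_land <- mask(gfc_thresholded, land, maskvalue=0)
```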
<h2 id="calculating-statistics-on-forest-loss-and-forest-gain">Calculating statistics on forest loss and forest gain</h2>
<p>Calculate annual statistics on forest loss/gain:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">gfc_stats</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">gfc_stats</span><span class="p">(</span><span class="n">aoi</span><span class="p">,</span><span class="w"> </span><span class="n">gfc_thresholded</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Data appears to be in latitude/longitude. Calculating cell areas on a sphere.</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">gfc_stats</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## $loss_table
## year aoi cover loss
## 1 2000 AOI 1 433656 NA
## 2 2001 AOI 1 433125 530.6
## 3 2002 AOI 1 432108 1017.4
## 4 2003 AOI 1 430451 1656.4
## 5 2004 AOI 1 429590 861.6
## 6 2005 AOI 1 427738 1851.3
## 7 2006 AOI 1 425938 1800.2
## 8 2007 AOI 1 421196 4742.1
## 9 2008 AOI 1 420032 1164.0
## 10 2009 AOI 1 415982 4049.5
## 11 2010 AOI 1 412196 3786.5
## 12 2011 AOI 1 407462 4734.3
## 13 2012 AOI 1 403578 3884.1
##
## $gain_table
## period aoi gain lossgain
## 1 2000-2012 AOI 1 16194 12287</code></pre></figure>
<p>Save these statistics to CSV files for use in Excel, or other software:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">write.csv</span><span class="p">(</span><span class="n">gfc_stats</span><span class="o">$</span><span class="n">loss_table</span><span class="p">,</span><span class="w">
</span><span class="n">file</span><span class="o">=</span><span class="s1">'NAK_GFC_extract_thresholded_losstable.csv'</span><span class="p">,</span><span class="w"> </span><span class="n">row.names</span><span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">write.csv</span><span class="p">(</span><span class="n">gfc_stats</span><span class="o">$</span><span class="n">gain_table</span><span class="p">,</span><span class="w">
</span><span class="n">file</span><span class="o">=</span><span class="s1">'NAK_GFC_extract_thresholded_gaintable.csv'</span><span class="p">,</span><span class="w"> </span><span class="n">row.names</span><span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span></code></pre></figure>
<p>To view the format of the CSV files output by <code class="highlighter-rouge">gfcanalysis</code>, see the
<a href="/content/2014-03-25-analyzing-forest-change-with-gfcanalysis/NAK_GFC_extract_thresholded_losstable.csv">loss
table</a>
and <a href="/content/2014-03-25-analyzing-forest-change-with-gfcanalysis/NAK_GFC_extract_thresholded_gaintable.csv">gain
table</a>
for Nam Kading.</p>
<h2 id="making-simple-visualizations">Making simple visualizations</h2>
<p>There is also a function in <code class="highlighter-rouge">gfcanalysis</code> to calculate and save a thresholded
annual layer stack from the GFC product (useful for simple visualizations,
etc.):</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">gfc_annual_stack</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">annual_stack</span><span class="p">(</span><span class="n">gfc_thresholded</span><span class="p">)</span><span class="w">
</span><span class="n">writeRaster</span><span class="p">(</span><span class="n">gfc_annual_stack</span><span class="p">,</span><span class="w"> </span><span class="n">filename</span><span class="o">=</span><span class="s2">"NAK_GFC_extract_thresholded_annual.tif"</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## class : RasterBrick
## dimensions : 4358, 4761, 20748438, 13 (nrow, ncol, ncell, nlayers)
## resolution : 0.0002778, 0.0002778 (x, y)
## extent : 103.5, 104.8, 17.83, 19.04 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## data source : C:\Users\azvoleff\Code\Misc\azvoleff.github.io\Rmd\2014-03-25-analyzing-forest-change-with-gfcanalysis\NAK_GFC_extract_thresholded_annual.tif
## names : NAK_GFC_e//d_annual.1, NAK_GFC_e//d_annual.2, NAK_GFC_e//d_annual.3, NAK_GFC_e//d_annual.4, NAK_GFC_e//d_annual.5, NAK_GFC_e//d_annual.6, NAK_GFC_e//d_annual.7, NAK_GFC_e//d_annual.8, NAK_GFC_e//d_annual.9, NAK_GFC_e//_annual.10, NAK_GFC_e//_annual.11, NAK_GFC_e//_annual.12, NAK_GFC_e//_annual.13
## min values : 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
## max values : 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6</code></pre></figure>
<p>The annual stack output by <code class="highlighter-rouge">annual_stack</code> has one layer for each year:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">gfc_annual_stack</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## class : RasterBrick
## dimensions : 4358, 4761, 20748438, 13 (nrow, ncol, ncell, nlayers)
## resolution : 0.0002778, 0.0002778 (x, y)
## extent : 103.5, 104.8, 17.83, 19.04 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## data source : C:\Users\azvoleff\AppData\Local\Temp\R_raster_azvoleff\raster_tmp_2014-04-04_141034_13628_76953.grd
## names : y2000, y2001, y2002, y2003, y2004, y2005, y2006, y2007, y2008, y2009, y2010, y2011, y2012
## min values : 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
## max values : 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6
## time : 2000-01-01, 2001-01-01, 2002-01-01, 2003-01-01, 2004-01-01, 2005-01-01, 2006-01-01, 2007-01-01, 2008-01-01, 2009-01-01, 2010-01-01, 2011-01-01, 2012-01-01</code></pre></figure>
<p>Forest change in each year is coded as:</p>
<table>
<thead>
<tr>
<th style="text-align: left">Cover/Change</th>
<th style="text-align: center">Code</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">No data</td>
<td style="text-align: center">0</td>
</tr>
<tr>
<td style="text-align: left">Forest</td>
<td style="text-align: center">1</td>
</tr>
<tr>
<td style="text-align: left">Non-forest</td>
<td style="text-align: center">2</td>
</tr>
<tr>
<td style="text-align: left">Forest loss</td>
<td style="text-align: center">3</td>
</tr>
<tr>
<td style="text-align: left">Forest gain</td>
<td style="text-align: center">4</td>
</tr>
<tr>
<td style="text-align: left">Forest loss and gain</td>
<td style="text-align: center">5</td>
</tr>
<tr>
<td style="text-align: left">Water</td>
<td style="text-align: center">6</td>
</tr>
</tbody>
</table>
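<p>For example, the <code class="highlighter-rouge">raster</code> package’s <code class="highlighter-rouge">freq</code> function can tabulate how many pixels fall in each of these codes for a given year (here the 2010 layer):</p>

```r
library(raster)

# Pixel counts per change code in the 2010 layer of the annual stack
freq(gfc_annual_stack[["y2010"]])
```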
<p>The <code class="highlighter-rouge">animate_annual</code> function can be used to save a simple visualization of the
thresholded annual layer stack.</p>
<p>Note: For this example, we are using the data in the WGS84 coordinate system.
For a real analysis or presentation, the data should be projected into UTM or
another appropriate projected coordinate system. The <code class="highlighter-rouge">utm_zone</code> function in the
<code class="highlighter-rouge">gfcanalysis</code> package and the <code class="highlighter-rouge">projectRaster</code> function in the <code class="highlighter-rouge">raster</code> package
can be used to automate this. Also see the <code class="highlighter-rouge">to_utm</code> option of the
<code class="highlighter-rouge">extract_gfc</code> function (type <code class="highlighter-rouge">?extract_gfc</code> in R).</p>
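<p>A minimal sketch of that reprojection step (the exact <code class="highlighter-rouge">utm_zone</code> arguments are an assumption; check <code class="highlighter-rouge">?utm_zone</code>):</p>

```r
library(raster)

# Sketch (untested): look up a PROJ.4 string for the AOI's UTM zone and
# reproject the annual stack. The utm_zone call signature is an assumption;
# see ?utm_zone. Nearest-neighbor resampling preserves the categorical codes.
utm_proj4 <- utm_zone(aoi, proj4string=TRUE)
gfc_annual_utm <- projectRaster(gfc_annual_stack, crs=utm_proj4, method='ngb')
```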
<p>To make an annual animation (in WGS84) type:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">aoi</span><span class="o">$</span><span class="n">label</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s2">"ZOI"</span><span class="w"> </span><span class="c1"># Label the polygon on the plot</span><span class="w">
</span><span class="n">animate_annual</span><span class="p">(</span><span class="n">aoi</span><span class="p">,</span><span class="w"> </span><span class="n">gfc_annual_stack</span><span class="p">,</span><span class="w"> </span><span class="n">out_dir</span><span class="o">=</span><span class="s1">'.'</span><span class="p">,</span><span class="w"> </span><span class="n">site_name</span><span class="o">=</span><span class="s1">'Nam Kading'</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## HTML file created at: C:\Users\azvoleff\Code\Misc\azvoleff.github.io\Rmd\2014-03-25-analyzing-forest-change-with-gfcanalysis/gfc_animation.html
## You may use ani.options(outdir = getwd()) or saveHTML(..., outdir = getwd()) to generate files under the current working directory.</code></pre></figure>
<p>The animation will be saved in the directory specified by <code class="highlighter-rouge">out_dir</code> (in this
example the current working directory). To view the animation, double-click the
new “.html” file in that directory. The animation will look <a href="/content/2014-03-25-analyzing-forest-change-with-gfcanalysis/gfc_animation.html">something like
this</a>.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Hansen, M. C., P. V. Potapov, R. Moore, M. Hancher, S. A. Turubanova, A.
Tyukavina, D. Thau, S. V. Stehman, S. J. Goetz, T. R. Loveland, A. Kommareddy,
A. Egorov, L. Chini, C. O. Justice, and J. R. G. Townshend. 2013.
High-Resolution Global Maps of 21st-Century Forest Cover Change. Science 342,
(15 November): 850–853. Data available on-line from:
http://earthenginepartners.appspot.com/science-2013-global-forest. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<p><a href="http://www.azvoleff.com/articles/analyzing-forest-change-with-gfcanalysis/">Analyzing forest change with gfcanalysis</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on March 25, 2014.</p>
<h2 id="getting-started">Getting started</h2>
<p>First load the <code class="highlighter-rouge">devtools</code> package, used for installing <code class="highlighter-rouge">teamlucc</code>. Install the
<code class="highlighter-rouge">devtools</code> package if it is not already installed:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">require</span><span class="p">(</span><span class="n">devtools</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">install.packages</span><span class="p">(</span><span class="s1">'devtools'</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>Now load the <code class="highlighter-rouge">teamlucc</code> package, using <code class="highlighter-rouge">devtools</code> to install it from github if
it is not yet installed:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">require</span><span class="p">(</span><span class="n">teamlucc</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s1">'azvoleff/teamlucc'</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">teamlucc</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<p>Also load the <code class="highlighter-rouge">rgdal</code> package needed for reading/writing shapefiles:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">rgdal</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## rgdal: version: 0.9-1, (SVN revision 518)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.11.0, released 2014/04/16
## Path to GDAL shared files: C:/Users/azvoleff/R/win-library/3.1/rgdal/gdal
## GDAL does not use iconv for recoding strings.
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: C:/Users/azvoleff/R/win-library/3.1/rgdal/proj</code></pre></figure>
<h2 id="collect-training-data-for-supervised-classification">Collect training data for supervised classification</h2>
<p>The first step in the classification is putting together a training dataset.
<code class="highlighter-rouge">teamlucc</code> includes a function to output a shapefile that can be used for
collecting training data. Here we are collecting training data for the
<code class="highlighter-rouge">L5TSR_1986</code> raster (a portion of a 1986 Landsat 5 surface reflectance image)
that is included with the <code class="highlighter-rouge">teamlucc</code> package. Use the <code class="highlighter-rouge">get_extent_polys</code>
function to quickly construct a shapefile in the same coordinate system as the
image:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">train_polys</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">get_extent_polys</span><span class="p">(</span><span class="n">L</span><span class="m">5</span><span class="n">TSR_1986</span><span class="p">)</span></code></pre></figure>
<p>Add an empty field named “class_1986” to the object, and delete the extent polygon
(because we don’t need it, and just want an empty shapefile):</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">train_polys</span><span class="o">$</span><span class="n">class_1986</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="s1">''</span><span class="w"> </span><span class="c1"># Add an empty column named "class_1986"</span><span class="w">
</span><span class="n">train_polys</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">train_polys</span><span class="p">[</span><span class="m">-1</span><span class="p">,</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="c1"># Delete extent polygon</span></code></pre></figure>
<p>Now save the <code class="highlighter-rouge">train_polys</code> object to a shapefile using <code class="highlighter-rouge">writeOGR</code> from the
<code class="highlighter-rouge">rgdal</code> package. The <code class="highlighter-rouge">"."</code> below just means “save the shapefile in the current
directory”.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">writeOGR</span><span class="p">(</span><span class="n">train_polys</span><span class="p">,</span><span class="w"> </span><span class="s2">"."</span><span class="p">,</span><span class="w"> </span><span class="s2">"training_data"</span><span class="p">,</span><span class="w"> </span><span class="s2">"ESRI Shapefile"</span><span class="p">)</span></code></pre></figure>
<p>Open the generated “training_data.shp” shapefile in a GIS program (I recommend
<a href="http://www.qgis.org">QGIS</a>) and digitize a number of polygons in each of the
land cover classes you want to map. For this example, we will simply classify
“Forest” and “Non-forest”. For each polygon you digitize, record the cover type
in the “class_1986” column. After digitizing a number of polygons within each
class, save the shapefile, and load it back into R using
<code class="highlighter-rouge">train_polys <- readOGR(".", "training_data")</code>.</p>
<p>Alternatively, for this example you can use the thirty training polygons
included in the <code class="highlighter-rouge">teamlucc</code> package as the <code class="highlighter-rouge">L5TSR_1986_2001_training</code> dataset:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">train_polys</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">L</span><span class="m">5</span><span class="n">TSR_1986_2001_training</span></code></pre></figure>
<h2 id="classify-image">Classify image</h2>
<p>First we need to extract the training data from our training image for each
pixel within the polygons in our <code class="highlighter-rouge">train_polys</code> dataset.
<code class="highlighter-rouge">get_pixels</code> uses the <code class="highlighter-rouge">training</code> parameter that we pass to
determine the fraction of the training data to use in training the classifier.
If set to 1, ALL of the training data will be used to train the classifier,
leaving no independent data for validation. If set to a fraction (for example
.6), then only 60% of the data (randomly selected) will be used in training,
and the remaining 40% will be held out as an independent sample for testing.</p>
<p>Note: Validation data should generally be collected separately from training
data anyway, to ensure the image is randomly sampled (training data collection
is almost never random), so in most cases I don’t recommend making heavy use of
the <code class="highlighter-rouge">training</code> parameter. It can, however, be useful for testing.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">set.seed</span><span class="p">(</span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="c1"># Set a random seed so results can be reproduced</span><span class="w">
</span><span class="n">train_data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">get_pixels</span><span class="p">(</span><span class="n">L</span><span class="m">5</span><span class="n">TSR_1986</span><span class="p">,</span><span class="w"> </span><span class="n">train_polys</span><span class="p">,</span><span class="w"> </span><span class="n">class_col</span><span class="o">=</span><span class="s2">"class_1986"</span><span class="p">,</span><span class="w">
</span><span class="n">training</span><span class="o">=</span><span class="m">.6</span><span class="p">)</span></code></pre></figure>
<p>A summary method is provided by <code class="highlighter-rouge">teamlucc</code> for printing summary statistics on
training datasets:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">summary</span><span class="p">(</span><span class="n">train_data</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Object of class "pixel_data"
##
## Number of classes: 2
## Number of polygons: 30
## Number of pixels: 120
## Number of sources: 1
##
## Training data statistics:
## Source: local data frame [2 x 5]
##
## class n_polys n_train_pixels n_test_pixels train_frac
## 1 Forest 17 48 20 0.71
## 2 NonForest 13 24 28 0.46
##
## Number of training samples: 72
## Number of testing samples: 48
## Training fraction: 0.6</code></pre></figure>
<p>To perform the actual image classification, we will use the <code class="highlighter-rouge">classify</code>
function. Prior to using that function, we need to train a classifier. The
<code class="highlighter-rouge">train_classifier</code> function automates training a random forest or support
vector machine (SVM) classifier. There are many options that can be provided to
<code class="highlighter-rouge">train_classifier</code> - for this example we will just use the defaults. The
default is to use a random forest classifier.</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">clfr</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">train_classifier</span><span class="p">(</span><span class="n">train_data</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Loading required package: randomForest
## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.
## Loading required package: lattice
## Loading required package: ggplot2</code></pre></figure>
<p>Now we can use the <code class="highlighter-rouge">classify</code> function to perform the image classification:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">cls</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">classify</span><span class="p">(</span><span class="n">L</span><span class="m">5</span><span class="n">TSR_1986</span><span class="p">,</span><span class="w"> </span><span class="n">clfr</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning in .local(x, ...): min value not known, use setMinMax</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning in .local(x, ...): min value not known, use setMinMax</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Loading required package: parallel
##
## Attaching package: 'parallel'
##
## The following objects are masked from 'package:snow':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, clusterSplit, makeCluster,
## parApply, parCapply, parLapply, parRapply, parSapply,
## splitIndices, stopCluster
##
## Loading required package: iterators
## Loading required package: foreach
## foreach: simple, scalable parallel programming from Revolution Analytics
## Use Revolution R for scalability, fault tolerance and more.
## http://www.revolutionanalytics.com
##
## Attaching package: 'mmap'
##
## The following object is masked from 'package:Rcpp':
##
## sizeof</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning in int64(): unsupported int64, use int32 or real64</code></pre></figure>
<p>To see the predicted classes, use <code class="highlighter-rouge">spplot</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">spplot</span><span class="p">(</span><span class="n">cls</span><span class="o">$</span><span class="n">classes</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-03-19-image-classification-with-teamlucc/predicted_classes-1.png" title="Predicted classes" alt="Predicted classes" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>We can also see the class probabilities (per pixel probabilities of membership of each class):</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">spplot</span><span class="p">(</span><span class="n">cls</span><span class="o">$</span><span class="n">probs</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-03-19-image-classification-with-teamlucc/class_probabilities-1.png" title="Predicted probabilities of each class" alt="Predicted probabilities of each class" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>The output from <code class="highlighter-rouge">classify</code> also includes a table indicating the coding for the
output:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">print</span><span class="p">(</span><span class="n">cls</span><span class="o">$</span><span class="n">codes</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## code class
## 1 0 Forest
## 2 1 NonForest</code></pre></figure>
<h3 id="parallel-processing">Parallel processing</h3>
<p>Training a classifier and predicting land cover classes is very CPU-intensive.
If you have a machine with multiple processors (or multiple cores), using
more than one processor can significantly speed up some
calculations. <code class="highlighter-rouge">teamlucc</code> supports parallel computation (using the capabilities
of the <code class="highlighter-rouge">raster</code> package). To enable this functionality, first install the
<code class="highlighter-rouge">doParallel</code> package if it is not already installed, and load it:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">require</span><span class="p">(</span><span class="n">doParallel</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">install.packages</span><span class="p">(</span><span class="s1">'doParallel'</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">doParallel</span><span class="p">)</span><span class="w">
</span><span class="p">}</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Loading required package: doParallel</code></pre></figure>
<p>Now, just call <code class="highlighter-rouge">registerDoParallel()</code>, and by default any calculations that are
coded to run in parallel will use half of the available CPUs on your machine.
You can also specify the number of CPUs to use by running, for example,
<code class="highlighter-rouge">registerDoParallel(2)</code> to use two CPUs. The <code class="highlighter-rouge">get_pixels</code>, <code class="highlighter-rouge">train_classifier</code>,
and <code class="highlighter-rouge">classify</code> functions in <code class="highlighter-rouge">teamlucc</code> all support parallel computation, and
will run in parallel automatically if you have called <code class="highlighter-rouge">registerDoParallel</code>.
Below is the same classification problem we just ran, but this
time run in parallel:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">doParallel</span><span class="p">)</span><span class="w">
</span><span class="n">registerDoParallel</span><span class="p">(</span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="n">set.seed</span><span class="p">(</span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="c1"># Set a random seed so results match what we got earlier</span><span class="w">
</span><span class="n">train_data_par</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">get_pixels</span><span class="p">(</span><span class="n">L</span><span class="m">5</span><span class="n">TSR_1986</span><span class="p">,</span><span class="w"> </span><span class="n">train_polys</span><span class="p">,</span><span class="w"> </span><span class="n">class_col</span><span class="o">=</span><span class="s2">"class_1986"</span><span class="p">,</span><span class="w">
</span><span class="n">training</span><span class="o">=</span><span class="m">.6</span><span class="p">)</span><span class="w">
</span><span class="n">clfr_par</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">train_classifier</span><span class="p">(</span><span class="n">train_data_par</span><span class="p">)</span><span class="w">
</span><span class="n">cls_par</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">classify</span><span class="p">(</span><span class="n">L</span><span class="m">5</span><span class="n">TSR_1986</span><span class="p">,</span><span class="w"> </span><span class="n">clfr_par</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning in .local(x, ...): min value not known, use setMinMax</code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning in .local(x, ...): min value not known, use setMinMax</code></pre></figure>
<h2 id="accuracy-assessment">Accuracy assessment</h2>
<p>Conducting a thorough accuracy assessment is one of the most important
components of image classification. The <code class="highlighter-rouge">teamlucc</code> package includes an
<code class="highlighter-rouge">accuracy</code> function to assist with measuring the accuracy of image
classifications. In addition to the standard contingency tables often used for
describing accuracy, <code class="highlighter-rouge">accuracy</code> also calculates “quantity disagreement” and
“allocation disagreement” as introduced by Pontius and Millones 2011<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>.
Unbiased contingency tables can be calculated by supplying a
<code class="highlighter-rouge">pop</code> parameter to <code class="highlighter-rouge">accuracy</code>, which also provides 95% confidence intervals for
user’s, producer’s, and overall accuracies, calculated as in Olofsson et al.
2013<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup>.</p>
<p>To calculate a basic contingency table, assuming that population frequencies of
the observed classes can be estimated from the classification output, and using
the 40% of pixels that were excluded from training the classifier as testing
data, run the <code class="highlighter-rouge">accuracy</code> function using the model calculated above:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">acc</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">accuracy</span><span class="p">(</span><span class="n">clfr</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Warning in calc_accuracy(predicted, observed, pop, reclass_mat): pop was
## not provided - assuming sample frequencies equal population frequencies</code></pre></figure>
<p>Note the warning from <code class="highlighter-rouge">accuracy</code>, which is reminding us that we did not provide
population frequencies for the classes.</p>
<p>A<code class="highlighter-rouge">summary</code> method for the <code class="highlighter-rouge">accuracy</code> object is provided by <code class="highlighter-rouge">teamlucc</code>, and
calculates user’s, producers, and overall accuracy, and quantity and allocation
disagreement:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">summary</span><span class="p">(</span><span class="n">acc</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Object of class "accuracy"
##
## Testing samples: 48
##
## Sample contingency table:
## observed
## predicted Forest NonForest Total Users
## Forest 16.0000 3.0000 19.0000 0.8421
## NonForest 4.0000 25.0000 29.0000 0.8621
## Total 20.0000 28.0000 48.0000
## Producers 0.8000 0.8929 0.8542
##
## Population contingency table:
## observed
## predicted Forest NonForest Total Users
## Forest 0.3333 0.0625 0.3958 0.8421
## NonForest 0.0833 0.5208 0.6042 0.8621
## Total 0.4167 0.5833 1.0000
## Producers 0.8000 0.8929 0.8542
##
## Overall accuracy: 0.8542
##
## Quantity disagreement: 0.0208
## Allocation disagreement: 0.125</code></pre></figure>
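<p>The figures in the summary above can be reproduced by hand from the sample contingency table. Below is a minimal Python sketch of the Pontius and Millones measures (illustrative only, not teamlucc’s implementation):</p>

```python
# Quantity and allocation disagreement (Pontius and Millones 2011),
# computed from the sample contingency table shown above.
table = [[16, 3],   # predicted Forest:    observed Forest, NonForest
         [4, 25]]   # predicted NonForest: observed Forest, NonForest
n = sum(sum(row) for row in table)
p = [[v / n for v in row] for row in table]  # joint proportions

row_tot = [sum(row) for row in p]                             # predicted totals
col_tot = [sum(p[i][j] for i in range(2)) for j in range(2)]  # observed totals

overall = sum(p[j][j] for j in range(2))
# Quantity disagreement: mismatch between predicted and observed proportions
quantity = sum(abs(row_tot[j] - col_tot[j]) for j in range(2)) / 2
# Allocation disagreement: errors that are swapped in space
allocation = sum(2 * min(row_tot[j] - p[j][j], col_tot[j] - p[j][j])
                 for j in range(2)) / 2

print(round(overall, 4), round(quantity, 4), round(allocation, 4))
# prints: 0.8542 0.0208 0.125
```

The three printed values match the overall accuracy, quantity disagreement, and allocation disagreement in the <code class="highlighter-rouge">summary</code> output above.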
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Pontius, R. G., and M. Millones. 2011. Death to Kappa: birth of quantity
disagreement and allocation disagreement for accuracy assessment.
International Journal of Remote Sensing 32:4407-4429. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>Olofsson, P., G. M. Foody, S. V. Stehman, and C. E. Woodcock. 2013.
Making better use of accuracy data in land change studies: Estimating
accuracy and area and quantifying uncertainty using stratified estimation.
Remote Sensing of Environment 129:122-131. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<p><a href="http://www.azvoleff.com/articles/image-classification-with-teamlucc/">Classifying an image with teamlucc</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on March 19, 2014.</p>http://www.azvoleff.com/articles/glcm-0-2-released2014-02-17T00:00:00+00:002014-02-17T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<p>I just released <a href="http://cran.r-project.org/web/packages/glcm">a
new version of the “glcm” R package</a> to CRAN for calculating image texture measures
from grey-level co-occurrence matrices (GLCMs). Type</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">install.packages</span><span class="p">(</span><span class="s2">"glcm"</span><span class="p">)</span></code></pre></figure>
<p>at your R command prompt to download the latest CRAN release. To install the
latest development version from github using
<a href="http://cran.r-project.org/web/packages/devtools/index.html">devtools</a> type:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">devtools</span><span class="p">)</span><span class="w">
</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"glcm"</span><span class="p">,</span><span class="w"> </span><span class="n">user</span><span class="o">=</span><span class="s2">"azvoleff"</span><span class="p">)</span></code></pre></figure>
<p>For more information on the development version, see the <a href="http://github.com/azvoleff/glcm">github project
page</a> for glcm.</p>
<p><a href="http://www.azvoleff.com/articles/glcm-0-2-released/">glcm 0.2 released</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on February 17, 2014.</p>http://www.azvoleff.com/articles/calculating-image-textures-with-glcm2014-03-19T00:00:00-00:002014-02-17T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<p><code class="highlighter-rouge">glcm</code> can calculate image textures from either a matrix or a <code class="highlighter-rouge">Raster*</code> object
from the <code class="highlighter-rouge">raster</code> package. First install the package if it is not yet
installed:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="n">require</span><span class="p">(</span><span class="n">glcm</span><span class="p">)))</span><span class="w"> </span><span class="n">install.packages</span><span class="p">(</span><span class="s2">"glcm"</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Loading required package: glcm</code></pre></figure>
<p>The examples below use an image included in the <code class="highlighter-rouge">glcm</code> package: a
red/green/blue cutout of a 1986 Landsat 5 image from a Tropical Ecology
Assessment and Monitoring (TEAM) Network site in Volcan Barva, Costa Rica. The
image is available as <code class="highlighter-rouge">L5TSR_1986</code>:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">raster</span><span class="p">)</span><span class="w"> </span><span class="c1"># needed for plotRGB function</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## Loading required package: sp</code></pre></figure>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plotRGB</span><span class="p">(</span><span class="n">L</span><span class="m">5</span><span class="n">TSR_1986</span><span class="p">,</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">stretch</span><span class="o">=</span><span class="s1">'lin'</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-02-17-calculating-image-textures-with-glcm/L5TSR_1986_plot-1.png" title="1986 Landsat 5 image from Volcan Barva" alt="1986 Landsat 5 image from Volcan Barva" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<p>To calculate GLCM textures from this image using the default settings, type:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">textures</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">glcm</span><span class="p">(</span><span class="n">raster</span><span class="p">(</span><span class="n">L</span><span class="m">5</span><span class="n">TSR_1986</span><span class="p">,</span><span class="w"> </span><span class="n">layer</span><span class="o">=</span><span class="m">3</span><span class="p">))</span></code></pre></figure>
<p>where <code class="highlighter-rouge">raster(L5TSR_1986, layer=3)</code> selects the third (red) layer. To see the
textures that have been calculated by default, type:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="nf">names</span><span class="p">(</span><span class="n">textures</span><span class="p">)</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-text" data-lang="text">## [1] "glcm_mean" "glcm_variance" "glcm_homogeneity"
## [4] "glcm_contrast" "glcm_dissimilarity" "glcm_entropy"
## [7] "glcm_second_moment" "glcm_correlation"</code></pre></figure>
<p>This shows the eight GLCM texture statistics that have been calculated by
default. These can all be visualized in R:</p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">textures</span><span class="o">$</span><span class="n">glcm_mean</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-02-17-calculating-image-textures-with-glcm/mean-1.png" title="mean of GLCM texture" alt="mean of GLCM texture" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">textures</span><span class="o">$</span><span class="n">glcm_variance</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-02-17-calculating-image-textures-with-glcm/variance-1.png" title="variance of GLCM texture" alt="variance of GLCM texture" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">textures</span><span class="o">$</span><span class="n">glcm_homogeneity</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-02-17-calculating-image-textures-with-glcm/homogeneity-1.png" title="homogeneity of GLCM texture" alt="homogeneity of GLCM texture" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">textures</span><span class="o">$</span><span class="n">glcm_contrast</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-02-17-calculating-image-textures-with-glcm/contrast-1.png" title="contrast of GLCM texture" alt="contrast of GLCM texture" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">textures</span><span class="o">$</span><span class="n">glcm_dissimilarity</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-02-17-calculating-image-textures-with-glcm/dissimilarity-1.png" title="dissimilarity of GLCM texture" alt="dissimilarity of GLCM texture" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">textures</span><span class="o">$</span><span class="n">glcm_entropy</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-02-17-calculating-image-textures-with-glcm/entropy-1.png" title="entropy of GLCM texture" alt="entropy of GLCM texture" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">textures</span><span class="o">$</span><span class="n">glcm_second_moment</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-02-17-calculating-image-textures-with-glcm/second_moment-1.png" title="second moment of GLCM texture" alt="second moment of GLCM texture" style="display:block;margin-left:auto;margin-right:auto;" /></p>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">textures</span><span class="o">$</span><span class="n">glcm_correlation</span><span class="p">)</span></code></pre></figure>
<p><img src="/content/2014-02-17-calculating-image-textures-with-glcm/correlation-1.png" title="correlation of GLCM texture" alt="correlation of GLCM texture" style="display:block;margin-left:auto;margin-right:auto;" /></p>
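<p>Each of the statistics plotted above is a summary of the grey-level co-occurrence matrix itself. As a rough illustration of what that matrix is (a toy NumPy sketch, not how the glcm package computes textures internally), here is a GLCM for a tiny quantized image using a single horizontal shift, and the contrast statistic derived from it:</p>

```python
import numpy as np

# A tiny 4x4 image already quantized to 3 grey levels (0..2)
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 2, 2]])

levels = 3
glcm = np.zeros((levels, levels))
# Tally co-occurrences for a single horizontal shift: each pixel is
# paired with its right-hand neighbour
for i in range(img.shape[0]):
    for j in range(img.shape[1] - 1):
        glcm[img[i, j], img[i, j + 1]] += 1

glcm /= glcm.sum()  # normalize counts to joint probabilities

# Contrast weights each cell by the squared grey-level difference
idx = np.arange(levels)
contrast = (glcm * (idx[:, None] - idx[None, :]) ** 2).sum()
print(round(contrast, 4))
# prints: 0.5
```

In practice the same tallying is repeated in a moving window around every pixel, which is why the texture outputs above are themselves rasters.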
<p><a href="http://www.azvoleff.com/articles/calculating-image-textures-with-glcm/">Calculating image textures with GLCM</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on February 17, 2014.</p>http://www.azvoleff.com/articles/wrspathrow-0-1-released2014-02-13T00:00:00+00:002014-02-13T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<p>A new R package for working with path and row numbers from the World Reference
System (WRS) grids (both WRS-1 and WRS-2) is <a href="http://cran.r-project.org/web/packages/wrspathrow">now available on
CRAN</a>.</p>
<p>For more information see the <a href="http://github.com/azvoleff/wrspathrow">github project
page</a> for the <code class="highlighter-rouge">wrspathrow</code> package, or
see <a href="https://stat.ethz.ch/pipermail/r-sig-geo/2014-February/020403.html">this
post</a>.</p>
<p><code class="highlighter-rouge">wrspathrow</code> includes functions for determining the path and row number(s) needed
to cover a given spatial object, or, conversely, for returning the polygon for
a given path and row. Note that installation of the <code class="highlighter-rouge">wrspathrow</code> package may take
a bit of time due to the need to download the
<a href="http://cran.r-project.org/web/packages/wrspathrowData">wrspathrowData</a> package
it depends on. The <code class="highlighter-rouge">wrspathrowData</code> package is approximately 26MB in
size, as it includes the full WRS-1 and WRS-2 vector grids in R format. My
thanks to the USGS for allowing the re-release of these reformatted datafiles
on CRAN. See the <a href="http://landsat.usgs.gov/tools_wrs-2_shapefile.php">USGS WRS-1 and WRS-2 shapefile download
page</a> for the original
source of these files.</p>
<p><a href="http://www.azvoleff.com/articles/wrspathrow-0-1-released/">wrspathrow 0.1 released</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on February 13, 2014.</p>http://www.azvoleff.com/articles/running-abms-in-the-cloud-with-amazon-ec22013-04-10T00:00:00+00:002013-04-10T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<p>PyABM can be installed on an Amazon Elastic Compute Cloud (EC2) instance to
allow you to run agent-based models (ABMs) in the cloud. If you are new to
Amazon EC2, see the <a href="http://aws.amazon.com/ec2/">EC2 overview</a>
before you get started. Amazon also has a special page on <a href="http://aws.amazon.com/hpc-applications/">high performance
computing (HPC) with EC2</a>. You will
also probably want to look at the available <a href="http://aws.amazon.com/ec2/instance-types/">Amazon EC2 instance
types</a>, and of course the
<a href="http://aws.amazon.com/ec2/pricing/">pricing</a>
information before you get started.</p>
<p>A basic cluster with a manager node and two worker nodes will run you about
$1.50 - $2.00 per hour, depending on the options you choose. You can vary the
number of CPU cores and the memory in your worker nodes depending on the needs
of your modeling. In general my models are CPU limited (not requiring large
amounts of memory) so I will create a small cluster of three Amazon EC2
instances, with one “Large Standard On-Demand” (m1.large) instance to manage
the cluster, and two “Extra Large High-CPU On-Demand” (c1.xlarge) instances as
worker nodes. This configuration gives me a total of 16 processor cores to work
with (8 per worker node), so I can run 16 model runs at the same time.</p>
<p>The cost to run this cluster is $1.58 per hour at current Amazon EC2 pricing.
Note that pricing varies depending on the region you choose to place your
clusters in - the cheapest region is currently in northern Virginia in the
United States.</p>
<p>The easiest way I have found to get Amazon EC2 clusters up and running is using
<a href="http://star.mit.edu/cluster/">StarCluster</a> - a python program that makes
setting up, running, and managing Amazon EC2 clusters much easier.</p>
<p><a href="http://www.azvoleff.com/articles/running-abms-in-the-cloud-with-amazon-ec2/">Running ABMs in the Cloud with Amazon EC2</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on April 10, 2013.</p>http://www.azvoleff.com/articles/chitwan-abm-v1-5-released2013-02-23T00:00:00+00:002013-02-23T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<p>The latest release of Chitwan ABM (version 1.5) is now available at the
<a href="http://pypi.python.org/pypi/chitwanabm/1.5">Python Package Index</a>.</p>
<p><a href="http://www.azvoleff.com/articles/chitwan-abm-v1-5-released/">ChitwanABM 1.5 Released</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on February 23, 2013.</p>http://www.azvoleff.com/articles/pyabm-0-3-3-released2013-02-01T00:00:00+00:002013-02-01T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<p>The latest release of PyABM (version 0.3.3) is now available at the
<a href="http://pypi.python.org/pypi/pyabm/0.3.3">Python Package Index</a>.</p>
<p><a href="http://www.azvoleff.com/articles/pyabm-0-3-3-released/">PyABM 0.3.3 released</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on February 01, 2013.</p>http://www.azvoleff.com/articles/modifying-the-pyabm-source-code2014-03-19T00:00:00-00:002012-11-27T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<h2 id="modifying-the-latest-release-of-pyabm">Modifying the latest release of PyABM</h2>
<p>If you plan on making any changes to the PyABM source code, you can use a
<a href="http://www.pip-installer.org">pip</a> <a href="http://www.pip-installer.org/en/latest/reference/pip_install.html?highlight=editable%20install#editable-installs">“editable
install”</a> to
install PyABM in your local user folder so that you can edit the source code in
place without having to rebuild and reinstall the PyABM package every time you
make a change. To use this feature, first install pip, then open a command
window and type:</p>
<figure class="highlight"><pre><code class="language-sh" data-lang="sh"><span class="nb">sudo </span>pip install <span class="nt">-e</span> pyabm</code></pre></figure>
<p>in a command prompt in Linux, or</p>
<figure class="highlight"><pre><code class="language-sh" data-lang="sh"> pip install <span class="nt">-e</span> pyabm</code></pre></figure>
<p>on Windows. This will install the latest release of PyABM as an editable
install so that you can <code class="highlighter-rouge">import pyabm</code> from any Python session or script, and
have the module imported from your own copy of the source code, which will
be in a folder named something like <code class="highlighter-rouge">C:\users\azvoleff\src\pyabm</code> on Windows or
<code class="highlighter-rouge">/home/azvoleff/src/pyabm</code> on Linux.</p>
<h2 id="modifying-the-development-version-of-pyabm">Modifying the development version of PyABM</h2>
<p>There are several ways to end up with an editable version of the current
development version of PyABM. If you have git installed on your system you can
use a pip editable install to download the latest version of PyABM from github
for you, and install it in editable mode. If you do not have git, you can use
<a href="http://packages.python.org/distribute/">distribute</a>, which offers a
<a href="http://packages.python.org/distribute/setuptools.html#development-mode">“development
mode”</a>.
With both approaches, the end result is having the development version of PyABM
installed in such a way that any changes you make to the code take effect
immediately.</p>
<p>If you have git installed on your system, you can use pip to clone and install
the development version of PyABM from github by typing:</p>
<figure class="highlight"><pre><code class="language-sh" data-lang="sh"> <span class="nb">sudo </span>pip install <span class="nt">-e</span> git+https://github.com/azvoleff/pyabm.git#egg<span class="o">=</span>pyabm</code></pre></figure>
<p>in a command prompt in Linux, or</p>
<figure class="highlight"><pre><code class="language-sh" data-lang="sh"> pip install <span class="nt">-e</span> git+https://github.com/azvoleff/pyabm.git#egg<span class="o">=</span>pyabm</code></pre></figure>
<p>on Windows. An advantage of cloning PyABM from github is that you can easily
update your copy of PyABM to include the latest changes by navigating to the
main PyABM folder and typing</p>
<figure class="highlight"><pre><code class="language-sh" data-lang="sh"> git pull</code></pre></figure>
<p>to pull the latest version of the PyABM source code from git and (depending on
your settings in git) merge any upstream changes with your local edits.</p>
<p>If you do not have git installed, download the latest development source of
PyABM as a <a href="https://github.com/azvoleff/pyabm/archive/master.zip">zip file</a>.
After downloading the PyABM source code, navigate to the main PyABM folder (the
one with setup.py) and type:</p>
<figure class="highlight"><pre><code class="language-sh" data-lang="sh"> python setup.py develop</code></pre></figure>
<p>This will install a development version of PyABM and set up your system so that
you can <code class="highlighter-rouge">import pyabm</code> from your Python interpreter or from your Python scripts.</p>
<p><a href="http://www.azvoleff.com/articles/modifying-the-pyabm-source-code/">Modifying the PyABM source code</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on November 27, 2012.</p>http://www.azvoleff.com/articles/pyabm-0-3-2-released2012-11-20T00:00:00+00:002012-11-20T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<p>The latest release of PyABM (version 0.3.2) is now available at the
<a href="http://pypi.python.org/pypi/pyabm/0.3.2">Python Package Index</a>.</p>
<p><a href="http://www.azvoleff.com/articles/pyabm-0-3-2-released/">PyABM 0.3.2 released</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on November 20, 2012.</p>http://www.azvoleff.com/articles/pyabm-logging-setup2012-11-19T00:00:00+00:002012-11-19T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<p><a href="/pyabm">PyABM</a> uses the python
<a href="http://docs.python.org/2/library/logging.html">logging</a> module, so that
warning and informational messages from the PyABM module can be written to the
console, and also saved to files along with model output. For flexibility, and
consistent with recommended usage of the logging module, configuration of the
logging output is left up to the user of PyABM. If you <code class="highlighter-rouge">import pyabm</code> directly
from a python session without setting up logging first, you will see a warning
message when PyABM tries to log a message:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">In</span> <span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="kn">import</span> <span class="nn">pyabm</span>
<span class="n">No</span> <span class="n">handlers</span> <span class="n">could</span> <span class="n">be</span> <span class="n">found</span> <span class="k">for</span> <span class="n">logger</span> <span class="s">"pyabm.rcsetup"</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">2</span><span class="p">]:</span></code></pre></figure>
<p>For this example, I have forced PyABM to try to print an error message, by
specifying the wrong path to git in my pyabmrc. To see the error message, I
will need to <code class="highlighter-rouge">import logging</code> and configure a logging handler prior to
importing PyABM:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">In</span> <span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="kn">import</span> <span class="nn">logging</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">()</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="kn">import</span> <span class="nn">pyabm</span>
<span class="n">WARNING</span><span class="p">:</span><span class="n">pyabm</span><span class="o">.</span><span class="n">rcsetup</span><span class="p">:</span><span class="n">Failure</span> <span class="k">while</span> <span class="n">reading</span> <span class="n">rc</span> <span class="n">parameter</span> <span class="n">path</span><span class="o">.</span><span class="n">git_binary</span> <span class="n">on</span>
<span class="n">line</span> <span class="mi">42</span> <span class="ow">in</span> <span class="o">/</span><span class="n">home</span><span class="o">/</span><span class="n">azvoleff</span><span class="o">/</span><span class="n">pyabmrc</span><span class="p">:</span> <span class="o">/</span><span class="n">wrong</span><span class="o">/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span><span class="n">git</span> <span class="n">does</span> <span class="ow">not</span> <span class="n">exist</span><span class="o">.</span> <span class="n">Reverting</span>
<span class="n">to</span> <span class="n">default</span> <span class="n">parameter</span> <span class="n">value</span><span class="o">.</span>
<span class="n">WARNING</span><span class="p">:</span><span class="n">pyabm</span><span class="o">.</span><span class="n">rcsetup</span><span class="p">:</span><span class="n">git</span> <span class="n">version</span> <span class="n">control</span> <span class="n">features</span> <span class="n">disabled</span><span class="o">.</span> <span class="n">Specify</span> <span class="n">valid</span> <span class="n">git</span>
<span class="n">binary</span> <span class="n">path</span> <span class="ow">in</span> <span class="n">your</span> <span class="n">pyabmrc</span> <span class="n">to</span> <span class="n">enable</span><span class="o">.</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">4</span><span class="p">]:</span></code></pre></figure>
<p>I now see a warning telling me that <code class="highlighter-rouge">/wrong/path/to/git</code> does not exist, and
PyABM tries the default path to git specified in rcparams.default. This path
also does not exist (as I am running this example on Linux and PyABM is set up
for Windows by default), so I then see another warning ‘git version control
features disabled’ as PyABM is not able to find git. If I fix my pyabmrc file
to give the correct path to git (which, on my system, is <code class="highlighter-rouge">/usr/bin/git</code>) I can
<code class="highlighter-rouge">import pyabm</code> without seeing any error messages:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">In</span> <span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="kn">import</span> <span class="nn">logging</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">()</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="kn">import</span> <span class="nn">pyabm</span>
<span class="n">In</span> <span class="p">[</span><span class="mi">4</span><span class="p">]:</span></code></pre></figure>
<p><a href="http://www.azvoleff.com/articles/pyabm-logging-setup/">PyABM logging setup</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on November 19, 2012.</p>http://www.azvoleff.com/articles/pyabmrc-configuration2012-11-18T00:00:00+00:002012-11-18T00:00:00+00:00Alex Zvoleffhttp://www.azvoleff.comazvoleff@conservation.org<p><a href="/pyabm">PyABM</a> configuration is done using a pyabmrc text file. When loaded in
Python, using:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">pyabm</span></code></pre></figure>
<p>PyABM will search for a pyabmrc file. PyABM will search three locations, in
order:</p>
<ul>
<li>the current working directory</li>
<li>the current user’s home directory</li>
<li>the pyabm module directory</li>
</ul>
<p>PyABM will use the first pyabmrc file it finds, ignoring any others. Example
pyabmrc files are provided with PyABM versions < 0.3.1, in
<a href="https://raw.github.com/azvoleff/pyabm/master/pyabm/pyabmrc.windows">pyabmrc.windows</a>
and
<a href="https://raw.github.com/azvoleff/pyabm/master/pyabm/pyabmrc.linux">pyabmrc.linux</a>,
in the main module folder (under pyabm\pyabm in the development version). To
set custom values for any of the pyabmrc parameters, rename the appropriate file
to ‘pyabmrc’ and move it to one of the three locations above. See the
pyabmrc.default file for details on each parameter and on possible parameter
values. Changes can also be made in the
<a href="https://raw.github.com/azvoleff/pyabm/master/pyabm/rcparams.default">rcparams.defaults</a>
file in the PyABM module directory, but this is not recommended as these values
will be overwritten when PyABM is upgraded.</p>
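<p>The lookup amounts to a “first match wins” search, which can be sketched in a few lines of Python (illustrative only, not PyABM’s actual code; the module directory is passed in by the caller):</p>

```python
import os

def find_pyabmrc(module_dir):
    """Return the path of the first pyabmrc found, or None.

    Mirrors the search order described above: current working
    directory, then the user's home directory, then the module folder.
    """
    search_dirs = [os.getcwd(), os.path.expanduser('~'), module_dir]
    for folder in search_dirs:
        rc_file = os.path.join(folder, 'pyabmrc')
        if os.path.isfile(rc_file):
            return rc_file  # first match wins; later files are ignored
    return None  # fall back to the packaged default parameters
```

A consequence of this ordering is that a pyabmrc in your working directory silently overrides one in your home directory, which is worth remembering when a parameter change seems to have no effect.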
<p><a href="http://www.azvoleff.com/articles/pyabmrc-configuration/">PyABM configuration using 'pyabmrc' files</a> was originally published by Alex Zvoleff at <a href="http://www.azvoleff.com">Alex Zvoleff</a> on November 18, 2012.</p>