remote_sensing_methods:classification_and_regression_tree_cart

remote_sensing_methods:classification_and_regression_tree_cart [2012/03/08 16:09]
jgillan
remote_sensing_methods:classification_and_regression_tree_cart [2012/07/23 14:12]
calbury
Line 15: Line 15:
Object-based classification is a general term for image classification applied to groups of pixels, or “objects” (also known as image segments). These objects represent features on the ground, such as uniform stands of vegetation, water bodies, or meadows. Objects are created through a process known as segmentation, which is performed by specialized software such as Trimble’s eCognition.
  
Like other types of image classification, object-based classification divides an image into a set of discrete classes such as vegetation types. The method suits classification hierarchies ranging from simple to complex, including those with vegetation types that are difficult to distinguish from one another.
  
There are a variety of methods for classifying objects, some more sophisticated than others. In general, RSAC prefers classification and regression tree (CART)-type algorithms because they are robust, relatively easy to use, and reliably produce good results. This article therefore focuses on CART-based methods. For information on other object-based classification methods, please refer to the Additional Information section below.
  
CART analysis builds models called decision trees (so called because of their tree-like structure) from training data. Decision trees can be used for classification (predicting which group a case belongs to) and for regression (predicting a continuous value). Decision trees are valuable not only for their classification potential; they also provide insight into the relationships between dependent and independent variables. CART analysis can be performed by a variety of software packages; nevertheless, these packages all require some customization to be properly integrated into the vegetation mapping process.
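
As a rough illustration of how a classification tree chooses a branch, the sketch below searches a single predictor for the threshold that minimizes weighted Gini impurity. The function names (`gini`, `best_split`) and the NDVI sample values are hypothetical and are not taken from See5, Cubist, or R; actual CART software repeats this kind of search across all predictors and recursively within each branch, and may use a different impurity measure.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best binary split on one predictor."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split found yet: impurity of the unsplit node
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        threshold = (lo + hi) / 2.0
        left = [lab for val, lab in pairs if val <= threshold]
        right = [lab for val, lab in pairs if val > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

# Hypothetical training segments: mean NDVI and a field-assigned class.
ndvi = [0.12, 0.18, 0.25, 0.61, 0.70, 0.78]
classes = ["meadow", "meadow", "meadow", "forest", "forest", "forest"]
threshold, impurity = best_split(ndvi, classes)
```

In this toy data set the two classes are fully separable, so the chosen threshold falls midway between the highest meadow value and the lowest forest value.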
  
Once decision trees are generated, the user will usually do some pruning, because trees often over-fit the training data and create an excessive number of nodes and branches. The user can usually either prune the decision trees interactively or experiment with the parameters that control the thresholds for new tree branches. Once pruning is complete, the decision tree should be tested. Ideally, the tree (and its rules) should be tested on an independent data set or on data withheld from the original data set. Once an acceptable level of accuracy has been achieved, the decision tree rules (or model) can be applied.
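
The testing step can be sketched as follows: withhold part of the labeled data, then measure how often the tree's rules agree with the withheld labels. Everything here is an assumption made for illustration, including the 30 percent withheld fraction, the helper names `holdout_split` and `accuracy`, and the single-rule `pruned_tree` standing in for a real pruned decision tree.

```python
import random

def holdout_split(samples, test_fraction=0.3, seed=0):
    """Shuffle labeled samples and withhold a fraction for independent testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]  # (training, withheld)

def accuracy(predict, samples):
    """Fraction of (features, label) samples whose label the rule predicts correctly."""
    if not samples:
        raise ValueError("no samples to evaluate")
    correct = sum(1 for features, label in samples if predict(features) == label)
    return correct / len(samples)

# Hypothetical pruned tree reduced to a single rule on mean NDVI.
def pruned_tree(features):
    return "forest" if features["ndvi"] > 0.43 else "meadow"

# Hypothetical labeled segments; two of them contradict the rule on purpose.
samples = [({"ndvi": v}, c) for v, c in [
    (0.12, "meadow"), (0.18, "meadow"), (0.25, "meadow"), (0.50, "meadow"),
    (0.61, "forest"), (0.70, "forest"), (0.78, "forest"), (0.40, "forest"),
    (0.22, "meadow"), (0.66, "forest"),
]]
training, withheld = holdout_split(samples)
withheld_accuracy = accuracy(pruned_tree, withheld)
```

Only the `training` portion would be used to build and prune the tree; `withheld_accuracy` is then an estimate of how the rules perform on data the tree has not seen.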
  
CART analysis algorithms are not generally included in image processing software packages. However, several software packages are available commercially. RSAC has had success with the See5 and Cubist software packages. In addition, RSAC frequently uses the random forests classifier in the free statistical package R. Random forests is a CART analysis tool, but instead of using a single decision tree, it creates many decision trees, each derived from a random subset of the training data. The prediction from each decision tree is tabulated, and the majority prediction is used for the classification. This approach greatly reduces over-fitting and generally removes the need for pruning, simplifying the analysis process.
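
The voting idea can be sketched in a few lines. This is a simplified illustration, not the R implementation: each "tree" is a one-split stump on a single predictor, and each random subset is assumed to be a bootstrap sample drawn with replacement. Full random forests also randomize which predictors are considered at each split. All names (`fit_stump`, `fit_forest`, `predict_forest`) and the training values are hypothetical.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Fit a one-split tree: pick the threshold with the fewest training errors."""
    values = sorted({value for value, _ in samples})
    majority = Counter(label for _, label in samples).most_common(1)[0][0]
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = Counter(label for value, label in samples if value <= threshold)
        right = Counter(label for value, label in samples if value > threshold)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0]
        errors = len(samples) - left[left_label] - right[right_label]
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # sample held a single distinct value: always predict the majority
        return lambda value: majority
    _, threshold, left_label, right_label = best
    return lambda value: left_label if value <= threshold else right_label

def fit_forest(samples, n_trees=25, seed=0):
    """Fit each tree to a bootstrap sample (drawn with replacement) of the training data."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(samples) for _ in samples]) for _ in range(n_trees)]

def predict_forest(forest, value):
    """Tabulate the prediction of every tree and return the majority class."""
    votes = Counter(tree(value) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical training segments: (mean NDVI, class).
training = [(0.12, "meadow"), (0.18, "meadow"), (0.25, "meadow"),
            (0.61, "forest"), (0.70, "forest"), (0.78, "forest")]
forest = fit_forest(training)
```

Calling `predict_forest(forest, value)` for a new segment returns the class that most of the trees voted for; because each tree saw a slightly different sample, the quirks of any single tree tend to be outvoted.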
  
CART classification is not limited to object-based analysis. CART analysis can also be applied to individual pixels and to continuous data such as slope (see the [[remote_sensing_methods:RSAC Riparian Mapping Tool|RSAC Riparian Mapping Tool]]).
Line 32: Line 32:
  * [[remote_sensing_methods:rsac_riparian_mapping_tool|RSAC Riparian Mapping Tool]]
===== Data Inputs =====
CART classification can be performed on any digital image. It is frequently applied to satellite or aerial imagery, vegetation indexes (e.g., normalized difference vegetation index [NDVI]) derived from such imagery, and a variety of ancillary data such as topographic or climatic data. A high-quality, comprehensive training data set is also required.
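
For reference, NDVI is computed per pixel from the near-infrared (NIR) and red bands as (NIR - Red) / (NIR + Red). The short sketch below applies that formula to a few hypothetical reflectance values; returning 0 when both bands are zero is an assumption made here to avoid division by zero, not a convention stated in this article.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat pixels with no signal as 0 rather than dividing by zero
    return (nir - red) / total

# Hypothetical reflectance values for three pixels, given as (NIR, red).
pixels = [(0.50, 0.10), (0.30, 0.30), (0.05, 0.20)]
ndvi_values = [ndvi(nir, red) for nir, red in pixels]
```

Values range from -1 to 1, with dense green vegetation toward the high end.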
===== Method Products =====
This method produces a new layer with objects assigned to unique vegetation classes.
Line 43: Line 43:
  - Classification check and manual editing.
==== Segmentation ====
Segmentation is the process of dividing images into spectrally and spatially cohesive objects that are representative of features on the ground. These objects can be vegetation patches of similar physiognomy, structure, and floristics, or other uniform features such as lakes and roads. Segmentation is performed by a specialized image-processing package. Several segmentation packages are available; click [[http://www.ioer.de/segmentation-evaluation/results.html|here]] for an extensive list. RSAC evaluated many of these packages, and Trimble’s eCognition 8.64 performed best. However, Berkeley Image Segmentation also yielded reasonably good results at a significantly lower price. Currently, eCognition is available to all Forest Service regional offices.
  
Using eCognition, the user can influence the size and shape of the segments by adjusting the software’s scale, color, and shape settings to obtain segments that match ground features. The settings that produce good segments vary from image to image, so some experimentation is required to find suitable settings.
Line 49: Line 49:
{{remote_sensing_methods:CART_segmentation.jpg?600x323}}
  
Figure 1: Example segmentation in two areas with (A) higher vegetation density and (B) lower vegetation density (from Hamilton and others, 2007).
==== Calculation of Zonal Statistics ====
Object-based classification can take advantage of a variety of data beyond remote sensing imagery, such as elevation data in the form of a digital elevation model (DEM). The input data must be summarized for each segment by computing zonal statistics, typically the mean. For example, a representative elevation for each segment is calculated by averaging the values of all of the pixels that the segment covers. Object-based classification can also incorporate thematic data such as ecotype. If a segment covers more than one thematic class, the dominant class is assigned to the segment. The CART classification is performed on the zonal statistics from each segment.
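
The two summaries described above, a per-segment mean for continuous data and a dominant class for thematic data, can be sketched as follows. This is a minimal illustration that assumes the segment layer and each input layer have been flattened into lists in the same pixel order; the function names and sample values are hypothetical, and production workflows would normally use the zonal statistics tools in GIS or image-processing software.

```python
from collections import Counter, defaultdict

def zonal_mean(segment_ids, values):
    """Mean of a continuous layer (e.g., elevation) over the pixels of each segment."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seg, value in zip(segment_ids, values):
        sums[seg] += value
        counts[seg] += 1
    return {seg: sums[seg] / counts[seg] for seg in sums}

def zonal_majority(segment_ids, classes):
    """Dominant thematic class (e.g., ecotype) among the pixels of each segment."""
    tallies = defaultdict(Counter)
    for seg, cls in zip(segment_ids, classes):
        tallies[seg][cls] += 1
    return {seg: tally.most_common(1)[0][0] for seg, tally in tallies.items()}

# Hypothetical flattened rasters: one entry per pixel, all in the same pixel order.
segment_ids = [1, 1, 1, 2, 2, 2]
elevation_m = [1200.0, 1210.0, 1220.0, 1500.0, 1510.0, 1520.0]
ecotype = ["upland", "upland", "riparian", "alpine", "alpine", "alpine"]

mean_elevation = zonal_mean(segment_ids, elevation_m)
dominant_ecotype = zonal_majority(segment_ids, ecotype)
```

Each segment then contributes one row of predictors (here, mean elevation and dominant ecotype) to the table on which the CART classification is run.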
Line 72: Line 72:
This is a processing-intensive method that requires substantial processing power and large amounts of memory.
===== Technical References =====
Hamilton, R.; Megown, K.; Mellin, T.; Fox, I. 2007. {{remote_sensing_methods:0094-RPT1.pdf|Guide to automated stand delineation using image segmentation.}} Rep. No. RSAC-0094-RPT1. Salt Lake City, UT: U.S. Department of Agriculture, Forest Service, Remote Sensing Applications Center. 17 p.
===== Additional Information =====
More on [[remote_sensing_methods:object-based_classification|object-based classification.]]
remote_sensing_methods/classification_and_regression_tree_cart.txt · Last modified: 2012/08/27 09:46 by calbury