Sample design is an involved topic with few single right answers. The choice of techniques for designing field studies is driven by many factors, such as the overall assessment/monitoring objectives, the sampling objectives, the population being sampled, and the underlying variability. The best approach to sample design is to couple a thorough understanding of the system you're sampling with a good foundation in basic sampling concepts and the different sample design methods. This wiki page on sample design for rangeland assessment and monitoring is necessarily brief; its purpose is to introduce basic concepts and link to other resources.
Perhaps the best treatment of sample design for assessment and monitoring is from Elzinga, Salzer, and Willoughby's (2001) Measuring and Monitoring Plant Populations. Chapters 5 and 7 are particularly helpful for sample design and are the basis for much of the materials below.
The discussions below assume a basic understanding of the purpose and principles of sampling. The following sites have good (and gentle) introductions to sampling for natural resources:
Sample design methods generally refer to the technique used to select sample units for measurement (e.g., select individuals from a population or locations to sample within a study area). Before sample design methods can be considered, it is necessary to have thoroughly defined the population, study area, sampling unit, and sampling objective. All of these will have an impact on which sample design methods are suitable. Selection of a suitable sample design method ensures that the samples you invest your time and money into collecting can support the inferences you want to make. Use of a sample design method that is not appropriate can lead to samples that are biased with respect to your assessment or monitoring objectives. In this case, inference is valid only for samples/sites that were measured, and not the larger area/population.
Sample design methods are typically divided into two types: non-random and random methods. These two types and commonly applied methods within each are discussed below. It is not uncommon for the sample design for a single project to include aspects of both random and non-random selection. For example, sample site locations may be selected randomly within a study area, but the transects or plots to be sampled within each site may be located systematically. In this case, the randomization of the site locations can preserve the statistically unbiased nature of the overall sample design. However, including randomization at some point in the sample design does not by itself guarantee a good design. Selecting site locations non-randomly based on local knowledge and then randomizing the locations of plots within each site will not result in a statistically unbiased sample. Attention must be paid to where the randomization occurs relative to the distribution of the population being sampled to ensure that the overall sample design maintains the desired statistical properties.
Common sample design methods.
Non-random sampling methods select locations for sampling in one of several ways: according to regular (i.e., systematic) patterns, by targeting specific features or events, by using personal or anecdotal information, or without any specific plan. Care must be exercised when using non-random sample selection methods because the samples may not be representative of the entire population. If this is the case, then inference cannot extend beyond the set of sampling units. Some common non-random sample design techniques are discussed below. Unless otherwise stated, the primary reference for these discussions was Elzinga et al. (2001).
Systematic sampling is the selection of units for sampling or the placement of sampling locations within an area according to a regularly-repeating pattern. Examples of systematic sampling are: locating sample sites on a 1km grid within a pasture, taking measurements every meter along a transect, or orienting transects along cardinal directions. Systematic techniques are commonly used to locate sub-plot sampling sites (e.g., points, transects, frames) within a sampling site where the location of the sampling site has been selected randomly. Alternatively, larger sampling units can be selected systematically and then the location of the specific sampling unit randomly selected within the larger unit (i.e., a form of two-stage sampling or restricted random sampling - see below). This technique is often used with regional- or national-scale assessment and monitoring programs like the NRCS Natural Resource Inventory (NRI) or the USFS Forest Inventory and Analysis (FIA) programs.
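As a rough illustration of the grid example above, the following Python sketch lays out sample sites on a regular grid over a rectangular area. The function name `systematic_grid`, the rectangular extent, and the coordinate units (meters) are assumptions made for this example only; a real pasture boundary would usually be an irregular polygon handled in a GIS.

```python
import math

def systematic_grid(xmin, ymin, xmax, ymax, spacing):
    """Return (x, y) sample sites on a regular grid covering a rectangle.

    Hypothetical helper: coordinates are assumed to be in meters and the
    grid is anchored at the lower-left corner of the rectangle.
    """
    # Count grid nodes along each axis so no floating-point step accumulates.
    nx = int(math.floor((xmax - xmin) / spacing)) + 1
    ny = int(math.floor((ymax - ymin) / spacing)) + 1
    return [(xmin + i * spacing, ymin + j * spacing)
            for j in range(ny) for i in range(nx)]

# A 3 km x 2 km rectangle sampled on a 1 km grid.
sites = systematic_grid(0, 0, 3000, 2000, 1000)
print(len(sites), "sites; first:", sites[0], "last:", sites[-1])
```

Because the grid origin is fixed here, the layout contains no randomization at all; choosing the origin at random is one way such a grid is sometimes combined with random selection.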
Advantages of systematic sampling are:
Disadvantages of systematic sampling are:
Targeted or selected sampling is common in rangeland assessment and monitoring. With this method, areas are subjectively selected for sampling according to a particular objective. The subjective nature of selecting the sampling locations, however, can easily introduce bias into the results and preclude being able to assess sampling errors. For these reasons, it is not appropriate to extend inference of sampling results beyond the elements sampled to the whole population. For a random sampling method that can, in some cases, achieve the same end as targeted sampling (i.e., selection of areas representative of some specified condition), see Unequal Probability Sampling below.
The key area concept is a form of non-random targeted sampling. The idea of key areas is to select locations for sampling that are either representative of a larger area (e.g., an allotment or pasture) or are critical areas in their own right (e.g., high impact sites or locations where rare species occur). Assessment and monitoring then take place in these locations. Because statistical inferences can only be made to the key areas that are sampled, and because sampling results from different key areas cannot be averaged, objectives should be defined specific to the key areas being measured.
Targeted sampling is also common in remote-sensing applications. When creating a land-cover or vegetation-class map from remotely-sensed imagery, field observations are often needed to “train” the classification algorithm to the classes being mapped. In these cases, statistical inference of the field observations to the entire population is not an objective of the sampling. Areas are selected in a targeted manner to represent the range of variability within each class and for ease in data collection in the field. Use of a randomization method of sample design for this type of remote sensing application would be an inefficient way to get the needed data.
Advantages of targeted sampling
Disadvantages of targeted sampling
Haphazard sampling occurs when samples are collected in the field without any pre-determined method for deciding where to sample. In essence, this approach is the de facto method when no other method was used. Haphazard sampling leads to data that cannot be used to make inferences to other areas or a larger population, and this method should be avoided for assessment and monitoring. Data collected with this technique are considered anecdotal information.
Random sampling methods rely on randomization at some point in the sample design process in an attempt to achieve statistically unbiased samples. Random sampling methods are a form of design-based inference where 1) the population being measured is assumed to have fixed parameters at the time it is sampled, and 2) a randomly selected set of samples from the population represents one realization of all possible sample sets (i.e., the sample set is a random variable). There are many different random sampling techniques. Some of the most common techniques are described below. Unless otherwise stated, the primary source for information on these methods is Elzinga et al. (2001).
Simple random sampling is the foundation for all of the other random sampling techniques. With this approach, all of the sampling units are enumerated and a specified number of sampling units are selected at random from all the sampling units. Selection of samples for simple random sampling follows two criteria:
Simple random selections are easy and fast to implement using a variety of GIS, statistical, or spreadsheet programs.
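A minimal sketch of a simple random selection using Python's standard library is shown here. The unit IDs, the sample size, and the helper name `simple_random_sample` are hypothetical; the sketch assumes sampling without replacement, so every enumerated unit has the same chance of selection and no unit is chosen twice.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Select n units at random, without replacement, from an enumerated list."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(units, n)

# Hypothetical study area divided into 200 enumerated sampling units.
unit_ids = list(range(1, 201))
selected = simple_random_sample(unit_ids, 20, seed=42)
print(sorted(selected))
```

Recording the seed along with the sample makes it possible to document and reproduce exactly how the units were chosen.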
Advantages of simple random sampling
Disadvantages of simple random sampling
Stratification is the process of dividing a set of sampling units into one or more subgroups (i.e., strata) prior to selection of units for sampling. Sampling units are then selected randomly within each stratum. The purpose of using stratification is to account for variability in a population that can be explained by another variable (e.g., vegetation type, aspect, soil type). Therefore, strata should be defined so that the population conditions are similar within the strata.
Sampling effort does not need to be equally allocated between strata. It is common for sampling intensity to be varied between strata based on either the variability of the population parameter within the strata or the size of the strata.
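The sketch below illustrates one of the allocation options mentioned above, allocating effort in proportion to stratum size, followed by random selection within each stratum. The stratum names, unit counts, and function names are made up for the example, and the rounding rule (with at least one unit per stratum) is an assumption; allocation based on within-stratum variability would need variance estimates that are not shown.

```python
import random

def proportional_allocation(stratum_sizes, n_total):
    """Split n_total among strata in proportion to the number of units in each.

    Rounding means the allocations may not sum exactly to n_total.
    """
    total = sum(stratum_sizes.values())
    return {name: max(1, round(n_total * size / total))
            for name, size in stratum_sizes.items()}

def stratified_sample(strata, allocation, seed=None):
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, allocation[name])
            for name, units in strata.items()}

# Hypothetical strata defined by vegetation type, each a list of unit IDs.
strata = {
    "grassland": list(range(0, 500)),
    "shrubland": list(range(500, 800)),
    "riparian": list(range(800, 1000)),
}
sizes = {name: len(units) for name, units in strata.items()}
allocation = proportional_allocation(sizes, 20)
sample = stratified_sample(strata, allocation, seed=1)
print(allocation)
```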
Advantages of stratified random sampling
Disadvantages of stratified random sampling
In restricted random sampling, the area to be sampled is divided up into large segments based on the number of sampling units needed to meet monitoring objectives. Within each segment, a single sampling unit is then selected (i.e., a single sampling location is selected) at random. The samples are analyzed as if they were collected using the simple random sampling technique. This technique helps ensure good coverage of points within a study area. Many GIS random-point-generation tools include a variant of this technique that enforces a minimum distance between sample points.
Restricted random sampling has similarities to both systematic sampling and stratified random sampling. The distinctions however, are that: 1) while the segments into which the population was divided are technically a form of stratification, they are arbitrary with respect to the system and only one sample is collected per segment, and 2) the area need not be divided into equally spaced or shaped segments like would be the case in systematic sampling.
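A small sketch of the idea, assuming a rectangular study area split into equal rectangular segments, with one random point drawn per segment. The function name and the choice of equal-sized segments are illustrative assumptions; as noted above, the segments need not be equally spaced or shaped.

```python
import random

def restricted_random_points(xmin, ymin, xmax, ymax, n_cols, n_rows, seed=None):
    """Pick one random point inside each cell of an n_cols x n_rows partition."""
    rng = random.Random(seed)
    width = (xmax - xmin) / n_cols
    height = (ymax - ymin) / n_rows
    points = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = xmin + col * width   # lower-left corner of this segment
            y0 = ymin + row * height
            points.append((rng.uniform(x0, x0 + width),
                           rng.uniform(y0, y0 + height)))
    return points

# Eight sampling units needed: divide a 4 km x 2 km area into 8 segments.
points = restricted_random_points(0, 0, 4000, 2000, n_cols=4, n_rows=2, seed=1)
for x, y in points:
    print(round(x), round(y))
```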
Advantages of restricted random sampling are:
Disadvantages of restricted random sampling are:
One of the main assumptions for simple random sampling is that all sampling units have an equal likelihood of being selected for sampling. However, as discussed with a number of the other sample design techniques, this can lead to inefficiencies in sampling, especially if the sampling objective is to focus on a subset of the population or there are logistical constraints in getting to some portions of the total area. In these cases, non-random targeted (e.g., key area) sampling becomes tempting. An alternative to simple random sampling that can help address some of these issues is sampling with unequal selection probabilities.
Basically, this method works in a similar manner to simple random sampling except that the sampling units have different probabilities of being selected. How the selection probabilities are determined and assigned to the sampling units is not as important as is the knowledge of the selection probability assigned to EVERY sampling unit. Accordingly, samples can be weighted toward “representative” or “critical” areas or assigned to give preference to sampling units that are within easily accessible regions of the study area. Preferentially selecting units for sampling introduces bias into the sampling results, but the fact that we know the likelihood associated with selecting each sampling unit allows for the bias to be corrected for. In essence, individual samples are weighted according to their selection probability - samples with a high likelihood of being selected have a low weight, and samples that are unlikely to be selected carry a higher weight.
Sampling with unequal selection is commonly applied in forestry surveys as sampling with probability proportional to size. Consider the example of needing to estimate the total board-feet of timber in a stand. Board-feet is correlated to diameter of the tree, so assigning selection probabilities according to the diameter of the trees in the stand allows the observer to measure a few large trees and expand those results to the entire stand using a correction (a.k.a. expansion factor) calculated from the selection probabilities.
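The weighting described above can be sketched with the Horvitz-Thompson estimator of a population total, in which each sampled value is divided by its inclusion probability. The tree volumes and probabilities below are invented for illustration, and the sketch assumes the inclusion probability of every sampled unit is already known; how to draw a sample that achieves given inclusion probabilities is a separate problem not shown here.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total as the sum of y_i / pi_i over sampled units.

    values          -- measured value for each sampled unit
    inclusion_probs -- probability that each of those units is in the sample
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    if any(p <= 0 or p > 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must be in (0, 1]")
    # Units that were unlikely to be selected receive a larger weight (1 / pi).
    return sum(y / p for y, p in zip(values, inclusion_probs))

# Hypothetical sample: a large tree that was likely to be selected and a
# small tree that was unlikely to be selected (values in board-feet).
board_feet = [400.0, 125.0]
probs = [0.5, 0.25]
estimated_total = horvitz_thompson_total(board_feet, probs)
print(estimated_total)
```

The reciprocal of the inclusion probability plays the role of the expansion factor mentioned above.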
The probability-proportional-to-size concept can be generalized to probability proportional to a covariate. In rangeland situations, many of the parameters of interest are correlated with different remote-sensing products. These image products can be used to calculate selection probabilities. For instance, if “key area” samples (i.e., representative of a larger area) are desired, a greenness index such as NDVI could be used to assign selection probabilities such that extreme conditions received low selection probabilities and the most typical areas received the highest selection probabilities.
Advantages of Unequal Probability Sampling
Disadvantages of Unequal Probability Sampling
See Horvitz and Thompson (1952), Saxen et al. (1986), Rosen (1997), and Berger (2004) for more information.
Adaptive sampling refers to a technique where the sample design is modified in the field based on observations made at a set of pre-selected sampling units. Perhaps the best way to describe adaptive sampling is through an example. Consider sampling for the presence or abundance of rare plants. A random selection of sample units will yield many sample units where the plant is not detected, but the rare plant is likely to occur in sample units nearby to those units where it was detected. With adaptive sampling, the detection of the rare plant at one site triggers the selection and sampling of additional nearby sites that were not originally selected as part of the sample set. Thus the biggest difference between adaptive sampling and many other random selection techniques is that the observed conditions at one sampling unit influence the selection of other sampling units.
One typical implementation of adaptive sampling is that whenever a specified event occurs (e.g., detection of a target species, measurement over a specified threshold), all of the neighboring sample units are searched/sampled. This continues until no new detections occur.
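The neighborhood-expansion rule described above might be sketched as follows. The grid of plant counts, the four-neighbor definition of "neighboring", and the function name are assumptions for this example. The sketch only shows which units end up being visited; it does not apply any correction for the bias that adaptive selection introduces.

```python
from collections import deque

def adaptive_sample(grid, initial_units, threshold=1):
    """Return the set of (row, col) units visited under adaptive sampling.

    Whenever a visited unit has a value >= threshold, its four neighbors
    (up, down, left, right) are added to the list of units to visit.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    sampled = set()
    queue = deque(initial_units)
    while queue:
        r, c = queue.popleft()
        if (r, c) in sampled:
            continue
        sampled.add((r, c))
        if grid[r][c] >= threshold:  # detection triggers a neighborhood search
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < n_rows and 0 <= nc < n_cols and (nr, nc) not in sampled:
                    queue.append((nr, nc))
    return sampled

# Hypothetical counts of a rare plant in a 4 x 4 grid of sampling units.
grid = [
    [0, 0, 0, 0],
    [0, 3, 2, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
visited = adaptive_sample(grid, initial_units=[(1, 1), (3, 3)])
print(len(visited), "units visited")
```

In this example, the initial unit at (1, 1) contains plants, so the search spreads through the occupied patch and its empty edge, while the empty initial unit at (3, 3) triggers no additional sampling.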
Adaptive sampling introduces bias into the samples that must be corrected for. More specifically, adding additional units to the sample that contain high values for the parameter being measured will result in overestimation of the population mean (Thompson 1992). Various techniques are available for correcting for the bias introduced by adaptive sampling.
Advantages of Adaptive Sampling
Disadvantages of Adaptive Sampling
See Thompson (1992), Thompson and Seber (1996), and Prather (2006) for details on adaptive sampling.
Cluster sampling is a technique that can be applied when it is not possible or desirable to take a random sample from the entire population. With cluster sampling, the known or accessible sampling units are grouped into clusters. A random selection of clusters is then made and each sampling unit is measured within each of the selected clusters. Cluster sampling is typically applied to monitoring of rare plants or invasive species when the objective is to estimate a property related to individual plants (e.g., mean height, number of flowers per plant).
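As a sketch of this procedure, the code below randomly selects whole clusters and then uses every unit in each selected cluster. The patch names and plant heights are invented, and the per-plant mean is computed as total height over total plants in the selected clusters, which is one common estimator and an assumption here rather than something prescribed above.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select n_clusters clusters and keep every unit within them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical patches of a rare plant; values are plant heights in cm.
clusters = {
    "patch_A": [12.0, 15.0, 11.0],
    "patch_B": [20.0, 22.0],
    "patch_C": [9.0, 10.0, 8.0, 11.0],
    "patch_D": [14.0, 16.0, 15.0],
}
sample = cluster_sample(clusters, n_clusters=2, seed=7)

# Mean height per plant: total height divided by total plants measured.
total_height = sum(sum(heights) for heights in sample.values())
total_plants = sum(len(heights) for heights in sample.values())
mean_height = total_height / total_plants
print(sorted(sample), round(mean_height, 2))
```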
Advantages of cluster sampling are:
Disadvantages of cluster sampling are:
In two-stage sampling, elements of the population are grouped together into large groups called primary sampling units. The individual sampling units within each primary sampling unit are called secondary sampling units. A random selection of the primary sampling units is made, and then a selection of secondary sampling units is made (usually random, but can be systematic) within each of the selected primary sampling units.
Two-stage sampling is a powerful sample design method for systems that are hierarchical in nature. For example, allotments within a BLM District could be considered primary sampling units. A random selection of allotments could be made and then sample sites selected within the selected allotments. This design would allow for inference at the allotment level (e.g., average allotment condition) as well as at the district level.
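The allotment example could be sketched as below: allotments are selected at random as primary sampling units, and then candidate sites are selected at random within each chosen allotment. The allotment names, site IDs, and function name are hypothetical, and random selection at the second stage is an assumption (as noted above, the second stage can also be systematic).

```python
import random

def two_stage_sample(primary_units, n_primary, n_secondary, seed=None):
    """Select primary units at random, then secondary units within each.

    primary_units -- dict mapping each primary unit to its list of
                     secondary sampling units
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(primary_units), n_primary)
    return {
        name: rng.sample(primary_units[name],
                         min(n_secondary, len(primary_units[name])))
        for name in chosen
    }

# Hypothetical district with four allotments, each with six candidate sites.
allotments = {
    f"allotment_{i}": [f"A{i}-site{j}" for j in range(1, 7)]
    for i in range(1, 5)
}
design = two_stage_sample(allotments, n_primary=2, n_secondary=3, seed=3)
for allotment, sites in design.items():
    print(allotment, sites)
```

Unlike cluster sampling, only a subset of the units inside each selected primary unit is measured.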
The concept of two-stage sampling can be generalized to multi-stage sampling where there are more than two hierarchical levels for sampling. However, as the number of stages increases, sample size requirements go up and degrees of freedom for statistical hypothesis testing decrease. Accordingly, the number of stages is generally small (i.e., two or three).
Advantages of two-stage sampling are:
Disadvantages of two-stage sampling are:
Double sampling (also called two-phase sampling - not to be confused with two-stage sampling above) involves estimating two correlated variables. This method would be used in cases where the primary variable of interest is expensive or difficult to measure, but a secondary covariate is easily measurable. A small number of sample units are randomly selected and both variables are measured at these locations. The secondary variable only is then measured at a larger number of randomly selected points. The success of a double-sampling sample design depends on how well correlated the primary and secondary variables are.
Double sampling is commonly used in estimation of above-ground biomass in rangelands. Clipping and weighing of vegetation is expensive and tedious. With the double-sampling method, ocular estimates of biomass are made for a small number of quadrats, and the vegetation on those quadrats is then clipped and weighed. For the remaining quadrats, only the ocular estimates are performed.
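One simple way to combine the two sets of measurements is a ratio estimator: the ratio of clipped weight to ocular estimate on the paired quadrats is used to adjust the mean ocular estimate from all quadrats. This is a sketch under that assumption (regression-based adjustments are another option), and the biomass values and function name are invented for illustration.

```python
def double_sampling_ratio_estimate(ocular_paired, clipped_paired, ocular_all):
    """Adjust the mean ocular estimate using the clipped/ocular ratio.

    ocular_paired  -- ocular estimates on quadrats that were also clipped
    clipped_paired -- clipped weights for those same quadrats, in order
    ocular_all     -- ocular estimates for every quadrat, paired or not
    """
    if len(ocular_paired) != len(clipped_paired):
        raise ValueError("paired measurements must have the same length")
    ratio = sum(clipped_paired) / sum(ocular_paired)  # correction factor
    mean_ocular = sum(ocular_all) / len(ocular_all)
    return ratio * mean_ocular

# Hypothetical data in grams per quadrat: the observer tends to underestimate.
ocular_paired = [10.0, 20.0, 30.0]
clipped_paired = [12.0, 24.0, 36.0]
ocular_all = [10.0, 20.0, 30.0, 40.0, 20.0]
estimate = double_sampling_ratio_estimate(ocular_paired, clipped_paired, ocular_all)
print(round(estimate, 1), "g per quadrat")
```

The adjustment is only as good as the relationship between the ocular and clipped values, which reflects the dependence on correlation noted above.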
Advantages of double sampling are:
Disadvantages of double sampling are:
Spatially balanced sampling refers to sample designs in which the samples are evenly distributed across a study area. Spatially balanced sampling is much more efficient than simple random sampling if the population being sampled is more or less evenly distributed across the area being sampled. While a systematic sample design can achieve complete spatial balance, it lacks the randomization that is desirable in statistical sample designs, and it is difficult to apply when the units being selected for sampling are not contiguous within the study area (e.g., selecting lakes or wetlands to sample).
There are several different techniques for creating spatially-balanced sample designs, but one of the most common ones is the Generalized Random-Tessellation Stratified (GRTS) design described by Stevens and Olsen (2004). For sampling within an area, the GRTS technique works as follows:
The GRTS method produces samples that are spatially balanced. It also has the interesting property that each subsequent sample location selected using GRTS will be spatially balanced with respect to the previous points. The benefit of this is that an “oversample” of sampling units can be drawn (e.g., draw a sample of 30 units when you only intend to sample 20), and if one unit needs to be thrown out for some reason (e.g., access restrictions), then the next selected unit (sample unit 21 in the example above) will maintain the statistical properties of the original 20 sample units.
Image source: Stevens and Olsen (2004)
An example of a recursive, hierarchical randomization process applied to an area to be sampled. This process is used to assign the random, but spatially balanced, order to the sampling units. The area is first split into four quadrants. Each quadrant is then split into four smaller quadrants, and so on until there is only one sampling unit per quadrant (or until the size of the quadrants equals the desired distance between samples). The sample units are then ordered according to the numbering assigned to the quadrants. In this example, the main quadrants and the sub-quadrants have the same number ordering, but in practice, random numbering is assigned for each quadrant level.
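The recursive subdivision in the caption above can be sketched in a few lines of Python. This is only the hierarchical ordering step, not a full GRTS implementation: it assumes a square grid with 2^levels cells on a side, shuffles the four sub-areas independently at every split, and omits the later steps that Stevens and Olsen (2004) describe for drawing the sample from the ordered units. The function name is hypothetical.

```python
import random

def hierarchical_order(levels, seed=None):
    """Order the cells of a 2**levels x 2**levels grid by random quadrant address.

    Returns a list of (row, col) cells. Cells in the same quadrant stay
    adjacent in the ordering, at every level of the hierarchy.
    """
    rng = random.Random(seed)

    def recurse(row0, col0, size):
        if size == 1:
            return [(row0, col0)]  # a single sampling unit remains
        half = size // 2
        quadrants = [(row0, col0), (row0, col0 + half),
                     (row0 + half, col0), (row0 + half, col0 + half)]
        rng.shuffle(quadrants)  # random numbering of the four sub-areas
        ordered = []
        for r, c in quadrants:
            ordered.extend(recurse(r, c, half))
        return ordered

    return recurse(0, 0, 2 ** levels)

# Two levels of subdivision give a 4 x 4 grid of 16 sampling units.
order = hierarchical_order(levels=2, seed=5)
print(order)
```

In the printed ordering, each consecutive group of four cells belongs to a single top-level quadrant, which is the hierarchical structure the figure illustrates.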
Advantages of Spatially-Balanced Sampling (using GRTS)
Disadvantages of Spatially-Balanced Sampling (using GRTS)
See Stevens and Olsen (2003, 2004) for details of GRTS.