RASTER ANALYSES

 

The following provides an overview of the various analytical functions available for raster data sets.

 

Analysis Environment

Some GIS packages require that all input raster layers have the same resolution (cell size) and extent.  More recent GIS software allows layers with a variety of cell sizes and extents to be analyzed; in this case it is important to set these parameters for the output layer before running the analysis.  A third parameter to consider is the analysis mask.  Map extents are rectangular in shape; however, your study area may have an irregular shape (e.g. a watershed, island or municipal boundary).  A mask is a layer that contains the value NoData (or sometimes zeros) for cells outside the study area, so that the analysis is conducted only on cells within the study area.  A fourth parameter is the output directory.

 

Breakout of Raster Analyses

·         Raster

o   Reclassify

o   Raster Calculator

o   Spatial Interpolation (IDW, spline, kriging)

o   Density

o   Surface (DTM)

§  Slope & Aspect from elevation

§  Hillshade / Illumination

§  Viewshed

§  Watershed

o   Distance

§  Straight line

§  Allocation

§  Cost surface

§  Cheapest Path

 

Reclassify

Reclassify converts cell values in the input layer to new values in the output layer.  Thus it can be used for:

·         simplifying (or generalizing) a data layer – converting raw data into classes

·         bringing values to a common scale – often converting to values in the range of 0 – 10 (where zero has no value and 10 has the highest value).

We often bring values to a common scale when doing a multi-criteria evaluation.  In this case each criterion (or map layer) is brought to a common scale to simplify the process of combining the layers.  Consider the simple case of combining slope and land value to determine suitability for land development.  It is difficult to conceive of combining slope (measured in degrees) with value (measured in dollars).  In such an analysis several layers (e.g. soils, steepness of slope and land value) are combined to determine the most suitable area for a proposed use (e.g. subdivision development).  This is a 2-step process: first we reclassify the map layers, and second we combine them.  (Note: combining the layers is a separate process.  In ArcView we accomplish this task with the raster calculator – see the next section.)
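
As an illustration (outside any particular GIS package), reclassification is easy to sketch with NumPy; the slope values, class breaks and ratings below are all made up for the example:

    import numpy as np

    # Hypothetical slope layer in degrees (NoData handling omitted).
    slope = np.array([[ 2.0,  8.5, 14.0],
                      [21.0, 33.0,  5.5],
                      [12.0, 26.0, 40.0]])

    # Breaks at 10, 20 and 30 degrees; flatter cells get higher ratings.
    breaks  = [10.0, 20.0, 30.0]
    ratings = np.array([10, 7, 4, 1])   # one rating per class

    # np.digitize returns, for each cell, the index of its class.
    reclass = ratings[np.digitize(slope, breaks)]
    print(reclass)
    # [[10 10  7]
    #  [ 4  1 10]
    #  [ 7  4  1]]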

 

The vector equivalent is to reclassify the attribute data in the table, and then dissolve boundaries based on this generalization.

 

Raster Calculator

The ability to conduct mathematical operations on the values in raster cells is sometimes referred to as map algebra.  The raster calculator is a powerful and flexible analytical function.  Its functionality can be divided into two broad groups: functions and operators.

 

Mathematical functions are performed on a single raster layer and include arithmetic, logarithmic, trigonometric and power functions.  A simple example is the conversion of elevation in feet to metres.  As temperature is related to elevation, the conversion of elevation to temperature is another (more complex) example.  A study in Kenya yielded the following relationship between elevation and mean annual temperature:

 

Temperature (°C) = –0.0016 * (elevation in metres) + 26.985

 

Thus, multiplying each elevation value by –0.0016 and then adding 26.985 will yield an approximation of the mean annual temperature for that location.  You can see from an example such as this where the term map algebra came from.
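
In NumPy terms (purely as an illustration; the elevation values below are made up), the whole conversion is a single expression applied to every cell at once:

    import numpy as np

    # Hypothetical elevation layer in metres.
    elev_m = np.array([[ 500., 1200.],
                       [2100., 2900.]])

    # Map algebra: the regression is applied to every cell in one expression.
    temp_c = -0.0016 * elev_m + 26.985
    print(temp_c.round(1))
    # [[26.2 25.1]
    #  [23.6 22.3]]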

 

Mathematical operators work on one or more layers.  These operators evaluate the coincident cells (cells that overlap) from two or more layers and can be divided into three categories: arithmetic, Boolean and relational. 

Arithmetic operators include +, -, * and /.  As an example, if two layers (e.g. soils and steepness of slope) were previously reclassified to a suitability rating ranging from 0 – 10, these two layers could then be added to determine overall suitability.  This suitability layer would then range from 0 – 20. 

 

Boolean operators include AND, OR, NOT and XOR.  Boolean logic pertains to conditions of true or false (classically represented as 1’s and 0’s in a raster layer).  Whereas in the previous example the soils and slope layers were reclassified to a scale of 0 – 10, here they would be reclassified to suitable (1) and unsuitable (0).  A Boolean operation would then test for cells that contain 1’s (i.e. true) on both layers, e.g. soils = 1 AND slopes = 1.  The overall suitability layer would contain 1’s only where there were 1’s on both input layers. 

 

Relational (or conditional) operators include <, >, <=, >=, = and <>.  A relational operator can be used to compare the cell values of one layer against a constant value (e.g. slopes < 10 degrees) or against a second layer (e.g. 1999 Land Value > 2003 Land Value).  The output of a relational operator is Boolean (i.e. 1’s and 0’s).  In the case of 1999 Land Value > 2003 Land Value, the cells where the 1999 land value exceeded that of 2003 would receive a value of 1 in the output layer.
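
A small sketch (hypothetical values, with NumPy standing in for the raster calculator) showing one operator from each category:

    import numpy as np

    soils  = np.array([[3, 8], [6, 9]])   # reclassified suitability, 0 - 10
    slopes = np.array([[7, 2], [8, 9]])

    # Arithmetic: overall suitability, 0 - 20.
    combined = soils + slopes                              # [[10 10] [14 18]]

    # Relational: the output is Boolean (1 = true, 0 = false).
    good_slope = (slopes >= 5).astype(int)                 # [[1 0] [1 1]]

    # Boolean AND: 1 only where BOTH layers pass their tests.
    suitable = ((soils >= 5) & (slopes >= 5)).astype(int)  # [[0 0] [1 1]]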

 

You should note that when we compare two or more layers we are overlaying map layers.  With vector data, overlay combines the line work of the map layers (breaking the lines to create nodes and likely cleaning up sliver polygons), and the output layer carries all the data fields from both data tables.  With raster data the output is much simpler, and there are no sliver polygons to clean up.

 

 

Interpolation

Many phenomena vary continuously over the landscape.  Elevation changes as one moves across hilly or mountainous terrain.  Other examples include temperature, soil depth, level of groundwater pollution, and salinity in an estuary.  Such phenomena can be represented by point values (e.g. spot heights) or isolines (e.g. contours).  However, the raster grid is an excellent data model for this type of data, as each cell is independent of the rest and can contain a unique value.

 

Source data often comes from point samples, whereby measurements are recorded at specific locations.  Estimating the value at a location that was not sampled is a matter of interpolating a value based on the surrounding sample points.  This is called spatial interpolation. 

 

Two assumptions underlie spatial interpolation:

·         the phenomenon (z value) varies continuously over the landscape (as previously stated), and

·         the z value is spatially dependent.

 

The first assumption is easy enough to understand.  But what does ‘spatially dependent’ mean?  Essentially it is Waldo Tobler’s “First Law of Geography”:

Everything is related to everything else, but near things are more related than distant things.

This means that when we try to determine a value at a point that was not sampled, we should give greater weight to the closer sample points than the more distant sample points.

 

There are various methods for conducting the interpolation.  Examples include:

·         inverse distance weighting (IDW),

·         spline (curve fitting),

·         kriging (where direction is factored into the weighting process), and

·         trend (polynomial regression, not as bad as it sounds). 

 

We will now examine a very simple example of the IDW method.  Imagine we are to estimate a value for a raster cell and there are only two sampled points we are considering.  One has a value of 200 and is 10 metres away; the second point has a value of 300 and is 20 metres away (see diagram).

 

[Diagram: the raster cell to be estimated, with one sample point valued 200 at 10 metres away and another valued 300 at 20 metres away.]

Clearly the value that is 10 metres away should carry more weight in the estimation (interpolation) of the raster cell than the value 20 metres away.  So we use a weighting process based on distance.  However, we cannot use distance directly, because the greater the distance, the less weight it should carry.  In our case, the value of 200 (10 m away) should have twice as much weight as the value of 300 (20 m away), but 10 m is half of 20 m.  This is the opposite (inverse) of what we want.  The solution, therefore, is to take the inverse of the distance.  In so doing, 10 and 20 become 1/10 and 1/20, respectively.  Note that 1/10 is twice as large as 1/20, which is just what we wanted.

It is important to note that it is the relative size difference that counts, not the value itself.  Multiplying 200 by 1/10 (and 300 by 1/20) makes no sense; this would yield values of 20 and 15, which sum to 35 – clearly not the right answer.

To calculate the relative weight of each value we first have to sum the inverse distances.  In this case we add 1/10 and 1/20 (0.1 + 0.05).  Using decimals for the rest of the calculations, the total inverse distance is 0.15.  The weight of the closer value is 0.1/0.15, which equals 0.67; the weight of the further value is 0.05/0.15, which equals 0.33.  Thus the closer value contributes 67% of its value to the answer, and the further value contributes only 33%.  The final calculation for the raster cell is

            Value   = (0.67 * 200) + (0.33 * 300)

                        = (134) + (99)

                        = 233

You can see that the interpolated value is closer to the value that is nearest.

 

A slightly more complicated example follows.

 

A Worked Example of IDW

[Diagram: a raster grid with a point theme of spot heights (points r, s, t and u) imposed on top; the shaded pixel is the cell whose elevation is to be interpolated.]

Raw Data

    Pt.    Elevation (m)
    r      251
    s      330
    t      363
    u      280

Calculation of elevation for the shaded pixel

    Pt.    Distance (m)    Inv. Dist.    Proportion of Inv. Dist.    Elev. (m) at that point    Wt. Elev. (m)
    r      100             0.0100        0.12                        251                        30.7
    s       70             0.0143        0.17                        330                        57.7
    t       23             0.0435        0.53                        363                        193.3
    u       72             0.0139        0.17                        280                        47.6
           Total Inv. Distance →  0.0817        1.00                                           329.4

 

 

Where,

·         Proportion of Inv. Dist. = Inv. Dist. / Total Inv. Distance (e.g. 0.0100 / 0.0817 = 0.12)

·         Wt. Elev. = Elev. at that point * Proportion of Inv. Dist. (e.g. 0.12 * 251 m ≈ 30.7 m)

·         Final value = sum of Wt. Elev. (e.g. 30.7 + 57.7 + 193.3 + 47.6 ≈ 329.4)

(The proportions are shown rounded to two decimals; the weighted elevations were calculated with unrounded proportions, which is why the sums differ slightly.)

 

One point to note is that interpolated values based on IDW are just weighted averages.  As the raster cell values are averages, the surface produced does not coincide with the original data points. 
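
The worked example is easy to verify with a few lines of NumPy (an illustration only; a GIS would repeat this for every cell in the output raster):

    import numpy as np

    # Spot heights r, s, t, u and their distances to the shaded pixel,
    # taken from the worked example above.
    elev = np.array([251.0, 330.0, 363.0, 280.0])
    dist = np.array([100.0,  70.0,  23.0,  72.0])

    inv_dist = 1.0 / dist                  # closer points get larger weights
    weights  = inv_dist / inv_dist.sum()   # proportions of inverse distance
    value    = (weights * elev).sum()
    print(round(value, 1))                 # 329.4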

 

Another method of spatial interpolation is called spline.  The idea of splines is an old one.  One way to draw a smooth curve that does indeed go through defined points is to first push pins into a drafting board, and then fit a thin wooden strip along these pins.  See the diagram below, where the circles represent the pushpins and the line represents the thin wooden strip (also known as a spline). 


[Diagram: a smooth curve (the spline) passing through a row of circles (the pushpins).]

The diagram above is a one-dimensional example of a spline curve.  Of course, in a GIS the wooden spline is replaced with mathematical formulae. 

 

Two points regarding spline interpolation are worth mentioning:

·         the line passes through the points (i.e. the surface coincides with the original points)

·         it is possible for the line to have values that exceed the surrounding points (look at the middle hump – the peak exceeds the surrounding points).

Neither of these conditions occurs with IDW. 
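
Both points are easy to demonstrate with SciPy's CubicSpline (a one-dimensional stand-in for the surface case; the transect values below are made up):

    import numpy as np
    from scipy.interpolate import CubicSpline

    # A step-like transect: flat, then an abrupt rise, then flat again.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([0.0, 0.0, 0.0, 10.0, 10.0, 10.0])
    spline = CubicSpline(x, y)

    print(spline(3.0))                # 10.0 - the curve passes through the samples
    xs = np.linspace(0.0, 5.0, 501)
    print(spline(xs).max() > 10.0)    # True - the curve overshoots the highest sample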

 

Kriging considers two factors: nearness and direction.  It is a multi-step process; one of the early steps is an analysis of the data point values to see if there is a "directional trend".  Consider spot heights in the Andes mountain range, which has a strong north–south alignment, with steep slopes facing east and west.  If you were standing at any given location in this mountain range and there were 4 spot heights, each located 50 metres from you in the cardinal directions, it is very likely that the spot heights to the north and south would be closer to your elevation than those to the east and west (likely located up and down a steep incline).  Thus, if there is a strong "directional trend", more weight should be assigned to points located "along the same contour line" than to those located uphill or downhill.

 

There are other spatial interpolation methods (trend surfaces and Fourier series, for example); these will not be discussed in this course.  The most common method is IDW, largely because it is the most straightforward and most easily understood.  However, your choice of method should depend on your source data (the number and distribution of the sample points) and the nature of the phenomenon of interest.  If you have a special project that requires spatial interpolation, then more research on these other methods would be in order.

 

Density Surface

Simply put, density is a measurement of a phenomenon per unit area (or volume).  A very common "GIS density" measure is population – the number of people per square kilometre.  Other densities could include wildlife sightings, crimes or potential customers.  In all cases, the end result is a raster data layer where the pixels contain density values (i.e. the number of something per m² or km²).  The source data is usually in point form:

·         towns/cities

·         census tracts (or city blocks), where the number of people is "assigned" to a centroid (centre point)

·         incident locations – i.e. mapped points that represent crime locations or wildlife sightings

 

These point values can be considered "artificial" or "real".  For the examples of cities and census tracts, the point values are somewhat "artificial", in that the value attributed to a specific point actually represents a population for the immediate surrounding area.  The examples of crime locations and wildlife sightings can be considered "real", in that the value represents an occurrence that did indeed take place at that location.

 

Regardless of whether the point values are "artificial" or "real", the end result of determining density values is the same: the point values are distributed across the landscape.  The actual process goes something like this:

·         a search area is defined (i.e. provide the radius for a circle)

·         all point values within the search area (the "hula hoop") are summed

·         this sum is divided by the search area to derive the density value

 

The following is taken (with minor edits) from the help section in ArcGIS.

 

By calculating density you spread point values out over a surface. The measured quantity (originally attached to a data point) is distributed throughout a landscape, and a density value is calculated for each cell in the output raster.

 

Density maps are predominantly created from point data, and a circular search area is applied to each cell in the output raster being created. The search area determines the distance to search for points in order to calculate a density value for each cell in the output raster.

 

You can calculate density using simple or kernel calculations.

·         In a simple density calculation, points that fall within the search area are summed and then divided by the search area size to get each cell’s density value.

·         The kernel density calculation works the same as the simple density calculation, except the points or lines lying near the center of a raster cell’s search area are weighted more heavily than those lying near the edge. The result is a smoother distribution of values.

 

Density surfaces are good for showing where point features are concentrated. For example, with census data, you may have a point representing the number of people in each town. Since all the people in each town do not live at the population point, by calculating density you can create a surface showing the predicted distribution of the population throughout the landscape.
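
A brute-force sketch of the simple (non-kernel) calculation, with made-up town points; real packages use much faster neighbourhood searches, but the arithmetic is the same:

    import numpy as np

    # Hypothetical town points: x, y (map units) and population.
    points = np.array([[12.0, 34.0, 2000.0],
                       [40.0, 15.0,  500.0],
                       [55.0, 60.0, 1200.0]])

    cell, radius = 10.0, 25.0             # output cell size and search radius
    area = np.pi * radius ** 2            # area of the circular search window

    density = np.zeros((8, 8))
    for r in range(8):
        for c in range(8):
            cx, cy = (c + 0.5) * cell, (r + 0.5) * cell   # cell centre
            d = np.hypot(points[:, 0] - cx, points[:, 1] - cy)
            # Sum the point values inside the circle, divide by its area.
            density[r, c] = points[d <= radius, 2].sum() / area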

 

 

Surface 

Surface analyses are fairly intuitive and include:

·      Slope – This is a measure of the rate of change (e.g. steepness of slope); in fact it is the maximum rate of change between cells.  Slope can be calculated in degrees or percent (slope % = rise / run * 100) – see the sketch after this list.

·      Aspect – Measured in degrees (azimuth 0-360), aspect indicates “direction of slope” (e.g. which way the hill is facing).

·      Watershed – This function determines the land area that drains to a specified point.  For example, you can pick a point along a creek and find the sub-basin that drains to that point.

·      Viewshed – The output from this function shows areas that are visible from a given observation point (or several points).  The obvious use is to determine the aesthetic impact of developing (building houses, logging, road construction, etc.) a portion of the landscape.  This analysis can also be used in the placement of communications towers – as towers should be ‘in sight’ of one another.

·      Hillshading (Illumination) – The hypothetical illumination of a surface can be determined using hillshading.  This is done by placing a light source (the sun) in the sky; the placement includes direction (azimuth) and height (altitude), both given in degrees.  The output is quite often combined with the elevation layer to enhance the visualization of the surface.  It can also be used in analysis to determine areas in light and shadow at any given time of day (and any time of year).
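
As a rough sketch of the slope idea, here are centred differences on a tiny hypothetical DEM (real packages use Horn's method and handle edges and NoData carefully; aspect comes from the same two gradients via arctan2):

    import numpy as np

    dem = np.array([[105., 104., 102.],   # hypothetical elevations (m)
                    [103., 101.,  98.],
                    [100.,  97.,  93.]])
    cellsize = 10.0                       # metres

    # Rates of change down the rows (y) and across the columns (x).
    dzdy, dzdx = np.gradient(dem, cellsize)

    rise_run  = np.hypot(dzdx, dzdy)              # maximum rate of change
    slope_pct = rise_run * 100.0                  # slope % = rise/run * 100
    slope_deg = np.degrees(np.arctan(rise_run))   # slope in degrees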

 

Distance Analyses

Distance in the vector world was a fairly straightforward concept; examples include:

·         measuring the distance between two (or more) points,

·         determining the length along a road network

·         creating buffers around features.

 

In raster we are still able to interactively measure the distance between points.  We are not able to effectively measure distance in a network.  We can, however, create buffers, although it is now a two-step process: determine distance, then reclass.  The first step entails creating a ‘distance layer’, in which distance is a continuous variable: each cell contains the distance to the target feature – the further away the cell, the greater the value.  The second step is simply a reclass.  For example, reclass the distance layer such that all values <= 50 become 1 (this is the buffer), and all values > 50 become 0 (not the buffer).
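
The two-step buffer is easy to sketch with SciPy's Euclidean distance transform (the stream layer below is a toy example; distance_transform_edt measures each cell's distance to the nearest feature cell):

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    stream = np.zeros((6, 6), dtype=int)   # hypothetical layer: 1 = stream cell
    stream[2, :] = 1
    cellsize = 10.0                        # metres

    # Step 1: continuous distance layer (distance to the nearest stream cell).
    dist = distance_transform_edt(stream == 0, sampling=cellsize)

    # Step 2: reclass - cells within 50 m become 1 (the buffer), the rest 0.
    buffer50 = (dist <= 50.0).astype(int)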

 

The allocation function is used to delineate areas according to their proximity to other features.  A simple example is to consider high schools in a town.  The kids living in the catchment area for a given school are expected to attend that school (there are exceptions, of course).  The allocation function can be used to determine these catchment areas.  This is especially useful when considering where to add a new school, and what effect a new school will have on the catchment areas.
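
The same SciPy function can sketch allocation: with return_indices=True it reports, for every cell, which feature cell is nearest (the school layout below is made up):

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    schools = np.zeros((6, 6), dtype=int)  # 0 = no school; 1, 2, 3 = school IDs
    schools[0, 1], schools[3, 4], schools[5, 0] = 1, 2, 3

    # For every cell, find the row/column of the nearest school cell...
    _, (rows, cols) = distance_transform_edt(schools == 0, return_indices=True)

    # ...and label the cell with that school's ID: these are the catchments.
    catchments = schools[rows, cols]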

 

Sometimes we need a modification of straight-line distance.  The most familiar case is the cost of movement.  Consider hiking in the backcountry.  It would certainly be shorter to walk in a straight line to your destination – but what if that led you straight up a cliff and then down an escarpment on the other side?  It would be less effort (cost of movement) to walk around the rock outcrop to the other side.  This is an example of a cost surface.  Distance is still a factor, but it is modified by a second factor, which can be thought of as the cost of travel (or friction to movement).  This cost can be dollars (for road construction) or energy (for animal migration).

 

Determining the cheapest path (a.k.a. shortest path, or least-cost path) usually follows creating a cost surface.  This is the route that will cost the least (least expensive or least energy) to get from one point to another.
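
A compact sketch of the idea using Dijkstra's algorithm over a toy cost grid (4-connected moves, each step charged the cost of the cell entered; real GIS cost-distance functions also handle diagonals and typically average adjacent cell costs):

    import heapq
    import numpy as np

    def cheapest_path_cost(cost, start, goal):
        """Total accumulated cost of the least-cost route from start to goal."""
        nrows, ncols = cost.shape
        best = np.full(cost.shape, np.inf)
        best[start] = cost[start]
        heap = [(cost[start], start)]
        while heap:
            acc, (r, c) = heapq.heappop(heap)
            if (r, c) == goal:
                return acc
            if acc > best[r, c]:
                continue                   # stale queue entry
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < nrows and 0 <= nc < ncols:
                    nacc = acc + cost[nr, nc]
                    if nacc < best[nr, nc]:
                        best[nr, nc] = nacc
                        heapq.heappush(heap, (nacc, (nr, nc)))
        return np.inf

    # A ridge of high cost down the middle: the cheap route goes around it.
    cost = np.array([[1., 1., 9., 1., 1.],
                     [1., 1., 9., 1., 1.],
                     [1., 1., 1., 1., 1.]])
    print(cheapest_path_cost(cost, (0, 0), (0, 4)))   # 9.0 (around), not 13.0 (straight across)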

 

 

Some interesting sites:

http://www.esri.com/news/arcuser/0403/files/landfill_1.pdf

http://www.esri.com/news/arcuser/0703/files/landfill_2.pdf