Classification

When we have quantitative data with many features and a wide range of possible values (e.g. 1000 forest stands with an age range of 0 – 500 years), it is beneficial to place these features into categories. For example, we may wish to classify the 1000 forest stands into four categories: young, immature, mature and old growth. If we colored these categories with a green color ramp (light green for young through to dark green for old growth), we could view a map and easily discern the distribution of age classes (e.g. where the old growth or the young plantations are).

The decision to put map features into categories is an easy one. Deciding which features go into which categories is harder. Several methods are outlined below.

Steps for placing features into categories:

a) first, put the features in ascending order according to the attribute of interest (e.g. age)

b) second, use one of the schemes below to determine the class break points

· natural breaks – look for the biggest gaps

· quantile – number of values / number of classes = number of features in each class

· equal interval – range of values (largest – smallest) / number of classes = width of each class

· std. deviation (SD) – the middle break point is the mean, and the remaining break points are multiples of the SD above and below it (i.e. + or – 1 SD, then 2 SD … note that you can also use fractions of an SD: + or – 0.25 * SD, then 0.5 * SD, then 0.75 * SD, etc.)

User Defined Breaks

Quite obviously the user (you in this case) decides where the break points are (e.g. young 0-20 yrs, immature 20-100 yrs, mature 100-250 yrs and old growth >250 yrs). This is often used when the categories are already predefined.
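As a rough illustration, user defined breaks can be applied with a simple lookup. The sketch below uses the age classes from the example above. The names BREAKS, LABELS and classify_age are made up for this illustration, and sending an age that falls exactly on a break (e.g. 20) to the lower class is an assumption, since the ranges as written share their end points.

```python
from bisect import bisect_left

# Upper limits (years) of the first three classes, from the example above.
BREAKS = [20, 100, 250]
LABELS = ["young", "immature", "mature", "old growth"]


def classify_age(age):
    """Return the age class label for a stand age in years.

    Assumption: an age exactly equal to a break point goes to the
    lower class (so 250 is 'mature' and only >250 is 'old growth').
    """
    return LABELS[bisect_left(BREAKS, age)]


for age in (5, 20, 21, 250, 400):
    print(age, classify_age(age))
```

Changing the break points only requires editing BREAKS and LABELS, which is convenient when the categories are predefined by someone else.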

The following descriptions are copied from the ArcMap Help facility.

------------------------------------

Natural Breaks

Classes are based on natural groupings inherent in the data. ArcMap identifies break points by picking the class breaks that best group similar values and maximize the differences between classes. The features are divided into classes whose boundaries are set where there are relatively big jumps in the data values.

Quantile (also quartile or percentile)

Each class contains an equal number of features. A quantile classification is well suited to linearly distributed data. Because features are grouped by the number in each class, the resulting map can be misleading. Similar features can be placed in adjacent classes, or features with widely different values can be put in the same class. You can minimize this distortion by increasing the number of classes.

Equal Interval

This classification scheme divides the range of attribute values into equal-sized sub-ranges, allowing you to specify the number of intervals while ArcMap determines where the breaks should be. For example, if features have attribute values ranging from 0 to 300 and you have three classes, each class represents a range of 100 with class ranges of 0–100, 101–200, and 201–300. This method emphasizes the amount of an attribute value relative to other values, for example, to show that a store is part of the group of stores that made up the top one-third of all sales. It's best applied to familiar data ranges, such as percentages and temperature.

Standard Deviation

This classification scheme shows you how much a feature's attribute value varies from the mean. ArcMap calculates the mean values and the standard deviations from the mean. Class breaks are then created using these values.

-------------------------------------

Example

Classify the following data set (12 values) into 4 classes according to natural breaks, quantile, equal interval & std. deviation:

Raw Data

A	24
B	78
C	3
D	15
E	37
F	34
G	69
H	54
I	2
J	20
K	41
L	30

Step 1: Ascending Order

I	2
C	3
D	15
J	20
A	24
L	30
F	34
E	37
K	41
H	54
G	69
B	78
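
Step 1 is just a sort on the attribute of interest. A minimal sketch, assuming the raw data is held in a Python dictionary (the variable names raw and ordered are illustrative only):

```python
# Raw data from the table above: feature label -> attribute value.
raw = {"A": 24, "B": 78, "C": 3, "D": 15, "E": 37, "F": 34,
       "G": 69, "H": 54, "I": 2, "J": 20, "K": 41, "L": 30}

# Sort the (label, value) pairs by value, smallest first.
ordered = sorted(raw.items(), key=lambda item: item[1])

for label, value in ordered:
    print(label, value)
```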

Step 2: Determine Break Points & Place into Classes

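Natural Breaks – look for the ‘biggest jumps’ in value. Four classes need three break points, and the three largest gaps between neighboring values are 54 → 69 (15), 41 → 54 (13) and 3 → 15 (12).

I	2	Class 1
C	3	Class 1
D	15	Class 2
J	20	Class 2
A	24	Class 2
L	30	Class 2
F	34	Class 2
E	37	Class 2
K	41	Class 2
H	54	Class 3
G	69	Class 4
B	78	Class 4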
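
The ‘biggest jumps’ rule can be sketched in a few lines. This is only the hand method described here, not the optimization ArcMap itself performs for natural breaks, and the function name largest_gap_classes is hypothetical.

```python
values = [2, 3, 15, 20, 24, 30, 34, 37, 41, 54, 69, 78]  # already sorted


def largest_gap_classes(sorted_values, n_classes):
    """Split sorted values at the (n_classes - 1) largest gaps."""
    # Gap i lies between sorted_values[i] and sorted_values[i + 1].
    gaps = [(sorted_values[i + 1] - sorted_values[i], i)
            for i in range(len(sorted_values) - 1)]
    # Positions after which to cut, restored to ascending order.
    cut_after = sorted(i for _, i in sorted(gaps, reverse=True)[:n_classes - 1])

    classes, start = [], 0
    for i in cut_after:
        classes.append(sorted_values[start:i + 1])
        start = i + 1
    classes.append(sorted_values[start:])
    return classes


natural = largest_gap_classes(values, 4)
print(natural)
```

For this data set the cuts fall after 3, after 41 and after 54, so one class holds a single value (54). That is a normal outcome of this rule when large gaps sit next to each other.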

Quantile – there are 12 items to be placed in 4 classes … 12 / 4 = 3 items per class, therefore simply put the first 3 items into class 1, the next 3 into class 2, etc.

I	2	Class 1
C	3	Class 1
D	15	Class 1
J	20	Class 2
A	24	Class 2
L	30	Class 2
F	34	Class 3
E	37	Class 3
K	41	Class 3
H	54	Class 4
G	69	Class 4
B	78	Class 4
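
The quantile rule amounts to cutting the sorted list into equal-sized runs. A minimal sketch (the function name quantile_classes is illustrative; when the count does not divide evenly, this version lets class sizes differ by at most one, which is an assumption rather than anything stated above):

```python
values = [2, 3, 15, 20, 24, 30, 34, 37, 41, 54, 69, 78]  # already sorted


def quantile_classes(sorted_values, n_classes):
    """Split sorted values into n_classes groups of (nearly) equal size."""
    n = len(sorted_values)
    # Class k takes the slice between positions k*n/n_classes and
    # (k+1)*n/n_classes, rounded down.
    return [sorted_values[k * n // n_classes:(k + 1) * n // n_classes]
            for k in range(n_classes)]


quantiles = quantile_classes(values, 4)
print(quantiles)
```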

Equal Interval – the range of values is 78 – 2 = 76 … divide this by 4 classes and you get an interval of 19.0 … classes start at the lowest value (i.e. 2) and increment by 19 … class 1 goes from 2 to 21, class 2 from 21 to 40, class 3 from 40 to 59 and class 4 from 59 to 78.

I	2	Class 1
C	3	Class 1
D	15	Class 1
J	20	Class 1
A	24	Class 2
L	30	Class 2
F	34	Class 2
E	37	Class 2
K	41	Class 3
H	54	Class 3
G	69	Class 4
B	78	Class 4
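
The equal interval arithmetic can be checked with a short sketch. The function names are hypothetical, and placing a value that lands exactly on an interior break in the lower class is an assumption (no value in this data set falls on a break).

```python
values = [2, 3, 15, 20, 24, 30, 34, 37, 41, 54, 69, 78]


def equal_interval_breaks(data, n_classes):
    """Return the n_classes + 1 class boundaries, lowest value first."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_classes  # (78 - 2) / 4 = 19.0 for this data
    return [lo + width * k for k in range(n_classes + 1)]


def assign_class(value, breaks):
    """Return the 1-based class number for a value.

    Assumption: a value equal to an interior break goes to the lower class.
    """
    for k in range(1, len(breaks)):
        if value <= breaks[k]:
            return k
    return len(breaks) - 1


breaks = equal_interval_breaks(values, 4)
interval_classes = [assign_class(v, breaks) for v in values]
print(breaks)
print(interval_classes)
```

Note that the classes hold 4, 4, 2 and 2 values: equal interval fixes the width of each class, not the number of features in it.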

Standard Deviation – For this method there will be an even number of categories (as there is an equal number of categories above and below the mean value). In our case we will create categories based on whole increments of the standard deviation (i.e. the mean plus or minus 1 SD, then 2 SD, etc. until the data is all accounted for).

· The mean = 33.9; this break point falls between L (30) and F (34) in the table below

· The SD = 23.9 … and this determines the break points above and below the mean …

o 33.9 + 23.9 = 57.8 … 33.9 to 57.8 = a class

o 57.8 + 23.9 = 81.7 (beyond our largest value) … 57.8 to 81.7 = a class

o 33.9 – 23.9 = 10 … 10 to 33.9 = a class

o 10 – 23.9 = -13.9 (beyond our lowest number) … 2 to 10 = a class

I	2	Class 1
C	3	Class 1
D	15	Class 2
J	20	Class 2
A	24	Class 2
L	30	Class 2
F	34	Class 3
E	37	Class 3
K	41	Class 3
H	54	Class 3
G	69	Class 4
B	78	Class 4
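
The mean, SD and break points above can be reproduced with Python's statistics module. The quoted SD of 23.9 corresponds to the sample standard deviation (n – 1 denominator), which is what stdev computes. The sketch assumes whole-SD steps and only the three breaks needed for this data set (mean – 1 SD, mean, mean + 1 SD); the variable names are illustrative.

```python
from bisect import bisect_right
from statistics import mean, stdev

values = [2, 3, 15, 20, 24, 30, 34, 37, 41, 54, 69, 78]

m = mean(values)   # about 33.9
s = stdev(values)  # sample SD, about 23.9

# Break points at mean - 1 SD, the mean, and mean + 1 SD.
# The lowest and highest classes are open-ended, so they absorb
# everything below mean - 1 SD and above mean + 1 SD.
sd_breaks = [m - s, m, m + s]

# Class 1 is below mean - 1 SD, class 4 is above mean + 1 SD.
sd_classes = [bisect_right(sd_breaks, v) + 1 for v in values]

print(round(m, 1), round(s, 1))
print([round(b, 1) for b in sd_breaks])
print(sd_classes)
```

If the data contained values more than 1 SD beyond these outer breaks and you wanted them separated, further breaks at mean ± 2 SD would have to be added to sd_breaks.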