Classification
When we have quantitative data with many features and a wide range of possible
values (e.g. 1000 forest stands with an age range of 0-500 years), it is
beneficial to place these features into categories. For example, we may wish to
classify the 1000 forest stands into four categories: young, immature, mature
and old growth. If we colored these categories with a green color ramp (light
green for young, through to dark green for old growth), we could view a map and
easily discern the distribution of age classes (e.g. where the old growth or
the young plantations are).
Deciding to put map features into categories is easy. Deciding which features
go into which categories is harder. Several methods are outlined below.
Steps for placing features into categories:
a) First, put the features in ascending order according to the attribute of
interest (e.g. age).
b) Second, use one of the schemes below to determine the class break points:
· natural breaks: look for the biggest gaps
· quantile: number of values / number of classes
· equal interval: range of values (largest - smallest) / number of classes
· std. deviation (SD): the middle break point = the mean, then the classes
are determined by multiples of the SD (i.e. + or - 1 SD, then 2 SD; note that
you can also use a fraction of a SD: + or - 0.25 * SD, then 0.5 * SD, then
0.75 * SD, etc.)
Alternatively, the break points can be set manually: the user (you in this
case) decides where they are (e.g. young 0-20 yrs, immature 20-100 yrs,
mature 100-250 yrs and old growth >250 yrs).
This is often used when the categories are already predefined.
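As a minimal sketch of manual break points, the predefined age classes above can be applied with Python's bisect module. The names BREAKS, LABELS and classify_age are made up for this illustration, and the sketch assumes that a stand whose age falls exactly on a break point (e.g. 20 yrs) goes into the younger class, which the ranges above leave open.

```python
from bisect import bisect_left

# Upper bounds of the first three classes, taken from the example ranges.
# Anything older than 250 yrs falls into the last class.
BREAKS = [20, 100, 250]
LABELS = ["young", "immature", "mature", "old growth"]

def classify_age(age):
    """Return the age class label for a stand age in years."""
    # Assumption: an age equal to a break point stays in the younger class,
    # so bisect_left is used (250 is "mature", 251 is "old growth").
    return LABELS[bisect_left(BREAKS, age)]

for age in (5, 60, 250, 400):
    print(age, classify_age(age))
```

For example, classify_age(60) returns "immature" and classify_age(400) returns "old growth".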
The following descriptions are copied from the ArcMap Help facility.
Natural breaks: Classes are based on natural groupings inherent in the data.
ArcMap identifies break points by picking the class breaks that best group
similar values and maximize the differences between classes. The features are
divided into classes whose boundaries are set where there are relatively big
jumps in the data values.
Quantile: Each class contains an equal number of features. A quantile
classification is well suited to linearly distributed data. Because features
are grouped by the number in each class, the resulting map can be misleading.
Similar features can be placed in adjacent classes, or features with widely
different values can be put in the same class. You can minimize this distortion
by increasing the number of classes.
Equal interval: This classification scheme divides the range of attribute
values into equal-sized sub-ranges, allowing you to specify the number of
intervals while ArcMap determines where the breaks should be. For example, if
features have attribute values ranging from 0 to 300 and you have three
classes, each class represents a range of 100 with class ranges of 0-100,
101-200, and 201-300. This method emphasizes the amount of an attribute value
relative to other values, for example, to show that a store is part of the
group of stores that made up the top one-third of all sales. It's best applied
to familiar data ranges, such as percentages and temperature.
Standard deviation: This classification scheme shows you how much a feature's
attribute value varies from the mean. ArcMap calculates the mean values and the
standard deviations from the mean. Class breaks are then created using these
values.
-------------------------------------
Example
Classify the following data set (12 values)
into 4 classes according to natural breaks, quantile, equal interval & std.
deviation:
Raw Data
Feature | Value
A | 24
B | 78
C | 3
D | 15
E | 37
F | 34
G | 69
H | 54
I | 2
J | 20
K | 41
L | 30
Step 1: Ascending Order
Feature | Value
I | 2
C | 3
D | 15
J | 20
A | 24
L | 30
F | 34
E | 37
K | 41
H | 54
G | 69
B | 78
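Step 1 can be sketched in Python as follows. The names raw and ordered are illustrative only; the values are the 12 from the raw data table.

```python
# Raw data from the example: feature label -> attribute value
raw = {"A": 24, "B": 78, "C": 3, "D": 15, "E": 37, "F": 34,
       "G": 69, "H": 54, "I": 2, "J": 20, "K": 41, "L": 30}

# Sort the (label, value) pairs by value, smallest first.
ordered = sorted(raw.items(), key=lambda item: item[1])

for label, value in ordered:
    print(f"{label} | {value}")
```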
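Step 2: Determine Break Points & Place into Classes
Natural Breaks: look for the biggest jumps in value. In the sorted list the
three biggest jumps are 3 to 15 (a jump of 12), 41 to 54 (13) and 54 to 69
(15), so the class breaks fall at those points.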
Feature | Value | Class
I | 2 | 1
C | 3 | 1
D | 15 | 2
J | 20 | 2
A | 24 | 2
L | 30 | 2
F | 34 | 2
E | 37 | 2
K | 41 | 2
H | 54 | 3
G | 69 | 4
B | 78 | 4
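The "biggest jumps" rule can be sketched in Python as below. This is only the simple gap heuristic described in this handout, not necessarily the optimization that ArcMap itself performs, and the function name largest_gap_classes is made up for the illustration.

```python
values = [2, 3, 15, 20, 24, 30, 34, 37, 41, 54, 69, 78]  # sorted example data

def largest_gap_classes(sorted_values, n_classes):
    """Split sorted values at the (n_classes - 1) largest gaps."""
    # Pair each gap with the index of the value on its lower side.
    gaps = [(sorted_values[i + 1] - sorted_values[i], i)
            for i in range(len(sorted_values) - 1)]
    # Keep the positions of the biggest gaps, then restore ascending order.
    cut_after = sorted(i for _, i in sorted(gaps, reverse=True)[:n_classes - 1])
    classes, start = [], 0
    for i in cut_after:
        classes.append(sorted_values[start:i + 1])
        start = i + 1
    classes.append(sorted_values[start:])
    return classes

natural = largest_gap_classes(values, 4)
for number, members in enumerate(natural, start=1):
    print(f"class {number}: {members}")
```

With the example data this puts 2 and 3 in class 1, 15 through 41 in class 2, 54 alone in class 3, and 69 and 78 in class 4. Note how uneven the class sizes are compared with the other schemes.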
Quantile: there are 12 items to be placed in 4 classes.
12 / 4 = 3 items per class, therefore simply put the first 3 items into
class 1, the next 3 into class 2, etc.
Feature | Value | Class
I | 2 | 1
C | 3 | 1
D | 15 | 1
J | 20 | 2
A | 24 | 2
L | 30 | 2
F | 34 | 3
E | 37 | 3
K | 41 | 3
H | 54 | 4
G | 69 | 4
B | 78 | 4
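A minimal Python sketch of the quantile scheme follows. The function name quantile_classes is hypothetical, and the sketch assumes the number of values divides evenly by the number of classes, as it does here (12 / 4); how to handle a remainder is left open.

```python
values = [2, 3, 15, 20, 24, 30, 34, 37, 41, 54, 69, 78]  # sorted example data

def quantile_classes(sorted_values, n_classes):
    """Put an equal number of consecutive sorted values into each class."""
    if len(sorted_values) % n_classes != 0:
        # Assumption of this sketch: no remainder to distribute.
        raise ValueError("number of values must divide evenly into the classes")
    size = len(sorted_values) // n_classes  # items per class, 12 // 4 = 3
    return [sorted_values[i:i + size]
            for i in range(0, len(sorted_values), size)]

quantiles = quantile_classes(values, 4)
for number, members in enumerate(quantiles, start=1):
    print(f"class {number}: {members}")
```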
Equal Interval: the range of values is 78 - 2 = 76.
Divide this by 4 classes and you get an interval of 19.0.
Classes start at the lowest value (i.e. 2) and increment by 19:
class 1 goes from 2 to 21, class 2 then increments to 40, class 3 to 59 and
class 4 to 78.
Feature | Value | Class
I | 2 | 1
C | 3 | 1
D | 15 | 1
J | 20 | 1
A | 24 | 2
L | 30 | 2
F | 34 | 2
E | 37 | 2
K | 41 | 3
H | 54 | 3
G | 69 | 4
B | 78 | 4
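The equal interval arithmetic can be checked with a short Python sketch. The names upper_bounds and equal_interval_class are illustrative, and the sketch assumes that a value sitting exactly on a class boundary belongs to the lower class (none of the example values do).

```python
from bisect import bisect_left

values = [2, 3, 15, 20, 24, 30, 34, 37, 41, 54, 69, 78]  # sorted example data
n_classes = 4

lowest, highest = min(values), max(values)
interval = (highest - lowest) / n_classes  # (78 - 2) / 4 = 19.0

# Upper bound of each class: 21, 40, 59, 78
upper_bounds = [lowest + interval * k for k in range(1, n_classes + 1)]

def equal_interval_class(value):
    """Return the 1-based class number for a value."""
    # Assumption: a value equal to an upper bound stays in that class.
    index = bisect_left(upper_bounds, value)
    # Guard against floating-point rounding at the very top of the range.
    return min(index, n_classes - 1) + 1

for v in values:
    print(v, equal_interval_class(v))
```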
Standard Deviation: For this method there will be an even number of categories
(as there is an equal number of categories above and below the mean value). In
our case we will create categories based on whole increments of the standard
deviation (i.e. the mean plus or minus 1 SD, then 2 SD, etc. until the data is
all accounted for).
· The mean = 33.9 and this break point is indicated with the dark line in the
table below
· The SD = 23.9 and this determines the break points above and below the mean
o 33.9 + 23.9 = 57.8, so 33.9 to 57.8 = a class
o 57.8 + 23.9 = 81.7 (beyond our largest value), so 57.8 to 81.7 = a class
o 33.9 - 23.9 = 10, so 10 to 33.9 = a class
o 10 - 23.9 = -13.9 (beyond our lowest number), so 2 to 10 = a class
Feature | Value | Class
I | 2 | 1
C | 3 | 1
D | 15 | 2
J | 20 | 2
A | 24 | 2
L | 30 | 2
-------- mean = 33.9 --------
F | 34 | 3
E | 37 | 3
K | 41 | 3
H | 54 | 4
G | 69 | 4
B | 78 | 4
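The standard deviation classes can be reproduced with Python's statistics module, as a rough sketch. The SD of 23.9 used above matches the sample formula (dividing by n - 1), which is what statistics.stdev computes; the population formula would give about 22.9. The names sd_band and bands are made up for this illustration.

```python
import math
from statistics import mean, stdev

values = [2, 3, 15, 20, 24, 30, 34, 37, 41, 54, 69, 78]  # sorted example data

m = mean(values)    # about 33.9
sd = stdev(values)  # sample SD (n - 1), about 23.9

def sd_band(value):
    """Whole SDs from the mean: -1 is the first band below, 0 the first above."""
    return math.floor((value - m) / sd)

# Group the values by band, keeping them in ascending order.
bands = {}
for v in values:
    bands.setdefault(sd_band(v), []).append(v)

for band in sorted(bands):
    low, high = m + band * sd, m + (band + 1) * sd
    print(f"{low:6.1f} to {high:6.1f}: {bands[band]}")
```

This groups 2 and 3 in the lowest band, 15 through 30 just below the mean, 34 through 54 just above it, and 69 and 78 in the highest band.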