Data Quality
Terms
Error / uncertainty
= reliability ... correctness
"what is real" vs. "what you got"
attribute: description/ measurement
= "what"
realm of the specialist/ discipline
e.g. cruising sampling error
location / position
= "where"
emphasis of this lecture
common to all disciplines ... maps
"error/ uncertainty" in location
absolute vs. relative position
absolute - position w.r.t. the datum (e.g. GPS location ± 5 m)
relative - position w.r.t. other features
which is of greater concern?
e.g. mapping a wildlife tree to reserve
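A minimal sketch of the absolute vs. relative distinction, assuming made-up coordinates in metres (none of these numbers come from the lecture). If both features carry the same 5 m offset, absolute position is wrong but the distance between them is unchanged; if only one feature is shifted, the relative position suffers too.

```python
import math

def distance(p, q):
    """Straight-line distance between two (x, y) points, in metres."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

# Hypothetical "true" positions (illustration only).
tree_true = (1000.0, 2000.0)
reserve_corner_true = (1030.0, 2040.0)

# Assumed GPS offset of 5 m (3 m east, 4 m north).
shift = (3.0, 4.0)
tree_obs = (tree_true[0] + shift[0], tree_true[1] + shift[1])
corner_obs = (reserve_corner_true[0] + shift[0], reserve_corner_true[1] + shift[1])

# Absolute error: how far the mapped tree is from its true position.
absolute_error = distance(tree_true, tree_obs)

# Relative position when BOTH features share the same offset.
relative_true = distance(tree_true, reserve_corner_true)
relative_both_shifted = distance(tree_obs, corner_obs)

# Relative position when ONLY the tree is shifted (corner taken from another source).
relative_one_shifted = distance(tree_obs, reserve_corner_true)

print(absolute_error, relative_true, relative_both_shifted, relative_one_shifted)
```

For the wildlife tree, the question is usually whether it falls inside the reserve, so the relative position is what matters most.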
temporal
= "time"
is it still current? ... or did Jim log it ;-)
measurement scale
nominal
= "named", or categories
no comparison between features (Main Street vs. Commercial Drive)
ordinal
= categories that are "ranked" or "ordered"
can compare
captain vs. major vs. colonel vs. general
1) very poor - 2) poor - 3) "meh" - 4) good - 5) very good
but no set quantitative increments
major is not "twice as good as" captain
it's just a higher rank
likely biggest "jump" is from colonel to general
... NO MATH operations
interval
numeric values: intervals along a measured scale
zero does not = "nothing"
temperature is the classic example
add/subtract OK (ratios not meaningful)
15C & 30C
30C is 15C warmer
but it is not "twice as warm"
0C doesn't mean there is "NO" temperature
ratio
numeric values: intervals along a measured scale AND zero means zero
money, volume, trees/ha are examples
$5 & $10
$10 is $5 more than $5
$10 is twice as much as $5
... and $0 means I have NO $$
most of our forestry measures (and map measures) are RATIO
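A short sketch of why ratios work for ratio data but not for interval data. The Celsius-to-Fahrenheit conversion is standard; the variable names are just for illustration. The "twice as warm" ratio disappears as soon as the units change, while the $10 vs. $5 ratio survives a change to cents.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0

cool_c, warm_c = 15.0, 30.0

# Interval scale: differences are meaningful.
diff_c = warm_c - cool_c                      # 15 degrees C warmer

# The ratio depends on where the (arbitrary) zero sits.
ratio_c = warm_c / cool_c                     # 2.0 in Celsius ...
ratio_f = c_to_f(warm_c) / c_to_f(cool_c)     # ... but 86 / 59 in Fahrenheit

# Ratio scale: zero means "none", so the ratio survives a unit change.
ratio_dollars = 10.0 / 5.0
ratio_cents = 1000.0 / 500.0

print(diff_c, ratio_c, ratio_f, ratio_dollars, ratio_cents)
```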
primary vs. secondary data
primary ... YOU collected it (GPS and inventory tree measures)
secondary ... "second hand", someone else collected it (e.g. gov't)
quality
how good is the data you bought?
... the data you collected and passed on to someone else?
metadata
data about data
who: source of data
how: collection methods
where: extents
when: date
other: proj/ datum
there are standards for this (boring) stuff
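As a rough illustration only, the who / how / where / when / projection items above could be checked with a few lines of code. The field names and the example record are hypothetical and do not follow any particular metadata standard.

```python
# Hypothetical field names mirroring the list above (not from any standard).
REQUIRED_FIELDS = ("who", "how", "where", "when", "projection_datum")

def missing_metadata(record):
    """Return the required fields that are absent or empty in a metadata dict."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]

# Made-up record for a layer received second hand.
example_record = {
    "who": "field crew (hypothetical)",
    "how": "handheld GPS traverse",
    "where": "woodlot boundary extent",
    "when": "",  # date was never filled in
    # "projection_datum" is missing altogether
}

print(missing_metadata(example_record))
```

A layer that fails a check like this is hard to judge for quality, whether you bought it or passed it on.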
Resolution
= smallest feature mapped
raster -> pixel resolution
vector -> MMU (min. mapping unit, = smallest area mapped)
determined in data capture stage
in the field
inventory for Woodlot ... 2 ha
ecol. unit in a cut block ... 1/2 ha
satellite image resolution (= pixel size ... 100m ... 10m)
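A quick sketch relating the MMU examples (2 ha, 1/2 ha) to the pixel sizes mentioned (100 m, 10 m). The function name is made up; the only fact used is 1 ha = 10,000 m².

```python
def pixels_per_area(area_ha, pixel_size_m):
    """Number of square pixels needed to cover an area given in hectares."""
    area_m2 = area_ha * 10_000          # 1 ha = 10,000 square metres
    return area_m2 / (pixel_size_m ** 2)

woodlot_100m = pixels_per_area(2, 100)      # 2 ha unit on 100 m pixels
woodlot_10m = pixels_per_area(2, 10)        # 2 ha unit on 10 m pixels
ecol_unit_100m = pixels_per_area(0.5, 100)  # 1/2 ha unit on 100 m pixels
ecol_unit_10m = pixels_per_area(0.5, 10)    # 1/2 ha unit on 10 m pixels

print(woodlot_100m, woodlot_10m, ecol_unit_100m, ecol_unit_10m)
```

A 1/2 ha ecological unit covers only half of a 100 m pixel, so it cannot be represented at that resolution; at 10 m it spans 50 pixels.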
Completeness
project not yet finished
data unavailable (private vs. gov't; different country - Sorghum)
data not collected "at that scale" (e.g. towns <20,000 pop'n)
Consistency
"uniformity" ... collected in consistent manner
e.g. lake bdy
Compatibility
relates to combining data sets
is the new map layer compatible with the rest of the map layers?
scale & consistency are usually the determining factors
combining 2 maps ... at best, the error of the result is that of the map w/ the greatest error
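A minimal sketch of the "at best" rule for combined maps. The best case is taken straight from the point above (the larger of the two errors). The root-sum-square estimate is an added assumption, a common way to combine independent errors, and is not something these notes claim. The 5 m and 20 m figures are made up.

```python
import math

def best_case_error(errors_m):
    """Best case: the combined map is no better than its worst input."""
    return max(errors_m)

def root_sum_square_error(errors_m):
    """Assumed estimate for independent errors (not from the lecture)."""
    return math.sqrt(sum(e ** 2 for e in errors_m))

layer_errors = [5.0, 20.0]   # hypothetical positional errors in metres

best = best_case_error(layer_errors)
rss = root_sum_square_error(layer_errors)

print(best, round(rss, 2))
```

Either way, the combined product is never more accurate than its least accurate layer.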
Applicability
refers to appropriateness of data to a situation
elevation & snow free days?
elevation & income?
Sources of Error (Uncertainty)
Data Modeling
the forest
perspective
ecologist
wildlife manager
forester
climate specialist (global warming)
environmentalist/ preservationist
mtn. biker
3 possible views (spatial entities)
individual trees = points
stands of similar trees = polygons
attribute (height) = continuous variable = surface
perception (needs) dictate
conceptual model used (discrete vs. field; ... point, polygon, surface)
how forest is measured
wildlife manager = preservationist = forester ???
logical model
raster
resolution sets limits for feature size
no explicit topological relations (adjacency, connectivity, containment) stored
vector
forces 'hard boundaries'
hard to model "fuzzy boundaries" (remember "wetness")
Original Data Capture
data sources
field survey
instrument error
technique
GPS
MMU
satellite images - resolution, 30m or 1m
"nature of the boundary"
distinct (forest/ agriculture)
gradual (change in forest up hill ...)
consistency of capture
density of observations
spot hts (peaks/ valleys?)
enough plots to meet sampling error
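To make "enough plots to meet sampling error" concrete, here is a sketch using a widely taught sample-size formula, n = (t × CV / E)², with CV and E in percent. The formula, the t-value of 2, and the CV and error figures are assumptions for illustration; they are not given in these notes.

```python
import math

def plots_needed(cv_percent, allowable_error_percent, t_value=2.0):
    """Approximate number of plots for a target sampling error.

    Uses n = (t * CV / E)^2, an assumed textbook formula for a large
    population; t_value=2.0 is a rough stand-in for ~95% confidence.
    """
    if allowable_error_percent <= 0:
        raise ValueError("allowable error must be positive")
    n = (t_value * cv_percent / allowable_error_percent) ** 2
    return math.ceil(n)

# Hypothetical stand: coefficient of variation 40%.
n_10_percent = plots_needed(40, 10)   # target error +/- 10%
n_5_percent = plots_needed(40, 5)     # target error +/- 5%

print(n_10_percent, n_5_percent)
```

Halving the allowable error quadruples the number of plots, which is why density of observations is a cost decision as much as a quality one.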
Data Encoding
paper map
original drafting errors
digitizing
poor tracing
hardware limitations
Data Integration
raster - vector conversion (V -> R -> V)
sliver polygons (integrate soils and veg. map)
edge-matching
align features that cross adjacent mapsheets
rubbersheeting
stretch/shrink common features when combining in overlay fashion
e.g. city boundary = outer lines of all subdivisions
e.g. lake boundary when combining wetland and stream maps
Analysis
vector overlays ... can produce slivers
"estimators": spatial interpolation, density, etc.
diff. methods (IDW vs. spline)
diff. input parameters (power 2 or 3, search area, etc.)
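A bare-bones inverse distance weighting (IDW) sketch showing how the input parameters listed above change the answer. This is a simplified illustration, not the implementation in any GIS package; the sample points, the function signature, and the `search_radius` parameter are made up.

```python
import math

def idw(samples, target, power=2, search_radius=None):
    """Estimate a value at `target` from ((x, y), value) samples using IDW."""
    numerator = 0.0
    denominator = 0.0
    for (x, y), value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return value                      # exactly on a sample point
        if search_radius is not None and d > search_radius:
            continue                          # outside the search area
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    if denominator == 0:
        raise ValueError("no samples inside the search radius")
    return numerator / denominator

# Hypothetical spot heights: ((x, y), elevation).
samples = [((0, 0), 10.0), ((10, 0), 20.0), ((0, 10), 40.0)]
target = (2, 0)

est_power2 = idw(samples, target, power=2)
est_power3 = idw(samples, target, power=3)
est_radius9 = idw(samples, target, power=2, search_radius=9)

print(est_power2, est_power3, est_radius9)
```

Same data, same location, three different estimates: a higher power pulls the result toward the nearest sample, and a smaller search area drops distant samples entirely.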
Output
generalization
why ... scale / audience
examples
simplification
smoothing
aggregation (combination)
exaggeration
displacement
... can result in positional inaccuracies
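One way to see how simplification causes positional inaccuracy is a line-simplification sketch. The Douglas-Peucker algorithm is used here as an assumption, since it is a well-known method; these notes do not say which method any particular software applies. The coordinates and tolerance are made up.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))                 # clamp to the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    """Douglas-Peucker: drop vertices closer than `tolerance` to the trend line."""
    if len(points) < 3:
        return list(points)
    max_d, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > max_d:
            max_d, index = d, i
    if max_d <= tolerance:
        return [points[0], points[-1]]
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right                  # avoid duplicating the split vertex

# Hypothetical digitized line, coordinates in metres.
line = [(0.0, 0.0), (10.0, 2.0), (20.0, 0.0), (30.0, 15.0), (40.0, 0.0)]

simplified = simplify(line, tolerance=5.0)
# How far the dropped vertex now sits from the simplified line.
displacement = point_segment_distance((10.0, 2.0), (0.0, 0.0), (20.0, 0.0))

print(simplified, displacement)
```

The dropped vertex is no longer on the mapped line; its offset is positional error introduced purely by the output step.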
map scale
pen width often ~1mm thick ...
1mm on a 1:20,000 map?
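The pen-width question can be worked out directly: ground distance = map distance × scale denominator. A small sketch (the function name is illustrative):

```python
def ground_width_m(line_width_mm, scale_denominator):
    """Ground distance in metres covered by a line of given width on the map."""
    return line_width_mm * scale_denominator / 1000.0   # mm -> m

width_1_20k = ground_width_m(1, 20_000)     # 1 mm pen line on a 1:20,000 map
width_1_250k = ground_width_m(1, 250_000)   # same pen line on a 1:250,000 map

print(width_1_20k, width_1_250k)
```

So a 1 mm line on a 1:20,000 map covers 20 m on the ground, which is already larger than the ± 5 m GPS error quoted earlier.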