Introduction to Database (for GIS)

 

 

What is a Database?

A database is simply a collection of related data that is organized.  This may come in the form of a set of index cards (e.g. a recipe collection), an ordered list (e.g. a phone book), or a set of tables in the computer (e.g. student records at VIU).

 

Regarding ArcGIS: descriptive data for mapped features are contained in attribute tables.

  

What is a DBMS?

A DBMS is a database management system – it is the software to store/ organize, update & retrieve data from a database.  Note that they are separate entities – the data is separate from the software.  Examples of DBMS’s include: dBase IV, FoxPro,  MS Access, and Oracle.

 

Basics of a Database Table

  1. Record (tuple)

  2. Field Name

  3. Field

  4. Unique identifier

  5. Primary key / key field / foreign key

  6. Field types

 

Understanding the basics of the descriptive data in a GIS is fairly straightforward.  It is organized in tables (see below).  Each table contains data about a collection of related features, e.g. contact information for students. 

  1. All of the information about one feature in the table is called a record - records happen to be arranged in rows.
     

  2. Categories of information are arranged in columns.  A column heading (Name) is called a field name in database jargon.  Sometimes the whole category is referred to as a field.
     

  3. A value is a single piece of information.  For example, “321-1234” is a field (piece of information) – it is in fact the phone number for John.   Sometimes a single piece of information is called a field ...

    ... So yes, there is some ambiguity regarding the term field as it can refer to the category of data OR an individual data entry
     

  4. One of the fields (usually the first or second column) contains a unique identifier for each record (e.g. your SIN number for the Canadian government, or your Student Number for VIU).  The unique identifier (Student ID) for Kramer is 99 ˝. 
     

  5. The field that contains unique identifiers is known as the key field or primary key.  In Table 1 the Student ID would be the key field (note that there are two students named George, so "Name" could not be the key field).  A foreign key is a corresponding field in a second data table - discussed later.

Table 1: Basic Student Data

Student ID

Name

Phone

Program

39

John

321-1234

Music

45

Paul

321-2345

Music

56

George

321-3456

Music

88

Ringo

321-4567

Music

89

Jerry

555-1111

 Fine Arts

99 ˝

Kramer

555-1111 (lv msg with Jerry)

Space

123

George

555-5959

Undeclared

119

Elaine

555-9876

Arts

200

Doug

111-2231

Forestry

 

 

Data Field Types:

  1. Text (String) – contain alpha-numeric entries, like names, address, phone numbers & Student ID – note that phone numbers & Student ID’s seem like numbers, but mathematical operations make no sense

  2. Integer – are whole numbers (with no decimal) and can be of two types:

    1. short integer - stores up to 4 digits with NO decimal places (e.g. 9876)

    2. long integer - stores up to 9 digits with NO decimal places (e.g. 987654321)

  3. Real numbers - have a decimal portion and can be of two types:

    1. Floats (single precision) for up to 7 significant digits, e.g. 123456.7 and 12.34567

    2. Double (precision) for up to15 significant digits

  4. Date – date and/or time (mm/dd/yyyy, hh:mm:ss with AM/PM)

  5. Object (Blob) - [binary large object] can be nearly any type of digital file, often an image

 

Technical stuff regarding ArcGIS.  When you add a field to an attribute table you are asked for the type (as described above) and then, depending on the type, you will be asked to define some parameters:

  1. Text (String)
        Length is the maximum number of characters (e.g. for (250) 753-3245 the length would be 14, includes the space and special characters)

  2. Integer - short (4 digits) or long (9 digits)
        Precision there is NO need to specify precision for integers

  3. Floats & Double
        Precision is the maximum number of digits on both sides of the decimal place, including a "minus" sign
        Scale is the number of decimal places (digits to right of decimal)
        e.g. -123.456 has precision = 7 and scale = 3
        use Float if precision is 6 or less; use Double if precision is 7 or more

 

** Note, once you define the parameters of a field you cannot change them (i.e. if you create a Phone field and define it as Integer, you cannot later change it to Text.  You'd have to accept it as is or delete it and create it again).

 

 

Database Creation - 4 Steps:

  1. Needs analysis / Data investigation: "what is it that you want described?"  Determine the entities and attributes of interest (e.g. entities are students and attributes include contact information: name, address, phone; financial information: tuition, amt paid, amt owing; academic record: courses enrolled, grades, etc.).  Also determine the relationships (if any) with other entities of interest (e.g. the relation between students & department, or students & scholarships).
     

  2. Logical design / Data model: determine the attribute data types (e.g. name is text, tuition is a real number) and the key fields required to define the relationships between the entities (e.g. Student ID will provide a link between several tables)  [Relationships between entities is described later.]
     

  3. Physical design / Database design: each database software program is different, thus how exactly you create the fields and tables will vary.  This is essentially sitting down at the computer and defining/creating the empty database tables.
     

  4. Database Implementation: do it!  Enter the data (or, in database jargon, "populate the database").  This also entails error checking and updates to keep current.

 

 

Basic Analyses: Sort & Query

A couple of basic functions you can do with database tables are sort and query.  Tables can be sorted in ascending (smallest ŕ largest) or descending (largest ŕ smallest) order.  Sorting simply means reordering the items in the table.  You simply pick the field and then sort – the records are then re-ordered, based on the values of your selected field of interest. 

 

Regarding ArcGIS: in order to sort an attribute table you first right-click on the field name.

 

A query, also known as a filter, is a “question” where the user sets criteria in the form of a logical expression and the result is only those records that meet the criteria.  For example, consider the data table containing contact information about students.  Fields include Name, Phone and Program.  A simple query for

    Program = Music

would yield a short list of students who are majoring in music. 

 

Note that a simple query has three parts:

  1. Field name (Program)

  2. Operator (=, also >, >=, <, <=, <>)

  3. Value (Music)

 

You can also set more than one criteria.  Boolean operators, such as AND, OR & NOT are used to combine simple queries.  Consider the following two queries:

Program = Music AND Name = George

Only students in music named George would be short listed, e.g. George Harrison

 

Program = Music OR Name = George

Then the short list would contain all the music students plus all the students named George – in this case you would have the Fab Four plus George Costanza (now there’s a group!).

 

Program = Music NOT Name = George

Then the short list would contain only Paul, John and Ringo

OR broadens the search since either of the criteria is sufficient

AND narrows the search since both criteria have to be met

NOT narrows the search by excluding records that meet the second criteria

 

GIS Context

In a GIS map layer each of the features (e.g. a forest stand, or a lake or a section of road) would have one record in the database table.  The GIS software automatically manages this feature-to-database association.  Every time we digitize a feature (e.g. a lake), the GIS software automatically adds

i) a new feature in the map base (i.e. the coordinates of the lake), and

ii) a new record to a database table (e.g. a new row in the Lake data table). 

So once we are finished digitizing 101 lakes in our Lakes map layer, there are 101 features in the map base, each with a corresponding row in the attribute table (thus we have 101 polygons and a table with 101 records).  Once the digitizing is done we can enter the descriptive data.  Lucky for us there often already exists a comprehensive database table for the features we just digitized.  With a database approach we can directly utilize this data.  This is done by relating tables.

 

 

Database Relations: Joining & Linking

*       Primary key field / secondary (foreign) key field

*      Relations: one-to-one, many-to-one, one-to-many

*      Destination table / (outside) source table

*      Relating tables: join / link

 

Most databases contain several data tables.  Consider a college database; it would likely contain several tables of data (refer to the four tables at the end of the document).  Some of these tables can be directly related to one another.  Relations are determined by having a field in common.  These fields are key fields, as they should contain unique identifiers for the features.  Refer to tables 1 and 2 below; both contain a field with student ID.  In this case, for every record in Table 1: Student Contact there is one record in Table 2: Student Finances.  However, note that the relationship between Table 1: Student Contact and Table 4: Scholarships is not a simple one-to-one relationship, as Paul has won two scholarships.

 

We will next discuss three relations between database tables:

*      One-to-One

*      Many-to-One

*      One-to-Many

 

These terms describe the relationship between the records in two tables.  One of the tables will be the table we are working with and the second table is considered ‘an outside source’.  The first table is called the destination and the second is called the outside source.

 

When we relate two tables we have to indicate the fields that have the corresponding unique identifiers (e.g. student ID or name).  We use the term primary key field for the destination table and secondary or foreign key field for the outside source table.  The data contained in the primary field and secondary field should correspond to one another.

 

Regarding ArcGIS: the attribute table for our map (data layer) is always the destination table.  And we want to bring in existing data from an outside source (i.e. "source table") to our map.

 

In a one-to-one relation, each record in the destination table corresponds to one (and only one) record in the outside source table.  For example, refer to Table 1 and Table 2. Both tables contain a field for Student_ID. 

 

In a many-to-one relation, many records in the destination table correspond to one record in the outside source table.  As an example consider the destination is Table 1: Student Contact Data and the source is Table 3: Departments.  There are typically many students in one program (department).  Note that the field names are different (Program vs. Department).  It isn't the field name that needs to match ... it is the data within the cells that need to match. 

 

In a one-to-many relation, one record in the destination table corresponds to many records in the outside source table.  This relation would occur if Table 1 was the destination and Table 4 was the source.  Note that Paul won two scholarships.

 

The last topic is that of “connecting” tables together.  When there is only one record in the outside source table that relates to a record in the destination table (e.g. one-to-one and many-to-one) then the records from the source table are appended to the end of the destination table.  That is, it appears as though you get one big table that has all of the fields in the destination table PLUS all the fields in the source table.  This type of connection is called a join.

 

However, when there is more than one record in the outside source table that relates to a single record in the destination table the tables cannot be joined.  (This would be like trying to park several cars (from an outside source) into one parking stall (the destination).  In our example, if we tried to join Table 4 (the source table) to Table 1 we would have an issue.  There are two scholarships that would have to be appended to Paul’s record – but there is only space.  So instead of joining the tables they remain separate.  But the software “knows” there is a connection … so after linking/relating the tables, if we click on Paul in Table 1 his record would be highlighted, but so would the two records in Table 4 (the MCA and Sony scholarships).  This type of connection is called a link or a relate

 

 

Data Tables

 

Table 1: Student Contact Data

Student ID

Name

Phone

Program

39

John

321-1234

Music

45

Paul

321-2345

Music

56

George

321-3456

Music

88

Ringo

321-4567

Music

89

Jerry

555-1111

Fine Arts

99 ˝

Kramer

555-1111 (leave msg. with Jerry)

Space

123

George

555-5959

Undeclared

119

Elaine

555-9876

Business

200

Doug

111-2231

Forestry

 

 

Table 2: Student Finances

Student ID

Name

Tuition

Amount Paid

Owing

39

John L

3000

3000

0

45

Paul M

3000

3000

0

56

George H

3000

3000

0

88

Ringo S

3000

3000

0

89

Jerry S

2500

2500

0

99 ˝

Cosmo Kramer

3600

5.57

3594.25

123

George C

3000

0

3000

119

Elaine B

2800

2000

800

200

Doug C

4000

4000

0

 

 

Table 3: Departments

Department

# of Faculty

Student Capacity

Students Enrolled

Music

7

122

120

Arts

39

2800

2654

Business

8

120

109

Forestry

6

46

39

 

 

Table 4: Scholarships

Scholarship ID

Name

Winner

A1

MCA

Paul

A2

MM$

John

A3

Sony

Paul

B1

Time-Warner

Elaine