Note that this site is currently in version 1.0.0-alpha. Some functionality may be limited.

6. Combining Data Through a Spatial Join

[Image: the process of combining data in a “spatial join”]

Since our neighborhood data are in two separate files, we’ll need to perform a “spatial join” to combine them into one file. During the spatial join we will add the data on percent black by neighborhood from our CSV to the neighborhood shapefile, which is a polygon layer. Remember that only vector data (points, lines, and polygons) and raster data (pixels) are geographic data. Also remember that vector data is typically stored as a shapefile, so when you have a shapefile, you know it contains geographic data. A CSV is not geographic data, even if it contains geographic information such as addresses. A CSV is a text file, so it needs to be combined with geographic data before it can be visualized on a map. This is why we will need to use a spatial join.

What’s a “Spatial Join”?

Well, I’m glad you asked because a spatial join is one of the most common GIS operations! There are two types of spatial joins—spatial join by attribute and spatial join by location. Both of them are ways that the mapping software will let you add data from one map layer or file to another map layer. A spatial join by attribute is used when you want to join non-spatial data, such as a text file, to spatial data, such as a shapefile. A spatial join by location is used when you want to join two layers of spatial data (e.g. a points layer to a polygon layer). Let’s say you are working with the map of the U.S. used in the introduction and you want to aggregate information at the city level (the point layer) to the level of the state (the polygon layer). For this you will use a join by location since you are comparing two layers with spatial data.
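To make the city-to-state example concrete, here is a minimal sketch of a join by location in plain Python. It is not how any particular mapping software implements the operation; the state rectangles, city coordinates, population values, and function names are all hypothetical, and the point-in-polygon check uses a simple ray-casting test.

```python
# Hypothetical toy data: each "state" is a polygon given as a list of
# (x, y) vertices, and each "city" is a point with a population attribute.
states = {
    "State A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "State B": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
cities = [
    {"name": "City 1", "x": 1.0, "y": 1.0, "population": 100},
    {"name": "City 2", "x": 3.0, "y": 2.5, "population": 250},
    {"name": "City 3", "x": 6.0, "y": 1.5, "population": 400},
]


def point_in_polygon(x, y, polygon):
    """Ray-casting test: flip 'inside' each time a rightward ray crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_by_location(points, polygons):
    """Sum the population of every point that falls inside each polygon."""
    totals = {name: 0 for name in polygons}
    for pt in points:
        for name, polygon in polygons.items():
            if point_in_polygon(pt["x"], pt["y"], polygon):
                totals[name] += pt["population"]
                break  # assign each city to at most one state
    return totals


state_totals = join_by_location(cities, states)
print(state_totals)
```

Notice that the match is decided purely by where each point sits relative to each polygon; no shared identifier column is involved.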

Since we are joining a CSV file (non-spatial data) with a shapefile (spatial data) we will need to use a join by attribute.

In order to do this, both files need to be at the same resolution (e.g. NYC neighborhood). A resolution is the scale at which the data is aggregated and displayed. Additionally, both files need to have a column with the same unique identifiers; this will serve as a key to match the two data files. When you work with data from the government, such as Census data, each geographical unit (e.g. each different neighborhood) will be given a unique identifier, so if both your data and your shapefile are from the government, the unique identifiers should match.
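The role of the key column can be sketched in a few lines of plain Python. This is only an illustration under assumed data: the `geo_id` column, the `pct_black` values, and the neighborhood names are hypothetical, and the list of dictionaries stands in for a shapefile's attribute table (the geometry is omitted).

```python
import csv
import io

# Hypothetical CSV contents: one row per neighborhood, keyed by "geo_id".
csv_text = """geo_id,pct_black
N01,12.5
N02,40.0
N04,8.3
"""

# Hypothetical stand-in for the attribute table of a polygon layer.
features = [
    {"geo_id": "N01", "name": "Neighborhood 1"},
    {"geo_id": "N02", "name": "Neighborhood 2"},
    {"geo_id": "N03", "name": "Neighborhood 3"},
]


def join_by_attribute(features, rows, key):
    """Attach each CSV row to the feature that has the same key value."""
    lookup = {row[key]: row for row in rows}
    joined = []
    unmatched = []
    for feature in features:
        merged = dict(feature)  # copy so the original feature is untouched
        row = lookup.get(feature[key])
        if row is None:
            # No CSV row shares this identifier, so nothing can be attached.
            unmatched.append(feature[key])
        else:
            merged.update({k: v for k, v in row.items() if k != key})
        joined.append(merged)
    return joined, unmatched


rows = list(csv.DictReader(io.StringIO(csv_text)))
joined, unmatched = join_by_attribute(features, rows, "geo_id")
print(joined)
print("Features with no matching CSV row:", unmatched)
```

Here the feature with identifier `N03` has no counterpart in the CSV, so it keeps its original attributes and is reported as unmatched. Values read from a CSV arrive as text, so numeric columns would still need converting before they can be mapped.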

Now that we have the concepts for what a spatial join means, we can move on to performing the spatial join on our data.

Challenges for lesson 6

Questions

A spatial join by attribute is used when you want to join which combination of layers:

Terms Used in Lesson

Can you define the terms below?

Polygon

a figure with three or more sides. In GIS, it usually refers to complex areas outlining lakes, city blocks, a set of buildings, or other complex features of the map, that can be outlined using …

Resolution

the scale at which the data is aggregated and displayed. For raster data this will be the size of the pixel. For vector data it is referring to the scale or mapping unit (e.g. neighborhood …

Spatial Join

a GIS operation that affixes data from one feature layer’s attribute table to another from a spatial perspective. Spatial joins begin by selecting a target feature and comparing it spatially to other feature layers. There …
