PostGIS ST_Within example and the Python equivalent

Seemingly simple questions like how many Starbucks are in New York or how many elementary schools are in my neighborhood can be answered by a simple GIS query. PostGIS provides the ST_Within function that will answer that geospatial question. PostGIS is a spatial extension of the Postgresql relational database. The PostGIS extension has many spatial queries built in that can answer different types of question. In this article, we will look at the ST_Within function. The syntax of ST_Within looks like this.

ST_Within(geometry A, geometry B)

The function determines rather geometry A is completely inside geometry B. The function returns True if that is the case. This can be used to count how many geometries are inside a large geometry. Let’s look at an practical example and see how it is used.

Example

Chicago data portal provides spatial data about the city. This example will determine how many bicycle racks are in each neighborhood. The bicycle racks and neighborhoods tables are already imported in the PostGIS database. You can find the steps on how to download bicycle racks and neigborhood in the heperlinks.

Bicycle racks in the City of Chicago

The map shows bicycle racks scatter across the city in each neighborhood. It is impractical and will be incorrect to count them all by hands. The fastest way is to write a database query that will count them for us. This guide assumes you have some basic knowledge writing SQL queries. You should go through a tutorial first if you’re unfamiliar with SQL.

Writing SQL in pgAdmin

First, we will construct a working SQL query in pgAdmin that calculates the number of bicycle racks within each neighborhood. To do that open up pgAdmin in your browser. You can learn more about how to set up a local PostGIS database here.

Once you have pgAdmin open, select the database that contains your data. Next, click on Tools and then click on Query Tool. This will bring up an empty text box tab where you can start writing your query.

Select the Query Tool to start writing queries

The next step of the process is constructing the SQL query. The following query selects the name of the neighborhood and counts how many racks are in each neighborhood. First, the query joins the bicycle rack with the neighborhood if the rack is within the neighborhood. Next, the query groups the neighborhoods together and count how many bicycle racks are within each. Finally, we can order by the number racks in a descending order.

SELECT
    neighborhood.community,
    COUNT(rack.rackid) AS rack_count
FROM boundary.neighborhood AS neighborhood
JOIN bicycle.rack AS rack
    ON ST_Within(rack.geom, neighborhood.geom)
GROUP BY neighborhood.community
ORDER BY COUNT(rack.rackid) DESC

Enter the above query in the text box, then hit F5, or click on the lightning button to run the query. You might have to adjust the schema, table, or field name. Then, you will see the following result. The output shows that West Town neighborhood has 172 bicycle racks, the loop has 127 racks and so on.

Result of the query

Using Shapely and GeoDataFrame to count points within polygons

Another way to calculate how many racks are within each community is to use a python library Shapely. Data do not always exist in PostGIS and it might be more trouble to load data into a PostGIS database just to perform basic spatial operations.

To determine how many points are within a polygon, we will use the within method from the Point object in Shapely. Here is an example of how to use it.

from shapely.geometry import Point, Polygon

coords = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
print(Point(0.5, 0.5).within(Polygon(coords)))
// True

From the example above, you can see that a Point and a Polygon object were constructed. The Point object has a method that takes a Polygon object as the argument. It returns True if it is within the Polygon and False otherwise. We can use this method to calculate how many racks are within each community.

The complete solution is hosted on Gitlab. First, we created two GeoDataFrame representing the polygons and points. Next, the code iterates through the polygons and points to determine if the point is within the polygon. If the point is within the polygon, we increase the count. Finally, at the end of each loop, the value is written to the polygon row. Here is a snippet of the main loop.

def points_within(self,
                  points,
                  out_col='within_count',
                  polygon_geom_col='geom',
                  point_geom_col='geom'):
    self.gdf[out_col] = 0
    for poly_index, data in self.gdf.iterrows():
        count = 0
        polygon_geom = data[polygon_geom_col]
        for point_index, data in points.gdf.iterrows():
            point_geom = data[point_geom_col]
            if point_geom.within(polygon_geom):
                count += 1
        self.gdf.loc[poly_index, out_col] = count

Finally, if we examine the result, it will look like this.

Result containing the count

Conclusion

There are different ways to find to answer to how many points are within polygons. In this post, we discussed using PostGIS ST_Within function and using a Shapely’s within method to find out the answer. Compare the two solution, using PostGIS’s ST_Within is much cleaner, require less code and the calculation was much quicker. Looping through GeoDataFrame is a much slower process and require more code to be written. However, if you need to integrate your analysis with other application, using the Python approach might be a better approach.

Share the knowledge

Leave a Reply

Your email address will not be published. Required fields are marked *