Seemingly simple questions like how many Starbucks are in New York or how many elementary schools are in my neighborhood can be answered by a simple GIS query. PostGIS provides the ST_Within function that will answer that geospatial question. PostGIS is a spatial extension of the Postgresql relational database. The PostGIS extension has many spatial queries built in that can answer
ST_Within(geometry A, geometry B)
The function determines rather geometry A is completely inside geometry B. The function returns True if that is the case. This can be used to count how many geometries are inside a large geometry. Let’s look at
Chicago data portal provides spatial data about the city. This example will determine how many bicycle racks are in each neighborhood. The bicycle racks and neighborhoods tables are already imported in the PostGIS database. You can find the steps on how to download bicycle racks and
The map shows bicycle racks scatter across the city in each neighborhood. It is impractical and will be incorrect to count them all by hands. The fastest way is to write a database query that will count them for us. This guide assumes you have some basic knowledge writing SQL queries. You should go through a tutorial first if you’re unfamiliar with SQL.
Writing SQL in pgAdmin
First, we will construct a working SQL query in pgAdmin that calculates the number of bicycle racks within each neighborhood. To do that open up pgAdmin in your browser. You can learn more about how to set up a local PostGIS database here.
Once you have pgAdmin open, select the database that contains your data. Next, click on Tools and then click on Query Tool. This will bring up an empty text box tab where you can start writing your query.
The next step of the process is constructing the SQL query. The following query selects the name of the neighborhood and counts how many racks are in each neighborhood. First, the query joins the bicycle rack with the neighborhood if the rack is within the neighborhood. Next, the query groups the neighborhoods together and count how many bicycle racks are within each. Finally, we can order by the
SELECT neighborhood.community, COUNT(rack.rackid) AS rack_count FROM boundary.neighborhood AS neighborhood JOIN bicycle.rack AS rack ON ST_Within(rack.geom, neighborhood.geom) GROUP BY neighborhood.community ORDER BY COUNT(rack.rackid) DESC
Enter the above query in the text box, then hit F5, or click on the lightning button to run the query. You might have to adjust the schema, table, or field name. Then, you will see the following result. The output shows that West Town neighborhood has 172 bicycle racks, the loop has 127 racks and so on.
Using Shapely and GeoDataFrame to count points within polygons
Another way to calculate how many racks are within each community is to use a python library Shapely. Data do not always exist in PostGIS and it might be more trouble to load data into a PostGIS database just to perform basic spatial operations.
To determine how many points are within a polygon, we will use the
within method from the Point object in Shapely. Here is an example of how to use it.
from shapely.geometry import Point, Polygon coords = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)] print(Point(0.5, 0.5).within(Polygon(coords))) // True
From the example above, you can see that a Point and a Polygon object were constructed. The Point object has a method that takes a Polygon object as the argument. It returns True if it is within the Polygon and False otherwise. We can use this method to calculate how many racks are within each community.
The complete solution is hosted on Gitlab. First, we created two GeoDataFrame representing the polygons and points. Next, the code iterates through the polygons and points to determine if the point is within the polygon. If the point is within the polygon, we increase the count. Finally, at the end of each loop, the value is written to the polygon row. Here is a snippet of the main loop.
def points_within(self, points, out_col='within_count', polygon_geom_col='geom', point_geom_col='geom'): self.gdf[out_col] = 0 for poly_index, data in self.gdf.iterrows(): count = 0 polygon_geom = data[polygon_geom_col] for point_index, data in points.gdf.iterrows(): point_geom = data[point_geom_col] if point_geom.within(polygon_geom): count += 1 self.gdf.loc[poly_index, out_col] = count
Finally, if we examine the result, it will look like this.
There are different ways to find to answer to how many points are within polygons. In this post, we discussed using PostGIS ST_Within function and using a