Location, location, location. It is important for businesses, organizations, and planners to understand and utilize the spatial information they collect. PostGIS offers numerous functions that support everything from simple to advanced spatial queries, but converting between different data types is not always easy. This guide discusses how to download a regular JSON file (not GeoJSON) from an API that contains longitude and latitude values, modify the data, and convert it into PostGIS records.
The process in this guide takes place entirely in Python. This approach has the advantage of being repeatable and of clearly documenting every step. If you instead manually download the data, clean it up, and import it into PostGIS, it is much harder to explain to someone else exactly what was done. Completing the process in code also lets you adapt it to a similar task later on.
Script: You can reference this script during the tutorial
Requirement: Python 3 installed on your local machine
Required libraries for Python 3.6:
- urllib.request (request the web resource)
- pandas (read the JSON response from the web)
- geopandas (store spatial data in a dataframe)
- shapely (create geometry objects from latitude and longitude)
- geoalchemy2 (create a PostGIS-readable format for spatial data)
- sqlalchemy (create an engine to connect to PostGIS)
Here are the install commands that worked for me:

conda install -c conda-forge geopandas
conda install gdal=2.2.4
pip install geoalchemy2
Warning: Setting up the Python environment can be the most time-consuming and frustrating part of the process. Conda is a Python package manager that will hopefully make your life easier. It is almost impossible to write a universal setup guide since everyone's environment is different. Google is your best friend if you run into issues.
Finding a data source
There are many data sources that provide geospatial data. Many local, state, and federal governments have data portals that host data and allow citizens to download it free of charge; examples include data.gov and census.gov. These portals provide data on transportation, demographics, housing, and more. There are also private companies that provide streaming location data, such as Twitter and Google. For this guide, we will learn to get data from a government data portal using its API (Application Programming Interface). An API gives programmers a way for systems to communicate with each other.
For this guide, we will use the City of Chicago Data Portal as an example. It contains numerous categories of data. Let's examine the bicycle rack data in the Transportation category. Simply searching for 'bicycle racks' will take you to the correct data set. Click on it and you will see a map displaying the bicycle racks.
What we want to do is extract the data. Instead of downloading it as a Shapefile or JSON file, we will write a script that queries the data directly from Python. To do that, we need to know where the resource is. Click on Export > Select SODA API. The link under API End Point is the link we want to copy.
Writing the Python Script
Requesting the Data from API
Create a new file with the following code. It uses the urllib library to read the JSON data from the source; Pandas then reads the JSON response and creates a DataFrame from it.
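A minimal sketch of this step is below. The endpoint URL is a placeholder, not the real resource ID: copy your own from Export > SODA API on the data set's page.

```python
import json
import urllib.request

import pandas as pd

# Placeholder SODA API endpoint -- replace YOUR-RESOURCE-ID with the ID
# copied from the data set's Export > SODA API panel.
URL = "https://data.cityofchicago.org/resource/YOUR-RESOURCE-ID.json"


def fetch_json(url):
    """Download and decode the JSON response from the API."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def to_dataframe(records):
    """Build a Pandas DataFrame from the list of JSON records."""
    return pd.DataFrame.from_records(records)


# Usage:
# df = to_dataframe(fetch_json(URL))
# print(df.head())
```

Splitting the download from the DataFrame construction keeps each step easy to test and reuse.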
You can run the script by opening your terminal, changing directory to where the script is located, and running it with `python` followed by the script name. You should see output similar to this in your terminal.
Modifying the Data and Creating Geometry
Now our Python object holds a Pandas DataFrame built from the JSON file. We will modify the data and create a geometry column based on the information provided. The data supplies longitude and latitude, which we will use to create a Point geometry with the Shapely library. In the 'create_gdf' method, the longitude and latitude columns are combined and transformed into Shapely Point objects. The GeoPandas library can then use the resulting geometry column to create a GeoDataFrame.
A GeoDataFrame can perform spatial operations on its own; to find out more about how it works, visit the official GeoPandas website. Once the GeoDataFrame is created, a method removes columns from it based on a supplied list argument. You can include more methods to manipulate the data before importing it into the PostGIS database; the Pandas library offers numerous options.
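The two steps above can be sketched as follows. This assumes the JSON supplies 'longitude' and 'latitude' columns as strings, which matches the bicycle rack data set; adjust the column names if yours differ.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point


def create_gdf(df):
    """Combine the longitude and latitude columns into Shapely Points
    and return a GeoDataFrame. Assumes 'longitude' and 'latitude'
    columns containing numeric strings."""
    geometry = [
        Point(float(lon), float(lat))
        for lon, lat in zip(df["longitude"], df["latitude"])
    ]
    return gpd.GeoDataFrame(df, geometry=geometry)


def drop_columns(gdf, columns):
    """Remove unwanted columns before the import, given a list of names."""
    return gdf.drop(columns=columns)
```

Because `drop_columns` returns a new frame rather than mutating in place, you can chain additional clean-up methods before the import.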
Import Data into PostGIS
Now the data is cleaned and ready to be imported. We will use two libraries to help us connect to the PostGIS database and import the data: SQLAlchemy and GeoAlchemy2.
HINT: Make sure the schema already exists in the PostGIS database and that the user has read and write privileges on that schema.
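If the schema does not exist yet, you can also create it from the script. This is a sketch, not part of the original workflow: the connection string and schema name are placeholders, and it assumes your database user has the CREATE privilege.

```python
from sqlalchemy import create_engine, text


def ensure_schema(engine, schema):
    """Create the schema if it does not already exist.

    Requires the CREATE privilege on the database. engine.begin()
    commits the transaction automatically on exit.
    """
    with engine.begin() as conn:
        conn.execute(text(f"CREATE SCHEMA IF NOT EXISTS {schema}"))


# Usage (placeholder connection string):
# engine = create_engine("postgresql://user:password@localhost:5432/gisdb")
# ensure_schema(engine, "bicycle")
```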
The ‘create_wkt_element’ function transforms a geometry into Well-Known Text (WKT) and applies the projection 4326 (WGS 84). The ‘transform_to_wkt’ method then applies that function to every row in the GeoDataFrame.
Lastly, the ‘to_postgis’ method uses the engine we created to connect to the PostGIS database and write the data.
The script should run and import the data into the PostGIS database. To double-check, open pgAdmin and connect to the PostGIS database. You should see a table named ‘chicago_rack’ under the ‘bicycle’ schema.
In this guide, we have successfully converted a JSON file online into rows in our PostGIS database: we connected to a web API using Python, downloaded the JSON data from the API, cleaned it up, and imported it into PostGIS. You can adapt the process to work with different types of spatial data or apply it to similar tasks.