Simple w​a​y to use Python to download, modify, and import JSON into PostGIS

Location, location, location. It is important for businesses, organizations, planners to understand the utilize the spatial information that is collected. There are numerous functions in PostGIS that can allow for simple to advance spatial query. It is not always easy to convert between different data type. This guide will discuss how to download a regular JSON file (Not GeoJSON) from an API that contains longitude and latitude, modify the data and convert the data into PostGIS records.

The process in this guide will take place completely in Python. Using this process has the advantage of being repeatable and clearly documented what steps took place. Compared to manually downloading the data, clean-up the data, and import into PostGIS, it is harder to explain to someone else what exactly was done. Completing the process in code can also allow you to adapt the code written to a similar task later on.

Project

Script: You can reference this script during the tutorial

Requirement: Python 3 installed on your local machine

Required library for Python 3.6:

  • url lib.request (request web resource)
  • pandas (read JSON response from the web)
  • geopandas (store spatial data in dataframe)
  • shapely (create geometry type using Lat and Long)
  • geoalchemy2 (create PostGIS readable format for spatial data)
  • sqlalchemy (create engine to connect to PostGIS)
conda install -c conda-forge geopandas
conda install gdal=2.2.4
pip install geoalchemy2

// here are the lines that worked for me

Warning: Setting up the Python environment could be the most time consuming and frustrating part of the process. Conda is a python package manager that hopefully will make your life easier. It is almost impossible to set up a guide for setting up the environment since everyone’s is different. Google is your best friend if you run into issues.

Finding a data source

There are many data source that provides geospatial data. Many local, state and federal government have data portals that host data and allows its citizen to download them free of charge. Some of the examples including data.gov, census.gov. These data portals provide data ranging from transportation, demographics, housing, etc. There are also private companies that provide location data streaming, like Twitter and Google. For this guide, we will learn to get data from one of the government data portal using its API (Application Programming Interface). API provides programmers a way for systems to communicate with each other.

For this guide, we will use the City of Chicago Data Portal as an example. It contains numerous categories of data. Let’s examine the bicycle rack data in the Transportation category. Simply search for ‘bicycle racks’ will take you to the correct data set. Click on it and you will see the map displaying the bicycle map.

Bicycle racks map in Chicago

What we want to do is extract the data information. Instead of downloading it as a Shapefile or JSON file, we will create a script to query the data directly from Python. However, we will need to know where the resource is. Click on Export > Select SODA API. The link under API End Point is the linked we want to make a copy of.

SODA API End Point for Chicago Data Portal

Writing the Python Script

Requesting the Data from API

Create a new file with the following. The following codes use the urllib library to read the JSON data from the source. Pandas dataframe read the JSON response and create a Pandas dataframe from it.

Request JSON file from Chicago Data Portal API

You can run the script by opening up your terminal and change directory to where your script is located and use the following commands. You should see similar output like this in your terminal.

python json_postgis_tutorial.py
First 5 rows of the Dataframe

Modifying the Data and Create Geometry

Now our python object has a Pandas data frame from the JSON file. We will create a geometry column based on the information provided and modify the data. The data supply the longitude and latitude information and we will use it to create a Point geometry with the Shapely library. In the ‘create_gdf’ method, longitude and latitude column is grouped together and transform into a Sharply Point object. Then the Geopandas library can use the ‘geom‘ column to create a GeoDataFrame.

GeoDataFrame can be used to perform spatial operations on its own. To find out more about how it works, visit the official Geopandas website. Once the Geopandas dataframe is created. A method was included to remove columns of the GeoDataFrame by supplying a list argument. You can include more methods to manipulate the data before importing it into the PostGIS database. There are numerous methods you can use from the Pandas library.

Create GeodataFrame and clean up the data

Import Data into PostGIS

Now the data is cleaned and ready to be imported. We will use two different libraries to help us connect to the PostGIS database and import the data. These two libraries are sqlalchemy and geoalchemy2. Sqlalchymy establishes a connection with the PostGIS database and geoalchemy2 tells the database how to read our geospatial data.

HINT: Make sure the schema in the PostGIS database already exists and the user has the read and write privilege permission for that schema.

The ‘create_wkt_element’ function transform a geometry into a Well Know Text (WKT) and apply the projection of 4326 (WGS 84). The ‘transform_to_wkt’ method than uses the function created earlier to apply to all the rows in the GeoDataFrame.

Lastly, the ‘to_postgis’ method uses the engine created and connect to the PostGIS database and write the data.

Final Check

The script should run and import the data into the PostGIS database. To double check, open up pgAdmin and connect to the PostGIS database. You should see that under the ‘bicycle’ schema, there is a table name ‘chicago_rack’.

Double checking the data

Conclusion

In this guide, we have successfully converted a JSON file online into rows in our PostGIS database. We connected to a web API using Python. Then download the JSON data from the API. Clean up the data and then import it to the PostGIS. You can adapt the process to work with different type of spatial data or apply different cleanup methods that suite your needs. Leave a comment below if you have any suggestion!

Share the knowledge

Leave a Reply

Your email address will not be published. Required fields are marked *