Using PostGIS to answer geodata questions

One of the biggest challenges when working with large sets of data is to find the least costly workflow that you have to follow in order to get the most accurate answers.

Let’s say you have a huge dataset composed of all sorts of geometry features (points, lines, areas etc.) and you want to do a bit of cleaning – because messy and redundant information is no fun!

So you might be thinking “Hmmm… which are the areas that have an unnecessary high density of points?”

The same issue can arise when working with OpenStreetMap data. This can be easily solved using PostGIS and a command line tool that we’ve created and using.

Note: The following steps require a Linux environment, Postgresql 9.x, PostGIS 2.x, Osmosis 0.43+, QGIS 2.12.2+

Getting the data

Download an *.osm.pbf file using command line:

 wget https://s3.amazonaws.com/metro-extracts.mapzen.com/san-francisco_california.osm.pbf

This is the metro extract for San Francisco, provided by Mapzen. Geofabrik is also a very good resource for OSM data extracts.

In the same folder, download SCOPE – databaSe Creator Osmosis Postgis loadEr.

wget https://github.com/baditaflorin/osm-postgis-scripts/blob/master/scope.sh

Make sure to set the file to be executable by using

chmod +x scope.sh

Load the data

Using SCOPE and following the instructions on the screen, load the *.osm.pbf into a database.

SCOPE automatically creates the database with hstore and postgis extensions and the pgsnapshot schema.

Play with the data

Now that you have the data set up, you can easily query it using the DB Manager from QGIS and some PostGIS scripts.

Interesting examples

For example, using the find_duplicate_nodes query, we can see that this building (@20.805088941495338, -104.92877615339032), appears on the same spot 23 times!

duplicate_building

The one next to it (@20.8054225, -104.9278152) appears 22 times!

duplicate_building_2

The node density for these areas (@20.4411867, -97.3172739) is too high – 168 nodes!

nodes1

Also, 171 nodes for a small fence segment (@46.7487683, 23.559687)!

fence

the-node-density

Feel free to fork the GitHub repository and modify the code to suit your needs! Also, if you feel insipred, you can suggest a better and shorter name or acronym for SCOPE!

Facebooktwittergoogle_plus

Author: mihai

Get in touch with me at twitter.com/ubermih