New Features and Enhancements in Cygnus+

Cygnus is the Telenav Mapping conflation tool. We use it a lot internally to compare approved external data sources with existing OSM data, but there is also a public version. We outlined how it works in an earlier blog post. In this post, I want to highlight some of the newer features in Cygnus. These new features are based on feedback from our team of Map Analysts, who use the tool in their day-to-day work.

Discarding Very Short Segments

Cygnus outputs the differences in geometry between existing OSM data and the spatial data that we want to use to improve OSM. When those differences were tiny, Cygnus used to export very short ways. These are not meaningful enhancements, and they clutter the result data, so we implemented a length filter: ways shorter than a defined length threshold are not included in the output. Based on experience, we set the default to 5 meters. In the internal (command line) version our team uses, this can be tweaked using a parameter. In the public web version, this is not yet possible; we can consider adding it if there is sufficient demand.
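To make the idea concrete, here is a minimal sketch of such a length filter. It is not the Cygnus implementation: the function names are made up for this post, ways are represented simply as lists of (lat, lon) pairs, and length is measured with the haversine formula, which is only one of several reasonable choices.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def way_length_m(coords):
    """Length of a way, summed over its consecutive node pairs."""
    return sum(haversine_m(p, q) for p, q in zip(coords, coords[1:]))


def drop_short_ways(ways, min_length_m=5.0):
    """Keep only ways at least min_length_m long (5 m mirrors the default above)."""
    return [w for w in ways if way_length_m(w) >= min_length_m]
```

A way of roughly one meter would be dropped by this filter, while a way of a hundred meters would pass through unchanged.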

An example of Cygnus in action. It finds an opportunity for improvement (possibly incorrect street name) as well as a false positive (degraded road geometry)

Road Names

When comparing roads, Cygnus compares not only geometry but also road names. An annoying side effect we noticed is that road names are often not exactly the same in OSM as they are in the external data we compare with. This does not mean that the external data is necessarily better. For example, OSM could say that the name of a road is “River Road”, and the external data source could say it is “River Rd”. This is not a meaningful difference, and in most cases we want to exclude it. So we added a string-distance-based threshold to Cygnus that filters out name pairs that are nearly the same. It is set to a sensible default which, again, can be tweaked in the command line version we use internally, but not yet in the web version.
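As an illustration only, a name filter along these lines could look like the sketch below. The post does not say which string distance Cygnus uses, so this stands in Python's difflib similarity ratio for it, and the 0.8 threshold and the function names are assumptions of mine, not Cygnus defaults.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity between two road names, from 0.0 (different) to 1.0 (identical)."""
    # Compare case-insensitively and ignore runs of whitespace.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def names_similar(a, b, threshold=0.8):
    """True when the names are close enough that the difference is not worth reporting."""
    # The 0.8 threshold is a hypothetical value chosen for this example.
    return name_similarity(a, b) >= threshold
```

With this sketch, “River Road” and “River Rd” count as similar and would be filtered out, while “River Road” and “Main Street” would still be reported as a name difference.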

Another Cygnus improvement related to road names is to ignore name differences on certain types of ways: roundabouts and service roads. Roundabout ways in OSM do not have names by convention, unless the roundabout itself has a name, so names should generally not be added to them. Service roads technically can have names in OSM, but it is not common. In external data, they do sometimes have names, but if they do, it usually does not make sense to add them to OSM. Based on our experience, they often have descriptive names like ‘driveway’ or ‘access road’ in the source data.
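A rule like this is easy to express on OSM tags. The sketch below assumes ways are available as plain tag dictionaries and uses the standard junction=roundabout and highway=service tags; the function name is hypothetical and this is not taken from the Cygnus code.

```python
def should_compare_names(tags):
    """Return False for ways whose name differences should be ignored."""
    if tags.get("junction") == "roundabout":
        return False  # roundabout ways are unnamed by convention
    if tags.get("highway") == "service":
        return False  # source names here tend to be descriptive, like 'driveway'
    return True
```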

Using Cygnus

You can use Cygnus yourself by going to http://cygnus.improve-osm.org/ and uploading your source data file. You need to do a fair amount of work to prepare the source data: translating the source attributes into valid OSM tags, and converting to OSM PBF. And always remember to consider carefully what you do with the result. Cygnus is not designed to be an automated import tool. Every suggested change should be manually reviewed.

Let us know how you have used, or would like to use Cygnus!

Cygnus – conflation at your fingertips!

This is a follow-up blogpost after the State Of the Map US 2017 conference held in Denver.

The process of conflation in GIS is defined as the act of merging two data layers to create one layer containing the features and attributes of both original layers.

Cygnus is a tool that compares external data with OSM, giving you a result file in JOSM XML format with all the changes. The comparison is made in a non-destructive way, so no OSM ways are ever deleted or degraded.

Workflow

NOTE – The license compatibility between the local data file and OSM has to be taken into account before adding anything in OSM. Also, please follow the OSM import procedures if you are planning to add external data to OSM.

First of all you need to have a shapefile with local data in WGS84 spatial reference. This shapefile has to be filtered in different ways, depending on the tags you want to compare. For example, if you want to compare oneways, make sure to have a flow-direction/oneway/etc. attribute in the shapefile.

Translation

The first thing to take care of is a proper attribute translation. I created a simple example for this exercise. I don’t want to get neck-deep in technical details, so the main focus remains the process as a whole. I kept the attribute information for this example straightforward:

In order to create an OSM file from this data, I wrote a simple translation file that will be used together with ogr2osm.
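For readers who have not written one before, a translation file for ogr2osm is a small Python module. The sketch below is not the file used for this post: the attribute names NAME, TYPE and ONEWAY and the type mapping are hypothetical and need to be replaced with whatever your shapefile actually contains. It relies on ogr2osm calling a module-level function named filterTags with the attributes of each feature.

```python
# simple_translation.py (illustrative sketch with hypothetical attribute names)

# Hypothetical mapping from source road types to OSM highway values.
HIGHWAY_BY_TYPE = {
    "residential": "residential",
    "collector": "tertiary",
    "arterial": "secondary",
}


def filterTags(attrs):
    """Translate one feature's shapefile attributes into OSM tags."""
    if not attrs:
        return
    tags = {}
    name = (attrs.get("NAME") or "").strip()
    if name:
        tags["name"] = name
    road_type = (attrs.get("TYPE") or "").strip().lower()
    # Fall back to highway=road when the source type is unknown.
    tags["highway"] = HIGHWAY_BY_TYPE.get(road_type, "road")
    if (attrs.get("ONEWAY") or "").strip().upper() in ("Y", "YES", "1"):
        tags["oneway"] = "yes"
    return tags
```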

Next, run the command below to obtain the OSM file.

python ogr2osm.py simple_streets.shp -t simple_translation.py -o simple_output.osm

Finally, I converted the OSM file to PBF using osmosis, because Cygnus requires a PBF file as input.

Cygnus goes to work!

Now that you have gone through the pre-processing of the local data file, we can offer it to Cygnus for processing. Note that your upload needs to be small-ish – the spatial extent needs to be smaller than 50×50 km and the file needs to be 20MB or smaller in size.
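If you want to check these limits before uploading, a rough pre-flight check could look like the sketch below. How Cygnus itself measures the extent is not documented here, so the approximation of the bounding box size, the reading of 20MB as 20 × 1024 × 1024 bytes, and the function names are all assumptions.

```python
import math

MAX_SIDE_KM = 50.0
MAX_BYTES = 20 * 1024 * 1024  # assumes "20MB" means 20 MiB
KM_PER_DEGREE = 111.32        # approximate length of one degree of latitude


def bbox_size_km(min_lat, min_lon, max_lat, max_lon):
    """Approximate (width, height) of a bounding box in kilometers."""
    mid_lat = math.radians((min_lat + max_lat) / 2)
    height = (max_lat - min_lat) * KM_PER_DEGREE
    # Degrees of longitude shrink with the cosine of the latitude.
    width = (max_lon - min_lon) * KM_PER_DEGREE * math.cos(mid_lat)
    return width, height


def fits_cygnus_limits(bbox, size_bytes):
    """True when both the extent and the file size are within the stated limits."""
    width, height = bbox_size_km(*bbox)
    return width < MAX_SIDE_KM and height < MAX_SIDE_KM and size_bytes <= MAX_BYTES
```

The file size can be read with os.path.getsize on your PBF file; the bounding box has to come from your own data.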

The interface of the Cygnus service is very simple – there are just two pages:

  • the home page where you add new jobs
  • the job queue page where you can see your progress and download the result

If your input file was uploaded successfully, Cygnus will go to work. Your job will be added to the back of the queue. When it’s your turn, Cygnus will read your PBF file, and download the OSM data for the same extent, using Overpass API. It will then compare your upload with the existing OSM data and produce the output file that you can download from the job queue.

NOTE – Everyone’s jobs are listed here, so be careful not to touch other users’ stuff.

Process the output in JOSM

Once Cygnus gives us the output, we can open it in JOSM and inspect it. This is by far the most important, and time-consuming, step. Even though Cygnus makes a best effort to connect ways where needed, it acts conservatively, so it will not snap together ways that do not belong together.

Here are a few ways that got properly connected to the existing highway=secondary:

But there are situations where the distance was too great, so Cygnus did not snap:

In this case, you need to manually connect the ways if that is appropriate.

When you are finally satisfied with your manually post-processed conflation result, you can go ahead and merge it with the OSM data and upload it!

Telenav presence in State of the Map Latam 2017

It has been only two years since the Latin American OSM community decided it was necessary to have a regional event. The first event took place in Santiago de Chile and the second one in Sao Paulo, Brazil. At the end of the second edition, a few places were considering organizing the third edition. When I found out Lima was chosen I was very happy. Lima is one of those cities you never get tired of visiting, and you always discover new things, especially amazing new food 😉

This year Telenav had the chance to be a sponsor. Every State of the Map gathers different efforts that, combined, keep improving the data of the largest geospatial database in the world. Our sponsorship supported the attendance of Juliana Hernández (Peasant mappers- Tools for learn, teach and strengthen cartography) and Daniel Quisbert (Installing and configuring Offline OSM in Linux). Both talks were highly appreciated by the attendees. Mapbox and Telenav also co-hosted the first anniversary of Geochicas. It was a lovely evening in a historic bar in Lima, during which new Geochicas from Perú and Colombia joined the group. We hope to see more companies collaborating to reduce the gender gap in OSM. There are many ideas for what Geochicas will be doing in 2018; please check the full information here.

The keynote was given by Philipp Kandal (A journey with OSM from the past into the Future). In it, Philipp shared his experiences with OSM data over the years and how probe data and machine learning tools are improving the map tremendously.

For my part, I had the chance to show the mapping efforts in Canada and the preliminary mapping results from Ecuador, a pilot project we are currently developing to improve road data in one LATAM country. Using satellite imagery and OpenStreetCam data, we have been improving OSM road data there since September this year. Find the slides here. So far the results we have are the following:

Stay tuned for the updated results in the following months!

This is just the beginning of the collaborations Telenav would like to keep building with Latam communities, providing tools and guidance based on previous projects we have done.

Thanks to the organizing team and attendees of SOTM LATAM 2017. See you somewhere else in Latam in 2018!

Fire up the editors: ImproveOSM updated with many new things to fix in OSM

Our OSM team continually processes billions of anonymized GPS traces we receive through the Scout app and partners, in order to discover things potentially wrong or missing in OSM. We call this effort ImproveOSM, and it is a big part of Telenav’s overall mission to keep making OSM even better.

Missing Roads in Northern Brazil. The denser the GPS point cloud, the more trips and the more likely you are helping people get around more accurately!

Our most recent update to ImproveOSM was a particularly big one. In the last month, we added:

  • 133 thousand missing roads tiles
    • Another 75 thousand tiles that are likely parking areas or tracks
    • Another 670 thousand (!) water tiles (see below)
  • 300 thousand suspected turn restrictions with over 50% high confidence

Using ImproveOSM data

Perhaps you have not looked at ImproveOSM data before. It is available through the ImproveOSM web site, which is based on the iD editor. The screenshots on this page are from that web site. If you know how to edit with iD, you will find it easy to work with ImproveOSM data and use it to edit OSM. We wrote a post that goes into more detail a little while ago.

If you prefer JOSM, we have created an ImproveOSM JOSM plugin as well. It works similarly to the web site: you choose what ImproveOSM data you want to see (suspected missing roads, suspected wrong one-way roads, suspected missing turn restrictions, or all of the above!) and the plugin will show you the ImproveOSM data as a separate layer. We also have a blog post about using the JOSM plugin.

Finally, a few interesting / funny examples of ImproveOSM data around the world.

ImproveOSM data points out that a new road alignment is now in use. Aerial imagery and OSM have not been updated yet. This is in northern Sweden.

Here, we stumble upon an undermapped town north of Surat, India. Of course, there are un- and undermapped areas everywhere in the world, but the ImproveOSM data shows that there are people driving around on these streets using a GPS enabled app or vehicle — people who would benefit from better OSM data in their everyday lives. It is not hard to find places like this around the world.

Finally, an animation showing clusters of ‘water’ tiles. This is a side effect of the partner data we process. Since it’s anonymized, there is no way to say anything about why these traces exist. Useful for OSM? Perhaps. Interesting? I think so!

Are you finding interesting, useful, funny or wrong data in ImproveOSM? Let us know! Happy Mapping!

Is OpenStreetMap Big Data ready?

This article was written by Adrian Bona as a draft for a talk at State of the Map US in Boulder, Colorado this past month. The talk did not make it into the program, but the technology lives on as a central part of our OpenStreetMap technology stack here at Telenav. We will continue to deliver weekly Parquet files of OSM data. Adrian has recently moved on from Telenav, but our OSM team is looking forward to hearing from you about this topic! — Martijn

Getting started with OpenStreetMap at large scale (the entire planet) can be painful. A few years ago we were a bit intrigued to see people waiting hours or even days to get a piece of OSM imported in PostgreSQL on huge machines. But we said OK … this is not Big Data.

Meanwhile, we started to work on various geo-spatial analyses involving technologies from a Big Data stack, where OSM was used and we were again intrigued as the regular way to handle the OSM data was to run osmosis over the huge PBF planet file and dump some CSV files for various scenarios. Even if this works, it’s sub-optimal, and so we wrote an OSM converter to a big data friendly columnar format called Parquet.

The converter is available at github.com/adrianulbona/osm-parquetizer.

Hopefully, this will make the valuable work of so many OSM contributors easily available for the Big Data world.

How fast?

Less than a minute for romania-latest.osm.pbf and ~3 hours (on a decent laptop with SSD) for the planet-latest.osm.pbf.

Getting started with Apache Spark and OpenStreetMap

The converter mentioned above takes one file and not only converts the data but also splits it in three files, one for each OSM entity type – each file basically represents a collection of structured data (a table). The schemas of the tables are the following:

node
 |-- id: long
 |-- version: integer
 |-- timestamp: long
 |-- changeset: long
 |-- uid: integer
 |-- user_sid: string
 |-- tags: array
 |    |-- element: struct
 |    |    |-- key: string
 |    |    |-- value: string
 |-- latitude: double
 |-- longitude: double

way
 |-- id: long
 |-- version: integer
 |-- timestamp: long
 |-- changeset: long
 |-- uid: integer
 |-- user_sid: string
 |-- tags: array
 |    |-- element: struct
 |    |    |-- key: string
 |    |    |-- value: string
 |-- nodes: array
 |    |-- element: struct
 |    |    |-- index: integer
 |    |    |-- nodeId: long

relation
 |-- id: long
 |-- version: integer
 |-- timestamp: long
 |-- changeset: long
 |-- uid: integer
 |-- user_sid: string
 |-- tags: array
 |    |-- element: struct
 |    |    |-- key: string
 |    |    |-- value: string
 |-- members: array
 |    |-- element: struct
 |    |    |-- id: long
 |    |    |-- role: string
 |    |    |-- type: string

Now, loading the data in Apache Spark becomes extremely convenient:

val nodeDF = sqlContext.read.parquet("romania-latest.osm.pbf.node.parquet")
nodeDF.createOrReplaceTempView("nodes")

val wayDF = sqlContext.read.parquet("romania-latest.osm.pbf.way.parquet")
wayDF.createOrReplaceTempView("ways")

val relationDF = sqlContext.read.parquet("romania-latest.osm.pbf.relation.parquet")
relationDF.createOrReplaceTempView("relations")


From this point on, the Spark world opens and we could either play around with DataFrames or use the beloved SQL that we all know. Let’s consider the following task:

For the most active OSM contributors, highlight the distribution of their work over time.

The DataFrames API solution looks like:

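import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{count, year}
import org.apache.spark.sql.types.TimestampType
import sqlContext.implicits._

// The division by 1000 treats the stored timestamp as epoch milliseconds
val nodesWithTime = nodeDF
    .withColumn("created_at", ($"timestamp" / 1000).cast(TimestampType))
// Re-register the view so the SQL version below can use created_at as well
nodesWithTime.createOrReplaceTempView("nodes")

// The ten users who contributed the most nodes
val top10Users = nodesWithTime.groupBy("user_sid")
    .agg(count($"id").as("node_count"))
    .orderBy($"node_count".desc)
    .limit(10)
    .collect
    .map({ case Row(user_sid: String, _) => user_sid })

// Node counts per user and year, restricted to those ten users
nodesWithTime.filter($"user_sid".isin(top10Users: _*))
    .groupBy($"user_sid", year($"created_at").as("year"))
    .agg(count("id").as("node_count"))
    .orderBy($"year")
    .createOrReplaceTempView("top10UsersOverTime")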


The Spark SQL solution looks like:

select 
    user_sid, 
    year(created_at) as year,
    count(*) as node_count
from 
    nodes
where 
    user_sid in (
        select user_sid from (
            select 
                user_sid, 
                count(*) as c 
            from 
                nodes 
            group by 
                user_sid 
            order by 
                c desc 
            limit 10
        )
    )
group by 
    user_sid, 
    year(created_at)
order by 
    year


Both solutions are equivalent, and give the following results:

Even if we touched only a tiny piece of OSM, there is nothing to stop us from analyzing and getting valuable insights from it, in a scalable way.

If you are curious about more advanced interaction between OpenStreetMap and Apache Spark, take a look at this databricks notebook.

OpenStreetMap Parquet files for the entire planet?

Telenav is happy to announce weekly releases of OpenStreetMap Parquet files for the entire planet at osm-data.skobbler.net.

New ImproveOSM tiles are ready to be used!

New ImproveOSM missing road tiles are available! The new data is very helpful: it lets you target missing roads and add them to OSM, greatly improving the map.

Worldwide, there are 113048 new road tiles. The countries with the highest number of tiles are: Russia – 38669 tiles, Kazakhstan – 10993 tiles, India – 9418 tiles, United Kingdom – 8890 tiles and the United States – 7560 tiles (see graph below). There are a few new tiles in Detroit too, so you are welcome to give us a hand with them! You can find more information about our work in Detroit on our blog (http://blog.improve-osm.org/en/2017/08/lane-number-and-turn-lane-editing-in-detroit/).

 

Lane number and turn lane editing in Detroit

Since we started editing in Detroit, we have focused on making OSM navigation ready. We started with the basics (road geometry, road names, turn restrictions) and then built on this foundation by adding details like lanes and turn lanes. In the last four months, we focused on adding and updating the lane info (lane number and turn lane) on motorway, motorway_link, trunk, trunk_link, primary, primary_link, secondary and secondary_link roads in Detroit, Michigan.

For editing lanes and turn lanes we used JOSM, the TurnLanes-tagging Editor plugin and the Lane and road attributes map paint style.

We had two kinds of lane editing: unidirectional road editing and bidirectional road editing. The only difference between the two is the direction tag used in the second case, as you can see in the table below:

For every edited case, we used a simple workflow:

  • we split the way where the number of lanes changes
  • we checked and double checked the aerial imagery to make sure we enter the correct number of lanes and add the appropriate lanes tag
  • we opened the turnlanes-tagging plugin and activated the Lane and road attributes map style
  • using the plugin, we selected the type of the road: Unidirectional road or Bidirectional road
  • we marked the number of lanes for each way needed
  • we marked the direction on each lane
  • before uploading the data, we checked again that the turn lanes we had added matched the markings on the road!

The main cases we encountered during our edits, and how we handled them, are illustrated in the GIFs below.

Editing the number of lanes

Adding both ways lane

In some particular cases, when there were doubts, we consulted the OSM community on Github and Talk-US.

While editing, we paid special attention to other existing features (like route relations, turn restrictions, speed limits, etc). Because the whole Telenav Mapping team was involved in this project, we established some rules from the beginning, in order to keep our edits consistent:

  • Add a new lane only when you have a line marked on the road (use the satellite imagery and OSC photos to validate the marks).
  • Links without any marks on the road or without a oneway tag should be edited as a bidirectional road, adding one lane in each driving direction.
  • Never add the turn lane before or after the continuous line mark on the road. The turn lane is added starting from the beginning of the continuous line mark on the road.
  • We split and edit lane number even when we have small segments of ways.
  • The location of the junction nodes should be at the beginning of the continuous line marks.
  • We always add the yellow both way lane.
  • We DO NOT add the yellow striped lanes and double marked line lanes.

The main sources used during the project were aerial imagery (Bing, Mapbox, NAIP, Digital Globe) and street level imagery: OSC, Mapillary.

We worked on this issue for 2 months and managed to review a large part of the motorway, trunk, primary and secondary roads in the Detroit area, in order to add or update lane info. During this project, we reviewed 3100 miles and edited 1730 miles of roads.

Here’s how the number of miles of roads with lane information has increased during the project:

The edits we made cover a large area of the Wayne, Macomb and Oakland counties. In the GIF below you can see an evolution (difference between March and July) of our lane info edits in OpenStreetMap.

Heatmaps with our edits during the last four months:

When we finished editing lanes and turn lanes in Detroit, we started assessing the general quality of the lane info by using different approaches. Internally, we call this process quality assurance and we think it is vital to do it after the end of each project.

During the QA process we edited lane info on about 400 miles of roads, and the main issues that we corrected were:

  • incorrect number of lanes and turn lanes
  • duplicated/overlapping ways
  • missing both way lane
  • oneways with lanes:forward/lanes:backward info
  • roundabouts without the proper number of lanes
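Some of these checks lend themselves to automation. The sketch below is not one of our internal QA tools; it is a hypothetical example that takes the tags of a single way as a dictionary and flags two of the issues above: oneways carrying lanes:forward/lanes:backward, and a turn:lanes value whose number of |-separated entries does not match the lanes tag.

```python
DIRECTIONAL_KEYS = ("lanes:forward", "lanes:backward")


def lane_tag_issues(tags):
    """Return a list of human-readable lane tagging problems for one way."""
    issues = []

    # Directional lane counts make no sense on a oneway road.
    if tags.get("oneway") == "yes":
        found = [k for k in DIRECTIONAL_KEYS if k in tags]
        if found:
            issues.append("oneway with directional lane tags: " + ", ".join(found))

    # turn:lanes holds one entry per lane, separated by '|'.
    lanes = tags.get("lanes", "")
    turn_lanes = tags.get("turn:lanes")
    if lanes.isdigit() and turn_lanes is not None:
        listed = len(turn_lanes.split("|"))
        if listed != int(lanes):
            issues.append(f"lanes={lanes} but turn:lanes lists {listed} lanes")

    return issues
```

A check like this only finds candidates. Whether the lane count itself is right still has to be verified against imagery.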

Below you can see some examples of our improvements:

 

Find your MapRoulette Challenge

MapRoulette is a fun way to spend a few minutes (or hours…) improving OpenStreetMap. MapRoulette will present you with a random, easy to solve issue in OSM. MapRoulette is organized in ‘Challenges’, groups of tasks that are of the same nature. For example, there is a challenge to add missing crosswalks in various areas in Switzerland, based on analysis of aerial images.

How do you find a challenge you would like to work on? The MapRoulette home page provides a map of all the challenges, but this has some shortcomings. The challenge ‘centers’ are not always representative of where the tasks actually are located. It is also hard to search by topic. MapRoulette also has a search bar that you can use to find a challenge by keyword.

I want to work on making it much easier to find interesting MapRoulette challenges, and I would like to hear from you how you think that should work. Please add a comment below with your ideas!

In the meantime, I made a page that lists the most popular and newest challenges. It is a bit of a hack so let me know if it stops working 😉

Happy Mapping!

Help fix up TIGER v1 ways

Old, untouched TIGER ways are still abundant in OSM 🙁 and fixing them up seems to be an endless task.

ugh!

I don’t know why I didn’t do this before, but I finally got around to making a MapRoulette challenge so we can fix them together:

>> Go to the challenge <<

Because the number of old TIGER ways is huge, this challenge covers only a tiny part of the U.S. as you can see here:

Once this part is done, we can reload the challenge with more old TIGER ways.

If you look at the screenshot above, you can also see what the query is that goes into Overpass to create the challenge in the first place. You can easily adapt it to make your own local challenge if you want to start fixing up old TIGER ways with your local mapping friends! (Why not organize a TIGER fixing party? OSM US will pay for pizza!)

If you’re interested in the Overpass details and some ideas for improving it, keep reading. Otherwise, just start fixing! 

Query Overpass for old TIGER

Here is my extremely simplified way to query Overpass for old TIGER ways:

way[highway]["tiger:tlid"](40, -113, 41, -111);
out body geom qt;

It takes the bounding box (40, -113, 41, -111) and searches for ways that have the highway tag as well as the tiger:tlid tag. This query should be a pretty good approximation of a real old TIGER way query, because the tiger:tlid tag is removed automatically when you edit such a way in iD or JOSM. So any way that still has this tag must not have been edited since the import.

This query falls short of a real old TIGER ways query, because the nodes that make up the way may very well have been edited. I am also not 100% sure under which circumstances the editors remove the tiger:tlid and other unnecessary TIGER import tags. It may be safer to look for last edited date or version number. If you have suggestions for improvement, please let me know in the comments.
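One way to try the version number idea is to ask Overpass for JSON output with metadata and filter on the client side. The sketch below assumes the query is run with [out:json] and out meta, so that each returned element carries a version field; the function name and the sample IDs in the usage note are made up for illustration. It does not address the caveat about edited nodes, since moving a node does not change the version of the way.

```python
# The query from above, asking for JSON output and element metadata.
OLD_TIGER_QUERY = """
[out:json];
way[highway]["tiger:tlid"](40, -113, 41, -111);
out meta geom qt;
"""


def untouched_tiger_ways(elements):
    """Keep ways that still carry tiger:tlid and are still at version 1."""
    return [
        e for e in elements
        if e.get("type") == "way"
        and e.get("version") == 1
        and "tiger:tlid" in e.get("tags", {})
    ]
```

After sending the query to an Overpass server and parsing the response as JSON, pass the list under its "elements" key to untouched_tiger_ways. A way at version 4, for example, would be filtered out even if it still has the tiger:tlid tag.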

Happy mapping!

Improving OSM in Canada one day at a time

Ever since we started our mapping project in Canada, nearly 8 months ago, we’ve been continuously working on bringing the OSM data to a level where all elements needed for routing are as detailed as possible.

From the basics of road networks, such as geometry, naming or traffic flow direction, to in-depth details like number of lanes, turn lanes, turn restrictions, signposts and even complex relations referring to highways, we edit everything.

Our main focus is on the Top 5 metro areas: Toronto, Montreal, Ottawa, Vancouver and Calgary. These are the places where we spent most of our time researching open data, adding new features and editing existing ones. To make sure that OSM is in a navigation-ready state throughout Canada, we’ve also included the 50 largest cities by population.

So, let’s see some numbers and graphs, because everybody likes those. Looking at the numbers for the entire region, we can see a significant rise in road geometry: around 3% (25,330 miles) of the total number of miles was added. The same goes for roads that previously did not have name tags, with a rise of a little over 3.5% (16,799 miles).

A more significant change can be noticed for features that weren’t extensively mapped before in the area, such as turn restrictions, rising from 5254 to 54891, or signposts, which hadn’t been mapped under the same standardized method. With the help of OpenStreetCam and Mapillary pictures, we’ve managed to add relevant signpost information, increasing the number of nodes by well over 68%.

If we break down the numbers for the Top 5 areas, the most noticeable changes can be observed for both Toronto and Montreal where oneway tags and signpost information have been improved.

One of our main goals is to focus not only on quantity but especially on quality. This is why we have multiple integrity-checking tools that are run periodically on the entire region of Canada. These tools cover a wide variety of cases that are corrected weekly, such as road name flip-flops, unconnected ways, smoothness problems, misnamed roads, road names with abbreviated suffixes or prefixes, and many more.

We make use of different QA tools (KeepRight/Osmose) to search and track issues in OSM that have either been added by mistake or have remained unedited after large imports. We’re also on the lookout for opportunities to improve way accuracy and fix alignment issues.

An overview of our edits.

Below you can see some examples of our improvements.

Road geometry updates.
Road geometry alignment.
Missing geometry and minor refinements.
Turning loops updates.