Importing Data

This document is primarily aimed at JOSM users, sorry. I prefer JOSM over iD or Potlatch (both of which are fine editors too), primarily because I frequently work offline. I also tend to work on large files, which the online editors don't handle well. Also note this is focused on using the source data files on this site, which have already been processed to some degree. That processing is a whole other tedious process I'll document elsewhere, and somebody else's conversion may come up with different results. I've included all the raw Shapefiles too, which JOSM can read in a limited fashion. To view the original Shapefile with all its metadata, I typically use my own utility program or QGIS. There's also a Python utility called shpinfo you can use. Many GIS applications support Shapefiles. I convert everything to OSM format mostly because that is the primary format all the tools use, including my own software.

Bing Building Footprints

While converting the Bing building footprints, I noticed the poor quality of the data. Microsoft used machine learning to identify buildings from satellite imagery. This appears to have problems with mountains, and many snow fields and large boulders got identified as buildings where there are none. I also found it incorrectly identifies big rocks in the desert as houses. Maybe this isn't a problem elsewhere, as it appears to happen primarily above treeline or in rocky areas. I found I had to zoom in far on each building in the data to be sure what it is. If you are also using the address data, it's a little easier, as you can always start by just adding buildings that have an associated address. As a reality check, I wrote a program that flags any building that is not near an existing address or within a reasonable distance of an official road as a possible error. Then I check the imagery to make the final determination.
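
The reality check described above can be sketched roughly like this. It is not the program mentioned above, just a minimal illustration under assumptions: buildings are reduced to a centroid, roads to a list of sample points, a building is flagged only when it has neither an address nor a road nearby, and the two distance thresholds are made-up values you would tune for your area.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_m(point, others):
    """Distance in meters from point to the closest of others (inf if none)."""
    return min((haversine_m(*point, *o) for o in others), default=math.inf)


def flag_suspect_buildings(buildings, addresses, road_points,
                           max_addr_m=50.0, max_road_m=200.0):
    """Return ids of buildings with neither an address nor a road nearby.

    buildings:   {id: (lat, lon)} centroids
    addresses:   [(lat, lon), ...]
    road_points: [(lat, lon), ...] points sampled along known roads
    The thresholds are hypothetical defaults, not values from this project.
    """
    suspects = []
    for bid, centroid in buildings.items():
        near_addr = nearest_m(centroid, addresses) <= max_addr_m
        near_road = nearest_m(centroid, road_points) <= max_road_m
        if not (near_addr or near_road):
            suspects.append(bid)
    return suspects
```

Anything this returns is only a candidate for review. The imagery still makes the final call.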

Conflating building footprints is relatively painful, as the data doesn't conflate easily. Subtle differences in the polygon used for the footprint break most automated conflating programs. In the rural areas I mostly work in, there are few, if any, building footprints in OSM already. Before importing anything I search for existing buildings and addresses so I have a rough idea of the existing data. In JOSM, I search for 'modified' after adding data, then validate the result. Often the validation will flag a building inside another building, which means it's a duplicate, catching the few buildings that failed conflating. JOSM is pretty good about using color as you select, paste, etc., so you can also get a good feel for whether you're duplicating a bunch of buildings. JOSM will also gray out background layers, which is a quick way to see differences between the layers. Either way, you have to be really careful importing building footprints.
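
The "building inside another building" check can be approximated outside JOSM too. The sketch below is not how the JOSM validator works internally. It is a simple stand-in that assumes each footprint is a list of (x, y) vertices and treats a new building as a likely duplicate when its vertex-average center falls inside an existing footprint. The function names are made up for this example.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rough_center(ring):
    """Average of the vertices, ignoring a repeated closing vertex."""
    pts = ring[:-1] if len(ring) > 1 and ring[0] == ring[-1] else ring
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))


def likely_duplicates(new_buildings, existing_buildings):
    """Return (new_id, existing_id) pairs where the new center is inside."""
    pairs = []
    for new_id, new_ring in new_buildings.items():
        cx, cy = rough_center(new_ring)
        for old_id, old_ring in existing_buildings.items():
            if point_in_polygon(cx, cy, old_ring):
                pairs.append((new_id, old_id))
    return pairs
```

Comparing centers rather than full polygons sidesteps the subtle vertex differences that break exact matching, at the cost of missing oddly shaped footprints.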

Building Addresses

Importing house address data is pretty easy. The data conflates reasonably well. One thing to note is that the address often isn't at the same location as the building. Often it's at the end of the driveway, or just random. When building footprints have already been imported, I often move the address node into the building way. Sometimes there are addresses with no building yet. One big problem with importing address data is the conversion process. Most address data that I've seen uses abbreviations heavily, whereas OSM has a different naming convention. I've written software to handle much of this cleaning up and normalizing of names. Some of these files contain additional data, like addr:full or addr:city. Those should be deleted before importing, as all OSM needs is the house number and street name for navigation to work. They are mostly included as an aid to validating and debugging the conversion process, and excluding them from any import is a good way to keep file sizes and storage requirements small. Address data is important to support navigation to house addresses, which is what the CAD messages we get from dispatch contain.
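
As a rough illustration of that cleanup (not the software mentioned above), the sketch below expands a leading direction and a trailing street-type abbreviation and drops the debugging tags. The abbreviation tables are a small assumed sample, and names like "St Vrain" or "McDonald" are the kind of edge case that still needs a human look.

```python
# Small illustrative tables; a real conversion needs a much longer list.
SUFFIXES = {"st": "Street", "rd": "Road", "ave": "Avenue", "dr": "Drive",
            "ln": "Lane", "ct": "Court", "hwy": "Highway"}
DIRECTIONS = {"n": "North", "s": "South", "e": "East", "w": "West"}
# Tags kept only for debugging the conversion, removed before import.
DROP_TAGS = {"addr:full", "addr:city"}


def expand_street(name):
    """Expand a leading direction and a trailing street-type abbreviation."""
    words = name.replace(".", "").split()
    # Source data is often all caps; only touch fully upper-case words.
    words = [w.capitalize() if w.isupper() else w for w in words]
    if len(words) > 1:
        if words[0].lower() in DIRECTIONS:
            words[0] = DIRECTIONS[words[0].lower()]
        if words[-1].lower() in SUFFIXES:
            words[-1] = SUFFIXES[words[-1].lower()]
    return " ".join(words)


def clean_address_tags(tags):
    """Return a copy of tags ready for import: no debug tags, full names."""
    cleaned = {k: v for k, v in tags.items() if k not in DROP_TAGS}
    if "addr:street" in cleaned:
        cleaned["addr:street"] = expand_street(cleaned["addr:street"])
    return cleaned
```

Only the first and last words are expanded on purpose, since an abbreviation in the middle of a name is too ambiguous to rewrite blindly.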

Addresses conflate reasonably well if all the street names have been normalized to the OSM naming conventions. Unlike buildings, which have to have their polygons compared to conflate, each address is usually unique, making it easy to find matches between OSM and the addresses being imported. If using the HOT Tasking Manager, export the boundary of the tile you're working on, and load that into JOSM. It will appear as a purple polygon. When conflating addresses, problems arise when the areas you are comparing aren't the same size. I use the search function in JOSM to limit the data I select, namely 'addr:housenumber'. To use the JOSM conflate plugin, select all existing addresses in the OSM data, and make that the reference dataset. Then switch to the layer containing the new address data, and select only the addresses inside the purple polygon. You can hand select the area, or there are various other ways of selecting only the data inside the polygon. Then when you generate the matches, the new addresses are under the Source tab. Select all the items under the Source tab, then right click to select only those addresses. Paste them into the OSM data layer, then validate and upload once any issues are fixed.
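
To show why normalized names make address matching easy, here is a minimal sketch that splits incoming addresses into new ones and ones already in OSM. It is not what the JOSM conflation plugin does internally. It assumes each address is a plain dict of tags and matches only on house number plus street name, and the function names are invented for the example.

```python
def address_key(tags):
    """Build a comparison key from house number and street name."""
    number = tags.get("addr:housenumber", "").strip().lower()
    # Collapse repeated whitespace and ignore case in the street name.
    street = " ".join(tags.get("addr:street", "").lower().split())
    return (number, street)


def split_new_and_matched(existing, incoming):
    """Split incoming addresses into (new, matched) against existing ones."""
    known = {address_key(tags) for tags in existing}
    new, matched = [], []
    for tags in incoming:
        if address_key(tags) in known:
            matched.append(tags)
        else:
            new.append(tags)
    return new, matched
```

An unexpanded "Ridge Rd" will not match "Ridge Road" here, so every abbreviated name shows up as a false "new" address. That is the reason to normalize before conflating.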

Importing Road Data

There are several ways of conflating data. Some are easy, but it's usually a tedious and time-consuming manual process. The most accurate process is slow, with multiple validation passes on the data. Much of the existing road data for rural areas came from a bulk import in 2008 of the US TIGER data, which is borderline crap, but better than nothing. Ignoring the issue of road alignment, the TIGER metadata is sometimes completely wrong when compared to other sources. And yes, it's often an abstract reality when it comes to road alignment. In my experience, the USDA road alignment is much better. Much of this type of import is just fixing existing data in OSM and correcting the metadata.

Manual Conflation with JOSM

I find I can do a lot of easy conflation just using Layers in JOSM. It's not the fastest way to do things, as it involves lots of tedious mouse clicking, but it is effective. The one big advantage is that most of the time you're able to do validation at the same time, because you're dealing with the raw metadata, and in small pieces. It's pretty simple. The background OSM Carto imagery already has the existing OSM data, of course. When viewing the OSM layer, data from other layers appears as a black line, so it's pretty easy to see what is missing. Something to be aware of is that the other layer's data may overlay the OSM data, so more often you find yourself extending a road to the actual end, rather than adding one. You can trace the black line from the other layer while in the OSM layer, which is usually easier than doing the same with satellite imagery. Of course you want to validate this extension by checking the satellite imagery as well. Lately I've been working with multiple layers this way to add the obscure 4wd-only roads we use for rescue and wildland fire access.

That covers the basic road alignment, but for emergency response, we prefer more metadata than that. Often one of the layers has more metadata than the others, and can be used as the reference source. For this project, I assume the data produced by the USDA and USGS is correct when it comes to road names and vehicle access options. At least around here, the OSM metadata is often missing the USFS designation. That road number is often used for directions, so it's important to add it. As a note, the OSM style guidelines say the road name should be prefixed with FS, instead of expanded into Forest Service Road. Many rural roads have a common name, a county designation, and a USFS one.
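
A small sketch of the FS prefix convention described above. The list of spellings it recognizes ("Forest Service Road", "Forest Service Rd", "FSR", "FS Road") is my assumption about what shows up in source data, not something taken from a specific dataset, so extend or trim it to match what you actually see.

```python
import re

# Leading spellings to collapse into the short "FS" prefix (assumed list).
_FS_PATTERN = re.compile(
    r"^\s*(?:forest\s+service\s+(?:road|rd)\.?|fs\s+(?:road|rd)\.?|fsr)\s+",
    re.IGNORECASE,
)


def to_fs_prefix(name):
    """Rewrite a leading Forest Service road spelling as 'FS <number>'."""
    return _FS_PATTERN.sub("FS ", name, count=1).strip()
```

Names that don't start with one of those spellings, including ones already using FS, pass through unchanged.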

The metadata we want is anything useful to somebody who hasn't been there before. Imagine yourself responding to a location in the dark, and you want to know if the road will support a big vehicle (fire engine), a smaller one (brush truck), or is only ATV/UTV accessible, or maybe only foot accessible. All this road data has crude ratings for road quality, surface, etc., all of which should be imported. Most Shapefiles have many fields we don't care about, and those should be ignored. Unfortunately the comparison of metadata between multiple sources is tedious, but I've yet to find an automated way to do this that didn't cause more trouble than it was worth.

When multiple files are loaded into JOSM, you can use the Layers window to toggle their visibility on and off. You can also use the Imagery menu to enable your choice of satellite imagery and the standard OSM Carto. With the Carto imagery as a background, any additional data not on the basemap will be visible. I then zoom into a smaller area to work. I create a new layer, then download the data for that view in JOSM, or use an existing and up to date OSM data file. If you do this offline, you need to update your changes in JOSM, revalidate, and then upload.

If a road doesn't exist at all, it's easy. Copy the object from the reference source to OSM if it has a compatible license. If it's only the metadata that is missing, then cut & paste the tags. Compare any existing tags with the reference metadata first, as there may be differences, and be careful when replacing any existing tags. If the road was traced from satellite imagery, chances are there is no metadata. Don't forget to simplify the way for any road you are adding, and don't forget to fix everything validation finds.
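
The "be careful replacing existing tags" rule can be written down as a small merge that never overwrites anything. This is only a sketch of the idea, assuming tags are plain dicts. In JOSM you do the same thing by eye in the tag editor.

```python
def merge_tags(existing, reference):
    """Add missing tags from reference; report differing values as conflicts.

    Returns (merged, conflicts) where conflicts maps each key to a tuple of
    (existing_value, reference_value). Existing values are never overwritten.
    """
    merged = dict(existing)
    conflicts = {}
    for key, ref_value in reference.items():
        if key not in merged:
            merged[key] = ref_value
        elif merged[key] != ref_value:
            conflicts[key] = (merged[key], ref_value)
    return merged, conflicts
```

Anything that lands in the conflicts dict is a decision for a human, ideally with the imagery open.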

Things get more interesting when the start and end of the road in OSM don't match the USDA locations. As this is important, it needs to be fixed. Often obscure roads blend into one another, and this breakage makes direction finding difficult. I fix this by splitting the way at the road intersection. If you then combine the split piece with the correct way, the combined way will have all the right metadata and be a single object. Another problem is long driveways. Some really are roads, but often the driveway is a short spur off the end of the actual road. Satellite imagery may help or it may not.

Conflation Tools

I've never found a conflation tool that works with a wide variety of data sources. One of these days I'm going to write one. I have used all of these in the past with varying degrees of success. All of these tools have a pretty hefty learning curve, so maybe I'm just lazy.

JOSM Conflation Plugin
JOSM has a conflation plugin that works reasonably well. You have to be an experienced JOSM user, but if you're already editing with JOSM, it's useful.
Hootenanny
Hoot has both a web UI and a command line one. The command line utilities are useful for scripting. It does a reasonably good job, and is well thought out. It isn't very portable though, and I have problems keeping it running.
OpenJump
OpenJump has the Road Matcher plugin, and is focused on conflating roads.

I wound up writing my own offline conflation and validation software mostly because I can and have the time. It's currently a working prototype, but will get released under the GPLv3 when more stable.

Copyright © 2019,2020 Seneca Software & Solar, Inc