USGS Data Conversion

I used the GDB files from Download National Dataset after noticing that they are now updated more frequently. I downloaded this copy on May 28, 2019, and then used ogr2ogr to convert the files to Shapefiles. Shapefile field names are limited to 10 characters, so longer names get truncated. That doesn't matter here, since the later conversion changes the field names to OSM tags, which is what we'll be using.
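As a rough sketch of this step, the Python below builds an ogr2ogr command line for a GDB-to-Shapefile conversion and shows the kind of 10-character truncation that produces field names like JURISDICTI. The function names and any paths passed in are hypothetical. The truncation helper is a simplification: the real conversion also renames colliding names (for example PASSENGE_1), which this sketch doesn't attempt.

```python
import subprocess

def ogr2ogr_command(gdb_path, out_path):
    """Build an ogr2ogr call that converts a GDB to Shapefile format."""
    # -f selects the output driver; the destination comes before the source.
    return ["ogr2ogr", "-f", "ESRI Shapefile", out_path, gdb_path]

def truncate_field(name, limit=10):
    """Simplified view of the Shapefile field-name length limit."""
    return name[:limit]

def convert(gdb_path, out_path):
    """Run the conversion; requires ogr2ogr (GDAL) to be installed."""
    subprocess.run(ogr2ogr_command(gdb_path, out_path), check=True)
```

Calling `convert()` with your own input and output paths runs the conversion. Nothing is executed just by defining these functions.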

Road Core

Road Core covers all the roads maintained by the USDA and doesn't have much metadata beyond the name. The description of the metadata is here. These are the Shapefile fields kept and converted:

These fields were dropped: RTE_CN, ID, SYMBOL_COD, JURISDICTI, SYSTEM, ROUTE_STAT, PRIMARY_MA, COUNTY, CONGRESSIO, ADMIN_ORG, SERVICE_LI, LEVEL_OF_S, MANAGING_O, LOC_ERROR, SECURITY_I, IVM_SYMBOL, FUNCTIONAL.

Road and Trail MVUM

MVUM (Motor Vehicle Use Map) data contains metadata on the types of vehicles that can be used on a road. The description of the metadata is here. These are the Shapefile fields we keep and convert:

These fields were dropped: RTE_CN, ID, BMP, EMP, SEG_LENGTH, GIS_MILES, SYMBOL, JURISDICTI, SYSTEM, SEASONAL, PASSENGERV, PASSENGE_1, HIGHCLEARA, HIGHCLEA_1, TRUCK_DATE, BUS_DATESO, MOTORHOME_, FOURWD_GT5, FOURWD_G_1, TWOWD_GT50, TWOWD_GT_1, TRACKED_OH, TRACKED__1, OTHER_OHV_, OTHER_OHV1, OTHER_OH_3, ATV_DATESO, MOTORCYC_1, OTHERWHEEL, OTHERWHE_1, TRACKED__2, TRACKED__3, OTHER_OH_1, OTHER_OH_2, ADMINORG, SECURITYID, FORESTNAME, ROUTESTATU, GLOBALID, TA_SYMBOL, SHAPE_LEN, PFSR_CLASS.

For the smoothness setting, the source data is a number followed by a long descriptive string. Here's the conversion I'm using:

Here are the OSM definitions for Key:smoothness.
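The parsing side of the smoothness conversion can be sketched as follows. This is only an illustration: the function name and the assumed "number, then description" value format are mine, and the number-to-smoothness table is deliberately left to the caller rather than guessed here.

```python
import re

def smoothness_from_level(raw, table):
    """Map a value that starts with a number to a smoothness tag.

    `table` maps the leading number to an OSM smoothness value and must be
    supplied by the caller; no particular mapping is implied here.
    Returns None when the value has no leading number or no table entry.
    """
    match = re.match(r"\s*(\d+)", raw or "")
    if match is None:
        return None
    return table.get(int(match.group(1)))
```

Returning None for anything unrecognized keeps bad source values visible for later validation instead of silently tagging them.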

Most of the other conversion is pretty simple. lanes is, of course, a number. The access fields for ATVs, UTVs, etc. are simple booleans in the source data. The names are in all caps, which is tacky, but my conversion software handles that. If you use something else, you'll need to fix all of the names. Here is a document on OSM naming conventions.
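If you do need to fix the names yourself, a minimal sketch might look like the one below. It is not the conversion software mentioned above. The function name is hypothetical, and it only changes capitalization; it makes no attempt to expand abbreviations or handle other naming conventions.

```python
def fix_name(name):
    """Convert an all-caps name to capitalized words."""
    # capitalize() lowercases the rest of each word, so "SMITH'S" becomes
    # "Smith's" (str.title() would give "Smith'S").
    return " ".join(word.capitalize() for word in name.split())
```

For example, `fix_name("COTTONWOOD CREEK ROAD")` gives `"Cottonwood Creek Road"`.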

Converting Address Data

Colorado Statewide Addresses is an aggregate of addresses from many, but not all, Colorado counties. Other people and organizations are also importing this data. Its data for rural areas often appears to lag behind the same source data available directly from the counties, due to a lack of funding for GIS in rural areas. Only a few fields are kept in the conversion process. The other fields (act_stat, county, is_cai, latitude, longitude, date_mod_d, time_mod_d, num_suf, parcel_id, place_id, place_name, post_dir, post_type, pre_dir, pre_type, proc_stat, sauid, unit_type, and zipcode) are ignored.

The coordinates are part of the object, so I have no idea why there are also latitude and longitude fields; they are unnecessary. The fields kept are:

These are all that OSM requires. addr_full is also extracted to help validate addresses later when making corrections, and is deleted before uploading. Some counties use the same fields differently, which makes it easier to parse addr_full for the street name than to extract multiple other fields and construct the street name from them. Also, a few counties had unit numbers as part of the street, which was hard to catch until validation time.
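Parsing addr_full for the street name could be sketched roughly like this. It assumes the value starts with the house number followed by the street, which may not hold for every county, and the function name is hypothetical. Unit numbers embedded in the street part would pass through unnoticed, which is why validation is still needed afterwards.

```python
def parse_addr_full(addr_full):
    """Split a full address into OSM house number and street tags."""
    parts = addr_full.split()
    # Expect at least "<number> <street>"; anything else needs a human.
    if len(parts) < 2 or not parts[0][0].isdigit():
        return None
    return {
        "addr:housenumber": parts[0],
        "addr:street": " ".join(parts[1:]),
    }
```

With a made-up address, `parse_addr_full("123 N MAIN ST")` returns the house number "123" and the street "N MAIN ST"; the street name would still need its capitalization fixed.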

Building Footprints

While it's very common to add buildings by manually tracing satellite imagery, it is tedious and time-consuming. Microsoft used AI to try to identify buildings in the US, and put the results online under a public domain license. Multiple other projects are also importing these into OSM. It's a lot of data, but it is available from here using Git. There's no data other than polygons, so there is no real conversion to do. In some areas, particularly deserts and mountains, many of the buildings are actually large rocks. I wrote a program to scrub the data for stupidity, which is documented elsewhere. Even after all that, I have to visually review the data against satellite imagery before uploading. All the program did was greatly reduce what has to be manually reviewed.

Other Conversion Techniques

Some of the data was converted in other ways that were more of a manual operation. If only a one-time conversion is needed, doing it this way is sometimes slower but easier. Part of the problem is dealing with huge files that choke most map editors. QGIS or JOSM can be used to delete or rename tags. It's slow with big files, but doable if you only have to do it once. Once the data is in OSM format, I've used a text editor to do global deletions and search and replace to clean up bad data found during conversion. On rare occasions I've even used emacs macros, which were slow but faster than writing a custom program. Osmfilter and osmconvert can also be used to delete or change tags, and ogr2ogr will do this as well. I wrote my own software for this. Whatever was used to convert the data, the final result needs to be validated.
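For a small file, the same kind of global tag deletion can be sketched with Python's standard XML library. This is an illustration, not the software mentioned above. The function name is hypothetical, and it loads the whole document into memory, so it is not suited to the huge files described here.

```python
import xml.etree.ElementTree as ET

def drop_tags(osm_xml, keys):
    """Remove <tag k="..."> children whose key is in `keys` from OSM XML text."""
    root = ET.fromstring(osm_xml)
    for element in root:  # nodes, ways, relations
        # findall() returns a list, so removing while looping is safe.
        for tag in element.findall("tag"):
            if tag.get("k") in keys:
                element.remove(tag)
    return ET.tostring(root, encoding="unicode")
```

Passing the file contents and a set of unwanted keys returns the cleaned XML as a string, which still needs to be validated like any other conversion result.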


Copyright © 2019,2020 Seneca Software & Solar, Inc