
State of the Map 2021

Towards understanding the temporal accuracy of OpenStreetMap: A quantitative experiment

Levente Juhász

This talk presents the results of an experiment on the temporal accuracy of OpenStreetMap and provides insights into the temporal dynamics with which real-world changes appear in OSM.

The ability to provide timely information compared to traditional methods of collecting geographic information has been considered one of the main advantages of volunteered geographic information (VGI) since its emergence in the 2000s (Goodchild, 2007). In addition to several anecdotal examples illustrating how VGI can be more up-to-date than authoritative sources, the literature provides ample evidence of the usefulness of VGI in applications that require timely geodata, such as disaster management (Horita et al., 2013; Neis & Zielstra, 2014). For example, the Haiti earthquake relief effort in 2010 laid the foundations for how remote contributors to OpenStreetMap (OSM) and other platforms can make a difference and aid responding humanitarian agencies after a crisis (Zook et al., 2010). The Humanitarian OpenStreetMap Team has made numerous contributions and helped save lives on many occasions ever since (Herfort et al., 2021). However, apart from these examples, the temporal dimension of VGI has received little research attention outside disaster management, and the assessment of temporal accuracy lags far behind other aspects of data quality, such as spatial accuracy (Antoniou & Skopeliti, 2015; Yan et al., 2020). Aubrecht et al. (2017) highlighted the lack of formal acknowledgment of temporal aspects in the concept of VGI and proposed a framework called ‘Volunteered Geo-Dynamic Information’ to fully integrate the spatial and temporal aspects of VGI. Other work that utilizes the temporal component of VGI often focuses on the behavior of contributors rather than on the currency and temporal validity of the map features they contributed (Bégin et al., 2018; Haklay et al., 2010; Neis & Zipf, 2012), or studied the evolution of data over time (Girres & Touya, 2010; Zielstra & Hochmair, 2011). While these approaches are useful, by nature they cannot provide a quantitative measure of how current OSM (or VGI in general) is. Arsanjani et al. (2013) noted during their investigations that the temporal accuracy of OSM could not be measured with their traditional extrinsic method, because OSM data was compared to authoritative data that contained no temporal information (i.e. only the most recent street configuration, regardless of when road segments were built or renovated). Another project, ‘Is OSM up-to-date?’, recognized the lack of information on temporal accuracy and developed a tool that uses an intrinsic approach to visually flag features that potentially contain outdated information (Minghini & Frassinelli, 2019). However, by nature, an intrinsic approach cannot provide an absolute measure of how up-to-date OpenStreetMap is.

This research attempts to fill a gap in the literature by conducting an experiment on the currency of VGI. Using OSM data as a case study, it will measure the temporal accuracy of selected map features. This research overcomes previous limitations by using official data provided by the Florida Department of Transportation (FDOT). The dataset contains details about state-funded highway construction projects, including their completion dates; the temporal accuracy of OSM features can therefore be measured directly by comparing the date a project was finished with the time at which the corresponding OSM edit was made. This time difference describes how long it took the OSM community to adapt to real-world changes and update the map database accordingly.
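The core measurement is a simple date difference. As a minimal sketch, using the dates from the roundabout example discussed below (FDOT construction end date versus the timestamp of the first relevant OSM edit), the lag can be computed as:

```python
from datetime import date

# Dates from the roundabout example discussed below: FDOT construction end
# date vs. the timestamp of the first OSM edit that reflected the change.
project_completed = date(2019, 7, 3)
osm_edit_made = date(2020, 7, 13)

# Temporal lag: how long the OSM community took to bring the map up to date.
lag = osm_edit_made - project_completed
print(f"OSM lagged behind reality by {lag.days} days")  # 376 days (~1 year and 10 days)
```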

The historical version of highway construction projects was filtered to projects completed between May 15, 2016 and April 1, 2021. Further, only a subset of projects were used, that resulted in either 1) new infrastructure (new roadways, roundabouts or highway ramps), 2) new lanes in existing roadways (excluding bike lanes), and 3) new bike lanes or paths. Other construction projects, such as traffic improvements, road resurfacing, regular maintenance (e.g. bridge rehabilitation), etc. were excluded, since a useful, high-quality road network database can be maintained without the addition of these information, therefore, they are less likely to migrate into OSM. The methodology uses augmented diffs from the Overpass API to find all changes that occurred on OSM highway features (creation, modification and deletion) and are spatially and temporally close to construction projects. These changes are then matched with a record from the highway construction dataset. Irrelevant changes (i.e. changes made to other highway features) are removed. This is done by manually interpreting and evaluating changes and construction projects using a description field (e.g. “SR 61 WAKULLA SPRINGS RD @ CR 2204 OAK RIDGE ROAD INTERSECTION - ROUNDABOUT”). The data extraction algorithm initially queries the Overpass API for changes one week beyond the completion date of a particular project. In case no relevant change can be found, iterative queries for 7-day-long time slices are made until a relevant change is found, or until the current date is reached. Lastly, the time difference between the end date of construction projects and the first OSM change that introduced the change in OSM are calculated. For example, the description field above mentioning State Road 61 (SR61) can be found with the following Overpass query (https://overpass-turbo.eu/s/16XV) that uses the location of the highway construction project. Interpreting whether an extracted change is relevant or not can also be verified using changeset comments: (https://www.openstreetmap.org/changeset/87938707). In this example, the changeset comment “Added new round about.” confirms that the OSM edit is related to the FDOT dataset. Comparing the construction end date (July 3, 2019) and the time when this change appeared in OSM (July 13, 2020) yields 1 year and 10 days, which is the time it took the OSM community to adapt a real-world change and bring the database up-to-date.
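A minimal sketch of this extraction loop is given below. It assumes the public Overpass API endpoint and a hypothetical `is_relevant` callback that stands in for the manual interpretation against the FDOT description field; the search radius and coordinates are illustrative placeholders, not the exact parameters used in the experiment.

```python
import datetime as dt
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # public Overpass endpoint

def adiff_query(start, end, lat, lon, radius_m=500):
    """Build an augmented-diff (adiff) query for highway edits near a project site."""
    return (
        f'[out:xml][adiff:"{start.isoformat()}Z","{end.isoformat()}Z"];\n'
        f"way[highway](around:{radius_m},{lat},{lon});\n"
        "out meta geom;"
    )

def find_first_relevant_change(completion_date, lat, lon, is_relevant):
    """Walk forward from the project completion date in 7-day slices until an
    edit judged relevant by `is_relevant` (a stand-in for the manual check)
    is found, or the current date is reached."""
    start = dt.datetime.combine(completion_date, dt.time.min)
    now = dt.datetime.utcnow()
    while start < now:
        end = start + dt.timedelta(days=7)
        resp = requests.post(OVERPASS_URL,
                             data={"data": adiff_query(start, end, lat, lon)})
        resp.raise_for_status()
        if is_relevant(resp.text):   # manual / rule-based relevance decision
            return start, end        # 7-day slice containing the first relevant edit
        start = end                  # no match yet: move to the next slice
    return None                      # no relevant OSM edit found up to today
```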

This talk will be structured as follows. First, the results of a comprehensive literature review on the temporal aspect of OSM research will be presented to highlight the lack of data-driven, quantitative research on the temporal component of OSM and VGI. Then, using the filtered FDOT construction dataset, which contains 23 new highways and roundabouts, 64 new bike lanes and paths, and 129 new traffic lane additions, the results of an exploratory data analysis of the currency of OSM will be presented. Summary and descriptive statistics of this reasonably large sample will provide insights into the currency of OSM and the dynamics of temporal accuracy. Lastly, limitations of the experiment will be discussed. These include the reference dataset, which does not contain federally or locally funded projects and therefore misses a large number of construction projects, and the methodology, which cannot capture the diversity of the OSM community and disregards changes beyond the transportation infrastructure.

This experiment is the first attempt to investigate the timeliness and currency of Volunteered Geographic Information using large sets of data. Future work will extend the analysis to more VGI data sources outside the domain of mapping applications (e.g. Points of Interest in check-in and review applications) and to a new methodology based on tile-reduce, OSM QA tiles and vector tiles built from other datasets. The new methodology will be scalable and will allow for analysis across world regions. Furthermore, a rule-based decision approach based on tags and semantics will be used to eliminate the need to manually check and verify whether VGI updates correspond to the reference dataset.