Analyzing the spatio-temporal patterns and impacts of large-scale data production events in OpenStreetMap

Yair Grinberger

In this talk, large scale data production events in OSM are identified, characterized, and their spatio-temporal patterns and impacts are analyzed. The results show that remote mapping events produce more data today than bulk imports, yet that the former type has a more lasting impact on representation, hence pointing towards possible steps for maximizing the positive influences of events of different types.

Research on volunteered geographic information often envisions data as a product of individual actions. In OpenStreetMap (OSM), however, contributions are frequently made as part of large-scale data production events. These events, which can take multiple forms (e.g. organized activities of local chapters, mobilization of global communities, and imports of externally collected datasets), contribute much to the OSM project. Nevertheless, they also hold the potential to significantly affect local representations by changing the development course of data and community, thus biasing representation. Hence, it is important to identify and understand such events, as well as their impacts upon the data.
This talk sets out to contribute to the study of these issues by identifying large-scale data production events in OSM, classifying them, and analyzing their spatio-temporal patterns and impacts. For this, we use the OSM History Database (OSHDB) tool to extract the cumulative number of contribution operations (i.e. the operations made as part of each contribution) by month for different areas. Assuming that in the absence of events the cumulative distribution of monthly operations over time would follow an S-shaped form (since data grows constantly, and even more so when the community grows, until it reaches some form of saturation), we fit a logistic curve to each of these time series. Events are identified as months where observed values are significantly higher than the ones predicted by the fitted curve. Thus, events are defined not only in terms of their absolute size but also according to their relative weight in the development of the data. In the subsequent step, events are clustered into types according to different measures, e.g. the maximal number of contributions made by one user and the share of creations, deletions, tag changes, and geometry changes in all contributions, representing their nature in terms of centralization and contribution themes.
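The identification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the grid search, the default `k_grid` range, the function names, and the threshold of `z` times the root-mean-square residual are all assumptions made for illustration, since the abstract does not state how the curve is fitted or how "significantly higher" is tested. The input is assumed to be a plain list of cumulative monthly operation counts, for example as exported from OSHDB.

```python
import math


def logistic(t, L, k, t0):
    """S-shaped curve: L is the saturation level, k the growth rate, t0 the midpoint."""
    # Fine for monthly indices and moderate k; very large exponents would overflow.
    return L / (1.0 + math.exp(-k * (t - t0)))


def fit_logistic(months, cumulative, k_grid=None):
    """Least-squares logistic fit by grid search over (k, t0).

    For fixed k and t0 the model is linear in L, so the best L has the
    closed form sum(y * s) / sum(s * s), where s is the unit logistic curve.
    """
    if k_grid is None:
        # Hypothetical search range for the growth rate (assumption).
        k_grid = [0.05 * i for i in range(1, 21)]
    best = None
    for k in k_grid:
        for t0 in months:  # candidate midpoints: every observed month
            s = [logistic(t, 1.0, k, t0) for t in months]
            L = sum(y * si for y, si in zip(cumulative, s)) / sum(si * si for si in s)
            sse = sum((y - L * si) ** 2 for y, si in zip(cumulative, s))
            if best is None or sse < best[0]:
                best = (sse, L, k, t0)
    _, L, k, t0 = best
    return L, k, t0


def flag_events(observed, predicted, z=2.0):
    """Indices of months where observed exceeds predicted by more than z times the RMS residual."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    # Only positive deviations count: events are months above the fitted curve.
    return [i for i, r in enumerate(residuals) if r > z * rms]
```

To use it, one would call `fit_logistic(months, cumulative)` for each area, evaluate `logistic` with the returned parameters for every month, and pass the observed and predicted series to `flag_events`. Because the threshold depends on how far each area's own series departs from its fitted curve, a month is flagged for its relative weight in that area's development and not only for its absolute size.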
The results show that a significant share of all OSM contributions is made as part of an event, with the data of some regions almost entirely dominated by events. Furthermore, the number of event contributions does not seem to be decreasing over time. Looking deeper into the nature of events, we identify two different event types based on the contribution of individuals (local events and remote mapping events) and several bulk import event types, diverging mostly in the share of creations in the events’ contributions. Computing the number of events over time shows that while data creation imports were the most frequent type of event early on, in recent years remote mapping events have contributed the most data. Locally based events also show a significant increase in data production. However, these types of events are not distributed evenly across the globe, with import events frequent mostly in countries with developed economies and remote mapping events being more common in the least developed regions of the world. Interestingly, the negative (and expected) correlation between the time of the event and its impact on the data exists only for import events and not for remote mapping events. Hence, mapping and analyzing large-scale events allows relating the nature of representation to socio-economic effects.
This talk further breaks down the spatio-temporal patterns of events, investigating whether the temporal patterns for different regions follow the global ones or whether there are also clusters of temporal change. Furthermore, we study the nature of events’ impacts, presenting how the values of measures such as the stability of events’ contributions and the change in the number of active mappers vary by event type and area. These results, beyond promoting a deeper understanding of events and representation in OSM, allow assessing the implications of current and expected trends in OSM data production for the project, hence facilitating the formulation of global and local steps aimed at maximizing events’ positive impacts and controlling their adverse influences.