Machine learning (ML) is currently very popular among researchers working with OpenStreetMap (OSM) data and on OSM-related problems. But what impact has this ML work had on the OSM database or the OSM community? We investigate what impact, if any, ML work within the academic research community has had on OSM over the last few years.
# What has machine learning ever done for us?
Peter Mooney and Edgar Galvan,
Department of Computer Science,
Maynooth University, Maynooth.
Co. Kildare. Ireland.
### Introduction and background
Recently, machine learning (ML) and artificial intelligence (AI) based approaches have been applied frequently to many different types of problems in OpenStreetMap (OSM). Indeed, ML and AI have been used extensively by the research community for a plethora of applications and problems, both related and unrelated to OSM. Wagstaff (2012) suggests that ML offers "a cornucopia of useful ways to approach problems which defy manual solutions". In the geospatial domain specifically, ML approaches were reported at least a decade ago, with work by authors such as Werder et al. (2010) on the interpretation of buildings in settlements and Fathi and Krumm (2010) on detecting road intersections from GPS traces. Around this time, interest in the combination of ML and OSM began to emerge. Funke et al. (2015) argued that many aspects of OSM data might be suitable for "extrapolation or classification using ML". Many examples have since emerged of ML approaches being applied to problems such as predicting or recommending tags for objects, object classification based on contextual or proximity information, tag usage checking, and automated mapping. Jennings et al. (2019) showed that Facebook’s recent mapping campaign in OSM used ML to detect road networks from satellite imagery, which are then validated by OSM editors and the local OSM communities. Examples also exist where OSM is used in ML approaches for other geospatial classification problems (Wu et al. (2020), Jacobs and Mitchell (2020)), while authors such as Feldmeyer et al. (2020) used machine and deep learning algorithms with OSM to develop socio-economic indicators. Audebert et al. (2017) provided additional examples and argued that OSM's richness means it can be used in difficult problems such as semantic labeling of aerial and satellite images.
In addition to the observations by Vargas-Munoz et al. (2021) in their recent review of ML approaches in OSM, we can usually observe ML and OSM interacting in one of three ways: (1) ML approaches are used to improve or correct OSM data; (2) OSM is used to train ML models for a specific task such as building segmentation, road speed estimation (Keller et al., 2020) or land use classification (Schultz et al., 2017); or (3) the contribution patterns of OSM contributors are analysed using ML techniques, as in the work of Jacobs and Mitchell (2020). In this submission we ask the following questions. Given the many applications and integrations of ML and AI with OSM over the past number of years, how many of these applications and approaches have been adopted or used by the OSM community? Furthermore, what benefits or impact have these ML and AI efforts from the research community had on the OSM project and the OSM community? We believe that there is significant scope for ML researchers to make impactful and helpful contributions directly within OSM on problems such as tag updating and correction, added intelligence within OSM editing software, and intrinsic quality analysis.
### Methodology and findings
A systematic review of approximately 60 peer-reviewed academic journal and conference papers will be reported. Papers are selected on the basis that each paper (1) clearly outlines an ML or AI approach using OSM data, and (2) tackles a problem known in the OSM community such as tag prediction, contribution patterns, or geometry correction. Paper metadata such as title, keywords, and abstract contents are used to select the papers. Manual checking of the papers is also undertaken to ensure that the content of each paper matches our selection criteria. A classification of these papers will be developed based on the following set of questions:
* What are the most common ML approaches used by researchers for the three types of ML and OSM interaction outlined above? For example, learning problems (supervised, self-supervised, reinforcement), statistical inference (inductive, deductive), etc.
* What are the most common types of problems in OSM tackled by ML approaches? For example, automated tagging, contribution pattern analysis, intrinsic quality analysis, object classification, etc.
* Are the approaches reproducible and replicable for other regions or areas within OSM? For example, is a particular ML approach limited to a specific geographical area or thematic area (such as roads, buildings, waterways, etc.) in OSM?
Based on this classification, we then report a narrative of our findings on the benefits and impacts of these efforts for the OSM project and the OSM community. We are working on this analysis at the time of writing.
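As an illustration only, the metadata-based screening step described above could be sketched as follows. This is a minimal sketch and not the procedure actually used in the review: the keyword lists, the `Paper` structure and the function names are all hypothetical, and any paper flagged by such a filter would still need the manual check against the selection criteria.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword lists: the actual search terms are not given in the text.
ML_TERMS = ("machine learning", "deep learning", "neural network",
            "artificial intelligence")
OSM_TERMS = ("openstreetmap", "osm")
PROBLEM_TERMS = ("tag prediction", "tagging", "contribution pattern",
                 "geometry correction")


@dataclass
class Paper:
    """Minimal metadata record for one paper (hypothetical structure)."""
    title: str
    abstract: str
    keywords: list[str] = field(default_factory=list)

    def text(self) -> str:
        # Combine all metadata fields into one lower-cased string.
        return " ".join([self.title, self.abstract, *self.keywords]).lower()


def mentions_any(text: str, terms: tuple[str, ...]) -> bool:
    # Whole-word match with an optional plural "s", so that "osm" does not
    # match inside an unrelated word such as "osmosis".
    return any(
        re.search(r"\b" + re.escape(term) + r"s?\b", text) for term in terms
    )


def is_candidate(paper: Paper) -> bool:
    # Criterion (1): an ML or AI approach that uses OSM data.
    # Criterion (2): a problem known in the OSM community.
    text = paper.text()
    return (
        mentions_any(text, ML_TERMS)
        and mentions_any(text, OSM_TERMS)
        and mentions_any(text, PROBLEM_TERMS)
    )


def screen(papers: list[Paper]) -> list[Paper]:
    # Candidates returned here are only a shortlist for manual checking.
    return [paper for paper in papers if is_candidate(paper)]
```

Requiring both criteria in `is_candidate` reflects one reading of the selection basis. A keyword filter of this kind can only produce a shortlist, which is why the manual check remains necessary.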
### Final discussion of scientific contributions
As suggested by Jacobs and Mitchell (2020), ML can "contribute to the diversification and quality of available assessment methods for OSM", while Feldmeyer et al. (2020) argue that the application of ML to OSM can reveal the "untapped potential for knowledge generation" in OSM. In our work, we argue that we must not get carried away with combining ML and OSM purely for the sake of it. OSM, as a massive open geospatial database, is a very attractive source of (geo-)data for researchers and practitioners looking to train, benchmark and test ML approaches. Consequently, we can confidently state that, after well over a decade of reported results in this domain, researchers have produced many excellent research and knowledge outputs using the ML and OSM combination. We are now entering a phase of technological and scientific development with ML and OSM where we must ask how all of this ML knowledge can contribute effectively to the OSM database and the OSM community.
Grinberger et al. (2019) argue that efforts to establish and strengthen interaction between the research community interested in working with or in OSM and the OSM community itself have generally been positive. However, opportunities exist to enhance interactions between these two communities, and perhaps ML could be the catalyst for a new interaction. Based on this, the scientific contribution of this work is multi-faceted. Firstly, this paper will stimulate debate about the contribution of these ML approaches to the improvement of OSM data and the enhancement of the OSM community. Secondly, this work will highlight situations where these ML approaches have delivered genuinely novel outputs of interest to OSM in general. Finally, this work will challenge the academic community to apply ML to several interesting and open problems which are of mutual interest to both the academic and OSM communities.