GIScience News Blog

News of Heidelberg University’s GIScience Research Group.

Analysing the Impact of Large Data Imports in OpenStreetMap

Aug 6th, 2021 by GIScienceHD

OpenStreetMap (OSM) is a global mapping project that generates free geographical information through a community of volunteers. OSM is used in a variety of applications and for research purposes. However, it is also possible to import external data sets into OpenStreetMap. Opinions about these data imports diverge among researchers and contributors, and the subject is constantly discussed. The question of whether importing data, especially in large quantities, adds value to OSM or compromises the progress of the project needs to be investigated more deeply. For a recent study by Witt et al., published Open Access, OSM’s historical data were used to compute metrics about the development of the contributors and of the OSM data during large data imports in the Netherlands and India. Additionally, one time period per study area during which there was no large data import was investigated for comparison. To make statements about the impact of large data imports on OSM, the metrics were analysed using two techniques: cross-correlation and changepoint detection. It was found that contributor activity increased during large data imports. Additionally, contributors who were already active before a large import were more likely to contribute to OSM after said import than contributors who made their first contributions during the large data import. The results show the difficulty of interpreting a heterogeneous data source such as OSM, and the complexity of the project. Limitations and challenges that were encountered are explained, and future directions for continuing this field of research are given.
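
To illustrate what a cross-correlation analysis of such metrics can look like, here is a minimal sketch. It is not the implementation used in the study: the function name `lagged_correlation`, the two toy series and the range of lags are assumptions made for illustration only. It computes the Pearson correlation between one series and a time-shifted copy of another, and reports the lag with the strongest correlation.

```python
from statistics import mean

def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] for a non-negative lag."""
    xs, ys = x[:len(x) - lag], y[lag:]
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / (var_x * var_y) ** 0.5

# Invented toy data: imported elements per month and active contributors per month.
imported_elements = [0, 0, 5, 9, 4, 0, 0, 0]
active_contributors = [1, 1, 1, 6, 10, 5, 1, 1]

correlations = {
    lag: lagged_correlation(imported_elements, active_contributors, lag)
    for lag in range(4)
}
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

In this toy example the contributor series follows the import series with a delay of one month, so the correlation is highest at a lag of 1.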

This study investigated only the time period in which approximately 80% of the data were added. Therefore, potential influences and impacts outside the observation periods have not been considered.
For the computation of the contributor activity, all contribution types (i.e., creation, deletion, tag changes and geometry changes) were included to get a general understanding of the number of active users per timestamp. As a consequence, contributors who were deleting data were weighted as much as users who were creating or updating elements.
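
A minimal sketch of such a count, assuming a hypothetical list of contribution records and a monthly resolution (the study's actual data structures and time steps are not described here):

```python
from collections import defaultdict
from datetime import date

# Invented toy records: (user_id, date, contribution_type).
contributions = [
    (1, date(2007, 8, 3), "creation"),
    (1, date(2007, 8, 9), "tag_change"),
    (2, date(2007, 8, 20), "deletion"),
    (2, date(2007, 9, 1), "geometry_change"),
    (3, date(2007, 9, 15), "creation"),
]

def active_contributors_per_month(records):
    """Count unique users per (year, month), regardless of contribution type."""
    users_by_month = defaultdict(set)
    for user_id, day, _contribution_type in records:
        users_by_month[(day.year, day.month)].add(user_id)
    return {month: len(users) for month, users in sorted(users_by_month.items())}

activity = active_contributors_per_month(contributions)
print(activity)
```

Because the contribution type is ignored, user 2, who only deletes data in August, counts exactly as much as user 1, who creates and tags elements.
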
Changepoint detection was used to compute the most significant changepoint in the observation period. For most of the imports, this approach provided useful results which showed distinct changes in contributor activity during or after the import. However, for imports such as the 3dShapes import or the BAG import, the development of user activity was more complex. During the 3dShapes import, contributor activity increased with the start of the import and dropped after most of the data had been imported. Then, contributor activity increased again. A similar pattern could be seen during the BAG import, where contributor activity increased throughout the import but decreased afterwards. By computing only one changepoint, these processes were simplified. Additionally, other OSM events could have influenced the changepoint detection. However, this problem is omnipresent when working with OSM data.
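
The simplification can be seen in a small sketch. The detector below is an assumption chosen for illustration (a least-squares split into two segments with different means), not necessarily the algorithm used in the study, and both series are invented.

```python
def most_significant_changepoint(series):
    """Return the index k that best splits the series into two constant-mean segments.

    The split minimises the summed squared deviation of each segment from its own mean.
    """
    def sse(segment):
        segment_mean = sum(segment) / len(segment)
        return sum((value - segment_mean) ** 2 for value in segment)

    best_index, best_cost = None, float("inf")
    for k in range(1, len(series)):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index

# Invented toy data: active contributors per month.
step = [10, 11, 10, 12, 30, 31, 29, 32]               # one lasting increase
rise_and_fall = [10, 11, 10, 30, 32, 31, 12, 11, 10]  # increase, then decrease

print(most_significant_changepoint(step))           # 4
print(most_significant_changepoint(rise_and_fall))  # 3
```

For the second series a single changepoint captures the rise at index 3, but the later drop goes unreported, which is the kind of simplification described above.
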
The algorithm for finding peaks in the development of contribution types used a multiple of the mean as the detection criterion. Therefore, other OSM events which happened in the same time period might have been detected as peaks, even though they may not have been related to the import. Moreover, peaks that happened before an import but within the observation period were also counted. More analysis is needed to investigate the peaks in detail and to ensure that they are directly related to the import.
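
A threshold of this kind can be sketched as follows. The factor of 3 and the monthly counts are assumptions for illustration; the multiple used in the study is not stated here.

```python
def find_peaks(counts, factor=3.0):
    """Return the indices of all values that exceed factor * mean of the series."""
    threshold = factor * sum(counts) / len(counts)
    return [index for index, count in enumerate(counts) if count > threshold]

# Invented toy data: element creations per month within an observation period.
monthly_creations = [100, 120, 90, 110, 2000, 130, 100, 95]

peaks = find_peaks(monthly_creations)
print(peaks)  # [4]
```

The criterion only looks at magnitude, so any event that pushes a month above the threshold is flagged, whether or not it belongs to the import.
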
Dedicated user accounts have to be used for importing data into OSM. These accounts were included in the results. Moreover, the differences in the total number of contributors who were involved in the imports (e.g., AND import in India with 18 contributors; AND import in the Netherlands with 108 contributors) have to be considered when evaluating and comparing results.
Furthermore, this study did not distinguish between different types of large imports (e.g., automatically or manually conducted imports, or a combination of both). The quantity of imported data was the only criterion for the selection.
This study presented a first approach to gaining a deeper understanding of the impact of large data imports in OpenStreetMap by investigating large data imports in the Netherlands and India.
The results were manifold. For most of the large data imports analysed, contributor activity increased during or after the import. Looking at the imports in the early stage of OSM, especially the AND imports in 2007 and 2008, one can see that significantly more contributors were active than before. Imports which happened at a later stage did not show such a strong impact. During the BAG import in the Netherlands and the building import in India, the number of contributors increased. However, after most of the data had been imported, contributor activity slightly decreased again. Nonetheless, one can see that the number of unique active contributors rose during a large data import.
The analysis of contributor engagement showed that the majority of users who were involved in an import were import-inspired; i.e., their first contributions happened during an import. Again, this finding supports the argument that more contributors actively joined the project during large data imports. However, mappers who were active beforehand were more likely to keep contributing after the import was concluded. The study therefore showed that already active mappers were not driven away from the project. This study did not differentiate between dedicated user accounts, which were created only for importing data, and regular user accounts; this needs to be considered when reasoning about the findings.
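
The grouping described above can be sketched as follows. The function name, the input dictionaries and the dates are hypothetical and only illustrate the idea; the study's exact definitions may differ.

```python
from datetime import date

def classify_contributors(first_edit, last_edit, import_start, import_end):
    """Split users into pre-import and import-inspired groups and compute retention.

    Retention is the share of a group whose last edit lies after the import ended.
    """
    groups = {"pre_import": [], "import_inspired": []}
    for user, first in first_edit.items():
        if first < import_start:
            groups["pre_import"].append(user)
        elif first <= import_end:
            groups["import_inspired"].append(user)
    retention = {}
    for name, users in groups.items():
        retained = sum(1 for user in users if last_edit[user] > import_end)
        retention[name] = retained / len(users) if users else 0.0
    return groups, retention

# Invented toy data for users who were active during a hypothetical import.
first_edit = {"a": date(2006, 5, 1), "b": date(2007, 1, 10), "c": date(2007, 8, 5),
              "d": date(2007, 8, 20), "e": date(2007, 9, 1)}
last_edit = {"a": date(2009, 2, 1), "b": date(2008, 6, 1), "c": date(2007, 8, 6),
             "d": date(2007, 9, 30), "e": date(2008, 3, 1)}

groups, retention = classify_contributors(
    first_edit, last_edit, date(2007, 8, 1), date(2007, 9, 30)
)
print(groups)
print(retention)
```

In this toy data set most users are import-inspired, yet only one of the three keeps editing after the import, while both previously active users do.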

Regarding the contribution patterns and the development of tag keys, no specific impact of large data imports was found in this study. The number of unique tag keys increased as the number of elements increased, given that external information was mapped to OSM tags. More research is needed to understand how the community is changing OSM data after a large data import.

The study considered the impact of large data imports from a data perspective on a small subset of the imports that were conducted. For future research, the analysis of different data imports might also incorporate other aspects of OSM, for example community events or mapping events and how they are related to imports. The investigation of automated processes, e.g., scripts or bots, could lead to a better understanding of how large chunks of imported data are changed. Moreover, the phase of OSM in which an import is conducted could be analysed more thoroughly. This might help to understand whether an import could also support the establishment or growth of a community in a specific region. In that regard, the effect of the media or the OSM community creating awareness about data donations and the respective data imports needs to be investigated. The analysis of OSM contributors could also be extended, for example, by considering the locations of contributors who are involved in an import process. Emerging spatial patterns could help to understand how local communities develop during an import. Finally, the attributes of imported elements and how they evolve over time could be analysed with a focus on the semantics of the data.

Witt, R.; Loos, L.; Zipf, A. (2021): Analysing the Impact of Large Data Imports in OpenStreetMap. ISPRS Int. J. Geo-Inf. 2021, 10, 528. https://doi.org/10.3390/ijgi10080528

https://ohsome.org

Selected earlier & related work:

  • Raifer, Martin; Troilo, Rafael; Kowatsch, Fabian; Auer, Michael; Loos, Lukas; Marx, Sabrina; Przybill, Katharina; Fendrich, Sascha; Mocnik, Franz-Benjamin; Zipf, Alexander (2019): OSHDB: a framework for spatio-temporal analysis of OpenStreetMap history data. Open Geospatial Data, Software and Standards.
  • Herfort, B., Lautenbach, S., Porto de Albuquerque, J., Anderson, J., Zipf, A. (2021): The evolution of humanitarian mapping within the OpenStreetMap community. Scientific Reports 11, 3037 (2021). DOI: 10.1038/s41598-021-82404-z
  • Fritz, O., Auer, M., Zipf, A. (2021). Entwicklung eines Regressionsmodells für die Vollständigkeitsanalyse des globalen OpenStreetMap-Datenbestands an Nahverkehrs-Busstrecken. AGIT ‒ Journal Für Angewandte Geoinformatik. 7-2021
  • Roick, O., Hagenauer, J., & Zipf, A. (2011). OSMatrix—Grid based analysis and visualization of OpenStreetMap. State of the Map EU, Wien.
  • Jokar Arsanjani, J., Mooney, P., Helbich, M., Zipf, A., (2015): An exploration of future patterns of the contributions to OpenStreetMap and development of a Contribution Index, Transactions in GIS, 19(6): 896–914. John Wiley & Sons. DOI: 10.1111/tgis.12139.
  • Grinberger, A.Y., Schott, M., Raifer, M., Zipf, A. (2021): An analysis of the spatial and temporal distribution of large‐scale data production events in OpenStreetMap. Transactions in GIS. 2021; 00: 1– 20. https://doi.org/10.1111/tgis.12746
  • Schott, M., Grinberger, A.Y., Lautenbach, S., Zipf, A. (2021): The Impact of Community Happenings in OpenStreetMap — Establishing a Framework for Online Community Member Activity Analyses. ISPRS Int. J. Geo-Inf. 2021, 10, 164. https://doi.org/10.3390/ijgi10030164
  • Auer, M.; Eckle, M.; Fendrich, S.; Griesbaum, L.; Kowatsch, F.; Marx, S.; Raifer, M.; Schott, M.; Troilo, R.; Zipf, A. (2018): Towards Using the Potential of OpenStreetMap History for Disaster Activation Monitoring. ISCRAM 2018. Rochester. NY. US.
  • Barron, C., Neis, P. & Zipf, A. (2013): A Comprehensive Framework for Intrinsic OpenStreetMap Quality Analysis. Transactions in GIS, DOI: 10.1111/tgis.12073.
  • Ludwig, C.; Fendrich, S.; Zipf, A. (2020): Regional variations of context‐based association rules in OpenStreetMap. Transactions in GIS. Wiley. https://doi.org/10.1111/tgis.12694
  • Ballatore, A. and Zipf, A. (2015): A Conceptual Quality Framework for Volunteered Geographic Information. COSIT - CONFERENCE ON SPATIAL INFORMATION THEORY XII. 2015. Santa Fe, New Mexico, USA. Lecture Notes in Computer Science, pp. 1-20.
  • Yang, A., H. Fan, N. Jing, Y. Sun, A. Zipf (2016): Temporal Analysis on Contribution Inequality in OpenStreetMap: A Comparative Study for Four Countries. ISPRS Int. Journal of Geo-Information. 5(1), 5.
  • Li, H., Herfort, B., Zipf, A. (2019): Estimating OpenStreetMap Missing Built-up Areas using Pre-trained Deep Neural Networks. 22nd AGILE Conf. on Geographic Information Science, Limassol, Cyprus.
  • Wu, Zhaoyan, Li, Hao, & Zipf, Alexander. (2020). From Historical OpenStreetMap data to customized training samples for geospatial machine learning. In proceedings of the Academic Track at the State of the Map 2020 Online Conference, July 4-5 2020. DOI: http://doi.org/10.5281/zenodo.3923040

Tags: become-ohsome, data quality, intrinsic quality analysis, ohsome, Online Community, OpenStreetMap, OSM

Posted in OSM, Publications, Research

    GIScience News Blog CC by-nc-sa Some Rights Reserved.
