Data quality panel

Claudia Müller-Birn, Lucas Werkmeister, Jose Emilio Labra Gayo, Cristina Sarasua and Andra Waagmeester


Each panel member has 8 minutes to present their topic. After the talks, all presenters join a panel discussion of about 15 minutes, during which the audience is invited to ask questions, share ideas, or connect to other initiatives.
Panel members:

Lucas Werkmeister (Overview of Data Quality Tools on Wikidata)
Abstract: To fight vandalism, mistakes, and other data quality issues on Wikidata, a variety of tools are available. This section gives a brief overview of what is already available to users, as well as an outlook on what’s coming in the future.
Jose Emilio Labra Gayo (University of Oviedo (Spain), Username: Jelabra). (Schema visualization and authoring tools)
Abstract: Historically, data sharing requires data modelling, which implies reaching consensus on data structures and disseminating them. The UML community has addressed this with languages that rely principally on graphical representations and only secondarily on exchange formats. UML focuses on a conceptual model, leaving the data binding as a matter separate from modelling. When modelling with ShEx schemas, we get a physical representation for free. This eliminates an opaque layer from the modelling pipeline, one which easily introduced interoperability problems. Wikidata users need the simplest tools possible in order to create coherent and complementary schemas and disseminate these schemas to a community of contributors and users who are not experts in schema languages. This talk focuses on existing tools for expressing graph data, in particular graphical tools that ease the learning curve to the extent that users aren’t even aware that they are learning a new system. We will also explore how these tools can be integrated into the Wikidata toolchain for both visualizing and authoring schemas.
Cristina Sarasua (University of Zurich (Switzerland), Username: Criscod). Topics to be discussed: Link Quality, data quality dimensions in existing tools.
Abstract: Building on prior experiences from our workshop, we would like to discuss with the community existing challenges and opportunities in the field of data quality monitoring and assurance by focusing on Wikidata’s unique characteristics: its central role in a network of knowledge bases and other peer production projects (like Wikipedia), its ability to host plural statements and illustrate the Web’s misinformation, its multilinguality, its community of humans and machines, as well as its dynamicity. Our discussion is guided by the following questions: Which data quality dimensions and measures are suitable in the context of Wikidata but not yet (fully) addressed? Which methods and tools do Wikidata’s editors need to edit, maintain and consume data?
Andra Waagmeester (Using automation pipelines to maintain quality of Wikidata content after successful integration)
Abstract: In the Gene Wiki project we maintain a family of bots to synchronise public scientific databases in the life sciences with Wikidata. Using the WikidataIntegrator, a Python pipeline developed for Wikidata data ingestion, these bots continuously monitor the primary sources of the data added to Wikidata for changes and log these changes. New updates are processed and, where possible, changelogs are reported back to the curator teams of the primary sources. We use an automation server called Jenkins to run and maintain these processes and would like to present this solution.
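
The change-monitoring step described in the last abstract can be illustrated with a minimal sketch. It is not the Gene Wiki bots' code and does not use the WikidataIntegrator API; it only assumes that a primary source can be read as a snapshot mapping record identifiers to field values. The names Change and diff_snapshots, and the record layout, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical record layout: field name -> value, as read from a primary source.
Record = Dict[str, str]


@dataclass(frozen=True)
class Change:
    record_id: str
    kind: str                 # "added", "removed", or "modified"
    fields: Tuple[str, ...]   # differing field names (empty for added/removed)


def diff_snapshots(old: Dict[str, Record], new: Dict[str, Record]) -> List[Change]:
    """Compare two snapshots of a source and return a changelog."""
    changes: List[Change] = []
    # Records present only in the new snapshot.
    for rid in sorted(new.keys() - old.keys()):
        changes.append(Change(rid, "added", ()))
    # Records that disappeared from the source.
    for rid in sorted(old.keys() - new.keys()):
        changes.append(Change(rid, "removed", ()))
    # Records present in both: list the fields whose values differ.
    for rid in sorted(old.keys() & new.keys()):
        names = old[rid].keys() | new[rid].keys()
        differing = tuple(sorted(
            name for name in names if old[rid].get(name) != new[rid].get(name)
        ))
        if differing:
            changes.append(Change(rid, "modified", differing))
    return changes


if __name__ == "__main__":
    previous = {"G1": {"symbol": "ABC"}, "G2": {"symbol": "XYZ"}}
    current = {"G1": {"symbol": "ABD"}, "G3": {"symbol": "NEW"}}
    for change in diff_snapshots(previous, current):
        print(change)
```

A scheduled job on an automation server could run such a comparison after each source release, and the resulting list could serve both as the bot's log and as the changelog reported back to curators.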