Data completeness: How to know what Wikidata knows?



Wikidata is a great project for mapping structured information about the world, and it exhibits a high degree of correctness. Its degree of completeness, in turn, is much less well understood. Anecdotal evidence suggests that it covers many popular topics quite well, but there are few standard means to support this assessment: at present, editors and consumers largely have to analyze on a case-by-case basis whether given information might be complete or not.

This session concerns the automated assessment of the completeness of Wikidata and consists of two parts.

In the first part (35 minutes), I will survey techniques for assessing the completeness of parts of Wikidata, covering three aspects of completeness: values, properties, and entities.
- Values: no-value statements, predicates that express object counts, and the COOL-WD tool for asserting completeness metadata.
- Properties: mandatory properties (such as those declared via P1963), the completeness status indicator icon provided by Recoin, and tabular views as discussed here and exemplified here.
- Entities: what is currently possible with the Class Browser and SQID, and what faceted browsing should hopefully make possible in the future.

The second part of the session (15 minutes) will be an open discussion, guided by the following questions:
- What kind of (anecdotal) knowledge about the completeness of parts of Wikidata do participants have?
- What kind of structured knowledge about completeness would participants like to obtain?
- What tools could help towards this?
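To make the mandatory-properties idea concrete, here is a minimal Python sketch (not from the session itself) that builds a SPARQL query for the Wikidata Query Service. It lists properties that a class declares via P1963 ("properties for this type") but for which a given item has no statement. The item and class IDs in the example are illustrative assumptions; actually executing the query requires sending it to query.wikidata.org.

```python
# Sketch: build a SPARQL query that finds properties declared as
# "properties for this type" (P1963) on a class but absent on an item.
# The IDs used below (Q42, Q5) are illustrative examples only.

def missing_mandatory_properties_query(item_qid: str, class_qid: str) -> str:
    """Return a SPARQL query string for the Wikidata Query Service.

    The query selects every property that class_qid lists via P1963
    and for which item_qid has no statement.
    """
    return f"""
SELECT ?prop ?propLabel WHERE {{
  wd:{class_qid} wdt:P1963 ?prop .            # property expected for this class
  ?prop wikibase:directClaim ?claim .         # map property entity to its wdt: predicate
  FILTER NOT EXISTS {{ wd:{item_qid} ?claim [] }}  # item has no statement for it
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""

# Example: which properties expected for "human" (Q5) is item Q42 missing?
query = missing_mandatory_properties_query("Q42", "Q5")
print(query)
```

A tool like Recoin performs a similar comparison internally to derive its relative-completeness indicator; this sketch only shows the underlying query pattern, not Recoin's actual implementation.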