Refining Wikidata Taxonomy using Large Language Models

Abstract

Due to its collaborative nature, Wikidata is known to have a complex taxonomy, with recurrent issues such as ambiguity between instances and classes, inaccurate taxonomic paths, cycles, and a high level of redundancy across classes. Manual efforts to clean up this taxonomy are time-consuming and prone to errors or subjective decisions. We present WiKC, a new version of the Wikidata taxonomy cleaned automatically using a combination of Large Language Models (LLMs) and graph mining techniques. Operations on the taxonomy, such as cutting links or merging classes, are performed with the help of zero-shot prompting on an open-source LLM. The quality of the refined taxonomy is evaluated from both intrinsic and extrinsic perspectives, the latter on an entity typing task, showing the practical interest of WiKC.
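
To make one of the issues named above concrete, here is a minimal sketch of detecting and breaking a cycle in a subclass-of graph. It is not the WiKC pipeline: the toy class names, the `find_cycle` and `cut_link` helpers, and the rule for choosing which link to cut are all assumptions made for illustration. In the paper, decisions such as cutting a link are made with zero-shot prompting on an LLM; here an arbitrary rule stands in for that step.

```python
from typing import Dict, List, Optional

# Toy taxonomy: each class maps to its direct superclasses (hypothetical data).
Taxonomy = Dict[str, List[str]]


def find_cycle(subclass_of: Taxonomy) -> Optional[List[str]]:
    """Return one cycle as a node list (first node repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for parent in subclass_of.get(node, []):
            state = color.get(parent, WHITE)
            if state == GREY:
                # The parent is already on the current path: a cycle is closed.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(subclass_of):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def cut_link(subclass_of: Taxonomy, child: str, parent: str) -> None:
    """Remove the subclass-of link child -> parent, if present."""
    subclass_of[child] = [p for p in subclass_of.get(child, []) if p != parent]


toy: Taxonomy = {
    "river": ["watercourse"],
    "watercourse": ["body of water"],
    "body of water": ["watercourse", "geographic feature"],
    "geographic feature": [],
}

cycle = find_cycle(toy)
if cycle:
    # Placeholder decision: cut the link that closes the cycle.
    # An LLM-based judgment would replace this arbitrary choice.
    cut_link(toy, cycle[-2], cycle[-1])
remaining = find_cycle(toy)
```

The depth-first search is recursive, so it suits small illustrative graphs only; a taxonomy of Wikidata's size would need an iterative traversal.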

Publication
In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM), 2024
Yiwen Peng
Ph.D. Student in Knowledge Graphs

My research centers on knowledge graphs, with a particular focus on knowledge integration and completion using language models.