Korenčić, Damir; Ristov, Strahil; Šnajder, Jan (2018) Document-based Topic Coherence Measures for News Media Text. Expert Systems with Applications, 114. pp. 357-373. ISSN 0957-4174
- PDF, Accepted Version (article, 687kB). Restricted to registered users only until 30 December 2020.
- PDF, Submitted Version (article, 621kB).
Abstract
There is a rising need for automated analysis of news text, and topic models have proven to be useful tools for this task. However, as the quality of the topics induced by topic models greatly varies, much research effort has been devoted to their automated evaluation. Recent research has focused on topic coherence as a measure of a topic’s quality. Existing topic coherence measures work by considering the semantic similarity of topic words. This makes them unfit to detect the coherence of transient topics with semantically unrelated topic words, which abound in news media texts. In this paper, we introduce the notion of document-based topic coherence and propose novel topic coherence measures that estimate topic coherence based on topic documents rather than topic words. We evaluate the proposed measures on two datasets containing topics manually labeled for document-based coherence, on which the proposed measures outperform a strong baseline as well as word-based coherence measures. We also demonstrate the usefulness of document-based coherence measures for automated topic discovery from news media texts.
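The idea of scoring a topic by its documents rather than its words can be pictured with a minimal sketch. This is not one of the measures proposed in the paper: it assumes a hypothetical score defined as the mean pairwise cosine similarity between bag-of-words vectors of a topic's highest-weighted documents. The function names, the `top_n` parameter, the whitespace tokenizer, and the toy documents are all illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercased whitespace-token counts (a deliberately crude vectorizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_based_coherence(docs_with_weights, top_n=3):
    """Hypothetical score: mean pairwise cosine similarity of the top_n
    documents of a topic, ranked by their topic weight (assumed given)."""
    ranked = sorted(docs_with_weights, key=lambda dw: dw[1], reverse=True)
    vectors = [bag_of_words(doc) for doc, _ in ranked[:top_n]]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy (document, topic weight) pairs, invented for illustration only.
coherent = [
    ("storm floods coastal town", 0.9),
    ("coastal town floods after storm", 0.8),
    ("storm damage in coastal town", 0.7),
]
mixed = [
    ("storm floods coastal town", 0.9),
    ("football club signs striker", 0.8),
    ("parliament passes budget bill", 0.7),
]

print(document_based_coherence(coherent))
print(document_based_coherence(mixed))
```

In this toy setting, the topic whose top documents report the same event scores higher than the topic whose top documents are unrelated, even though no judgment is made about the semantic relatedness of individual topic words.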
Item Type: | Article |
---|---|
Uncontrolled Keywords: | topic models; topic coherence; topic model evaluation; text analysis; news text; exploratory analysis |
Subjects: | TECHNICAL SCIENCES > Computing > Artificial Intelligence |
Divisions: | Division of Electronics |
Depositing User: | Damir Korenčić |
Date Deposited: | 28 Aug 2018 12:03 |
Last Modified: | 03 Dec 2019 09:44 |
URI: | http://fulir.irb.hr/id/eprint/4124 |
DOI: | 10.1016/j.eswa.2018.07.063 |