- Title:
- Language independent algorithm for clustering text documents with respect to their sentiment
- Authors:
-
Korzeniewski, Jerzy
Idczak, Adam
- Subjects:
-
text mining
document sentiment
document clustering
- Publisher:
- Główny Urząd Statystyczny
- Related links:
- https://bibliotekanauki.pl/articles/59149624.pdf
- Description:
- Determining the sentiment of a written text is an important task in text research. It can be performed in either a supervised or an unsupervised manner. In this paper, we propose a novel unsupervised algorithm for documents written in any language, using documents written in Polish as an example. The clustering of Polish-language texts with respect to their sentiment is poorly developed in the literature on the subject. The novelty of the proposed algorithm lies in abandoning stoplists and lemmatisation. Instead, we propose translating all documents into English and performing a two-stage document grouping. In the first stage, selected documents are assigned to the class of positive or negative documents on the basis of a set of lexical and grammatical rules and a set of key terms. The key terms do not have to be supplied by the user; the algorithm finds them itself. In the second stage, the remaining documents are attached to one of the two classes according to rules based on the vocabulary found in the documents grouped in the first stage. The algorithm was tested on three corpora of documents and achieved very good results.
- Content provider:
- Biblioteka Nauki
Article
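
The two-stage grouping outlined in the description can be illustrated with a small sketch. It is not the authors' implementation: the key-term sets, the single negation rule, the threshold and the names `tokenize`, `rule_score` and `cluster_by_sentiment` are all assumptions made for illustration. In particular, the paper's algorithm finds its key terms itself, whereas here they are hard-coded, and the documents are assumed to have been translated into English already.

```python
import re
from collections import Counter

# Hypothetical key terms, hard-coded for illustration only;
# the algorithm in the paper discovers its own key terms.
POSITIVE_TERMS = {"good", "great", "excellent", "recommend"}
NEGATIVE_TERMS = {"bad", "poor", "terrible", "disappointing"}
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Split into lowercase word tokens; no stoplist, no lemmatisation."""
    return re.findall(r"[a-z']+", text.lower())


def rule_score(tokens):
    """Sum key-term polarities, flipping a term directly preceded by a negation."""
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE_TERMS) - (tok in NEGATIVE_TERMS)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    return score


def cluster_by_sentiment(docs, threshold=1):
    """Return 'pos', 'neg' or None for each (already translated) document."""
    tokenized = [tokenize(d) for d in docs]
    labels = [None] * len(docs)

    # Stage 1: label only the documents for which the rules give a clear signal.
    for i, toks in enumerate(tokenized):
        score = rule_score(toks)
        if score >= threshold:
            labels[i] = "pos"
        elif score <= -threshold:
            labels[i] = "neg"

    # Collect the vocabulary of each group formed in stage 1.
    vocab = {"pos": Counter(), "neg": Counter()}
    for toks, label in zip(tokenized, labels):
        if label is not None:
            vocab[label].update(toks)

    # Stage 2: attach each remaining document to the group whose
    # stage 1 vocabulary it overlaps with more; ties stay unassigned.
    for i, toks in enumerate(tokenized):
        if labels[i] is None:
            balance = sum(vocab["pos"][t] - vocab["neg"][t] for t in toks)
            if balance > 0:
                labels[i] = "pos"
            elif balance < 0:
                labels[i] = "neg"
    return labels
```

Calling `cluster_by_sentiment` on a list of translated review texts returns one label per document. Documents that match no key term and share no vocabulary with either stage 1 group stay `None`, which is a design choice of this sketch rather than something stated in the description.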