Temat: document - Prolib Integro

Skocz do pozycji: 1.

Tytuł:: Language independent algorithm for clustering text documents with respect to their sentiment
Autorzy:: Korzeniewski, Jerzy
Idczak, Adam
Tematy:: text mining
document sentiment
document clustering; Pokaż więcej
Wydawca:: Główny Urząd Statystyczny
Powiązania:: https://bibliotekanauki.pl/articles/59149624.pdf Link otwiera się w nowym oknie
Opis:: Determining the sentiment of a written text is an important task in text research. This task can be performed either in the supervised or unsupervised version. In this paper, we propose a novel unsupervised algorithm for documents written in any language using documents written in Polish as an example. The clustering of Polish language texts with respect to their sentiment is poorly developed in the literature on the subject. The novelty of the proposed algorithm involves the abandonment of stoplists and lemmatisation. Instead, we propose translating all documents into English and performing a two-stage document grouping. In the first step of the algorithm, selected documents are assigned to a class of positive or negative documents based on a set of lexical and grammatical rules as well as a set of keyterms. Key-terms do not have to be entered by the user, the algorithm finds them. In the second step, the remaining documents are attached to one of the classes according to the rules based on the vocabulary found in the documents grouped in the first step. The algorithm was tested on three corpora of documents and achieved very good results.
Dostawca treści:: Biblioteka Nauki

Artykuł

na półce

Skocz do pozycji: 2.

Tytuł:: Document Clustering : Concepts, Metrics and Algorithms
Autorzy:: Tarczynski, T.
Tematy:: document clustering
text mining
k-means
hierarchical clustersting
vector space model; Pokaż więcej
Wydawca:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Powiązania:: https://bibliotekanauki.pl/articles/226231.pdf Link otwiera się w nowym oknie
Opis:: Document clustering, which is also refered to as text clustering, is a technique of unsupervised document organisation. Text clustering is used to group documents into subsets that consist of texts that are similar to each orher. These subsets are called clusters. Document clustering algorithms are widely used in web searching engines to produce results relevant to a query. An example of practical use of those techniques are Yahoo! hierarchies of documents [1]. Another application of document clustering is browsing which is defined as searching session without well specific goal. The browsing techniques heavily relies on document clustering. In this article we examine the most important concepts related to document clustering. Besides the algorithms we present comprehensive discussion about representation of documents, calculation of similarity between documents and evaluation of clusters quality.
Dostawca treści:: Biblioteka Nauki

Artykuł

na półce

Skocz do pozycji: 3.

Tytuł:: Text Mining Based Process Identification and Business Process Mapping from Job Description Documents
Autorzy:: Astanti, Ririn Diar
Switasarra, Adelia Veneska
Tematy:: process identification
business process mapping
job description document
NLP
text mining; Pokaż więcej
Wydawca:: Polska Akademia Nauk. Czasopisma i Monografie PAN
Powiązania:: https://bibliotekanauki.pl/articles/59112229.pdf Link otwiera się w nowym oknie
Opis:: Business Process mapping (BP mapping) is important for a company to identify their activities. Previous research suggests several approaches for process identification and BP mapping, which would be easier if the company had already implemented a computer-based information system. The research presented in this paper has the purpose of providing an alternative method for BP mapping especially for the company that does not implement the computer-based information system. A proposed method is using job description documents that the company had to identify elements needed to perform BP mapping which are actor, process, document, and flow of documents. A Natural Language Process (NLP) which is text mining method is used for mining job documents to identify those elements that exist in each job position. To illustrate the applicability of the proposed method, samples of job descriptions of 15 companies are taken. It shows that the proposed method can be applied.
Dostawca treści:: Biblioteka Nauki

Artykuł

na półce

Skocz do pozycji: 4.

Tytuł:: Formalization of Technological Knowledge in the Field of Metallurgy Using Document Classification Tools Supported with Semantic Techniques
Autorzy:: Regulski, K.
Tematy:: application of information technology to the foundry industry
document classification
semantic techniques
knowledge formalization
text mining; Pokaż więcej
Wydawca:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Powiązania:: https://bibliotekanauki.pl/articles/353849.pdf Link otwiera się w nowym oknie
Opis:: The process of knowledge formalization is an essential part of decision support systems development. Creating a technological knowledge base in the field of metallurgy encountered problems in acquisition and codifying reusable computer artifacts based on text documents. The aim of the work was to adapt the algorithms for classification of documents and to develop a method of semantic integration of a created repository. Author used artificial intelligence tools: latent semantic indexing, rough sets, association rules learning and ontologies as a tool for integration. The developed methodology allowed for the creation of semantic knowledge base on the basis of documents in natural language in the field of metallurgy.
Dostawca treści:: Biblioteka Nauki

Artykuł

na półce

Skocz do pozycji: 5.

Tytuł:: A case study in text mining of discussion forum posts: Classification with bag of words and global vectors
Autorzy:: Cichosz, P.
Tematy:: text mining
discussion forum
text representation
document classification
word embedding
eksploracja tekstu
forum dyskusyjne
reprezentacja tekstu
klasyfikacja dokumentów; Pokaż więcej
Wydawca:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Powiązania:: https://bibliotekanauki.pl/articles/330299.pdf Link otwiera się w nowym oknie
Opis:: Despite the rapid growth of other types of social media, Internet discussion forums remain a highly popular communication channel and a useful source of text data for analyzing user interests and sentiments. Being suited to richer, deeper, and longer discussions than microblogging services, they particularly well reflect topics of long-term, persisting involvement and areas of specialized knowledge or experience. Discovering and characterizing such topics and areas by text mining algorithms is therefore an interesting and useful research direction. This work presents a case study in which selected classification algorithms are applied to posts from a Polish discussion forum devoted to psychoactive substances received from home-grown plants, such as hashish or marijuana. The utility of two different vector text representations is examined: the simple bag of words representation and the more refined embedded global vectors one. While the former is found to work well for the multinomial naive Bayes algorithm, the latter turns out more useful for other classification algorithms: logistic regression, SVMs, and random forests. The obtained results suggest that post-classification can be applied for measuring publication intensity of particular topics and, in the case of forums related to psychoactive substances, for monitoring the risk of drug-related crime.
Dostawca treści:: Biblioteka Nauki

Artykuł

na półce

Skocz do pozycji: 6.

Tytuł:: Przegląd zastosowań analizy text miningowej
Overview of uses text mining analysis
Autorzy:: Gładysz, A.
Tematy:: dokument tekstowy
eksploracja danych tekstowych
text mining
data mining
analiza danych tekstowych
przetwarzanie informacji
wyszukiwanie informacji
tłumaczenie automatyczne
nadmiar informacji
business intelligence
information retrieval
data processing
document similarity
machine translation
information overload; Pokaż więcej
Wydawca:: Instytut Naukowo-Wydawniczy "SPATIUM"
Powiązania:: https://bibliotekanauki.pl/articles/311433.pdf Link otwiera się w nowym oknie
Opis:: W artykule omówiona została eksploracyjna analiza danych tekstowych ze szczególnym naciskiem na zastosowania analizy text miningowej. We współczesnym świecie istnieje wiele różnych branż biznesowych w których pracownicy stykają się z nadmiarem napływających informacji. Rozwój społeczeństwa informacyjnego oraz technologii informatycznych pociągnął za sobą w sposób naturalny powstanie zautomatyzowanych systemów wspomagających wyszukiwanie i porządkowanie informacji. Techniki text miningu znajdują coraz większe zastosowanie, zaś szeroki przegląd zastosowań wraz ze wskazaniem praktycznym możliwości zastosowania analizy text miningowej został dogłębnie omówiony w artykule.
The article discussed the text mining with particular emphasis on the use of text mining analysis. In the modern world there are many different business industries where workers are in contact with an excess of incoming information. The development of the information society and information technology entailed a natural rise of automated systems to support search and organize information. Text mining techniques are increasingly applied, and a broad overview of applications, together with an indication of the practical possibilities of the use of text mining analysis has been thoroughly discussed in the article.
Dostawca treści:: Biblioteka Nauki

Artykuł

na półce

Skocz do pozycji: 7.

Tytuł:: New algorithm for determining the number of features for the effective sentiment-classification of text documents
Nowy algorytm ustalania liczby zmiennych potrzebnych do klasyfikacji dokumentów tekstowych ze względu na ich wydźwięk emocjonalny
Autorzy:: Idczak, Adam
Korzeniewski, Jerzy
Tematy:: sentiment analysis
document sentiment classification
text mining
logistic regression
naive Bayes classifier
feature selection
correlation
analiza sentymentu
klasyfikacja dokumentów ze względu na wydźwięk emocjonalny
eksploracja tekstu
regresja logistyczna
naiwny klasyfikator Bayesa
dobór cech
korelacja; Pokaż więcej
Wydawca:: Główny Urząd Statystyczny
Powiązania:: https://bibliotekanauki.pl/articles/18105028.pdf Link otwiera się w nowym oknie
Opis:: Sentiment analysis of text documents is a very important part of contemporary text mining. The purpose of this article is to present a new technique of text sentiment analysis which can be used with any type of a document-sentiment-classification method. The proposed technique involves feature selection independently of a classifier, which reduces the size of the feature space. Its advantages include intuitiveness and computational noncomplexity. The most important element of the proposed technique is a novel algorithm for the determination of the number of features to be selected sufficient for the effective classification. The algorithm is based on the analysis of the correlation between single features and document labels. A statistical approach, featuring a naive Bayes classifier and logistic regression, was employed to verify the usefulness of the proposed technique. They were applied to three document sets composed of 1,169 opinions of bank clients, obtained in 2020 from a Poland-based bank. The documents were written in Polish. The research demonstrated that reducing the number of terms over 10-fold by means of the proposed algorithm in most cases improves the effectiveness of classification.
Analiza sentymentu, czyli wydźwięku emocjonalnego, dokumentów tekstowych stanowi bardzo ważną część współczesnej eksploracji tekstu (ang. text mining). Celem artykułu jest przedstawienie nowej techniki analizy sentymentu tekstu, która może znaleźć zastosowanie w dowolnej metodzie klasyfikacji dokumentów ze względu na ich wydźwięk emocjonalny. Proponowana technika polega na niezależnym od klasyfikatora doborze cech, co skutkuje zmniejszeniem rozmiaru ich przestrzeni. Zaletami tej propozycji są intuicyjność i prostota obliczeniowa. Zasadniczym elementem omawianej techniki jest nowatorski algorytm ustalania liczby terminów wystarczających do efektywnej klasyfikacji, który opiera się na analizie korelacji pomiędzy pojedynczymi cechami dokumentów a ich wydźwiękiem. W celu weryfikacji przydatności proponowanej techniki zastosowano podejście statystyczne. Wykorzystano dwie metody: naiwny klasyfikator Bayesa i regresję logistyczną. Za ich pomocą zbadano trzy zbiory dokumentów składające się z 1169 opinii klientów jednego z banków działających na terenie Polski uzyskanych w 2020 r. Dokumenty zostały napisane w języku polskim. Badanie pokazało, że kilkunastokrotne zmniejszenie liczby terminów przy zastosowaniu proponowanej techniki na ogół poprawia jakość klasyfikacji.
Dostawca treści:: Biblioteka Nauki

Artykuł

na półce

Skocz do pozycji: 8.

Tytuł:: Analysis of methods and means of text mining
Autorzy:: Rybchak, Z.
Basystiuk, O.
Tematy:: text mining
text analytics
data analysis
high-quality information
text categorization
text clustering
document summarization
sentiment analysis
sieć językowa
analiza tekstu
analiza danych
wysoka jakość informacji
klasyfikacja tekstowa
kategoryzacja tekstowa
grupowanie tekstu
streszczenie dokumentów tekstowych
technika sentiment analysis; Pokaż więcej
Wydawca:: Polska Akademia Nauk. Oddział w Lublinie PAN
Powiązania:: https://bibliotekanauki.pl/articles/411072.pdf Link otwiera się w nowym oknie
Opis:: In Big Data era when data volume doubled every year analyzing of all this data become really complicated task, so in this case text mining systems, techniques and tools become main instrument of analyzing tones and tones of information, selecting that information that suit the best for your needs and just help save your time for more interesting thing. The main aims of this article are explain basic principles of this field and overview some interesting technologies that nowadays are widely used in text mining.
Dostawca treści:: Biblioteka Nauki

Artykuł

na półce

Informacja

Wyszukujesz frazę "document" wg kryterium: Temat