Subject: mining data - Prolib Integro

Title: KNAC : an approach for enhancing cluster analysis with background knowledge and explanations
Authors: Nalepa, Grzegorz
Brzegowski, Jakub
Kuk, Michał
Bobek, Szymon
Brzychczy, Edyta
Description: Pattern discovery in multidimensional data sets has been the subject of research for decades. There exists a wide spectrum of clustering algorithms that can be used for this purpose. However, their practical applications share a common post-clustering phase, which concerns expert-based interpretation and analysis of the obtained results. We argue that this can be the bottleneck in the process, especially in cases where domain knowledge exists prior to clustering. Such a situation requires not only a proper analysis of automatically discovered clusters but also conformance checking with existing knowledge. In this work, we present Knowledge Augmented Clustering (KNAC). Its main goal is to confront expert-based labelling with automated clustering for the sake of updating and refining the former. Our solution is not restricted to any existing clustering algorithm. Instead, KNAC can serve as an augmentation of an arbitrary clustering algorithm, which makes the approach a robust, model-agnostic improvement of any state-of-the-art clustering method. We demonstrate the feasibility of our method on artificial, reproducible examples and in a real-life use case. In both cases, we achieved better results than classic clustering algorithms without augmentation.
Content provider: Repozytorium Uniwersytetu Jagiellońskiego

Article
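
The abstract above describes confronting expert-based labelling with automated clustering. The listing gives no details of how KNAC does this, so the following is only a minimal sketch of the general idea, not the authors' method: it cross-tabulates expert labels against cluster ids and reports how concentrated each expert label is in a single cluster. The function names `cross_tabulate` and `dominant_share` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def cross_tabulate(expert_labels, cluster_labels):
    """Count how often each expert label co-occurs with each cluster id."""
    if len(expert_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    table = defaultdict(Counter)
    for expert, cluster in zip(expert_labels, cluster_labels):
        table[expert][cluster] += 1
    return {expert: dict(counts) for expert, counts in table.items()}


def dominant_share(table):
    """Share of each expert label's points that fall in its most common cluster."""
    return {
        expert: max(counts.values()) / sum(counts.values())
        for expert, counts in table.items()
    }


# Toy data (assumed for illustration): expert class "A" is spread over
# two clusters, while class "B" sits entirely in one cluster.
expert = ["A", "A", "A", "A", "B", "B", "B", "B"]
clusters = [0, 0, 1, 1, 2, 2, 2, 2]

table = cross_tabulate(expert, clusters)
shares = dominant_share(table)
```

A low share for an expert label (here "A") suggests that the automated clustering disagrees with the expert grouping and that the label may deserve a closer look. How such disagreement is actually resolved is left to the paper itself.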

Title: Augmenting automatic clustering with expert knowledge and explanations
Authors: Bobek, Szymon
Nalepa, Grzegorz
Publisher: Springer International Publishing
Description: Cluster discovery from high-dimensional data is a challenging task that has been studied for years in the fields of data mining and machine learning. Most approaches focus on automation of the process, resulting in clusters that, once discovered, have to be carefully analyzed to assign semantics to numerical labels. However, it is often the case that such explicit, symbolic knowledge about possible clusters is available prior to clustering and can be used to enhance the learning process. More importantly, we demonstrate how a machine learning model can be used to refine the expert knowledge and extend it with the aid of explainable AI algorithms. We present our framework on an artificial, reproducible dataset.
Content provider: Repozytorium Uniwersytetu Jagiellońskiego

Other

Title: Enhancing cluster analysis with explainable AI and multidimensional cluster prototypes
Authors: Szelążek, Maciej
Bobek, Szymon
Nalepa, Grzegorz
Kuk, Michał
Description: Explainable Artificial Intelligence (XAI) aims to introduce transparency and intelligibility into the decision-making process of AI systems. Most often, its application concentrates on supervised machine learning problems such as classification and regression. Nevertheless, in the case of unsupervised algorithms like clustering, XAI can also bring satisfactory results. In most cases, such applications transform the unsupervised clustering task into a supervised one and provide generalised global explanations or local explanations based on cluster centroids. However, in many cases, the global explanations are too coarse, while the centroid-based local explanations lose information about cluster shape and distribution. In this paper, we present a novel approach called ClAMP (Cluster Analysis with Multidimensional Prototypes) that aids experts in cluster analysis with human-readable rule-based explanations. The developed state-of-the-art explanation mechanism is based on cluster prototypes represented by multidimensional bounding boxes. This allows arbitrarily shaped clusters to be represented and combines the strengths of local explanations with the generality of global ones. We demonstrate and evaluate the use of our approach in a real-life industrial case study from the domain of steel manufacturing as well as on benchmark datasets. The explanations generated with ClAMP were more precise than either centroid-based or global ones.
Content provider: Repozytorium Uniwersytetu Jagiellońskiego

Article
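
The abstract above mentions cluster prototypes represented by multidimensional bounding boxes and human-readable rule-based explanations. As a rough illustration only, and assuming the simplest possible reading (one axis-aligned box per cluster, which cannot capture arbitrarily shaped clusters and is not claimed to be ClAMP's actual mechanism), a box can be computed per cluster and rendered as an interval rule. The helper names `bounding_boxes` and `box_to_rule`, the feature names, and the data are hypothetical.

```python
def bounding_boxes(points, labels):
    """Return {cluster: [(min, max) per feature]} as axis-aligned boxes."""
    boxes = {}
    for point, label in zip(points, labels):
        if label not in boxes:
            # First point of a cluster: the box collapses to that point.
            boxes[label] = [(value, value) for value in point]
        else:
            # Grow the box so that it also covers the new point.
            boxes[label] = [
                (min(low, value), max(high, value))
                for (low, high), value in zip(boxes[label], point)
            ]
    return boxes


def box_to_rule(box, feature_names):
    """Render one box as a conjunction of interval conditions."""
    return " AND ".join(
        f"{low} <= {name} <= {high}"
        for name, (low, high) in zip(feature_names, box)
    )


# Toy two-feature data with two clusters (assumed for illustration).
points = [(1.0, 2.0), (2.0, 3.0), (8.0, 9.0), (9.0, 7.0)]
labels = [0, 0, 1, 1]

boxes = bounding_boxes(points, labels)
rules = {cluster: box_to_rule(box, ["x", "y"]) for cluster, box in boxes.items()}
```

Unlike a centroid, each box keeps the extent of the cluster along every feature, which is the kind of shape information the abstract says centroid-based explanations lose.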

Information

You are searching for the phrase "mining data" by criterion: Subject