
Structure of the knowledge base for choosing the data clustering algorithm

V. Burdaev


Description: The article considers the relevance of unsupervised classification (clustering) of multidimensional objects of various kinds. In clustering algorithms, the most important and least formalized element is the definition of homogeneity, that is, a measure of closeness between objects and between clusters, together with the quality of the partition of objects into groups (the objectivity of the groups obtained); this definition primarily determines the final classification result. It follows that implementing such algorithms as application programs run in batch mode is inefficient. Therefore, to reach an optimal heuristic solution of a clustering task, the researcher must actively use the knowledge of experts in cluster analysis. The choice of a distance measure between clusters depends on the geometric shapes that the objects form in the feature space. The "nearest neighbor" distance gives good clustering results when the objects in the feature space form a chain. The "farthest neighbor" distance is used when the objects form spherical clouds. When the objects form ellipsoids, it is recommended to use the distance between the clusters' centers of gravity. The difficulty of clustering is that for each particular type of data and each arrangement of objects in the feature space, one must either choose a suitable algorithm, adapt an existing one, or develop a new one. Expert knowledge is widely used to solve this problem. The algorithm of the dynamic condensation method is proposed and adapted for clustering heterogeneous data. Results are obtained on creating a knowledge base for cluster analysis. A knowledge base was built for selecting among the algorithms "K-intra-group means", "ISODATA", hierarchical clustering and fuzzy clustering for different types of attributes.
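The three between-cluster distances named above, and the shape-based recommendation for choosing among them, can be illustrated with a minimal sketch. It assumes Euclidean distance between objects, which the abstract does not specify. The function names and the `LINKAGE_BY_SHAPE` table are hypothetical illustrations and are not taken from the "KARKAS" system.

```python
from itertools import product
from math import dist  # Euclidean distance; an assumption, the abstract names no metric


def nearest_neighbor_distance(a, b):
    """Smallest distance between any object of cluster a and any object of cluster b."""
    return min(dist(p, q) for p, q in product(a, b))


def farthest_neighbor_distance(a, b):
    """Largest distance between any object of cluster a and any object of cluster b."""
    return max(dist(p, q) for p, q in product(a, b))


def centroid(points):
    """Center of gravity of a cluster: the coordinate-wise mean of its objects."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def centroid_distance(a, b):
    """Distance between the centers of gravity of two clusters."""
    return dist(centroid(a), centroid(b))


# Hypothetical rule table mirroring the recommendations in the abstract:
# chains -> nearest neighbor, spherical clouds -> farthest neighbor,
# ellipsoids -> distance between centers of gravity.
LINKAGE_BY_SHAPE = {
    "chain": nearest_neighbor_distance,
    "ball": farthest_neighbor_distance,
    "ellipsoid": centroid_distance,
}


def choose_linkage(shape):
    """Return the distance function recommended for the given cluster shape."""
    return LINKAGE_BY_SHAPE[shape]


# Two small clusters on a line, used only as a worked example.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]

print(choose_linkage("chain")(a, b))      # 2.0
print(choose_linkage("ball")(a, b))       # 5.0
print(choose_linkage("ellipsoid")(a, b))  # 3.5
```

A real knowledge base would hold many more rules, for example on attribute types and on the choice of algorithm; the lookup table only shows the shape-to-distance mapping stated in the text.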
Examples are given of the rules and frames that the hierarchical functional system uses to decide on the choice of a clustering algorithm in the "KARKAS" system. To assess and compare the quality of a partition into clusters, different partition quality functionals are used: "average intra-cluster scattering", "measure of the concentration of objects corresponding to the partitioning", and their combination. The clustering results are presented as a table of distances between cluster centers and a table of variances, which gives an idea of the relative arrangement of images within a cluster. The knowledge base allows the expert to obtain additional information about the number, shape and compactness of clusters, the number of cluster centers and their coordinates, the distances between clusters, and the dimension of "anomalous" clusters.
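The result tables and the first quality functional mentioned above can be sketched as follows. The abstract does not give the formulas, so the definitions here are assumptions: "average intra-cluster scattering" is taken as the mean squared Euclidean distance from each object to the center of its cluster, and the variance table as the per-feature population variance within each cluster. All function names are hypothetical.

```python
from math import dist  # Euclidean distance; an assumption, not stated in the abstract


def centroid(points):
    """Cluster center: the coordinate-wise mean of the cluster's objects."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def average_intra_cluster_scatter(clusters):
    """Assumed definition: mean squared distance from each object to its cluster center."""
    total_objects = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        center = centroid(c)
        total += sum(dist(p, center) ** 2 for p in c)
    return total / total_objects


def center_distance_table(clusters):
    """Square table of distances between every pair of cluster centers."""
    centers = [centroid(c) for c in clusters]
    return [[dist(u, v) for v in centers] for u in centers]


def variance_table(clusters):
    """One row per cluster: population variance of each feature within that cluster."""
    table = []
    for c in clusters:
        center = centroid(c)
        n = len(c)
        row = tuple(
            sum((p[j] - center[j]) ** 2 for p in c) / n
            for j in range(len(center))
        )
        table.append(row)
    return table


# Two small clusters used only as a worked example.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(10.0, 0.0), (10.0, 4.0)],
]

print(average_intra_cluster_scatter(clusters))  # 2.5
print(variance_table(clusters))                 # [(1.0, 0.0), (0.0, 4.0)]
print(center_distance_table(clusters)[0][1])    # distance between centers (1, 0) and (10, 2)
```

Under this assumed definition, a lower scatter value means more compact clusters, so it can be used to compare alternative partitions of the same data.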


Keywords: cluster analysis, knowledge base, expert system

Reference:
Burdaiev, V.P. (2018), “Struktura bazy znan dlia vyboru alhorytmu klasteryzatsii danykh” [Structure of the knowledge base for choosing the data clustering algorithm], Scientific Works of Kharkiv National Air Force University, Vol. 2(56), pp. 82-88. https://doi.org/10.30748/zhups.2018.56.11.