Please use this identifier to cite or link to this item:
University: Manav Rachna International University
Completed Date: 2018
Abstract: Data Mining is the process of extracting useful patterns or knowledge from the large volumes of data collected in information systems and using these patterns to make safe and smart decisions. The predefined methods and algorithms used to extract these patterns are called Data Mining Techniques.
Clustering is a data mining technique that divides a given dataset into groups, or clusters, such that the objects in one group are more similar to each other than to the objects in other groups. Many clustering algorithms have been proposed in the literature. They are broadly classified into two categories, Hierarchical and Partitional. The K-Means algorithm is one of the commonly used techniques in the Partitional category.
K-Means is a simple algorithm known for its speed. It is computationally inexpensive and works well with large, high-dimensional datasets. However, the algorithm has some limitations. One major limitation is that the number of clusters (K) must be specified in advance as input. Choosing a value of K is domain specific. It is sometimes difficult to predict the required number of clusters in advance because the dataset is unknown or new, and in that case the data may be grouped poorly. These limitations of K-Means carry over to its extensions, K-Modes and K-Prototype. Various extensions of K-Means for numerical, categorical and mixed datasets have been proposed in the literature to remove the need to provide K as input, but these algorithms either require some input parameter other than K or are computationally complex.
K-Modes, an extension of the K-Means algorithm for categorical data, is known for its simplicity and speed. Because K-Modes operates on categorical data, it uses the Simple Matching Dissimilarity measure instead of Euclidean distance and the modes of clusters instead of means.
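To make the last point of the abstract concrete, the following is a minimal sketch of plain K-Modes, not the method proposed in the thesis. It assumes records are tuples of categorical values, still takes K as input (the limitation the abstract describes), and seeds the modes with the first K distinct records. The function names `simple_matching`, `cluster_mode` and `k_modes` are illustrative only.

```python
from collections import Counter


def simple_matching(a, b):
    """Simple Matching Dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Mode of a non-empty cluster: the most frequent value of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def k_modes(records, k, max_iter=20):
    """Naive K-Modes. Assumption: the first k distinct records seed the modes."""
    modes = list(dict.fromkeys(records))[:k]
    if len(modes) < k:
        raise ValueError("need at least k distinct records")
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the nearest mode.
        new_labels = [
            min(range(k), key=lambda j: simple_matching(record, modes[j]))
            for record in records
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each mode from its current members.
        for j in range(k):
            members = [r for r, label in zip(records, labels) if label == j]
            if members:
                modes[j] = cluster_mode(members)
    return labels, modes


# Hypothetical toy data: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("red", "medium", "round"),
    ("blue", "large", "oval"),
]
labels, modes = k_modes(records, k=2)
```

Replacing `simple_matching` with Euclidean distance and `cluster_mode` with a per-attribute mean would turn the same loop into K-Means for numerical data, which is the correspondence the abstract points to.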
Appears in Departments: Department of Computer Science Engineering

Files in This Item:
File                             Description    Size       Format
01_thesis front page.pdf         Attached File  38.16 kB   Adobe PDF
02_thesis declaration.pdf                       46.76 kB   Adobe PDF
03_certificate.pdf                              61.52 kB   Adobe PDF
04_acknowledgement.pdf                          6.62 kB    Adobe PDF
05_list of publications.pdf                     51.97 kB   Adobe PDF
06_abstract.pdf                                 55.33 kB   Adobe PDF
07_table of contents.pdf                        31.78 kB   Adobe PDF
08_list of tables.pdf                           116.63 kB  Adobe PDF
09_list of figures.pdf                          14.77 kB   Adobe PDF
10_chapter1.pdf                                 548.19 kB  Adobe PDF
11_chapter 2.pdf                                189.72 kB  Adobe PDF
12_chapter 3.pdf                                789.16 kB  Adobe PDF
13_chapter 4.pdf                                624.77 kB  Adobe PDF
14_chapter 5.pdf                                578.22 kB  Adobe PDF
15_chapter 6.pdf                                62.98 kB   Adobe PDF
16_references.pdf                               245.25 kB  Adobe PDF
17_appendix a.pdf                               226.22 kB  Adobe PDF
18_brief profile of scholar.pdf                 4.32 kB    Adobe PDF

Items in Shodhganga are protected by copyright, with all rights reserved, unless otherwise indicated.