Implementation of Topological Data Analysis and Support Vector Machine for MNIST Dataset Classification
Nur Nilam Sari and Intan Muchtadi-Alamsyah

Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung, Bandung, Indonesia


Abstract

The advancement of information technology and artificial intelligence has fostered innovation in pattern recognition, particularly on the MNIST dataset, a classic collection of handwritten digits. MNIST comprises two main components: image data X and labels y. This research explores the application of topological data analysis to this dataset, specifically through persistence barcode analysis. Classification is performed with a support vector machine using a Radial Basis Function (RBF) kernel. Each digit in the MNIST dataset is represented as a 28x28 matrix whose elements range from 0 to 255. The preprocessing steps are: converting grayscale matrices to binary, skeletonization using the Zhang-Suen thinning method, forming embedded graphs, determining filtration values, and constructing persistence barcodes. Features are extracted from the persistence barcodes using Adcock-Carlsson coordinates. To enhance accuracy, each image in the MNIST dataset undergoes four rotations (north, south, west, east), resulting in 32 extracted features per image. These features serve as inputs for the classification algorithm. The MNIST dataset is divided into training data (80%, 56,000 samples) and test data (20%, 14,000 samples). The chosen parameters are a gamma value of 0.006551285568595509 and a C value of 138.94954943731375. With these settings, the accuracy achieved on the test data is 70%.
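
The feature extraction step can be illustrated with a minimal Python sketch. It assumes a commonly cited form of the four Adcock-Carlsson coordinates; the abstract does not state which polynomials were actually used. It also assumes that the 32 features arise as 4 directions x 2 homology dimensions (H0, H1) x 4 coordinates, which is an interpretation rather than something the abstract confirms. The names adcock_carlsson and image_features are hypothetical, and every bar is assumed to have a finite death value.

```python
def adcock_carlsson(barcode):
    """Four polynomial features of a barcode given as (birth, death) pairs.

    Assumed form (not confirmed by the abstract), with y_max the largest death:
      f1 = sum b * (d - b)
      f2 = sum (y_max - d) * (d - b)
      f3 = sum b^2 * (d - b)^4
      f4 = sum (y_max - d)^2 * (d - b)^4
    """
    if not barcode:
        # An empty barcode contributes zeros so the feature length stays fixed.
        return [0.0, 0.0, 0.0, 0.0]
    y_max = max(d for _, d in barcode)
    f1 = sum(b * (d - b) for b, d in barcode)
    f2 = sum((y_max - d) * (d - b) for b, d in barcode)
    f3 = sum(b ** 2 * (d - b) ** 4 for b, d in barcode)
    f4 = sum((y_max - d) ** 2 * (d - b) ** 4 for b, d in barcode)
    return [float(f1), float(f2), float(f3), float(f4)]


def image_features(barcodes_by_direction):
    """Concatenate features over directions and homology dimensions.

    barcodes_by_direction maps each direction name to a pair
    (H0 barcode, H1 barcode). This layout is an assumption.
    """
    features = []
    for direction in ("north", "south", "west", "east"):
        h0, h1 = barcodes_by_direction[direction]
        features.extend(adcock_carlsson(h0))
        features.extend(adcock_carlsson(h1))
    return features


# Toy barcodes, invented purely for illustration.
toy = {d: ([(0, 2), (1, 3)], [(1, 2)]) for d in ("north", "south", "west", "east")}
vector = image_features(toy)
print(len(vector))
```

A vector of this kind could then be passed to an RBF-kernel classifier, for example scikit-learn's SVC(kernel="rbf", gamma=0.006551285568595509, C=138.94954943731375), using the parameter values reported above; the abstract does not say which library was used.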

Keywords: MNIST dataset, persistence barcode, feature extraction, support vector machine

Topic: Minisymposia Geometry and Topology

ICONMAA 2024 Conference