A binarization strategy for modelling mixed data in multigroup classification

Youssef Masmoudi, Metin Turkay, Habib Chabchoub

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

This paper presents a binarization pre-processing strategy for mixed datasets. We propose that representing nominal and integer data with binary attributes improves classification accuracy, and we describe a procedure for converting integer and nominal data into binary attributes. An Expectation-Maximization (EM) clustering algorithm was applied to cluster the values of attributes with a wide range, so that only a small number of binary attributes is needed. Once the dataset is pre-processed, we use a Support Vector Machine (LibSVM) for classification. The proposed method was tested on datasets from the literature. We demonstrate the improved accuracy and efficiency of the presented binarization strategy for modelling mixed and complex data in comparison with classification of the original dataset, the nominal dataset and the binary dataset.
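As a rough illustration of the pre-processing idea described in the abstract, the following minimal Python sketch one-hot encodes a nominal attribute and uses a plain one-dimensional Gaussian-mixture EM to cluster a wide-range integer attribute before encoding the cluster index as binary attributes. It is not the authors' procedure: the function names (`one_hot`, `em_cluster_1d`), the quantile-based initialisation, the number of clusters and the sample values are all assumptions made for this example, and the SVM classification step is omitted.

```python
import math


def one_hot(values):
    """Encode each value as a 0/1 indicator vector, one bit per distinct category."""
    categories = sorted(set(values))
    bits = [[1 if v == c else 0 for c in categories] for v in values]
    return bits, categories


def em_cluster_1d(values, k, iters=50):
    """Cluster numbers with EM on a 1-D Gaussian mixture; return a cluster index per value.

    Assumed initialisation (not from the paper): means at evenly spaced quantiles,
    a shared starting variance, and equal mixing weights.
    """
    xs = [float(v) for v in values]
    n = len(xs)
    srt = sorted(xs)
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    variances = [max(var_all / k ** 2, 1e-6)] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)
                for w, m, s in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids division by zero

    # Hard assignment: the most responsible component for each value.
    return [max(range(k), key=lambda j: r[j]) for r in resp]


if __name__ == "__main__":
    # Hypothetical wide-range integer attribute (values invented for illustration).
    ages = [18, 19, 21, 22, 45, 47, 50, 52, 80, 82, 85]
    labels = em_cluster_1d(ages, k=3)
    age_bits, _ = one_hot(labels)
    for age, bits in zip(ages, age_bits):
        print(age, bits)
```

In this sketch the eleven distinct integer values are represented by three binary attributes rather than one attribute per value. The resulting 0/1 rows could then be passed to an SVM implementation such as LibSVM.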

Original language: English
Title of host publication: 2013 International Conference on Advanced Logistics and Transport, ICALT 2013
Pages: 347-353
Number of pages: 7
State: Published - 2013
Externally published: Yes
Event: 2013 International Conference on Advanced Logistics and Transport, ICALT 2013 - Sousse, Tunisia
Duration: 29 May 2013 to 31 May 2013

Publication series

Name: 2013 International Conference on Advanced Logistics and Transport, ICALT 2013

Conference

Conference: 2013 International Conference on Advanced Logistics and Transport, ICALT 2013
Country/Territory: Tunisia
City: Sousse
Period: 29/05/13 to 31/05/13

Keywords

  • Classification
  • Clustering of Attribute Values
  • Expectation-Maximization Algorithm (EM)
  • Feature Binarization
  • Pre-processing Data
