Volume (11) - Issue (1) 2023 pp 1-13 DOI: /ijcn_q1_v11_no1_23_01

Leveraging Machine Learning for Tourism Analytics: A Comparative Study of Text Mining Approaches in Customer Segmentation

Abstract

The tourism industry generates vast amounts of unstructured textual data through customer reviews, social media posts, and feedback platforms. This study presents a comparative analysis of machine learning-based text mining approaches for customer segmentation in tourism analytics. We evaluate five methodologies: traditional clustering algorithms (K-means, hierarchical clustering), topic modeling techniques (Latent Dirichlet Allocation, Non-negative Matrix Factorization), sentiment-based segmentation using BERT transformers, hybrid approaches combining multiple features, and deep learning models including autoencoders and neural networks. Our experimental framework uses a dataset of 50,000 customer reviews from major tourism platforms, spanning hotels, restaurants, and attractions across multiple geographic regions. The pre-processing pipeline includes text normalization, feature extraction using TF-IDF and word embeddings, and dimensionality reduction. Results show that hybrid approaches combining sentiment analysis with topic modeling achieve higher segmentation accuracy (87.3%) than traditional methods (72.1%). The BERT-based transformer model captures semantic nuances particularly well, achieving 89.6% accuracy in customer categorization. Our findings indicate that machine learning-enhanced text mining significantly improves customer segmentation precision, enabling tourism businesses to develop targeted marketing strategies, personalize customer experiences, and optimize service delivery. The research contributes to tourism informatics by providing empirical evidence for ML-driven customer analytics and by establishing a framework for scalable implementation in diverse tourism contexts.
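
To make the kind of pipeline summarized above more concrete (text normalization, TF-IDF feature extraction, then K-means clustering), the following is a minimal sketch in plain Python. It is not the study's implementation, which is not described here: the four toy reviews, the function names, the smoothed IDF formula, and the fixed centroid seeding are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and keep alphabetic tokens only (simplistic normalization)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for a list of raw strings."""
    tokenized = [normalize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumption; other variants are common).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, iterations=20):
    """Lloyd's k-means, seeded deterministically from the given row indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy reviews: two about hotels, two about restaurants.
reviews = [
    "Clean hotel room and friendly hotel staff",
    "Friendly staff, clean room, great hotel breakfast",
    "Delicious restaurant food and a varied menu",
    "Varied menu, delicious food, cozy restaurant",
]
vocab, vectors = tfidf_vectors(reviews)
labels = kmeans(vectors, init_indices=[0, 2])
print(labels)
```

On this toy input the two hotel reviews fall into one cluster and the two restaurant reviews into the other. A real pipeline at the scale reported in the abstract would more plausibly rely on an established library and on random or k-means++ initialization rather than hand-picked seeds.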

Keywords

Text Mining, Customer Segmentation, Tourism Analytics, Machine Learning, Sentiment Analysis, BERT Transformers

Copyright © 2013-2026 ERES Publications