Research Article

15 Aug 2025 | 10.62346/ijcn_q1_v11_no1_23_01
Year : 2023 | Volume: 11 | Issue: 1 | Pages : 1-13

Leveraging Machine Learning for Tourism Analytics: A Comparative Study of Text Mining Approaches in Customer Segmentation

Biron Gifty 1 *, Shailendra Kumar
  • 1Anna University Chennai, Department of Computer Science and Engineering, Bethlahem Institute of Engineering, Karungal, Tamilnadu, IN
The tourism industry generates vast amounts of unstructured textual data through customer reviews, social media posts, and feedback platforms. This study presents a comparative analysis of machine learning-based text mining approaches for customer segmentation in tourism analytics. We evaluate five families of methods: traditional clustering algorithms (K-means, hierarchical clustering), topic modeling techniques (Latent Dirichlet Allocation, Non-negative Matrix Factorization), sentiment-based segmentation using BERT transformers, hybrid approaches combining multiple features, and deep learning models including autoencoders and neural networks. The experimental framework uses a dataset of 50,000 customer reviews from major tourism platforms, covering hotels, restaurants, and attractions across multiple geographic regions. The pre-processing pipeline comprises text normalization, feature extraction using TF-IDF and word embeddings, and dimensionality reduction. Results show that hybrid approaches combining sentiment analysis with topic modeling achieve higher segmentation accuracy (87.3%) than traditional methods (72.1%). The BERT-based transformer model captures semantic nuances particularly well, achieving 89.6% accuracy in customer categorization. These findings indicate that machine learning-enhanced text mining improves the precision of customer segmentation, enabling tourism businesses to develop targeted marketing strategies, personalize customer experiences, and optimize service delivery. The research contributes to tourism informatics by providing empirical evidence for ML-driven customer analytics and by establishing a framework for scalable implementation in diverse tourism contexts.
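
As a rough illustration of the simplest approach named above (TF-IDF features followed by K-means clustering), the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the four toy reviews, the stopword list, the smoothed IDF variant, and the farthest-point initialization are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "and", "was", "is", "of", "to", "in"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords (text normalization)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for a list of documents."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain K-means with deterministic farthest-point initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy reviews invented for illustration; not taken from the study's dataset.
reviews = [
    "The hotel room was clean and the bed was comfortable",
    "Comfortable bed and a clean quiet hotel room",
    "The restaurant food was delicious and the pasta was fresh",
    "Delicious fresh pasta and great restaurant food",
]
vocab, vectors = tfidf_vectors(reviews)
labels = kmeans(vectors, k=2)
print(labels)
```

On these toy reviews the two hotel reviews and the two restaurant reviews should fall into separate clusters. Without stopword removal, shared function words such as "the" and "was" can dominate the similarity between short reviews, which is one reason the normalization step matters.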

Keywords: Text Mining, Customer Segmentation, Tourism Analytics, Machine Learning, Sentiment Analysis, BERT Transformers

Citation: Biron Gifty*, Shailendra Kumar (2023). Leveraging Machine Learning for Tourism Analytics: A Comparative Study of Text Mining Approaches in Customer Segmentation. 11(1): 1-13

Received: 03/01/2023; Accepted: 27/01/2023;
Published: 15/08/2025

Edited by:

Mr.ERES JOURNALS

Copyright: © ERES Journals.

*Correspondence: Biron Gifty , giftyshideout@gmail.com


Copyright © 2013-2026 ERES Publications