Architectural Design and Performance Evaluation of Machine Learning-Based Speaker Recognition Systems
This paper describes an implemented speaker identification system built on a 1D Convolutional Neural Network (CNN). The classifier processes simulated Mel-Frequency Cepstral Coefficient (MFCC) features to distinguish between 4 speakers. Instead of acquiring real audio, the system generates 80 fixed-length feature vectors (length 100) and simulates each speaker's distinct acoustic signature by assigning a unique mean offset to that speaker's feature distribution. The features are reshaped to match the Conv1D input format and the data is split, after which the CNN, which includes two Conv1D layers and MaxPooling1D blocks, is trained. The model demonstrates the capacity of 1D CNNs for sequence classification in biometric tasks. It reaches near-perfect accuracy, a result that reflects the highly separable nature of the generated voice features.
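The pipeline summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract fixes only the number of speakers (4), the number of vectors (80), the vector length (100), and the presence of two Conv1D layers with MaxPooling1D. The equal 20-vectors-per-speaker split, the offset spacing `offset_step`, the unit-variance Gaussian noise, the 80/20 train/test ratio, the filter counts, kernel sizes, optimizer, and epoch count are all hypothetical choices made for this example, as are the function names.

```python
import numpy as np

N_SPEAKERS = 4    # from the abstract
N_SAMPLES = 80    # from the abstract
N_FEATURES = 100  # from the abstract


def simulate_features(offset_step=2.0, seed=0):
    """Return (X, y) with X of shape (80, 100, 1) and y holding speaker ids 0..3.

    Assumption: equal samples per speaker, unit-variance Gaussian noise, and
    speaker k shifted by k * offset_step. The abstract gives none of these.
    """
    rng = np.random.default_rng(seed)
    per_speaker = N_SAMPLES // N_SPEAKERS
    y = np.repeat(np.arange(N_SPEAKERS), per_speaker)
    # Each row is noise plus that row's speaker-specific mean offset.
    X = rng.normal(size=(N_SAMPLES, N_FEATURES)) + offset_step * y[:, None]
    # Conv1D expects (samples, steps, channels), so add a channel axis.
    return X[..., np.newaxis], y


def split_data(X, y, test_fraction=0.2, seed=0):
    """Shuffle and split into train/test sets (the ratio is an assumption)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def build_model():
    """Two Conv1D + MaxPooling1D blocks; layer sizes are assumed, not sourced."""
    from tensorflow import keras  # imported lazily so the data code runs without it

    model = keras.Sequential([
        keras.Input(shape=(N_FEATURES, 1)),
        keras.layers.Conv1D(16, 3, activation="relu"),
        keras.layers.MaxPooling1D(2),
        keras.layers.Conv1D(32, 3, activation="relu"),
        keras.layers.MaxPooling1D(2),
        keras.layers.Flatten(),
        keras.layers.Dense(N_SPEAKERS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def train_and_evaluate(epochs=10):
    """Train on the simulated data and return the held-out accuracy."""
    X, y = simulate_features()
    X_train, X_test, y_train, y_test = split_data(X, y)
    model = build_model()
    model.fit(X_train, y_train, epochs=epochs, batch_size=8, verbose=0)
    _, accuracy = model.evaluate(X_test, y_test, verbose=0)
    return accuracy
```

Calling `train_and_evaluate()` runs the whole sketch end to end and requires TensorFlow. The data functions need only NumPy. Because the class means are spaced well apart relative to the noise, the classes in this sketch are easy to separate, which is the same property the abstract credits for the reported accuracy.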
Speech Signals, Feature Extraction, Classification, Convolutional Neural Network (CNN), Emotion recognition system (ERS), Facial Emotion Recognition (FER).
Copyright © 2013-2026 ERES Publications