Audio Classification - Feature Dimensional Analysis

Onasoga O.A.; Yusoff, N; Harun N.H.

Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/1996

DC Field	Value	Language
dc.contributor.author	Onasoga O.A.	en_US
dc.contributor.author	Yusoff, N	en_US
dc.contributor.author	Harun N.H.	en_US
dc.date.accessioned	2021-12-15T01:47:25Z	-
dc.date.available	2021-12-15T01:47:25Z	-
dc.date.issued	2021	-
dc.identifier.isbn	978-303069220-9	-
dc.identifier.issn	23673370	-
dc.identifier.uri	http://hdl.handle.net/123456789/1996	-
dc.description	Scopus	en_US
dc.description.abstract	An audio signal is an analogue signal represented as a one-dimensional function x(t), where t is a continuous variable denoting time. Such signals, generated from diverse sources, can be perceived as music, speech, noise or any combination of these. For machines to interpret them, audio signals must be represented through extracted features, which describe the composition of the signal and its behavior over time. Audio feature extraction can enhance the efficacy of audio processing and hence benefits numerous applications. We present an emotion classification analysis with reference to audio representation (1-dimensional and 2-dimensional), focusing on the audio recordings available in the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset; classification is based on eight (8) different emotions. We examine the accuracy evaluation metric averaged over five (5) iterations for each audio signal representation (raw audio, normalized raw audio and spectrogram). The features are extracted in 1D and 2D as input to a Convolutional Neural Network (CNN). A single-factor analysis of variance (ANOVA) was performed on the obtained accuracy values to test whether the different audio signal representations of the dataset differ significantly. The obtained F-ratio is greater than the critical F-ratio and hence lies in the critical region. Thus, there is evidence at the 0.05 significance level that the true means for the different representations of the dataset differ.	en_US
dc.language.iso	en	en_US
dc.publisher	Springer Science and Business Media Deutschland GmbH	en_US
dc.subject	ANOVA	en_US
dc.subject	Deep learning	en_US
dc.subject	Emotion detection	en_US
dc.subject	Feature extraction	en_US
dc.subject	RAVDESS	en_US
dc.title	Audio Classification - Feature Dimensional Analysis	en_US
dc.type	National	en_US
dc.relation.conference	Lecture Notes in Networks and Systems	en_US
dc.identifier.doi	10.1007/978-3-030-69221-6_59	-
dc.description.page	775 - 788	en_US
dc.volume	194	en_US
dc.relation.seminar	International Conference on Business and Technology, ICBT 2020	en_US
dc.date.seminarstartdate	2020-11-14	-
dc.date.seminarenddate	2020-11-15	-
dc.description.placeofseminar	Istanbul	en_US
dc.description.type	Indexed Proceedings	en_US
item.languageiso639-1	en	-
item.grantfulltext	none	-
item.openairetype	National	-
item.fulltext	No Fulltext	-
crisitem.author.dept	Universiti Malaysia Kelantan	-
crisitem.author.orcid	0000-0003-2703-2531	-
Appears in Collections:	Faculty of Data Science and Computing - Proceedings
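
The single-factor ANOVA decision rule described in the abstract (reject the null hypothesis when the computed F-ratio exceeds the critical F-ratio at the 0.05 level) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `one_way_anova_f` and the accuracy values are hypothetical placeholders and are not results from the paper. The group layout (three representations, five iterations each) follows the abstract, which gives degrees of freedom (2, 12); the critical value 3.885 is the standard tabulated F value for those degrees of freedom at alpha = 0.05.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F-ratio, df_between, df_within) for a single-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual runs around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# HYPOTHETICAL accuracies for five iterations per representation.
# These are placeholders for illustration only, not the paper's results.
accuracies = {
    "raw_audio": [0.50, 0.52, 0.49, 0.51, 0.53],
    "normalized_raw_audio": [0.55, 0.54, 0.56, 0.57, 0.53],
    "spectrogram": [0.62, 0.60, 0.63, 0.61, 0.64],
}

# Tabulated critical F at alpha = 0.05 for df = (2, 12).
F_CRIT_2_12 = 3.885

f_ratio, df_b, df_w = one_way_anova_f(list(accuracies.values()))
reject_null = f_ratio > F_CRIT_2_12

print(f"F({df_b}, {df_w}) = {f_ratio:.2f}, critical F = {F_CRIT_2_12}")
print("Reject H0 (means differ)" if reject_null else "Fail to reject H0")
```

If SciPy is available, `scipy.stats.f_oneway` computes the same F statistic together with a p-value; the hand-written version above is used only to keep the sketch free of third-party dependencies.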
