Decoding stress with computer vision-based approach using audio signals for psychological event identification during COVID-19

Kumar, Ankit, Godse, Snehal, Kolekar, Sagar, Saini, Dilip Kumar Jang Bahadur, Pandita, Deepak and Tiwari, Pulkit (2024) Decoding stress with computer vision-based approach using audio signals for psychological event identification during COVID-19. Journal of Electrical Systems, 20 (2). pp. 2716-2727. ISSN 1112-5209

PULKIT_JGBS_RESEARCH PAPER.pdf - Published Version
Available under License Creative Commons Attribution No Derivatives.


Interpreting psychological events can be costly and complex, yet such experiences are readily reflected in a person's spoken and nonverbal cues. The suggested model investigates a computer vision-based method for identifying stressful psychological events from an individual's audio signal. Speech signals from different people are recorded in response to a common questionnaire, which contains a series of questions about the second stage of COVID-19 events. These speech signals are then converted into frequency components using the fast Fourier transform (FFT). A long short-term memory (LSTM) module processes each frequency component, produces temporal information from each frequency band, and extracts the speech features into temporal frames. VGG 16, a feed-forward convolutional neural network with a 16-layer architecture, then classifies each temporal frame as stressed or unstressed. The questionnaire uses interrogation-style questions to elicit stress symptoms in the respondent, and the audio signals produced by each person's responses are recorded and analyzed for the stressed and unstressed classes. The proposed model identified stress in speech signals with 98% accuracy. Its time and cost implications are relevant because medical research is typically costly and time-consuming.
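
To make the FFT step in the abstract concrete, the following is a minimal sketch of how a speech recording might be split into short frames and converted into per-frame frequency components. It is not the authors' implementation: the frame length, hop size, Hann window, sample rate, and the synthetic tone standing in for a recorded answer are all assumptions chosen for illustration, and the LSTM and VGG 16 stages are not shown.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    if n % 2:
        raise ValueError("frame length must be a power of two")
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def frame_spectra(signal, frame_len=256, hop=128):
    """Split a signal into overlapping frames; return one magnitude spectrum per frame."""
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Hann window (an assumption) reduces spectral leakage at the frame edges.
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)))
            for i, x in enumerate(frame)
        ]
        bins = fft(windowed)
        # Keep only the non-negative frequencies of the real-valued input.
        spectra.append([abs(c) for c in bins[: frame_len // 2 + 1]])
    return spectra


# Hypothetical stand-in for a recorded answer: a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(1024)]
spectra = frame_spectra(tone)

# Locate the strongest frequency bin in the first frame and convert it to hertz.
peak_bin = max(range(len(spectra[0])), key=spectra[0].__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / 256
```

Each entry of `spectra` is the magnitude spectrum of one frame, so the list as a whole is a time-ordered sequence of frequency components, which is the kind of input a recurrent module such as an LSTM could consume frame by frame.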

Item Type: Article
Keywords: LSTM | VGG 16 | CNN model | Data preprocessing | Speech signal.
Subjects: Physical, Life and Health Sciences > Computer Science
Social Sciences and humanities > Psychology > General Psychology
JGU School/Centre: Jindal Global Business School
Depositing User: Subhajit Bhattacharjee
Date Deposited: 14 May 2024 15:23
Last Modified: 14 May 2024 15:23