Correlating Speech Processing in Deep Learning and Computational Neuroscience

Shefali Garg (11678)
Smith Gupta (11720)

Motivation

Speech perception and recognition have long been topics of interest. In neuroscience, many theories have been proposed to explain how the brain processes speech. With the advent of deep neural networks, researchers have developed many new ideas for performing speech processing computationally.

However, little work has been done to relate how speech is learned in the human brain to the learning process in the layers of a DNN. A 2009 study [1] uses convolutional deep belief networks for audio classification and performs tasks such as speaker identification, gender classification, and phone classification. In this project, we build on that study, applying the approach to spoken-digit classification, and draw a bridge between what the intermediate layers learn and the computations the human brain performs during speech recognition, from a computational neuroscience perspective.

Related Work

Many methods and theories have been proposed to explain speech processing in the brain. Traditional approaches rely on MFCC features. Deep learning has also performed well on audio tasks such as digit classification by learning feature hierarchies from unlabeled speech data [1], and neural recordings have characterized how continuous speech is coded in auditory cortex [2]. Cognitive theories such as sensorimotor integration have also tried to explain how the brain creates and understands language [3]. Wernicke's area [4] and Broca's area [5] are considered two brain regions important for speech processing.
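
For reference, a minimal sketch of the MFCC extraction used in such traditional pipelines, assuming the librosa library; the file path and coefficient count are illustrative, not from the proposal:

    # Minimal MFCC sketch; "digit.wav" is a placeholder path.
    import librosa

    # Load one spoken-digit recording at a 16 kHz sampling rate.
    signal, sr = librosa.load("digit.wav", sr=16000)

    # Compute 13 Mel-frequency cepstral coefficients per analysis frame.
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
    print(mfcc.shape)  # (13, n_frames)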

Dataset

A version of the TIDIGITS dataset will be used for the digit-classification experiments.

Proposed Methodology

We will implement digit classification using deep learning, in the following stages.
Each audio digit will first be converted to a spectrogram, whose dimensionality will be reduced with PCA whitening, as sketched below.
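
A minimal sketch of this step, assuming librosa for the spectrogram and scikit-learn for PCA whitening; the file path, FFT size, hop length, and component count are assumptions:

    import librosa
    import numpy as np
    from sklearn.decomposition import PCA

    signal, sr = librosa.load("digit.wav", sr=16000)  # placeholder path

    # Short-time Fourier transform -> log-magnitude spectrogram
    # (257 frequency bins for n_fft=512, ~10 ms hop at 16 kHz).
    spec = np.abs(librosa.stft(signal, n_fft=512, hop_length=160))
    log_spec = np.log1p(spec)                         # (freq_bins, n_frames)

    # PCA whitening across frames reduces the spectral dimensionality;
    # assumes the utterance has at least 80 frames.
    pca = PCA(n_components=80, whiten=True)
    frames_whitened = pca.fit_transform(log_spec.T)   # (n_frames, 80)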
Key features will then be extracted with a sliding-window approach using restricted Boltzmann machines (RBMs); see the sketch below.
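
As a sketch, scikit-learn's BernoulliRBM can stand in for the RBM; the window width, hidden-layer size, and the random stand-in for the whitened frames are assumptions:

    import numpy as np
    from sklearn.neural_network import BernoulliRBM

    # Stand-in for the whitened spectrogram frames from the previous sketch.
    frames_whitened = np.random.rand(100, 80)

    def sliding_windows(frames, width=6):
        """Stack `width` consecutive frames into one visible vector."""
        return np.stack([frames[i:i + width].ravel()
                         for i in range(len(frames) - width + 1)])

    windows = sliding_windows(frames_whitened)      # (n_windows, width * 80)
    windows = (windows - windows.min()) / (np.ptp(windows) + 1e-8)  # -> [0, 1]

    # Hidden-unit activations serve as the extracted features.
    rbm = BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20)
    features = rbm.fit_transform(windows)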
Convolutional neural networks and deep belief networks will be used to train the layers and perform classification; what the network learns can be visualized in spectrogram space, as in the sketch below.
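
A minimal sketch of the classifier, using a small PyTorch CNN as a stand-in for the convolutional deep belief network; the layer sizes and the 11-way output (the digits plus "oh", as in TIDIGITS) are assumptions:

    import torch
    import torch.nn as nn

    class DigitCNN(nn.Module):
        def __init__(self, n_classes=11):   # 11 words: "oh", "zero", 1-9
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes),
            )

        def forward(self, x):   # x: (batch, 1, freq, time) spectrograms
            return self.classifier(self.features(x))

    model = DigitCNN()

    # First-layer kernels can be projected back into spectrogram space
    # to visualize what the network has learned.
    first_layer_filters = model.features[0].weight.detach()  # (16, 1, 5, 5)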
The representations learned by the intermediate layers will then be correlated with processing in the neural layers of the human brain.

References

[1] Audio Feature Extraction with Deep Belief Networks.
[2] Nai Ding, Jonathan Z. Simon. Neural coding of continuous speech in auditory cortex during monaural and dichotic listening.
[3] Gregory Hickok, John Houde, Feng Rong. Sensorimotor Integration in Speech Processing: Computational Basis and Neural Organization.
[4] Wernicke's area, Wikipedia.
[5] Broca's area, Wikipedia.