Brown University

Two Novel Algorithms for Microphone Arrays: Detecting Facial Orientation Using a Surrounding Microphone Array and Phase-Based Binary Time-Frequency Masking using a Two-Element Microphone Array

Description

Abstract:
Talker-orientation information in a large-aperture microphone array improves beamforming and source-location/tracking algorithms and allows better camera selection in a video-conference situation. Chu and Warnock (2002) quantified the average frequency-dependent magnitude of the human speech source (the source-radiation pattern), showing a front-to-back difference in magnitude that increases with frequency by about 8 dB/decade, reaching about 18 dB at 8000 Hz. These amplitude differences, while severely masked by both coherent and non-coherent noise in a real environment, are the most extractable indicators of a talker's orientation when compared to phase differences due to the source or effects due to diffraction at the mouth. In this work, we propose a robust, source-radiation-pattern-based method for extracting the azimuth angle of a single talker for whom an accurate point-source location estimate is known. The method requires no a priori training and has been tested in more than 100 situations with real human talkers. We compare these results against earlier published algorithms and find that the method proposed herein is significantly more robust.
Isolating the speech from a single talker in a multi-talker setting using remote microphones has been a widely researched problem in the audio-signal processing community. While methods that work well in controlled environments have been published in the last thirty years, they show considerable degradation in a real, reverberant room environment. In this work, after a brief review of applicable published material, we describe a novel algorithm that relies on the binary time-frequency framework to isolate a wide-band speech signal arriving from a designated and known source position. Like a few others, our algorithm uses two microphones and relies on the phase of their cross-power spectrum to generate a binary time-frequency mask. However, to make a decision for a particular time-frequency point, we use a principle of locality in frequency to help create a better mask. Masks generated by our algorithm are demonstrated as effective through the use of objective measures and data from a real room environment. Our performance is compared to ground truth as well as to a basic method that does not use locality.
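As an illustration of the general idea of a phase-based binary time-frequency mask, the following is a minimal sketch for a single STFT frame. It is not the thesis's algorithm: the function name `phase_mask`, the threshold `max_err`, the window parameter `half_width`, and the use of a moving average over neighbouring frequency bins as a stand-in for "locality in frequency" are all assumptions made for this example. The sketch also assumes the known source position has already been converted into an inter-microphone delay `tau`, with the second microphone receiving the signal `tau` seconds after the first.

```python
import numpy as np

def phase_mask(X1, X2, freqs, tau, max_err=0.5, half_width=0):
    """Hypothetical phase-based binary mask for one STFT frame.

    X1, X2     : complex spectra of the two microphones (same length)
    freqs      : bin centre frequencies in Hz
    tau        : assumed delay (s) of microphone 2 relative to microphone 1
                 for the target position
    max_err    : assumed phase-error threshold in radians
    half_width : assumed number of neighbouring bins on each side used
                 for smoothing (0 reproduces a per-bin decision)
    """
    # Cross-power spectrum; for the target its phase is +2*pi*f*tau.
    cross = X1 * np.conj(X2)
    # Remove the phase expected from the target, leaving the phase error.
    aligned = cross * np.exp(-2j * np.pi * freqs * tau)
    # Keep only the phase by normalising to unit magnitude.
    unit = aligned / np.maximum(np.abs(aligned), 1e-12)
    if half_width > 0:
        # Assumed form of frequency locality: average the unit phasors
        # of adjacent bins before making the decision for each bin.
        kernel = np.ones(2 * half_width + 1) / (2 * half_width + 1)
        unit = np.convolve(unit, kernel, mode="same")
    # A bin is kept when its (smoothed) phase error is small.
    return np.abs(np.angle(unit)) < max_err
```

Calling `phase_mask` with `half_width=0` corresponds to a basic per-bin decision, while a positive `half_width` lets each decision draw on adjacent bins. Applying the resulting boolean array to one microphone's spectrum before the inverse STFT would give the masked signal.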
Notes:
Thesis (Ph.D.) -- Brown University (2012)

Access Conditions

Rights
In Copyright
Restrictions on Use
Collection is open for research.

Citation

Levi, Avram, "Two Novel Algorithms for Microphone Arrays: Detecting Facial Orientation Using a Surrounding Microphone Array and Phase-Based Binary Time-Frequency Masking using a Two-Element Microphone Array" (2012). Electrical Sciences and Computer Engineering Theses and Dissertations. Brown Digital Repository. Brown University Library. https://doi.org/10.7301/Z0NV9GJZ

Relations

Collection: