Bio-Inspired Broadband Sonar: Methods for Acoustical Analysis of Bat Echolocation and Computational Modeling of Biosonar Signal Processing By Jason E. Gaudette M.S., University of Rhode Island, May 2005 B.S., Worcester Polytechnic Institute, May 2003 Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Center for Biomedical Engineering at Brown University Providence, Rhode Island May 2014 © Copyright 2014 by Jason E. Gaudette This dissertation by Jason E. Gaudette is accepted in its present form by the Center for Biomedical Engineering as satisfying the dissertation requirement for the degree of Doctor of Philosophy. Date James A. Simmons, Advisor Recommended to the Graduate Council Date Elie L. Bienenstock, Reader Date Rodney J. Clifton, Reader Date Diane Hoffman-Kim, Reader Date Sherief Reda, Reader Date John R. Buck, External Reader Approved by the Graduate Council Date Peter M. Weber, Dean of the Graduate School iii Curriculum Vitae Jason E. Gaudette was born on October 9th , 1980 and raised with his younger sister Renee in Raynham, Massachussets to Edward and Mary Gaudette. Graduating from Bridgewater-Raynham High School in 1999, he continued on to Worcester Polytech- nic Institute to pursue a degree in Electrical Engineering. While an undergraduate Jason studied abroad on three occasions in Madrid, Spain; San Juan, Puerto Rico; and Limerick, Ireland. He received his Bachelor of Science in 2003 with distinction, a concentration in Computer Engineering, and a minor in International Studies. Imme- diately following graduation, Jason began his career at the Naval Undersea Warfare Center in Newport, RI as an Electrical Engineer. He enrolled in the graduate program at the University of Rhode Island in the Fall of 2003 and graduated in 2005 to obtain the Master of Science in Electrical Engineering. Soon thereafter, Jason married his wife Elena and had two children, Lucas and Alexander, born in 2006 and 2008. In the Fall of 2008 Jason enrolled in the Biomedical Engineering program at Brown Uni- versity. Working with his advisor, Prof. James A. Simmons, Jason has been part of a highly interdisciplinary team of researchers studying bat echolocation. As an active member of this laboratory, Jason has co-authored several peer-reviewed journal arti- cles, conference proceedings and abstracts, invited presentations, numerous research proposals, and a technical patent. iv Jason E. 
Gaudette jason.e.gaudette@navy.mil Naval Undersea Warfare Center 1176 Howell Street Newport, RI 02841 Professional Experience Naval Undersea Warfare Center, Newport, RI 2003 – present Electrical Engineer and Research Scientist • Lead engineer for electronics design and acoustic signal processing on various sonar programs, including acoustic countermeasure devices and forward-looking active sonar systems • Principal investigator for bio-inspired broadband sonar research • Experienced with design of low-noise acoustic transducer interface electronics, acoustic signal processing and analysis, and embedded systems development Analog Devices, Inc., Limerick, Ireland Fall 2002 Precision Digital to Analog Converters • Developed electronics and software for two customer evaluation board designs • Completed WPI Senior design team project (MQP) in 10 weeks abroad Analog Devices, Inc., Wilmington, MA Summer 2002 High-Speed Networking (HSN) Engineering Intern • Developed and tested an integrated circuit communication interface using Agi- lent VEE and the I2 C protocol • Characterized high-speed transceiver electronics for laser diode driver IC Education Brown University, Providence, RI May 2014 (exp.) Ph.D. Biomedical Engineering Advised by Dr. James A. Simmons University of Rhode Island, Kingston, RI May 2005 M.S. Electrical Engineering Worcester Polytechnic Institute, Worcester, MA May 2003 B.S. Electrical Engineering with Distinction Concentration in Computer Engineering Minor in International Studies v Awards and Honors 1. Full Member, Sigma Xi, Scientific Research Society, Brown University Chapter, (2014). 2. J. E. Gaudette, L. N. Kloepper, M. Warnecke and J. A. Simmons, “Arrayzilla Lives! Visualizing the dynamic beam pattern of an echolocating bat,” 1st place video entry in the Gallery of Acoustics displayed at the 164th Meeting of the Acoustical Society of America, Kansas City, MO, (October 2012). 3. “Special Achievement Award for Excellence in the Area of Basic and Applied Research,” Swampworks Lightweight Torpedo Project Team, Naval Undersea Warfare Center, Newport, RI, (2007). 4. “Special Achievement Award for Excellence in the Area of Basic and Applied Research,” Biorobotic Research Team, Naval Undersea Warfare Center, New- port, RI, (2006). 5. Member, Eta Kappa Nu, Electrical Engineering Honor Society, Gamma Delta Chapter at Worcester Polytechnic Institute, Worcester, MA, (2003). Grants and Fellowships 1. 2014–2016, ONR Research Grant, Code 341 Bio-Inspired Autonomous Systems Program, (J. E. Gaudette, Principle Investigator), $275K, “Computational modeling and experimental evaluation of a bio-inspired broadband sonar sys- tem.” 2. 2014–2016, NUWC Division Newport FY14 Independent Applied Research (IAR) Award, (J. E. Gaudette, Principal Investigator), $300K, “Bio-inspired broadband sonar system for high-resolution acoustic imaging applications.” 3. 2014–2016, NUWC Division Newport FY14 In-House Laboratory Independent Research (ILIR) Award, (J. DiCecco, P. I.; J. E. Gaudette, Associate Investi- gator), $300K, “Novel reconfigurable neuromorphic computing architectures for neural information processing.” 4. 2011–2013, NUWC Division Newport FY11-FY13 In-House Laboratory Inde- pendent Research (ILIR) Award, (J. E. Gaudette, Principal Investigator), $300K, “Bio-inspired broadband sonar receiver for clutter reduction: Computa- tional modeling and system evaluation.” 5. 2010, NUWC Division Newport Academic Fellowship Award, (J. E. 
Gaudette, Principal Investigator), one-year sabbatical leave to Simmons’ Laboratory, Brown University, Providence, RI. vi 6. 2009, NUWC Division Newport FY09 Virtual In-House Laboratory Independent Research (V-ILIR) Award, (J. E. Gaudette, Principal Investigator), $85K. “Bio-inspired broadband sonar receiver for clutter reduction.” Peer-Reviewed Journal Articles 1. J. E. Gaudette, L. N. Kloepper and J. A. Simmons, “Modeling of bio-inspired broadband sonar for high-resolution angular imaging,” J. Acoust. Soc. Am., (in prep.). 2. L. N. Kloepper, J. E. Gaudette, J. R. Buck, and J. A. Simmons, “Influence of mouth opening and gape angle on the transmitted signals of big brown bats (Eptesicus fuscus),” J. Acoust. Soc. Am., (in prep.). 3. L. N. Kloepper and J. E. Gaudette, “Exploring the dynamics of mammalian vocal-motor processes with emerging advanced technologies,” J. PostDoc. Res., (in review.). 4. J. E. Gaudette, L. N. Kloepper, M. Warnecke and J. A. Simmons, “High res- olution acoustic measurement system and beam pattern reconstruction method for bat echolocation emissions,” J. Acoust. Soc. Am., 135 (1), 513–520 (2014). doi: [10.121/1.4829661] 5. J. DiCecco, J. E. Gaudette and J. A. Simmons, “Multi-component separation and analysis of bat echolocation calls,” J. Acoust. Soc. Am., 133 (1), 538–546 (2013). doi: [10.121/1.4768877] 6. J. A. Simmons and J. E. Gaudette, “Biosonar echo processing by frequency- modulated bats,” Radar Sonar Navig. IET, 6 (6), 556–565 (2012). doi: [10.1049/iet-rsn.2012.0009] Conference Papers and Abstracts Presented 1. J. E. Gaudette† and J. A. Simmons, “Encoding phase information is critical for high resolution spatial imaging in biosonar,” in J. Acoust. Soc. Am., Providence, RI, May 2014 2. J. E. Gaudette† and J. A. Simmons, “Modeling of bio-inspired broadband sonar for high-resolution angular imaging,” in J. Acoust. Soc. Am., San Francisco, CA, December 2013, p. 4052. doi: [10.1121/1.4830787] 3. L. N. Kloepper† , J. A. Simmons, J. E. Gaudette, R. Himmelwright and D. Robitzski, “Timing patterns of strobe groups for echolocating big brown bats † presented vii performing a target detection task,” in J. Acoust. Soc. Am., San Francisco, CA, December 2013, p. 4119. doi: [10.1121/1.4831129] 4. J. E. Gaudette, L. N. Kloepper† and J. A. Simmons, “Object selection by head aim and acoustic gaze in the big brown bat,” in J. Acoust. Soc. Am., 133 (5), Montreal, Quebec, June 2013, p. 3406. doi: [10.1121/1.4805938] 5. J. A. Simmons, J. E. Gaudette and L. N. Kloepper† , “Object selection by head aim and acoustic gaze in the big brown bat,” in Proc. Meetings on Acoustics, Vol. 19, (010036), June 2013. doi: [10.1121/1.4800651] 6. J. E. Gaudette, L. N. Kloepper† and J. A. Simmons, “Large reconfigurable microphone array for transmit beam measurements of echolocating bats,” in J. Acoust. Soc. Am., 131 (4), Hong Kong, China, May 2012, p. 3361. doi: [10.1121/1.4708666] 7. J. E. Gaudette† and J. DiCecco, “Bio-inspired broadband sonar and multi- component time-frequency analysis,” presented at the Maritime Systems and Technology (MAST) Americas Conference, Washington, DC, 14 November 2011. 8. J. E. Gaudette† J. M. Knowles, J. R. Barchi, and J. A. Simmons, “Computa- tional model of a bio-inspired broadband receiver for sonar clutter reduction,” in J. Acoust. Soc. Am., 129 (4), Seattle, WA, 25 May 2011, p. 2507. doi: [10.1121/1.3588282] 9. J. M. Knowles† , J. E. Gaudette, J. R. Barchi and J. A. 
Simmons, “Recon- structing echolocation behavior using time difference of arrival localization and a distributed microphone array as a virtual Telemike,” in J. Acoust. Soc. Am., 129 (4), Seattle, WA, 23-27 May 2011, p. 2574. doi: [10.1121/1.3588496] 10. J. DiCecco† and J. E. Gaudette† , “Analysis of Active Sonar Waveform Design by Echolocating Mammals,” presented at the Nato Undersea Research Center (NURC) Maritime Rapid Environmental Assessment (MREA10) Conference, Lerichi, Italy, 13 October 2010. 11. J. E. Gaudette† and J. A. Simmons, “Modeling of precise onset spike timing for echolocation in the big brown bat, Eptesicus fuscus,” in J. Acoust. Soc. Am., 127 (3), Baltimore, MD, April 2010, p. 1861. doi: [10.1121/1.3384433]. 12. J. R. Barchi† , J. E. Gaudette, J. M. Knowles and J. A. Simmons, “Bioa- coustic and behavioral correlates of spatial memory in echolocating bats,” in J. Acoust. Soc. Am., 127 (3), Baltimore, MD, April 2010, p. 2030. doi: [10.1121/1.3385329]. Invited Lectures 1. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Exploiting biological solutions to simplify acoustic imaging,” Keynote Speaker for Winter viii Meeting of the Acoustical Society of America, Narragansett Chapter, 24 Febru- ary 2014. Middletown, RI. 2. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Compu- tational modeling and system evaluation,” NUWC Newport – Naval Research Laboratory (NRL) Joint Lecture Series, 18 June 2013. Washington, DC. 3. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Compu- tational modeling and system evaluation,” NUWC ILIR Science and Technology ILIR Seminar Series, 2013. Newport, RI. 4. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Exploiting biological solutions to simplify acoustic imaging.” Virtual teleconference presen- tation - ONR N-STAR lecture series, 3 April 2013. NUWC Division Newport, RI; Office of Naval Research, Arlington, VA; NSWC Panama City, FL. 5. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar for clutter reduction.” Presentation at the UMASS Dartmouth – NUWC Newport Joint Technical Seminar Series, Dartmouth, MA, 2 November 2012 6. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Compu- tational modeling and system evaluation,” NUWC ILIR Science and Technology ILIR Seminar Series, 10 February 2012. Newport, RI. 7. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Compu- tational modeling and system evaluation,” Brown University Biomedical Engi- neering Graduate Seminar Lecture, 7 February 2012. Providence, RI. 8. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar receiver for clutter reduction: Computational modeling and system evaluation,” Brown University Biomedical Engineering Graduate Seminar Lecture, 18 April 2011. Providence, RI. 9. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar receiver for clutter reduction: Computational modeling and system evaluation,” NUWC ILIR Science and Technology ILIR Seminar Series, 30 March 2011. Newport, RI. Poster Sessions 1. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar for micro- aperture imaging,” poster presented at the FY2013 In-House Laboratory Inde- pendent Research (ILIR) Annual Program Review, 29 October 2013, Newport, RI. 2. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar,” poster presented at the N-STAR symposium, June 2012, Arlington, VA. ix 3. J. E. Gaudette† and J. A. 
Simmons, “Bio-inspired broadband sonar: Com- putational modeling and system evaluation,” poster presented at the FY2012 In-House Laboratory Independent Research (ILIR) Annual Program Review, October 2012, Newport, RI. 4. J. M. Knowles† , J. A. Simmons, J. M. Barchi, J. E. Gaudette, S. S. Horowitz and A. M. Simmons, “Cochlear processing in biosonar: Modeling sound trans- duction and the cochlear microphonic in echolocating bats,” poster presented at the Society for Neuroscience, 477.D.02, November 2011, Washington, DC. 5. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Com- putational modeling and system evaluation,” poster presented at the FY2011 In-House Laboratory Independent Research (ILIR) Annual Program Review, October 2011, Newport, RI. 6. J. E. Gaudette† and J. A. Simmons, “Sonar clutter reduction using bio-inspired broadband template matching,” poster presented at the FY2009 In-House Lab- oratory Independent Research (ILIR) Annual Program Review, October 2009, Newport, RI. Teaching Experience 1. BN065, Biology of Hearing, guest lecturer, designed and delivered lecture notes with computer examples to approx. 80-100 students, “Fourier transform and spectral analysis related to acoustics and the auditory system,” 1 February 2012. Brown University, Providence, RI. 2. Sheridan Teaching Certificate: Level I Seminar Program, May 2010, Sheridan Center for Teaching and Learning, Brown University, Providence, RI. 3. BN065, Biology of Hearing, guest lecturer, presented two consecutive seminars of approx. 100-120 students, “Computational modeling of the auditory system,” 10 and 12 March 2010. Brown University, Providence, RI. x Preface and Acknowledgments From the commencement of my graduate studies, my intention was to focus on some- thing unique and interesting. I think most people would agree that researching bat sonar is exactly that. So much has been learned through this experience, both pro- fessionally and personally. Ultimately, the most important lesson is that time is truly our most valuable and limited resource and it must be spent wisely. I would first like to thank my wonderful wife, Elena. You have kept me going through the many times of uncertainty and frustration, reviewed my endless supply of presentations and manuscript revisions, and supported me in all of my endeavors. This was certainly a long journey and I could not have done it without your devotion. To my parents, I would like to say that this is all your fault. You encouraged me to learn, and taught me the value of education, but forgot to tell me when to stop. Nevertheless, I will always appreciate everything you have done for me. I can only hope that I am able to instill the same set of values into my children. Among the many other people who deserve acknowledgment for this dissertation are my family, close friends, and many of my teachers and colleagues at Brown, URI, and NUWC. My personal drive stems from all of these relationships and I would be remiss to overlook this fact. There are far too many people to thank individually, but I would regret not mentioning at least a few. My earliest interests in bio-inspired engineering stemmed from working closely with Alberico Menozzi, Henry Leinhos, David Beal, and Pro- mode Bandyopadhyay at NUWC, and it was this initial exposure to biorobotics that has had a lasting impact. It was also by great fortune that I met John DiCecco, as his ideas on non-linear time-frequency analysis are what shaped the early parts of this dissertation. 
From the bat lab, I feel honored to have worked closely with xi Jeff Knowles, an outstanding academic with whom I’ve shared many a philosophi- cal felafel and who also launched my sailing career; Michaela Warnecke, who quickly transformed into the German I ask for answers to everything; Alyssa Wheeler, who made me appreciate the sheer difficulty of lab work; and Laura Kloepper, who taught me how to write good. Among the many people to review various drafts of my dis- sertation, I would also like to thank Andrea Simmons, David Segala, Robin Murray, and Jennifer Wardell for their many helpful comments and suggestions. I would like to thank all of the members of my thesis committee for their com- ments, suggestions, criticisms, and overall guidance of my research. Shaping such broad objectives into substantial research requires the highly interdisciplinary ex- pertise afforded by this group. I sincerely appreciate the considerable commitment toward this effort. I am particularly indebted to Prof. John Buck who, as an exter- nal advisor from UMass Dartmouth, asked the difficult questions that helped me to improve the overall quality of this research. I owe a great deal of thanks to my advisor, Prof. James Simmons, who deserves most of the credit for inspiring the research in this dissertation. Jim’s devotion, creative ideas, cheerfulness, and infinite patience are just a few of the reasons that keep me passionate about this work. Finally, all of my graduate courses and research to date has been funded through internal investments by the Naval Undersea Warfare Center in Newport, Rhode Is- land. I am pleased and extremely grateful to the Chief Technology Office as well as my management and colleagues for committing to employees’ professional and educa- tional goals. Without this continued support, none of this could have been achieved. xii Dedication To my inquisitive children, Lucas and Alexander. xiii Table of Contents Table of Contents xiv List of Figures xvii List of Symbols xx 1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2.1 Time-Frequency Analysis and the Auditory System . . . . . . 5 1.2.2 Dynamic Behavior and Adaptation in Echolocation . . . . . . 6 1.2.3 Toward the Design of a Bio-Inspired Broadband Sonar System 7 1.3 Dissertation Objectives and Overview . . . . . . . . . . . . . . . . . . 9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2 Background 13 2.1 Acoustic Information Sensing and Processing by Mammals . . . . . . 13 2.1.1 The Mammalian Auditory System . . . . . . . . . . . . . . . . 14 2.1.2 Neural Information Processing by the Auditory System . . . . 16 2.1.3 Auditory Cues for Passive Localization in Biological Systems . 17 2.1.4 Specializations for High-Resolution Active Acoustic Imaging . 19 2.2 Acoustic Imaging in Technological Systems . . . . . . . . . . . . . . . 24 2.2.1 Conventional Array Signal Processing . . . . . . . . . . . . . . 24 2.2.2 Beam Patterns and Angular Resolution . . . . . . . . . . . . . 26 2.3 Model-Based Approach to Bio-Inspired Acoustic Imaging . . . . . . . 30 2.3.1 Auditory Modeling Insights and Oversights with Filter Banks 31 2.3.2 Signal Processing Models for High-Resolution Range Estimates 32 2.3.3 Models for Angular Target Localization and Acoustic Imaging 36 2.3.4 Mathematical Models of Echolocation Performance . . . . . . 
37 2.3.5 Hardware Prototypes as Exploratory Models . . . . . . . . . . 38 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3 Multi-Component Separation and Analysis of Bat Echolocation Calls 53 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.3.1 Separation of Harmonic Components . . . . . . . . . . . . . . 58 3.3.1.1 Fractional Fourier Transform . . . . . . . . . . . . . 59 xiv 3.3.1.2 Rough Approximation of Instantaneous Frequency . 60 3.3.1.3 Zero-Phase Component Filtering . . . . . . . . . . . 62 3.3.2 Monocomponent Decomposition . . . . . . . . . . . . . . . . . 63 3.3.2.1 Empirical Mode Decomposition . . . . . . . . . . . . 63 3.3.2.2 Hilbert Spectral Analysis . . . . . . . . . . . . . . . 65 3.3.3 Waveform Synthesis and Ground Truth . . . . . . . . . . . . . 66 3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 3.4.1 Telemike Data Series . . . . . . . . . . . . . . . . . . . . . . . 67 3.4.2 Synthesized Multi-Component FM Analysis . . . . . . . . . . 68 3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.6 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 A Multi-Component Frequency-Modulated Waveforms . . . . . . . . . . 72 B Hilbert Spectral Analysis of Modulated Waveforms . . . . . . . . . . 73 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4 High Resolution Acoustic Measurement System and Beam Pattern Reconstruction Method for Bat Echolocation Emissions 79 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 4.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 4.3.1 Beam Pattern Reconstruction . . . . . . . . . . . . . . . . . . 85 4.3.2 Microphone and System Calibration . . . . . . . . . . . . . . . 88 4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4.4.1 Example Beam Pattern of a Circular Electrostatic Projector . 90 4.4.2 Example Beam Pattern of the Big Brown Bat, Eptesicus fuscus 93 4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 4.6 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 5 Modeling Bio-Inspired Broadband Sonar for High-Resolution Angu- lar Imaging 101 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 5.2 Modeling Broadband Acoustic Information . . . . . . . . . . . . . . . 102 5.2.1 Environmental Acoustics . . . . . . . . . . . . . . . . . . . . . 103 5.2.1.1 The Transformation of Broadband Information in the Physical Environment . . . . . . . . . . . . . . . . . 103 5.2.1.2 Application of Broadband Transmission Loss to the Active Sonar Equation . . . . . . . . . . . . . . . . . 105 5.2.2 Transducer Directivity Patterns . . . . . . . . . . . . . . . . . 109 5.2.2.1 Broadband Spectral Information in Conventional Trans- ducers . . . . . . . . . . . . . . . . . . . . . . . . . . 109 5.2.2.2 Bio-Acoustic Baffle Structures and Implications for Modeling . . . . . . . . . . . . . . . . . . . . . . . . 
111 5.2.3 Reflective Scatterer Structure and Composition . . . . . . . . 114 5.2.4 The Broadband Echo Spectrum in the Range-Azimuth Plane . 116 5.3 Extraction of Broadband Spatial Information from Echoes . . . . . . 117 xv 5.3.1 Quantifying the Angular Resolution Limit . . . . . . . . . . . 117 5.3.2 Broadband Acoustic Focusing with a Single Piston Transducer 120 5.3.3 Broadband Acoustic Focusing with a Bio-Inspired Array . . . 121 5.3.4 Mutual Interference and the Diffraction Patterns of Scatterers 123 5.4 Performance Comparison with Conventional Acoustic Imaging . . . . 125 5.4.1 Processing Broadband Signals with Suboptimal Element Spacing126 5.4.2 Coherent Summation of Broadband Signals . . . . . . . . . . . 129 5.4.3 Limitations to Conventional Beamforming Comparisons . . . . 131 5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 5.6 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 A Applying Biosonar Modeling to Underwater Acoustic Imaging . . . . 135 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 6 Discussion, Applications, Future Directions, and Concluding Re- marks 143 6.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 6.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 6.2.1 Multi-Component Signals and Time-Frequency Analysis . . . 146 6.2.2 Beam Pattern Measurement Instrumentation and Techniques . 147 6.3 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 6.3.1 Time-Frequency Analysis of Bio-Acoustic Signals . . . . . . . 148 6.3.2 Acoustic Measurement and Visualization of the Multi-Dimensional Sound Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 6.3.3 Bio-Inspired Broadband Sonar for Micro-Aperture Imaging . . 150 6.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 A Modeling of Precise Onset Spike Timing for Echolocation 154 A.1 Motivation for a Biophysical Model . . . . . . . . . . . . . . . . . . . 154 A.1.1 Coincidence Detection and Population Coding in the Auditory System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 A.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 A.2.1 Peripheral System . . . . . . . . . . . . . . . . . . . . . . . . . 159 A.2.1.1 Outer and Middle Ear . . . . . . . . . . . . . . . . . 159 A.2.1.2 Cochlea and Basilar Membrane . . . . . . . . . . . . 159 A.2.1.3 Meddis Auditory Peripheral Model . . . . . . . . . . 160 A.2.1.4 Spike Refractory Equations . . . . . . . . . . . . . . 161 A.2.2 Cochlear Nucleus . . . . . . . . . . . . . . . . . . . . . . . . . 162 A.2.2.1 Leaky IaF Model . . . . . . . . . . . . . . . . . . . . 162 A.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 A.3.1 Auditory Stimuli . . . . . . . . . . . . . . . . . . . . . . . . . 164 A.3.1.1 Meddis Auditory Peripheral Model . . . . . . . . . . 165 A.3.1.2 IaF Neurons . . . . . . . . . . . . . . . . . . . . . . . 165 A.3.1.3 Integration with BiSCAT . . . . . . . . . . . . . . . 166 A.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
170 xvi List of Figures 1.1 Close-up photograph of the big brown bat, Eptesicus fuscus and time- frequency diagram (spectrogram) for an example E. fuscus echoloca- tion call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 The measured transmit and receive acoustic directivity, or beam pat- terns, of E fuscus are plotted across the azimuth plane . . . . . . . . 3 2.1 The mammalian auditory system mapped from the cochlea to the cortex 15 2.2 Beam patterns in air from a line array of N = 10 omni-directional elements that are spaced at d = 1.72 cm . . . . . . . . . . . . . . . . 27 2.3 Active underwater sonar data collected from the site of a shipwreck in Narragansett Bay, Rhode Island . . . . . . . . . . . . . . . . . . . . . 29 2.4 The magnitude, phase, and group delay response for a gammatone filter bank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.5 Block diagram of the Spectrogram Correlation and Transformation (SCAT) receiver model . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.1 Four different time-frequency distributions of an FM echolocation call from E. fuscus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.2 Rotation-fraction domain of the E. fuscus signal from the FrFT . . . 61 3.3 Overview of harmonic component separation using a least-squares cu- bic approximation of instantaneous frequency, fi (t) . . . . . . . . . . 63 3.4 Results of the empirical mode decomposition on the separated second harmonic, FM2, from E. fuscus . . . . . . . . . . . . . . . . . . . . . 64 3.5 Hilbert spectral analysis results showing instantaneous amplitude, ai (t), and frequency, fi (t), for each harmonic component of the E. fuscus call 66 3.6 Multi-component analysis performed on call sequences from radioteleme- try recordings of E. fuscus and three Asian bat species . . . . . . . . 67 3.7 Multi-component analysis results from the telemike data series plotted separately for FM1 and FM2 . . . . . . . . . . . . . . . . . . . . . . . 69 3.8 Standard time-frequency representations and multi-component analy- sis results for synthetic signals . . . . . . . . . . . . . . . . . . . . . . 70 4.1 Photograph of fully constructed microphone array and close-up view of a microphone preamplifier circuit board showing the integrated MEMS microphone unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 4.2 Flow chart describing the signal processing steps to reconstruct each beam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 xvii 4.3 Diagram showing microphone sensor positions mapped to spherical co- ordinates with the sound source positioned at the origin . . . . . . . . 87 4.4 Aspect view and contour plot of the reconstructed transmit beam pat- tern of a 2 cm diameter transducer at its resonant frequency of 60 kHz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 4.5 Theoretical beam pattern of a piston transducer with 2 cm diameter in air . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.6 Aspect view and 6 dB contour plot of the reconstructed beam patterns for a single E. fuscus transmit pulse . . . . . . . . . . . . . . . . . . . 94 5.1 The total absorption effect in air and the three individual components that dominate in different frequency regions . . . . . . . . . . . . . . 106 5.2 Absorption vs. frequency at 50% relative humidity plotted for temper- atures between 0◦ C and 40◦ C in steps of 5◦ . . . . . . . . . . . . . . . 
107 5.3 Combined transmission loss components due to both spherical spread- ing and absorption . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 5.4 Relative echo strength vs. distance at different frequencies for an ideal 0 dB point reflector . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 5.5 Theoretical directivity pattern for a piston transducer in air with a fixed circular aperture of 0.94 cm . . . . . . . . . . . . . . . . . . . . 111 5.6 Example beam pattern data measured from an obliquely truncated horn113 5.7 The target strength of individual fish at dorsal aspect versus length . 115 5.8 Relative echo intensity as a function of range, azimuth, and frequency 118 5.9 The region of focus after applying the L1 spectral distance around 4.5 m at 0◦ azimuth (a) and 25◦ off-axis . . . . . . . . . . . . . . . . 120 5.10 A bio-inspired broadband sonar array utilizing only three circular piston- like elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 5.11 The region of focus after applying the L1 spectral distance around 4.5 m at 0◦ azimuth for a single transmitter and a pair of identical receive transducers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 5.12 The time difference of arrival between two receiving transducers when separated by 1.4 cm in air . . . . . . . . . . . . . . . . . . . . . . . . 123 5.13 The region of focus after combining binaural spectrogram correlation and TDOA estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 5.14 The beam patterns of an array with N = 10 omni-directional elements spaced at d = 1.4 cm in air . . . . . . . . . . . . . . . . . . . . . . . 128 5.15 The beam patterns of an array with N = 2 omni-directional elements spaced apart by d = 1.4 cm in air . . . . . . . . . . . . . . . . . . . . 128 5.16 Summed beam patterns for a simple array of N = 2 elements spaced apart by d = 1.4 cm in air . . . . . . . . . . . . . . . . . . . . . . . . 130 5.17 Absorption coefficient in water vs. frequency at various temperatures between -5◦ C and 35◦ C, depth of 0 m, salinity of 35 ppt, and acidity of 8.0 pH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 A.1 Action potentials recorded from a rat when presented with a low fre- quency sinusoidal stimulus . . . . . . . . . . . . . . . . . . . . . . . . 157 A.2 Proposed neural network architecture of the auditory population coding158 xviii A.3 Block diagram of the Meddis IHC model . . . . . . . . . . . . . . . . 160 A.4 Time series and spectrogram of a synthetic linear FM and 2 pairs of echoes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 A.5 Magnitude and phase plot of 4 channels in a gammatone filterbank between 25kHz and 100kHz . . . . . . . . . . . . . . . . . . . . . . . 165 A.6 Example gammatone filterbank output using the signal as shown above and generated at 4 arbitrary frequencies . . . . . . . . . . . . . . . . 166 A.7 Internal states of the Meddis model (k, q, c, & w) in response to a synthesized acoustic stimulus . . . . . . . . . . . . . . . . . . . . . . 167 A.8 Pspike and resulting spike train for 40 LSR auditory nerve fibers . . . 167 A.9 Membrane potential and spikes with 4 integrate-and-fire neurons . . . 168 A.10 Integrate-and-fire neurons (M=4) with random, but overlapping synap- tic input (N=100) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 A.11 Layout of each of three tabbed panels in the BiSCAT GUI . . . . . . 
172 xix List of Symbols This dissertation spans many fields, including acoustics, biology, and engineering. Where noted in the descriptions below, the application of symbols is context spe- cific. Acoust: acoustics and acoustic modeling, Anat: anatomy, ASP: array signal processing, Model: Auditory modeling and linear filter theory, TFA: time-frequency analysis. Abbreviations AC Anat: auditory cortex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 AN Anat: auditory nerve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 ARMA Model: auto-regressive moving-average . . . . . . . . . . . . . . . . . . 86 ATR automatic target recognition . . . . . . . . . . . . . . . . . . . . . . . . . 116 AVCN Anat: anteroventral cochlear nucleus . . . . . . . . . . . . . . . . . . . . 15 BM Anat: basilar membrane . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 CN Anat: cochlear nucleus . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 CRLB Cramer-Rao lower bound . . . . . . . . . . . . . . . . . . . . . . . . . . 37 DCN Anat: dorsal cochlear nucleus . . . . . . . . . . . . . . . . . . . . . . . . 15 DRNL Model: dual-resonance non-linear . . . . . . . . . . . . . . . . . . . . . 31 EMD TFA: empirical mode decomposition . . . . . . . . . . . . . . . . . . . . 64 FFT TFA: fast Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . 88 FM frequency modulated . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 FPGA field programmable gate array . . . . . . . . . . . . . . . . . . . . . . . . 7 FrFT TFA: fractional Fourier transform . . . . . . . . . . . . . . . . . . . . . . . 5 FT TFA: Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 HPBW ASP: half-power beam width . . . . . . . . . . . . . . . . . . . . . . . . 28 xx HRTF Acoust: head-related transfer function . . . . . . . . . . . . . . . . . . . 18 IC Anat: inferior colliculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 IHC Anat: inner hair cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 IID Acoust: interaural intensity difference . . . . . . . . . . . . . . . . . . . 18 IIR Model: infinite impulse response . . . . . . . . . . . . . . . . . . . . . . . 31 IMF TFA: intrinsic mode function . . . . . . . . . . . . . . . . . . . . . . . . 64 ITD Acoust: interaural time difference . . . . . . . . . . . . . . . . . . . . . . 18 JAMF TFA: joint acoustic and modulation frequency . . . . . . . . . . . . . . . 6 LSO Anat: lateral superior olive . . . . . . . . . . . . . . . . . . . . . . . . . . 15 LTI Model: linear time-invariant . . . . . . . . . . . . . . . . . . . . . . . . . . 6 MA Model: moving average . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 MEMS micro electro-mechanical systems . . . . . . . . . . . . . . . . . . . . . 83 MRA Acoust: main response axis . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 MSO Anat: medial superior olive . . . . . . . . . . . . . . . . . . . . . . . . . 15 NLL Anat: nucleus of the lateral lemniscus . . . . . . . . . . . . . . . . . . . 15 NTB Anat: nucleus of the trapezoidal body . . . . . . . . . . . . . . . . . . . 15 OHC Anat: outer hair cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 PVCN Anat: posteroventral cochlear nucleus . . . . . . . . . . . . . . . . . . . 15 RCF Model: rectify, compress, and filter . . . . . . . . . . . . . . . . . . . . . 34 RWT TFA: Radon-Wigner transform . . . . . . . . . . . . . . . . . . . . . . . 
59 SCAT Model: spectrogram correlation and transformation . . . . . . . . . . . 34 SOC Anat: superior olivary complex . . . . . . . . . . . . . . . . . . . . . . . 14 SPL Acoust: sound pressure level . . . . . . . . . . . . . . . . . . . . . . . . . 90 STFT TFA: short-time Fourier transform . . . . . . . . . . . . . . . . . . . . . . 5 TDOA time difference of arrival . . . . . . . . . . . . . . . . . . . . . . . . . . 85 TFR TFA: time-frequency representation . . . . . . . . . . . . . . . . . . . . . 56 VLSI very-large scale integrated . . . . . . . . . . . . . . . . . . . . . . . . . . 38 VRDR Model: variable resolution and detection receiver . . . . . . . . . . . . 36 xxi WVD TFA: Wigner-Ville distribution . . . . . . . . . . . . . . . . . . . . . . . . 5 Variables α Acoust: frequency dependent acoustic absorption coefficient . . . . . . . 88 α TFA: normalized fractional angle of rotation . . . . . . . . . . . . . . . . 59 β angle of truncation for an acoustic horn . . . . . . . . . . . . . . . . . . 112 λ Acoust: wavelength in the medium . . . . . . . . . . . . . . . . . . . . . 18 φ TFA: angle of fractional rotation in radians . . . . . . . . . . . . . . . . 59 φ(f ) Model: phase response of a filter . . . . . . . . . . . . . . . . . . . . . . 33 φ0 TFA: initial phase of a modulated signal . . . . . . . . . . . . . . . . . . 66 φi (t) TFA: instantaneous phase law . . . . . . . . . . . . . . . . . . . . . . . . 62 ρ Acoust: atmospheric pressure . . . . . . . . . . . . . . . . . . . . . . . . 104 ψ ASP: steered angle of an array . . . . . . . . . . . . . . . . . . . . . . . . 25 xˇ(t) TFA: original analytic signal, demodulated . . . . . . . . . . . . . . . . 62 yˇ(t) TFA: isolated analytic component, demodulated . . . . . . . . . . . . . 62 df (θ) ASP: array steering vector, 1 × N . . . . . . . . . . . . . . . . . . . . . . 26 x˜(t) TFA: original analytic signal, unmodulated . . . . . . . . . . . . . . . . 60 y˜(t) TFA: isolated analytic component, unmodulated . . . . . . . . . . . . . 62 ai (t) TFA: instantaneous amplitude . . . . . . . . . . . . . . . . . . . . . . . . 65 D Acoust: depth in water, m . . . . . . . . . . . . . . . . . . . . . . . . . . 136 d ASP: distance between sensors . . . . . . . . . . . . . . . . . . . . . . . . 18 d Acoust: acoustic propagation distance . . . . . . . . . . . . . . . . . . . 88 d0 Acoust: reference distance of a sound source . . . . . . . . . . . . . . . . 88 f frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 fi (t) TFA: instantaneous frequency . . . . . . . . . . . . . . . . . . . . . . . . 60 fs sampling rate of a discrete-time signal . . . . . . . . . . . . . . . . . . . 65 hr Acoust: relative humidity . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 N ASP: number of elements in an array . . . . . . . . . . . . . . . . . . . . 25 pH Acoust: acidity, pH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 xxii S Acoust: salinity, ppt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 T Acoust: temperature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 TL Acoust: transmission loss . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 u TFA: fractional dimension between time and frequency . . . . . . . . . 59 W ASP: aperture shading matrix, diagonal N × N . . . . . . . . . . . . . . 26 x ASP: array data vector, 1 × N . . . . . . . . . . . . . . . . . . . . . . . . 26 Y (f, ψ) ASP: frequency domain array response . . . . . . . . . . . . . . . . . . 
26 xxiii Chapter 1 Introduction The biosonar system of echolocating bats, dolphins, and whales represents the most advanced acoustic imaging solution known to exist. The sophistication of biosonar lies not in its complexity, but in the real-time performance that is achievable by a minimalistic set of hardware; a few acoustic baffles1 and a compact network of neural circuitry. The primary focus of this dissertation is on improving our understanding of how animals perceive images of objects from the packets of acoustic echoes. The motivation behind this research is presented first, followed by the significance in the context of the current state-of-the-art. The last section states the research objectives and provides an overview of the remaining dissertation chapters. 1.1 Motivation Echolocation is a complex active sensory system in which animals forage and nav- igate in their environment primarily using emitted acoustic signals. By producing intense, ultrasonic signals and receiving their returning echoes, echolocating animals can identify, discriminate and track prey, often in highly cluttered environments. Bats and toothed whales (Microchiroptera and Odontoceti) are two distinctly dif- 1 Acoustic baffles refer to any physical boundary layers or structures in close proximity to the sound transmission source or receiving sensors. Acoustic baffles serve to block or guide sound waves propagating in a particular direction. In biosonar, acoustic baffles refer specifically to a bat’s mouth or nose for transmission and its ears for reception. The baffles of underwater marine mammals consist of the melon for sound emission and the mandibles for sound reception. In general, the head may be included when it has a significant impact on propagating sound waves. 1 ferent suborders of mammals that convergently evolved echolocation, and both have been intensely investigated to understand their mechanisms that may translate to man-made sonar and radar systems [1]. The big brown bat, Eptesicus fuscus, is an ideal model organism for investigating echolocation. These bats produce short broadband signals with ultrasonic frequencies between 20 and 100 kHz and with a bandwidth-to-center frequency ratio greater than unity (Fig. 1.1b). The signals are downward FM sweeps with three harmonically related components spanning several octaves. The duration and the repetition rate of the signals depend on the distance of nearby objects, with both decreasing as the bat approaches targets [2]. Based on the intensity of emitted sounds, transmission losses, and strength of acoustic reflections from insect prey, big brown bats can detect prey at distances up to 20 m [3]. A 120 35 FM3 30 100 FM2 Frequency (kHz) 25 80 20 60 FM1 15 40 10 20 5 B 0 0 35 15 0 0.5 1 1.5 2 2.5 3 3.5 dB Time (ms) Figure 1.1. (a) A close-up photograph of the big brown bat, Eptesicus fuscus, is shown to highlight the complex set of acoustic baffles – its ears and mouth. The spatial beam or directivity patterns are determined by the geometry of these baffles, which transform the magnitude and phase of sound waves propagating into the inner ears or out from the larynx. (b) The time-frequency diagram (spectrogram) is shown for an example E. fuscus echolocation call along with the corresponding time series (top) and spectral density (side) of the same call. This bat species emits broadband signals that consist of harmonically related components spanning several octaves. 
The ratio of the bandwidth to center frequency provides an indirect measure of how much a directivity pattern will change naturally over the entire operating frequency range. In the case of E. fuscus, this ratio is greater than unity, but quantities less than 0.2 are common for most man-made active sonar systems. The echolocation signals of big brown bats are produced in the larynx and transmitted through the mouth. The center of the directed energy, or main response axis (MRA), is straight forward at zero degrees across all frequencies. The angular 2 A Hartley & Suthers (1989) B C Aytekin et al. (2004) Aytekin et al. (2004) D Simmons et al. (1983) Figure 1.2. The measured transmit and receive acoustic directivity, or beam patterns, of E fuscus are plotted across the azimuth plane at the specific frequencies of 25 (red), 40 (green), 60 (blue), and 80 kHz (yellow). (a) The transmit beam is emitted through the bat’s mouth. The main response axis (MRA) is straight forward at 0◦ across all frequencies and can be reasonably approximated by a 4.7 mm radius piston transducer [4]. (b and c) The sound reception pattern as measured bilaterally through each ear [5]. Notice that the MRA shifts from off-axis at low frequencies toward on-axis at high frequencies, which is a characteristic of the shape of the ears and can be approximated as an obliquely truncated horn [6]. Due to the limited acoustic aperture, the beam patterns are very broad in angle, even as they become narrower at high frequencies. (d) Despite having very broad beam widths, the angular acuity as measured by a behavioral discrimination task is 1.5◦ in azimuth [7] and 3.0◦ in elevation [8]. This is surprising, because man-made imaging sonar systems generally depend upon narrow transmit and/or receive beams, which require a much larger acoustic aperture (physical or synthetic) for the same frequencies considered here. width of the energy can be reasonably approximated by a 4.7 mm radius piston transducer (Fig. 1.2a) [4]. The returning echoes are received bilaterally through each ear. The receiver MRA shifts from off-axis at low frequencies toward on-axis at high frequencies due to the shape of the ears, which can be approximated as obliquely truncated horns (Fig. 1.2b-c) [5, 6]. A common characteristic among biosonar is that these beams are broad in angle, even at high frequencies. Despite having broad beams, these bats are able to achieve angular acuity of 1.5◦ and 3◦ in azimuth and elevation, respectively (Fig. 1.2d) [7, 8]. The fundamental question is how can bats achieve such fine degrees of acuity 3 with broad beamwidths? A conventional sonar system operating in air over the same frequency range as the big brown bat would require an array length, or aperture, of approximately 1.1 m to achieve 1.5◦ angular resolution in azimuth. Furthermore, element-to-element spacing of 1.7 mm would need to be maintained to avoid ambigu- ous localization [9], which demands approximately 640 array elements in total. This array design becomes completely intractable if the requirement of 3.0◦ is simultane- ously imposed for elevation. Remarkably, the big brown bat requires only two ears spaced 1.4 cm apart (Fig. 1.1a) – a reduction in array aperture of about 80 times and at least two orders of magnitude less sensors. 
Behavioral and neurophysiological evidence show that bats perform spatial imaging by exploiting three pieces of salient information: 1) the absolute time delay between an emitted pulse and incident echoes, 2) the relative time delay of echoes between ears, and 3) the broadband spectral patterns encoded internally by the bat’s complex acoustic baffles and externally by the environment and reflective scatter- ers. Acoustic imaging in azimuth requires fusing this information together, whereas imaging in elevation is achieved with only the spectral information available to each ear. More specifically, it is known that the spatial imaging process relies upon precise neural timing of echoes arriving at each ear [10, 11] and neural decoding of the fre- quency dependent spectral patterns introduced by the unique structure of the bats’ ears [8, 12]. Biosonar research, indeed neuroscience in general, has advanced prodigiously in a relatively short period of time. Nevertheless, this field is still in its infancy compared to the direction it is heading. Numerous mysteries remain about the underlying mech- anisms for animal echolocation and also how the biological solution can be exploited for improving man-made technologies. Ultimately, the persistence of researchers in this field will be rewarded by a higher level of understanding of acoustic information processing in the mammalian brain. Although mimicking biosonar may not be an optimal solution for all aspects of engineered acoustic sensing and imaging, there are 4 a multitude of important applications where biosonar has the potential to change the way future generations of acoustic imaging systems are conceptualized and designed. 1.2 Significance 1.2.1 Time-Frequency Analysis and the Auditory System Time-frequency analysis, at the most basic level, is the extraction or interpretation of information from a signal that varies in time. It has traditionally been understood as a decomposition of individual sine waves of different frequencies and amplitudes, i.e., the Fourier transform and its time-varying counterpart, the short-time Fourier trans- form (STFT). Considerable effort has been spent on understanding the relationship between time and frequency, or perhaps time and other domains (e.g. scale). Today, we have alternative developments such as the quadratic representations (Wigner- Ville distribution (WVD), Altes Distribution, etc.) [13, 14], the scalogram, fractional Fourier transform (FrFT) [15], reassignment method [16], wavelets and synchrosqueez- ing [17]. Most of this work has been toward the creation of tools for humans and machines to better understand, analyze, and visualize complex time-based signals, especially for propagating waves in acoustics, electromagnetics (including radar and light), seismic waves, etc. that are abundant in the real physical world. In the field of bio-acoustics, time-frequency analysis is an essential tool for researchers to understand and interpret the sounds emitted by animals; however, in- tercepting and recording the sounds of live animals is only part of the problem. We currently have a great number of mathematical and computational models of the auditory system. These include models of the cochlea at the molecular level, mechan- ical micro-models of the elastic basilar membrane, random stochastic models of the auditory-to-neural transduction, and linear time-invariant (LTI) filter bank models. 
There also exist a great number of models that seek to interpret sound mathematically 5 using alternative transforms (e.g. spectro-temporal modulations [18], joint acoustic and modulation frequency (JAMF) [19]) or higher-order statistics [20]. Despite all of these models, the basic relationship that links pitch, timbre, and loudness to time- frequency analysis eludes us, because these characteristics are psychologically and physiologically induced effects, not physical manifestations of sound. Even so, these effects are unambiguously understood and agreed upon by all humans when we listen to the difference between a note played on the piano and that same note played on the guitar. The relationship between time and frequency within the auditory system is at the core of understanding the intricate nuances of music, speech, communication, and biosonar. 1.2.2 Dynamic Behavior and Adaptation in Echolocation Echolocating animals exhibit a great deal of adaptability with their sonar systems. This dynamic control is seen in time-frequency pulse design [21], as well as the spa- tial directivity of the emitted signals [22]. Even at the reception of acoustic echoes, echolocating animals can rapidly change their receiver directivity patterns by me- chanical adjustments to the acoustic baffles [23]. The ultimate example of biosonar adaptation lies within the neural computations of the brain. Short-term plasticity in the auditory system is responsible for adapting to environmental uncertainties and maintaining highly precise internal spatial representations [24, 25]. Neural adaptation is the reason echolocation has been so successful across the many different species of echolocating bats, dolphins, and whales. Without this adaptation, animals would be ill-equipped to handle any new challenges found in the natural world. We are now at the forefront of exploring dynamic behavior in echolocation and have only recently begun to realize the extent to which it is used [26, 27, 28, 29]. Understanding the nature of this dynamic behavior in echolocation requires new and creative approaches to experimental design. For example, past approaches at measuring beam patterns have been hampered by assuming that transmit and 6 receive beams remain constant from pulse-to-pulse. This choice was partly a conse- quence of limitations to measurement technology, but also because these assumptions are highly convenient. Advances in sensing and computing are enabling the creation of new tools and methods for studying behavioral dynamics that were never before possible. In particular, field programmable gate arrays (FPGA) are being used to rapidly build customized digital hardware with increasing complexity. One impor- tant application for FPGAs is acoustic measurement systems that demand a large number of data acquisition channels. Data collection must be performed in paral- lel to maintain synchronous sampling, and without these new technologies, options are prohibitively complex or expensive to implement. The data volume requirements that go along with this new capability are also expanding, which implies the use of high-throughput high-density transceivers and storage devices. One difficulty is that as sensing and measurement become easier, data dimensionality increases and new visualization techniques are needed. Fortunately, computing power and data process- ing have paced sensing developments. 
Amongst the vast amount of bio-diversity in echolocating mammals, there remain countless discoveries to make of dynamic be- havior and physiological adaptations. As researchers, we must acknowledge that our assumptions may be questionable and find new, intelligent ways of correcting and validating our hypotheses. 1.2.3 Toward the Design of a Bio-Inspired Broadband Sonar System The implications of developing a bio-inspired broadband sonar system are profound and far-reaching. Biosonar is not a merely theoretical development, it is a proven high-resolution acoustic imaging system that is functional and robust. The excep- tional performance and adaptability by animal echolocators in the midst of dense clutter is what draws engineers and scientists to marvel at its simplicity. Section 2.2 describes how conventional beamforming is done and shows a clear example that this acoustic imaging approach is in wide use today. Advanced sonar systems are consid- 7 ered advanced because they employ some way to improve acoustic imaging perfor- mance beyond the fundamental limitations imposed by the wavelength-to-aperture ratio, λ/L. Performance gain always comes with tradeoffs, which could be extra pro- cessing or making bold assumptions that limit widespread application. Resolution improvements of 2 to 5 times are immediately championed as a success, but biosonar has shown that it is possible to achieve the same angular resolution with orders of magnitude less hardware complexity. Besides achieving higher resolution with fewer sensors, biosonar is superior in numerous aspects over conventional sonar systems. The versatility and adaptability already mentioned are traits that man-made systems severely lack. Echolocating bats use strobe groups to avoid pulse-echo ambiguity and increase pulse-repetition rates when more information is needed. Dynamic usage of echolocation beams is not new, but the way in which bats, dolphins, and whales direct their beams off-axis is. Animals are clearly capable of sonar self-calibration as a superior form of matched- field processing. Any sonar system that can mimic biosonar in these respects would be capable of functioning in a broader range of environments and situations, such as dense foliage in air, or cluttered harbors in shallow water. Biomimetic sonar systems will ultimately bring advanced sensing and imaging capabilities to smaller autonomous systems and wearable augmented sensing devices for humans. In the very near future, a slew of new processing methods will be developed while attempting to replicate the neural information processing of the auditory sys- tem. Alongside these developments come the general advancement of neuroscience on the auditory system. The ability to truly understand and replicate the neural dynamics and architectures at various stages of the auditory system will bring new brain-machine interfaces for the hearing impaired. Advances in technology for speech recognition and synthesis are already showing promise for many commercial applica- tions, such as automated call routing, portable phone and GPS devices, and instant language translation. With the advent of such technological advancements, humans 8 are not far from the creation of fully autonomous systems and machines that hear, interpret, and produce sound in exactly the same manner as animals. 
1.3 Dissertation Objectives and Overview The research objectives of this dissertation are to 1) improve our understanding of acoustic imaging in biosonar from an engineering perspective, and 2) apply this in- sight toward the development of a compact bio-inspired broadband sonar system. Chapter 2 presents the background information necessary for the rest of the disserta- tion. Chapters 3 and 4 are comprised of recently published methods for bio-acoustic analysis. In particular, Chapter 3 addresses the need for a set of new time-frequency analysis methods needed to study multi-harmonic waveforms, such as bat echoloca- tion signals. This new approach enables bioacousticians to perform multi-component signal analysis with improved resolution and accuracy. The robust method enables automatic extraction of useful information from a large ensemble of transmitted sig- nals. Chapter 4 describes the design and construction of an apparatus for capturing the beam patterns of bats’ consecutive transmit pulses with high fidelity. Also de- scribed is a method for processing the acoustic signals to reconstruct the beam pat- terns for visualization and further analysis. Such a system is unprecedented and will elucidate the dynamics of bats’ beam patterns during controlled echolocation experi- ments. Chapter 5 outlines a numerical model of the physical acoustics to understand the rich set of information available in broadband bio-acoustic echoes. The modeling approach is unique, because it is the first study of its kind to look in detail at how broadband signals are transformed in the frequency domain from sound emission to reception of echoes. This chapter also shows a simple method for first quantifying the achievable resolution and then analyzing the sensitivity of resolution to changing environmental parameters. A significant development here is to demonstrate that high-resolution can be achieved using only a few transducers without any complex 9 acoustic baffles. Chapter 6 presents a discussion on applications, future directions, and concluding remarks. Finally, Appendix A describes a biophysical model of the bat’s auditory peripheral system and demonstrates a simple example of event-based neuronal coincidence detection. References [1] W. Au and J. Simmons, “Echolocation in dolphins and bats”, Phys. Today 60, 40–45 (2007). [2] A. Surlykke and C. F. Moss, “Echolocation behavior of big brown bats, Eptesicus fuscus, in the field and the laboratory”, J. Acoust. Soc. Am. 108, 2419–2429 (2000). [3] A. Surlykke, P. E. Nachtigall, R. R. Fay, and A. N. Popper, eds., Biosonar, volume 51 of Springer Handbook of Auditory Research (Springer, New York) (2014). [4] D. Hartley and R. Suthers, “The sound emission pattern of the echolocating bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 85, 1348–1351 (1989). [5] M. Aytekin, E. Grassi, M. Sahota, and C. Moss, “The bat head-related transfer function reveals binaural cues for sound localization in azimuth and elevation”, J. Acoust. Soc. Am. 116, 3594–3605 (2004). [6] N. H. Fletcher and S. Thwaites, “Obliquely truncated simple horns: Idealized models for vertebrate pinnae”, Acustica 65, 194–204 (1988). [7] J. A. Simmons, S. A. Kick, B. D. Lawrence, C. Hale, C. Bard, and B. Escudie, “Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 153, 321–330 (1983). [8] J. Wotton, T. Haresign, M. Ferragamo, and J. 
Simmons, “Sound source elevation and external ear cues influence the discrimination of spectral notches by the big brown bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 100, 1764–1776 (1996). [9] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Tech- niques (Prentice Hall PTR, Upper Saddle River, NJ) (1993). [10] C. Moss and J. Simmons, “Acoustic image representation of a point target in the bat Eptesicus fuscus: Evidence for sensitivity to echo phase in bat sonar”, J. Acoust. Soc. Am. 93, 1553–1562 (1993). [11] J. A. Simmons and J. E. Gaudette, “Biosonar echo processing by frequency- modulated bats”, IET Radar Sonar Navig. 6, 556–565 (2012). 10 [12] J. Wotton and J. Simmons, “Spectral cues and perception of the vertical position of targets by the big brown bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 107, 1034–1041 (2000). [13] A. Papandreou, F. Hlawatsch, and G. Boudreaux-Bartels, “The hyperbolic class of quadratic time-frequency representations. I. Constant-Q warping, the hyper- bolic paradigm, properties, and members”, IEEE Trans. Signal Process. 41, 3425–3444 (1993). [14] A. Papandreou-Suppappola and L. T. Antonelli, “Use of quadratic time- frequency representations to analyze cetacean mammal sounds”, Technical Re- port 11,284, Naval Undersea Warfare Center, Newport, RI (2001). [15] H. M. Ozaktas, M. A. Kutay, and D. Mendlovic, “Introduction to the fractional Fourier transform and its applications”, Adv. Imag. Elect. Phys. 106, 239–291 (1999). [16] F. Auger and P. Flandrin, “Improving the readability of time-frequency and time- scale representations by the reassignment method”, IEEE Trans. Signal Process. 43, 1068–1089 (1995). [17] F. Auger, P. Flandrin, L. Qiang, S. McLaughlin, S. Meignen, T. Oberlin, and H.-T. Wu, “Time-frequency reassignment and synchrosqueezing: An overview”, IEEE Signal Process. Mag. 30, 32–41 (2013). [18] T.-S. Chi and C.-C. Hsu, “Multiband analysis and synthesis of spectro-temporal modulations of Fourier spectrogram”, J. Acoust. Soc. Am. 129, EL190–EL196 (2011). [19] L. Atlas and S. A. Shamma, “Joint acoustic and modulation frequency”, EURASIP Journal on Applied Signal Processing 2003, 668–675 (2003). [20] S. Bourennane and A. Bendjama, “Locating wide band acoustic sources using higher order statistics”, Applied Acoustics 63, 235–251 (2002). [21] S. Hiryu, M. E. Bates, J. A. Simmons, and H. Riquimaroux, “FM echolocating bats shift frequencies to avoid broadcast-echo ambiguity in clutter”, Proc. Natl. Acad. Sci. U.S.A. 107, 7048–7053 (2010). [22] N. Matsuta, S. Hiryu, E. Fujioka, Y. Yamada, H. Riquimaroux, and Y. Watan- abe, “Adaptive beam-width control of echolocation sounds by CF-FM bats, Rhi- nolophus ferrumequinum nippon, during prey-capture flight”, J. Exp. Biol. 216, 1210–1218 (2013). [23] L. Gao, S. Balakrishnan, W. He, Z. Yan, and R. M¨ uller, “Ear deformations give bats a physical mechanism for fast adaptation of ultrasonic beam patterns”, Phys. Rev. Lett. 107, 214301 (2011). [24] B. J. Fischer, L. J. Steinberg, B. Fontaine, R. Brette, and J. L. Pe˜ na, “Effect of instantaneous frequency glides on interaural time difference processing by au- ditory coincidence detectors”, Proc. Natl. Acad. Sci. U.S.A. 108, 18138–18143 (2011). 11 [25] R. Rao and T. Sejnowski, “Spike-timing-dependent Hebbian plasticity as tem- poral difference learning”, Neural Comput. 13, 2221–2237 (2001). [26] L. N. Kloepper, P. E. Nachtigall, M. J. Donahue, and M. Breese, “Active echolo- cation beam focusing in the false killer whale, Pseudorca crassidens”, J. Exp. 
Biol. 215, 1306–1312 (2012). [27] P. H. S. Jen, “Adaptive mechanisms underlying the bat biosonar behavior”, Front. Biol. 5, 128–155 (2010). [28] M. Aytekin, B. Mao, and C. F. Moss, “Spatial perception and adaptive sonar behavior”, J. Acoust. Soc. Am. 128, 3788–3798 (2010). [29] P. W. Moore, L. A. Dankiewicz, and D. S. Houser, “Beamwidth control and angu- lar target detection in an echolocating bottlenose dolphin (Tursiops truncatus)”, J. Acoust. Soc. Am. 124, 3324–3332 (2008). 12 Chapter 2 Background This chapter introduces background material relevant to the research found in the next several chapters of the dissertation. The first section reviews the general mam- malian auditory system, auditory cues for passive sound source localization, and specializations that enable bats to perform high-resolution active acoustic imaging. Following Section 2.1, Section 2.2 describes the current technological means of acous- tic imaging and contrasts conventional array signal processing with the biosonar solu- tion. The last background topic in Section 2.3 introduces the model-based approach to understanding and replicating biosonar and discusses recent progress in this area. 2.1 Acoustic Information Sensing and Processing by Mammals Acoustic waves are produced and sensed by nearly all motile animals. Sound provides a fundamental means of communication, detection and classification of predator and prey, localization of sound sources, and orientation relative to the immediate envi- ronment. Most animals rely upon sound for survival, but a select few have developed a refined sense of hearing. Nocturnal birds such as the barn owl excel at passive localization for capturing prey at night [1]. A specialized group of mammals (e.g. microchiropteran bats and odontocetes) have evolved to use acoustic waves as their primary active sense in the absence of visual information in the electromagnetic spec- trum [2]. These echolocating mammals have developed an extreme acuity and agility 13 with which their external world is precisely reconstructed from the stream of echoes received; however, the exact physical and neuronal mechanisms responsible for this precision are not well understood nor are they matched by any existing technological system. The following sections provide a brief overview of the mammalian auditory system, acoustic neural information processing, sound source localization by mam- mals, and specializations required for echolocation. 2.1.1 The Mammalian Auditory System The mammalian auditory system utilizes a complex set of sensory organs at its periph- ery – the external ear, ossicular chain, and cochlea – that are tightly integrated with neural circuitry in the cochlear nucleus (CN) by way of the auditory nerve (AN) fibers as illustrated in Figure 2.1 [3]. Originating in the CN there are multiple ascending, as well as descending pathways throughout the auditory system [4, 5]. While many of these pathways are monaural, there are several neural stages where specific nuclei in the midbrain receive bilateral input and integrate the information between the ipsilateral and contralateral auditory circuitry (e.g. superior olivary complex (SOC) and inferior colliculus (IC)). The entire auditory system from the cochleae up through the auditory cortex (AC) has a tonotopic organization where neural nuclei at specific regions of the brain appear to be spatially organized by frequency selectivity. 
The location where acoustic-to-neural transduction occurs is within the inner hair cells (IHC) of the cochlea [9, 10]. The primary information required to localize sound sources lies in the onset response of IHCs tuned to different frequencies [11]. AN fibers mark the onset of sound with a time-delay (i.e. first spike latency) that is related non-linearly to the acceleration of the acoustic pressure waves [12, 13, 14, 15, 16]. Subsequent neural spikes encode other features of the sound, such as duration, intensity, and relation to other frequency channels. Beginning with the AN fibers, all acoustic information is carried by neural spikes throughout the complex of neural pathways mirrored on either side of the brain. Neural spikes are essentially point 14 AC AC MGB MGB IC IC NLL NLL DNLL DNLL INLL INLL VNLLc VNLLc VNLLm VNLLm SOC SOC LSO ILD MSO ITD MSO LSO NTB NTB LNTB MNTB MNTB LNTB CN CN DCN AGC DCN PVCN Spectral PVCN AVCN AVCN c Timing AN b OHC d IHC a Cochlea Midline Cochlea Figure 2.1. The mammalian auditory system mapped from the cochlea to the cortex. Monaural and binaural projections from one cochlea are shown. Auditory input from the right cochlea has been omitted for clarity, but all pathways are mirrored across the brain’s midline. Excitatory and inhibitory synaptic connections are marked by triangles and bars, respectively. (a) Acoustic-to- neural transduction begins with the inner hair cells (IHC) of the cochlea. (b) The auditory nerve (AN) fibers respond to the neurotransmitter chemicals released by the IHC in response to sound pressure waves. (c) The cochlear nucleus (CN) receives all ipsilateral AN inputs in three subregions: dorsal, anteroventral, and posteroventral cochlear nucleus (DCN, AVCN, PVCN). (d) The DCN projects efferent connections to the outer hair cells (OHC) in the cochlea, which are thought to provide a mechanism for automatic gain control by amplifying the mechanical vibrations in the cochlea’s basilar membrane (BM). Numerous specializations have been identified in echolocating bats, including a significantly hypertrophied IC and peculiarly organized VNLLc [6, 7, 8] 15 processes, where the probability of a neuron firing a spike is proportional to the group activity level of attached synapses in the network. Acoustic events are encoded by the stochastic response of neural populations tuned to different amplitude ranges. To date, the relationship between morphological connectivity and physiological functions of the mammalian auditory system is not completely understood [17]. 2.1.2 Neural Information Processing by the Auditory System At the peripheral stage of the mammalian cochlea, acoustic information arrives rapidly compared to the time scale of a single neural spike [12]. To encode this information, AN fibers that innervate the cochlea must remain highly sensitive to acoustic stimuli, but this also increases spontaneous spiking (i.e. noise)[13]. To compensate for this, AN fibers are overrepresented at each narrow frequency band along the cochlea’s basilar membrane (BM). The frequency selective regions along the BM contain many redundant IHCs, and every IHC has many redundant AN fibers synapsed to it. As the BM is deflected in response to an acoustic wave, IHCs release bursts of neuro- transmitter, and AN fibers take up this neurotransmitter to respond with a spike sent into the CN [18]. The simultaneous coincidence of neural spikes from many redun- dant AN fibers is the reason the auditory system is able to encode precisely timed acoustic information. 
Coincidence detection is therefore a critical responsibility of the CN and it is performed through the population response of a large number of AN fibers – essentially averaging out the noise of spontaneous responses [19]. The CN is the gateway of acoustic information into the brain, because this is where all AN fibers innervate. If precision of spike timing is important anywhere in the brain, it is here in the CN, because once this precisely timed acoustic information is lost it cannot be recovered through any amount of data processing [20]. The CN contains an assortment of cell types, many of which are not fully understood [21, 22, 23]. Above the CN, a large portion of the neural complex in the auditory brainstem is used in the feedback necessary for motor control and does not contribute directly 16 to sound source localization; for example, reflexes controlling head aim or automatic gain control of the OHC in the cochlea and muscles [24]. There are a class of general models of neural information processing that are based on registering the timing of spikes across different neurons (i.e. coincidence detection cells) [25]. These models are usually put forth as generalized networks of cortical information processing using the timing of individual spikes across cells rather than conventional spike-rate codes [26]. The relevance of these models to auditory processing in the brainstem is that specific spike timing models have been proposed for the perception of sound pitch [15, 27, 14, 12, 28], for sound localization using interaural timing cues [29, 1], and for determination of target range of echo delay in bats [30, 31, 32] Many attempts have been made at understanding and quantifying the informa- tion content in neural spikes, particularly with respect to precise timing [33, 34, 35, 36]. Neural spikes must carry all information about peripheral stimuli throughout the brain and the brain must be able to interpret this information without any supplemen- tary guidance [37]. Synfire chains, for example, are models where spike timing plays a crucial role in self-constructing complex binding networks and compositionality [25]. Polychronization has also surfaced as a neural information processing mechanism that relies upon understanding the neuronal dynamics [38, 39]. Effectively, all spike timing models can be reduced to having coincidence detecting neurons at a higher level look- ing downward to detect the simultaneity of spikes along multiple inputs. For sound localization, even at the level of the AC, “spatial acoustic information is represented by relative timings of pyramidal cell output” [40]. 2.1.3 Auditory Cues for Passive Localization in Biological Systems Traditionally, the mammalian auditory system has been understood as having two primary methods for localizing sound sources: Interaural time difference (ITD) and interaural intensity difference (IID). Recent work has shed light on a third critical 17 piece of information, which is the angular dependent spectra of broadband sounds, also known as the head-related transfer function (HRTF) [17]. ITD is the relative time delay for a propagating sound wave to reach both ears. This delay is used by mammals to localize a sound’s point of origin. In perceptual tasks, human listeners are typically presented with sounds from an array of loud- speakers or a stereo headset and are asked to localize the source [41, 42, 43]. 
Based on early psycho-acoustic results from tonal stimuli, ITD was historically only consid- ered useful for frequencies with a wavelength greater than the distance between ears. The reason ITD works in these experiments is that the neural response to continuous tones can phase lock on each period of the wave and encode location based on the relatively small time difference between ears [29, 19]. Since the refractory period for neural spikes exceeds the time period for frequencies above approximately 1 kHz, ITD is generally considered useful for low-frequency sound source localization in the horizontal plane, or azimuth [44]. These ITD experimental results are not valid for sounds that occur naturally, especially for echolocation signals. The primary reason is that acoustic signals in nature are not continuous pure tones; but are instead short transient waveforms. For example, the broadband clicks produced by echolocating dolphins and short frequency modulated pulses by bats consist of frequencies well above the phase locking threshold, yet ITD is a crucial auditory cue for these animals. Such short transient signals contain very few cycles within a particular frequency band and there are not enough wave periods to phase-lock. Instead of phase locking, the auditory system encodes the onset response to these transient events with extremely high timing precision – approximately 100 µs [3] in a general mammalian model, which is 10 times less than the width of a single neural spike [15, 45, 46]. These acoustic signals arrive relatively sparsely in time, leaving sufficient margin for auditory neurons to recover from their refractory period before the next sound event. IID is the acoustic intensity difference between each ear and has been attributed 18 as a major auditory cue for high-frequency sound localization in azimuth. For humans, the head acts as an acoustic baffle, masking contralateral sound sources such that the two ears receive different amplitude levels. In other mammals commonly studied (e.g. cats and guinea pigs), the ears are positioned more dorsal and rostral than primates, so the head does not play as large of a role. Nevertheless, the structure of the external ear, or pinna, in many of these mammals can be reasonably approximated as obliquely truncated horns [47, 48]. These horns provide spatial directivity, which means that the amplitude of a sound wave changes depending upon the angle of incidence. Therefore, IID is manifested in these animals by the shape and orientation of the external ears that form acoustic receiving baffles. One notable problem with the basic concept of IID is that it does not encode sufficient information to localize sound sources in elevation. Most acoustic signals in nature are inherently broadband or at least contain some degree of harmonic structure and span multiple frequencies. When a signal arrives at the ears, each acoustic baffle modifies the sound by encoding unique spectral characteristics for any given angle. Therefore, to localize sounds in elevation, the full spectrum of a received sound is compared with the a priori spatial intensity patterns of the ears, which is the HRTF [49]. The HRTF is a complicated function of frequency and angle, but this complexity is necessary to encode a unique spectrum for any particular direction, either monaurally or binaurally. One important piece that is missing from the truncated horn model is the tragus, which encodes notches specifically used for vertical localization [50, 51]. 
The full spectral characteristics of the HRTF are not only useful for localization in elevation, but also azimuth and range. 2.1.4 Specializations for High-Resolution Active Acoustic Imaging The passive localization cues as described above are commonly exploited by many species [52, 28]. The active perception systems of echolocating bats, dolphins, and whales have improved upon passive hearing mechanisms by broadcasting high fre- 19 quency acoustic sounds into the environment, whose echoes can then be accurately localized. In this sense, acoustic echoes are just sound sources originating from many different reflecting objects. Thus, echolocation enables precise control over the acous- tic localization process and results in high-resolution spatial images from the contin- uous flow of information [2]. From the same basic mammalian auditory system, echolocators have evolved to fit the specific needs prescribed by individual echolocation strategies [53]. The types of specializations extend from the physical acoustic baffles of sound reception and transmission [47, 54], to the specific waveforms used for echolocation [55], and even throughout the brain at the various neural complexes [6]. These biological specializations can be thought of as an iterative process of design optimization. The biosonar optimization criteria are not just maximizing performance (e.g. acoustic field-of-view, spatial resolution, signal-to-noise ratio); an equally important criterion for animals is minimizing the energy required to achieve “good-enough” performance. As a result, evolution has produced significant biodiversity in echolocating mammals while still maintaining the minimalist approach to acoustic design. The sound production mechanisms are one of the most important developments for echolocation. Marine mammals such as dolphins and toothed whales produce sound through a highly unique structure in the melon of their head [56, 57]. The intense sounds are produced pneumatically by forcing air through a set of phonic lips, recapturing the air held in sacs, and repeating the process. The broadband echolocation signals are best described as short transient “clicks” that are typically on the order of 10 to 100 µs in duration. The sound pressure waves are guided by bone and tissue through lipids, or acoustic fats, in the melon where it is then prop- agated outward into the water [58, 57, 59]. Bats have evolved their echolocation strategies to fit a particular foraging environment [47]. The result is an extremely diverse set of acoustic baffle structures and echolocation waveforms. For example, to augment their vision Egyptian fruit bats (Rousettus aegyptiacus) echolocate using 20 broadband transient “clicks” of their tongue [60]. Other bats (mostly from the subor- der Microchiroptera) emit a variety of frequency modulated signals using the larynx through either the oral or nasal cavities. The noseleaf structures of nasally emit- ting bats are notoriously complex and prominent [61, 62]. The types of echolocation waveforms may be classified as frequency modulated (FM), constant frequency (CF), or both (CF-FM) [63]. CF waveforms are useful for bats detecting Doppler shifts from moving prey in an open environment [6]. FM waveforms provide excellent range resolution and are better suited for operating in densely cluttered environments, but are Doppler invariant [64]. 
The echolocation signals produced by both bats and dol- phins are usually stereotypical such that a particular species can be identified by the characteristics of its time-frequency signature. The reception of acoustic waves by echolocating mammals is hyper-sensitive [65]. Although the sounds emitted for echolocation are generally high intensity, the re- flected signals that return to the ears are many orders of magnitude lower. The dissipation and absorption of acoustic energy enforces an upper limit on the useful range of animal echolocation. To compensate, echolocators have evolved auditory systems with high sensitivity and large dynamic range. Many of these specializations exist within the brain, such as an overrepresentation of AN fibers in the cochlea, hypertrophied auditory nuclei (e.g. IC, CN, and LL) [6], and extreme timing preci- sion at the early neural processing stages [7]. Other specializations appear obvious, such as acoustic baffles and directivity patterns that are well matched to the emitted sounds [47]. Perhaps not-so-obvious is the mechanism by which underwater marine mammals receive acoustic echoes. Although the topic was historically controver- sial [66, 67, 68, 69, 70, 71], dolphins and toothed whales receive sounds bilaterally at the mandible. The hollow bone structures form an acoustic waveguide for sound pressure waves to travel within acoustic fats and to each inner ear [58, 57]. There are certainly many other neurological and anatomical specializations for echolocation that have yet to be discovered. 21 The role of vision in echolocating animals depends upon the species. Some mammals (i.e. Megachiroptera and Delphinids) rely a great deal on vision for guid- ance, foraging, and other routine behaviors. However, animals that must function in the complete absence of light use their auditory system as the primary sensory modality. In these animals vision can still aid the senses to some degree, but the en- vironment is actively probed and perceived through sound. A fundamental question is, what do these animals “see” in terms of acoustic images and how does it differ from vision? Spatial resolution provides a direct measure of the three-dimensional image quality perceived by echolocating animals. In this context, resolution is the minimum spacing between two distinct acoustic echoes that can be unambiguously differenti- ated [72]. Spatial resolution is typically characterized by three separate, but related quantities: Angle, range, and range-rate (i.e. Doppler) [73]. Angular resolution can be further separated by azimuth and elevation. Echolocating mammals such as the big brown bat (Eptesicus fuscus) and the bottlenose dolphin (Tursipos truncatus) are well-known for their high-resolution sonar systems, especially in range [2]. Although high-resolution is a subjective term, in the context of biosonar it refers to the abil- ity of an echolocating bat, dolphin, or whale to perceive spatial images with greater detail than a man-made sonar given the same set of signals and acoustic apparatus. One aspect of echolocation that has been studied extensively is the extreme range-resolution for bats [30, 74, 75, 76, 77, 78, 79, 80] and cetaceans [59, 81, 82]. When two or more acoustic waves overlap in time, they constructively and destruc- tively interfere to produce spectral interference patterns. The big brown bat (E. fuscus) exploits these patterns of interference to deconvolve the echoes and produce a “hyper-resolution” image in range. 
These broadband spectral patterns have been shown to persist throughout the auditory system in this species [83, 84, 85] and appear to contribute reliable information to the bat’s acoustic imaging process. Angular localization, in general, has been studied behaviorally [86, 87, 88, 89, 22 90], analytically [91], and computationally [58, 92, 93, 94]; however, angular perfor- mance in the presence of multiple closely-spaced targets (i.e. angular resolution, as defined above) has not been a primary focus. Nevertheless, a few experiments do exist where angular resolution was directly or indirectly measured in E. fuscus [95, 96] and T. truncatus [86]. Behavioral evidence has shown that E. fuscus primarily utilizes the spectral notches encoded by its HRTF to encode elevation information [51, 87, 88, 89]. In addition, recent work has shown that off-axis echoes of echolocation signals can be completely rejected even when overlapping in time [97, 98]; an echolocation version of the cocktail-party problem. Decades of behavioral studies have been performed on bats, dolphins, and whales to provide additional clues about the resolution limits of echolocation. Unlike bats, however, echolocation research in marine mammals is restricted to behavioral tasks and infrequent necropsies from strandings. Furthermore, the costs associated with marine mammal research are much greater, because of substantial investment in acoustic facilities, the larger physical size of the animals and all their supporting equipment and food, and the difficulties with testing in an aquatic environment. For these reasons, significantly more is known about echolocation in bats; specifically the neurophysiological and morphology of the auditory system. Regardless of the type of echolocation waveforms used by bats (i.e. CF or FM), a common signal characteristic is the presence of multiple harmonics. Multi- harmonic waveforms have the advantage of increasing the natural bandwidth of a signal to one or more octaves, significantly improving performance in range [72]. The relative phase coherence between harmonics in an echo is also important for angular imaging [99]. Furthermore, given that broadband spectral information are the only known mechanism by which bats can localize echoes in elevation, it seems unlikely that they would successfully evolve by emitting a narrowband CF pulse having only a single component – exactly the type of waveforms that pervade man-made sonar. 23 2.2 Acoustic Imaging in Technological Systems The technological development of acoustic imaging was borne out of necessity. In seawater, the electromagnetic radiation spectrum is significantly attenuated by the density of the medium [100, 101], which means that neither the visible light spec- trum nor radio waves are useful beyond very short distances. This fact is particularly troublesome in naval applications, where information is critical to situational aware- ness for large ships, submarines, and unmanned undersea vehicles. The problem is addressed by using acoustic waves since they propagate quickly over long distances, exhibit strong reflections, and pass relatively uninhibited in the dense medium [102]. The invention of piezoelectric materials enabled the design of acoustic trans- ducers to convert electrical signals into sound pressure waves and vice versa. Early devices were fairly basic and consisted of a single source and receiver that permitted echo ranging in the open ocean [103]. 
With the coupling of multiple piezoelectric sensors came the advent of array signal processing and the ability to produce cross- range images of objects from sound waves [102, 104]. Apart from its undersea origins, acoustic imaging has found uses in a wide range of applications such as biomedical diagnostics, geophysical tomography, and devices for the visually impaired. 2.2.1 Conventional Array Signal Processing Array signal processing is the method used to produce images from an array of dis- crete acoustic elements. The critical piece of information to localize sound sources is the relative time delay of acoustic waves as they propagate across the entire array. With knowledge of the array geometry and the speed of sound propagation, pressure waveforms at each transducer element can be delayed in time and summed to cor- respond with any incident direction (defined as the steered angle). This concept is known as a delay and sum beamformer and represents the most basic idea in acoustic imaging. 24 When an acoustic wave arrives from a direction matching the steered angle, the correlated signals combine additively and the beamformer produces the strongest response. An acoustic wave arriving from some different angle will not align properly and the beamformer produces a weakened response due to lack of correlation. Noise, which can be acoustic, thermal, or electronic will not produce a strong response unless it is correlated in time between elements. For example, in the presence of uncorrelated ambient noise, a correlated signal across N array elements will have an improved signal-to-noise ratio (i.e. array gain) of 10 log10 N , one important advantage to using an array [105, p. 306]. In practice, a beamformer is almost always implemented in the frequency do- main [106, 107], since discrete-time delays would require high-order interpolation or fractional delay filters [108]. The response of an N element array at frequency, f , for the steered angle, θ, is computed as N X Y (f, θ) = dj (f, θ)wj Xj (f ) (2.1) j=1 where dj (f, θ) is the delay (frequency and angle dependent) of the j th element, wj is the aperture shading coefficient applied to element j, and Xj (f ) is the frequency domain data of the j th element [109, Ch. 4]. In the frequency domain, dj (f, θ) is a phase shift that is equivalent to the time delay relative to some fixed point on the array, given f and θ: dj (f, θ) = e−ik∆j . (2.2) Here, k = 2π/λ is the acoustic wavenumber and ∆j is the distance from element j to a fixed reference point along the projected direction θ. ∆j = ~δj · ζ~ for distance vector, ~δj , and unit vector, ζ~ = eiθ . In matrix form, Equation 2.1 simplifies to 25 Y (f, θ) = df (θ)WxTf (2.3) where df (θ) is the 1 × N steering vector of complex phase delays, W is a diagonal N × N aperture shading matrix, and xf is the 1 × N complex data vector (T denotes the transpose), all corresponding to frequency, f [110]. 2.2.2 Beam Patterns and Angular Resolution A commonly used method to describe an array’s imaging performance is through the directivity, or beam pattern. The beam pattern of an array is simply the beamformer’s angular response to an ideal unity-power acoustic source located in the direction of ψ. This can be computed by replacing the complex data vector, xf , in Equation 2.3 by the complex steering vector, df (ψ): D(f, θ) = df (θ)Wdf (ψ)T . (2.4) For a line array, when df (ψ) is steered to 0◦ all of its elements are equal to 1 and we are left with the array’s natural response, D(f, θ) = df (θ)W. 
Figure 2.2 illustrates the beam pattern of an N = 10 element uniformly-spaced line array steered to 0◦ and 45◦ at two different frequencies. With proper element spacing, d ≤ λ/2, the beam pattern response is approximately D(f, θ) = sinc(L/λ cosθ), for an array aperture length, L = d(N − 1). A phase-delay beamformer is equivalent to applying a Fourier transform in the spatial domain. As such, the discrete elements suffer from spatial aliasing in exactly the same way as a signal sampled in the time domain. The presence of grating lobes is simply an aliasing artifact introduced by designing an array with improper element spacing (d > λ/2). The consequence is that there will be ambiguity regarding what angle the sound wave originated from. There is also a direct corollary between the 26 Beam Response (ψ=0°, N=10, d=1.72cm) Beam Response (ψ=45°, N=10, d=1.72cm) 10 10 A C Mag. (dB) Mag. (dB) 0 0 −10 −10 −20 −20 −80 −60 −40 −20 0 20 40 60 80 −80 −60 −40 −20 0 20 40 60 80 10 kHz 60 kHz 10 kHz 60 kHz 1 1 B D Amplitude Amplitude 0.5 0.5 0 0 −0.5 −0.5 −1 −1 −80 −60 −40 −20 0 20 40 60 80 −80 −60 −40 −20 0 20 40 60 80 Bearing Angle, θ (deg.) Bearing Angle, θ (deg.) Figure 2.2. Beam patterns are the angular response of an array due to the presence of an ideal acoustic source located in the steered direction. They are traditionally plotted on a log-magnitude scale and phase is ignored, but in reality the response exhibits a 180◦ phase reversal when the amplitude response becomes negative. Shown here are example beam patterns in air from a line array of N = 10 omni-directional elements that are spaced at d = 1.72 cm. Steer angles are plotted for 0◦ (a - log, b - linear) and 45◦ (c - log, d - linear). No aperture shading function is applied to this example, so W is the identity matrix. Each plot shows two different frequencies, 10 kHz (blue) and 60 kHz (green), which correspond to proper element spacing of λ/2 and undersampled spacing of 3λ, respectively. The width of the main lobe is one measure of angular resolution. Although the 60 kHz pattern has better resolution, the elements are not spaced properly and the result becomes ambiguous due to grating lobes. Regardless of frequency, the main and sidelobe responses are wider at angles off to the side. This is due to the effective array aperture decreasing with the cosine of the angle, θ. window function used for spectral analysis and the array aperture shading function used in array signal processing. Selecting the aperture shading weights is a tradeoff between mainlobe resolution and sidelobe reduction [111, Ch. 10]. The angular resolution of a uniformly spaced line array can be defined as the minimum angular spacing between two point sources of equal strength, whereby both can be simultaneously resolved [109, p. 142]. This limit occurs at the half-power beam width, β, of the beam pattern’s mainlobe and can be approximated through series expansion [110] as     −1 λ −1 λ β(ψ) ≈ sin cosψ − γwin − sin cosψ + γwin (2.5) L L where γwin is an aperture shading constant (e.g. γwin = 0.402 for uniform weighting; γwin = 0.484 for 26-dB Chebychev weighting). L and λ are the array aperture length 27 and wavelength, as defined previously. As seen in Figure 2.2, β is dependent upon the steer angle, ψ. The maximum achievable resolution for a line array is when ψ = 0◦ :   −1 λ β3dB ≈ 2sin γwin . (2.6) L These equations show that resolution of an array is critically dependent upon the ratio, λ/L. 
By increasing the aperture of the array, L, this improves the resolution by reducing the width of the main lobe. Alternatively, resolution can be improved by increasing the operating frequency, thereby reducing λ. It is clear that improv- ing resolution requires adding more elements, finding ways to increase the effective aperture, or handling insufficient element spacing in some other way. Under conventional beamforming, acoustic imaging is achieved by iterating the beamformer through multiple overlapping angles, θ, and repeating over subsequent time windows.1 The magnitude of each complex result from Equation 2.3 is plotted at the corresponding range and angle to produce an image of the spatially distributed acoustic energy. Figure 2.3 shows an example of high-resolution acoustic imaging in the range-azimuth plane produced by beamforming the underwater sonar data from a shipwreck. Images from consecutive transmit-receive cycles on a moving vehicle can be stitched together to map a much larger area. This concept of acoustic imaging with conventional beamforming is readily extended to a second angular dimension (e.g. azimuth and elevation). There is an impressive amount of literature on the various theories, methods, and implementations that improve upon classical array signal processing as described above. Some noteworthy techniques are sub-optimally spaced arrays (e.g. sparse, co- prime, Costas) [112, 113, 114, 115], synthetic aperture sonar (SAS) [116, 117, 118], and monopulse direction finding [119, 120]. Other methods, such as split aper- 1 Time, t, corresponds directly to range, r, in an active sonar system. The translation is r = tc/2, where c is the speed of sound in the medium and the factor of two accounts for the two-way propagation path. 28 Figure 2.3. The concept of acoustic imaging in the range-azimuth plane is demonstrated using ac- tive underwater sonar data collected from the site of a shipwreck in Narragansett Bay, Rhode Island. The sonar array (SeaBat 7130 prototype, Teledyne-Reson, Denmark) is a forward-looking 635 kHz line array with N = 256 elements spaced at λ/2 (d = 1.1 mm, L = 0.3 m). The active transmit waveform is a 17 ms, 30 kHz linear FM pulse (4.7% bandwidth-to-center-frequency ratio). This image was produced from a single transmit-receive cycle (66 ms) using a phase-delay beamformer and has 0.48◦ angular resolution at θ = 0◦ . Brightness in the image corresponds to the beamformer’s magnitude response when steered at a particular range and azimuth. The brightest locations are specular reflections alongside the ship’s hull and the darker red areas consist mostly of returns from the sea floor. Two faint rings of energy can be seen around 17 and 21 m, which are caused by the most intense ship reflections being present in the sidelobes when steered to other angles. The large, well-defined dark region behind the ship is an acoustic shadow created from the occlusion of acoustic energy by the ship. Note that the beams are only steered to ±60◦ due to limited transmit beam coverage and widening receive beams. Data were collected and processed by the Naval Undersea Warfare Center, Newport, RI. ture processing [105, p. 329] and Vernier interferometry [121, 122], are based off of the narrowband phase comparison between widely spaced elements. A variety of high-resolution techniques have been applied successfully, but performance de- grades when their many assumptions break down (e.g. 
minimum variance and adap- tive beamforming [105, 123, 124], eigenvector and multiple-signal classification (MU- SIC) [102, 105, 125, 126], and matched field processing [127, 128]). There have also 29 been some interesting departures from the traditional line-array concepts; in partic- ular, blazed arrays [129, 130] and vector sensing2 [133, 134]. This is by no means an exhaustive list of existing high-resolution angular techniques in array signal pro- cessing. A full review of this field lies beyond the scope of this section, but we can generalize many of these methods with respect to their intended goals and the infor- mation they use for acoustic imaging. Array signal processing traditionally uses the signal correlation and time delay between elements to localize sound sources and perform acoustic imaging. Many of the advanced techniques mentioned above serve to improve array resolution beyond the aperture constraints in Equations 2.5 and 2.6. They often achieve these performance gains at great cost by increasing the effective aperture, synthesizing more elements, or taking advantage of destructive interference of grating lobes and sidelobes. By contrast, biosonar uses very broad beam patterns and exploits the additional infor- mation contained in broadband, multi-harmonic signals. This enables bio-inspired broadband sonar to achieve high-resolution acoustic imaging with extremely small apertures and a minimal number of sensors. In this manner, biosonar represents a significant departure from the conventional approach that is in common use today. 2.3 Model-Based Approach to Bio-Inspired Acous- tic Imaging The model-based approach is a generic term used to describe numerical solutions to a variety of signal processing problems [135]. Models that include additional infor- mation about a physical process and its dynamics should, in theory, improve overall performance. These models usually consist of linearized systems, such as linear and adaptive filters [136, 137]; state-space estimation, e.g. Kalman filtering and its many 2 Most piezoelectric sensors are simple pressure-field measurement devices, while vector sensors measure both magnitude and direction from the particle velocity component of the acoustic wave. Since mammalian ears do not have a means of measuring particle velocity [131, 132], this additional information is explicitly omitted from further consideration. 30 adaptive and non-linear variants [138, 139]; statistical processors, like Markov chains and support vector machines [140, 141]; or neural networks, including classical firing- rate based and dynamical spiking neural models [142, 39]. The model-based approach may be used for creating new technological systems, or to better understand an ex- isting physical system. In the context of biosonar, we are interested in using the model-based approach for both purposes – gaining insight about animal echolocation and applying this toward development of new innovative acoustic imaging systems. 2.3.1 Auditory Modeling Insights and Oversights with Filter Banks Auditory modeling has embraced the idea of using parallel banks of linear filters to mimic the frequency selectivity of the cochlea’s mechanical response. To construct these filter banks, hearing researchers began mimicking the physiological and psycho- logical findings from various auditory studies in humans, cats, and guinea pigs. 
Early attempts to capture the critical bandwidth and asymmetrical roll-off characteristics of hearing used low-order band-pass Roex and Gammatone filters [49], which have an infinite impulse response (IIR). These filter designs are purely linear and time- invariant models of the cochlear mechanics. As neurophysiology provided new insight about the active non-linear feedback processes of the OHCs, more complicated filter shapes emerged; such as the Gammachirp and Dual-Resonance Non-Linear (DRNL) filters [143, 144]. These filter types expanded upon the existing models by including time-variant compression that is based upon the amplitude of the acoustic stim- uli [17]. Filter banks have become ubiquitous in many aspects of auditory research, from human audition to bat echolocation, and they remain a highly valuable tool for learning about how the auditory system encodes acoustic information. Using filter bank models, the benefit of decades of linear systems theory can be applied. There are unfortunately some drawbacks to this tool as well. One problem with using a filter bank model of the cochlea is the phase response. Great care has been taken to capture the exact amplitude response of these band-pass 31 auditory filters, yet little or no attention has been paid to the phase response of the filter and, perhaps most importantly, the implications for group delay. Figure 2.4 shows the frequency response of an auditory filter bank for the ultrasonic range of frequencies between 20 kHz and 100 kHz; those relevant to biosonar hearing in bats and cetaceans. Filters are usually spaced on a logarithmic frequency axis to reflect the distribution of neurons in the auditory system [4, 145]. The magnitude response matches fairly close to what has been found in other mammals at reasonable sound intensities. The phase response varies predictably near the poles and zeros of each band-pass filter such that the phase response changes most rapidly within the pass- band. The group-delay of a filter is simply the negative derivative of the phase response and can be understood as the literal time-delay of a signal passing through the filter. Signals passed through the filter bank will be amplified or attenuated based on the magnitude response, but delayed in time according to the group delay. If the group delay varies over frequency, signals with any bandwidth will become dispersive in time. This artifact becomes important when modeling the auditory system’s response to complex acoustic signals. In many cases, using an auditory filter bank is an appropriate model of the cochlea; however, accounting for phase is especially important when modeling a broadband system like the bat’s that can process information down to the microsecond [74, 146] or even nanosecond scale [147, 148]. 2.3.2 Signal Processing Models for High-Resolution Range Estimates Some of the earliest computational modeling work related to bat echolocation was de- veloped to explain the results of hyper-resolution experiments on range-discrimination [74, 146, 147]. These behavioral experiments were highly controversial [75], because they showed that bats were clearly achieving timing resolution well beyond what was thought possible (at the time) by neural coding in the auditory system [150]. Many questions about the neural mechanisms remain unanswered, even decades later. 
Nev- 32 Gammatone Filterbank Frequency Response Gammatone Filterbank Group Delay Magnitude (dB) 300 100 0 A C 85 −20 250 −40 72 Group Delay (µs) −60 62 200 −80 53 0 25 50 75 100 125 150 fc (kHz) 150 45 0 38 B 100 Phase (°) −180 32 −360 28 50 −540 23 −720 0 20 0 25 50 75 100 125 150 0 25 50 75 100 125 150 Frequency (kHz) Frequency (kHz) Figure 2.4. The gammatone filter bank is an example of an auditory cochlear model that is commonly used in hearing and echolocation research. Each band-pass filter represents the vibratory motion at a single physical point along the basilar membrane (BM) of the cochlea. This location is where numerous afferent AN fibers synapse with each local cluster of IHCs and translates BM displacement to neural spikes. (a) The magnitude response shows the logarithmic spacing of a gammatone filter bank designed from 20 to 100 kHz. The bandwidth-to-center-frequency ratio is normally kept constant to match the widening of the auditory critical bands at higher frequencies. This consistent ratio also ensures a constant overlap between filter channels. Only 11 channels are shown here for illustration, but practical models of bat echolocation require at least 80 channels per ear [149]. (b) The phase response, φ(f ), varies significantly in the pass-band of each filter. (c) Group delay, which is the negative derivative of phase (− dφdf ), is a commonly overlooked artifact of using a linear filter model. The consequence of non-constant group-delay is that broadband signals become dispersive within and between channels – that is, they are delayed in time by different amounts depending on frequency. This effect can have unknown consequences for auditory modeling, especially since the interaural time delay for a bat (0 to 40 µs) is one to two orders of magnitude lower than the group delay for a gammatone filter bank. Color is used to separate overlapping lines and corresponds to the center frequency of each filter channel. ertheless, signal processing models were developed to understand how animals might be achieving hyper-acuity in the range dimension. The Spectrogram Correlation and Transformation (SCAT) receiver [78, 80] is a biosonar model that mimics the echolocating bat’s hyper-resolution of a closely spaced pair of point scatterers. SCAT was the first known computational model that attempted to mimic bat echolocation based upon experimental evidence of neural information processing. SCAT has served as the basis for many later models of bat echolocation, and therefore requires a short description of how it functions. Figure 2.5 shows a block diagram of the monaural model, which includes a constant-Q filter bank to separate time series auditory input into multiple narrowband channels and convert time series waveforms into neural spikes. Following the cochlear filter bank 33 are two distinct spectrogram functions (correlation and transformation) that operate in parallel across all frequency channels. Figure 2.5. Block diagram of the Spectrogram Correlation and Transformation (SCAT) receiver model. Time series data enters the model through the cochlear filter bank, which consists of 2nd order Butterworth band-pass filters (hyperbolically spaced) followed by half-wave rectification, non- linear compression, and low-pass filtering (RCF) for each frequency channel. Neural spikes are produced at the output of each frequency channel to mimic information encoded by the auditory nerve fibers. 
The spectrogram correlation block produces a response with course echo resolution for detection. Once an echo is detected, the spectrogram transformation block is triggered to split this echo into multiple high-resolution echoes by a process of spectral deconvolution. The result is a hyper-resolution receiver that exceeds the resolution of a conventional cross-correlation receiver. The spectrogram correlation block takes the narrowband spike events and per- forms the neural equivalent to a parallel cross-correlation in time. When an echolo- cation pulse is emitted, it triggers a broadband onset response across all frequencies. Any echoes received will also produce a broadband onset response at the appropriate time delay. The coincidence of spikes across multiple channels indicates the reception of one or more target echoes. Due to the inherent time-delay in the FM signals, some narrowband frequency channels will spike earlier than others. This apparent incoherence (or time separation) across channels will match the incoherence between the outgoing pulse and any received echoes, thereby eliminating the need to de-chirp received signals. Although the detection of a single pulse-echo pair is sufficient to estimate target range, the SCAT receiver goes further to deconvolve the spectral information into hyper-resolution images. Closely spaced point targets will produce acoustic echoes that overlap in the time-frequency plane. When this occurs, deterministic interference patterns arise in the form of spectral notches. Each pair of echoes separated in time 1 by ∆T produces the first notch at f0 = 2∆T , and subsequent notches at intervals of 34 1 fj = fj−1 + ∆T for j = 1, 2, 3 . . . For signals with bandwidth between 20 and 100 kHz, these spectral notches occur for ∆T > 5µs = 1.7 mm until the echoes no longer overlap in the time-frequency plane. Unlike a traditional cross-correlation receiver, the spectrogram transformation block uses this additional spectral information to produce fine delay estimates. In the original SCAT model, the spectrogram transformation block is imple- mented as a “voting mechanism” with a set of cosine basis functions. Each frequency channel contains its own unique basis function with a period proportional to the center frequency of the filter. The amplitude of a basis function was scaled by the received echo level in each frequency channel. Despite its simplicity and lack of biological rel- evance, the summation across all channels produces impulses at the correct locations of two overlapping spikes. As pointed out by Peremans and Hallam [151], the SCAT model incorrectly estimates the times of two echoes having different amplitudes and produces artificial phantom echoes. Even with these nonlinearities, the SCAT model remains one of several models to date that can replicate bats’ hyper-resolution images of two-point targets. A recent review by Park and Allen [152] has likened the spectrogram transforma- tion process to a pattern recognition problem, where notches are actively detected and matched to corresponding time delays. This is in contrast to the original model that detects spectral energy and simply ignores the contributions from channels containing spectral notches. The cosine basis functions in the spectrogram transformation block produce many oscillatory peaks that can be incorrectly classified as point targets. Park and Allen proposed a method to suppress these unwanted peaks by predicting their locations and canceling them out. 
The goal of this process is comparable to the way interference cross-terms in a Wigner-Ville time-frequency distribution are smoothed [153]. Just as in Wigner-Ville smoothing, we sacrifice some resolution for reduced cross-term interference. Since SCAT was first published, other models have emerged that take on the 35 idea of spectral deconvolution for hyper-resolution range estimates. For example, Sanderson and Neretti used auditory filter bank models to address the question of biological relevance of the SCAT model [77, 76, 154]. By modifying the low-pass smoothing parameters at the RCF stage, they found that despite the low-temporal resolution of higher cortical areas in auditory system, there is indeed sufficient infor- mation across the time-frequency representation to register the interference patterns of two or more closely spaced echoes. Matsuo has applied Gaussian chirplet filter- banks [155] to the two-point resolution problem without relying upon an acoustic- to-neural transduction component [156, 157, 158]. More recently, Sharma and Buck proposed the variable resolution detection receiver (VRDR) without requiring filter banks [159, 160]. The VRDR model approaches the ideal impulse resolution of an inverse filter while maintaining a stable filter that can adapt to noise levels using a tuning parameter. Many of these modeling developments have focused on the prob- lem of achieving greater range resolution based on the hyper-resolution exemplified by echolocating bats. An equally intriguing problem is how echolocating animals are able to achieve hyper-acuity in angle. 2.3.3 Models for Angular Target Localization and Acoustic Imaging A binaural version of SCAT, named Artificial SCAT, was created to reconstruct two- dimensional images of simple objects in the range-azimuth plane [79]. The superior range resolution allowed two separate SCAT processes to be used to localize in az- imuth by comparing ITD. Echoes from wires and spheres were recorded using a pair of microphones and a loudspeaker. The stereo time series recordings were presented to the SCAT processing model one channel at a time and triangulation with intersecting ellipses generated the 2-dimensional images from each time series signal. Although implementation details were not published, some of the range-azimuth imaging re- sults were made available [80]. Other binaural sonar models that explicitly use ITD for angular imaging have appeared in the literature [158, 161, 162]. These models 36 take advantage of the large bandwidth that yields improved range resolution, but additional spectral information is useful to improve azimuthal performance and is absolutely necessary for localization in elevation. Only recently have models begun to include spectral cues in the source localization process, including azimuth and el- evation [11, 163, 93], but many of these models abandon the filter bank approach in favor of more traditional signal processing tools. 2.3.4 Mathematical Models of Echolocation Performance Taking a systems of systems approach to biosonar modeling and not concerning our- selves with the complexities of the brain can prove useful. There have been several interesting mathematical models published that aim to provide an explanation of echolocation performance by animals. 
In one of the earliest (and possibly most il- luminating) mathematical studies on a binaural sonar system, Altes calculated the Cramer-Rao lower bound (CRLB) for azimuth and elevation, and derived the max- imum likelihood estimator based on these results [91]. This analytical model found that azimuth localization accuracy is not only a function of ITD and SNR, but also of the gradient (i.e. sensitivity) of the magnitude and phase of broadband beam patterns versus angle. Since this work was ahead of its time, it did not include a numerical analysis with any measured biosonar beam patterns that have become available. Although the spectral effects for both, transmit and receive beam patterns were considered, none of the frequency-dependent effects in signal propagation were included. This particular study was limited to the accuracy of angular localization rather than resolution, which is required for acoustic imaging in densely cluttered environments. Altes does briefly comment on the subject of resolution, “Accurate unambiguous azimuth resolution can be obtained with only two transducers, even if the beam patterns of the transducers are very broad. It is only necessary to utilize a wide-band signal with an autocorrelation width that is narrow relative to the distance between transducers.” 37 With advances in computed-tomography and computational power, finite-element methods were pioneered to estimate the complex spectral properties of HRTFs [164]. With these new techniques, high-resolution HRTF models of bats’ pinnae and nose- leaves can be quickly assembled into libraries [47]. The HRTF libraries can be used for high-fidelity acoustic simulations, or quantifying the spectral information by the CRLB [165] or information theory [166, 167]. The information theoretic approach has also been used to evaluate performance of bio-inspired processing with conventional transducers [168]. 2.3.5 Hardware Prototypes as Exploratory Models As stated previously, modeling can lead to many insights into a problem if done properly. Unfortunately, models may also mask the true phenomenon of interest. In this vein, taking real acoustic measurements and constructing biomimetic prototype systems are necessary to test and verify models in the real world. Hardware prototypes are also the first step toward creating autonomous biomimetic sensors that can operate in real-time3 . Over the past 15 years, biomimetic sonar models have appeared on integrated circuits [169, 170, 171]. All-digital field-programmable gate arrays (FPGA) are ap- pealing for the real-time implementation of auditory filter banks, because of the sheer number of parallel computations required [172]. Unfortunately, neural information processing on digital hardware is computationally expensive and makes inefficient use of resources. This is the primary reason that very-large scale integrated (VLSI) analog circuits have appeared for various bio-inspired computations (e.g. echo ranging with delay lines [173, 174, 175], azimuthal localization using IID cues [176, 177, 178, 179], binaural comparison of spectral cues [180], and spike-based neural information pro- cessing [181, 182]). 3 Real-time has many interpretations that depend on the context. For a biosonar signal processor, real- time should be defined as having sufficient data throughput such that a bottleneck is never reached and latency that allows adequate response time to real-world events. 
38 Various bio-inspired robotic sonar systems have been developed, which can be grouped by the basic set of information used for localization. Kuc used ITD with a simple pair of circular aperture receive transducers to localize and classify objects in realistic environments [183, 184]. Although only ITD was used for localization, the transducers were oriented off-axis so that a comparison between the broadband time-based signals could be used to perform classification. Schillebeeckx and Pere- mans have applied Bayesian probabilistic techniques [185] and maximum likelihood estimation (MLE) [186] to the localization problem from binaural HRTF. Using the spectrum of an emitted sound in a different manner, Guarato et al. showed that es- timating source orientation is possible [187]. Combining the concept of sparse arrays and bio-inspired processing, Steckel and Peremans used bandwidth to average out grating lobes over multiple frequency octaves [188, 189, 190]. A model and hardware processor was also created for simultaneous localization and mapping for guidance and control of a robot [191]. Each hardware prototype has individual merit, but together they demonstrate the clear advantages of biosonar acoustic imaging. References [1] W. Gerstner, R. Kempter, J. Van Hemmen, and H. Wagner, “A neuronal learn- ing rule for sub-millisecond temporal coding”, Nature 383, 76–78 (1996). [2] W. Au and J. Simmons, “Echolocation in dolphins and bats”, Phys. Today 60, 40–45 (2007). [3] C. J. Sumner, R. Meddis, and I. M. Winter, “The role of auditory nerve inner- vation and dendritic filtering in shaping onset responses in the ventral cochlear nucleus”, Brain Res. 1247, 221–234 (2009). [4] E. Covey and J. H. Casseday, “The lower brainstem auditory pathways”, in Hearing by bats, 235–295 (Springer, New York, NY) (1995). [5] N. Suga, E. Gao, Y. Zhang, and X. Ma, “The corticofugal system for hearing: Recent progress”, Proc. Natl. Acad. Sci. U.S.A. 97, 11807–11814 (2000). [6] E. Covey, “Neurobiological specializations in echolocating bats”, Anat. Rec. Part A 287, 1103–1116 (2005). 39 [7] E. Covey and J. Casseday, “Timing in the auditory system of the bat”, Annu. Rev. Physiol. 61, 457–476 (1999). [8] J. Casseday, “The monaural nuclei of the lateral lemniscus in an echolocating bat: Parallel pathways for analyzing temporal features of sound”, J. Neurosci. 11, 3456–3470 (1991). [9] R. Meddis, “Simulation of auditory-neural transduction: Further studies”, J. Acoust. Soc. Am. 83, 1056–1063 (1988). [10] R. Meddis, “Simulation of mechanical to neural transduction in the auditory receptor”, J. Acoust. Soc. Am. 79, 702–711 (1986). [11] B. Fontaine and H. Peremans, “Bat echolocation processing using first-spike latency coding”, Neural Networks 22, 1372–1382 (2009). [12] P. Heil, H. Neubauer, M. Brown, and D. Irvine, “Towards a unifying basis of auditory thresholds: Distributions of the first-spike latencies of auditory-nerve fibers”, Hearing Res. 238, 25–38 (2008). [13] P. Heil, H. Neubauer, D. Irvine, and M. Brown, “Spontaneous activity of auditory-nerve fibers: Insights into stochastic processes at ribbon synapses”, J. Neurosci. 27, 8457–8474 (2007). [14] P. Heil, “First-spike latency of auditory neurons revisited”, Curr. Opin. Neuro- biol. 14, 461–467 (2004). [15] R. Meddis, “Auditory-nerve first-spike latency and auditory absolute threshold: A computer model”, J. Acoust. Soc. Am. 119, 406–417 (2006). [16] P. Heil and D. Irvine, “First-spike timing of auditory-nerve fibers and compar- ison with auditory cortex”, J. 
Neurophysiol. 78, 2438–2454 (1997). [17] “Computational Models of the Auditory System”, Springer, New York (2010). [18] A. R. Moller, Hearing, Anatomy, Physiology, and Disorders of the Auditory System, 2nd edition (Academic Press, Burlington, MA) (2006). [19] N. S. Harper and D. McAlpine, “Optimal neural population coding of an audi- tory spatial cue”, Nature 430, 682–686 (2004). [20] T. Cover and J. Thomas, Elements of Information Theory, Wiley Series in Telecommunications and Signal Processing, 2nd edition (Wiley-Interscience, Hoboken, NJ) (2006). [21] D. Oertel, “The role of timing in the brain stem auditory nuclei of vertebrates”, Annu. Rev. Physiol. 61, 497–519 (1999). [22] D. Oertel and E. Young, “What’s a cerebellar circuit doing in the auditory system?”, Trends Neurosci. 27, 104–110 (2004). 40 [23] D. Oertel, S. Wright, X. Cao, and M. Ferragamo, “The multiple functions of T stellate/multipolar/chopper cells in the ventral cochlear nucleus”, Hearing Res. 276, 61–69 (2011). [24] P. H. S. Jen, “Adaptive mechanisms underlying the bat biosonar behavior”, Front. Biol. 5, 128–155 (2010). [25] M. Abeles, G. Hayon, and D. Lehmann, “Modeling compositionality by dynamic binding of synfire chains.”, J Comput. Neurosci 17, 179–201 (2004). [26] P. Dayan and L. Abbott, Theoretical Neuroscience: Computational and Math- ematical Modeling of Neural Systems (MIT Press, Cambridge, MA) (2001). [27] J. C. R. Licklider, “A duplex theory of pitch perception”, Experientia 7, 128– 134 (1951). [28] S. Shamma, “On the role of space and time in auditory processing”, Trends Cogn. Sci. 5, 340–348 (2001). [29] P. Joris, P. Smith, and T. Yin, “Coincidence detection minireview in the audi- tory system: 50 years after Jeffress”, Neuron 21, 1235–1238 (1998). [30] S. Dear and N. Suga, “Delay-tuned neurons in the midbrain of the big brown bat”, J. Neurophysiol. 73, 1084–1100 (1995). [31] J. F. Olsen and N. Suga, “Combination-sensitive neurons in the medial genic- ulate body of the mustached bat: encoding of target range information.”, J. Neurophysiol. 65, 1275–1296 (1991). [32] J. A. Simmons and J. E. Gaudette, “Biosonar echo processing by frequency- modulated bats”, IET Radar Sonar Navig. 6, 556–565 (2012). [33] N. Tishby, F. Pereira, and W. Bialek, “The information bottleneck method”, Arxiv Preprint Physics 1–16 (2000). [34] L. Buesing and W. Maass, “A spiking neuron as information bottleneck”, Neural Comput. 22, 1961–1992 (2010). [35] D. Johnson, “Information Theory and Neural Information Processing”, IEEE Trans. Inf. Theory 56, 653–666 (2010). [36] T. Lu and X. Wang, “Information content of auditory cortical responses to time-varying acoustic stimuli”, J. Neurophysiol. 91, 301 (2004). [37] W. Bialek, F. Rieke, R. R. de Ruyter van Steveninck, and D. Warland, “Reading a neural code.”, Science 252, 1854–1857 (1991). [38] E. M. Izhikevich, “Polychronization: Computation with spikes”, Neural Com- put. 18, 245–282 (2006). [39] E. M. Izhikevich, Dynamical systems in neuroscience, the geometry of excitabil- ity and bursting (MIT Press, Cambridge, MA) (2007). 41 [40] P. Chadderton, J. P. Agapiou, D. Mcalpine, and T. W. Margrie, “The Synaptic Representation of Sound Source Location in Auditory Cortex”, J. Neurosci. 29, 14127–14135 (2009). [41] F. L. Wightman and D. J. Kistler, “Monaural sound localization revisited.”, J. Acoust. Soc. Am. 101, 1050–1063 (1997). [42] R. A. Butler and R. A. Humanski, “Localization of sound in the vertical plane with and without high-frequency spectral cues.”, Percept. Psychophys. 
51, 182– 186 (1992). [43] R. A. Butler, R. A. Humanski, and A. D. Musicant, “Binaural and monaural localization of sound in two-dimensional space”, Perception 19, 241–256 (1990). [44] H. Neubauer and P. Heil, “A physiological model for the stimulus dependence of first-spike latency of auditory-nerve fibers”, Brain Res. 1220, 208–223 (2008). [45] B. J. Fischer, L. J. Steinberg, B. Fontaine, R. Brette, and J. L. Pe˜ na, “Effect of instantaneous frequency glides on interaural time difference processing by auditory coincidence detectors”, Proc. Natl. Acad. Sci. U.S.A. 108, 18138– 18143 (2011). [46] A. Brand, O. Behrend, T. Marquardt, D. Mcalpine, and B. Grothe, “Precise inhibition is essential for microsecond interaural time difference coding”, Nature 417, 543–547 (2002). [47] J. Ma and R. M¨ uller, “A method for characterizing the biodiversity in bat pin- nae as a basis for engineering analysis”, Bioinspiration Biomimetics 6, 026008 (2011). [48] N. H. Fletcher and S. Thwaites, “Obliquely truncated simple horns: Idealized models for vertebrate pinnae”, Acustica 65, 194–204 (1988). [49] E. Lopez-Poveda, “Spectral processing by the peripheral auditory system: Facts and models”, Int. Rev. Neurobiol. 70, 7–48 (2005). [50] R. M¨uller, “A numerical study of the role of the tragus in the big brown bat”, J. Acoust. Soc. Am. 116, 3701–3712 (2004). [51] M. Aytekin, E. Grassi, M. Sahota, and C. Moss, “The bat head-related transfer function reveals binaural cues for sound localization in azimuth and elevation”, J. Acoust. Soc. Am. 116, 3594–3605 (2004). [52] J. A. Simmons and A. Megela Simmons, “Bats and frogs and animals in be- tween: Evidence for a common central timing mechanism to extract periodicity pitch”, J. Comp. Physiol. A 197, 585–594 (2010). [53] D. Griffin, Listening in the Dark, The Acoustic Orientation of Bats and Men (Cornell University Press, London) (1958). 42 [54] N. Veselka, D. D. Mcerlain, D. W. Holdsworth, J. L. Eger, R. K. Chhem, M. J. Mason, K. L. Brain, P. A. Faure, and M. B. Fenton, “A bony connection signals laryngeal echolocation in bats”, Nature 463, 939–942 (2010). [55] G. Neuweiler, The Biology of Bats (Oxford University Press, New York, NY) (2000). [56] T. W. Cranford, M. Amundin, and K. S. Norris, “Functional morphology and homology in the odontocete nasal complex: Implications for sound generation”, J. Morphol. 228, 223–285 (1996). [57] J. L. Aroyan, “Three-dimensional numerical simulation of biosonar signal emis- sion and reception in the common dolphin”, Ph.D. thesis, University of Califor- nia at Santa Cruz, Santa Cruz, CA (1996). [58] T. W. Cranford, P. Krysl, and J. A. Hildebrand, “Acoustic pathways revealed: Simulated sound transmission and reception in Cuvier’s beaked whale (Ziphius cavirostris)”, Bioinspiration Biomimetics 3, 016001 (2008). [59] W. E. Evans, “Echolocation by marine delphinids and one species of fresh-water dolphin”, J. Acoust. Soc. Am. 54, 191–199 (1973). [60] Y. Yovel, B. Falk, C. F. Moss, and N. Ulanovsky, “Optimal localization by pointing off axis”, Science 327, 701–704 (2010). [61] Q. Zhuang and R. M¨ uller, “Noseleaf furrows in a horseshoe bat act as resonance cavities shaping the biosonar beam”, Phys. Rev. Lett. 97, 218701 (2006). [62] D. Vanderelst, F. De Mey, H. Peremans, I. Geipel, E. Kalko, and U. Firzlaff, “What noseleaves do for FM bats depends on their degree of sensorial special- ization”, PLoS ONE 5, e11893 (2010). [63] A. Surlykke and C. F. 
Moss, “Echolocation behavior of big brown bats, Eptesi- cus fuscus, in the field and the laboratory”, J. Acoust. Soc. Am. 108, 2419–2429 (2000). [64] R. Altes and E. Titlebaum, “Bat signals as optimally Doppler tolerant wave- forms”, J. Acoust. Soc. Am. 48, 1014–1020 (1970). [65] R. Altes, “Ubiquity of hyperacuity”, J. Acoust. Soc. Am. 85, 943–952 (1989). [66] F. C. Fraser and P. E. Purves, “Hearing in cetaceans”, Bulletin of the British Museum (Natural History) (1954). [67] F. C. Fraser and P. E. Purves, “Hearing in cetaceans: Evolution of the accessory air sacs and the structure and function of the outer and middle ear in recent cetaceans”, Bulletin of the British Museum (Natural History) (1960). [68] K. S. Norris, “Some problems of echolocation in cetaceans”, in Marine bioa- coustics, edited by W. N. Tavolga, 316–336 (Pergamon Press, New York, NY) (1964). 43 [69] K. S. Norris, “The evolution of acoustic mechanisms in odontocete cetaceans”, in Evolution and environment, edited by E. T. Drake, 297–324 (Yale University Press, New Haven, CT) (1968). [70] K. S. Norris, “The echolocation of marine mammals”, in The biology of marine mammals, edited by H. T. Anderson, 391–423 (Academic Press, New York, NY) (1969). [71] R. L. Brill, M. L. Sevenich, T. J. Sullivan, J. D. Sustman, and R. E. Witt, “Be- havioral evidence for hearing through the lower jaw by an echolocating dolphin (Tursiops truncatus)”, Marine Mammal Science 4, 223–230 (1988). [72] A. Rihaczek, Principles of High-Resolution Radar (Artech House, Norwood, MA) (1996). [73] M. I. Skolnik, Introduction to Radar Systems, 3rd edition (McGraw-Hill, Boston, MA) (2001). [74] J. A. Simmons, “The resolution of target range by echolocating bats”, J. Acoust. Soc. Am. 54, 157–173 (1973). [75] D. Menne and H. Hackbarth, “Accuracy of distance measurement in the bat Eptesicus fuscus: Theoretical aspects and computer simulations”, J. Acoust. Soc. Am. 79, 386–397 (1986). [76] M. I. Sanderson, N. Neretti, N. Intrator, and J. A. Simmons, “Evaluation of an auditory model for echo delay accuracy in wideband biosonar”, J. Acoust. Soc. Am. 114, 1648–1659 (2003). [77] N. Neretti, M. Sanderson, N. Intrator, and J. Simmons, “Time-frequency model for echo-delay resolution in wideband biosonar”, J. Acoust. Soc. Am. 113, 2137– 2147 (2003). [78] P. Saillant, J. Simmons, S. Dear, and T. McMullen, “A computational model of echo processing and acoustic imaging in frequency-modulated echolocating bats: The spectrogram correlation and transformation receiver”, J. Acoust. Soc. Am. 94, 2691–2712 (1993). [79] J. Simmons, P. Saillant, and S. Boatright, “Biologically inspired SCAT sonar receiver for 2-D imaging”, J. Acoust. Soc. Am. 102, 3153 (1997). [80] P. A. Saillant, “Neural Computations for Biosonar Imaging in the Big Brown Bat”, Ph.D. thesis, Brown University, Providence, RI (1995). [81] L. N. Kloepper, P. E. Nachtigall, M. J. Donahue, and M. Breese, “Active echolo- cation beam focusing in the false killer whale, Pseudorca crassidens”, J. Exp. Biol. 215, 1306–1312 (2012). [82] L. N. Kloepper, P. E. Nachtigall, C. Quintos, and S. A. Vlachos, “Single-lobed frequency-dependent beam shape in an echolocating false killer whale (Pseu- dorca crassidens)”, J. Acoust. Soc. Am. 131, 577–581 (2012). 44 [83] J. Simmons, C. Moss, and M. Ferragamo, “Convergence of temporal and spec- tral information into acoustic images of complex sonar targets perceived by the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 166, 449–470 (1990). [84] M. Sanderson and J. 
Simmons, “Neural responses to overlapping FM sounds in the inferior colliculus of echolocating bats”, J. Neurophysiol. 83, 1840–1855 (2000). [85] M. Sanderson and J. Simmons, “Selectivity for echo spectral interference and delay in the auditory cortex of the big brown bat Eptesicus fuscus”, J. Neuro- physiol. 87, 2823–2834 (2002). [86] B. K. Branstetter, S. J. Mevissen, L. M. Herman, A. Pack, and S. P. Roberts, “Horizontal angular discrimination by an echolocating bottlenose dolphin tur- siops truncatus”, Bioacoustics 14, 15–34 (2003). [87] J. Wotton and J. Simmons, “Spectral cues and perception of the vertical po- sition of targets by the big brown bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 107, 1034–1041 (2000). [88] J. Wotton, T. Haresign, M. Ferragamo, and J. Simmons, “Sound source ele- vation and external ear cues influence the discrimination of spectral notches by the big brown bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 100, 1764–1776 (1996). [89] Z. M. Fuzessery, “Monaural and binaural spectral cues created by the external ears of the pallid bat”, Hearing Res. 95, 1–17 (1996). [90] W. M. Masters, A. J. Moffat, and J. A. Simmons, “Sonar tracking of horizontally moving targets by the big brown bat Eptesicus fuscus”, Science 228, 1331–1333 (1985). [91] R. Altes, “Angle estimation and binaural processing in animal echolocation”, J. Acoust. Soc. Am. 63, 155–173 (1978). [92] R. M¨ uller, “Numerical analysis of biosonar beamforming mechanisms and strategies in bats”, J. Acoust. Soc. Am. 128, 1414–1425 (2010). [93] J. Reijniers and H. Peremans, “Biomimetic sonar system performing spectrum- based localization”, IEEE Trans. Robot. 23, 1151–1159 (2007). [94] B. K. Branstetter and E. Mercado, III, “Sound Localization by Cetaceans”, International Journal of Comparative Psychology 19, 26–61 (2006). [95] S. S¨ umer, A. Denzinger, and H.-U. Schnitzler, “Spatial unmasking in the echolo- cating Big Brown Bat, Eptesicus fuscus”, J. Comp. Physiol. A 195, 463–472 (2009). [96] J. A. Simmons, S. A. Kick, B. D. Lawrence, C. Hale, C. Bard, and B. Escudie, “Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 153, 321–330 (1983). 45 [97] M. E. Bates, S. A. Stamper, and J. A. Simmons, “Jamming avoidance response of big brown bats in target detection”, J. Exp. Biol. 211, 106–113 (2008). [98] M. Warnecke, M. E. Bates, V. Flores, and J. A. Simmons, “Spatial release from simultaneous echo masking in bat sonar”, J. Acoust. Soc. Am. 135, 1–9 (2014). [99] M. E. Bates, J. A. Simmons, and T. V. Zorikov, “Bats use echo harmonic structure to distinguish their targets from background clutter”, Science 333, 627–630 (2011). [100] R. M. Pope and E. S. Fry, “Absorption spectrum (380-700 nm) of pure water. II. Integrating cavity measurements”, Applied optics 36, 8710–8723 (1997). [101] G. E. Becker and S. H. Autler, “Water vapor absorption of electromagnetic radiation in the centimeter wave-length range”, Physical Review 70, 300–307 (1946). [102] X. Lurton, An Introduction to Underwater Acoustics, Principles and Applica- tions (Springer, New York) (2002). [103] H. S. Maxim, A New System for Preventing Collisions at Sea (Cassell and Company, London) (1912). [104] R. Urick, Principles of Underwater Sound, 3rd edition (Pennsylvania Publica- tions, Los Altos, CA) (1983). [105] W. Burdic, Underwater Acoustic System Analysis, 2nd edition (Pennsylvania Publications, Los Altos, CA) (2003). [106] B. Maranda, “Efficient digital beamforming in the frequency domain”, J. Acoust. Soc. 
Am. 86, 1813–1819 (1989). [107] M. Bono, B. Shapo, P. McCarty, and R. Bethel, “Subband energy detection in passive array processing”, Technical Report ADA405484, Univ. of Texas at Austin. Applied Research Labs., Austin, TX (2000). [108] V. Valimaki and T. Laakso, “Principles of fractional delay filters”, in Proc. IEEE ICASSP ’00, 3870–3873 (2000). [109] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques (Prentice Hall PTR, Upper Saddle River, NJ) (1993). [110] D. Abraham, “Short Course on Array Signal Processing for Sonar”, in 166th Meeting of the Acoustical Society of America (San Francisco, CA) (2013). [111] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Pro- cessing, 2nd edition (Prentice Hall PTR, Englewood Cliffs, NJ) (1999). [112] P. P. Vaidyanathan and P. Pal, “Sparse coprime sensing with multidimensional lattice arrays”, Digital Signal Processing Workshop IEEE 425–430 (2011). [113] M. J. Hinich, “Processing spatially aliased arrays”, J. Acoust. Soc. Am. 64, 792–794 (1978). 46 [114] K. Drakakis, “A review of Costas arrays”, J. Appl. Math. 2006, 1–32 (2006). [115] J. Costas, “A study of a class of detection waveforms having nearly ideal range— Doppler ambiguity properties”, Proc. IEEE 72, 996–1009 (1984). [116] M. P. Hayes and P. T. Gough, “Synthetic aperture sonar: A review of current status”, IEEE J. Ocean. Eng. 34, 207–224 (2009). [117] A. Bellettini and M. A. Pinto, “Theoretical accuracy of synthetic aperture sonar micronavigation using a displaced phase-center antenna”, IEEE J. Ocean. Eng. 27, 780–789 (2002). [118] M. Pinto, “Use of frequency and transmitter location diversities for ambiguity suppression in synthetic aperture sonar systems”, in OCEANS ’97. MTS/IEEE Proc., 363–368 (1997). [119] K. F. Nieman, K. A. Perrine, T. L. Henderson, K. H. Lent, T. J. Brudner, and B. L. Evans, “Wideband monopulse spatial filtering for large receiver arrays for reverberant underwater communication channels”, in Proc. IEEE OCEANS 2010 MTE, 1–8 (IEEE) (2010). [120] E. Mosca, “Angle estimation in amplitude comparison monopulse systems”, IEEE Trans. Aerosp. Electron. Syst. AES-5, 205–212 (1969). [121] G. Llort-Pujol, C. Sintes, and D. Gueriot, “Analysis of Vernier interferometers for sonar bathymetry”, in Proc. IEEE OCEANS ’08, 1–5 (IEEE) (2008). [122] G. Llort-Pujol, C. Sintes, and X. Lurton, “A new approach for fast and high- resolution interferometric bathymetry”, in Proc IEEE OCEANS ’06, 1–7 (2006). [123] R. G. Lorenz and S. P. Boyd, “Robust minimum variance beamforming”, IEEE Trans. Signal Process. 53, 1684–1696 (2005). [124] J. Capon, “High-resolution frequency-wavenumber spectrum analysis”, Proc. IEEE 57, 1408–1418 (1969). [125] J. W. Odendaal, E. Barnard, and C. W. I. Pistorius, “Two-dimensional super- resolution radar imaging using the MUSIC algorithm”, IEEE Trans. Antennas Propagat. 42, 1386–1391 (1994). [126] R. Schmidt, “Multiple emitter location and signal parameter estimation”, IEEE Trans. Antennas Propagat. 34, 276–280 (1986). [127] A. Baggeroer and W. Kuperman, “An overview of matched field methods in ocean acoustics”, IEEE J. Ocean. Eng. 18, 401–424 (1993). [128] A. Baggeroer, W. Kuperman, and H. Schmidt, “Matched field processing: Source localization in correlated noise as an optimum parameter estimation problem”, J. Acoust. Soc. Am. 83, 571–587 (1988). [129] R. L. Thompson, J. Seawall, and T. Josserand, “Two dimensional and three dimensional imaging results using blazed arrays”, in Proc. IEEE OCEANS ’01, 985–988 (2001). 
47 [130] R. L. Thompson and W. J. Zehner, “Frequency-steered acoustic beam forming system and process”, US Patent Office 5,923,617 (1999). [131] M. Hiipakka, T. Kinnari, and V. Pulkki, “Estimating head-related transfer functions of human subjects from pressure–velocity measurements”, J. Acoust. Soc. Am. 131, 4051–4061 (2012). [132] V. A. Gordienko, V. I. Il’ichev, and L. N. Zakharov, Vector-phase methods in acoustics (George Washington University, Seattle, WA) (1989). [133] D. M. Donskoy and B. A. Cray, “Acoustic particle velocity horns”, J. Acoust. Soc. Am. 131, 3883 (2012). [134] A. Nehorai and E. Paldi, “Acoustic vector-sensor array processing”, IEEE Trans. Signal Process. 42, 2481–2491 (1994). [135] J. V. Candy, Model-Based Signal Processing (John Wiley & Sons, Hoboken, NJ) (2005). [136] L. B. Jackson, Digital Filters and Signal Processing with MATLAB Exercises, 3rd edition (Klewer Academic Publishers, Norwell, MA) (1995). [137] S. S. Haykin, Adaptive Filter Theory, 5th edition (Prentice Hall, Upper Saddle River, NJ) (2013). [138] D. Simon, Optimal State Estimation, Kalman, H Infinity, and Nonlinear Ap- proaches (John Wiley & Sons, Hoboken, NJ) (2006). [139] R. Van der Merwe and E. Wan, “The square-root unscented Kalman filter for state and parameter-estimation”, in IEEE ICASSP ’01 Proc., 3461–3464 vol.6 (2001). [140] D. Gamerman and H. F. Lopes, Markov Chain Monte Carlo, Stochastic Sim- ulation for Bayesian Inference, Second Edition, 2nd edition (CRC Press, Boca Raton, FL) (2006). [141] I. Steinwart and A. Christmann, Support Vector Machines (Springer, New York) (2008). [142] S. S. Haykin, Neural Networks and Learning Machines (Prentice Hall, Upper Saddle River, NJ) (2009). [143] T. Irino and R. Patterson, “A time-domain, level-dependent auditory filter: The gammachirp”, J. Acoust. Soc. Am. 101, 412–419 (1997). [144] C. Sumner, L. O’Mard, E. Lopez-Poveda, and R. Meddis, “A nonlinear filter- bank model of the guinea-pig cochlear nerve: Rate responses”, J. Acoust. Soc. Am. 113, 3264–3274 (2003). [145] E. Covey and J. H. Casseday, “Connectional basis for frequency representation in the nuclei of the lateral lemniscus of the bat Eptesicus fuscus”, J. Neurosci. (1986). 48 [146] J. A. Simmons, M. B. Fenton, and M. J. O’Farrel, “Echolocation and pursuit of prey by bats”, Science 203, 16–21 (1979). [147] Ferragamo, M. Sanderson, and J. Simmons, “Phase sensitivity of auditory brain- stem responses in echolocating big brown bats”, J. Acoust. Soc. Am. 112, 2288 (2002). [148] J. A. Simmons, M. Ferragamo, C. F. Moss, S. B. Stevenson, and R. A. Altes, “Discrimination of jittered sonar echoes by the echolocating bat, Eptesicus fus- cus: The shape of target images in echolocation”, J. Comp. Physiol. A 167, 589–616 (1990). [149] R. Roverud, “Complex sound analysis in the lesser bulldog bat: Evidence for a mechanism for processing frequency elements of frequency modulated signals over restricted time intervals”, J. Comp. Physiol. A 174, 559–565 (1994). [150] M. Ferragamo, T. Haresign, and J. Simmons, “Frequency tuning, latencies, and responses to frequency-modulated sweeps in the inferior colliculus of the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 182, 65–79 (1997). [151] H. Peremans and J. Hallam, “The spectrogram correlation and transformation receiver, revisited”, J. Acoust. Soc. Am. 104, 1101–1110 (1998). [152] M. Park and R. Allen, “Pattern-matching analysis of fine echo delays by the spectrogram correlation and transformation receiver”, J. Acoust. Soc. Am. 128, 1490–1500 (2010). 
[153] W. Martin and P. Flandrin, “Wigner-Ville spectral analysis of nonstationary processes”, IEEE Trans. Acoust., Speech, Signal Process. 33, 1461–1470 (1985). [154] M. I. Sanderson, “The representation of temporal and spectral information cor- responding to target range in the auditory system of the big brown bat”, Ph.D. thesis, Brown University, Providence, RI (2002). [155] S. Mann and S. S. Haykin, “The chirplet transform: physical considerations”, IEEE Trans. Signal Process. 43, 2745–2761 (1995). [156] I. Matsuo, K. Kunugiyama, and M. Yano, “An echolocation model for range dis- crimination of multiple closely spaced objects: Transformation of spectrogram into the reflected intensity distribution”, J. Acoust. Soc. Am. 115, 920–928 (2004). [157] I. Matsuo and M. Yano, “An echolocation model for the restoration of an acous- tic image from a single-emission echo”, J. Acoust. Soc. Am. 116, 3782–3788 (2004). [158] I. Matsuo, J. Tani, and M. Yano, “A model of echolocation of multiple targets in 3D space from a single emission”, J. Acoust. Soc. Am. 110, 607–624 (2001). [159] N. S. Sharma, J. R. Buck, and J. A. Simmons, “Trading detection for resolution in active sonar receivers”, J. Acoust. Soc. Am. 130, 1272 (2011). 49 [160] N. S. Sharma and J. Buck, “A generalized linear filter approach for sonar re- ceivers”, in IEEE DSP/SPE 2009, 507–512 (2009). [161] I. Matsuo, “Localization and tracking of moving objects in two-dimensional space by echolocation”, J. Acoust. Soc. Am. 133, 1151–1157 (2013). [162] S. E. Forsythe, H. A. Leinhos, and P. R. Bandyopadhyay, “Dolphin-inspired combined maneuvering and pinging for short-distance echolocation”, J. Acoust. Soc. Am. 124, EL255–EL261 (2008). [163] L. Wiegrebe, “An autocorrelation model of bat sonar”, Biol. Cybern. 98, 587– 595 (2008). [164] R. M¨uller and J. C. T. Hallam, “Knowledge mining for biomimetic smart an- tenna shapes”, Rob. Autom. Syst. 50, 131–145 (2005). [165] R. M¨uller, H. Lu, and J. Buck, “Sound-diffracting flap in the ear of a bat generates spatial information”, Phys. Rev. Lett. 100, 108701 (2008). [166] D. Vanderelst, J. Reijniers, J. Steckel, and H. Peremans, “Information gener- ated by the moving pinnae of Rhinolophus rouxi : Tuning of the morphology at different harmonics”, PLoS ONE 6, e20627 (2011). [167] J. Reijniers, D. Vanderelst, and H. Peremans, “Morphology-induced information transfer in bat sonar”, Phys. Rev. Lett. 105, 148701 (2010). [168] D. Vanderelst, J. Reijniers, F. Schillebeeckx, and H. Peremans, “Evaluat- ing three-dimensional localisation information generated by bio-inspired in-air sonar”, IET Radar Sonar Navig. 6, 516–525 (2012). [169] T. Horiuchi, “A systems view of a neuromorphic VLSI echolocation system”, IEEE ISCAS 2008 (2007). [170] T. Horiuchi, “Seeing in the dark: Neuromorphic VLSI modeling of bat echolo- cation”, IEEE Signal Process. Mag. 22, 134–139 (2005). [171] G. Cauwenberghs, R. Edwards, Y. Deng, R. Genov, and D. Lemonds, “Neuro- morphic processor for real-time biosonar object detection”, IEEE ICASSP ’02 Proc. 4, 3984–3987 (2001). [172] C. Clarke and L. Qiang, “Bat on an FPGA: A biomimetic implementation of a highly parallel signal processing system”, in Proc. IEEE ACSSC ’04, 456–460 (2004). [173] T. Horiuchi, “A spike-latency model for sonar-based navigation in obstacle fields”, IEEE Trans. Circuits Syst. I, Reg. Papers 56, 2393–2401 (2009). [174] T. Horiuchi, “A neural model for sonar-based navigation in obstacle fields”, IEEE ISCAS 2008 605–608 (2006). [175] M. Cheely and T. 
Horiuchi, “A VLSI model of range-tuned neurons in the bat echolocation system”, IEEE ISCAS 2003 4, 872–875 (2003). 50 [176] T. Horiuchi, “A neuromorphic VLSI model of bat interaural level difference pro- cessing for azimuthal echolocation”, IEEE Trans. Circuits Syst. I, Reg. Papers 54, 74–88 (2007). [177] T. Horiuchi, “A VLSI model of the bat dorsal nucleus of the lateral lemniscus for azimuthal echolocation”, IEEE ISCAS 2005 5, 4217–4220 (2005). [178] R. Z. Shi and T. K. Horiuchi, “A VLSI model of the bat lateral superior olive for azimuthal echolocation”, in IEEE ISCAS ’04, 900–903 (2004). [179] T. Horiuchi, “Spike-based VLSI modeling of the ILD system in the echolocating bat”, Neural Networks (2001). [180] T. Horiuchi, “Binaural spectral cues for ultrasonic localization”, IEEE ISCAS 2008 2110–2113 (2008). [181] H. Abdalla and T. K. Horiuchi, “Spike-based acoustic signal processing chips for detection and localization”, in 2008 IEEE Biomedical Circuits and Systems Conference, 225–228 (IEEE) (2008). [182] T. Horiuchi, “An ultrasonic filterbank with spiking neurons”, IEEE ISCAS 2008 (2005). [183] R. Kuc, “Biomimetic sonar and neuromorphic processing eliminate reverbera- tion artifacts”, IEEE Sensors J. 7, 361–369 (2007). [184] R. Kuc, “Biomimetic sonar locates and recognizes objects”, J. Ocean. Eng., IEEE 22, 616–624 (1997). [185] F. Schillebeeckx, J. Reijniers, and H. Peremans, “Probabilistic spectrum based azimuth estimation with a binaural robotic bat head”, in 2008 Fourth Inter- national Conference on Autonomic and Autonomous Systems (ICAS), 142–147 (IEEE) (2008). [186] F. Schillebeeckx and H. Peremans, “Biomimetic sonar: 3D-localization of mul- tiple reflectors”, in IEEE/RSJ International Conference on Intelligent Robots and Systems, 3079–3084 (2010). [187] F. Guarato, L. Jakobsen, D. Vanderelst, A. Surlykke, and J. Hallam, “A method for estimating the orientation of a directional sound source from source direc- tivity and multi-microphone recordings: Principles and application”, J. Acoust. Soc. Am. 129, 1046–1058 (2011). [188] J. Steckel, A. Boen, and H. Peremans, “Broadband 3-D sonar system using a sparse array for indoor navigation”, IEEE Trans. Robot. 29, 161–171 (2013). [189] J. Steckel and H. Peremans, “A novel biomimetic sonarhead using beamform- ing technology to mimic bat echolocation”, IEEE Tran. Ultrason., Ferroelectr., Freq. Control 59, 1369–1377 (2012). [190] J. Steckel, F. Schillebeeckx, and H. Peremans, “Biomimetic sonar, outer ears versus arrays”, in Sensors, 2011 IEEE, 821–824 (2011). 51 [191] J. Steckel and H. Peremans, “BatSLAM: Simultaneous Localization and Map- ping Using Biomimetic Sonar”, PLoS ONE 8, e54076 (2013). 52 Chapter 3 Multi-Component Separation and Analysis of Bat Echolocation Calls Abstract The vast majority of animal vocalizations contain multiple FM components with vary- ing amounts of non-linear modulation and harmonic instability. This is especially true of biosonar sounds where precise time-frequency templates are essential for neural in- formation processing of echoes. Understanding the dynamic waveform design by bats and other echolocating animals may help to improve the efficacy of man-made sonar through biomimetic design. Bats are known to adapt their call structure based on the echolocation task, proximity to nearby objects, and density of acoustic clutter. To interpret the significance of these changes, a method was developed for component separation and analysis of biosonar waveforms. 
Techniques for imaging in the time- frequency plane are typically limited due to the uncertainty principle and interference cross-terms. This problem is addressed by extending the use of the fractional Fourier transform to isolate each non-linear component for separate analysis. Once separated, Empirical Mode Decomposition (EMD) can be used to further examine each compo- nent. The Hilbert transform may then successfully extract detailed time-frequency information from each isolated component. This multi-component analysis method is The contents of this chapter were published in the Journal of the Acoustical Society of America. 2013 January; 133(1):538–546. [DOI: 10.1121/1.4768877]. 53 applied to the sonar signals of four species of bats recorded in-flight by radiotelemetry along with a comparison of other common time-frequency representations. 3.1 Introduction The active sonar call of the big brown bat (Eptesicus fuscus) contains multiple non- linear FM components that are harmonically related [1]. The scale invariant proper- ties of this species’ echolocation signals [2, 3] implies that cross-correlation between the signal and the echo returns are insensitive to in-flight Doppler shifts. Furthermore, the call of E. fuscus is a multi-component signal that naturally increases the effective bandwidth and consequently improves range resolution. Despite the advantages for active sonar pulse design, these non-linear and multi-component characteristics make it difficult to precisely localize energy in the time-frequency plane. Animal vocalizations are typically described using conventional spectrograms, which have intrinsically low time-frequency resolution. Alternative representations may better capture the information that animals actually use, particularly since bats manifest greater time-frequency acuity. Small details in the call signal structure may appear subtle and unimportant, but could actually lead to statistically significant ob- servations of the animals’ behavior. An example of nearly indistinct, yet intentional adaptive pulse design by E. fuscus is described in Hiryu et al. [4]. Using the spectro- gram, they found that bats shifted echolocation frequencies by several kHz (< 4-8% of total bandwidth) to avoid pulse-echo ambiguity in dense clutter. Most interesting is the fact that temporal cross-correlation between the pulse-echo pairs are nearly iden- tical, which strongly suggests that these bats do not simply use conventional matched filtering for echo processing. Many different time-frequency representations (TFR) are used to process multi- component, linear, quadratic, and higher-order FM signals. If the signal is stationary, the Fourier Transform (FT) is an effective tool for analyzing the frequency content. 54 0 0 120 120 A FM3 B 100 −5 100 −5 FM2 Frequency (kHz) Frequency (kHz) 80 −10 80 −10 60 FM1 −15 60 −15 40 −20 40 −20 20 −25 20 −25 0 −30 0 −30 0 0.5 1 1.5 2 2.5 3 3.5 0 0.5 1 1.5 2 2.5 3 3.5 Time (ms) Time (ms) 0 0 120 120 C D 100 −5 100 −5 Frequency (kHz) Frequency (kHz) 80 −10 80 −10 60 −15 60 −15 40 −20 40 −20 20 −25 20 −25 0 −30 0 −30 0 0.5 1 1.5 2 2.5 3 3.5 0 0.5 1 1.5 2 2.5 3 3.5 Time (ms) Time (ms) Figure 3.1. Four different time-frequency distributions of an FM echolocation call from E. fuscus. (a) The spectrogram shows that this species of bat produces at least two prominent harmonic com- ponents (labeled FM1, FM2, etc.), which is a common characteristic among many echolocating bats. 
(b) The Wigner-Ville distribution (WVD) provides very good resolution, but interference cross-terms incorrectly place energy within and between components. (c) Cross-terms are effectively removed at the cost of resolution in the Smoothed Pseudo WVD. (d) The reassignment method [7] (computed on c) is a highly effective technique for improving the readability of any TFR. Reassignment works by remapping the energy distributed in a TFR onto its center-of-gravity; however, it cannot show details that are unresolved in the base representation. All plots are shown on a normalized decibel scale. However, the FT provides little insight into the nature of signals from nonstationary or nonlinear systems. For instance, quadratic phase (linear FM) signals are poorly represented by the FT because it is a transform from time to frequency, i.e. not a joint distribution in time and frequency. A common way around this issue is to take the FT of short moving windows of the signal in time, thus providing frequency information as a function of time. This leads us to the short-time Fourier transform and its squared modulus, the spectrogram. The difficulty with this approach is that the window must be small enough in time to provide good time resolution and wide 55 enough in the bandwidth sense to provide good frequency resolution. These simul- taneous conflicting objectives lead to leakage of the spectral energy and a generally smeared appearance in the time-frequency plane. Use of the spectrogram has be- come ubiquitous due to its fast computation, simple interpretation, and widespread software integration; however, it is very difficult to resolve fine details from the spec- trogram alone, especially if attempting to automate the process. Fig. 3.1 illustrates the spectrogram of an example echolocation call by E. fuscus alongside other TFRs, including the Wigner-Ville distribution (WVD) [5], the smoothed psuedo-WVD [6], and the reassignment method [7]. Many different methods have been used to visualize biosonar signals beyond the common TFRs. These include time-scale analysis [8], the Fractional Fourier Transform (FrFT) [9, 10, 11], wavelets [12], and the minimum variance estimator [13]. In these methods only a small number of signals were analyzed to show the processing technique. For practical applications, it is important to consider how well a method can automatically extract waveform parameters in a large set of data. Recently, a host of TFR tools based on the idea of polynomial phase signal models have appeared [14, 15, 16, 17]. They generally rely on adaptations to the ambiguity function including multiple products, lagged versions, higher orders, or some combination thereof [18, 19, 20]. It is not surprising that this approach has received a great deal of attention, as the ambiguity function is itself the characteristic function of the WVD [21]. In other words, the WVD and ambiguity function form a Fourier pair [22]. A significant reason for not adopting these more mathematically rigorous parametric models is that often they are defended with the caveat that the amplitude be constant or slowly varying in time. This condition cannot be guaranteed for biosonar signals which contain unique amplitude modulations that change with each emitted pulse. Unfortunately there is no single time-frequency technique that is optimized for all situations. 
While meaningful insight can be gleaned using parametric models or 56 appropriate TFRs for a specific signal, the notion of using an adaptive or empirical decomposition is attractive due to the complexity and nonlinearity of the bat’s sound production system. Imaging techniques to improve time-frequency fidelity would more easily identify small differences in call structure (as found in Hiryu et al.). Resolving these differences is critical, however, to understand how these changes are actually perceived by the bat. This paper extends the use of the FrFT and applies several techniques to sepa- rate and analyze nonlinear harmonic components in biosonar signals. The methodol- ogy should be easily extrapolated to other highly variable, multi-component signals, such as calls of other bat species, marine mammal calls and whistles, insect commu- nication, and voiced-speech. 3.2 Data Collection The algorithm was developed and refined using a single E. fuscus call recorded at high signal-to-noise ratio (Fig. 3.1). An ultrasonic free-field microphone (Series 4139, Br¨ uel & Kjær) was placed directly in front of the bat on a stationary platform at approximately 20 cm. A recording was made while the bat performed a 2-choice discrimination test. The echolocation signal was recorded with a digital audio recorder (ISC-16, R.C. Electronics) at a 250kHz sampling rate [23]. Typical of this species of bat, the signal is non-linearly modulated, with two principal harmonics FM1 and FM2 along with a partial 3rd harmonic, FM3. To evaluate the utility of our method, we analyzed a body of existing data. This consisted of biosonar sounds recorded from four species of bats using a radio microphone (“Telemike”) carried by the flying bat [4, 24, 25, 26]. The Telemike includes an electret condenser microphone (FG Series, Knowles Acoustics, IL, USA) positioned above the bat’s head and attached to a miniature radio transmitter used to record the sounds without the acoustic artifacts that normally occur when a moving 57 bat is recorded by a stationary microphone. The data set included calls from E. fuscus, the eastern bent-winged bat (Miniopterus fuliginosus), the Japanese house bat (Pipistrellus abramus), and the greater horseshoe bat (Rhinolophus ferrumequinum). For each species the time series contained multiple biosonar signals recorded while the animal was navigating through a flight room used for testing their responses to clutter. The flights and recordings were conducted in the laboratories of Hiroshi Riquimaroux and Shizuko Hiryu at Doshisha University (Kyotanabe, Japan) or at Brown University. During recording, the signals were digitally sampled at either 384 kHz or 192 kHz [4, 24]. 3.3 Methods The multi-component analysis presented here is a two-part process: separation of harmonic components followed by mono-component decomposition. Component sep- aration includes a new use of the FrFT to find a rough approximation of instantaneous frequency, fi (t), time-varying demodulation centered about fi (t), and a zero-phase filtering technique that will not affect the phase or group delay of the signal compo- nent. Mono-component decomposition consists of applying analysis techniques such as Empirical Mode Decomposition (EMD) and Hilbert spectral analysis. The re- sulting decomposition produces highly resolvable images of each component in the time-frequency plane. 
The reader is referred to the Appendices for an overview of our definitions of a multi-component waveform and how the Hilbert spectral analysis can be used to extract this information. 3.3.1 Separation of Harmonic Components Component separation may be performed in a variety of ways; however, the follow- ing demonstrates a robust approach that combines the use of the Fractional Fourier Transform, demodulation, and zero-phase filtering. The FrFT provides an easy way 58 to approximate a component’s instantaneous frequency. We apply a time-varying bandpass filter along this estimate to isolate the component. Subtracting the result from the original signal allows the process to be repeated until all components have iteratively been separated. 3.3.1.1 Fractional Fourier Transform The fractional Fourier transform (FrFT) and Radon-Wigner transform (RWT) are both fractional rotations of a signal from the time domain to the frequency domain in the time-frequency plane. The FrFT can be defined in its more familiar integral form [27] as π φ e−i( 4 − 2 ) Z 1 2 +u2 ) cot(φ) tu F rF T (φ, u) = p x(t)e 2 i(t e−i sin(φ) dt (3.1) 2π sin(φ) The parameter φ is the angle of rotation in radians and u is the fractional dimension between time and frequency. Letting φ = α π2 , a rotation of α = 0 is simply the time series itself and a rotation of α = 1 is a traditional FT, any non-integer rotation will produce a fractional FT. This can be accomplished easily by forming the Fourier unitary matrix, raising it to an arbitrary power, α, then multiplying the FT of the original signal with the matrix. Repeatedly applying the FT to a signal is equivalent to raising this matrix to an integer power. For example, raising the matrix to 0, 1, 2 and 3, results in the original time series, the FT, the time-reversed series, and the FT of the time-reversed signal, respectively. The RWT is the Radon transform of the WVD. Geometrically, the RWT is a tomographic transform that combines a rotation of the WVD with a projection onto a one dimensional axis at some angle of rotation φ. Like the WVD, the RWT results in a 2D distribution. Unlike the WVD, the RWT provides intensity information not as a function of time and frequency, but rather as a function of frequency and angle 59 of rotation of the WVD. As a result, the relationship between the RWT and FrFT follows RW T (φ, u) = |F rF T (φ, u)|2 (3.2) That is, the RWT is equivalent to the squared modulus of the FrFT [28, 29, 30]. It should be noted that, like the conventional FT, the FrFT is a linear operator. The WVD, and therefore the RWT, are both bilinear operators on the signal. As a result, the FrFT is a TFR which does not produce the cross-term interference asso- ciated with bilinear TFRs. Because the RWT is a projection onto a one-dimensional axis through a line integral at angle α, the two-dimensional, bilinear (quadratic) rep- resentation loses the cross term interference during the projection, thus preserving the relationship between the RWT and the FrFT [29]. 3.3.1.2 Rough Approximation of Instantaneous Frequency This method uses a discrete implementation of the Fractional Fourier Transform (FrFT) [31] to compute the RWT of the analytic signal, x˜(t). Fig. 3.2 shows the signal from Fig. 3.1 in the rotation-fraction domain. Each column in the image is formed by computing the RWT of x˜(t) for a specific angle of rotation, α. Computing the RWT at more angles leads to better α resolution and zero-padding or interpolating the signal will increase resolution in u. 
Every (α, u) pair corresponds to a specific line in the time-frequency plane. For a linear FM signal, fi (t) = f0 + kt can be precisely estimated by finding its peak in the rotation-fraction plane and solving for the constants f0 and k as 60 1 0 0.8 −5 −10 Fraction (u) 0.6 −15 0.4 −20 0.2 −25 0 −30 −1 −0.5 0 0.5 1 Rotation (α) Figure 3.2. Rotation-fraction domain of the E. fuscus signal. The FrFT is computed on the analytic time series signal at incremental rotation values, α. The squared modulus, |F rF T |2 , produces the vertical slices of the rotation-fraction domain. Each (α, u) point in the image corresponds to a unique line cutting across the time-frequency plane. Once the global peak on the surface is found, points along the local ridge (inset) represent lines passing through subsections of the nonlinear component in the time-frequency plane. A polynomial curve is fit to the intersection points of adjacent lines which results in a rough estimate of fi (t) for one component. fs2 π k =− cot(α ) (3.3) T 2 1 π fc =fs (u − )csc(α ) (3.4) 2 2 T f0 =fc − k (3.5) 2 where fs is the sampling frequency, T is the period of the signal, and fc is the frequency at the midpoint of the line [32]. Since the bat’s signal consists of nonlinear FM components, there is no single peak, but a continuous ridge where multiple (α, u) pairs correspond to lines that pass through subsections of a component. We make use of this fact by normalizing the RWT to the highest peak, detecting local points along the ridge above a thresh- old, then finding the intersection points of the lines from adjacent (α, u) pairs. This 61 generates points in the time-frequency plane along the most prominent component. The end points can be extended by projecting out from the first and last intersec- tion points. Fitting a polynomial or spline curve to these points provides a rough approximation to fi (t) for one component without a priori information on any FM parameters. 3.3.1.3 Zero-Phase Component Filtering A time-varying bandpass filter is effectively applied to the analytic signal along the instantaneous frequency approximation. This is achieved by first integrating fi (t) to find the phase law, φi (t), as in Eq. (3.13) and demodulating the signal as xˇ(t) = x˜(t)e−jφi (t) (3.6) The demodulated complex signal, xˇ(t), is then lowpass filtered to remove unwanted harmonics and reverberation. The filter bandwidth can be adjusted depending on the accuracy of the initial fi (t) estimate. Note that a zero-phase forward-backward filter is required to minimize phase distortions and avoid introducing group delay: Yˇ (ejωT ) = H(e−jωT )H(ejωT )X(e ˇ jωT ) (3.7) The signal is then remodulated using the negative of the phase law: y˜(t) = yˇ(t)ejφi (t) (3.8) Each step is shown in Fig. 3.3 for the 2nd component, FM2. The process of rough approximation and zero-phase filtering is repeated for subsequent components (i.e. FM1 and FM3) once the isolated component, y˜(t), is subtracted from the analytic signal, x˜(t). After each harmonic component has been effectively isolated, this opens the door for a variety of different processing options. 62 x ˜(t) x ˇ(t) 0 0 100 100 −10 −10 Frequency (kHz) Frequency (kHz) 50 50 0 −20 0 −20 −50 −50 −30 −30 −100 A −100 B −40 −40 0 1 2 3 0 1 2 3 Time (ms) Time (ms) yˇ(t) y˜(t) 0 0 100 100 −10 −10 Frequency (kHz) Frequency (kHz) 50 50 0 −20 0 −20 −50 −50 −30 −30 −100 C −100 D −40 −40 0 1 2 3 0 1 2 3 Time (ms) Time (ms) Figure 3.3. 
Overview of FM2 component separation using a least-squares cubic approximation of fi (t). Negative frequencies are shown to accommodate the frequency warping caused by demodula- tion. (a) The analytic signal, x˜(t), with approximate fi (t) curve for FM2. (b) FM2 is now clearly separable by frequency after demodulation to 0 Hz (ˇ x(t)). (c) A zero-phase lowpass filter is applied to remove other components (ˇ y (t)). (d) FM2 is modulated back using the negative phase law, re- sulting in y˜(t). Through the process of component separation, the resulting component is free from non-overlapping echoes, reverberation, and background noise. 3.3.2 Monocomponent Decomposition 3.3.2.1 Empirical Mode Decomposition Empirical mode decomposition (EMD) is a useful technique for analyzing nonlinear FM signals due to its robustness in handling nonstationary, nonlinear data. The EMD separates a time-series signal into multiple decompositions known as intrinsic mode functions (IMFs). An IMF is defined only if (1) the number of extrema and the number of zero-crossings are equal or at most differ by one, and (2) the mean of the envelope of the maxima and the envelope of the minima is zero at all points. 63 This works due to the tacit relationship between zero-crossings and the frequency spectrum of a signal [33]. IMFs have properties conducive to signal processing, namely that they are linear and have well behaved Hilbert transforms. Additionally, the EMD forms a basis which is complete, approximately orthogonal, local, and adaptive. The orthogonal property of the IMFs ensures that the energy associated with the distribution is positive, a critical designation for a time-frequency representation. −70 −60 −50 −70 −60 −50 100 A B 50 IMF 1 IMF 2 0 Frequency (kHz) −40 −30 −20 −20 −10 0 100 C D 50 IMF 3 IMF 4 0 −50 −40 −30 −70 −60 −50 100 E F 50 IMF 6−13 IMF 5 0 1 2 3 1 2 3 Time (ms) Figure 3.4. Shown here are results of the empirical mode decomposition on the separated second harmonic, FM2, from E. fuscus (Fig. 3.3.1.3). Since the EMD works strictly in the time-domain, interpolation beyond the Nyquist rate is necessary to achieve good performance. FM2 was inter- polated by a factor of 8 before EMD to avoid aliasing artifacts. Spectrograms for IMF 1 through 5 (a-e) illustrate how energy is distributed amongst the IMFs. High frequency noise is contained largely in IMFs 1 and 2 (a and b). IMFs 3 and 4 (c and d) contain the strongest parts of the signal with a weaker part found in IMF 5 (e). Residual low frequency energy is found in IMFs 6 through 13 (combined in f). IMFs 4-6 may be summed and passed on to later processing stages. Since the decomposition forms a complete basis, summation across all IMFs will result in the original signal. The color scale depth is set to 30 dB on all plots. The result of the EMD is similar to that of passing the signal of interest through 64 a filter bank [34]. The key differences are that filtering is not stationary nor restricted to separation in the time-frequency plane. In this regard, the IMF that results from the decomposition is composed of the same time-varying frequency modulation of the original signal with much of the non-coherent signals (noise) and riding waves (DC to very low-frequency) suppressed. Spectrograms of the IMFs generated from FM2 are shown in Fig. 3.4. 3.3.2.2 Hilbert Spectral Analysis Computing instantaneous frequency and amplitude from the mono-component signals provides very useful information that cannot be easily found by other methods. 
In the discrete-time implementation, ai (t) is a straightforward absolute value calculation of the complex analytic signal. Finding fi (t) involves numerical integration and therefore requires some approximation. Calculation of fi (t) for a filtered analytic component, y˜(t), can be accomplished directly in discrete-time by fs fi [k] = y [k + 1] y˜∗ [k − 1]) ∠(˜ (3.9) 2π for k = 2, 3, 4 . . . N − 1 where k is the discrete-time sample number, N is the total number of sample points, and fs is the sampling rate [35]. This is immediately recognized as the central finite difference [36]. The resulting fi (t) and ai (t) functions (Fig. 3.5a-b) may optionally be smoothed to compensate for low signal-to-noise ratio using the least-squares Savitzky-Golay filter [37]. If applied, care should be taken to avoid over-smoothing by using a short filter length and a sufficient polynomial order. Each component is then combined to form a precise and high-resolution TFR (Fig. 3.5c). 65 Freq. (kHz) 100 A 50 FM1 FM2 FM3 0 0 0.5 1 1.5 2 2.5 3 3.5 −20 B Amp. (dB) −60 FM1 FM2 FM3 −100 0 0.5 1 1.5 2 2.5 3 3.5 −25 Freq. (kHz) 100 C −35 50 −45 −55 0 0 0.5 1 1.5 2 2.5 3 3.5 Time (ms) Figure 3.5. Hilbert spectral analysis results showing ai (t) and fi (t) for each harmonic component of the E. fuscus call (a-b). Each component has its own fi (t) and ai (t) function. (a) and (b) are combined to form the time-frequency representation shown in (c). The instantaneous amplitude is plotted on a decibel scale in (b) and is shown with intensity in (c). Line thickness has been increased in all plots to improve visibility. 3.3.3 Waveform Synthesis and Ground Truth An important aspect to these mono-component decomposition techniques is that all of the original signal information is retained. This implies that recorded biosonar signals can be decomposed, modified in some way, and finally synthesized into a noise-free replica of the recorded waveform for detailed acoustic simulations or computational models of auditory neural processing. This step is also useful to perform a ground truth by subtracting the synthesized signal from the original. When the initial phase φ0 (see Sec. B) is properly adjusted, results show negligible error in the time-frequency plane with only the broadband noise and non-interfering echoes removed from the signal. 66 3.4 Results 3.4.1 Telemike Data Series Echolocation signals from E. fuscus and three East Asian bat species were processed to show the method’s flexibility and ease of use. Data from various Telemike experi- ments were used in all four cases [4, 25, 26]. First, the biosonar calls were separated using a simple energy detector and then individually run through multi-component analysis. The spectrogram of the full time series are shown side-by-side with the analysis results for each bat in Figure 3.6a-d. Spectrogram of Telemike Data Overlaid Analysis Results 0 100 100 −10 kHz 50 50 −20 0 A 0 E −30 0 0.1 0.2 0.3 0.4 0.5 0.6 0 1 2 3 90 90 0 −10 kHz 45 45 −20 Frequency F 0 B 0 −30 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0 1 2 90 90 0 −10 kHz 45 45 −20 G 0 C 0 −30 0 0.02 0.04 0.06 0.08 0.1 0.12 0 1 2 90 90 0 −20 kHz 45 45 D −40 0 0 H 0 0.1 0.2 0.3 0.4 0.5 0 10 20 30 Time (seconds) Time (ms) Figure 3.6. Multi-component analysis was performed on call sequences from radiotelemetry record- ings of E. fuscus and three Asian bat species. The spectrogram for the entire time series are shown for E. fuscus (a), P. abramus (b), M. fuliginosus (c), and R. ferrumequinum (d). 
3.4 Results

3.4.1 Telemike Data Series

Echolocation signals from E. fuscus and three East Asian bat species were processed to show the method's flexibility and ease of use. Data from various Telemike experiments were used in all four cases [4, 25, 26]. First, the biosonar calls were separated using a simple energy detector and then individually run through multi-component analysis. The spectrograms of the full time series are shown side-by-side with the analysis results for each bat in Figure 3.6a-d.

Figure 3.6. Multi-component analysis was performed on call sequences from radiotelemetry recordings of E. fuscus and three Asian bat species. The spectrograms for the entire time series are shown for E. fuscus (a), P. abramus (b), M. fuliginosus (c), and R. ferrumequinum (d). The analysis results for each call are aligned and overlaid in the time-frequency plane (e-h). The color scales are the same across each row. Pairs of pulses, known as strobe groups, can be identified by short inter-pulse timing in the cases of E. fuscus, P. abramus, and R. ferrumequinum. Both P. abramus and M. fuliginosus emit mono-component non-linear FM waveforms. Although their calls are nearly identical in time-frequency structure (f and g), only P. abramus is known to emit strobe groups. R. ferrumequinum uses relatively long constant-frequency tones with short FM tails at the beginning and end of each call. The color depth was extended to −50 dB for R. ferrumequinum (d and h) to show the first harmonic, which is approximately 20 dB weaker than the second in this species. The E. fuscus data set was collected by Hiryu et al. [4] and the remaining data sets were collected by Riquimaroux et al. and Hiryu et al. [25, 26].

The Telemike data from E. fuscus (Fig. 3.6a) contains 13 echolocation signals emitted as it entered a densely cluttered array of chains. This data set is the same as the example shown in Hiryu et al. [4]. Figs. 3.6b and 3.6c show spectrograms of the Telemike data from the Japanese house bat (Pipistrellus abramus) and the eastern bent-winged bat (Miniopterus fuliginosus). Fig. 3.6d shows eight calls emitted by the greater horseshoe bat (Rhinolophus ferrumequinum).

Figure 3.7 shows the results from E. fuscus in more detail. The pulse-to-pulse time intervals were used to identify strobe groups, which are closely spaced pairs of calls with short time intervals [1, 4]. The figure shows the strobe groups identified with brackets. It is worth noting that FM1 is stronger than FM2 by approximately 8 dB due to the off-axis microphone placement of the Telemike. In this data set, the first four pulses were emitted early in the clutter field where pulse-echo ambiguity was present. The last four pulses were emitted after pulse-echo ambiguity subsided. Hiryu et al. found that when pulse-echo ambiguity was strong, the bats shifted the tail-end frequency for each strobe group pair. This behavior was absent when pulse-echo ambiguity was not present. The results from our method confirm that this occurred in the example data set, and the effect is significantly more pronounced than when viewing the spectrogram alone.

3.4.2 Synthesized Multi-Component FM Analysis

To demonstrate how the proposed technique can adapt to small time-frequency perturbations, a multi-component linear FM waveform is generated with a small sinusoidal modulation. The combined FM signal can be defined using

$$\phi(t) = f_0 t + \frac{\mu_0}{2}t^2 + \frac{B}{4\pi f_m}\sin 2\pi f_m t \qquad (3.10)$$

where $f_0$ is the initial frequency, $\mu_0$ is the linear sweep rate, B is the amplitude of sinusoidal modulation (in Hz), and $f_m$ is the modulation frequency. This phase law is used directly in Eq. (3.12) to construct the discrete-time noiseless components, which are then added together.
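A minimal synthesis of this test signal might look as follows. The parameter values are illustrative assumptions (the chapter states only that the riding wave varies by ±2.5 kHz, i.e., B = 5 kHz); the phase law of Eq. (3.10) is in cycles, hence the 2π in the complex exponential.

```matlab
% Sketch of the synthetic test waveform of Sec. 3.4.2: two linear-FM
% components with a sinusoidal riding wave per Eq. (3.10).
% All parameter values below are illustrative assumptions.
fs = 500e3;  t = (0:1/fs:3e-3).';
f0  = [20e3 40e3];                  % component start frequencies (Hz)
mu0 = [10e6 20e6];                  % linear sweep rates (Hz/s)
B   = 5e3;  fm = 2e3;               % riding wave: +/- B/2 = 2.5 kHz
s = zeros(size(t));
for n = 1:2
    phi = f0(n)*t + (mu0(n)/2)*t.^2 ...
        + (B/(4*pi*fm))*sin(2*pi*fm*t);   % Eq. (3.10), in cycles
    s = s + exp(1j*2*pi*phi);             % noiseless analytic component
end
```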
Figure 3.7. E. fuscus was previously found to use slight frequency shifts to avoid pulse-echo ambiguity. Multi-component analysis results are plotted separately for FM2 (a) and FM1 (b). The time duration of each pulse component matches the scale bar, but the inter-pulse interval time is artificially compressed. This was done to show the fine detail in each call, which cannot be easily seen in the overlaid plot (Fig. 3.6e).

The results reveal a clear distinction between the lowest frequency in each harmonic component for strobe groups 1 and 2. As noted in Hiryu et al., this separation becomes insignificant when pulse-echo ambiguity is no longer a problem, as shown circled in strobe groups 4 and 5.

Fig. 3.8a shows the desired $f_i(t)$ functions used to synthesize a multi-component sinusoidal FM riding on a linear FM. The sinusoidal riding wave varies by ±2.5 kHz, but neither the Wigner-Ville distribution nor the reassignment method (Fig. 3.8b-c) can resolve these variations. The proposed component separation and Hilbert spectral analysis faithfully reproduce the original $f_i(t)$ curves (Fig. 3.8d).

3.5 Discussion

Many decompositions, including Hilbert spectral analysis and EMD, do not perform well on multi-component signals. In fact, unless the multi-component signal is first decomposed into the corresponding mono-component signals, the concepts of $f_i(t)$ and $a_i(t)$ lose physical meaning [38, 39, 40, 41].

Figure 3.8. (a) The original $f_i(t)$ functions used to synthesize two linear plus sinusoidal FM components, (b) Wigner-Ville distribution, (c) smoothed pseudo-WVD, and (d) results after separation of components with the proposed method. Despite having better resolution than the spectrogram, the WVD is only perfectly localized for up to a second-order phase law, such as a linear FM or a constant tone. This synthetic FM signal demonstrates that methods we consider "high fidelity" may not resolve small, but significant, features in natural signals such as biosonar calls. For cases where the signal generation mechanism is unknown or not well understood, it is best not to assume any TFR is optimal.

How does one define the instantaneous frequency of a signal that has overlapping functions of frequency at a single point in time? Therefore, these signals must first be separated into mono-components and analyzed individually. Using such a technique, signal parameter estimation is restricted neither to the coarse resolution of a spectrogram nor to the interference cross-terms that plague other high-resolution methods.

We have presented a technique for isolating and processing individual components of the call from E. fuscus based on the fractional Fourier transform, time-varying demodulation, EMD, and Hilbert spectral analysis. The method can be applied to any frequency-modulated multi-component signal provided a rough estimate of the instantaneous phase is achievable and the components are separable in the time-frequency plane. Algorithm parameters can be adjusted to automate the processing of various signal types. Ultimately, we arrive at a TFR that is highly localized in both time and frequency.

The EMD has important insights to offer in the realm of biological sonar. It was asserted [13] that the EMD technique is not generally efficient for estimating $f_i(t)$ of bat calls. We do not believe that is accurate. When recording an E. fuscus echolocation signal along the main response axis, the dominant signal energy typically transitions from the first to the second harmonic. This was offered as a reason to avoid the EMD, as the decomposition tracks the strongest energy in the signal.
We have shown that a simple technique for isolating and separating the components can and does provide effective relief from this problem. Second, the EMD is not solely designed to break a multi-component signal into mono-components. The property of most importance is its similarity to a time-varying constant-Q filter bank. In this way, the EMD is more similar to the minimum variance estimator (MVE) technique, which Kopsinis et al. endorse. This is due to the strong relationship between zero-crossings and spectral content [33].

Since its inception, the EMD has provided insights into a great many systems characterized by nonlinear and nonstationary signals. However, the problems with EMD have been well documented [13, 34, 42]. The lack of mathematical rigor and definition related to the EMD is often identified as a source of criticism. If the EMD is applied carefully and the results scrutinized, this concern can be effectively mitigated by applying known techniques to serve as a model for comparison.

Recent advances have been made with empirical-based methods. The normalized Hilbert transform, the normalized amplitude Hilbert transform, and their relationship to the signal quadrature help to mitigate some of the restrictions imposed by Bedrosian and Nuttall [43, 44, 45]. In certain instances, the error between the approximated Hilbert transform and the quadrature can produce spectral artifacts in the Hilbert spectral analysis. In other instances, the EMD can highlight the issue of undersampling.

In conclusion, higher-resolution time-frequency techniques are necessary for understanding biosonar. This paper describes one possible solution to the problem of multi-component time-frequency analysis. Further developments in empirical decomposition techniques will enable new ways of evaluating non-linear processes.

3.6 Acknowledgments

This work was funded through internal investments by the Naval Undersea Warfare Center, Division Newport, RI and ONR grant N00014-09-1-0691. The authors wish to thank Hiroshi Riquimaroux and Shizuko Hiryu for providing time series data from recordings using the Telemike recording system, Ivars Kirsteins and Lee Estes for discussions on the fractional Fourier transform, and Laura Kloepper and Andrea Simmons for editorial suggestions. Figures showing the WVD, smoothed pseudo-WVD, and reassignment method for comparison were produced using the Time-Frequency Toolbox for MATLAB [46].

A Multi-Component Frequency-Modulated Waveforms

Many bat echolocation signals consist of components (usually harmonics) with a varying degree of amplitude and phase modulation. The multi-component version of the big brown bat's echolocation call is a summation of each individual FM waveform, or

$$s(t) = \sum_{n=1}^{N} \tilde{x}_n(t) \qquad (3.11)$$

for N independent harmonically related components, $\tilde{x}_n(t)$. Given this assumption, each component has its own time-dependent amplitude and frequency; more precisely, each is an instantaneous function of time. A signal component can be defined in its analytic form as

$$\tilde{x}(t) = a_i(t)\,e^{j\phi_i(t) + j\phi_0} \qquad (3.12)$$

where $a_i(t)$ is the instantaneous amplitude, $\phi_i(t)$ is the instantaneous phase modulation (or phase law), and $\phi_0$ is the initial phase of the complex exponential. The phase law is related to the instantaneous frequency, $f_i(t)$, by

$$\phi_i(t) = 2\pi \int_0^{t} f_i(\tau)\,d\tau \qquad (3.13)$$

In this manner, we assume that the bat's multi-component FM waveforms can be completely described by defining $a_i(t)$, $f_i(t)$, and $\phi_0$ for each harmonic component.
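In discrete time this model reduces to a few lines. The sketch below assumes ai and fi are matrices whose columns hold the per-component $a_i(t)$ and $f_i(t)$ samples at rate fs, with phi0 a vector of initial phases; all names are illustrative.

```matlab
% Sketch of Eqs. (3.11)-(3.13): numerically integrate each f_i(t) to get
% the phase law, form the analytic component, and sum.
s = zeros(size(fi,1), 1);
for n = 1:size(fi,2)
    phin = 2*pi*cumtrapz(fi(:,n))/fs;             % Eq. (3.13)
    s = s + ai(:,n) .* exp(1j*(phin + phi0(n)));  % Eqs. (3.12) and (3.11)
end
```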
B Hilbert Spectral Analysis of Modulated Waveforms

We present the formulation below in continuous time for the purpose of familiarity. Assume for computational purposes that the signal of interest, x[n], is obtained by sufficiently sampling a band-limited signal x(t) such that x[n] = x(nT), where $T = 1/f_s$ is the sampling interval chosen to avoid aliasing.

If a real mono-component signal, x(t), fits the criteria for a modulated waveform, then we can extract the parameters of interest directly from estimates of $f_i(t)$ and $a_i(t)$. This requires first converting the original mono-component signal into its complex analytic form using the Hilbert transform, $\mathcal{H}$, and is achieved with

$$\tilde{x}(t) = x(t) + j\hat{x}(t) \qquad (3.14)$$

where x(t) is the purely real signal under consideration and $j\hat{x}(t)$ is the purely imaginary part, with $\hat{x}(t) = \mathcal{H}\{x(t)\}$. This is calculated as follows

$$\hat{x}(t) = \int_{-\infty}^{\infty} x(\tau)\,h(t-\tau)\,d\tau \qquad (3.15)$$

with $h(t) = \frac{1}{\pi t}$. The integral can be evaluated as a Cauchy principal value; however, it should be noted that many simple approximations exist for a discrete-time implementation. The resulting analytic signal will consist only of positive spectral components in the frequency domain. This signal representation is convenient since it provides the information to fully describe a mono-component modulated signal. Once in this form, estimates of the instantaneous amplitude, phase, and frequency are given by

$$a_i(t) = |\tilde{x}(t)| = \sqrt{\mathrm{Re}\{\tilde{x}\}^2 + \mathrm{Im}\{\tilde{x}\}^2} \qquad (3.16)$$

$$\phi_i(t) = \angle\tilde{x}(t) = \arctan\!\left(\frac{\mathrm{Im}\{\tilde{x}\}}{\mathrm{Re}\{\tilde{x}\}}\right) \qquad (3.17)$$

and, using the relation in (3.13),

$$f_i(t) = \frac{1}{2\pi}\frac{d}{dt}\phi_i(t) \qquad (3.18)$$

The issue of finding the constant $\phi_0$ in (3.12) can be resolved by optimizing the phase alignment in time between the original signal and a waveform synthesized using the estimated parameters, but the value depends largely on the arbitrarily defined time origin. As long as $\phi_0$ is consistent between harmonics, it need not be exact for analysis purposes. The general use of the Hilbert transform in estimation of $f_i(t)$ and $a_i(t)$ has been termed elsewhere Hilbert spectral analysis [42].

References

[1] A. Surlykke and C. F. Moss, “Echolocation behavior of big brown bats, Eptesicus fuscus, in the field and the laboratory”, J. Acoust. Soc. Am. 108, 2419–2429 (2000).
[2] B. Harris and S. Kramer, “Asymptotic evaluation of the ambiguity functions of high-gain FM matched filter sonar systems”, in Proc. IEEE, 2149–2157 (1968).
[3] R. Altes and E. Titlebaum, “Bat signals as optimally Doppler tolerant waveforms”, J. Acoust. Soc. Am. 48, 1014–1020 (1970).
[4] S. Hiryu, M. E. Bates, J. A. Simmons, and H. Riquimaroux, “FM echolocating bats shift frequencies to avoid broadcast-echo ambiguity in clutter”, Proc. Natl. Acad. Sci. 107, 7048–7053 (2010).
[5] S. Kay and G. Boudreaux-Bartels, “On the optimality of the Wigner distribution for detection”, in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP ’85, 1017–1020 (1985).
[6] W. Martin and P. Flandrin, “Wigner-Ville spectral analysis of nonstationary processes”, IEEE Trans. Acoust., Speech, Signal Process. 33, 1461–1470 (1985).
[7] F. Auger and P. Flandrin, “Improving the readability of time-frequency and time-scale representations by the reassignment method”, IEEE Trans. Signal Process. 43, 1068–1089 (1995).
[8] B. Ristic and B. Boashash, “Scale domain analysis of a bat sonar signal”, Time-Frequency and Time-Scale Analysis, 1994, Proc. of the IEEE-SP International Symposium on, 373–376 (1994).
[9] C. Capus, Y. Rzhanov, and L. Linnett, “The analysis of multiple linear chirp signals”, Time-scale and Time-Frequency Analysis and Applications (Ref. No. 2000/019), IEE Seminar on, 4 (2000).
[10] C. Capus and K. Brown, “Short-time fractional Fourier methods for the time-frequency representation of chirp signals”, J. Acoust. Soc. Am. 113, 3253–3263 (2003).
[11] C. Capus, Y. Pailhas, K. Brown, D. M. Lane, P. W. Moore, and D. Houser, “Bio-inspired wideband sonar signals based on observations of the bottlenose dolphin (Tursiops truncatus)”, J. Acoust. Soc. Am. 121, 594–604 (2007).
[12] S. Olhede and A. Walden, “A generalized demodulation approach to time-frequency projections for multicomponent signals”, Proc. R. Soc. A 461, 2159 (2005).
[13] Y. Kopsinis, E. Aboutanios, D. Waters, and S. McLaughlin, “Time-frequency and advanced frequency estimation techniques for the investigation of bat echolocation calls”, J. Acoust. Soc. Am. 127, 1124–1134 (2010).
[14] S. Peleg and B. Friedlander, “The discrete polynomial-phase transform”, IEEE Trans. Signal Process. 43, 1901–1914 (1995).
[15] S. Peleg and B. Friedlander, “Multicomponent signal analysis using the polynomial-phase transform”, IEEE Trans. Aerosp. Electron. Syst. 32, 378–387 (1996).
[16] P. Wang, I. Djurovic, and J. Yang, “Instantaneous frequency rate estimation based on the robust cubic phase function”, in Acoustics, Speech and Signal Processing (ICASSP ’06) Proceedings, IEEE International Conference on, 89–92 (2006).
[17] P. O’Shea, “A fast algorithm for estimating the parameters of a quadratic FM signal”, IEEE Trans. Signal Process. 52, 385–393 (2004).
[18] S. Barbarossa and V. Petrone, “Analysis of polynomial-phase signals by the integrated generalized ambiguity function”, IEEE Trans. Signal Process. 45, 316–327 (1997).
[19] S. Barbarossa, A. Scaglione, and G. Giannakis, “Product high-order ambiguity function for multicomponent polynomial-phase signal modeling”, IEEE Trans. Signal Process. 46, 691–708 (1998).
[20] C. Ioana, “Time-frequency analysis using warped-based high-order phase modeling”, EURASIP J. Applied Signal Processing, 2856–2873 (2005).
[21] L. Cohen, Time-Frequency Analysis: Theory and Applications (Prentice Hall PTR, Englewood Cliffs, NJ) 299 (1995).
[22] F. Hlawatsch and G. Boudreaux-Bartels, “Linear and quadratic time-frequency signal representations”, IEEE Signal Process. Mag. 9, 21–67 (1992).
[23] J. A. Simmons, M. Ferragamo, C. F. Moss, S. B. Stevenson, and R. A. Altes, “Discrimination of jittered sonar echoes by the echolocating bat, Eptesicus fuscus: the shape of target images in echolocation”, J. Comp. Physiol. A 167, 589–616 (1990).
[24] S. Hiryu, Y. Shiori, T. Hosokawa, H. Riquimaroux, and Y. Watanabe, “On-board telemetry of emitted sounds from free-flying bats: compensation for velocity and distance stabilizes echo frequency and amplitude”, J. Comp. Physiol. A 194, 841–851 (2008).
[25] H. Riquimaroux and S. Hiryu, “Findings on bat sonar through Telemike system”, J. Acoust. Soc. Am. 131, 3422 (2012).
[26] S. Hiryu, N. Matsuta, S. Mantani, E. Fujioka, H. Riquimaroux, and Y. Watanabe, “On-board telemetry of biosonar sounds from free-flying bats”, J. Acoust. Soc. Am. 131, 3522 (2012).
[27] L. E. Estes, “Revisiting an eigenfunction perspective on the ordinary and fractional Fourier transforms”, NUWC-TM-12-010, NUWC Division Newport, RI (2012).
[28] J. Wood and D. Barry, “Radon transformation of time-frequency distributions for analysis of multicomponent signals”, IEEE Trans. Signal Process. 42, 3166–3177 (1994).
[29] A. W. Lohmann and B. H. Soffer, “Relationships between the Radon-Wigner and fractional Fourier transforms”, J. Opt. Soc. Am. A 11, 1798–1801 (1994).
[30] O. Akay and G. Boudreaux-Bartels, “Fractional convolution and correlation via operator methods and an application to detection of linear FM signals”, IEEE Trans. Signal Process. 49, 979–993 (2001).
[31] H. M. Ozaktas, O. Arikan, M. A. Kutay, and G. Bozdagt, “Digital computation of the fractional Fourier transform”, IEEE Trans. Signal Process. 44, 2141–2150 (1996).
[32] R. Jacob, T. Thomas, and A. Unnikrishnan, “Applications of fractional Fourier transform in sonar signal processing”, IETE J. Res. 55, 16 (2009).
[33] R. Kumaresan and Y. Wang, “On the duality between line-spectral frequencies and zero-crossings of signals”, IEEE Trans. Speech Audio Process. 9, 458–461 (2001).
[34] P. Flandrin, G. Rilling, and P. Goncalves, “Empirical mode decomposition as a filter bank”, IEEE Signal Process. Lett. 11, 112–114 (2004).
[35] S. Kay, “A fast and accurate single frequency estimator”, IEEE Trans. Acoust., Speech Signal Process. 37, 1987–1990 (1989).
[36] J. H. Mathews and K. D. Fink, Numerical Methods Using MATLAB, 3rd edition (Prentice Hall PTR, New York) 662 (1998).
[37] A. Savitzky and M. J. E. Golay, “Smoothing and differentiation of data by simplified least squares procedures”, Anal. Chem. 36, 1624–1639 (1964).
[38] B. Boashash, “Estimating and interpreting the instantaneous frequency of a signal. I. Fundamentals”, in Proc. IEEE, 520–538 (1992).
[39] B. Boashash, “Estimating and interpreting the instantaneous frequency of a signal. II. Algorithms and applications”, in Proc. IEEE, 540–568 (1992).
[40] P. Oliveira and V. Barroso, “On the concept of instantaneous frequency”, in Acoustics, Speech and Signal Processing, 1998, Proceedings of the 1998 IEEE International Conference on, 2241–2244 (1998).
[41] R. Kumaresan and A. Rao, “Model-based approach to envelope and positive instantaneous frequency estimation of signals with speech applications”, J. Acoust. Soc. Am. 105, 1912 (1999).
[42] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N. C. Yen, C. C. Tung, and H. H. Liu, “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis”, Proc. R. Soc. A 454, 903–995 (1998).
[43] N. E. Huang, Z. Wu, S. R. Long, K. Arnold, X. Chen, and K. Blank, “On instantaneous frequency”, Adv. Adapt. Data Anal. 1, 177–229 (2009).
[44] E. Bedrosian, “A product theorem for Hilbert transforms”, Proc. IEEE 51, 868–869 (1963).
[45] A. Nuttall and E. Bedrosian, “On the quadrature approximation to the Hilbert transform of modulated signals”, Proc. IEEE 54, 1458–1459 (1966).
[46] F. Auger, P. Flandrin, P. Goncalves, and O. Lemoine, “Time-frequency toolbox”, Technical Report (1996), http://tftb.nongnu.org; last accessed February 1, 2012.

Chapter 4

High Resolution Acoustic Measurement System and Beam Pattern Reconstruction Method for Bat Echolocation Emissions

Abstract

Measurements of the transmit beam patterns emitted by echolocating bats have previously been limited to cross-sectional planes or averaged over multiple signals using sparse microphone arrays. To date, no high-resolution measurements of individual bat transmit beams have been reported in the literature. Recent studies indicate that bats may change the time-frequency structure of their calls depending on the task, and suggest that their beam patterns are more dynamic than previously thought.
To investigate beam pattern dynamics in a variety of bat species, a high-density reconfigurable microphone array was designed and constructed using low-cost ultrasonic microphones and custom electronic circuitry. The planar array is 1.83 meters wide by 1.42 meters tall with microphones positioned on a 2.54 cm square grid. The system can capture up to 228 channels simultaneously at a 500 kHz sampling rate. Beam patterns are reconstructed in azimuth, elevation, and frequency for visualization and further analysis. Validation of the array measurement system and post-processing functions is shown by reconstructing the beam pattern of a transducer with a fixed circular aperture and comparing the result with a theoretical model. To demonstrate the system in use, transmit beam patterns of the big brown bat, Eptesicus fuscus, are shown.

The contents of this chapter were published in the Journal of the Acoustical Society of America, 2014 January; 135(1):513–520. [DOI: 10.1121/1.4829661]

4.1 Introduction

Approximately 1,200 species of bats exist worldwide and nearly 1,000 of these rely primarily on the active probing of echolocation to gather information about their surroundings [1]. Many bat species appear to have evolved different strategies for hunting, foraging, and navigation. The ultrasonic echolocation signals used by bats are generally classified as constant frequency, frequency-modulated (FM), or a combination of the two. Depending on the species, sound is emitted either through the bat's mouth or through a noseleaf, both of which have unique and highly complex structural properties. These reflective surfaces direct the sound in a manner that is highly frequency dependent [2]. The spatial directivity of the echolocation sound is known as the transmit beam pattern. Combined with the receive patterns of the ears, these beam patterns control the spatial information that is fundamental to echolocation.

The most commonly reported measurements of biosonar transmit beam patterns are made in controlled laboratory environments. Beam measurements from a single echolocation signal are traditionally limited to cross-sections with line arrays arranged in azimuth, elevation, or both. Another common approach combines the signals received over multiple echolocation calls. This averaging technique works well if the beam pattern is guaranteed to remain constant throughout the experiment.

One of the earliest reported observations of transmit beam directivity of echolocating bats was by Griffin [3]. Following these early studies of little brown bats (Myotis lucifugus), Simmons published the beam patterns of the mustached bat (Pteronotus parnellii) and the big brown bat (Eptesicus fuscus) while the bats were stationed on a platform with a four-channel microphone array [4]. Detailed measurements for E. fuscus were later published by Hartley and Suthers, who used a single microphone and combined the measurements over multiple echolocation calls [5]. They found that a reasonable approximation to the big brown bat's beam in azimuth was a circular piston transducer with an acoustic aperture comparable to the width of the mouth (4.7 mm radius). Ghose and Moss [6] reconstructed the beams used by E. fuscus in flight using the envelope of a narrow frequency band centered at 35 kHz, which corresponds to the strongest peak in the fundamental harmonic component. Interestingly, some [5, 7] have noted that E. fuscus emits a beam with two distinct vertical lobes. In a field study, Surlykke et al.
recorded signals emitted by Myotis daubentonii, estimated the beam pattern by tracking a single bat, and averaged over multiple approaches toward a four-channel microphone array [8].

Despite the many different approaches to measuring the bat's echolocation beam, until recently these studies have assumed a static transmit beam. Investigations by Yovel et al. noted that the Egyptian fruit bat (Rousettus aegyptiacus) points its echolocation beam slightly off axis to simultaneously optimize both target detection and localization during flight [9]. Matsuta et al. designed a 31-channel microphone array to measure the dynamic beams of Rhinolophus ferrumequinum [10]. This array included an "O-shaped" planar dimension in addition to the horizontal and vertical planes. Transmit beam measurements have also recently been reported from various vespertilionid species [11, 12, 13], two emballonurid bats [14], and Trachops cirrhosus [15].

In addition to empirical measurements of biosonar beam patterns, computational methods are now becoming practical. Müller developed a numerical technique based on finite element modeling that predicts the transmit and receive beams from computed tomography scans of the noseleaf and external ears (pinnae and tragus) for numerous bat species [2, 16, 17]. A large library of transmit and receive beam patterns has been assembled; however, this work currently excludes transmit beams of species producing echolocation sounds through the mouth. In several recent papers, Vanderelst et al. have used the finite element method to estimate bats' transmit and receive beams [18, 19, 20, 21].

Biosonar measurement systems are also being pioneered using underwater arrays of hydrophones. Investigations with echolocating marine mammals such as the bottlenose dolphin (Tursiops truncatus) and false killer whale (Pseudorca crassidens) have demonstrated with planar arrays that their beams are dynamic and may be shaped and/or steered depending on the echolocation task [22, 23]. Multi-element, high-resolution, underwater hydrophone arrays continue to provide information on the shape and dynamics of beam patterns of odontocetes [24, 25].

Although aerial and undersea echolocating mammals have evolved unique acoustic structures and waveforms, they do exhibit similar performance characteristics [26]. It may be useful to quantitatively compare the adaptive beamforming techniques between bats and cetaceans. These investigations into the dynamics of beam formation would provide insight into biosonar target localization and tracking strategies that may have significant implications for improving man-made sonar and radar systems. In this paper we introduce and describe a new apparatus and method for recording bat echolocation beams in the laboratory using low-cost, commercially available microphones, providing unprecedented resolution and accuracy for measuring bats' dynamic echolocation beams. Beam measurements from both a man-made projector and the big brown bat (E. fuscus) are shown for system demonstration.

4.2 Data Collection

A large reconfigurable microphone array was designed and constructed using low-cost ultrasonic microphones and custom analog and digital interface electronics. The microphone units are silicon integrated circuits based on micro-electro-mechanical systems (MEMS) technology (SPM0404UD5, Knowles Acoustics, Itasca, IL). The array backplane is 1.42 m tall by 1.83 m wide and consists of 16 printed circuit board panels attached to a machined aluminum frame.
The surface of the array is covered with 2.54 cm thick acoustic foam panels (Class A™ Melamine Foam, American Micro Industries, Chambersburg, PA) with cutouts for the 2.0 cm by 1.3 cm microphone preamplifier circuit boards. The foam panels reduce echo backscatter by approximately 15 dB across the frequency bands used by echolocating bats. Measurements show that the presence of the surrounding foam does not affect the omni-directional response of the MEMS microphones.

Sensors can be placed anywhere on the planar array on a 2.54 cm pitch grid. For a sound source at 1 meter, centered normal to the array plane, the maximum beam coverage in azimuth and elevation is 84° and 70°, respectively. The microphones were initially positioned uniformly on the planar array; the angular spacing between elements therefore varied with angle. Based on the geometry, a minimum element spacing of 3.39° horizontal and 5.18° vertical occurred at the edges of the array, and a maximum element spacing of 5.80° horizontal and 7.24° vertical at the center of the array.

Acoustic signals transduced by the microphones are band-pass filtered between 10 kHz and 120 kHz, amplified, and synchronously sampled on 228 channels at a programmable rate up to 500 kHz. Digital signals are collected with a custom high-speed data recorder based on field programmable gate array (FPGA) technology. The FPGA's parallel interface design enables data to be simultaneously sampled from each channel's analog-to-digital converter. Figure 4.1 shows the fully assembled array and microphone preamplifier circuit boards.

Data can be recorded either continuously or in short bursts triggered by an echolocation call received on a separate microphone. When using the trigger system, the amplitude envelope of a monitored microphone is compared with a threshold in real time, which enables the recording system for 10–20 ms. This duration is sufficient to capture the signal on each of the microphones in the array while providing the ability to record a much longer experiment without exceeding data processing and storage limits.

Figure 4.1. (a) Photograph of the fully constructed microphone array. Acoustic foam (not pictured) with microphone cutouts was placed on the face of the array to reduce echo backscatter during beam pattern measurements. With acoustic foam installed, the reflected energy was attenuated by approximately 15 dB across the entire frequency band of 10 kHz to 100 kHz. (b) Close-up view of a microphone preamplifier circuit board showing the integrated MEMS microphone unit. Preamplifier and filter circuitry are located on the back of the circuit board. Mechanical alignment, power distribution, and signal routing are provided by the backplane.

4.3 Methods

4.3.1 Beam Pattern Reconstruction

Recorded acoustic data are post-processed using functions written in MATLAB. Figure 4.2 shows a flow diagram of the entire beam reconstruction process. Each data channel is first mapped to its known planar array coordinates. Sonar signals are then identified by an energy detector. The next series of steps is performed iteratively over each identified sonar signal. To accurately reconstruct the beam pattern from each signal, array data must be mapped from the planar array coordinates to a spherical coordinate system centered at the source of the sound (see Figure 4.3). The echolocation calls are localized in azimuth, elevation, and range using time difference of arrival (TDOA) to triangulate the sound's point of origin [27].
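The localization step can be posed as a linear least-squares problem once the TDOAs are converted to range differences. The sketch below is a generic linearized solver in this spirit, not necessarily the closed-form algorithm of [27]; P, tau, and c are illustrative names.

```matlab
% Sketch of TDOA localization: linearize the range-difference equations
% about a reference microphone and solve by least squares. P is M-by-3
% microphone positions (m), tau is (M-1)-by-1 delays relative to mic 1 (s),
% and c is the sound speed (m/s). Not necessarily the method of [27].
function src = tdoa_localize(P, tau, c)
    d  = c * tau(:);                 % range differences re microphone 1
    p1 = P(1,:);  Pi = P(2:end,:);
    A  = [2*(Pi - p1), 2*d];         % unknowns: [x y z r1], r1 = range to mic 1
    b  = sum(Pi.^2, 2) - sum(p1.^2) - d.^2;
    u  = A \ b;                      % least-squares solution
    src = u(1:3).';                  % estimated point of origin
end
```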
Once the position of the source is known, signals from each channel are aligned in time. The echolocation calls emitted by most species of bats consist of multiple non-stationary harmonic components, and the multi-component signals produced by some bat species overlap in frequency; therefore, each harmonic component was analyzed individually. Components are first identified in the data by application of the fractional Fourier transform (FrFT) [28] and then extracted using a time-variant zero-phase filter as outlined in [29]. Separation of harmonics is necessary to examine the possibility that the beams change over time within the same broadcast. Another benefit of this approach is that it improves signal-to-noise ratio by removing non-interfering echoes and reverberation from the data.

Although the MEMS sensor-to-sensor magnitude and phase responses agree very well due to tightly controlled fabrication processes, the frequency response is not flat. To correct for this variability, each channel is digitally equalized with a zero-phase auto-regressive moving-average (ARMA) filter [30] that inverts the frequency response of the microphones, preamplifiers, and digital converter circuitry based on calibration data from each microphone. A zero-phase filter is necessary to avoid introducing frequency-dependent phase shifts and group delay effects. Additional details of the calibration procedure are discussed in the following section.

Figure 4.2. Flow chart describing the signal processing steps to reconstruct each beam. After identifying each echolocation call in a data set, calls are processed iteratively to reconstruct the beam patterns. Once complete, the reduced set of beam data can be visualized and analyzed.

The Euclidean distance between the sound source and each microphone varies significantly at close range. Furthermore, frequency-dependent absorption becomes dominant for the high frequencies considered here at only one to two meters in distance. Given the distance, d, between each microphone and the sound's point of origin, transmission loss effects due to both spherical spreading and frequency-dependent absorption are estimated and corrected by computational means. Spherical spreading losses contribute an overall attenuation in pressure proportional to 1/d, independent of frequency. The atmospheric absorption coefficient, α, varies significantly as a function of frequency and is dependent upon the environmental conditions of ambient temperature, relative humidity, and atmospheric pressure [31].

Figure 4.3. Diagram showing microphone sensor positions mapped to spherical coordinates with the sound source positioned at the origin. In the example shown, the planar array is located 1 m from a point source centered about the middle of the array coordinates. Beam pattern coverage and resolution depend upon the sound source position relative to the array.
Once absorption has been calculated in conventional units of dB/m, the combined transmission loss at a specific distance, d, can be estimated as a function of frequency:

$$TL(d, f) = 20\log_{10}\!\left(\frac{d}{d_0}\right) + \alpha(f)(d - d_0) \qquad (4.1)$$

where $d_0$ is the reference distance of the sound source (typically 0.1 m for bat sonar). The desired magnitude response to correct for transmission losses in pressure measurements at a particular distance, d, is therefore

$$H_d(f) = 10^{TL(d,f)/20}. \qquad (4.2)$$

Attenuation in physical systems implies the presence of dispersion and phase shift to guarantee causality [32]; however, the effects on phase can be ignored at the acoustic frequencies of interest here. Therefore, transmission losses exhibit a low-pass filter response that is well modeled by a zero-phase moving-average (MA) filter [14]. This transfer function model is unique for each source-sensor pair and is not pre-calculated as for microphone channel equalization.

The final steps in reconstructing the beam pattern are to estimate the frequency spectrum at each microphone position and interpolate along the angular coordinates. The magnitude and phase response of each harmonic component is extracted through spectral analysis via the fast Fourier transform (FFT). The values at each spatial angle are interpolated across a fine uniform grid with 1° resolution to simplify visualization and data analysis. Interpolation is achieved using the natural neighbor interpolation method on linear units of amplitude [33].

4.3.2 Microphone and System Calibration

A detailed calibration of each microphone channel and supporting electronics was performed to ensure meaningful acoustic beam measurements. A custom electrostatic transducer with a fixed circular aperture of 2.0 cm was used as a broadband sound source to validate the array measurement system [34]. For a reference measurement, the projector was positioned 10 cm from a calibrated 1/4" ultrasonic microphone (Series 4135, Brüel & Kjær, Nærum, Denmark) in the free field. The projector emitted ten identical linear FM chirps with 2 ms duration from 110 kHz down to 10 kHz. Each microphone channel on the array was then tested individually with the same set of ten pulses from 10 cm. The projector's distance and orientation were physically constrained to minimize measurement error, and acoustic foam was installed on the array prior to taking the measurements.

Time series data from the reference measurement, x[k], and each array microphone channel, $y_i[k]$, were processed to estimate the frequency spectra of the transduced signals. The FFT of each signal provided an estimate of the frequency spectrum. Spectra from multiple pulses on a given microphone were averaged together. The transfer function, $H_i(z)$, for each sensor and its supporting electronics was then calculated as

$$H_i(z) = \frac{Y_i(z)}{X(z)}, \quad i = 1, 2 \ldots N \qquad (4.3)$$

where $Y_i(z)$ is the frequency spectrum of each of N array microphone channels and X(z) is the frequency spectrum of the reference microphone. Frequencies not covered by the FM pulse contained only noise and were forced to unity.
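A minimal sketch of this per-channel estimate, assuming the K repeated calibration pulses have been time-aligned and stored as columns of x (reference) and y (channel i); all names are illustrative.

```matlab
% Sketch of Eq. (4.3): average the spectra of K time-aligned calibration
% chirps, divide the channel spectrum by the reference spectrum, and force
% out-of-band bins to unity. x and y are L-by-K matrices; fs in Hz.
Nfft = 2^nextpow2(size(x,1));
X = mean(fft(x, Nfft, 1), 2);            % averaged reference spectrum
Y = mean(fft(y, Nfft, 1), 2);            % averaged channel-i spectrum
H = Y ./ X;                              % Eq. (4.3)
f = (0:Nfft-1).' * fs/Nfft;
band = (f >= 10e3 & f <= 110e3) | (f >= fs-110e3 & f <= fs-10e3);
H(~band) = 1;                            % out-of-band bins carry only noise
```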
Given the magnitude and phase response of a linear time-invariant system, an ARMA model can be created for that system to mimic or reverse its response. The advantage of using such a model over a simpler all-zero model is that it can directly model the physical resonances in a system using its poles, while the zeros match nulls in the response and reduce any residual error. For an equalizer, the desired model is the inverse of the system's frequency response, $H_i^{EQ}(z) = H_i^{-1}(z)$. In this case, special care must be taken to ensure that the inverse filter remains stable or can be made stable. The procedure to generate a zero-phase ARMA model is based on an initial estimate using Prony's method [30] followed by iterative refinement with a frequency-domain Steiglitz-McBride algorithm [35]. Once the model coefficients are defined for each channel, data are equalized by passing through the zero-phase filter.
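In MATLAB, the zero-phase application of such an inverse model might be sketched as below; b_i and a_i are illustrative names for the fitted numerator and denominator coefficients of channel i. Note that filtfilt applies a filter forward and backward, so its magnitude correction is applied twice; to realize an exact magnitude target this way, the model would be fit to the square root of the desired response.

```matlab
% Sketch of zero-phase equalization with an inverse ARMA channel model.
% Swapping numerator and denominator applies 1/H_i(z); filtfilt makes the
% net phase response zero (squaring the magnitude response, see above).
% The inverted model is stable only if the zeros of b_i lie inside the
% unit circle; reflect any outliers before filtering.
assert(all(abs(roots(b_i)) < 1), 'inverse model is unstable');
y_eq = filtfilt(a_i, b_i, y);    % equalized, zero-phase channel data
```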
An important specification for the acoustic measurement system is the full-scale sound pressure level (SPL) at the face of the array, which is the equivalent RMS sound pressure level that would produce saturation of the analog-to-digital converters. Based on the calibration measurements and the derived voltage-to-acoustic conversion factor, the full-scale sound pressure level at the face of the array is 127 dB SPL (re 20 µPa). For reference, acoustic source levels of some representative bat echolocation signals are typically 110 dB SPL (re 20 µPa @ 0.1 m) for aerial feeding bats such as E. fuscus and 100 dB SPL for smaller "whispering bats" such as Artibeus jamaicensis [1, 26, 36]. The loudest bat species to have been reported, the lesser and greater bulldog bats (Noctilio albiventris and Noctilio leporinus, respectively), produce echolocation sounds up to 140 dB SPL (re 20 µPa @ 0.1 m) [37]. Another important specification, instantaneous dynamic range, is approximately 110 dB across the entire frequency range including signal processing gain. A programmable gain up to 30 dB can be applied to all channels to record less intense signals.

4.4 Results

4.4.1 Example Beam Pattern of a Circular Electrostatic Projector

The same electrostatic projector and FM waveform used for calibration were also used to validate the acoustic beam measurements and post-processing functions. The 2 cm diameter projector provides a symmetrical circular beam that is highly repeatable and may be quantitatively compared with the expected beam from a theoretical model of the transducer. For this validation measurement, the projector was moved normal to the center of the array at 1 meter distance. A sampling rate of 235 kHz was used during these measurements, which provided sufficient frequency coverage without aliasing effects.

Figure 4.4. Aspect view and contour plot of the reconstructed transmit beam pattern of a 2 cm diameter transducer at its resonant frequency of 60 kHz. The transducer was centered normal to the array at 1 m distance and emitted a 2 ms broadband linear FM pulse from 110 kHz down to 10 kHz. The dB units are normalized to the peak at the maximum response axis.

The half-power beam width, $\beta_{3dB}$, is defined as the angular width of the beam pattern at the 3 dB cutoff points. As with any fixed-aperture transducer, the beam width varies inversely with frequency. $\beta_{3dB}$ was measured for the projector along the horizontal axis to be 47.0°, 21.9°, and 12.3° at 30 kHz, 60 kHz, and 90 kHz, respectively. Sidelobe peak levels were approximately 15 dB below the main lobe at 60 kHz. Sidelobes could not be verified at 30 kHz due to coverage limitations and at 90 kHz due to the projector's limited source level above 60 kHz. Figure 4.4 shows the reconstructed beam pattern for the transducer at its resonant frequency of 60 kHz.

The theoretical model of a piston transducer with an infinite baffle is defined [38] as

$$D_\omega(\theta) = \left(\frac{2J_1\!\left(\pi\frac{d}{\lambda}\sin\theta\right)}{\pi\frac{d}{\lambda}\sin\theta}\right)^{\!2} \qquad (4.4)$$

where $D_\omega(\theta)$ is the one-dimensional beam pattern at acoustic frequency ω and angle θ, d is the diameter of the transducer, λ is the wavelength in the medium at frequency ω, and $J_1$ is the Bessel function of the first kind and order 1. This model is useful to verify several characteristics of the measured beam pattern. Specifically, $\beta_{3dB}$ of the main lobe can be quantitatively compared with measured results for a given frequency, and the sidelobe levels can be verified. Figure 4.5 shows the theoretical beam patterns at several different frequencies for a piston transducer with an infinite baffle.

Figure 4.5. Theoretical beam pattern of a piston transducer with 2 cm diameter in air. The beam pattern is frequency dependent such that the beam width scales inversely with frequency. Side-lobes are also predicted by the model. The first and second side-lobes are approximately 17 dB and 24 dB lower than the main lobe, respectively. The beam model is axially symmetric and the main response axis is normal to the transducer at all frequencies.

An approximation to $\beta_{3dB}$ can be made as follows:

$$\beta_{3dB} = 2\sin^{-1}\!\left(0.514\,\frac{\lambda}{d}\right). \qquad (4.5)$$

This model predicts $\beta_{3dB}$ = 34.3°, 16.9°, and 11.24° at 30, 60, and 90 kHz for a 2 cm diameter. Based on the projector's measured $\beta_{3dB}$, the data align well with an effective aperture of 1.6 cm, 20% less than its physical aperture of 2.0 cm. This discrepancy is likely due to a combination of the smaller diameter of the active components internal to the membrane (sintered disk) and the added stiffness at the edges of the transducer where the membrane is held securely in place. The beam pattern depends directly on the wavelength of the sound, which in turn depends on the speed of sound in the medium. Although sound speed does change with temperature, its sensitivity is minimal at room temperature (1% for a 10°C change) and would not explain the 20% difference in beam width.
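Eqs. (4.4) and (4.5) are simple to evaluate numerically; the sketch below reproduces the predicted 60 kHz curve and beam width, assuming air at room temperature (c = 343 m/s, an assumed value).

```matlab
% Sketch of the piston-in-an-infinite-baffle model, Eqs. (4.4)-(4.5).
c = 343;  d = 0.02;  f = 60e3;      % sound speed (m/s), aperture (m), Hz
lambda = c/f;
theta  = (-40:0.1:40) * pi/180;
u = pi*(d/lambda)*sin(theta);
D = (2*besselj(1, u) ./ u).^2;      % Eq. (4.4)
D(u == 0) = 1;                      % removable singularity at broadside
beta3dB = 2*asind(0.514*lambda/d);  % Eq. (4.5): about 16.9 deg at 60 kHz
plot(theta*180/pi, 10*log10(D)); ylim([-30 0]);
```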
4.4.2 Example Beam Pattern of the Big Brown Bat, Eptesicus fuscus

Echolocation calls from the big brown bat, E. fuscus, were recorded and processed to demonstrate the ability of the measurement system to record biosonar beam patterns. Three bats were trained to perform a target detection task while stationary on a platform 1 m from the array, and all emitted sonar signals from each trial were recorded. The reconstructed beam pattern for one example call is shown in Figure 4.6 for the frequencies of 40 kHz, 60 kHz, and 80 kHz. Results appear comparable to previous beam pattern measurements of E. fuscus [5, 6] and reasonably match the theoretical beam widths of $\beta_{3dB}$ = 56.1°, 36.5°, and 27.2° at 40, 60, and 80 kHz produced by a piston transducer with a 9.4 mm diameter.

Figure 4.6. Aspect view and 6 dB contour plot of the reconstructed beam patterns for a single E. fuscus transmit pulse. The frequency-dependent beam magnitudes at 40 kHz (a), 60 kHz (b), and 80 kHz (c) are shown on a normalized magnitude (dB) scale. Beam widths at these frequencies appear consistent with past measurements for this species and can be approximated by a circular piston transducer with a fixed 4.7 mm radius. Color scale is used to reinforce the vertical axis.

4.5 Discussion

With this array system we have constructed a tool of fine spatial resolution, high sampling rate, and rapid data collection that allows investigations into the bat's dynamic sonar beam. The relatively recent development of MEMS technology allows mass production of low-cost sensors. The microphones used here have an extremely small acoustic aperture that ensures omni-directionality; however, current MEMS microphones introduce significant variability in the frequency response that must be equalized through careful calibration. Validation of the array was performed with a custom-built piston transducer and compared against a theoretical model. An example beam pattern from a single E. fuscus call was shown to demonstrate the usefulness of the array in capturing biosonar beams. The array is primarily intended to be used with echolocating bats in a controlled laboratory environment, although measurements in the field would be possible after modifications to the mechanical assembly. As described in Sec. 4.4, beam measurements are being made of bats performing on a static platform. Future experiments are being planned with bats flying through an obstacle course.

Systems containing a large number of sensors and supporting electronics are inherently more complex. Every additional sensor channel and electronic component reduces the mean time between failures, so it is common for multi-element arrays to exhibit failed or degraded components over their supported lifetime. Two possible solutions to this problem are to 1) ensure a high degree of quality in component selection and manufacturing processes, and 2) design for maintainability. Our system aims to be a relatively low-cost solution to high-resolution acoustic measurement. Although high-quality ultrasonic microphones are commercially available, it is not yet feasible to use hundreds of these in a dense array due to their significant cost. To address reliability and repeatability concerns, the microphone boards were assembled by automation rather than by hand. The array was also designed with maintainability in mind: the pluggable circuit boards containing the microphone preamplifiers are easily replaced.

Another difficulty in developing measurement systems with high channel counts is that massive amounts of data must be simultaneously stored, processed, validated, and analyzed. This is where a parallel computing platform (i.e., the FPGA) outperforms even the fastest digital signal processor. Consistent growth in data storage and throughput has been a necessary enabler for the amounts of digital data captured by a measurement system with 100 or more channels. Validating and analyzing large amounts of information requires automation to perform data reduction and signal processing tasks. The trend toward more measurement sensors will continue to be facilitated by keeping pace with advances in sensing and computational technologies.

Beyond the hardware, advanced signal processing techniques are used to approach the beam reconstruction in a novel way. Newly developed algorithms to separate multiple harmonic components of biosonar signals were used. These techniques improve signal-to-noise ratio and allow more accurate tracking of energy across time and frequency.
Beam pattern measurements are conventionally performed with Fourier analysis; however, separating multiple components allows other decompositions (such as Hilbert spectral analysis) to be used that may be better suited for certain signals [29]. ARMA and MA modeling were introduced to better equalize the response of each microphone and reverse frequency-dependent absorption effects. These zero-phase filters eliminate the frequency-dependent group delay that is inherent in any causal linear, time-invariant model. Although non-causal, this approach works well for post-processed data and reduces phase errors to within machine precision.

Experimental design played a significant role in the measurement accuracy. Initial data collected with a static platform showed significant spatial interference patterns in the form of frequency-dependent vertical notches. It was found that direct-path signals received at the array were combined with a slightly delayed interfering reflection off the platform itself. Modifications were made to the platform to reduce its length, tilt it forward, and cover it with more attenuative fabric. The echolocation experiment was also modified to ensure that animals were echolocating from the front of the platform rather than the rear. Bats naturally perch upside-down, so it may be feasible to eliminate platform echoes by training some species to echolocate while inverted from a wire.

The goal of any measurement system is to sense without interfering with the phenomenon being sensed. The microphone array was primarily designed to look at beam pattern emissions from echolocating bats during psychophysical experiments. The potential behavioral disturbance of introducing a large panel directly in front of the area to be measured was of concern. To mitigate this potential problem, the array was covered with acoustic foam, minimizing backscatter from the array. This proved to be effective, and animals were trained successfully in a detection task on a stationary platform at 1 m facing the array. The experimental design required that the detection object be located between the array and the stationary platform. Preliminary data demonstrate that proximity to the array does not impact echolocation ability.

The array location during in-flight experiments also needs to be carefully considered. Many experiments in a controlled flight room are designed to test the animal's echolocation ability while surrounded by dense clutter. The unfortunate problem is that physical objects used to alter the echolocation behavior also interfere with the free-field reception of the transmit beam. Only careful experimental design and data analysis can ensure no physical objects are interfering with the beam measurement.

Although E. fuscus was previously found to emit two distinct ventral lobes, data collected with this array do not contain any evidence for multiple lobes. This does not constitute sufficient evidence against E. fuscus emitting dual lobes. Rather, this characteristic was simply not observed under the circumstances of one particular stationary echolocation task. Given prior evidence of adaptive beam patterns, it would not be surprising if this ventral lobe were selectively used where beneficial to echolocation and suppressed otherwise. Additional experiments will be carried out to explore this discrepancy further.

Future work with this measurement system is already underway.
By combining high-resolution acoustic measurements of bats' transmit beams with high-resolution, high-speed video during psychophysical experiments, we are investigating the dynamics of bat echolocation and the relationship of beam adjustments to target detection, localization, and tracking.

4.6 Acknowledgments

The authors thank Leland Jackson (U. Rhode Island) for many interesting discussions on ARMA modeling and John Buck (U. Mass. Dartmouth) for helpful suggestions on beam pattern reconstruction. This work was supported by internal investment funding from the Naval Undersea Warfare Center, Division Newport, RI, ONR Grant No. N00014-09-1-0691, and NSF Grant No. DBI-1202833.

References

[1] G. Neuweiler, The Biology of Bats (Oxford University Press, New York, 2000), p. 320.
[2] R. Müller, “Numerical analysis of biosonar beamforming mechanisms and strategies in bats”, J. Acoust. Soc. Am. 128, 1414–1425 (2010).
[3] D. Griffin, Listening in the Dark, The Acoustic Orientation of Bats and Men (Cornell University Press, London, 1958), p. 415.
[4] J. A. Simmons, “Acoustic radiation patterns for the echolocating bats Chilonycteris rubiginosa and Eptesicus fuscus”, J. Acoust. Soc. Am. 46, 1054–1056 (1969).
[5] D. Hartley and R. Suthers, “The sound emission pattern of the echolocating bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 85, 1348–1351 (1989).
[6] K. Ghose and C. Moss, “The sonar beam pattern of a flying bat as it tracks tethered insects”, J. Acoust. Soc. Am. 114, 1120–1131 (2003).
[7] K. Ghose, C. Moss, and T. Horiuchi, “Flying big brown bats emit a beam with two lobes in the vertical plane”, J. Acoust. Soc. Am. 122, 3717–3724 (2007).
[8] A. Surlykke, S. Boel Pedersen, and L. Jakobsen, “Echolocating bats emit a highly directional sonar sound beam in the field”, Proc. R. Soc. B 276, 853–860 (2009).
[9] Y. Yovel, B. Falk, C. F. Moss, and N. Ulanovsky, “Optimal localization by pointing off axis”, Science 327, 701–704 (2010).
[10] N. Matsuta, S. Hiryu, E. Fujioka, Y. Yamada, H. Riquimaroux, and Y. Watanabe, “Adaptive beam-width control of echolocation sounds by CF-FM bats, Rhinolophus ferrumequinum nippon, during prey-capture flight”, J. Exp. Biol. 216, 1210–1218 (2013).
[11] L. Jakobsen, J. M. Ratcliffe, and A. Surlykke, “Convergent acoustic field of view in echolocating bats”, Nature 493, 93–96 (2014).
[12] L. Jakobsen, S. Brinkløv, and A. Surlykke, “Intensity and directionality of bat echolocation signals”, Front. Physiol. 4, 1–9 (2013).
[13] L. Jakobsen and A. Surlykke, “Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit”, Proc. Natl. Acad. Sci. U.S.A. 107, 13930–13935 (2010).
[14] L. Jakobsen, E. K. V. Kalko, and A. Surlykke, “Echolocation beam shape in emballonurid bats, Saccopteryx bilineata and Cormura brevirostris”, Behav. Ecol. Sociobiol. 66, 1493–1502 (2012).
[15] A. Surlykke, L. Jakobsen, E. K. V. Kalko, and R. A. Page, “Echolocation intensity and directionality of perching and flying fringe-lipped bats, Trachops cirrhosus (Phyllostomidae)”, Front. Physiol. 4, 1–9 (2013).
[16] R. Müller, “A numerical study of the role of the tragus in the big brown bat”, J. Acoust. Soc. Am. 116, 3701–3712 (2004).
[17] R. Müller and J. C. T. Hallam, “Knowledge mining for biomimetic smart antenna shapes”, Rob. Autom. Syst. 50, 131–145 (2005).
[18] D. Vanderelst, F. De Mey, H. Peremans, I. Geipel, E. Kalko, and U.
Firzlaff, “What noseleaves do for FM bats depends on their degree of sensorial specialization”, PLoS ONE 5, e11893 (2010).
[19] D. Vanderelst, J. Reijniers, J. Steckel, and H. Peremans, “Information generated by the moving pinnae of Rhinolophus rouxi: tuning of the morphology at different harmonics”, PLoS ONE 6, e20627 (2011).
[20] D. Vanderelst, R. Jonas, and P. Herbert, “The furrows of Rhinolophidae revisited”, J. R. Soc. Interface 9, 1100–1103 (2012).
[21] D. Vanderelst, Y. Lee, I. Geipel, E. Kalko, Y. M. Kuo, and H. Peremans, “The noseleaf of Rhinolophus formosae focuses the frequency modulated (FM) component of the calls”, Front. Physiol. 4, 1–8 (2013).
[22] P. W. Moore, L. A. Dankiewicz, and D. S. Houser, “Beamwidth control and angular target detection in an echolocating bottlenose dolphin (Tursiops truncatus)”, J. Acoust. Soc. Am. 124, 3324–3332 (2008).
[23] L. N. Kloepper, P. E. Nachtigall, M. J. Donahue, and M. Breese, “Active echolocation beam focusing in the false killer whale, Pseudorca crassidens”, J. Exp. Biol. 215, 1306–1312 (2012).
[24] J. Starkhammar, M. Amundin, J. Nilsson, T. Jansson, S. A. Kuczaj, M. Almqvist, and H. W. Persson, “47-channel burst-mode recording hydrophone system enabling measurements of the dynamic echolocation behavior of free-swimming dolphins”, J. Acoust. Soc. Am. 126, 959–962 (2009).
[25] J. W. Shaffer, D. Moretti, S. Jarvis, P. Tyack, and M. Johnson, “Effective beam pattern of the Blainville’s beaked whale (Mesoplodon densirostris) and implications for passive acoustic monitoring”, J. Acoust. Soc. Am. 133, 1770–1784 (2013).
[26] W. Au and J. Simmons, “Echolocation in dolphins and bats”, Phys. Today 60, 40–45 (2007).
[27] M. Gillette and H. Silverman, “A linear closed-form algorithm for source localization from time-differences of arrival”, IEEE Signal Process. Lett. 15, 1–4 (2008).
[28] O. Akay and G. Boudreaux-Bartels, “Fractional convolution and correlation via operator methods and an application to detection of linear FM signals”, IEEE Trans. Signal Process. 49, 979–993 (2001).
[29] J. DiCecco, J. E. Gaudette, and J. A. Simmons, “Multi-component separation and analysis of bat echolocation calls”, J. Acoust. Soc. Am. 133, 538–546 (2013).
[30] L. B. Jackson, Digital Filters and Signal Processing with MATLAB Exercises, 3rd ed. (Kluwer Academic, Norwell, MA, 1995), pp. 323–372.
[31] “ANSI S1.26-1995 (R2009) Method for Calculation of the Absorption of Sound by the Atmosphere”, American National Standards Institute, New York (2009).
[32] W. I. Futterman, “Dispersive body waves”, J. Geophys. Res. 67, 5279–5291 (1962).
[33] T. Bobach and G. Umlauf, “Natural neighbor concepts in scattered data interpolation and discrete function approximation”, in Proceedings of Visualization of Large Unstructured Data Sets, 23–35 (2007).
[34] J. A. Simmons, M. B. Fenton, W. R. Ferguson, M. Jutting, and J. Palin, Apparatus for Research on Animal Ultrasonic Signals (Royal Ontario Museum, Toronto, 1979), p. 10.
[35] L. B. Jackson, “Frequency-domain Steiglitz-McBride method for least-squares IIR filter design, ARMA modeling, and periodogram smoothing”, IEEE Signal Process. Lett. 15, 49–52 (2008).
[36] S. Brinkløv, E. K. V. Kalko, and A. Surlykke, “Intense echolocation calls from two ‘whispering’ bats, Artibeus jamaicensis and Macrophyllum macrophyllum (Phyllostomidae)”, J. Exp. Biol. 212, 11–20 (2009).
[37] A. Surlykke and E. K. V. Kalko, “Echolocating bats cry out loud to detect their prey”, PLoS ONE 3, e2036 (2008).
[38] R.
Chapter 5

Modeling Bio-Inspired Broadband Sonar for High-Resolution Angular Imaging

Abstract

Echolocating mammals perceive images of targets with hyper-resolution and navigate seamlessly through obstacles in complex acoustic environments. The biological solution to imaging with sound is vastly different from man-made sonar. The most prominent difference is that instead of imaging with narrow beams and large apertures, bats ensonify a large spatial region and exploit broadband echo information to acoustically focus with approximately one degree of angular resolution. Using the additional information in the spectrum, angular localization may therefore be redefined as a spectral pattern matching problem. Because imaging is performed with wide beams, this remarkable performance requires only a single broadband transmitter and two receive elements. Our computational modeling work provides new insight into the salient spatial information encoded by the bat's auditory system. Although the highly complex baffle structures found in biological sonar can increase the available information, we show they are not theoretically required for good spatial resolution. Replicating bio-inspired acoustic processing techniques in man-made systems can reduce sonar array aperture requirements by two orders of magnitude for a variety of aerial and underwater acoustic sensing applications. Modeling and simulation results show the feasibility of designing a bio-inspired broadband sonar system as a compact, high-resolution acoustic imaging solution. Also presented is a method for quantifying the theoretical limit to the resolving power for a given set of operating conditions and directivity patterns.

5.1 Introduction

This chapter begins by describing the environmental acoustics and transducer beam patterns relevant to biosonar. We then characterize the echo spectrum from reflective scatterers in the range-azimuth plane based on these models. This numerical result effectively quantifies the unique information contained in an echo arriving from any point in the range-azimuth plane. In an appendix, the physics-based model is adapted to the underwater environment. Elevation is omitted for simplicity, but including the additional dimension is straightforward. Our approach serves as a computational template for the design of a sonar system using micro-aperture broadband acoustic technology, or µBAT.

5.2 Modeling Broadband Acoustic Information

There are three fundamental ways in which broadband acoustic signals are transformed in the context of biosonar: 1) the physical environment, 2) transducer directivity patterns, and 3) reflective scatterer structure and composition. Each of these is explored independently and integrated at the end of this section.

5.2.1 Environmental Acoustics

5.2.1.1 The Transformation of Broadband Information in the Physical Environment

The first aspect to be modeled is the physical environment, specifically its impact on broadband signal propagation. A physics-based model was constructed that accounts for the two most prominent characteristics of acoustic transmission loss in the medium: geometrical spreading and acoustic absorption. Geometrical spreading losses are caused by acoustic waves propagating over an increasing volume. Since the amount of acoustic energy is finite, it spreads evenly over the surface area of the expanding wavefront.
In the simplest case of free-field propagation, sound spreads spherically in all directions and its energy density decays as a quadratic function of distance, d, proportional to 1/d² (a 6 dB loss per doubling of distance). There are cases in which sound energy cannot spread evenly in all directions (e.g., surface boundary layers, physical obstructions), but spherical spreading is the upper bound on this type of transmission loss. Because geometrical spreading is frequency independent, we consider here only free-field propagation and direct most of our attention to absorption losses.

Absorption of sound by the atmosphere is highly frequency dependent and generally imposes a low-pass filter effect on broadband acoustic waves. Unlike geometrical spreading, absorption is an exponential function of distance, proportional to $10^{-d\alpha/10}$, where distance, d, is specified in meters and the absorption coefficient, α, is defined in units of dB/m. It is caused by a combination of factors that become dominant in different frequency regions. The existing models of absorption are complicated functions of several environmental parameters including ambient temperature, T, atmospheric pressure, ρ, and relative humidity, h_r [1, 2, 3, 4]. The equations that describe these models have been rearranged in this section to make their dependence on frequency explicit. Doing so allows us to determine which aspects of absorption are most important to echolocation, and also to quantify the sensitivity of absorption to changing environmental parameters.

The primary components affecting sound absorption in air are attributed to "classical" physics (e.g., viscosity, heat conduction, and diffusion) and to molecular interactions with oxygen and nitrogen. Given T, ρ, and h_r, the attenuation coefficient in dB/m can be written as a function of frequency:

$$\alpha(f) = \alpha_{cr}(f) + \alpha_{vib,O}(f) + \alpha_{vib,N}(f). \quad (5.1)$$

Here, α_cr is the absorption component due to classical physics and molecular rotational relaxation; α_vib,O and α_vib,N are the components due to molecular vibrational relaxation of oxygen and nitrogen, respectively. Figure 5.1 shows how these components contribute to the total absorption coefficient.

Broadband echolocation signals occur on a rapid time scale and over relatively short distances. Therefore, we can treat the environmental parameters as constants and rewrite Equation 5.1 strictly as a function of frequency:

$$\alpha(f) = \hat{\alpha}_{cr}\, f^2 + \hat{\alpha}_{vib,O}\, \frac{f^2 F_{rO}}{f^2 + F_{rO}^2} + \hat{\alpha}_{vib,N}\, \frac{f^2 F_{rN}}{f^2 + F_{rN}^2}. \quad (5.2)$$

Parameters F_rO and F_rN are the scaled relaxation frequencies for oxygen and nitrogen and also depend on T, ρ, and h_r. The individual α̂ components are computed as

$$\hat{\alpha}_{cr} = 1.598\times10^{-10} \left(\frac{T}{T_0}\right)^{1/2} \left(\frac{\rho}{\rho_0}\right)^{-1}, \quad (5.3)$$

$$\hat{\alpha}_{vib,O} = 1.107\times10^{-1} \left(\frac{T}{T_0}\right)^{-5/2} e^{-2239.1/T}, \quad (5.4)$$

$$\hat{\alpha}_{vib,N} = 9.277\times10^{-1} \left(\frac{T}{T_0}\right)^{-5/2} e^{-3352.0/T}, \quad (5.5)$$

where the constant T₀ is the standard temperature of 293.15 K and ρ₀ is the reference pressure of 1 atm. The equations for F_rO and F_rN are

$$F_{rO} = \frac{\rho}{\rho_0} \left(24 + 4.04\times10^{4}\, \frac{0.02h + h^2}{0.391 + h}\right), \quad (5.6)$$

$$F_{rN} = \frac{\rho}{\rho_0} \left(\frac{T}{T_0}\right)^{-1/2} \left(9 + 280h\, e^{4.170\left(1 - (T/T_0)^{-1/3}\right)}\right), \quad (5.7)$$

where T₀ and ρ₀ are defined as above and h is the molar concentration of water vapor computed from the estimates of relative humidity and temperature. With the frequency dependence factored out in Equation 5.2, these components reduce to constants.
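To make the model concrete, the following minimal sketch (in Python with NumPy; all function names are illustrative, not from the original work) evaluates Equations 5.2 through 5.7. The estimate of h from relative humidity uses the saturation vapor pressure approximation found in the absorption standards cited above; the text itself leaves that step implicit, so it should be treated as an assumption.

    import numpy as np

    T0 = 293.15   # reference temperature (K), Eqs. (5.3)-(5.7)

    def absorption_coeff(f, T=293.15, rho=1.0, hr=50.0, h=None):
        """Atmospheric absorption coefficient alpha(f) in dB/m, Eq. (5.2).

        f   : frequency in Hz (scalar or array)
        T   : ambient temperature (K)
        rho : atmospheric pressure (atm)
        hr  : relative humidity (%)
        h   : molar concentration of water vapor (%); if None it is
              estimated from hr via the ISO 9613-1 saturation-pressure
              approximation (an assumption; the text omits this step)
        """
        if h is None:
            psat = 10.0 ** (-6.8346 * (273.16 / T) ** 1.261 + 4.6151)
            h = hr * psat / rho

        # Scaled relaxation frequencies for O2 and N2, Eqs. (5.6)-(5.7)
        FrO = rho * (24.0 + 4.04e4 * (0.02 * h + h ** 2) / (0.391 + h))
        FrN = rho * (T / T0) ** -0.5 * (
            9.0 + 280.0 * h * np.exp(4.170 * (1.0 - (T / T0) ** (-1.0 / 3.0))))

        # Frequency-independent components, Eqs. (5.3)-(5.5)
        a_cr = 1.598e-10 * (T / T0) ** 0.5 / rho
        a_O = 1.107e-1 * (T / T0) ** -2.5 * np.exp(-2239.1 / T)
        a_N = 9.277e-1 * (T / T0) ** -2.5 * np.exp(-3352.0 / T)

        # Total absorption, Eq. (5.2)
        f2 = np.asarray(f, dtype=float) ** 2
        return (a_cr * f2
                + a_O * f2 * FrO / (f2 + FrO ** 2)
                + a_N * f2 * FrN / (f2 + FrN ** 2))

At the nominal conditions used below (20°C, 50% relative humidity, 1 atm), this sketch reproduces the constants and relaxation frequencies quoted in the next paragraph.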
At nominal environmental conditions (T = 20°C, h_r = 50%, and ρ = 1 atm), we calculate α̂_cr = 1.598×10⁻¹⁰, α̂_vib,O = 5.334×10⁻⁵, α̂_vib,N = 1.004×10⁻⁵, F_rN = 332.1 Hz, and F_rO = 35.45 kHz. For frequencies below about 1 kHz, α_vib,N dominates; between approximately 1 kHz and 100 kHz, α_vib,O is dominant; and above 100 kHz, α_cr is dominant. Figure 5.2 plots the absorption coefficient vs. frequency calculated at temperatures between 0°C and 40°C at 1 atm.

Figure 5.1. The total absorption effect in air is the combined result of three individual components that dominate in different frequency regions. Investigating these components through superposition enhances our understanding and allows us to organize this complex process into simpler models. F_rN and F_rO are the parameters that determine the cutoff frequencies where α̂_vib,N and α̂_vib,O saturate. From this separation of parameters, we see that α_cr and α_vib,O are the dominant characteristics in ultrasonic echolocation signals.

Figure 5.2. Absorption vs. frequency at 50% relative humidity plotted for temperatures between 0°C and 40°C in steps of 5°. The absorption coefficient curves are normalized to standard atmospheric pressure (1 atm). Echolocating bats operate their sonar at ultrasonic frequencies within the transition region of α_vib,O and α_cr.

5.2.1.2 Application of Broadband Transmission Loss to the Active Sonar Equation

We are interested in understanding the total spectral effect on a broadband signal as it propagates over distance. Therefore, we define the transmission loss, TL, as a function of frequency and distance based upon spherical spreading and absorption (see Fig. 5.3):

$$TL(f, d) = TL_{spr}(d) + TL_{abs}(f, d) \quad (5.8)$$

with

$$TL_{spr}(d) = 20 \log_{10} \frac{d}{d_0}, \quad (5.9)$$

$$TL_{abs}(f, d) = \alpha(f)\,(d - d_0). \quad (5.10)$$

Here d₀ is taken to be the reference distance for sound pressure level (SPL re 1 µPa @ d₀) in the sonar equations [5]. For the relatively short distances considered in bat sonar, d₀ = 0.1 m is generally considered a reasonable value. When modeling distances d ≫ d₀, Equation 5.10 typically reduces to

$$TL_{abs}(f, d) \approx \alpha(f) \times d. \quad (5.11)$$

Figure 5.3. (a) Spherical spreading loss is a quadratic function of distance, TL_spr(d). (b) Frequency-dependent absorption losses are an exponential function of distance, TL_abs(f, d). Nominal values of 20°C, 50% rh, and 1 atm were chosen for the environmental parameters in calculating the absorption coefficient. (c) The combined transmission loss due to both spreading and absorption, TL(f, d).
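Transmission loss then follows directly from Equations 5.8 through 5.10; a short continuation of the sketch above (again illustrative, reusing absorption_coeff):

    def transmission_loss(f, d, d0=0.1, T=293.15, rho=1.0, hr=50.0):
        """One-way transmission loss TL(f, d) in dB, Eq. (5.8).

        Spherical spreading (Eq. 5.9) referenced to d0 = 0.1 m plus
        frequency-dependent atmospheric absorption (Eq. 5.10).
        """
        TL_spr = 20.0 * np.log10(d / d0)
        TL_abs = absorption_coeff(f, T=T, rho=rho, hr=hr) * (d - d0)
        return TL_spr + TL_abs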
Applying the sonar equation for an active broadband system, the echo strength (ES) at the face of the receive sensors can be estimated as a function of frequency and distance as

$$ES(f, d) = SL(f) - 2\,TL(f, d) + TS(f) \quad (5.12)$$

where knowledge of the source level (SL) and target strength (TS) for each scattering reflector is either given a priori or estimated through iteration. Figure 5.4 shows the relative echo strength for an ideal point reflector having a flat 0 dB TS across the entire spectrum. The consequence is that there exists a unique transfer function between the source and each point scatterer located at some distance, d, based on the two-way transmission loss in the physical environment. This range-dependent transfer function effectively predicts the physical environment's impact on any broadband signal propagating through the medium.

Figure 5.4. Relative echo strength (ES) vs. distance at different frequencies for an ideal 0 dB point reflector. Spreading losses dominate at close range; because absorption scales exponentially with distance while spreading scales quadratically, absorption becomes significant beyond approximately 0.5 to 1.0 m of distance traveled across the broad range of frequencies applicable to biosonar.
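The active sonar equation of Equation 5.12 composes these pieces; a one-function continuation of the sketch (the SL and TS defaults are placeholders, not measured levels):

    def echo_strength(f, d, SL=100.0, TS=0.0, **env):
        """Relative echo strength ES(f, d) in dB at the receiver, Eq. (5.12).

        SL and TS may be scalars or frequency-dependent arrays matched
        to f; the two-way transmission loss imposes the range-dependent
        transfer function described in the text.
        """
        return SL - 2.0 * transmission_loss(f, d, **env) + TS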
5.2.2 Transducer Directivity Patterns

5.2.2.1 Broadband Spectral Information in Conventional Transducers

Another important aspect of modeling broadband acoustic information is the directivity pattern. Directivity (or beam) patterns of a sonar system are explicitly controlled by the transducer construction for signal transmission and/or reception. Whereas transmission losses impose a range-dependent frequency spectrum, a transducer's directivity pattern imposes an angle-dependent frequency spectrum. These patterns may be designed to be as simple or as complex as is necessary to achieve the performance goals. Most man-made sonar systems utilize arrays of many basic piezoelectric elements to construct narrow beams. Directivity is achieved by the relative positioning of these elements and by applying different amplitude scaling factors and/or phase delays between them [6]. These elements are typically capable of both transmitting and receiving acoustic waves, but are sometimes used exclusively for one mode in concert with another transducer array. An important distinguishing characteristic of many such systems is that they are designed to operate over a relatively small bandwidth-to-center-frequency ratio. Consequently, standard beam patterns and array geometries are either designed for one particular frequency or constrained to have a constant beam width over many frequencies.

Directivity patterns are defined by the magnitude response vs. angle. The phase response of individual elements is almost always ignored except when checking for mechanical consistency between elements. This is because absolute phase is irrelevant for conventional beamforming: regardless of the exact scatterer distance, non-stationarity and non-linearities in the medium cause the acoustic signal's phase to converge on a uniform random variable (i.e., a non-coherent receiver). Although the phase variation with angle may be negligible under usual circumstances, it inherently exists in any real system with mechanical damping. Ignoring phase is unfortunate when considering the broadband directivity patterns that are ubiquitous in biosonar. The fundamental idea is not that echolocating animals have coherent receivers; it is that the broadband waveforms traveling through the physical medium retain their relative phase over the wide range of frequencies. When the emitted biosonar signals contain multiple harmonics, this is known as harmonic coherence [7].

The directivity of a transducer element is intimately related to its geometrical structure and operating frequencies. For a transducer with a fixed physical aperture, the beam pattern will scale with frequency due to the close interaction between aperture and wavelength. Piezoelectric elements are usually constructed out of simple shapes such as cylinders or rectangular blocks. When the entire crystal surface vibrates in unison along the acoustic axis, the instantaneous pressure and particle velocity waves propagate outward from each point on the surface at a constant speed. The group interference of this continuum of waves causes the angle-dependent amplitude that is manifested as the far-field directivity pattern. Ignoring any backward propagation (assuming an infinite baffle), the theoretical directivity pattern for a piston transducer is [8, Ch. 11]

$$D(k, \theta) = \left( \frac{2 J_1\!\left(k \frac{d}{2} \sin\theta\right)}{k \frac{d}{2} \sin\theta} \right)^{2}, \quad (5.13)$$

where J₁ is the Bessel function of the first kind and order 1, k = 2π/λ is the acoustic wavenumber, d is the piston's diameter, and θ is the off-axis angle. The defining parameter of the piston's directivity pattern is the aperture-to-wavelength ratio, d/λ. As expected, this beam response is radially symmetric. Figure 5.5 shows the transducer's amplitude vs. angle in air for a 1 cm diameter piston across the frequency range of 10 to 100 kHz. Long wavelengths relative to the aperture produce a broad beam, while short wavelengths produce narrower beams and many sidelobes. Plotting the linear amplitude rather than the magnitude emphasizes that each alternating sidelobe exhibits a phase reversal of 180°.

Figure 5.5. Theoretical directivity pattern for a piston transducer in air with a fixed circular aperture of 0.94 cm. Low frequencies remain omni-directional since λ ≫ d, whereas high frequencies contain sidelobes that alternate between positive and negative amplitudes. If the wavenumber, k, is made complex it can account for damping in the mechanical system, and phase becomes continuous between reversals [9, p. 145].
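A sketch of the piston response (continuing the earlier code, using SciPy's first-order Bessel function) returns the signed pressure amplitude, whose square is Equation 5.13, so that the sidelobe phase reversals of Figure 5.5 remain visible:

    from scipy.special import j1

    def piston_amplitude(f, theta, diam=0.0094, c=344.0):
        """Signed far-field pressure amplitude of a circular piston.

        The power directivity of Eq. (5.13) is the square of this
        quantity; keeping the sign preserves the 180-degree phase
        reversals between adjacent sidelobes seen in Figure 5.5.
        f in Hz, theta in radians; either may be an array.
        """
        f = np.asarray(f, dtype=float)
        theta = np.asarray(theta, dtype=float)
        k = 2.0 * np.pi * f / c                       # acoustic wavenumber
        x = np.atleast_1d(k * (diam / 2.0) * np.sin(theta))
        out = np.ones_like(x)                         # on-axis limit is 1
        nz = np.abs(x) > 1e-12
        out[nz] = 2.0 * j1(x[nz]) / x[nz]
        return out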
5.2.2.2 Bio-Acoustic Baffle Structures and Implications for Modeling

In biosonar, transmitted patterns are defined by the geometry of the mouth or noseleaf in different bat species [10] and by the melon in odontocetes. Received directivity patterns arise through a complex yet minimal set of acoustic baffles (two ears in bats and the mandibular structure in odontocetes [11, 12, 13, 14]) that change the magnitude and phase response dramatically over frequency and angle. No closed-form expressions are available for the directivity pattern of such complex baffles; however, finite element methods have successfully predicted the directivity of many biosonar structures [15, 16, 11]. The transmit directivity pattern in azimuth of E. fuscus has been approximated by a piston transducer of 9.4 mm diameter, with reasonable matching of 3 dB beam widths over most of the frequency range. To date, no directivity patterns have been published that rely on the finite element method applied to the oral cavity.

Obliquely truncated horn models appear to be good approximations of receiving ears in biosonar [17]. Acoustic horns amplify sound produced or received and have a frequency-dependent magnitude response along the radial axis [18]. The angular directivity of a horn is similar to that of a piston transducer, where higher frequencies have a narrower main lobe and sidelobes that scale inward (Figure 5.6). A defining characteristic of the truncation angle is that at low frequencies the MRA is shifted off-axis, normal to the angle of truncation, and it moves toward the radial cone axis at higher frequencies. This characteristic is present in the azimuthal receive measurements of E. fuscus (refer back to Fig. 1.2). In addition to the magnitude response, a phase response is also present. The presence of spatial structure in the phase of an acoustic baffle makes intuitive sense, because the sound path through the baffle to the pressure-sensing element depends on the angle of incidence and the wavelength. The difference in sound path length vs. angle may be relatively large compared to the wavelengths (e.g., E. fuscus ear length is about 8 mm; λ = 3.4 mm at 100 kHz). By reciprocity, the same arguments apply for an acoustic baffle used in sound transmission.

What is interesting here is not the fact that echolocating animals have complicated baffle structures, nor the distinct directivity patterns they provide, but instead what they can achieve by having a spectral pattern that is unique across angle. Although biosonar beam patterns are typically complicated functions of frequency and angle, we later show that the concept of spatial imaging through spectral pattern matching can actually be performed using standard piezoelectric transducers. To demonstrate this, in Section 5.3 a simple biosonar array of three circular-aperture piston transducers is shown to achieve fine angular resolution without the convoluted structures found in biological sonar systems.

Figure 5.6. Example beam pattern data measured from an obliquely truncated horn. (a) The geometry of a simple truncated horn can be described by several parameters: ℓ_t and ℓ_m are the diameters of the throat and mouth before truncation, L is the length of the horn, α is the conical angle, and β is the angle of truncation. (b) Data were collected with a truncated horn constructed out of a flexible rubber sheet (ℓ_t = 0.4 cm, ℓ_m = 3 cm, α = 20°, β = 45°). A projector at the throat of the horn emitted linear FM chirps from 100 to 10 kHz, and a mechanically aligned microphone received the signal at 3° increments in azimuth and elevation. The mouth of the horn acts as a circular aperture of diameter ℓ_m that can be closely approximated by a piston transducer response, with a frequency-dependent magnitude defined by the acoustic gain of the horn's mouth-to-throat ratio.
The main response axis (MRA) of a standard conical horn would remain along the cone's radial axis (0°) for all frequencies; however, for an obliquely truncated horn the MRA shifts off-axis, normal to the truncation, at low frequencies [18]. The data support this finding, but also show that the harmonic frequency shifts its MRA to match the fundamental. (c) The magnitude response of the constructed horn at 47.6 kHz shows a clear main lobe centered around 16° in elevation. (d) Interestingly, the phase response shows a significant amount of spatial structure, which was unexpected. In the main lobe, the phase varies only slightly, whereas off-axis the phase varies significantly across frequency. Measured acoustic horn data were provided courtesy of Mittu Pannala and Rolf Müller of Virginia Tech.

5.2.3 Reflective Scatterer Structure and Composition

An ideal point scatterer reflects all incident energy with a flat magnitude spectrum and zero phase and group delay. Real acoustic objects are usually modeled as having multiple ideal point reflectors and a constant target strength (the ratio of attenuation or gain of the reflected to incident energy) that may also depend upon the aspect angle [19]. This model is reasonable when 1) the object consists of one or more dominant points of specular reflection or surface protuberances having some spatial extent, and 2) the frequencies of interest have a relatively small bandwidth-to-center-frequency ratio (e.g., most man-made active sonar systems). In the case of a broadband biosonar system, the first assumption may be reasonable; a good example would be the wingtips of an insect. However, the second assumption requires more careful consideration, and the frequency dependence of target strength must be further examined.

According to theory, any convex surface that is rigid and smooth reflects energy independent of frequency if the following conditions are met [5, p. 291]:

• ka₁, ka₂ ≫ 1, and
• the object is in the acoustic farfield.

Here, k = ω/c = 2π/λ is the wave number and a₁, a₂ are the principal radii of the convex curvature. For frequencies down to 20 kHz in air, this requires a surface with radii a₁ and a₂ ≫ 2.7 mm. To satisfy the second constraint, for a 1 cm diameter transducer operating up to 100 kHz the object must be no closer than the Fraunhofer distance [9, Ch. 8] of 2d²/λ ≈ 6 cm. Under these conditions the target strength can be expressed as

$$TS = 10 \log_{10} \frac{a_1 a_2}{4}. \quad (5.14)$$

For natural and man-made objects with flat surfaces, target strength can be categorized under the two distinct situations shown in Table 5.1. The target strength of objects having effectively infinite dimensions relative to the proximity of the sonar (such as a wall or long cable) does not depend on frequency when ka₁,₂ ≫ 1. The target strength of objects with finite linear dimensions relative to the sonar beam (such as a cylinder or small plate) increases with frequency. This class of objects actually has the opposite effect from the natural low-pass filtering caused by atmospheric absorption and the off-axis directivity of a fixed-aperture transducer.

  TS independent of f                       TS increases with f
  TS_convex = 10 log10(a1 a2 / 4)
  TS_sphere = 10 log10(a^2 / 4)
  TS_plate∞ = 10 log10(r^2 / 4)             TS_plate = 10 log10(A^2 / λ^2)
  TS_cyl∞   = 10 log10(a r / 2)             TS_cyl   = 10 log10(a L^2 / 2λ)

Table 5.1. Target strength of various simple geometrical objects. Objects that have either convex surfaces or large dimensions relative to the sonar field are theoretically frequency independent. Spheres, for example, are often used as sonar test targets due to their frequency and aspect independence and predictable target strength. Objects that have finite linear dimensions depend strongly on frequency, with target strength actually increasing with frequency to form a high-pass transfer function. Legend: a, spherical radius; r, circular radius; A, surface area; L, cylinder height; λ, wavelength.
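For reference, the closed-form entries of Table 5.1 translate directly into code (a continuation of the earlier sketch; the formulas are valid only under the ka ≫ 1 and farfield conditions above, and broadside incidence is assumed for the finite cylinder):

    def ts_convex(a1, a2):
        """TS of a rigid, smooth convex surface, Eq. (5.14); radii in m."""
        return 10.0 * np.log10(a1 * a2 / 4.0)

    def ts_sphere(a):
        """TS of a large sphere of radius a; frequency independent."""
        return 10.0 * np.log10(a ** 2 / 4.0)

    def ts_plate(A, f, c=344.0):
        """TS of a finite flat plate of area A; increases with frequency."""
        return 10.0 * np.log10((A * f / c) ** 2)

    def ts_cylinder_finite(a, L, f, c=344.0):
        """TS of a finite cylinder of radius a and height L (broadside)."""
        return 10.0 * np.log10(a * L ** 2 * f / (2.0 * c))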
In reality, objects rarely consist of simple geometrical shapes. Nonetheless, real acoustic objects can be deconstructed into numerous simpler structures based on the dominant reflective scatterers. In water, for example, the target strength of fish at dorsal aspect has been empirically derived [5, p. 315] to depend primarily on length, L, with a small correction factor for frequency, f, as

$$TS(f) = 19.1 \log_{10} L - 0.9 \log_{10} f - 62$$

with length in units of cm (Fig. 5.7). Due to the large acoustic impedance mismatch, the gas-filled swim bladder is usually the primary source of reflection, and most of the remaining body is acoustically transparent [20]. Even though there is a small correction factor for frequency, target strength remains relatively independent of frequency, implying that the swim bladder forms a generally convex reflective surface.

Figure 5.7. The target strength of an individual fish at dorsal aspect is strongly correlated with its length. A minor correction term is added for a slight frequency dependence, but target strength drops by less than 1 dB from 10 kHz to 100 kHz; the size of the fish is therefore the most important predictor of a reflected echo. These values were found to be valid over the range 0.7 < L/λ < 90 [5].

So far, we have only considered the geometrical structure of reflective scatterers. The physical composition of these objects also matters when acoustic impedance may cause effects such as resonance of the object. Resonance greatly affects the target strength at a particular frequency, but its impact should be local to the resonance. Spheres, for instance, have been well characterized for use as ideal reflectors due to their frequency-independent properties and minimal dependence on aspect angle. At low frequencies (i.e., ka₁,₂ < 1), creeping waves, internal reflections, and other secondary artifacts add to the overall echo structure [5]. Marine mammals have been shown to easily detect a hollow versus a filled cylinder [21]. In fact, determining the composition of objects is of high interest for a variety of maritime applications and is the source of information most useful for classification or automatic target recognition (ATR). For the purposes of localization and imaging, however, the directly reflected wavefront is the primary echo component of interest, not the secondary wave artifacts that follow.

5.2.4 The Broadband Echo Spectrum in the Range-Azimuth Plane

The integration of all three broadband factors (environmental acoustics, directivity patterns, and target strength) is straightforward. Since range-dependent transmission loss (Fig. 5.4) and angle-dependent directivity patterns (Fig. 5.5) are independent phenomena, the spectra at any particular range and angle are simply multiplied to produce a 3-dimensional volume of relative echo intensity across range, azimuth, and frequency.
The full-spectrum target strength of an object at some location would be applied in the same manner. For demonstration purposes, a single ideal scatterer with constant target strength is assumed. Additionally, including directivity patterns for elevation would require a fourth dimension, which is omitted here for simplicity. With the physics-based models and assumptions in place, Figure 5.8 shows the expected magnitude spectrum of an echo at any particular point in the range-azimuth plane. Assuming these models of the physics are correct, an echo arriving from a specific location in space would have a spectrum matching the line cut vertically across the frequency dimension. Thus, the multi-dimensional data set shown here corresponds to a look-up table for the range- and angle-dependent transfer function imposed by the environment and transducers. These data could be used to implement a broadband matched-field processing algorithm and scan the acoustic space in a manner similar to a beamformer. Conversely, the spectrum of a discrete echo received at some unknown angle may be compared and matched to the spectra across the entire space while minimizing some error function. The latter scenario would form the basis for a machine learning classifier or regression algorithm.

Figure 5.8. The relative intensity of an echo is shown as a function of range, azimuth, and frequency. Acoustic transmission loss and the composite transmit-receive beam patterns are independent functions of frequency. By combining these independent functions for each receive element, the echo spectrum can be estimated a priori for any point in the range-azimuth plane. This idea extends to the dimension of elevation, but is restricted to range-azimuth here for visualization of the data.
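As one way to assemble a look-up table of the kind shown in Figure 5.8, the sketch below combines the two-way transmission loss with a composite transmit-receive piston directivity (a single element used for both, as in Section 5.3.2 further on; the indexing and normalization choices here are illustrative, not prescribed by the text):

    def echo_template(freqs, ranges, angles, diam=0.0094, c=344.0, **env):
        """Predicted echo spectrum (dB) over the range-azimuth plane.

        Returns an array indexed as [range, angle, frequency]. A single
        piston element is assumed for both transmit and receive, so the
        composite two-way beam is the square of the one-way amplitude.
        """
        tmpl = np.zeros((len(ranges), len(angles), len(freqs)))
        for i, d in enumerate(ranges):
            # Range-dependent spectrum: two-way spreading and absorption
            es = echo_strength(freqs, d, SL=0.0, TS=0.0, **env)
            for j, th in enumerate(angles):
                # Angle-dependent spectrum: transmit x receive directivity
                b2 = piston_amplitude(freqs, th, diam=diam, c=c) ** 2
                tmpl[i, j, :] = es + 20.0 * np.log10(np.abs(b2) + 1e-12)
        return tmpl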
5.3 Extraction of Broadband Spatial Information from Echoes

5.3.1 Quantifying the Angular Resolution Limit

The accuracy of spectral localization in biosonar has been studied to varying degrees using the Cramer-Rao lower bound (CRLB) [22, 23, 24, 25, 26]. However, the angular resolution achievable with a biosonar solution has only been investigated in animals [27]. Resolution is conventionally defined as the minimum distance between two signals that arrive concurrently at a receiver while still being resolved as distinct objects [28]. The general definition applies to the spatial domain of range as well as angle, but here we are most interested in angular resolution. This critical piece of information is necessary to begin developing bio-inspired broadband sonar systems for real-world applications, which must appropriately separate on-axis target echoes from simultaneously arriving off-axis clutter. In conventional beamforming, the resolution of an array is easily determined as the half-power width of the summed beam pattern (see Section 2.2). For biosonar, this quantity is much more difficult to calculate, because the beam pattern width is not what determines imaging performance. What follows is a simplified approach to demonstrate the acoustic resolving power of using broadband spectral information in addition to the time delay between receive elements.

To evaluate the information carried by broadband echoes from anywhere across the range-azimuth plane, we use a simplified error metric, the minimum L1 distance, to quantify the ability to discriminate between the predicted spectrum (or transfer function) of a single focal point and the predicted spectra across all other points in the range-azimuth plane. Plotting the L1 distance over range and azimuth produces an error surface from which the characteristics of the spectrum can be discriminated by selecting an error threshold. The error surfaces are shown in units of time, since both the mammalian auditory system and our bio-inspired model encode the frequency-dependent amplitude of sound logarithmically into time. This logarithmic translation is known as amplitude-latency trading (ALT) and was first observed as a psychoacoustic effect during behavioral experiments [29]. ALT is a psychological shift in relative time delay that amounts to approximately 16 µs/dB. For example, an echo that is 6 dB louder will appear to arrive ≈ 96 µs earlier. The perceptual limit for echo range discrimination in bats has been measured to be within 2-3 µs [30], or even less than 0.5 µs in 180° phase reversal experiments [31]. These experimental results imply that the biological sonar system would have little trouble resolving targets above a discrimination threshold on the order of 10 µs. For a man-made system, such timing constraints are controllable and fairly easy to meet or exceed with careful design; the fundamental limitation for these systems would likely be the noise introduced by scatterer echoes.

It is unknown exactly where ALT originates in the auditory system; however, it is likely a narrowband physiological effect beginning with the inner hair cells (IHC) of the cochlea and contributed to at every incremental neural stage afterward. Its significance to our receiver model is that relative differences in echo amplitude across frequency directly correspond to a decorrelation of broadband echoes in time. Therefore, echoes can be localized by selectively adjusting the timing parameters across individual frequency channels. Likewise, echoes that fall outside of a focal region (e.g., selective attention) can simply be filtered out and passed to a peripheral imaging process.

5.3.2 Broadband Acoustic Focusing with a Single Piston Transducer

As a simple example, when using a single transducer for both transmit and receive, the two beam patterns are identical (e.g., Fig. 5.5). In this case, the entire frontal hemisphere is ensonified with an arbitrary broadband waveform, but acoustically focusing to 0° results in a region that is approximately as narrow as the beam width of the highest frequency used. Figure 5.9a shows that approximately 30° resolution is achievable for a 0.94 cm piston transducer when focusing at 4.5 m range and 0° azimuth. Interestingly, the region of focus is highly range dependent and also shows a slight bias toward closer ranges for off-axis echoes, because both transducer directivity and acoustic absorption exhibit a low-pass filtered response. This error surface does not incorporate any of the high-resolution range information available from cross-correlation, but would in practice be used to restrict the biosonar algorithm to search possible angles within a single range. An important caveat with these results is that they assume an ideal point reflector and infinite signal-to-noise ratio.
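A sketch of the resulting error surface follows, with the 16 µs/dB amplitude-latency trading ratio used to express spectral differences in units of time. The averaging over frequency is an assumption; the text does not specify whether the L1 distance is summed or normalized.

    ALT_US_PER_DB = 16.0   # amplitude-latency trading ratio (us per dB)

    def focus_error_surface(tmpl, i_focus, j_focus):
        """L1 spectral distance (us) from one focal point to all others.

        tmpl is the [range, angle, frequency] template in dB from the
        earlier sketch; per-frequency amplitude differences are mapped
        to time through the 16 us/dB ALT ratio and averaged over
        frequency.
        """
        ref = tmpl[i_focus, j_focus, :]
        err_db = np.abs(tmpl - ref[None, None, :]).mean(axis=2)
        return ALT_US_PER_DB * err_db

    # The region of focus is then everything under a timing threshold,
    # e.g.: err = focus_error_surface(tmpl, i0, j0); focus = err < 20.0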
Figure 5.9. The region of focus after applying the L1 spectral distance around 4.5 m at 0° azimuth (a) and 25° off-axis (b) for a single transmit-receive transducer. The color depth shows the distance metric in units of µs, which is a direct logarithmic conversion from amplitude. The 0.94 cm bidirectional transducer is modeled with the full bandwidth from 10 to 100 kHz. Despite having a nearly omni-directional beam at low frequencies, the range- and angle-dependent spectral characteristics are significant enough to distinguish between echoes arriving from the focal region and elsewhere in the range-azimuth plane. A single broadband transducer could therefore serve as a very cost-effective obstacle avoidance sensor without requiring a large aperture.

Acoustic focusing through L1 minimization can be applied for any point in space, given sufficient signal-to-noise ratio. Figure 5.9b demonstrates the region of focus applied off-axis to 4.5 m and 25°. Due to the symmetry of the piston transducer's beam pattern, the spectral distance is zero between ±25° left and right. In fact, this symmetry persists around the entire radial axis unless the beam pattern symmetry can be broken in some manner. More interesting, though, is that the large angular width of the focus region is significantly reduced. The reason focusing off-axis improves resolution is that the beam patterns have an area of highest sensitivity off-axis, where the derivative of the beam with respect to angle (the spatial gradient) is significantly larger than on-axis at 0° [23, 32]. These results can be applied iteratively for multiple ranges to form a sector scan of the frontal region, or selectively when an echo has been detected. The actual implementation will depend on the intended application and operating environment.

5.3.3 Broadband Acoustic Focusing with a Bio-Inspired Array

Although demonstrating angular localization with only a single broadband transducer is impressive, a pair of identical piston transducers can be used to eliminate the off-axis ambiguities. Furthermore, by orienting each sensor off-axis by approximately ±25° we establish much higher angular resolution along the main response axis. Each transducer still has wide spatial coverage, but the difference in beam pattern orientations provides additional information to be gleaned. Figure 5.10 shows a bio-inspired conceptual array based on the dimensions of the mouth and ears of an adult E. fuscus. The horn-shaped baffles and complex mouth cavity have been replaced by standard piston transducers to show that a bio-inspired broadband sonar does not require complex beam patterns to localize and resolve echoes with precision.

Focusing on a point at center (0°, 4.5 m) with this binaural configuration significantly reduces the region of focus to several degrees; however, it causes angular ambiguity as seen before (Fig. 5.9). Figure 5.11 shows the improved resolution for each transmit-receive transducer pair and evidence of the problem of left-right ambiguity.

Figure 5.10. A bio-inspired broadband sonar array is proposed utilizing only three circular piston-like elements. A single broadband transmit element is used with the main response axis pointed directly forward. Two broadband receive elements are oriented off-axis by 25°. The array geometry approximates the size and relative locations of the acoustic baffles, the ears and mouth, in E. fuscus. The information available to the sonar system is the absolute pulse-echo time delay for range estimates, the relative time delay between receive sensors for rough horizontal angle estimates, and the spectral information for the left-right transmit-receive pairs for precise, but ambiguous, angle and range estimates.
Figure 5.11. The region of focus after applying the L1 spectral distance around 4.5 m at 0° azimuth for a single transmitter and a pair of identical receive transducers. The receive elements are oriented outward by 25°. Color depth shows the distance metric in units of µs. Each transmit-receive pair shows improved resolution performance, but ambiguous regions to the left (a) and right (b). These ambiguities can be resolved by comparing the spectral distances for each receive element.

With a pair of receive elements, additional time delay information is also available. One method of reducing ambiguity is to use the time-difference of arrival (TDOA) between the sensors. Figure 5.12 plots the relative time delay between receive elements spaced 1.4 cm apart. In air, the difference in echo arrival time falls between ±40 µs. For an echo arriving at 0°, the localization accuracy would not be sufficient to achieve high-resolution imaging [32]. Although TDOA is relatively insensitive across angle, it can be used to eliminate the ambiguity shown in Figure 5.11. TDOA also has biological significance, because the interaural time delay (ITD) is a primary auditory cue for localization in azimuth by hearing mammals [33].

Figure 5.12. The time difference of arrival (TDOA) between two receiving transducers separated by 1.4 cm. As expected, TDOA information is range-independent. Although it has been shown to be a useful approach to biomimetic localization [34, 35], the lack of sensitivity with angle renders it highly inaccurate when used as the only source of angular information.
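The far-field TDOA between the two receivers reduces to a one-line geometric relation (an illustrative sketch; spacing and sound speed as in the text):

    def tdoa(theta, spacing=0.014, c=344.0):
        """Far-field time difference of arrival between two receivers (s).

        Range-independent, as in Figure 5.12; for 1.4 cm spacing in air
        the full sweep of theta spans roughly +/- 40 microseconds.
        """
        return spacing * np.sin(theta) / c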
For example, a predictable pattern of spectral notches appears in the over- lapping echoes from two or more closely spaced point reflectors. The bat’s neural circuitry exploits these temporal interference patterns and has been shown to be the mechanism responsible for achieving hyper-resolution in range [37, 38, 39, 40, 41]. Several possible solutions may be used by the bat to cope with echoes arriving simultaneously from multiple locations: 1. It is certainly possible that interference patterns by echoes arriving from sepa- rate angles have a distinct pattern and can be deconvolved by the same neural circuitry for hyper-resolution in range. Reconstructing the coincidence of echoes in this manner would be enough to perform localization through spectral pattern matching on the separated echoes. 2. With the very high repetition rates of pulse emissions, the simultaneous coinci- dence of two echoes in clutter may be overcome by rejecting these echoes and waiting until the subsequent pulse’s echoes arrive. Therefore, mutual interfer- ence of two echoes could be interpreted as a single invalid echo without achieving 124 sufficient coincidence to register in the high-resolution display. 3. The possibility that two echoes overlap with perfect coherence at both ears at exactly the same time is unlikely, even in dense clutter. Statistical averaging over many pulses may be the simplest solution. 4. If and when interference persists, the animals may resolve the ambiguity by changing the shape of the beams through adaptive methods, such as receive beam movements. In fact, these beam pattern dynamics are just being discov- ered and appear to be intentional [42, 43]. Regardless of the actual mechanisms by which bats handle interfering echoes ar- riving concurrently, observation and laboratory experiments with these animals have shown that deconvolution and subsequent clutter rejection is not only possible, but also reliably consistent [44]. Likewise, successfully dealing with mutually interfering scatterers is critical to the success of any bio-inspired sonar system, and failure to implement a working deconvolution process will restrict the sonar’s operation to a subset of trivial scenarios without clutter. Resolution, after all, is defined using the mutual interference of two or more scatterers arriving simultaneously. Without this angular resolution, the bio-inspired sonar system remains a research project. 5.4 Performance Comparison with Conventional Acoustic Imaging One of the common points of contention for researchers studying biosonar is a lack of sufficient comparison metrics with existing approaches. In many respects, this is a difficult comparison to make due to 1) a weak understanding of exactly how animals process acoustic signals, and 2) the fundamental difference in the information con- tent being processed. This section addresses these concerns by providing a baseline comparison with conventional narrowband beamforming, while still including the ad- 125 vantage of signal bandwidth. Conventional beamforming techniques utilize only the relative time delay between elements to perform angular imaging. Given the same basic set of sonar array geometry, element beam patterns, and acoustic signals avail- able to the bat, the results here clearly show the advantages of combining time delay and spectral information over conventional delay-and-sum beamforming alone. 
5.4.1 Processing Broadband Signals with Suboptimal Element Spacing

There are many possible ways to compare conventional acoustic imaging systems with the biosonar imaging approach. Since we claim that signal bandwidth is the critical enabler of biosonar's additional resolving power, it is appropriate to include the same bandwidth in conventional beamforming for a fair comparison. The main difficulties are that biosonar elements are widely spaced relative to the wavelengths in air (d = 1.4 cm; 0.34 cm ≤ λ ≤ 3.4 cm between 10 and 100 kHz) and that the beam patterns vary significantly over the broad range of frequencies used. To resolve these issues, broadband signals can be processed with multiple narrowband beams that are then combined additively. This is precisely the solution proposed by Hinich to perform broadband array signal processing while removing the angular ambiguity caused by insufficient array spacing (when d > λ/2) [45].

The phase delay beamformer is a standard narrowband method for producing multi-beam acoustic images (see Section 2.2 for an overview). For every frequency, f, and steer angle, θ, the phase delay beamformer response of an N-element array is computed as

$$Y(f, \theta) = \mathbf{d}_f(\theta)\, \mathbf{W}\, \mathbf{x}_f^T \quad (5.15)$$

where $\mathbf{d}_f(\theta)$ is the 1 × N steering vector of complex phase delays steered to angle θ, $\mathbf{W}$ is the diagonal N × N aperture shading matrix, and $\mathbf{x}_f^T$ is the transposed N × 1 complex frequency data vector for a single time or range bin. To apply broadband processing via the Hinich method, the steered beam response at each frequency is summed over M discrete frequencies, f = {f₁, f₂, ..., f_M}, as

$$Y_{sum}(\theta) = \sum_{i=1}^{M} |Y(f_i, \theta)| \quad (5.16)$$

where the choice of f might be aligned with the FFT bins containing the signal.

To illustrate this technique, Figure 5.14 shows the narrowband and summed beamformer responses to an ideal target (i.e., x_f = d_f(ψ) with a point target at angle ψ) for an array of N = 10 elements spaced by d = 1.4 cm. In the case of suboptimally spaced elements, the main lobe of every beam points to the steered angle while the grating lobe angles vary with frequency. Thus, main lobes add coherently while grating lobes are reduced by averaging with side lobes. This example shows that grating lobes can effectively be suppressed at the cost of increased side lobe levels. The angular resolution of the beam approaches the half-power beam width of the highest frequency beam, although the energy tapers off more slowly with angle.

For the case of a biosonar system, the receive array from Figure 5.10 consists of N = 2 elements spaced by d = 1.4 cm. With only two elements, application of aperture shading coefficients becomes impossible and W is simply the 2 × 2 identity matrix. Figure 5.15 plots the beam response at several narrowband frequencies and the combined beam pattern using the Hinich approach. This simplified array creates a dipole beam pattern, which is aliased above 12 kHz. The narrowband beam patterns no longer contain any side lobes and consist entirely of a large main lobe and repeating grating lobes. Summing beams across frequency as in Equation 5.16 does not provide any obvious benefit for directivity or resolution.
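A direct sketch of Equations 5.15 and 5.16 follows (uniform shading, so W = I; the N = 10 curves in Figure 5.14 additionally apply Chebyshev weights, which are omitted here; phase-centering the steering vectors is a choice made so the response stays real-valued for symmetric arrays, anticipating the coherent summation of Equation 5.17 in the next subsection):

    def steering_vector(f, theta, n_elem, spacing, c=344.0):
        """Complex phase delays for a uniform line array steered to theta.

        Element positions are centered on the array's phase center.
        """
        n = np.arange(n_elem) - (n_elem - 1) / 2.0
        return np.exp(-2j * np.pi * f * n * spacing * np.sin(theta) / c)

    def hinich_beam_response(freqs, thetas, n_elem=2, spacing=0.014,
                             psi=0.0, c=344.0, coherent=False):
        """Broadband beam response to an ideal point target at angle psi.

        Eq. (5.15) gives the narrowband response Y(f, theta) with W = I;
        Eq. (5.16) sums magnitudes over frequency (Hinich), while
        coherent=True keeps the sign, as in Eq. (5.17).
        """
        resp = np.zeros(len(thetas))
        for f in freqs:
            x = steering_vector(f, psi, n_elem, spacing, c)  # ideal target
            for i, th in enumerate(thetas):
                d = steering_vector(f, th, n_elem, spacing, c)
                y = np.vdot(d, x) / n_elem                   # Y(f, theta)
                resp[i] += y.real if coherent else abs(y)
        return resp / len(freqs)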
This analysis could be extended to include directivity of individual elements; however, the resulting conclusions would be the same – conventional array signal processing with a N = 2 element array has insufficient resolution and large an- 127 Beam Response (ψ=0°, N=10, d=1.4cm) Combined Beam Response (ψ=0°, N=10, d=1.4cm) 10 10 A C Mag. (dB) Mag. (dB) 0 0 −10 −10 −20 −20 −80 −60 −40 −20 0 20 40 60 80 −80 −60 −40 −20 0 20 40 60 80 10 kHz 60 kHz 100 kHz 1 1 B D Amplitude Amplitude 0.5 0.5 0 0 −0.5 −0.5 −1 −1 −80 −60 −40 −20 0 20 40 60 80 −80 −60 −40 −20 0 20 40 60 80 Bearing Angle, θ (deg.) Bearing Angle, θ (deg.) Figure 5.14. The beam patterns of an array with N = 10 omni-directional elements spaced at d = 1.4 cm. Chebychev aperture shading coefficients for 20 dB side lobes are applied to the beams. Beam patterns for several narrowband frequencies (a and b) and the combined beam pattern across all frequencies (c and d) are shown for both magnitude (top) and linear amplitude (bottom) for illustration. In this example, the steered angle is ψ = 0◦ , but the concept works for any steered angle. The design frequency of the uniform line array, fd = λ/2, is approximately 12.3 kHz in air (given c = 344 m/s). Note that the beam patterns at higher frequencies scale inward by cos(θ) and are spatially aliased for frequencies above fd . When the beam patterns across all frequencies in the decade from 10 to 100 kHz are combined in steps of 1 kHz, grating lobes can effectively be suppressed. The abrupt change in sideband levels occur at the grating lobe locations for the highest frequency beam. Beam Response (ψ=0°, N=2, d=1.4cm) Combined Beam Response (ψ=0°, N=2, d=1.4cm) 10 10 A C Mag. (dB) Mag. (dB) 0 0 −10 −10 −20 −20 −80 −60 −40 −20 0 20 40 60 80 −80 −60 −40 −20 0 20 40 60 80 10 kHz 60 kHz 100 kHz 1 B 1 D Amplitude Amplitude 0.5 0.5 0 0 −0.5 −0.5 −1 −1 −80 −60 −40 −20 0 20 40 60 80 −80 −60 −40 −20 0 20 40 60 80 Bearing Angle, θ (deg.) Bearing Angle, θ (deg.) Figure 5.15. The beam patterns of an array with N = 2 omni-directional elements spaced apart by d = 1.4 cm. No aperture shading is possible with only 2 elements. The natural (ψ = 0◦ ) beam response at 10, 60, and 100 kHz is plotted in log-magnitude (a) and linear amplitude (b) units. The combined beam response is shown for the same linearly spaced frequencies between 10 and 100 kHz as in Figure 5.14 (c and d). With no side lobes present, the grating lobes are averaged together and only results in approximately 3 dB sideband suppression, which is not useful for high-resolution angular imaging. 128 gular ambiguities. Most importantly, this minimal array configuration will always have complete ambiguity in elevation unless additional vertical elements are added or broadband spectral information is included in the processing. 5.4.2 Coherent Summation of Broadband Signals Narrowband signals received by a sonar are not typically coherent and the environ- ment will force the phase of incident waves to behave as a uniform random variable. At first glance, the destruction of phase information by wave propagation implies that only the magnitude information persists at the receiver. However, even when the environment causes the absolute phase of a signal to become random, the rela- tive coherence of a broadband signal may still persist if an acoustic wave travels in unison across the same ray propagation path. 
Acoustic dispersion would be direct evidence to the contrary, but it does not appear to be a significant factor across the frequencies or short distances relevant to biosonar, either in air or water. If these assumptions hold true, then the inclusion of phase information in combined beam patterns is warranted and provides new information absent from the original Hinich approach.

Equation 5.16 summed the absolute value of each frequency-dependent beam pattern. Instead, phase information in the form of alternating positive and negative amplitudes can be included in the summation as

$$Y_{sum}(\theta) = \sum_{i=1}^{M} Y(f_i, \theta) \quad (5.17)$$

where f is selected as before. Since the grating lobes alternate between positive and negative amplitudes, the exact method for choosing f becomes more important. Figure 5.16 shows the effect of summing linearly across frequencies (e.g., consecutive FFT bins) compared with summing logarithmically (e.g., a constant-Q filterbank or wavelets). In either case, coherent broadband processing produces reasonable sidelobe suppression. As before, the summed beamformer resolution approaches the width of the highest frequency beam. The reason logarithmic frequencies produce better sideband suppression is that there is more balanced cancellation between positive and negative grating lobes.

Figure 5.16. Summed beam patterns for a simple array of N = 2 elements spaced apart by d = 1.4 cm. Combining the narrowband beams with relative phase information intact provides better results than simply summing the magnitudes. The manner in which frequencies are selected also appears to be significant. Summing linearly across frequencies from 10 to 100 kHz in 1 kHz bins results in good suppression of grating lobes overall, but high side lobes (a and b). Summing logarithmically across the same frequency range produces a slightly larger main lobe, but a significantly suppressed sideband between 20° and 40° (c and d). As in the N = 10 case, the best achievable angular resolution approaches the main lobe width of the highest frequency bin.

In summary, coherent addition of narrowband beam patterns provides a significantly improved beam response over incoherent addition. In fact, for the same N = 2 array configuration, acoustic angular imaging is not possible using conventional phase-delay beamforming. Even with the angular resolving power of coherent summation, the theoretical angular resolution of approximately 12° to 16° (for linear and logarithmic coherent summation, respectively) is still an order of magnitude larger than the 1.5° resolution demonstrated by including spectral pattern matching in the acoustic imaging process (Section 5.3). It should be restated that this form of coherent addition remains valid only under the assumption that relative coherence across all frequencies is maintained throughout signal transmission, propagation, reflection, and reception.
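Reusing the sketch above, comparisons in the style of Figure 5.16 differ only in the frequency grid and the summation rule (hypothetical usage):

    thetas = np.radians(np.arange(-90.0, 90.5, 0.5))
    f_lin = np.arange(10e3, 100e3 + 1.0, 1e3)                 # 1 kHz bins
    f_log = np.logspace(np.log10(10e3), np.log10(100e3), 91)  # log-spaced

    y_mag = hinich_beam_response(f_lin, thetas)                     # Eq. (5.16)
    y_coh_lin = hinich_beam_response(f_lin, thetas, coherent=True)  # Eq. (5.17)
    y_coh_log = hinich_beam_response(f_log, thetas, coherent=True)  # Eq. (5.17)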
5.4.3 Limitations to Conventional Beamforming Comparisons

The broadband Hinich approach provides a reasonable performance comparison between conventional active array signal processing and bio-inspired broadband sonar. By using a decade of bandwidth, both methods outperform conventional narrowband processing for a simple two-element receiver. Furthermore, exploiting spectral information for acoustic imaging provides the significant advantage of bio-inspired broadband sonar over current broadband sonar techniques. The results shown in this section are representative of the current practice of frequency-domain beamforming in high-resolution acoustic imaging; however, some caution in interpreting these results is warranted.

The most significant drawbacks of applying phase delay beamforming to the biosonar array geometry are the small aperture-to-wavelength ratio and the insufficient element spacing. Given that the N = 2 elements are spaced at d = 1.4 cm, the aperture-to-wavelength ratio, L/λ, is between 2.5 and 0.25. Applying conventional array design techniques, an aperture-to-wavelength ratio of L/λ ≥ 46 is required for 1° of angular resolution (Eq. 2.6). Furthermore, the element spacing of d = 1.4 cm ≫ λ/2 (≈ 0.17 cm at 100 kHz) causes significant spatial aliasing in the form of grating lobes, which prevent the two-element receive array from achieving any directivity unless signals remain coherent across all frequencies (i.e., coherent addition). These problems prevent a direct comparison of the acoustic information being processed and instead indicate that the frequency-domain approach violates the assumptions under which it was originally derived.

To make a fair technical comparison between biosonar and convention, an alternate view of the problem is required. Grating lobes are aliased spatial images of the main lobe and only appear in the sonar field of view when the array element spacing is designed inadequately. The ambiguities introduced with grating lobes are merely an artifact of the processing itself and are therefore not implicit in the information being extracted. If beamforming is instead implemented directly in the time domain rather than the frequency domain, then spatial aliasing does not occur. In fact, despite being considered inefficient, time-domain delay-and-sum beamforming is a standard technique for sound source localization with sparse arrays [46, 47, 48].

A better comparison than the Hinich approach might be the matched filter response (the cross-correlation of a transmitted signal with received signals) for two closely spaced scatterers ensonified with a broadband signal. This performance bound is the criterion stated by Altes to address the issue of resolution in azimuth [22]. In this situation, resolution could easily be determined by finding the minimum angular spacing before the cross-correlation peaks of two closely spaced scatterers overlap and merge into one. With this approach, the wider the spacing between the sensors, the better the angular resolution that can be achieved. At some point, however, correlation between the receiving sensor elements will degrade and overall performance will diminish [49].

5.5 Discussion

Acoustic dispersion, or the frequency-dependent speed of sound, is notably absent from this analysis. For aerial biosonar, dispersive effects are not significant at the ultrasonic frequencies considered. In gases such as air, dispersion only becomes relevant "at such high frequencies that the wavelength of the sound wave is smaller than the mean free path of the molecules" [4]. Furthermore, we have assumed isovelocity (line-of-sight) sound propagation paths.
This seems reasonable since, as a gas, air is typically able to diffuse freely and create a locally homogenous environment in the region aerial biosonar would be used. Dispersion and absorption characteristics do change at high altitudes, but not near the surface where bats operate [50]. Un- derwater sound propagation is a much different scenario and these effects must be considered for naval applications (see Appendix A). Although non-linear sound prop- agation paths, significantly reduced absorption losses, and various inhomogeneities 132 persist in the underwater realm, echolocating marine mammals do not necessarily require precision imaging at several kilometers. Instead, since these animals are for- aging at close range and using short click impulses, these issues may not significantly affect the biosonar imaging process where high-resolution is needed. Environmental parameters cannot be actively controlled by any sonar system; however, changes in the environment do occur on relatively slow time scales. There- fore, a sonar system emitting hundreds or thousands of pulses per minute should be able to adapt to these changes through iterative feedback or a self-calibration pro- cedure. With current technology, this task might be well suited for adaptive linear filtering or a variety of machine learning techniques [51]. To make a familiar analogy, human vision generally requires time to readjust to changing light levels and focal depth. In biosonar, this focal adjustment period invokes the proper combination of the time-frequency waveform structure, control of respiratory exhalation, movement of the head and pinnae, and memory of prior conditions that led to a sharper image. By the same manner, we propose that echolocating animals must perform a con- tinuous self-calibration to environmental operating conditions and dynamic beam pat- terns by fine-tuning both auditory neural circuitry and motor control circuitry [52, 53]. Neural networks in the auditory system are well known for their sensitivity to single spikes, or more specifically single spike events encoded by populations of neurons [39], and also selectivity to specific waveform features in the auditory cortex [54]. In the bat’s brain, the mechanisms enabling this precision calibration likely occur on two very different time-scales. For example, long-term potentiation (LTP) and depression (LTD) would be the synaptic mechanisms responsible for forming a persistent mem- ory of the time-frequency signature of an echolocation signal for an individual [55, 56]. Short-term synaptic plasticity, on the other hand, is necessary to make the fine-scale adjustments in coincidence detection when there is an abrupt shift in signal char- acteristics caused by either the changing external environment or modified transmit or receive beam patterns. Spike-timing-dependent plasticity (STDP) [57] operates 133 on a fast enough time-scale and is sensitive enough to adjust the perturbed time- frequency signatures from pulse-to-pulse ensuring a well-focused spatial image (refer to Section 2.1.2). Observing this adaptation in bats during infantile development would be an ideal time frame to study, because it forms the neural basis for the abil- ity of an adult bat to cope with the constantly changing environment and the diverse set of echolocation operating modes. Future work in this area will include empirical analyses of how broadband spec- tral information changes in realistic aerial and underwater environments. 
In addition, the sensitivity of broadband information (and therefore sonar system performance) to changing environmental parameters, directivity patterns, interference from multiple scatterers, and intrinsic/extrinsic noise will be evaluated computationally. This could be accomplished through local perturbation methods by examining one variable at a time at a fixed operating point. Sensitivity, in this sense, can be computed by evalu- ating the partial derivative of the information or performance metric with respect to a single changing parameter (i.e. ∂Y /∂Xi for an output result Y and input param- eter set Xi ). To address uncertainties in a) the directivity patterns of sensors and b) the target strength of complex structured and unstructured objects, Bayesian or variance-based methods would be more appropriate given the appropriate probabilis- tic models and allows for the full exploration of the input parameter space to include interactions and nonlinear responses [58, 59, 60]. Information theoretic approaches are an appealing way to quantify the channel capacity of the acoustic environment and have already been applied to estimate the spatial information encoded by biosonar directivity patterns [25, 23]. Regardless of the method that is used to quantify the information carried in broadband echoes, the goal is to explore the fundamental lim- itations of biosonar that will lead to new and exciting solutions to acoustic imaging with micro-apertures. 134 5.6 Acknowledgments The authors would like to thank Andrew Hull (NUWC) and Dimitri Donskoy (Stevens Institute of Technology) for their helpful guidance on acoustics and transducer mod- eling. We also acknowledge Michael Medeiros, Adam Mirkin, Robert Carpenter, and Ashwin Sarma from NUWC for numerous lengthy discussions on bioacoustics and array signal processing. A Applying Biosonar Modeling to Underwater Acous- tic Imaging There are very significant differences between sound propagation in air and water. The speed of sound is approximately 4.3 times faster in water than air; cwater ≈ 1, 470 m/s at the ocean’s mean temperature (4◦ C) versus cair ≈ 344 m/s at room tempera- ture (22◦ C). Wavelengths are therefore 4.3 times shorter in water. Dispersion, non- homogeneities, and other non-linearities are also more prominent. Despite these diffi- culties, underwater animal models of biosonar (cetaceans) thrive in this environment and prove that mammalian echolocation can not only function, but exceeds some of the limitations that exist in air. The most important modification to the previous results is an updated ab- sorption model. Transducer design and target physics will scale with the changes in wavelength and acoustic impedance, but these models are not fundamentally different than in air. Therefore, this appendix serves to modify the absorption modeling for seawater by rearranging and presenting the equations in the same form. In water, absorption is also a monotonically increasing function of frequency that depends heavily upon environmental parameters [61]. Several important differ- ences from air exist, including the fact that at any particular frequency absorption is on the order of 100 times weaker per unit of distance than in air, so sound waves 135 of the same acoustic frequency will travel over much longer distances. Bottlenose dolphins, for example, are capable of detecting a sphere the size of a ping pong ball at distances of 100 meters [21]. 
By comparison bats have a limited echolocation range of 10–20 m and may rely upon their spatial memory for more global navigation [62]. The model equations for predicting absorption in water require temperature, T (◦ C); depth (an indirect measure of the pressure), D (m); salinity, S (ppt); and acidity, pH. Similar to the equation in air, absorption can be split into several dominant components: f 2 FrB f 2 FrM     2 α(f ) = α ˆ cr (f )f + α ˆ vib,B (f ) 2 +α ˆ vib,M (f ) 2 (5.18) f 2 + FrB f 2 + FrM where α ˆ cr is the absorption component due to classical physics, α ˆ vib,B is absorption due to the vibrational relaxation in boric acid and α ˆ vib,M is the same for magnesium sulfate (MgSO4 ). These components are further defined: αcr = 4.9e-4 × e−(T /27+D/17) , (5.19) αvib,B = 0.106 × e(pH−8)/0.56 , and (5.20)    T S αvib,M = 0.52 × 1 + e−D/6 . (5.21) 43 35 The relaxation frequencies are   12 S FrB = 0.78 eT /26 , and (5.22) 35 FrM = 42.0 eT /17 . (5.23) The resulting absorption coefficient in water may be applied to Equation 5.10 as before 136 (Figure 5.17). The overall impact of this change on T L is that spherical spreading will dominate for much farther distances out to approximately 100 m. Therefore, frequency spectra of echoes in water do not depend on range as much as they do in air. It is worth mentioning that the reference units for sound differ (re 20 µPa in air and re 1 µPa in water), but this does not impact the relative units used for absorption. Absorption in water (T=−5−35°C, D=0m, S=35ppt, pH=8) 3 10 -5◦ C 0◦ C Absorption Coefficient α (dB / km) 5◦ C 10◦ C 2 15◦ C 10 20◦ C 25◦ C 30◦ C 35◦ C 1 10 0 10 −1 10 −2 10 3 4 5 6 10 10 10 10 Frequency (Hz) Figure 5.17. Absorption coefficient in water vs. frequency at various temperatures between -5◦ C and 35◦ C, depth of 0 m, salinity of 35 ppt, and acidity of 8.0 pH. The same general trends exist for frequency-dependent absorption in water as well as air. Both environments enforce a general low-pass filter, are monotonic functions of frequency, and depend upon environmental conditions such as temperature, pressure, and molecular concentrations. One significant difference, however, is that absorption in water is three orders of magnitude lower. Therefore, sound not only travels faster in water, but also much farther for the same set of frequencies. References [1] “ANSI S1.26-1995 (R2009) Method for Calculation of the Absorption of Sound by the Atmosphere”, American National Standards Institute, New York (2009). 137 [2] H. Bass, L. Sutherland, A. Zuckerwar, D. Blackstock, and D. Hester, “Atmo- spheric absorption of sound: Further developments”, J. Acoust. Soc. Am. 97, 680–683 (1995). [3] H. E. Bass, L. C. Sutherland, and A. J. Zuckerwar, “Atmospheric absorption of sound: Update”, J. Acoust. Soc. Am. 88, 2019–2021 (1990). [4] A. B. Bhatia, Ultrasonic absorption: An introduction to the theory of sound absorption and dispersion in gases, liquids, and solids (Oxford University Press, New York) (1985). [5] R. Urick, Principles of Underwater Sound, 3rd edition (Pennsylvania Publica- tions, Los Altos, CA) (1983). [6] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Tech- niques (Prentice Hall PTR, Upper Saddle River, NJ) (1993). [7] M. E. Bates and J. A. Simmons, “Effects of filtering of harmonics from biosonar echoes on delay acuity by big brown bats (Eptesicus fuscus)”, J. Acoust. Soc. Am. 128, 936–946 (2010). [8] S. N. Rschevkin, “A Course of Lectures on the Theory of Sound”, MacMillan, New York (1963). [9] L. 
Kinsler, A. Frey, A. Coppens, and J. Sanders, Fundamentals of Acoustics, 4th edition (Wiley, New York) (1999). [10] Q. Zhuang and R. M¨ uller, “Noseleaf furrows in a horseshoe bat act as resonance cavities shaping the biosonar beam”, Phys. Rev. Lett. 97, 218701 (2006). [11] T. W. Cranford, P. Krysl, and J. A. Hildebrand, “Acoustic pathways revealed: Simulated sound transmission and reception in Cuvier’s beaked whale (Ziphius cavirostris)”, Bioinspiration Biomimetics 3, 016001 (2008). [12] T. A. Mooney, M. Yamato, and B. K. Branstetter, Hearing in Cetaceans: From Natural History to Experimental Biology, volume 63, 1st edition (Elsevier Ltd.) (2012). [13] J. L. Aroyan, “Three-dimensional modeling of hearing in Delphinus delphis”, J. Acoust. Soc. Am. 110, 3305–3318 (2001). [14] M. Yamato, D. R. Ketten, J. Arruda, S. Cramer, and K. Moore, “The auditory anatomy of the minke whale (Balaenoptera acutorostrata): A potential fatty sound reception pathway in a baleen whale”, Anat. Rec. 295, 991–998 (2012). [15] R. M¨ uller and J. C. T. Hallam, “Knowledge mining for biomimetic smart antenna shapes”, Rob. Autom. Syst. 50, 131–145 (2005). [16] D. Vanderelst, F. De Mey, H. Peremans, I. Geipel, E. Kalko, and U. Firzlaff, “What noseleaves do for FM bats depends on their degree of sensorial special- ization”, PLoS ONE 5, e11893 (2010). 138 [17] J. Ma and R. M¨ uller, “A method for characterizing the biodiversity in bat pinnae as a basis for engineering analysis”, Bioinspiration Biomimetics 6, 026008 (2011). [18] N. H. Fletcher and S. Thwaites, “Obliquely truncated simple horns: Idealized models for vertebrate pinnae”, Acustica 65, 194–204 (1988). [19] P. M. Morse and K. U. Ingard, Theoretical Acoustics (Princeton University Press, New Jersey) (1986). [20] K. G. Foote, “Importance of the swimbladder in acoustic scattering by fish: A comparison of gadoid and mackerel target strengths”, J. Acoust. Soc. Am. 67, 2084–2089 (1980). [21] W. W. Au and K. J. Snyder, “Long-range target detection in open waters by an echolocating Atlantic Bottlenose dolphin (Tursiops truncatus)”, J. Acoust. Soc. Am. 68, 1077–1084 (1980). [22] R. Altes, “Angle estimation and binaural processing in animal echolocation”, J. Acoust. Soc. Am. 63, 155–173 (1978). [23] R. M¨uller, H. Lu, and J. Buck, “Sound-diffracting flap in the ear of a bat gener- ates spatial information”, Phys. Rev. Lett. 100, 108701 (2008). [24] J. Reijniers and H. Peremans, “Biomimetic sonar system performing spectrum- based localization”, IEEE Trans. Robot. 23, 1151–1159 (2007). [25] J. Reijniers, D. Vanderelst, and H. Peremans, “Morphology-induced information transfer in bat sonar”, Phys. Rev. Lett. 105, 148701 (2010). [26] D. Vanderelst, J. Reijniers, F. Schillebeeckx, and H. Peremans, “Evaluat- ing three-dimensional localisation information generated by bio-inspired in-air sonar”, IET Radar Sonar Navig. 6, 516–525 (2012). [27] J. A. Simmons, S. A. Kick, B. D. Lawrence, C. Hale, C. Bard, and B. Escudie, “Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 153, 321–330 (1983). [28] A. Rihaczek, Principles of High-Resolution Radar (Artech House, Norwood, MA) (1996). [29] M. E. Bates, J. A. Simmons, and T. V. Zorikov, “Bats use echo harmonic struc- ture to distinguish their targets from background clutter”, Science 333, 627–630 (2011). [30] J. A. Simmons, “The resolution of target range by echolocating bats”, J. Acoust. Soc. Am. 54, 157–173 (1973). [31] C. Moss and J. 
Simmons, “Acoustic image representation of a point target in the bat Eptesicus fuscus: Evidence for sensitivity to echo phase in bat sonar”, J. Acoust. Soc. Am. 93, 1553–1562 (1993). 139 [32] S. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation The- ory (Prentice Hall PTR, Upper Saddle River, NJ) (1993). [33] A. Brand, O. Behrend, T. Marquardt, D. Mcalpine, and B. Grothe, “Precise inhibition is essential for microsecond interaural time difference coding”, Nature 417, 543–547 (2002). [34] F. Schillebeeckx and H. Peremans, “Biomimetic sonar: 3D-localization of multi- ple reflectors”, in IEEE/RSJ International Conference on Intelligent Robots and Systems, 3079–3084 (2010). [35] R. Kuc, “Biomimetic sonar locates and recognizes objects”, J. Ocean. Eng., IEEE 22, 616–624 (1997). [36] R. P. Feynman, R. B. Leighton, and M. Sands, The Feynman Lectures on Physics, Mainly Mechanics, Radiation, and Heat, volume 1 (Addison-Wesley, Reading, MA) (1963). [37] M. Park and R. Allen, “Pattern-matching analysis of fine echo delays by the spectrogram correlation and transformation receiver”, J. Acoust. Soc. Am. 128, 1490–1500 (2010). [38] M. I. Sanderson, N. Neretti, N. Intrator, and J. A. Simmons, “Evaluation of an auditory model for echo delay accuracy in wideband biosonar”, J. Acoust. Soc. Am. 114, 1648–1659 (2003). [39] M. Sanderson and J. Simmons, “Selectivity for echo spectral interference and delay in the auditory cortex of the big brown bat Eptesicus fuscus”, J. Neuro- physiol. 87, 2823–2834 (2002). [40] M. Sanderson and J. Simmons, “Neural responses to overlapping FM sounds in the inferior colliculus of echolocating bats”, J. Neurophysiol. 83, 1840–1855 (2000). [41] P. Saillant, J. Simmons, S. Dear, and T. McMullen, “A computational model of echo processing and acoustic imaging in frequency-modulated echolocating bats: The spectrogram correlation and transformation receiver”, J. Acoust. Soc. Am. 94, 2691–2712 (1993). [42] L. Gao, S. Balakrishnan, W. He, Z. Yan, and R. M¨ uller, “Ear deformations give bats a physical mechanism for fast adaptation of ultrasonic beam patterns”, Phys. Rev. Lett. 107, 214301 (2011). [43] L. Jakobsen and A. Surlykke, “Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit”, Proc. Natl. Acad. Sci. U.S.A. 107, 13930–13935 (2010). [44] M. Warnecke, M. E. Bates, V. Flores, and J. A. Simmons, “Spatial release from simultaneous echo masking in bat sonar”, J. Acoust. Soc. Am. 135, 1–9 (2014). [45] M. J. Hinich, “Processing spatially aliased arrays”, J. Acoust. Soc. Am. 64, 792–794 (1978). 140 [46] M. Gillette and H. Silverman, “A linear closed-form algorithm for source lo- calization from time-differences of arrival”, IEEE Signal Process. Lett. 15, 1–4 (2008). [47] M. Brandstein, J. Adcock, and H. Silverman, “Microphone-array localization error estimation with application to sensor placement”, J. Acoust. Soc. Am. 99, 3807–3816 (1996). [48] H. Do, H. Silverman, and Y. Yu, “A real-time SRP-PHAT source location im- plementation using stochastic region contraction (SRC) on a large-aperture mi- crophone array”, IEEE ICASSP 2007 Proc. 1, I–121–I–124 (2007). [49] J. F. Lynch, T. F. Duda, and J. A. Colosi, “Acoustical Horizontal Array Coher- ence Lengths and the Carey Number”, Acoustics Today 10, 10–17 (2014). [50] H. E. Bass, C. H. Hetzer, and R. Raspet, “On the speed of sound in the atmo- sphere as a function of altitude and frequency”, J. Geophys. Res. 112, D15110 (2007). [51] S. S. 
Haykin, Neural Networks and Learning Machines (Prentice Hall, Upper Saddle River, NJ) (2009). [52] M. Wehr and A. M. Zador, “Balanced inhibition underlies tuning and sharpens spike timing in auditory cortex”, Nature 426, 442–446 (2003). [53] W. M. Masters, A. J. Moffat, and J. A. Simmons, “Sonar tracking of horizontally moving targets by the big brown bat Eptesicus fuscus”, Science 228, 1331–1333 (1985). [54] C.-Q. Ye, M.-M. Poo, Y. Dan, and X.-H. Zhang, “Synaptic mechanisms of direc- tion selectivity in primary auditory cortex”, J. Neurosci. 30, 1861–1868 (2010). [55] P. Dayan and L. Abbott, Theoretical Neuroscience: Computational and Mathe- matical Modeling of Neural Systems (MIT Press, Cambridge, MA) (2001). [56] E. L. Bienenstock, L. N. Cooper, and P. W. Munro, “Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex”, J. Neurosci. 2, 32–48 (1982). [57] S. Song, K. Miller, and L. Abbott, “Competitive Hebbian learning through spike- timing-dependent synaptic plasticity”, Nature Neuroscience 3, 919–926 (2000). [58] A. Saltelli, P. Annoni, I. Azzini, F. Campolongo, M. Ratto, and S. Tarantola, “Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index”, Comput. Phys. Commun. 181, 259–270 (2010). [59] A. Saltelli, K. Chan, and E. M. Scott, Sensitivity Analysis (John Wiley & Sons, New York, NY) (2000). [60] K. Chan, A. Saltelli, and S. Tarantola, “Sensitivity analysis of model output: Variance-based methods make the difference”, in IEEE WSC 1997, 261–268 (IEEE Computer Society) (1997). 141 [61] M. A. Ainslie and J. G. McColm, “A simplified formula for viscous and chemical absorption in sea water”, J. Acoust. Soc. Am. 103, 1671–1672 (1998). [62] J. R. Barchi, J. M. Knowles, and J. A. Simmons, “Spatial memory and stereotypy of flight paths by big brown bats in cluttered surroundings”, J. Exp. Biol. 216, 1053–1063 (2013). 142 Chapter 6 Discussion, Applications, Future Directions, and Concluding Re- marks 6.1 Discussion The research objectives of this dissertation were two-fold: 1) To improve our under- standing of biosonar from an engineering perspective, and 2) to apply this perspective toward the development of a bio-inspired broadband sonar system. To address the first objective, Chapters 3 and 4 developed an advanced set of methods and tools for studying bat echolocation. The second objective was realized in Chapter 5 through computational modeling and simulation of broadband acoustic information that is relevant to mammalian echolocation. Specifically, Chapter 3 addresses the need for new high-resolution time-frequency techniques in the field of bio-acoustics. The bat’s auditory system itself encodes an acoustic time-frequency representation (TFR) that is not hindered by the presence of multiple harmonic components; however, many existing techniques we would consider to have “high resolution” in fact suffer from cross-component interference or smearing of energy across the time-frequency plane. These multi-harmonic waveforms highlight many of the difficulties faced in existing TFR techniques and is one reason why the short multi-harmonic pulses emitted by echolocating bats are frequently used as a 143 gold standard for any new TFR or transform. The new approach in Chapter 3 ex- ploits the recently developed fractional Fourier transform (FrFT) in order to separate multiple harmonic components. 
A combination of signal processing ideas, such as empirical mode decomposition (EMD) and Hilbert spectral analysis, are applied to the individual components for the detailed analysis of biosonar signals. The developments in Chapter 3 were based on the original idea of analyzing multi-component signals using the EMD due to its efficacy in analyzing signals from other non-linear systems. Preliminary analysis of bat echolocation signals revealed peculiarities in the EMD results, because the decomposed energy distributions con- tained information from two separate harmonic components. To address this issue, the ensemble EMD was used to process the signals without switching between harmon- ics. The reliability of estimating instantaneous frequency using the FrFT also posed a serious problem. New image processing algorithms were written that exploited the ridge structure found in each harmonic in the rotation-fraction plane. Along with these new methods, their utility was demonstrated by the large databases of signals that were successfully processed. The aggregate of these research contributions were published in the Journal of the Acoustical Society of America. Chapter 4 documents “Arrayzilla” – a high-density reconfigurable microphone array for conducting studies of bats’ transmit beam patterns during echolocation tasks. A considerable amount of thought and attention to detail went into the ar- ray design, including everything from the original concept, through the electronics design phase, to the many acoustic considerations. The mechanical framing was designed in collaboration with SEEMS, LLC (Warwick, RI). Construction, testing, and verification of the array required a substantial amount of assistance from many graduate and undergraduate students in our lab. Alongside this hardware appara- tus is a multi-element beam reconstruction method that utilizes the time-frequency analysis techniques in Chapter 3 and includes new processing methods for correcting frequency dependent transmission losses and microphone variability. The result of 144 synchronously sampling the acoustic space with such high-fidelity and reconstructing bats’ consecutive transmit beam patterns is unprecedented. Demonstrating the use- fulness of this new measurement tool for studying bat echolocation required training several animals to perform echolocation experiments in front of the array. This diffi- cult and time consuming task and was led by Dr. Laura Kloepper and Michaela War- necke. In summary, “Arrayzilla” has proven to be a novel approach to the problem of bioacoustic beam pattern analysis and will remain an invaluable tool for conducting echolocation studies in our lab. In Chapter 5, a numerical model of the physical acoustics is presented to understand the rich set of information available in broadband acoustic pulses and echoes. Through computational modeling and acoustic simulation, this study quan- tifies the performance achievable by a bio-inspired broadband sonar system. The first development reformulates localization of broadband echoes as a spectral pattern matching problem. This alternative description as an echo classifier allows a simpli- fied spectral distance metric to discriminate between echoes arriving from any spatial direction. Other studies have used the Cramer-Rao lower bound (CRLB) [1, 2] or information theory [3] to quantify localization performance, but no previous study has been found that sufficiently addresses the problem of how angular resolution is achieved with broad beam patterns. 
Adequate resolution is required for acoustic imaging, by definition. Several hypotheses are made regarding how bats resolve mul- tiple closely spaced echoes in angle, but this remains an open question. In addition to characterizing the resolving power of this new imaging approach, this study is the first known to examine sensitivity of broadband spectral localization to changing environmental parameters; critical to the success of any practical bio-inspired broad- band sonar system. Furthermore, the issue of frequency-dependent target strength is not well studied in the literature. The work in Chapter 5 also provides a theoretical examination of broadband target strength for a variety of objects. Together, these developments show the feasibility of constructing and optimizing a micro-aperture 145 broadband sonar system for potential future applications in air and underwater. 6.2 Applications 6.2.1 Multi-Component Signals and Time-Frequency Analysis The multi-component analysis technique described in Chapter 3 has been applied to a variety of bio-acoustic analysis problems. First and foremost, the beam pattern reconstruction method in Chapter 4 relies upon this technique to automate the signal analysis of each emitted echolocation call across hundreds of channels. Despite the variability of emitted signals within and between individual bats, the technique proved to be highly reliable and adaptable to the tens of thousands of signals captured. Multi-component analysis has also been used in investigations into the unique timing patterns of strobe groups emitted by echolocating bats [4]. Several months of data were recorded and collected during a target detection task from a stationary platform. The harmonic components emitted by the bats were isolated and functions for instantaneous frequency and amplitude were extracted. Statistical analyses of strobe group timing (e.g. inter-pulse interval (IPI)) in relation to signal parameters such as call duration, instantaneous energy, frequency span, etc. was based on the information made available through multi-component analysis. Although the multi-component analysis techniques developed in this disserta- tion were originally developed for the signals of Eptesicus fuscus and related species of bats, the approach is applicable to a host of alternative signal types beyond echolo- cation or even acoustics. Neurophysiological data were collected from E. fuscus using an electrode inserted into the cochlear nucleus [5]. Linear FM pulses were played through a loudspeaker directed at the anesthetized animal, received through the ex- ternal ears, and transduced into electrical signals by the piezoelectric effect of the cochlea’s inner hair cells. The signal received at the electrode was proportional to the 146 amplitude of the acoustic pulse and the majority of the signal was surprisingly found to be in-phase with little distortion effects or group delay as might be expected for a mechanical traveling wave. 6.2.2 Beam Pattern Measurement Instrumentation and Techniques A large database of in-flight echolocation calls and post-processed flight tracks has been aggregated throughout the course of many different obstacle avoidance experi- ments in clutter. These experimental data were collected in a controlled flight-room in our laboratory using ultrasonic audio instrumentation and thermal infrared stereo- scopic cameras. 
The beam reconstruction methods developed in Chapter 4 are provid- ing new uses for these data, namely the estimation of in-flight beam patterns using the sparse array of 24 wall-mounted microphones (the original microphone preamplifier circuit boards that were later redesigned for use on the large reconfigurable array). This method has already been used by several colleagues to estimate beam width of new flight-room data [6]. One of several big brown bats, E. fuscus, were released at the start of a long corridor of hanging plastic chains, which served as dense acoustic clutter. Animals were localized and tracked using their emitted acoustic pulses as they flew through the corridor. The width of the narrow channel changed to one of four settings for each experimental trial: 40, 70, 100, and 140 cm. The beam measurement methods were used to determine if and how the beam shape changed as it progressed through the dense chain array for each corridor width. These data show that transmitted beam widths remained relatively fixed throughout various clutter conditions; however, the total amplitude of the emitted beams were reduced by several decibels at the most difficult setting. This reduction in signal intensity could potentially be an active strategy for reducing the levels of received reverberation backscatter. These pilot experimental results are to be followed up with a more rigorous study using “Arrayzilla.” 147 6.3 Future Directions 6.3.1 Time-Frequency Analysis of Bio-Acoustic Signals There are many different ways to employ the FrFT to extract time-frequency infor- mation. Capus et. al developed a short-time FrFT whereby a sliding window was used in a similar manner to the STFT [7]. The principle idea was to recreate a higher-resolution TFR by finding an optimal FrFT rotation angle, α, at each time instant and displaying |F rF T (α, t)|2 for that column. This works well for FM mono- component signals; however, multi-component signals are not guaranteed to have one optimal rotation for each point in time. Furthermore, the use of a sliding time window spreads energy over time and may not reliably predict instantaneous amplitude. A logical extension to the FrFT-based technique presented in Chapter 3 is cur- rently under development to overcome these limitations. An implicit assumption is made that each component is an approximately linear FM within a short time win- dow. Rather than constructing a global rotation-fraction plane for the entire signal, short time windows are applied in the usual way to analyze overlapping segments of the signal. For each time segment, individual components are identified using local maxima in the rotation-fraction plane. Therefore, no restriction is made on having a single optimal rotation angle, α. Instead, each component has its own unique slope and amplitude at one particular time instant. These line segments are matched to the nearest component in the neighboring time instants. In this way, an improved TFR for each component may be constructed by plotting |F rF T (α, t)|2 , or alternatively each component may be isolated using the time-variant filter method in Chapter 3. This new approach may be more appropriate for cases where the signals and number of components are unknown a priori, because fewer parameters are required to automate the process. A potential downside to this approach is that it is com- putationally more expensive and may preclude real-time operation until computing speed improves or more efficient FrFT methods are developed. 
148 The field of time-frequency analysis is considered to be fairly mature; however, currently no single method exists that can simultaneously overcome all of the problems associated with existing methods. Researchers do recognize this shortfall and biosonar signals such as the bat’s echolocation pulse will remain the gold standard with which to compare performance. Looking to the mammalian brain, and the auditory system in particular, will inevitably inspire new ideas and approaches on this front. 6.3.2 Acoustic Measurement and Visualization of the Multi-Dimensional Sound Field Research with the large microphone array is ongoing and future experiments will likely dictate new requirements for the array hardware and signal processing algorithms. This may include non-uniform arrangements, integration with other tools, or increased acoustic sensitivity for other species. Fortunately, the array design was intended to be modular and innovations with micro electro-mechanical systems (MEMS) sensor technology can be taken advantage of by upgrading the microphone circuit boards with these new devices. For example, the current generation of Knowles MEMS ultrasonic microphones (SPU0410LR5H, Knowles Acoustics, Itasca, IL) introduced a back-mounted “zero-height” packages where the devices are mounted on the non- acoustic side of the board and acoustic sensing is performed through a drilled out hole in the printed circuit board. Drilling with a sufficiently small diameter guarantees an omni-directional beam pattern in the front hemisphere since there are no protrusions obstructing the sound path at steep angles. Aside from acoustic directivity, MEMS microphones will continue to improve their frequency response to the point that little or no correction is necessary. Furthermore, it seems feasible that coupling MEMS accelerometers with the existing ultrasonic pressure sensors could result in new types of vector sensors – opening a new door in bio-acoustics research. Beyond any future improvements to the microphone hardware and the beam reconstruction algorithms, it is our hope that “Arrayzilla” will have broader impact. 149 This high-density, modular approach to sensing and visualizing the multi-dimensional sound field may inspire other researchers to create their own innovative measurement systems. Some common scientific applications that could benefit from these ideas include beam pattern measurements of underwater marine mammals, conference room acoustics for future-generation business communications, and acoustic holography measurements of machinery noise in the near-field. 6.3.3 Bio-Inspired Broadband Sonar for Micro-Aperture Imaging Toward the development of a bio-inspired broadband sonar system, the physical prin- ciples and ideas developed in Chapter 5 need to be demonstrated experimentally in a real acoustic environment. Before finding some optimal method to extract broadband information for acoustic imaging, we need to better understand how the system might fail to produce accurate or reasonable results. The potential difficulty with uncer- tainties in environmental parameters is addressed to some extent through sensitivity analysis in Chapter 5. A more serious problem is that for this system to be imple- mented successfully, it must appropriately handle multiple concurrent echoes. We have addressed the problem of mutual-interference by two scatterers, but operating in real environments warrants a more statistical view of the problem. 
For example, is there a probabilistic pattern in the pulse-to-pulse spectral interference of multiple scatterers that can be used in deconvolving the echoes? The solutions to these prob- lems lie in further understanding the neural networks of the bat’s auditory system. Echolocating animals have proven that this method of broadband imaging does work and that solutions to these difficulties already exist. It is only a matter of finding how biosonar solves the problems and what we might do to replicate and improve upon their solutions. This mode of acoustic sensing represents a significant departure from what is current practice in array signal processing, whereby improving resolution traditionally requires higher operating frequencies or larger array apertures. The implication for 150 demonstrating a bio-inspired broadband sonar is that array hardware only requires a handful of sensing elements and supporting electronics which results in a system that is orders of magnitude more compact. The bio-inspired approach to acoustic sensing effectively transfers the complexity of acoustic imaging from the physical hardware into a signal processing domain that continues to become smaller, faster, and more affordable with time. A surprising result of the modeling work in Chapter 5 was that broadband acoustic imaging does not necessarily require the complex acoustic baffle structures found in many echolocating animals; simple piezoelectric elements may meet perfor- mance objectives by properly designing and orientating beam patterns. This finding may ease the transition of bio-inspired sonar techniques into the realm of man-made sonar imaging and sensing systems. 6.4 Concluding Remarks The research contained in this dissertation represents a bottom-up approach to biomimetic design. Many engineers and physicists attempt to force existing signal processing so- lutions onto an explanation of how bats achieve bio-acoustic imaging. The bat’s sonar imaging process is a complex system that requires a thorough examination of what salient information survives the acoustic-to-neural transduction from sound emission to reception and interpretation. There exists a vast body of research that was lever- aged to understand the fundamental mechanisms of biosonar. Evidence in the form of neurophysiological studies of the bat’s brain and empirical data from behavioral experiments serve to piece together the echolocation story. As these individual pieces come together they will lead to a new unified approach to acoustic imaging. Despite decades of research in this area, there are still many important ques- tions about biosonar that remain unanswered. For example, as mentioned in Chap- ter 3, how does a slight perturbation of the time-frequency structure disambiguate 151 the barrage of overlapping echoes when all the signals and echoes remain highly cor- related [8]? Another mystery lies in the dynamics of beam patterns. We are now developing an understanding of the interplay between broadband beam patterns and complex targets, but how does a rapidly changing beam pattern assist in forming images [9]? Even more intriguing is the extreme tolerance to interference and jam- ming [10, 11]. How can bats fly in extremely dense clutter within close proximity to tens or even hundreds of other bats, all simultaneously operating with nearly identi- cal sonar signals? 
The answers to these questions lie in part with future technologies for imaging the bat’s brain, but perhaps most importantly in the creativity of future behavioral and neurophysiological experiments with the animals. A more generalized question is how does a bat’s neural information processing differ from their underwater echolocating mammalian counterparts. As stated by Whitlow Au in “The Sonar of Dolphins” [12, Ch. 11]: There are many obvious differences—in fact, hardly any similarities— between bats and dolphins in general. [...] A seemingly endless list of differences between the two classes of animals can be compiled, compared with only a few similarities, among those being that both are mammals and echolocators. Although a common and important sonar function of both animals involves the capture of prey, there are large differences in the physical characteristics and behavior of prey types as well as in the environment they inhabit. Therefore it would not be surprising to find vast differences in the functioning, characteristics, and capabilities of the two sonar systems. Despite the obvious physical differences between these classes of animals and their respective environments, all mammals have the same basic organization of neural structures in the brain. Specializations do exist and are quite frequently found across species; however, the fundamental physics of acoustic wave propagation does not drastically change across the air-water boundary aside from the differences mentioned in Chapter 5. With this point of view, the broadband acoustic information available to all echolocators looks remarkably alike. Therefore, how this broadband acoustic information is used by these different classes of echolocators may be more similar 152 than we ever realized. References [1] R. M¨uller, H. Lu, and J. Buck, “Sound-diffracting flap in the ear of a bat gener- ates spatial information”, Phys. Rev. Lett. 100, 108701 (2008). [2] R. Altes, “Angle estimation and binaural processing in animal echolocation”, J. Acoust. Soc. Am. 63, 155–173 (1978). [3] D. Vanderelst, J. Reijniers, F. Schillebeeckx, and H. Peremans, “Evaluat- ing three-dimensional localisation information generated by bio-inspired in-air sonar”, IET Radar Sonar Navig. 6, 516–525 (2012). [4] L. N. Kloepper, J. E. Gaudette, J. R. Buck, and J. A. Simmons, “Influence of mouth opening and gape angle on the transmitted signals of big brown bats, Eptesicus fuscus”, J. Acoust. Soc. Am. in prep. (2014). [5] J. Knowles, J. A. Simmons, J. Barchi, J. E. Gaudette, S. S. Horowitz, and A. M. Simmons, “Cochlear processing in biosonar: Modeling sound transduction and the cochlear microphonic in echolocating bats”, in Society for Neuroscience, 1–1 (Washington, DC) (2011). [6] I. Matsuo, A. R. Wheeler, L. N. Kloepper, J. E. Gaudette, and J. A. Simmons, “3D acoustic tracking of bats in clutter environments from microphone arrays”, in Acoustics, 1–1 (Tokyo, Japan) (2013). [7] C. Capus and K. Brown, “Short-time fractional Fourier methods for the time- frequency representation of chirp signals”, J. Acoust. Soc. Am. 113, 3253–3263 (2003). [8] S. Hiryu, M. E. Bates, J. A. Simmons, and H. Riquimaroux, “FM echolocating bats shift frequencies to avoid broadcast-echo ambiguity in clutter”, Proc. Natl. Acad. Sci. U.S.A. 107, 7048–7053 (2010). [9] L. Gao, S. Balakrishnan, W. He, Z. Yan, and R. M¨ uller, “Ear deformations give bats a physical mechanism for fast adaptation of ultrasonic beam patterns”, Phys. Rev. Lett. 107, 214301 (2011). [10] M. E. Bates, S. A. 
Stamper, and J. A. Simmons, “Jamming avoidance response of big brown bats in target detection”, J. Exp. Biol. 211, 106–113 (2008). [11] M. Warnecke, M. E. Bates, V. Flores, and J. A. Simmons, “Spatial release from simultaneous echo masking in bat sonar”, J. Acoust. Soc. Am. 135, 1–9 (2014). [12] W. W. Au, The Sonar of Dolphins (Springer, New York) (1993). 153 Appendix A Modeling of Precise Onset Spike Timing for Echolocation Abstract This Appendix describes a biophysical model of the echolocating bat’s auditory pe- ripheral system and cochlear nucleus to explore the timing precision of echo infor- mation within the bat’s brainstem. In particular, this study focused on the Meddis auditory model and a recurrent network of integrate-and-fire coincidence detection neurons. Many details of the bat’s primary ascending auditory system have yet to be understood; however, included here is an attempt at simulating the critical parts of the peripheral auditory stage and early neuronal transduction that confer precise timing of pulses and echoes. Results of this modeling study were presented at the Acoustical Society of America [1]. A.1 Motivation for a Biophysical Model Auditory computational models are not a new concept. On the contrary, they have been around for at least as long as computing power has been available [2, 3]. The current state-of-the-art in auditory system modeling includes detailed implementa- tions based on the biophysical processes they are trying to mimic. Detailed models of the mechanical-to-neural sound transduction now exist and can successfully account 154 for a vast array of psychoacoustic behavior [4, 5, 6, 7, 8, 9, 10]. Auditory processing in the bat is also beginning to be understood [11, 12, 13, 14, 15], but physiological work on echolocating animals does require more emphasis to understand these highly specialized systems. At present, existing computational biosonar models of bats use trivial functions for auditory neural transduction and processing. Many attempts at creating a com- putational model of bat echolocation end up looking more like a typically engineered sonar signal processing system than the biological “wetware” they sought to mimic. For instance, the head related transfer function (HRTF) and external/middle ear fre- quency shaping characteristics that are essential for localizing sound in azimuth and elevation are often neglected in lieu of using interaural timing difference (ITD) as the primary horizontal cue. Another common problem of existing echolocation models is the over-use of the classical matched-filter (a.k.a. replica correlation). Furthermore, the spike generation process is typically either removed completely, or performed ad hoc without considering the detailed biological processes. The one biological aspect of bat echolocation models that appears consistent amongst researchers is segmenting the incident sound channel into multiple frequency channels or bins using 1) a linear or non-linear filter bank, 2) the short-time Fourier transform (i.e. spectrogram), or 3) an alternative time-frequency representation commonly used in signal processing systems. There is a very good reason for these simplifications to occur. As fast as they are, the computational power of today’s digital signal processors cannot feasibly calculate all of the differential equations necessary to replicate how the brain functions on a large scale. 
When there are additional time restrictions on computations, calculations must be either simplified or farmed out in parallel as they are in the brain. Therefore, over-simplification of the biological processes in conventional computing is necessary to implement many signal processing auditory models. The additional demand for real-time echolocation processing will inevitably 155 require 1) an exceedingly fast microprocessor or 2) a massively parallel grid of com- putations. In either case, computations are expensive and designers must carefully trade off performance with power, size, and cost. It is therefore essential to model and understand the critical processes of bat echolocation before making any attempts to replicate the underlying processing. A.1.1 Coincidence Detection and Population Coding in the Auditory Sys- tem The auditory system consists of a typical structure amongst all mammals, non- echolocating and echolocating alike. Therefore, the differentiating factors that enable echolocation appear to lie in the precise details of the brain’s neural networks and connectivity. With the exception that a bat’s cochlea operates up to two octaves higher in frequency than humans and many other mammals, the peripheral sensory organ appears to function identically in response to auditory stimuli. We must then ask the question, why can’t a guinea pig or other mammal1 learn to echolocate? Fig- ure A.1 shows typical coincidence detection behavior of bushy cells located in the anteroventral cochlear nucleus (AVCN) of a rat. Here, timing jitter is reduced down to approximately 300 µs (less than the width of a single neural spike) using population coding techniques employed in the cochlear nucleus. Bats’ ability to discriminate differences in shape down to 2 µs delay (millimeter precision) and 20 ns in phase has been criticized as being infeasible given the amount of timing jitter still present in the auditory system, even after population coding. The existence of gap junctions in the CN of bats may be an answer to this unexplained phenomenon. As an initial examination to this possibility, I created a computational model of the well studied Bushy cells in the AVCN of mammals. Since parameters for the Meddis IHC model have not yet been determined, the guinea-pig parameters for 1 There have been rare cases where blinded humans have developed the basic ability to echolocate using clicks, similar to dolphins. One hypothesis for this surprising adaptation might be that learning to use broadband sound to echolocate at an early developmental stage for the brain causes a significant change in functional organization. Perhaps electrical synapses, or gap junctions, in these humans are retained from early development for use in echolocation. 156 Figure A.1. Action potentials recorded from a rat when presented with a low frequency sinusoidal stimulus. (left) The ANC (auditory nerve) and AVCN (bushy cell) exhibit a significant reduction in timing jitter due to populations of coincidence detection neurons. (right) Post-stimulus time histograms (PSTH) relative to stimulus peaks showing the distribution of neural spikes relative to the sinusoidal stimulus. Adopted from [16, p. 110]. medium spontaneous-rate (MSR) fibers are used as in Reijniers and Peremens [17]. If a guinea-pig is capable of echolocation, the timing information of FM sweeps produced by this model would be sufficient for sub-millisecond discrimination. 
Rejecting this null hypothesis will be an important step toward proving bats must have and use sharper timing information in the brain. 157 A.2 Methods The proposed model architecture is shown in Figure A.2, below. The input signal consists of synthetic generated FM sweeps comparable to the rate of bat calls. The monaural signal enters the cochlear block where it is split into N different frequency channels using a Gammatone filter-bank [9]. The basilar membrane (BM) movement at each channel is then individually passed through the IHC peripheral model [4, 5], which is described in more detail below. The output of the model is essentially a probability of firing a spike on each auditory fiber. Emulating a realistic number of auditory fibers (30:1 ANC-to-IHC ratio) is easily accomplished by generating as many random sequences as ANC on the same IHC spiking probability. Figure A.2. Proposed neural network architecture of the auditory population coding. The random spike processes generated in the peripheral system are then con- nected to the cochlear nucleus block, which consists of a line array of M leaky integrate-and-fire (IaF) bushy cells. These cells are innervated by a tonotopic distri- bution of auditory fibers, overlapping in input frequency by an unknown amount. The neurons can be connected in a strictly feedforward manner or using any configuration 158 of recurrent topology for experimentation with the model. Here, it is assumed that there are approximately 200-500 of such cells in the bat’s AVCN. The output from each IaF neuron consists of a discrete spike train representing a reduction of spike timing jitter. A.2.1 Peripheral System A.2.1.1 Outer and Middle Ear The sound waves passing through the outer ear (OE) and middle ear (ME) are shaped by both, mechanical damping and resonance. Since this system is entirely mechanical, we use a linear time-invariant filter to model the band-pass frequency shaping effects. This model uses the following set of cascaded band-pass filter responses for the OE and ME model, following Reijniers and Peremens [17]: • 2nd order filter, f3dB = (4 kHz – 80 kHz) • 3rd order filter, f3dB = (700 Hz – 100 kHz) A.2.1.2 Cochlea and Basilar Membrane Mechanical vibration of sound projected onto the basilar membrane (BM) is modeled using a gammatone linear filter-bank. The output from the filter-bank simulates the vibration of the BM at each of the N inner hair cells. In most mammals, N is approximately 1,000 cells, but has never been quantified in the bat. The gammatone filter type has been shown to provide a fairly accurate tuning curve for the inner hair cells (IHC) for many mammals [9]. Computational limitations prevent simulation of the thousands of hair cells in the cochlea, so the number of channels is limited to approximately 100. The bandwidth of each IHC overlaps the nearest neighbors such that a subset of channels (10 here) can be substituted with little loss of information [17]. 159 A.2.1.3 Meddis Auditory Peripheral Model The Meddis [4, 5] auditory model of basilar membrane movement to inner hair cell (IHC) neurotransmitter release is used for the first stage of neural spike generation. There are 3 differential equations and 9 parameters that define the dynamical be- havior. Modifying these parameters, it is straightforward to create high, medium, and low spontaneous-rate (HSR, MSR, and LSR) nerve fibers as demonstrated by Sumner et. al. [7, 18]. 
The output of each IHC frequency channel passed through the Meddis model is the available amount of neurotransmitter between the IHC and auditory nerve cell (ANC) synaptic cleft. Using a uniform random process, U(0, 1), we can create the spikes given a probability of spike occurrence dependent upon past and present stimuli. The time-varying movement of the BM, S(t), is the stimulus that directly affects the membrane permeability, k(t), in a nonlinear manner by applying the equation  g[S(t)+A]  S(t)+A+B for S(t) + A > 0 k(t) = (A.1)  0 otherwise Parameters g, A, and B can be set to increase or decrease this non-linear compression to achieve various dynamic ranges, maximum rate, and spontaneous spiking rates. Figure A.3. Block diagram of the Meddis IHC model. Adapted from [4, 5] From Figure A.3 we see that IHC neurotransmitter (i.e. glutamate) is man- ufactured by the factory on demand until full. The neurotransmitter available for 160 release, q(t), is transferred through the cell membrane into a synaptic cleft at a rate proportional to the cell permeability, k(t). q(t) is defined by the differential equation dq = y(M − q(t)) + x · w(t) − k(t) · q(t) (A.2) dt where M is the maximum amount of neurotransmitter available for uptake at any given time, y is the neurotransmitter production constant, w(t) is the reprocessing store for released neurotransmitter, and x is the constant that determines release rate of reprocessed neurotransmitter. The amount of neurotransmitter in the reprocessing store is also a differential equation defined as dw = r · c(t) − x · w(t) (A.3) dt where r is the rate constant of neurotransmitter re-uptake and c(t) is the total amount of neurotransmitter in the synaptic cleft. Lastly, c(t) is defined as dc = k(t) · q(t) − l · c(t) − r · c(t) (A.4) dt and the only new parameter is l, which is the amount of neurotransmitter lost. The intuition behind the Meddis model is that some transmitter is eventually lost into the surrounding fluid and the remaining amount is reabsorbed for repro- cessing. The reprocessing store acts as a temporary cache that releases broken down neurotransmitter molecules at a rate, x · w(t), so that it reflects the longer time delay before transmitter can be reused. The probability of firing a spike is directly propor- tional to the amount of neurotransmitter currently in the synaptic cleft, c(t). h is the number of vesicles in the synapse and mathematically is just a probability multiplier. A.2.1.4 Spike Refractory Equations Sumner et al. [7, 18] modeled the refractoriness in the inner hair cells by modifying the probability of an action potential by setting 161   1 − cτ e−(t−tl −RA )/sτ for (t − tl ) ≥ RA p(t) = . (A.5)  0 otherwise RA is an absolute refractory period, cτ is the maximum amount of relative refrac- toriness, and sτ is the exponentially decaying time constant for refractoriness. In addition, t is the current time instant and tl is the time of the last spike. This model replicates this method using the same parameters as in Sumner (2002), with RA = 0.75 ms, cτ = 0.55, and sτ = 0.8 ms. A.2.2 Cochlear Nucleus A.2.2.1 Leaky IaF Model Due to the nature of the frequency modulated (FM) sweeps used by the bat, it would be impossible to implement a classical feedforward or recurrent firing-rate neural network model [19] with neurons producing an average of only 1.2 spikes per echo [15]. Therefore, a model is required that accounts for a reasonable amount of accuracy in the neuronal spiking process. 
A good trade-off between model accuracy and complexity is the well known leaky IaF neuron model [19]. This model tracks the sub-threshold membrane potential of neurons by inte- grating the excitatory and inhibitory post-synaptic potentials (EPSPs and IPSPs) over time, and triggers a spike when the membrane potential reaches a predefined threshold. After a spike occurs, the membrane potential is set to an optional reset level below resting potential. This model requires three differential equations and two update equations. The membrane potential, V , is updated with an exponential decay time constant of τm as dV τm = Vrest − V + gex (t)(Eex − V ) + gin (t)(Ein − V ) (A.6) dt where Vrest is the inactive resting potential, Eex and Ein are the excitatory and in- 162 hibitory presynaptic potentials, and gex and gin are the excitatory and inhibitory synaptic conductances. The conductance is further defined by the simple differential equations dgex τex = −gex (A.7) dt dgin τin = −gin (A.8) dt and gex → gex (t) + ∆gex (A.9) gin → gin (t) + ∆gin (A.10) which relax exponentially with time constants τex and τin . Through this set of differ- ential equations, excitatory and inhibitory presynaptic events will trigger a temporary increase of the two independent synaptic conductances. This model expanded the LiF equations to include recurrent synaptic connec- tions between neurons in the same stage. This required adding an additional excita- tory synaptic transconductance, gre , with time constant, τre , as implemented just as in the equations for gex and gin above. When an LiF neuron fires an action poten- tial, an immediate increase in conductance is added to the recurrent synapses of all neurons within some distance, D. Matrix, R, was used to describe synaptic weights from the M neurons to each of the other M neurons. For example, Rij = 1 refers to a strong excitatory synapse from neuron j to neuron i, such that all action potentials generated by i will cause an excitatory post-synaptic potential (EPSP) in j. Likewise, Rij = −1 refers to a strong inhibitory synapse from neuron j to neuron i, such that all action potentials generated by i will cause inhibitory post-synaptic potential (IPSP) in j. To avoid artificial positive feedback, a self-connected neuron (i = j) should be 163 avoided. A.3 Results A.3.1 Auditory Stimuli The signals used in this model can generally consist of a synthetic or recorded bat echolocation transmission followed by 1 or 2 overlapping echoes, which constitute a “glint.” Figure A.4 shows the signals used for this particular example simulation. Note that overlapping FM waveforms exhibit spectral interference as a function of the delay (30 µs and 60 µs for the second and third signal, respectively). Figure A.4. Time series (left) and spectrogram (right) of a synthetic linear FM and 2 pairs of echoes. Note that the echoes have spectral notches introduced by the spectral interference pattern that correspond to two echoes spaced apart by only 30 and 60 µs, respectively. The cochlear model consists of the constant bandwidth gammatone filter bank. This filter bank creates a time-frequency representation that mimics how the cochlea splits up incident sound into multiple narrow-band channels. For demonstration, Fig- ure A.5 shows the magnitude and phase response of four of these filter channels. The time series signals in Figure A.6 are generated by running the signal from Figure A.4 through this filterbank. 
A.3 Results

A.3.1 Auditory Stimuli

The signals used in this model generally consist of a synthetic or recorded bat echolocation transmission followed by one or two overlapping echoes, which together constitute a "glint." Figure A.4 shows the signals used for this particular example simulation. Note that overlapping FM waveforms exhibit spectral interference as a function of the delay (30 µs and 60 µs for the second and third signal, respectively).

Figure A.4. Time series (left) and spectrogram (right) of a synthetic linear FM transmission and two pairs of echoes. Note that the echoes have spectral notches introduced by the interference pattern corresponding to two echoes spaced apart by only 30 and 60 µs, respectively.

The cochlear model consists of a constant-bandwidth gammatone filter bank. This filter bank creates a time-frequency representation that mimics how the cochlea splits incident sound into multiple narrow-band channels; a code sketch of this front end is given at the end of this section. For demonstration, Figure A.5 shows the magnitude and phase response of four of these filter channels. The time series in Figure A.6 are generated by running the signal from Figure A.4 through this filterbank. Note that the filterbank preserves the linear frequency modulated (LFM) sweep rate of the synthetic chirp.

Figure A.5. Magnitude and phase plot of 4 channels in a gammatone filterbank between 25 kHz and 100 kHz.

Figure A.6. Example gammatone filterbank output using the signal shown above, generated at 4 arbitrary frequencies. (left) Entire time sequence and (right) close-up of the transmission signal at 100 ms.

A.3.1.1 Meddis Auditory Peripheral Model

The results from the Meddis inner hair cell (IHC) model are shown in Figure A.7 for the same signal and filter bank shown in the last section. The membrane permeability, k(t), rises sharply as a function of the signal amplitude. This allows release of neurotransmitter from the free pool, q(t), into the synaptic cleft, c(t). Lastly, neurotransmitter is reprocessed, w(t), before it is placed back into the free pool. The output generated from the Meddis model is a matrix of spike trains, and the model allows a fanout of multiple nerve fibers per frequency channel. In Figure A.7 there are 4 frequency channels and a fanout of 10, creating 40 nerve fibers that are passed on to the cochlear nucleus stage.

Figure A.7. Internal states of the Meddis model (k, q, c, & w) in response to the stimulus in Figure A.4.

A.3.1.2 IaF Neurons

The cochlear nucleus stage uses leaky IaF neurons with feedforward or recurrent synaptic connections. Spike generation is highly dependent on the number and activity levels of presynaptic inputs, though parameters can be adjusted to obtain an appropriate input sensitivity. Figure A.9 demonstrates the case of 4 independent feedforward neurons accepting the presynaptic input shown in Figure A.8. As we can see, the spike timing is fairly accurate, but additional model validation needs to be performed before any statistical tests can be run on these results.

Figure A.8. P_spike (left) and the resulting spike trains (right) for 40 LSR auditory nerve fibers.

Figure A.9. Membrane potential (and spikes) of 4 IaF neurons.

The effect of recurrence on the network depends largely upon the amount of synaptic input from the auditory nerve stage. If the neurons are firing regularly due to synaptic bombardment, then recurrent synapses will not have a large effect. However, when the synaptic input is low enough to cause irregular spiking patterns, recurrent connections can trigger spikes in neurons that would otherwise remain below threshold (see Figure A.10 for an example).

Figure A.10. IaF neurons (M = 4) with random but overlapping synaptic input (N = 100). (left) With a strictly feedforward network, only one neuron fires an AP. (right) In this excitatory recurrent network, all three neighboring neurons receive a strong EPSP from the first neuron, which is just enough to cause APs in two of the three cells. Simulations were performed using identical pseudo-random generator seeds.

A.3.1.3 Integration with BiSCAT

BiSCAT is a biosonar simulation tool developed primarily in MATLAB. Its intended use is to quickly and efficiently compare various monaural and binaural auditory processing models and parameters at each stage of the auditory pathway. Figure A.11 shows the layout of the three tabbed panels of the GUI.

Figure A.11. Layout of each of the three tabbed panels in the BiSCAT GUI.
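As referenced above, the following sketch shows one way to realize a constant-bandwidth gammatone front end by direct FIR convolution with sampled gammatone impulse responses. The sample rate, center frequencies, bandwidth, and filter order are illustrative assumptions, not the values used to generate Figures A.5 and A.6.

```matlab
% Constant-bandwidth gammatone filterbank sketch via FIR convolution with
% sampled impulse responses g(t) = t^(n-1) exp(-2*pi*b*t) cos(2*pi*fc*t).
% Sample rate, channel placement, and bandwidth are assumptions.
fs = 500e3;  dt = 1/fs;
t  = (0:round(5e-3*fs)-1)' * dt;          % 5 ms impulse response support
fc = [25e3 50e3 75e3 100e3];              % channel center frequencies (assumed)
b  = 5e3;                                 % constant bandwidth parameter (assumed)
n  = 4;                                   % gammatone filter order

% Synthetic 3 ms downward LFM chirp from 100 kHz to 25 kHz, zero-padded
tc = (0:round(3e-3*fs)-1)' * dt;
k  = (25e3 - 100e3) / 3e-3;               % sweep rate in Hz/s
x  = [cos(2*pi*(100e3*tc + 0.5*k*tc.^2)); zeros(round(2e-3*fs),1)];

y = zeros(numel(x), numel(fc));
for ch = 1:numel(fc)
    g = t.^(n-1) .* exp(-2*pi*b*t) .* cos(2*pi*fc(ch)*t);
    g = g / max(abs(g));                  % rough per-channel normalization
    y(:,ch) = filter(g, 1, x);            % narrow-band channel output
end
% Each column of y is one cochlear channel, analogous to Figure A.6.
```

Because b is held fixed across channels rather than scaled with fc, the bank is constant-bandwidth, which is what preserves the LFM sweep rate across channels noted above.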
A.4 Discussion

The auditory system is one of the few places in the brain where the role of precise timing is definitive. Acoustic waves are naturally abundant, but they typically carry information in relatively sparse packets. For animals to extract any useful information from sound, the auditory system must therefore respond within a very short time scale. Auditory neurons, particularly in the CN, are tuned to extremely small time windows, on the order of one-tenth the width of an individual spike. For these reasons, the auditory systems of mammals, birds, and insects are ideal animal models for exploring the function and dynamics of spiking neural networks.

The model presented in this chapter shows an example of coincidence detection neurons based upon a detailed biophysical model of the cochlea and a subsequent network of IaF neurons with locally recurrent connections to adjacent frequency channels. The model is oversimplified in several ways. First, there are numerous tunable parameters in the Meddis model that adjust the sensitivity of auditory nerve cells and membrane permeability. These parameters have never been measured in bats, so the model is prone to error and over-fitting. The Heil model requires fewer parameters than the Meddis model, yet still encodes the onset response of acoustic waves [20, 21]. This approach has been taken by Reijniers and Peremans [22]; however, that study focused on passive localization cues with a general mammalian anatomy rather than active echolocation.

Another important simplification is the use of IaF neurons in the subsequent neural network layer. These models cannot account for realistic spiking dynamics, because the biological neuron dynamics are absent from the basic set of differential equations governing subthreshold behavior. By replacing the IaF network with an Izhikevich model [23, 24], a broader range of spike timing behavior can be accounted for; a sketch of this model is given at the end of this discussion. One of the benefits of the Izhikevich model is that, unlike the biophysical Hodgkin-Huxley model, the exact concentrations and types of chemical neurotransmitters need not be known. Instead, the realistic dynamics of auditory neurons can be measured empirically using patch-clamp methods on a sub-population of neurons in the CN and elsewhere in the auditory brainstem, and then translated directly to the neural network in the form of generalized parameters for the reduced-order differential equations. Even without physiological measurements, the Izhikevich neural network is a more appropriate model than IaF for studying neural information processing via precise onset spike timing.

The interconnectivity of neurons within the CN is highly complex, and the function of each type of neuron has yet to be identified [25]. Although octopus and bushy cells have been shown to perform coincidence detection, the degree of connectivity within and across broader frequency bands (e.g., T-stellate and multipolar cells in the ventral CN) is unknown at this time. The morphology and associated computational function become even more convoluted within the IC, because many different afferent and efferent fibers connect through this neural complex [26, 14]. The present model assumes a very primitive form of a coincidence detection network of homogeneous cells. Although cells are frequency selective due to the bandpass filter bank mimicking the cochlea, no automatic training is performed on the cells' synaptic weights. Even the most realistic neuron model must have realistic connections, including the degree of connectivity within and between the various cell layers as well as an accurate description of dendritic delay (e.g., delay-sensitive octopus cells). As techniques for probing cell morphology improve, so too will the capability of computational neuroscience to replicate these auditory networks in simulation and silicon.
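As referenced above, the sketch below implements the standard two-variable Izhikevich equations [23, 24] for a single neuron. The (a, b, c, d) values are the published regular-spiking defaults, and the step input current is an arbitrary assumption; neither is fitted to CN physiology.

```matlab
% Single Izhikevich neuron, forward-Euler sketch of the reduced-order model
% [23, 24]. Parameters are the published regular-spiking defaults; the input
% current is an arbitrary assumption, not a fit to CN recordings.
a = 0.02; b = 0.2; c = -65; d = 8;        % regular-spiking parameter set
dt = 0.1;                                 % time step (ms)
T  = 0:dt:200;                            % 200 ms simulation
I  = 10 * (T >= 50 & T <= 150);           % step input current (assumed)

v = -65;  u = b*v;                        % initial membrane state
vtrace = zeros(size(T));  spikes = false(size(T));
for n = 1:numel(T)
    % Quadratic spike initiation plus slow recovery variable u
    v = v + dt*(0.04*v^2 + 5*v + 140 - u + I(n));
    u = u + dt*a*(b*v - u);
    if v >= 30                            % spike peak cutoff and reset
        vtrace(n) = 30;  spikes(n) = true;
        v = c;  u = u + d;
    else
        vtrace(n) = v;
    end
end
fprintf('Fired %d spikes\n', nnz(spikes));
```

Other published (a, b, c, d) sets reproduce bursting, chattering, and onset-type responses, which is what makes this reduced-order model attractive for matching patch-clamp data from CN neurons.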
References

[1] J. E. Gaudette and J. A. Simmons, “Modeling of precise onset spike timing for echolocation in the big brown bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 127, 1861 (2010).

[2] J. C. R. Licklider, “A duplex theory of pitch perception”, Experientia 7, 128–134 (1951).

[3] P. Joris, P. Smith, and T. Yin, “Coincidence detection in the auditory system: 50 years after Jeffress”, Neuron 21, 1235–1238 (1998).

[4] R. Meddis, “Simulation of mechanical to neural transduction in the auditory receptor”, J. Acoust. Soc. Am. 79, 702–711 (1986).

[5] R. Meddis, “Simulation of auditory-neural transduction: Further studies”, J. Acoust. Soc. Am. 83, 1056–1063 (1988).

[6] M. Hewitt and R. Meddis, “An evaluation of eight computer models of mammalian inner hair-cell function”, J. Acoust. Soc. Am. 90, 904 (1991).

[7] C. Sumner, E. Lopez-Poveda, L. O’Mard, and R. Meddis, “A revised model of the inner-hair cell and auditory-nerve complex”, J. Acoust. Soc. Am. 111, 2178–2188 (2002).

[8] C. J. Sumner, R. Meddis, and I. M. Winter, “The role of auditory nerve innervation and dendritic filtering in shaping onset responses in the ventral cochlear nucleus”, Brain Res. 1247, 221–234 (2009).

[9] E. Lopez-Poveda, “Spectral processing by the peripheral auditory system: Facts and models”, Int. Rev. Neurobiol. 70, 7–48 (2005).

[10] R. Meddis, “Auditory-nerve first-spike latency and auditory absolute threshold: A computer model”, J. Acoust. Soc. Am. 119, 406–417 (2006).

[11] S. Haplea, E. Covey, and J. Casseday, “Frequency tuning and response latencies at three levels in the brainstem of the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 174, 671–683 (1994).

[12] S. Dear and N. Suga, “Delay-tuned neurons in the midbrain of the big brown bat”, J. Neurophysiol. 73, 1084–1100 (1995).

[13] H. L. Hawkins, T. A. McMullen, A. N. Popper, and R. R. Fay, eds., Auditory Computation, volume 6 of Springer Handbook of Auditory Research (Springer, New York) (1995).

[14] M. Sanderson and J. Simmons, “Neural responses to overlapping FM sounds in the inferior colliculus of echolocating bats”, J. Neurophysiol. 83, 1840–1855 (2000).

[15] M. Sanderson and J. Simmons, “Target representation of naturalistic echolocation sequences in single unit responses from the inferior colliculus of big brown bats”, J. Acoust. Soc. Am. 118, 3352–3361 (2005).

[16] A. R. Moller, Hearing: Anatomy, Physiology, and Disorders of the Auditory System, 2nd edition (Academic Press, Burlington, MA) (2006).

[17] J. Reijniers and H. Peremans, “On population encoding and decoding of auditory information for bat echolocation”, Biol. Cybern. 102, 311–326 (2010).

[18] C. Sumner, E. Lopez-Poveda, L. O’Mard, and R. Meddis, “Adaptation in a revised inner-hair cell model”, J. Acoust. Soc. Am. 113, 893–901 (2003).

[19] P. Dayan and L. Abbott, Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems (MIT Press, Cambridge, MA) (2001).

[20] P. Heil and D. Irvine, “First-spike timing of auditory-nerve fibers and comparison with auditory cortex”, J. Neurophysiol. 78, 2438–2454 (1997).

[21] P. Heil, “First-spike latency of auditory neurons revisited”, Curr. Opin. Neurobiol. 14, 461–467 (2004).

[22] B. Fontaine and H. Peremans, “Bat echolocation processing using first-spike latency coding”, Neural Networks 22, 1372–1382 (2009).

[23] E. M. Izhikevich, “Hybrid spiking models”, Philos. T. Roy. Soc. A 368, 5061–5070 (2010).
[24] E. M. Izhikevich, Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting (MIT Press, Cambridge, MA) (2007).

[25] D. Oertel and E. Young, “What’s a cerebellar circuit doing in the auditory system?”, Trends Neurosci. 27, 104–110 (2004).

[26] E. Covey and J. Casseday, “Timing in the auditory system of the bat”, Annu. Rev. Physiol. 61, 457–476 (1999).