Bio-Inspired Broadband Sonar: Methods for Acoustical Analysis of Bat Echolocation and Computational Modeling of Biosonar Signal Processing By Jason E. Gaudette M.S., University of Rhode Island, May 2005 B.S., Worcester Polytechnic Institute, May 2003 Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Center for Biomedical Engineering at Brown University Providence, Rhode Island May 2014 © Copyright 2014 by Jason E. Gaudette This dissertation by Jason E. Gaudette is accepted in its present form by the Center for Biomedical Engineering as satisfying the dissertation requirement for the degree of Doctor of Philosophy. Date James A. Simmons, Advisor Recommended to the Graduate Council Date Elie L. Bienenstock, Reader Date Rodney J. Clifton, Reader Date Diane Hoffman-Kim, Reader Date Sherief Reda, Reader Date John R. Buck, External Reader Approved by the Graduate Council Date Peter M. Weber, Dean of the Graduate School iii Curriculum Vitae Jason E. Gaudette was born on October 9th , 1980 and raised with his younger sister Renee in Raynham, Massachussets to Edward and Mary Gaudette. Graduating from Bridgewater-Raynham High School in 1999, he continued on to Worcester Polytech- nic Institute to pursue a degree in Electrical Engineering. While an undergraduate Jason studied abroad on three occasions in Madrid, Spain; San Juan, Puerto Rico; and Limerick, Ireland. He received his Bachelor of Science in 2003 with distinction, a concentration in Computer Engineering, and a minor in International Studies. Imme- diately following graduation, Jason began his career at the Naval Undersea Warfare Center in Newport, RI as an Electrical Engineer. He enrolled in the graduate program at the University of Rhode Island in the Fall of 2003 and graduated in 2005 to obtain the Master of Science in Electrical Engineering. Soon thereafter, Jason married his wife Elena and had two children, Lucas and Alexander, born in 2006 and 2008. In the Fall of 2008 Jason enrolled in the Biomedical Engineering program at Brown Uni- versity. Working with his advisor, Prof. James A. Simmons, Jason has been part of a highly interdisciplinary team of researchers studying bat echolocation. As an active member of this laboratory, Jason has co-authored several peer-reviewed journal arti- cles, conference proceedings and abstracts, invited presentations, numerous research proposals, and a technical patent. iv Jason E. 
Gaudette jason.e.gaudette@navy.mil Naval Undersea Warfare Center 1176 Howell Street Newport, RI 02841 Professional Experience Naval Undersea Warfare Center, Newport, RI 2003 – present Electrical Engineer and Research Scientist • Lead engineer for electronics design and acoustic signal processing on various sonar programs, including acoustic countermeasure devices and forward-looking active sonar systems • Principal investigator for bio-inspired broadband sonar research • Experienced with design of low-noise acoustic transducer interface electronics, acoustic signal processing and analysis, and embedded systems development Analog Devices, Inc., Limerick, Ireland Fall 2002 Precision Digital to Analog Converters • Developed electronics and software for two customer evaluation board designs • Completed WPI Senior design team project (MQP) in 10 weeks abroad Analog Devices, Inc., Wilmington, MA Summer 2002 High-Speed Networking (HSN) Engineering Intern • Developed and tested an integrated circuit communication interface using Agi- lent VEE and the I2 C protocol • Characterized high-speed transceiver electronics for laser diode driver IC Education Brown University, Providence, RI May 2014 (exp.) Ph.D. Biomedical Engineering Advised by Dr. James A. Simmons University of Rhode Island, Kingston, RI May 2005 M.S. Electrical Engineering Worcester Polytechnic Institute, Worcester, MA May 2003 B.S. Electrical Engineering with Distinction Concentration in Computer Engineering Minor in International Studies v Awards and Honors 1. Full Member, Sigma Xi, Scientific Research Society, Brown University Chapter, (2014). 2. J. E. Gaudette, L. N. Kloepper, M. Warnecke and J. A. Simmons, “Arrayzilla Lives! Visualizing the dynamic beam pattern of an echolocating bat,” 1st place video entry in the Gallery of Acoustics displayed at the 164th Meeting of the Acoustical Society of America, Kansas City, MO, (October 2012). 3. “Special Achievement Award for Excellence in the Area of Basic and Applied Research,” Swampworks Lightweight Torpedo Project Team, Naval Undersea Warfare Center, Newport, RI, (2007). 4. “Special Achievement Award for Excellence in the Area of Basic and Applied Research,” Biorobotic Research Team, Naval Undersea Warfare Center, New- port, RI, (2006). 5. Member, Eta Kappa Nu, Electrical Engineering Honor Society, Gamma Delta Chapter at Worcester Polytechnic Institute, Worcester, MA, (2003). Grants and Fellowships 1. 2014–2016, ONR Research Grant, Code 341 Bio-Inspired Autonomous Systems Program, (J. E. Gaudette, Principle Investigator), $275K, “Computational modeling and experimental evaluation of a bio-inspired broadband sonar sys- tem.” 2. 2014–2016, NUWC Division Newport FY14 Independent Applied Research (IAR) Award, (J. E. Gaudette, Principal Investigator), $300K, “Bio-inspired broadband sonar system for high-resolution acoustic imaging applications.” 3. 2014–2016, NUWC Division Newport FY14 In-House Laboratory Independent Research (ILIR) Award, (J. DiCecco, P. I.; J. E. Gaudette, Associate Investi- gator), $300K, “Novel reconfigurable neuromorphic computing architectures for neural information processing.” 4. 2011–2013, NUWC Division Newport FY11-FY13 In-House Laboratory Inde- pendent Research (ILIR) Award, (J. E. Gaudette, Principal Investigator), $300K, “Bio-inspired broadband sonar receiver for clutter reduction: Computa- tional modeling and system evaluation.” 5. 2010, NUWC Division Newport Academic Fellowship Award, (J. E. 
Gaudette, Principal Investigator), one-year sabbatical leave to Simmons’ Laboratory, Brown University, Providence, RI. vi 6. 2009, NUWC Division Newport FY09 Virtual In-House Laboratory Independent Research (V-ILIR) Award, (J. E. Gaudette, Principal Investigator), $85K. “Bio-inspired broadband sonar receiver for clutter reduction.” Peer-Reviewed Journal Articles 1. J. E. Gaudette, L. N. Kloepper and J. A. Simmons, “Modeling of bio-inspired broadband sonar for high-resolution angular imaging,” J. Acoust. Soc. Am., (in prep.). 2. L. N. Kloepper, J. E. Gaudette, J. R. Buck, and J. A. Simmons, “Influence of mouth opening and gape angle on the transmitted signals of big brown bats (Eptesicus fuscus),” J. Acoust. Soc. Am., (in prep.). 3. L. N. Kloepper and J. E. Gaudette, “Exploring the dynamics of mammalian vocal-motor processes with emerging advanced technologies,” J. PostDoc. Res., (in review.). 4. J. E. Gaudette, L. N. Kloepper, M. Warnecke and J. A. Simmons, “High res- olution acoustic measurement system and beam pattern reconstruction method for bat echolocation emissions,” J. Acoust. Soc. Am., 135 (1), 513–520 (2014). doi: [10.121/1.4829661] 5. J. DiCecco, J. E. Gaudette and J. A. Simmons, “Multi-component separation and analysis of bat echolocation calls,” J. Acoust. Soc. Am., 133 (1), 538–546 (2013). doi: [10.121/1.4768877] 6. J. A. Simmons and J. E. Gaudette, “Biosonar echo processing by frequency- modulated bats,” Radar Sonar Navig. IET, 6 (6), 556–565 (2012). doi: [10.1049/iet-rsn.2012.0009] Conference Papers and Abstracts Presented 1. J. E. Gaudette† and J. A. Simmons, “Encoding phase information is critical for high resolution spatial imaging in biosonar,” in J. Acoust. Soc. Am., Providence, RI, May 2014 2. J. E. Gaudette† and J. A. Simmons, “Modeling of bio-inspired broadband sonar for high-resolution angular imaging,” in J. Acoust. Soc. Am., San Francisco, CA, December 2013, p. 4052. doi: [10.1121/1.4830787] 3. L. N. Kloepper† , J. A. Simmons, J. E. Gaudette, R. Himmelwright and D. Robitzski, “Timing patterns of strobe groups for echolocating big brown bats † presented vii performing a target detection task,” in J. Acoust. Soc. Am., San Francisco, CA, December 2013, p. 4119. doi: [10.1121/1.4831129] 4. J. E. Gaudette, L. N. Kloepper† and J. A. Simmons, “Object selection by head aim and acoustic gaze in the big brown bat,” in J. Acoust. Soc. Am., 133 (5), Montreal, Quebec, June 2013, p. 3406. doi: [10.1121/1.4805938] 5. J. A. Simmons, J. E. Gaudette and L. N. Kloepper† , “Object selection by head aim and acoustic gaze in the big brown bat,” in Proc. Meetings on Acoustics, Vol. 19, (010036), June 2013. doi: [10.1121/1.4800651] 6. J. E. Gaudette, L. N. Kloepper† and J. A. Simmons, “Large reconfigurable microphone array for transmit beam measurements of echolocating bats,” in J. Acoust. Soc. Am., 131 (4), Hong Kong, China, May 2012, p. 3361. doi: [10.1121/1.4708666] 7. J. E. Gaudette† and J. DiCecco, “Bio-inspired broadband sonar and multi- component time-frequency analysis,” presented at the Maritime Systems and Technology (MAST) Americas Conference, Washington, DC, 14 November 2011. 8. J. E. Gaudette† J. M. Knowles, J. R. Barchi, and J. A. Simmons, “Computa- tional model of a bio-inspired broadband receiver for sonar clutter reduction,” in J. Acoust. Soc. Am., 129 (4), Seattle, WA, 25 May 2011, p. 2507. doi: [10.1121/1.3588282] 9. J. M. Knowles† , J. E. Gaudette, J. R. Barchi and J. A. 
Simmons, “Recon- structing echolocation behavior using time difference of arrival localization and a distributed microphone array as a virtual Telemike,” in J. Acoust. Soc. Am., 129 (4), Seattle, WA, 23-27 May 2011, p. 2574. doi: [10.1121/1.3588496] 10. J. DiCecco† and J. E. Gaudette† , “Analysis of Active Sonar Waveform Design by Echolocating Mammals,” presented at the Nato Undersea Research Center (NURC) Maritime Rapid Environmental Assessment (MREA10) Conference, Lerichi, Italy, 13 October 2010. 11. J. E. Gaudette† and J. A. Simmons, “Modeling of precise onset spike timing for echolocation in the big brown bat, Eptesicus fuscus,” in J. Acoust. Soc. Am., 127 (3), Baltimore, MD, April 2010, p. 1861. doi: [10.1121/1.3384433]. 12. J. R. Barchi† , J. E. Gaudette, J. M. Knowles and J. A. Simmons, “Bioa- coustic and behavioral correlates of spatial memory in echolocating bats,” in J. Acoust. Soc. Am., 127 (3), Baltimore, MD, April 2010, p. 2030. doi: [10.1121/1.3385329]. Invited Lectures 1. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Exploiting biological solutions to simplify acoustic imaging,” Keynote Speaker for Winter viii Meeting of the Acoustical Society of America, Narragansett Chapter, 24 Febru- ary 2014. Middletown, RI. 2. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Compu- tational modeling and system evaluation,” NUWC Newport – Naval Research Laboratory (NRL) Joint Lecture Series, 18 June 2013. Washington, DC. 3. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Compu- tational modeling and system evaluation,” NUWC ILIR Science and Technology ILIR Seminar Series, 2013. Newport, RI. 4. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Exploiting biological solutions to simplify acoustic imaging.” Virtual teleconference presen- tation - ONR N-STAR lecture series, 3 April 2013. NUWC Division Newport, RI; Office of Naval Research, Arlington, VA; NSWC Panama City, FL. 5. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar for clutter reduction.” Presentation at the UMASS Dartmouth – NUWC Newport Joint Technical Seminar Series, Dartmouth, MA, 2 November 2012 6. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Compu- tational modeling and system evaluation,” NUWC ILIR Science and Technology ILIR Seminar Series, 10 February 2012. Newport, RI. 7. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Compu- tational modeling and system evaluation,” Brown University Biomedical Engi- neering Graduate Seminar Lecture, 7 February 2012. Providence, RI. 8. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar receiver for clutter reduction: Computational modeling and system evaluation,” Brown University Biomedical Engineering Graduate Seminar Lecture, 18 April 2011. Providence, RI. 9. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar receiver for clutter reduction: Computational modeling and system evaluation,” NUWC ILIR Science and Technology ILIR Seminar Series, 30 March 2011. Newport, RI. Poster Sessions 1. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar for micro- aperture imaging,” poster presented at the FY2013 In-House Laboratory Inde- pendent Research (ILIR) Annual Program Review, 29 October 2013, Newport, RI. 2. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar,” poster presented at the N-STAR symposium, June 2012, Arlington, VA. ix 3. J. E. Gaudette† and J. A. 
Simmons, “Bio-inspired broadband sonar: Com- putational modeling and system evaluation,” poster presented at the FY2012 In-House Laboratory Independent Research (ILIR) Annual Program Review, October 2012, Newport, RI. 4. J. M. Knowles† , J. A. Simmons, J. M. Barchi, J. E. Gaudette, S. S. Horowitz and A. M. Simmons, “Cochlear processing in biosonar: Modeling sound trans- duction and the cochlear microphonic in echolocating bats,” poster presented at the Society for Neuroscience, 477.D.02, November 2011, Washington, DC. 5. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Com- putational modeling and system evaluation,” poster presented at the FY2011 In-House Laboratory Independent Research (ILIR) Annual Program Review, October 2011, Newport, RI. 6. J. E. Gaudette† and J. A. Simmons, “Sonar clutter reduction using bio-inspired broadband template matching,” poster presented at the FY2009 In-House Lab- oratory Independent Research (ILIR) Annual Program Review, October 2009, Newport, RI. Teaching Experience 1. BN065, Biology of Hearing, guest lecturer, designed and delivered lecture notes with computer examples to approx. 80-100 students, “Fourier transform and spectral analysis related to acoustics and the auditory system,” 1 February 2012. Brown University, Providence, RI. 2. Sheridan Teaching Certificate: Level I Seminar Program, May 2010, Sheridan Center for Teaching and Learning, Brown University, Providence, RI. 3. BN065, Biology of Hearing, guest lecturer, presented two consecutive seminars of approx. 100-120 students, “Computational modeling of the auditory system,” 10 and 12 March 2010. Brown University, Providence, RI. x Preface and Acknowledgments From the commencement of my graduate studies, my intention was to focus on some- thing unique and interesting. I think most people would agree that researching bat sonar is exactly that. So much has been learned through this experience, both pro- fessionally and personally. Ultimately, the most important lesson is that time is truly our most valuable and limited resource and it must be spent wisely. I would first like to thank my wonderful wife, Elena. You have kept me going through the many times of uncertainty and frustration, reviewed my endless supply of presentations and manuscript revisions, and supported me in all of my endeavors. This was certainly a long journey and I could not have done it without your devotion. To my parents, I would like to say that this is all your fault. You encouraged me to learn, and taught me the value of education, but forgot to tell me when to stop. Nevertheless, I will always appreciate everything you have done for me. I can only hope that I am able to instill the same set of values into my children. Among the many other people who deserve acknowledgment for this dissertation are my family, close friends, and many of my teachers and colleagues at Brown, URI, and NUWC. My personal drive stems from all of these relationships and I would be remiss to overlook this fact. There are far too many people to thank individually, but I would regret not mentioning at least a few. My earliest interests in bio-inspired engineering stemmed from working closely with Alberico Menozzi, Henry Leinhos, David Beal, and Pro- mode Bandyopadhyay at NUWC, and it was this initial exposure to biorobotics that has had a lasting impact. It was also by great fortune that I met John DiCecco, as his ideas on non-linear time-frequency analysis are what shaped the early parts of this dissertation. 
From the bat lab, I feel honored to have worked closely with xi Jeff Knowles, an outstanding academic with whom I’ve shared many a philosophi- cal felafel and who also launched my sailing career; Michaela Warnecke, who quickly transformed into the German I ask for answers to everything; Alyssa Wheeler, who made me appreciate the sheer difficulty of lab work; and Laura Kloepper, who taught me how to write good. Among the many people to review various drafts of my dis- sertation, I would also like to thank Andrea Simmons, David Segala, Robin Murray, and Jennifer Wardell for their many helpful comments and suggestions. I would like to thank all of the members of my thesis committee for their com- ments, suggestions, criticisms, and overall guidance of my research. Shaping such broad objectives into substantial research requires the highly interdisciplinary ex- pertise afforded by this group. I sincerely appreciate the considerable commitment toward this effort. I am particularly indebted to Prof. John Buck who, as an exter- nal advisor from UMass Dartmouth, asked the difficult questions that helped me to improve the overall quality of this research. I owe a great deal of thanks to my advisor, Prof. James Simmons, who deserves most of the credit for inspiring the research in this dissertation. Jim’s devotion, creative ideas, cheerfulness, and infinite patience are just a few of the reasons that keep me passionate about this work. Finally, all of my graduate courses and research to date has been funded through internal investments by the Naval Undersea Warfare Center in Newport, Rhode Is- land. I am pleased and extremely grateful to the Chief Technology Office as well as my management and colleagues for committing to employees’ professional and educa- tional goals. Without this continued support, none of this could have been achieved. xii Dedication To my inquisitive children, Lucas and Alexander. xiii Table of Contents Table of Contents xiv List of Figures xvii List of Symbols xx 1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2.1 Time-Frequency Analysis and the Auditory System . . . . . . 5 1.2.2 Dynamic Behavior and Adaptation in Echolocation . . . . . . 6 1.2.3 Toward the Design of a Bio-Inspired Broadband Sonar System 7 1.3 Dissertation Objectives and Overview . . . . . . . . . . . . . . . . . . 9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2 Background 13 2.1 Acoustic Information Sensing and Processing by Mammals . . . . . . 13 2.1.1 The Mammalian Auditory System . . . . . . . . . . . . . . . . 14 2.1.2 Neural Information Processing by the Auditory System . . . . 16 2.1.3 Auditory Cues for Passive Localization in Biological Systems . 17 2.1.4 Specializations for High-Resolution Active Acoustic Imaging . 19 2.2 Acoustic Imaging in Technological Systems . . . . . . . . . . . . . . . 24 2.2.1 Conventional Array Signal Processing . . . . . . . . . . . . . . 24 2.2.2 Beam Patterns and Angular Resolution . . . . . . . . . . . . . 26 2.3 Model-Based Approach to Bio-Inspired Acoustic Imaging . . . . . . . 30 2.3.1 Auditory Modeling Insights and Oversights with Filter Banks 31 2.3.2 Signal Processing Models for High-Resolution Range Estimates 32 2.3.3 Models for Angular Target Localization and Acoustic Imaging 36 2.3.4 Mathematical Models of Echolocation Performance . . . . . . 
37 2.3.5 Hardware Prototypes as Exploratory Models . . . . . . . . . . 38 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3 Multi-Component Separation and Analysis of Bat Echolocation Calls 53 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.3.1 Separation of Harmonic Components . . . . . . . . . . . . . . 58 3.3.1.1 Fractional Fourier Transform . . . . . . . . . . . . . 59 xiv 3.3.1.2 Rough Approximation of Instantaneous Frequency . 60 3.3.1.3 Zero-Phase Component Filtering . . . . . . . . . . . 62 3.3.2 Monocomponent Decomposition . . . . . . . . . . . . . . . . . 63 3.3.2.1 Empirical Mode Decomposition . . . . . . . . . . . . 63 3.3.2.2 Hilbert Spectral Analysis . . . . . . . . . . . . . . . 65 3.3.3 Waveform Synthesis and Ground Truth . . . . . . . . . . . . . 66 3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 3.4.1 Telemike Data Series . . . . . . . . . . . . . . . . . . . . . . . 67 3.4.2 Synthesized Multi-Component FM Analysis . . . . . . . . . . 68 3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.6 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 A Multi-Component Frequency-Modulated Waveforms . . . . . . . . . . 72 B Hilbert Spectral Analysis of Modulated Waveforms . . . . . . . . . . 73 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4 High Resolution Acoustic Measurement System and Beam Pattern Reconstruction Method for Bat Echolocation Emissions 79 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 4.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 4.3.1 Beam Pattern Reconstruction . . . . . . . . . . . . . . . . . . 85 4.3.2 Microphone and System Calibration . . . . . . . . . . . . . . . 88 4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4.4.1 Example Beam Pattern of a Circular Electrostatic Projector . 90 4.4.2 Example Beam Pattern of the Big Brown Bat, Eptesicus fuscus 93 4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 4.6 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 5 Modeling Bio-Inspired Broadband Sonar for High-Resolution Angu- lar Imaging 101 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 5.2 Modeling Broadband Acoustic Information . . . . . . . . . . . . . . . 102 5.2.1 Environmental Acoustics . . . . . . . . . . . . . . . . . . . . . 103 5.2.1.1 The Transformation of Broadband Information in the Physical Environment . . . . . . . . . . . . . . . . . 103 5.2.1.2 Application of Broadband Transmission Loss to the Active Sonar Equation . . . . . . . . . . . . . . . . . 105 5.2.2 Transducer Directivity Patterns . . . . . . . . . . . . . . . . . 109 5.2.2.1 Broadband Spectral Information in Conventional Trans- ducers . . . . . . . . . . . . . . . . . . . . . . . . . . 109 5.2.2.2 Bio-Acoustic Baffle Structures and Implications for Modeling . . . . . . . . . . . . . . . . . . . . . . . . 
111 5.2.3 Reflective Scatterer Structure and Composition . . . . . . . . 114 5.2.4 The Broadband Echo Spectrum in the Range-Azimuth Plane . 116 5.3 Extraction of Broadband Spatial Information from Echoes . . . . . . 117 xv 5.3.1 Quantifying the Angular Resolution Limit . . . . . . . . . . . 117 5.3.2 Broadband Acoustic Focusing with a Single Piston Transducer 120 5.3.3 Broadband Acoustic Focusing with a Bio-Inspired Array . . . 121 5.3.4 Mutual Interference and the Diffraction Patterns of Scatterers 123 5.4 Performance Comparison with Conventional Acoustic Imaging . . . . 125 5.4.1 Processing Broadband Signals with Suboptimal Element Spacing126 5.4.2 Coherent Summation of Broadband Signals . . . . . . . . . . . 129 5.4.3 Limitations to Conventional Beamforming Comparisons . . . . 131 5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 5.6 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 A Applying Biosonar Modeling to Underwater Acoustic Imaging . . . . 135 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 6 Discussion, Applications, Future Directions, and Concluding Re- marks 143 6.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 6.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 6.2.1 Multi-Component Signals and Time-Frequency Analysis . . . 146 6.2.2 Beam Pattern Measurement Instrumentation and Techniques . 147 6.3 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 6.3.1 Time-Frequency Analysis of Bio-Acoustic Signals . . . . . . . 148 6.3.2 Acoustic Measurement and Visualization of the Multi-Dimensional Sound Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 6.3.3 Bio-Inspired Broadband Sonar for Micro-Aperture Imaging . . 150 6.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 A Modeling of Precise Onset Spike Timing for Echolocation 154 A.1 Motivation for a Biophysical Model . . . . . . . . . . . . . . . . . . . 154 A.1.1 Coincidence Detection and Population Coding in the Auditory System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 A.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 A.2.1 Peripheral System . . . . . . . . . . . . . . . . . . . . . . . . . 159 A.2.1.1 Outer and Middle Ear . . . . . . . . . . . . . . . . . 159 A.2.1.2 Cochlea and Basilar Membrane . . . . . . . . . . . . 159 A.2.1.3 Meddis Auditory Peripheral Model . . . . . . . . . . 160 A.2.1.4 Spike Refractory Equations . . . . . . . . . . . . . . 161 A.2.2 Cochlear Nucleus . . . . . . . . . . . . . . . . . . . . . . . . . 162 A.2.2.1 Leaky IaF Model . . . . . . . . . . . . . . . . . . . . 162 A.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 A.3.1 Auditory Stimuli . . . . . . . . . . . . . . . . . . . . . . . . . 164 A.3.1.1 Meddis Auditory Peripheral Model . . . . . . . . . . 165 A.3.1.2 IaF Neurons . . . . . . . . . . . . . . . . . . . . . . . 165 A.3.1.3 Integration with BiSCAT . . . . . . . . . . . . . . . 166 A.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
170 xvi List of Figures 1.1 Close-up photograph of the big brown bat, Eptesicus fuscus and time- frequency diagram (spectrogram) for an example E. fuscus echoloca- tion call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 The measured transmit and receive acoustic directivity, or beam pat- terns, of E fuscus are plotted across the azimuth plane . . . . . . . . 3 2.1 The mammalian auditory system mapped from the cochlea to the cortex 15 2.2 Beam patterns in air from a line array of N = 10 omni-directional elements that are spaced at d = 1.72 cm . . . . . . . . . . . . . . . . 27 2.3 Active underwater sonar data collected from the site of a shipwreck in Narragansett Bay, Rhode Island . . . . . . . . . . . . . . . . . . . . . 29 2.4 The magnitude, phase, and group delay response for a gammatone filter bank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.5 Block diagram of the Spectrogram Correlation and Transformation (SCAT) receiver model . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.1 Four different time-frequency distributions of an FM echolocation call from E. fuscus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.2 Rotation-fraction domain of the E. fuscus signal from the FrFT . . . 61 3.3 Overview of harmonic component separation using a least-squares cu- bic approximation of instantaneous frequency, fi (t) . . . . . . . . . . 63 3.4 Results of the empirical mode decomposition on the separated second harmonic, FM2, from E. fuscus . . . . . . . . . . . . . . . . . . . . . 64 3.5 Hilbert spectral analysis results showing instantaneous amplitude, ai (t), and frequency, fi (t), for each harmonic component of the E. fuscus call 66 3.6 Multi-component analysis performed on call sequences from radioteleme- try recordings of E. fuscus and three Asian bat species . . . . . . . . 67 3.7 Multi-component analysis results from the telemike data series plotted separately for FM1 and FM2 . . . . . . . . . . . . . . . . . . . . . . . 69 3.8 Standard time-frequency representations and multi-component analy- sis results for synthetic signals . . . . . . . . . . . . . . . . . . . . . . 70 4.1 Photograph of fully constructed microphone array and close-up view of a microphone preamplifier circuit board showing the integrated MEMS microphone unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 4.2 Flow chart describing the signal processing steps to reconstruct each beam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 xvii 4.3 Diagram showing microphone sensor positions mapped to spherical co- ordinates with the sound source positioned at the origin . . . . . . . . 87 4.4 Aspect view and contour plot of the reconstructed transmit beam pat- tern of a 2 cm diameter transducer at its resonant frequency of 60 kHz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 4.5 Theoretical beam pattern of a piston transducer with 2 cm diameter in air . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.6 Aspect view and 6 dB contour plot of the reconstructed beam patterns for a single E. fuscus transmit pulse . . . . . . . . . . . . . . . . . . . 94 5.1 The total absorption effect in air and the three individual components that dominate in different frequency regions . . . . . . . . . . . . . . 106 5.2 Absorption vs. frequency at 50% relative humidity plotted for temper- atures between 0◦ C and 40◦ C in steps of 5◦ . . . . . . . . . . . . . . . 
107 5.3 Combined transmission loss components due to both spherical spread- ing and absorption . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 5.4 Relative echo strength vs. distance at different frequencies for an ideal 0 dB point reflector . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 5.5 Theoretical directivity pattern for a piston transducer in air with a fixed circular aperture of 0.94 cm . . . . . . . . . . . . . . . . . . . . 111 5.6 Example beam pattern data measured from an obliquely truncated horn113 5.7 The target strength of individual fish at dorsal aspect versus length . 115 5.8 Relative echo intensity as a function of range, azimuth, and frequency 118 5.9 The region of focus after applying the L1 spectral distance around 4.5 m at 0◦ azimuth (a) and 25◦ off-axis . . . . . . . . . . . . . . . . 120 5.10 A bio-inspired broadband sonar array utilizing only three circular piston- like elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 5.11 The region of focus after applying the L1 spectral distance around 4.5 m at 0◦ azimuth for a single transmitter and a pair of identical receive transducers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 5.12 The time difference of arrival between two receiving transducers when separated by 1.4 cm in air . . . . . . . . . . . . . . . . . . . . . . . . 123 5.13 The region of focus after combining binaural spectrogram correlation and TDOA estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 5.14 The beam patterns of an array with N = 10 omni-directional elements spaced at d = 1.4 cm in air . . . . . . . . . . . . . . . . . . . . . . . 128 5.15 The beam patterns of an array with N = 2 omni-directional elements spaced apart by d = 1.4 cm in air . . . . . . . . . . . . . . . . . . . . 128 5.16 Summed beam patterns for a simple array of N = 2 elements spaced apart by d = 1.4 cm in air . . . . . . . . . . . . . . . . . . . . . . . . 130 5.17 Absorption coefficient in water vs. frequency at various temperatures between -5◦ C and 35◦ C, depth of 0 m, salinity of 35 ppt, and acidity of 8.0 pH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 A.1 Action potentials recorded from a rat when presented with a low fre- quency sinusoidal stimulus . . . . . . . . . . . . . . . . . . . . . . . . 157 A.2 Proposed neural network architecture of the auditory population coding158 xviii A.3 Block diagram of the Meddis IHC model . . . . . . . . . . . . . . . . 160 A.4 Time series and spectrogram of a synthetic linear FM and 2 pairs of echoes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 A.5 Magnitude and phase plot of 4 channels in a gammatone filterbank between 25kHz and 100kHz . . . . . . . . . . . . . . . . . . . . . . . 165 A.6 Example gammatone filterbank output using the signal as shown above and generated at 4 arbitrary frequencies . . . . . . . . . . . . . . . . 166 A.7 Internal states of the Meddis model (k, q, c, & w) in response to a synthesized acoustic stimulus . . . . . . . . . . . . . . . . . . . . . . 167 A.8 Pspike and resulting spike train for 40 LSR auditory nerve fibers . . . 167 A.9 Membrane potential and spikes with 4 integrate-and-fire neurons . . . 168 A.10 Integrate-and-fire neurons (M=4) with random, but overlapping synap- tic input (N=100) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 A.11 Layout of each of three tabbed panels in the BiSCAT GUI . . . . . . 
172 xix List of Symbols This dissertation spans many fields, including acoustics, biology, and engineering. Where noted in the descriptions below, the application of symbols is context spe- cific. Acoust: acoustics and acoustic modeling, Anat: anatomy, ASP: array signal processing, Model: Auditory modeling and linear filter theory, TFA: time-frequency analysis. Abbreviations AC Anat: auditory cortex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 AN Anat: auditory nerve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 ARMA Model: auto-regressive moving-average . . . . . . . . . . . . . . . . . . 86 ATR automatic target recognition . . . . . . . . . . . . . . . . . . . . . . . . . 116 AVCN Anat: anteroventral cochlear nucleus . . . . . . . . . . . . . . . . . . . . 15 BM Anat: basilar membrane . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 CN Anat: cochlear nucleus . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 CRLB Cramer-Rao lower bound . . . . . . . . . . . . . . . . . . . . . . . . . . 37 DCN Anat: dorsal cochlear nucleus . . . . . . . . . . . . . . . . . . . . . . . . 15 DRNL Model: dual-resonance non-linear . . . . . . . . . . . . . . . . . . . . . 31 EMD TFA: empirical mode decomposition . . . . . . . . . . . . . . . . . . . . 64 FFT TFA: fast Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . 88 FM frequency modulated . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 FPGA field programmable gate array . . . . . . . . . . . . . . . . . . . . . . . . 7 FrFT TFA: fractional Fourier transform . . . . . . . . . . . . . . . . . . . . . . . 5 FT TFA: Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 HPBW ASP: half-power beam width . . . . . . . . . . . . . . . . . . . . . . . . 28 xx HRTF Acoust: head-related transfer function . . . . . . . . . . . . . . . . . . . 18 IC Anat: inferior colliculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 IHC Anat: inner hair cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 IID Acoust: interaural intensity difference . . . . . . . . . . . . . . . . . . . 18 IIR Model: infinite impulse response . . . . . . . . . . . . . . . . . . . . . . . 31 IMF TFA: intrinsic mode function . . . . . . . . . . . . . . . . . . . . . . . . 64 ITD Acoust: interaural time difference . . . . . . . . . . . . . . . . . . . . . . 18 JAMF TFA: joint acoustic and modulation frequency . . . . . . . . . . . . . . . 6 LSO Anat: lateral superior olive . . . . . . . . . . . . . . . . . . . . . . . . . . 15 LTI Model: linear time-invariant . . . . . . . . . . . . . . . . . . . . . . . . . . 6 MA Model: moving average . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 MEMS micro electro-mechanical systems . . . . . . . . . . . . . . . . . . . . . 83 MRA Acoust: main response axis . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 MSO Anat: medial superior olive . . . . . . . . . . . . . . . . . . . . . . . . . 15 NLL Anat: nucleus of the lateral lemniscus . . . . . . . . . . . . . . . . . . . 15 NTB Anat: nucleus of the trapezoidal body . . . . . . . . . . . . . . . . . . . 15 OHC Anat: outer hair cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 PVCN Anat: posteroventral cochlear nucleus . . . . . . . . . . . . . . . . . . . 15 RCF Model: rectify, compress, and filter . . . . . . . . . . . . . . . . . . . . . 34 RWT TFA: Radon-Wigner transform . . . . . . . . . . . . . . . . . . . . . . . 
59 SCAT Model: spectrogram correlation and transformation . . . . . . . . . . . 34 SOC Anat: superior olivary complex . . . . . . . . . . . . . . . . . . . . . . . 14 SPL Acoust: sound pressure level . . . . . . . . . . . . . . . . . . . . . . . . . 90 STFT TFA: short-time Fourier transform . . . . . . . . . . . . . . . . . . . . . . 5 TDOA time difference of arrival . . . . . . . . . . . . . . . . . . . . . . . . . . 85 TFR TFA: time-frequency representation . . . . . . . . . . . . . . . . . . . . . 56 VLSI very-large scale integrated . . . . . . . . . . . . . . . . . . . . . . . . . . 38 VRDR Model: variable resolution and detection receiver . . . . . . . . . . . . 36 xxi WVD TFA: Wigner-Ville distribution . . . . . . . . . . . . . . . . . . . . . . . . 5 Variables α Acoust: frequency dependent acoustic absorption coefficient . . . . . . . 88 α TFA: normalized fractional angle of rotation . . . . . . . . . . . . . . . . 59 β angle of truncation for an acoustic horn . . . . . . . . . . . . . . . . . . 112 λ Acoust: wavelength in the medium . . . . . . . . . . . . . . . . . . . . . 18 φ TFA: angle of fractional rotation in radians . . . . . . . . . . . . . . . . 59 φ(f ) Model: phase response of a filter . . . . . . . . . . . . . . . . . . . . . . 33 φ0 TFA: initial phase of a modulated signal . . . . . . . . . . . . . . . . . . 66 φi (t) TFA: instantaneous phase law . . . . . . . . . . . . . . . . . . . . . . . . 62 ρ Acoust: atmospheric pressure . . . . . . . . . . . . . . . . . . . . . . . . 104 ψ ASP: steered angle of an array . . . . . . . . . . . . . . . . . . . . . . . . 25 xˇ(t) TFA: original analytic signal, demodulated . . . . . . . . . . . . . . . . 62 yˇ(t) TFA: isolated analytic component, demodulated . . . . . . . . . . . . . 62 df (θ) ASP: array steering vector, 1 × N . . . . . . . . . . . . . . . . . . . . . . 26 x˜(t) TFA: original analytic signal, unmodulated . . . . . . . . . . . . . . . . 60 y˜(t) TFA: isolated analytic component, unmodulated . . . . . . . . . . . . . 62 ai (t) TFA: instantaneous amplitude . . . . . . . . . . . . . . . . . . . . . . . . 65 D Acoust: depth in water, m . . . . . . . . . . . . . . . . . . . . . . . . . . 136 d ASP: distance between sensors . . . . . . . . . . . . . . . . . . . . . . . . 18 d Acoust: acoustic propagation distance . . . . . . . . . . . . . . . . . . . 88 d0 Acoust: reference distance of a sound source . . . . . . . . . . . . . . . . 88 f frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 fi (t) TFA: instantaneous frequency . . . . . . . . . . . . . . . . . . . . . . . . 60 fs sampling rate of a discrete-time signal . . . . . . . . . . . . . . . . . . . 65 hr Acoust: relative humidity . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 N ASP: number of elements in an array . . . . . . . . . . . . . . . . . . . . 25 pH Acoust: acidity, pH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 xxii S Acoust: salinity, ppt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 T Acoust: temperature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 TL Acoust: transmission loss . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 u TFA: fractional dimension between time and frequency . . . . . . . . . 59 W ASP: aperture shading matrix, diagonal N × N . . . . . . . . . . . . . . 26 x ASP: array data vector, 1 × N . . . . . . . . . . . . . . . . . . . . . . . . 26 Y (f, ψ) ASP: frequency domain array response . . . . . . . . . . . . . . . . . . 
26 xxiii Chapter 1 Introduction The biosonar system of echolocating bats, dolphins, and whales represents the most advanced acoustic imaging solution known to exist. The sophistication of biosonar lies not in its complexity, but in the real-time performance that is achievable by a minimalistic set of hardware; a few acoustic baffles1 and a compact network of neural circuitry. The primary focus of this dissertation is on improving our understanding of how animals perceive images of objects from the packets of acoustic echoes. The motivation behind this research is presented first, followed by the significance in the context of the current state-of-the-art. The last section states the research objectives and provides an overview of the remaining dissertation chapters. 1.1 Motivation Echolocation is a complex active sensory system in which animals forage and nav- igate in their environment primarily using emitted acoustic signals. By producing intense, ultrasonic signals and receiving their returning echoes, echolocating animals can identify, discriminate and track prey, often in highly cluttered environments. Bats and toothed whales (Microchiroptera and Odontoceti) are two distinctly dif- 1 Acoustic baffles refer to any physical boundary layers or structures in close proximity to the sound transmission source or receiving sensors. Acoustic baffles serve to block or guide sound waves propagating in a particular direction. In biosonar, acoustic baffles refer specifically to a bat’s mouth or nose for transmission and its ears for reception. The baffles of underwater marine mammals consist of the melon for sound emission and the mandibles for sound reception. In general, the head may be included when it has a significant impact on propagating sound waves. 1 ferent suborders of mammals that convergently evolved echolocation, and both have been intensely investigated to understand their mechanisms that may translate to man-made sonar and radar systems [1]. The big brown bat, Eptesicus fuscus, is an ideal model organism for investigating echolocation. These bats produce short broadband signals with ultrasonic frequencies between 20 and 100 kHz and with a bandwidth-to-center frequency ratio greater than unity (Fig. 1.1b). The signals are downward FM sweeps with three harmonically related components spanning several octaves. The duration and the repetition rate of the signals depend on the distance of nearby objects, with both decreasing as the bat approaches targets [2]. Based on the intensity of emitted sounds, transmission losses, and strength of acoustic reflections from insect prey, big brown bats can detect prey at distances up to 20 m [3]. A 120 35 FM3 30 100 FM2 Frequency (kHz) 25 80 20 60 FM1 15 40 10 20 5 B 0 0 35 15 0 0.5 1 1.5 2 2.5 3 3.5 dB Time (ms) Figure 1.1. (a) A close-up photograph of the big brown bat, Eptesicus fuscus, is shown to highlight the complex set of acoustic baffles – its ears and mouth. The spatial beam or directivity patterns are determined by the geometry of these baffles, which transform the magnitude and phase of sound waves propagating into the inner ears or out from the larynx. (b) The time-frequency diagram (spectrogram) is shown for an example E. fuscus echolocation call along with the corresponding time series (top) and spectral density (side) of the same call. This bat species emits broadband signals that consist of harmonically related components spanning several octaves. 
The ratio of the bandwidth to center frequency provides an indirect measure of how much a directivity pattern will change naturally over the entire operating frequency range. In the case of E. fuscus, this ratio is greater than unity, but quantities less than 0.2 are common for most man-made active sonar systems. The echolocation signals of big brown bats are produced in the larynx and transmitted through the mouth. The center of the directed energy, or main response axis (MRA), is straight forward at zero degrees across all frequencies. The angular 2 A Hartley & Suthers (1989) B C Aytekin et al. (2004) Aytekin et al. (2004) D Simmons et al. (1983) Figure 1.2. The measured transmit and receive acoustic directivity, or beam patterns, of E fuscus are plotted across the azimuth plane at the specific frequencies of 25 (red), 40 (green), 60 (blue), and 80 kHz (yellow). (a) The transmit beam is emitted through the bat’s mouth. The main response axis (MRA) is straight forward at 0◦ across all frequencies and can be reasonably approximated by a 4.7 mm radius piston transducer [4]. (b and c) The sound reception pattern as measured bilaterally through each ear [5]. Notice that the MRA shifts from off-axis at low frequencies toward on-axis at high frequencies, which is a characteristic of the shape of the ears and can be approximated as an obliquely truncated horn [6]. Due to the limited acoustic aperture, the beam patterns are very broad in angle, even as they become narrower at high frequencies. (d) Despite having very broad beam widths, the angular acuity as measured by a behavioral discrimination task is 1.5◦ in azimuth [7] and 3.0◦ in elevation [8]. This is surprising, because man-made imaging sonar systems generally depend upon narrow transmit and/or receive beams, which require a much larger acoustic aperture (physical or synthetic) for the same frequencies considered here. width of the energy can be reasonably approximated by a 4.7 mm radius piston transducer (Fig. 1.2a) [4]. The returning echoes are received bilaterally through each ear. The receiver MRA shifts from off-axis at low frequencies toward on-axis at high frequencies due to the shape of the ears, which can be approximated as obliquely truncated horns (Fig. 1.2b-c) [5, 6]. A common characteristic among biosonar is that these beams are broad in angle, even at high frequencies. Despite having broad beams, these bats are able to achieve angular acuity of 1.5◦ and 3◦ in azimuth and elevation, respectively (Fig. 1.2d) [7, 8]. The fundamental question is how can bats achieve such fine degrees of acuity 3 with broad beamwidths? A conventional sonar system operating in air over the same frequency range as the big brown bat would require an array length, or aperture, of approximately 1.1 m to achieve 1.5◦ angular resolution in azimuth. Furthermore, element-to-element spacing of 1.7 mm would need to be maintained to avoid ambigu- ous localization [9], which demands approximately 640 array elements in total. This array design becomes completely intractable if the requirement of 3.0◦ is simultane- ously imposed for elevation. Remarkably, the big brown bat requires only two ears spaced 1.4 cm apart (Fig. 1.1a) – a reduction in array aperture of about 80 times and at least two orders of magnitude less sensors. 
Behavioral and neurophysiological evidence show that bats perform spatial imaging by exploiting three pieces of salient information: 1) the absolute time delay between an emitted pulse and incident echoes, 2) the relative time delay of echoes between ears, and 3) the broadband spectral patterns encoded internally by the bat’s complex acoustic baffles and externally by the environment and reflective scatter- ers. Acoustic imaging in azimuth requires fusing this information together, whereas imaging in elevation is achieved with only the spectral information available to each ear. More specifically, it is known that the spatial imaging process relies upon precise neural timing of echoes arriving at each ear [10, 11] and neural decoding of the fre- quency dependent spectral patterns introduced by the unique structure of the bats’ ears [8, 12]. Biosonar research, indeed neuroscience in general, has advanced prodigiously in a relatively short period of time. Nevertheless, this field is still in its infancy compared to the direction it is heading. Numerous mysteries remain about the underlying mech- anisms for animal echolocation and also how the biological solution can be exploited for improving man-made technologies. Ultimately, the persistence of researchers in this field will be rewarded by a higher level of understanding of acoustic information processing in the mammalian brain. Although mimicking biosonar may not be an optimal solution for all aspects of engineered acoustic sensing and imaging, there are 4 a multitude of important applications where biosonar has the potential to change the way future generations of acoustic imaging systems are conceptualized and designed. 1.2 Significance 1.2.1 Time-Frequency Analysis and the Auditory System Time-frequency analysis, at the most basic level, is the extraction or interpretation of information from a signal that varies in time. It has traditionally been understood as a decomposition of individual sine waves of different frequencies and amplitudes, i.e., the Fourier transform and its time-varying counterpart, the short-time Fourier trans- form (STFT). Considerable effort has been spent on understanding the relationship between time and frequency, or perhaps time and other domains (e.g. scale). Today, we have alternative developments such as the quadratic representations (Wigner- Ville distribution (WVD), Altes Distribution, etc.) [13, 14], the scalogram, fractional Fourier transform (FrFT) [15], reassignment method [16], wavelets and synchrosqueez- ing [17]. Most of this work has been toward the creation of tools for humans and machines to better understand, analyze, and visualize complex time-based signals, especially for propagating waves in acoustics, electromagnetics (including radar and light), seismic waves, etc. that are abundant in the real physical world. In the field of bio-acoustics, time-frequency analysis is an essential tool for researchers to understand and interpret the sounds emitted by animals; however, in- tercepting and recording the sounds of live animals is only part of the problem. We currently have a great number of mathematical and computational models of the auditory system. These include models of the cochlea at the molecular level, mechan- ical micro-models of the elastic basilar membrane, random stochastic models of the auditory-to-neural transduction, and linear time-invariant (LTI) filter bank models. 
There also exist a great number of models that seek to interpret sound mathematically 5 using alternative transforms (e.g. spectro-temporal modulations [18], joint acoustic and modulation frequency (JAMF) [19]) or higher-order statistics [20]. Despite all of these models, the basic relationship that links pitch, timbre, and loudness to time- frequency analysis eludes us, because these characteristics are psychologically and physiologically induced effects, not physical manifestations of sound. Even so, these effects are unambiguously understood and agreed upon by all humans when we listen to the difference between a note played on the piano and that same note played on the guitar. The relationship between time and frequency within the auditory system is at the core of understanding the intricate nuances of music, speech, communication, and biosonar. 1.2.2 Dynamic Behavior and Adaptation in Echolocation Echolocating animals exhibit a great deal of adaptability with their sonar systems. This dynamic control is seen in time-frequency pulse design [21], as well as the spa- tial directivity of the emitted signals [22]. Even at the reception of acoustic echoes, echolocating animals can rapidly change their receiver directivity patterns by me- chanical adjustments to the acoustic baffles [23]. The ultimate example of biosonar adaptation lies within the neural computations of the brain. Short-term plasticity in the auditory system is responsible for adapting to environmental uncertainties and maintaining highly precise internal spatial representations [24, 25]. Neural adaptation is the reason echolocation has been so successful across the many different species of echolocating bats, dolphins, and whales. Without this adaptation, animals would be ill-equipped to handle any new challenges found in the natural world. We are now at the forefront of exploring dynamic behavior in echolocation and have only recently begun to realize the extent to which it is used [26, 27, 28, 29]. Understanding the nature of this dynamic behavior in echolocation requires new and creative approaches to experimental design. For example, past approaches at measuring beam patterns have been hampered by assuming that transmit and 6 receive beams remain constant from pulse-to-pulse. This choice was partly a conse- quence of limitations to measurement technology, but also because these assumptions are highly convenient. Advances in sensing and computing are enabling the creation of new tools and methods for studying behavioral dynamics that were never before possible. In particular, field programmable gate arrays (FPGA) are being used to rapidly build customized digital hardware with increasing complexity. One impor- tant application for FPGAs is acoustic measurement systems that demand a large number of data acquisition channels. Data collection must be performed in paral- lel to maintain synchronous sampling, and without these new technologies, options are prohibitively complex or expensive to implement. The data volume requirements that go along with this new capability are also expanding, which implies the use of high-throughput high-density transceivers and storage devices. One difficulty is that as sensing and measurement become easier, data dimensionality increases and new visualization techniques are needed. Fortunately, computing power and data process- ing have paced sensing developments. 
Amongst the vast amount of bio-diversity in echolocating mammals, there remain countless discoveries to make of dynamic be- havior and physiological adaptations. As researchers, we must acknowledge that our assumptions may be questionable and find new, intelligent ways of correcting and validating our hypotheses. 1.2.3 Toward the Design of a Bio-Inspired Broadband Sonar System The implications of developing a bio-inspired broadband sonar system are profound and far-reaching. Biosonar is not a merely theoretical development, it is a proven high-resolution acoustic imaging system that is functional and robust. The excep- tional performance and adaptability by animal echolocators in the midst of dense clutter is what draws engineers and scientists to marvel at its simplicity. Section 2.2 describes how conventional beamforming is done and shows a clear example that this acoustic imaging approach is in wide use today. Advanced sonar systems are consid- 7 ered advanced because they employ some way to improve acoustic imaging perfor- mance beyond the fundamental limitations imposed by the wavelength-to-aperture ratio, λ/L. Performance gain always comes with tradeoffs, which could be extra pro- cessing or making bold assumptions that limit widespread application. Resolution improvements of 2 to 5 times are immediately championed as a success, but biosonar has shown that it is possible to achieve the same angular resolution with orders of magnitude less hardware complexity. Besides achieving higher resolution with fewer sensors, biosonar is superior in numerous aspects over conventional sonar systems. The versatility and adaptability already mentioned are traits that man-made systems severely lack. Echolocating bats use strobe groups to avoid pulse-echo ambiguity and increase pulse-repetition rates when more information is needed. Dynamic usage of echolocation beams is not new, but the way in which bats, dolphins, and whales direct their beams off-axis is. Animals are clearly capable of sonar self-calibration as a superior form of matched- field processing. Any sonar system that can mimic biosonar in these respects would be capable of functioning in a broader range of environments and situations, such as dense foliage in air, or cluttered harbors in shallow water. Biomimetic sonar systems will ultimately bring advanced sensing and imaging capabilities to smaller autonomous systems and wearable augmented sensing devices for humans. In the very near future, a slew of new processing methods will be developed while attempting to replicate the neural information processing of the auditory sys- tem. Alongside these developments come the general advancement of neuroscience on the auditory system. The ability to truly understand and replicate the neural dynamics and architectures at various stages of the auditory system will bring new brain-machine interfaces for the hearing impaired. Advances in technology for speech recognition and synthesis are already showing promise for many commercial applica- tions, such as automated call routing, portable phone and GPS devices, and instant language translation. With the advent of such technological advancements, humans 8 are not far from the creation of fully autonomous systems and machines that hear, interpret, and produce sound in exactly the same manner as animals. 
1.3 Dissertation Objectives and Overview The research objectives of this dissertation are to 1) improve our understanding of acoustic imaging in biosonar from an engineering perspective, and 2) apply this in- sight toward the development of a compact bio-inspired broadband sonar system. Chapter 2 presents the background information necessary for the rest of the disserta- tion. Chapters 3 and 4 are comprised of recently published methods for bio-acoustic analysis. In particular, Chapter 3 addresses the need for a set of new time-frequency analysis methods needed to study multi-harmonic waveforms, such as bat echoloca- tion signals. This new approach enables bioacousticians to perform multi-component signal analysis with improved resolution and accuracy. The robust method enables automatic extraction of useful information from a large ensemble of transmitted sig- nals. Chapter 4 describes the design and construction of an apparatus for capturing the beam patterns of bats’ consecutive transmit pulses with high fidelity. Also de- scribed is a method for processing the acoustic signals to reconstruct the beam pat- terns for visualization and further analysis. Such a system is unprecedented and will elucidate the dynamics of bats’ beam patterns during controlled echolocation experi- ments. Chapter 5 outlines a numerical model of the physical acoustics to understand the rich set of information available in broadband bio-acoustic echoes. The modeling approach is unique, because it is the first study of its kind to look in detail at how broadband signals are transformed in the frequency domain from sound emission to reception of echoes. This chapter also shows a simple method for first quantifying the achievable resolution and then analyzing the sensitivity of resolution to changing environmental parameters. A significant development here is to demonstrate that high-resolution can be achieved using only a few transducers without any complex 9 acoustic baffles. Chapter 6 presents a discussion on applications, future directions, and concluding remarks. Finally, Appendix A describes a biophysical model of the bat’s auditory peripheral system and demonstrates a simple example of event-based neuronal coincidence detection. References [1] W. Au and J. Simmons, “Echolocation in dolphins and bats”, Phys. Today 60, 40–45 (2007). [2] A. Surlykke and C. F. Moss, “Echolocation behavior of big brown bats, Eptesicus fuscus, in the field and the laboratory”, J. Acoust. Soc. Am. 108, 2419–2429 (2000). [3] A. Surlykke, P. E. Nachtigall, R. R. Fay, and A. N. Popper, eds., Biosonar, volume 51 of Springer Handbook of Auditory Research (Springer, New York) (2014). [4] D. Hartley and R. Suthers, “The sound emission pattern of the echolocating bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 85, 1348–1351 (1989). [5] M. Aytekin, E. Grassi, M. Sahota, and C. Moss, “The bat head-related transfer function reveals binaural cues for sound localization in azimuth and elevation”, J. Acoust. Soc. Am. 116, 3594–3605 (2004). [6] N. H. Fletcher and S. Thwaites, “Obliquely truncated simple horns: Idealized models for vertebrate pinnae”, Acustica 65, 194–204 (1988). [7] J. A. Simmons, S. A. Kick, B. D. Lawrence, C. Hale, C. Bard, and B. Escudie, “Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 153, 321–330 (1983). [8] J. Wotton, T. Haresign, M. Ferragamo, and J. 
Simmons, “Sound source elevation and external ear cues influence the discrimination of spectral notches by the big brown bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 100, 1764–1776 (1996). [9] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Tech- niques (Prentice Hall PTR, Upper Saddle River, NJ) (1993). [10] C. Moss and J. Simmons, “Acoustic image representation of a point target in the bat Eptesicus fuscus: Evidence for sensitivity to echo phase in bat sonar”, J. Acoust. Soc. Am. 93, 1553–1562 (1993). [11] J. A. Simmons and J. E. Gaudette, “Biosonar echo processing by frequency- modulated bats”, IET Radar Sonar Navig. 6, 556–565 (2012). 10 [12] J. Wotton and J. Simmons, “Spectral cues and perception of the vertical position of targets by the big brown bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 107, 1034–1041 (2000). [13] A. Papandreou, F. Hlawatsch, and G. Boudreaux-Bartels, “The hyperbolic class of quadratic time-frequency representations. I. Constant-Q warping, the hyper- bolic paradigm, properties, and members”, IEEE Trans. Signal Process. 41, 3425–3444 (1993). [14] A. Papandreou-Suppappola and L. T. Antonelli, “Use of quadratic time- frequency representations to analyze cetacean mammal sounds”, Technical Re- port 11,284, Naval Undersea Warfare Center, Newport, RI (2001). [15] H. M. Ozaktas, M. A. Kutay, and D. Mendlovic, “Introduction to the fractional Fourier transform and its applications”, Adv. Imag. Elect. Phys. 106, 239–291 (1999). [16] F. Auger and P. Flandrin, “Improving the readability of time-frequency and time- scale representations by the reassignment method”, IEEE Trans. Signal Process. 43, 1068–1089 (1995). [17] F. Auger, P. Flandrin, L. Qiang, S. McLaughlin, S. Meignen, T. Oberlin, and H.-T. Wu, “Time-frequency reassignment and synchrosqueezing: An overview”, IEEE Signal Process. Mag. 30, 32–41 (2013). [18] T.-S. Chi and C.-C. Hsu, “Multiband analysis and synthesis of spectro-temporal modulations of Fourier spectrogram”, J. Acoust. Soc. Am. 129, EL190–EL196 (2011). [19] L. Atlas and S. A. Shamma, “Joint acoustic and modulation frequency”, EURASIP Journal on Applied Signal Processing 2003, 668–675 (2003). [20] S. Bourennane and A. Bendjama, “Locating wide band acoustic sources using higher order statistics”, Applied Acoustics 63, 235–251 (2002). [21] S. Hiryu, M. E. Bates, J. A. Simmons, and H. Riquimaroux, “FM echolocating bats shift frequencies to avoid broadcast-echo ambiguity in clutter”, Proc. Natl. Acad. Sci. U.S.A. 107, 7048–7053 (2010). [22] N. Matsuta, S. Hiryu, E. Fujioka, Y. Yamada, H. Riquimaroux, and Y. Watan- abe, “Adaptive beam-width control of echolocation sounds by CF-FM bats, Rhi- nolophus ferrumequinum nippon, during prey-capture flight”, J. Exp. Biol. 216, 1210–1218 (2013). [23] L. Gao, S. Balakrishnan, W. He, Z. Yan, and R. M¨ uller, “Ear deformations give bats a physical mechanism for fast adaptation of ultrasonic beam patterns”, Phys. Rev. Lett. 107, 214301 (2011). [24] B. J. Fischer, L. J. Steinberg, B. Fontaine, R. Brette, and J. L. Pe˜ na, “Effect of instantaneous frequency glides on interaural time difference processing by au- ditory coincidence detectors”, Proc. Natl. Acad. Sci. U.S.A. 108, 18138–18143 (2011). 11 [25] R. Rao and T. Sejnowski, “Spike-timing-dependent Hebbian plasticity as tem- poral difference learning”, Neural Comput. 13, 2221–2237 (2001). [26] L. N. Kloepper, P. E. Nachtigall, M. J. Donahue, and M. Breese, “Active echolo- cation beam focusing in the false killer whale, Pseudorca crassidens”, J. Exp. 
Biol. 215, 1306–1312 (2012). [27] P. H. S. Jen, “Adaptive mechanisms underlying the bat biosonar behavior”, Front. Biol. 5, 128–155 (2010). [28] M. Aytekin, B. Mao, and C. F. Moss, “Spatial perception and adaptive sonar behavior”, J. Acoust. Soc. Am. 128, 3788–3798 (2010). [29] P. W. Moore, L. A. Dankiewicz, and D. S. Houser, “Beamwidth control and angu- lar target detection in an echolocating bottlenose dolphin (Tursiops truncatus)”, J. Acoust. Soc. Am. 124, 3324–3332 (2008). 12 Chapter 2 Background This chapter introduces background material relevant to the research found in the next several chapters of the dissertation. The first section reviews the general mam- malian auditory system, auditory cues for passive sound source localization, and specializations that enable bats to perform high-resolution active acoustic imaging. Following Section 2.1, Section 2.2 describes the current technological means of acous- tic imaging and contrasts conventional array signal processing with the biosonar solu- tion. The last background topic in Section 2.3 introduces the model-based approach to understanding and replicating biosonar and discusses recent progress in this area. 2.1 Acoustic Information Sensing and Processing by Mammals Acoustic waves are produced and sensed by nearly all motile animals. Sound provides a fundamental means of communication, detection and classification of predator and prey, localization of sound sources, and orientation relative to the immediate envi- ronment. Most animals rely upon sound for survival, but a select few have developed a refined sense of hearing. Nocturnal birds such as the barn owl excel at passive localization for capturing prey at night [1]. A specialized group of mammals (e.g. microchiropteran bats and odontocetes) have evolved to use acoustic waves as their primary active sense in the absence of visual information in the electromagnetic spec- trum [2]. These echolocating mammals have developed an extreme acuity and agility 13 with which their external world is precisely reconstructed from the stream of echoes received; however, the exact physical and neuronal mechanisms responsible for this precision are not well understood nor are they matched by any existing technological system. The following sections provide a brief overview of the mammalian auditory system, acoustic neural information processing, sound source localization by mam- mals, and specializations required for echolocation. 2.1.1 The Mammalian Auditory System The mammalian auditory system utilizes a complex set of sensory organs at its periph- ery – the external ear, ossicular chain, and cochlea – that are tightly integrated with neural circuitry in the cochlear nucleus (CN) by way of the auditory nerve (AN) fibers as illustrated in Figure 2.1 [3]. Originating in the CN there are multiple ascending, as well as descending pathways throughout the auditory system [4, 5]. While many of these pathways are monaural, there are several neural stages where specific nuclei in the midbrain receive bilateral input and integrate the information between the ipsilateral and contralateral auditory circuitry (e.g. superior olivary complex (SOC) and inferior colliculus (IC)). The entire auditory system from the cochleae up through the auditory cortex (AC) has a tonotopic organization where neural nuclei at specific regions of the brain appear to be spatially organized by frequency selectivity. 
The location where acoustic-to-neural transduction occurs is within the inner hair cells (IHC) of the cochlea [9, 10]. The primary information required to localize sound sources lies in the onset response of IHCs tuned to different frequencies [11]. AN fibers mark the onset of sound with a time-delay (i.e. first spike latency) that is related non-linearly to the acceleration of the acoustic pressure waves [12, 13, 14, 15, 16]. Subsequent neural spikes encode other features of the sound, such as duration, intensity, and relation to other frequency channels. Beginning with the AN fibers, all acoustic information is carried by neural spikes throughout the complex of neural pathways mirrored on either side of the brain. Neural spikes are essentially point 14 AC AC MGB MGB IC IC NLL NLL DNLL DNLL INLL INLL VNLLc VNLLc VNLLm VNLLm SOC SOC LSO ILD MSO ITD MSO LSO NTB NTB LNTB MNTB MNTB LNTB CN CN DCN AGC DCN PVCN Spectral PVCN AVCN AVCN c Timing AN b OHC d IHC a Cochlea Midline Cochlea Figure 2.1. The mammalian auditory system mapped from the cochlea to the cortex. Monaural and binaural projections from one cochlea are shown. Auditory input from the right cochlea has been omitted for clarity, but all pathways are mirrored across the brain’s midline. Excitatory and inhibitory synaptic connections are marked by triangles and bars, respectively. (a) Acoustic-to- neural transduction begins with the inner hair cells (IHC) of the cochlea. (b) The auditory nerve (AN) fibers respond to the neurotransmitter chemicals released by the IHC in response to sound pressure waves. (c) The cochlear nucleus (CN) receives all ipsilateral AN inputs in three subregions: dorsal, anteroventral, and posteroventral cochlear nucleus (DCN, AVCN, PVCN). (d) The DCN projects efferent connections to the outer hair cells (OHC) in the cochlea, which are thought to provide a mechanism for automatic gain control by amplifying the mechanical vibrations in the cochlea’s basilar membrane (BM). Numerous specializations have been identified in echolocating bats, including a significantly hypertrophied IC and peculiarly organized VNLLc [6, 7, 8] 15 processes, where the probability of a neuron firing a spike is proportional to the group activity level of attached synapses in the network. Acoustic events are encoded by the stochastic response of neural populations tuned to different amplitude ranges. To date, the relationship between morphological connectivity and physiological functions of the mammalian auditory system is not completely understood [17]. 2.1.2 Neural Information Processing by the Auditory System At the peripheral stage of the mammalian cochlea, acoustic information arrives rapidly compared to the time scale of a single neural spike [12]. To encode this information, AN fibers that innervate the cochlea must remain highly sensitive to acoustic stimuli, but this also increases spontaneous spiking (i.e. noise)[13]. To compensate for this, AN fibers are overrepresented at each narrow frequency band along the cochlea’s basilar membrane (BM). The frequency selective regions along the BM contain many redundant IHCs, and every IHC has many redundant AN fibers synapsed to it. As the BM is deflected in response to an acoustic wave, IHCs release bursts of neuro- transmitter, and AN fibers take up this neurotransmitter to respond with a spike sent into the CN [18]. The simultaneous coincidence of neural spikes from many redun- dant AN fibers is the reason the auditory system is able to encode precisely timed acoustic information. 
Coincidence detection is therefore a critical responsibility of the CN and it is performed through the population response of a large number of AN fibers – essentially averaging out the noise of spontaneous responses [19]. The CN is the gateway of acoustic information into the brain, because this is where all AN fibers innervate. If precision of spike timing is important anywhere in the brain, it is here in the CN, because once this precisely timed acoustic information is lost it cannot be recovered through any amount of data processing [20]. The CN contains an assortment of cell types, many of which are not fully understood [21, 22, 23]. Above the CN, a large portion of the neural complex in the auditory brainstem is used in the feedback necessary for motor control and does not contribute directly 16 to sound source localization; for example, reflexes controlling head aim or automatic gain control of the OHC in the cochlea and muscles [24]. There are a class of general models of neural information processing that are based on registering the timing of spikes across different neurons (i.e. coincidence detection cells) [25]. These models are usually put forth as generalized networks of cortical information processing using the timing of individual spikes across cells rather than conventional spike-rate codes [26]. The relevance of these models to auditory processing in the brainstem is that specific spike timing models have been proposed for the perception of sound pitch [15, 27, 14, 12, 28], for sound localization using interaural timing cues [29, 1], and for determination of target range of echo delay in bats [30, 31, 32] Many attempts have been made at understanding and quantifying the informa- tion content in neural spikes, particularly with respect to precise timing [33, 34, 35, 36]. Neural spikes must carry all information about peripheral stimuli throughout the brain and the brain must be able to interpret this information without any supplemen- tary guidance [37]. Synfire chains, for example, are models where spike timing plays a crucial role in self-constructing complex binding networks and compositionality [25]. Polychronization has also surfaced as a neural information processing mechanism that relies upon understanding the neuronal dynamics [38, 39]. Effectively, all spike timing models can be reduced to having coincidence detecting neurons at a higher level look- ing downward to detect the simultaneity of spikes along multiple inputs. For sound localization, even at the level of the AC, “spatial acoustic information is represented by relative timings of pyramidal cell output” [40]. 2.1.3 Auditory Cues for Passive Localization in Biological Systems Traditionally, the mammalian auditory system has been understood as having two primary methods for localizing sound sources: Interaural time difference (ITD) and interaural intensity difference (IID). Recent work has shed light on a third critical 17 piece of information, which is the angular dependent spectra of broadband sounds, also known as the head-related transfer function (HRTF) [17]. ITD is the relative time delay for a propagating sound wave to reach both ears. This delay is used by mammals to localize a sound’s point of origin. In perceptual tasks, human listeners are typically presented with sounds from an array of loud- speakers or a stereo headset and are asked to localize the source [41, 42, 43]. 
Based on early psycho-acoustic results from tonal stimuli, ITD was historically only consid- ered useful for frequencies with a wavelength greater than the distance between ears. The reason ITD works in these experiments is that the neural response to continuous tones can phase lock on each period of the wave and encode location based on the relatively small time difference between ears [29, 19]. Since the refractory period for neural spikes exceeds the time period for frequencies above approximately 1 kHz, ITD is generally considered useful for low-frequency sound source localization in the horizontal plane, or azimuth [44]. These ITD experimental results are not valid for sounds that occur naturally, especially for echolocation signals. The primary reason is that acoustic signals in nature are not continuous pure tones; but are instead short transient waveforms. For example, the broadband clicks produced by echolocating dolphins and short frequency modulated pulses by bats consist of frequencies well above the phase locking threshold, yet ITD is a crucial auditory cue for these animals. Such short transient signals contain very few cycles within a particular frequency band and there are not enough wave periods to phase-lock. Instead of phase locking, the auditory system encodes the onset response to these transient events with extremely high timing precision – approximately 100 µs [3] in a general mammalian model, which is 10 times less than the width of a single neural spike [15, 45, 46]. These acoustic signals arrive relatively sparsely in time, leaving sufficient margin for auditory neurons to recover from their refractory period before the next sound event. IID is the acoustic intensity difference between each ear and has been attributed 18 as a major auditory cue for high-frequency sound localization in azimuth. For humans, the head acts as an acoustic baffle, masking contralateral sound sources such that the two ears receive different amplitude levels. In other mammals commonly studied (e.g. cats and guinea pigs), the ears are positioned more dorsal and rostral than primates, so the head does not play as large of a role. Nevertheless, the structure of the external ear, or pinna, in many of these mammals can be reasonably approximated as obliquely truncated horns [47, 48]. These horns provide spatial directivity, which means that the amplitude of a sound wave changes depending upon the angle of incidence. Therefore, IID is manifested in these animals by the shape and orientation of the external ears that form acoustic receiving baffles. One notable problem with the basic concept of IID is that it does not encode sufficient information to localize sound sources in elevation. Most acoustic signals in nature are inherently broadband or at least contain some degree of harmonic structure and span multiple frequencies. When a signal arrives at the ears, each acoustic baffle modifies the sound by encoding unique spectral characteristics for any given angle. Therefore, to localize sounds in elevation, the full spectrum of a received sound is compared with the a priori spatial intensity patterns of the ears, which is the HRTF [49]. The HRTF is a complicated function of frequency and angle, but this complexity is necessary to encode a unique spectrum for any particular direction, either monaurally or binaurally. One important piece that is missing from the truncated horn model is the tragus, which encodes notches specifically used for vertical localization [50, 51]. 
The full spectral characteristics of the HRTF are not only useful for localization in elevation, but also azimuth and range. 2.1.4 Specializations for High-Resolution Active Acoustic Imaging The passive localization cues as described above are commonly exploited by many species [52, 28]. The active perception systems of echolocating bats, dolphins, and whales have improved upon passive hearing mechanisms by broadcasting high fre- 19 quency acoustic sounds into the environment, whose echoes can then be accurately localized. In this sense, acoustic echoes are just sound sources originating from many different reflecting objects. Thus, echolocation enables precise control over the acous- tic localization process and results in high-resolution spatial images from the contin- uous flow of information [2]. From the same basic mammalian auditory system, echolocators have evolved to fit the specific needs prescribed by individual echolocation strategies [53]. The types of specializations extend from the physical acoustic baffles of sound reception and transmission [47, 54], to the specific waveforms used for echolocation [55], and even throughout the brain at the various neural complexes [6]. These biological specializations can be thought of as an iterative process of design optimization. The biosonar optimization criteria are not just maximizing performance (e.g. acoustic field-of-view, spatial resolution, signal-to-noise ratio); an equally important criterion for animals is minimizing the energy required to achieve “good-enough” performance. As a result, evolution has produced significant biodiversity in echolocating mammals while still maintaining the minimalist approach to acoustic design. The sound production mechanisms are one of the most important developments for echolocation. Marine mammals such as dolphins and toothed whales produce sound through a highly unique structure in the melon of their head [56, 57]. The intense sounds are produced pneumatically by forcing air through a set of phonic lips, recapturing the air held in sacs, and repeating the process. The broadband echolocation signals are best described as short transient “clicks” that are typically on the order of 10 to 100 µs in duration. The sound pressure waves are guided by bone and tissue through lipids, or acoustic fats, in the melon where it is then prop- agated outward into the water [58, 57, 59]. Bats have evolved their echolocation strategies to fit a particular foraging environment [47]. The result is an extremely diverse set of acoustic baffle structures and echolocation waveforms. For example, to augment their vision Egyptian fruit bats (Rousettus aegyptiacus) echolocate using 20 broadband transient “clicks” of their tongue [60]. Other bats (mostly from the subor- der Microchiroptera) emit a variety of frequency modulated signals using the larynx through either the oral or nasal cavities. The noseleaf structures of nasally emit- ting bats are notoriously complex and prominent [61, 62]. The types of echolocation waveforms may be classified as frequency modulated (FM), constant frequency (CF), or both (CF-FM) [63]. CF waveforms are useful for bats detecting Doppler shifts from moving prey in an open environment [6]. FM waveforms provide excellent range resolution and are better suited for operating in densely cluttered environments, but are Doppler invariant [64]. 
The echolocation signals produced by both bats and dol- phins are usually stereotypical such that a particular species can be identified by the characteristics of its time-frequency signature. The reception of acoustic waves by echolocating mammals is hyper-sensitive [65]. Although the sounds emitted for echolocation are generally high intensity, the re- flected signals that return to the ears are many orders of magnitude lower. The dissipation and absorption of acoustic energy enforces an upper limit on the useful range of animal echolocation. To compensate, echolocators have evolved auditory systems with high sensitivity and large dynamic range. Many of these specializations exist within the brain, such as an overrepresentation of AN fibers in the cochlea, hypertrophied auditory nuclei (e.g. IC, CN, and LL) [6], and extreme timing preci- sion at the early neural processing stages [7]. Other specializations appear obvious, such as acoustic baffles and directivity patterns that are well matched to the emitted sounds [47]. Perhaps not-so-obvious is the mechanism by which underwater marine mammals receive acoustic echoes. Although the topic was historically controver- sial [66, 67, 68, 69, 70, 71], dolphins and toothed whales receive sounds bilaterally at the mandible. The hollow bone structures form an acoustic waveguide for sound pressure waves to travel within acoustic fats and to each inner ear [58, 57]. There are certainly many other neurological and anatomical specializations for echolocation that have yet to be discovered. 21 The role of vision in echolocating animals depends upon the species. Some mammals (i.e. Megachiroptera and Delphinids) rely a great deal on vision for guid- ance, foraging, and other routine behaviors. However, animals that must function in the complete absence of light use their auditory system as the primary sensory modality. In these animals vision can still aid the senses to some degree, but the en- vironment is actively probed and perceived through sound. A fundamental question is, what do these animals “see” in terms of acoustic images and how does it differ from vision? Spatial resolution provides a direct measure of the three-dimensional image quality perceived by echolocating animals. In this context, resolution is the minimum spacing between two distinct acoustic echoes that can be unambiguously differenti- ated [72]. Spatial resolution is typically characterized by three separate, but related quantities: Angle, range, and range-rate (i.e. Doppler) [73]. Angular resolution can be further separated by azimuth and elevation. Echolocating mammals such as the big brown bat (Eptesicus fuscus) and the bottlenose dolphin (Tursipos truncatus) are well-known for their high-resolution sonar systems, especially in range [2]. Although high-resolution is a subjective term, in the context of biosonar it refers to the abil- ity of an echolocating bat, dolphin, or whale to perceive spatial images with greater detail than a man-made sonar given the same set of signals and acoustic apparatus. One aspect of echolocation that has been studied extensively is the extreme range-resolution for bats [30, 74, 75, 76, 77, 78, 79, 80] and cetaceans [59, 81, 82]. When two or more acoustic waves overlap in time, they constructively and destruc- tively interfere to produce spectral interference patterns. The big brown bat (E. fuscus) exploits these patterns of interference to deconvolve the echoes and produce a “hyper-resolution” image in range. 
These broadband spectral patterns have been shown to persist throughout the auditory system in this species [83, 84, 85] and appear to contribute reliable information to the bat’s acoustic imaging process. Angular localization, in general, has been studied behaviorally [86, 87, 88, 89, 22 90], analytically [91], and computationally [58, 92, 93, 94]; however, angular perfor- mance in the presence of multiple closely-spaced targets (i.e. angular resolution, as defined above) has not been a primary focus. Nevertheless, a few experiments do exist where angular resolution was directly or indirectly measured in E. fuscus [95, 96] and T. truncatus [86]. Behavioral evidence has shown that E. fuscus primarily utilizes the spectral notches encoded by its HRTF to encode elevation information [51, 87, 88, 89]. In addition, recent work has shown that off-axis echoes of echolocation signals can be completely rejected even when overlapping in time [97, 98]; an echolocation version of the cocktail-party problem. Decades of behavioral studies have been performed on bats, dolphins, and whales to provide additional clues about the resolution limits of echolocation. Unlike bats, however, echolocation research in marine mammals is restricted to behavioral tasks and infrequent necropsies from strandings. Furthermore, the costs associated with marine mammal research are much greater, because of substantial investment in acoustic facilities, the larger physical size of the animals and all their supporting equipment and food, and the difficulties with testing in an aquatic environment. For these reasons, significantly more is known about echolocation in bats; specifically the neurophysiological and morphology of the auditory system. Regardless of the type of echolocation waveforms used by bats (i.e. CF or FM), a common signal characteristic is the presence of multiple harmonics. Multi- harmonic waveforms have the advantage of increasing the natural bandwidth of a signal to one or more octaves, significantly improving performance in range [72]. The relative phase coherence between harmonics in an echo is also important for angular imaging [99]. Furthermore, given that broadband spectral information are the only known mechanism by which bats can localize echoes in elevation, it seems unlikely that they would successfully evolve by emitting a narrowband CF pulse having only a single component – exactly the type of waveforms that pervade man-made sonar. 23 2.2 Acoustic Imaging in Technological Systems The technological development of acoustic imaging was borne out of necessity. In seawater, the electromagnetic radiation spectrum is significantly attenuated by the density of the medium [100, 101], which means that neither the visible light spec- trum nor radio waves are useful beyond very short distances. This fact is particularly troublesome in naval applications, where information is critical to situational aware- ness for large ships, submarines, and unmanned undersea vehicles. The problem is addressed by using acoustic waves since they propagate quickly over long distances, exhibit strong reflections, and pass relatively uninhibited in the dense medium [102]. The invention of piezoelectric materials enabled the design of acoustic trans- ducers to convert electrical signals into sound pressure waves and vice versa. Early devices were fairly basic and consisted of a single source and receiver that permitted echo ranging in the open ocean [103]. 
With the coupling of multiple piezoelectric sensors came the advent of array signal processing and the ability to produce cross- range images of objects from sound waves [102, 104]. Apart from its undersea origins, acoustic imaging has found uses in a wide range of applications such as biomedical diagnostics, geophysical tomography, and devices for the visually impaired. 2.2.1 Conventional Array Signal Processing Array signal processing is the method used to produce images from an array of dis- crete acoustic elements. The critical piece of information to localize sound sources is the relative time delay of acoustic waves as they propagate across the entire array. With knowledge of the array geometry and the speed of sound propagation, pressure waveforms at each transducer element can be delayed in time and summed to cor- respond with any incident direction (defined as the steered angle). This concept is known as a delay and sum beamformer and represents the most basic idea in acoustic imaging. 24 When an acoustic wave arrives from a direction matching the steered angle, the correlated signals combine additively and the beamformer produces the strongest response. An acoustic wave arriving from some different angle will not align properly and the beamformer produces a weakened response due to lack of correlation. Noise, which can be acoustic, thermal, or electronic will not produce a strong response unless it is correlated in time between elements. For example, in the presence of uncorrelated ambient noise, a correlated signal across N array elements will have an improved signal-to-noise ratio (i.e. array gain) of 10 log10 N , one important advantage to using an array [105, p. 306]. In practice, a beamformer is almost always implemented in the frequency do- main [106, 107], since discrete-time delays would require high-order interpolation or fractional delay filters [108]. The response of an N element array at frequency, f , for the steered angle, θ, is computed as N X Y (f, θ) = dj (f, θ)wj Xj (f ) (2.1) j=1 where dj (f, θ) is the delay (frequency and angle dependent) of the j th element, wj is the aperture shading coefficient applied to element j, and Xj (f ) is the frequency domain data of the j th element [109, Ch. 4]. In the frequency domain, dj (f, θ) is a phase shift that is equivalent to the time delay relative to some fixed point on the array, given f and θ: dj (f, θ) = e−ik∆j . (2.2) Here, k = 2π/λ is the acoustic wavenumber and ∆j is the distance from element j to a fixed reference point along the projected direction θ. ∆j = ~δj · ζ~ for distance vector, ~δj , and unit vector, ζ~ = eiθ . In matrix form, Equation 2.1 simplifies to 25 Y (f, θ) = df (θ)WxTf (2.3) where df (θ) is the 1 × N steering vector of complex phase delays, W is a diagonal N × N aperture shading matrix, and xf is the 1 × N complex data vector (T denotes the transpose), all corresponding to frequency, f [110]. 2.2.2 Beam Patterns and Angular Resolution A commonly used method to describe an array’s imaging performance is through the directivity, or beam pattern. The beam pattern of an array is simply the beamformer’s angular response to an ideal unity-power acoustic source located in the direction of ψ. This can be computed by replacing the complex data vector, xf , in Equation 2.3 by the complex steering vector, df (ψ): D(f, θ) = df (θ)Wdf (ψ)T . (2.4) For a line array, when df (ψ) is steered to 0◦ all of its elements are equal to 1 and we are left with the array’s natural response, D(f, θ) = df (θ)W. 
Figure 2.2 illustrates the beam pattern of an N = 10 element uniformly-spaced line array steered to 0◦ and 45◦ at two different frequencies. With proper element spacing, d ≤ λ/2, the beam pattern response is approximately D(f, θ) = sinc(L/λ cosθ), for an array aperture length, L = d(N − 1). A phase-delay beamformer is equivalent to applying a Fourier transform in the spatial domain. As such, the discrete elements suffer from spatial aliasing in exactly the same way as a signal sampled in the time domain. The presence of grating lobes is simply an aliasing artifact introduced by designing an array with improper element spacing (d > λ/2). The consequence is that there will be ambiguity regarding what angle the sound wave originated from. There is also a direct corollary between the 26 Beam Response (ψ=0°, N=10, d=1.72cm) Beam Response (ψ=45°, N=10, d=1.72cm) 10 10 A C Mag. (dB) Mag. (dB) 0 0 −10 −10 −20 −20 −80 −60 −40 −20 0 20 40 60 80 −80 −60 −40 −20 0 20 40 60 80 10 kHz 60 kHz 10 kHz 60 kHz 1 1 B D Amplitude Amplitude 0.5 0.5 0 0 −0.5 −0.5 −1 −1 −80 −60 −40 −20 0 20 40 60 80 −80 −60 −40 −20 0 20 40 60 80 Bearing Angle, θ (deg.) Bearing Angle, θ (deg.) Figure 2.2. Beam patterns are the angular response of an array due to the presence of an ideal acoustic source located in the steered direction. They are traditionally plotted on a log-magnitude scale and phase is ignored, but in reality the response exhibits a 180◦ phase reversal when the amplitude response becomes negative. Shown here are example beam patterns in air from a line array of N = 10 omni-directional elements that are spaced at d = 1.72 cm. Steer angles are plotted for 0◦ (a - log, b - linear) and 45◦ (c - log, d - linear). No aperture shading function is applied to this example, so W is the identity matrix. Each plot shows two different frequencies, 10 kHz (blue) and 60 kHz (green), which correspond to proper element spacing of λ/2 and undersampled spacing of 3λ, respectively. The width of the main lobe is one measure of angular resolution. Although the 60 kHz pattern has better resolution, the elements are not spaced properly and the result becomes ambiguous due to grating lobes. Regardless of frequency, the main and sidelobe responses are wider at angles off to the side. This is due to the effective array aperture decreasing with the cosine of the angle, θ. window function used for spectral analysis and the array aperture shading function used in array signal processing. Selecting the aperture shading weights is a tradeoff between mainlobe resolution and sidelobe reduction [111, Ch. 10]. The angular resolution of a uniformly spaced line array can be defined as the minimum angular spacing between two point sources of equal strength, whereby both can be simultaneously resolved [109, p. 142]. This limit occurs at the half-power beam width, β, of the beam pattern’s mainlobe and can be approximated through series expansion [110] as     −1 λ −1 λ β(ψ) ≈ sin cosψ − γwin − sin cosψ + γwin (2.5) L L where γwin is an aperture shading constant (e.g. γwin = 0.402 for uniform weighting; γwin = 0.484 for 26-dB Chebychev weighting). L and λ are the array aperture length 27 and wavelength, as defined previously. As seen in Figure 2.2, β is dependent upon the steer angle, ψ. The maximum achievable resolution for a line array is when ψ = 0◦ :   −1 λ β3dB ≈ 2sin γwin . (2.6) L These equations show that resolution of an array is critically dependent upon the ratio, λ/L. 
By increasing the aperture of the array, L, this improves the resolution by reducing the width of the main lobe. Alternatively, resolution can be improved by increasing the operating frequency, thereby reducing λ. It is clear that improv- ing resolution requires adding more elements, finding ways to increase the effective aperture, or handling insufficient element spacing in some other way. Under conventional beamforming, acoustic imaging is achieved by iterating the beamformer through multiple overlapping angles, θ, and repeating over subsequent time windows.1 The magnitude of each complex result from Equation 2.3 is plotted at the corresponding range and angle to produce an image of the spatially distributed acoustic energy. Figure 2.3 shows an example of high-resolution acoustic imaging in the range-azimuth plane produced by beamforming the underwater sonar data from a shipwreck. Images from consecutive transmit-receive cycles on a moving vehicle can be stitched together to map a much larger area. This concept of acoustic imaging with conventional beamforming is readily extended to a second angular dimension (e.g. azimuth and elevation). There is an impressive amount of literature on the various theories, methods, and implementations that improve upon classical array signal processing as described above. Some noteworthy techniques are sub-optimally spaced arrays (e.g. sparse, co- prime, Costas) [112, 113, 114, 115], synthetic aperture sonar (SAS) [116, 117, 118], and monopulse direction finding [119, 120]. Other methods, such as split aper- 1 Time, t, corresponds directly to range, r, in an active sonar system. The translation is r = tc/2, where c is the speed of sound in the medium and the factor of two accounts for the two-way propagation path. 28 Figure 2.3. The concept of acoustic imaging in the range-azimuth plane is demonstrated using ac- tive underwater sonar data collected from the site of a shipwreck in Narragansett Bay, Rhode Island. The sonar array (SeaBat 7130 prototype, Teledyne-Reson, Denmark) is a forward-looking 635 kHz line array with N = 256 elements spaced at λ/2 (d = 1.1 mm, L = 0.3 m). The active transmit waveform is a 17 ms, 30 kHz linear FM pulse (4.7% bandwidth-to-center-frequency ratio). This image was produced from a single transmit-receive cycle (66 ms) using a phase-delay beamformer and has 0.48◦ angular resolution at θ = 0◦ . Brightness in the image corresponds to the beamformer’s magnitude response when steered at a particular range and azimuth. The brightest locations are specular reflections alongside the ship’s hull and the darker red areas consist mostly of returns from the sea floor. Two faint rings of energy can be seen around 17 and 21 m, which are caused by the most intense ship reflections being present in the sidelobes when steered to other angles. The large, well-defined dark region behind the ship is an acoustic shadow created from the occlusion of acoustic energy by the ship. Note that the beams are only steered to ±60◦ due to limited transmit beam coverage and widening receive beams. Data were collected and processed by the Naval Undersea Warfare Center, Newport, RI. ture processing [105, p. 329] and Vernier interferometry [121, 122], are based off of the narrowband phase comparison between widely spaced elements. A variety of high-resolution techniques have been applied successfully, but performance de- grades when their many assumptions break down (e.g. 
minimum variance and adap- tive beamforming [105, 123, 124], eigenvector and multiple-signal classification (MU- SIC) [102, 105, 125, 126], and matched field processing [127, 128]). There have also 29 been some interesting departures from the traditional line-array concepts; in partic- ular, blazed arrays [129, 130] and vector sensing2 [133, 134]. This is by no means an exhaustive list of existing high-resolution angular techniques in array signal pro- cessing. A full review of this field lies beyond the scope of this section, but we can generalize many of these methods with respect to their intended goals and the infor- mation they use for acoustic imaging. Array signal processing traditionally uses the signal correlation and time delay between elements to localize sound sources and perform acoustic imaging. Many of the advanced techniques mentioned above serve to improve array resolution beyond the aperture constraints in Equations 2.5 and 2.6. They often achieve these performance gains at great cost by increasing the effective aperture, synthesizing more elements, or taking advantage of destructive interference of grating lobes and sidelobes. By contrast, biosonar uses very broad beam patterns and exploits the additional infor- mation contained in broadband, multi-harmonic signals. This enables bio-inspired broadband sonar to achieve high-resolution acoustic imaging with extremely small apertures and a minimal number of sensors. In this manner, biosonar represents a significant departure from the conventional approach that is in common use today. 2.3 Model-Based Approach to Bio-Inspired Acous- tic Imaging The model-based approach is a generic term used to describe numerical solutions to a variety of signal processing problems [135]. Models that include additional infor- mation about a physical process and its dynamics should, in theory, improve overall performance. These models usually consist of linearized systems, such as linear and adaptive filters [136, 137]; state-space estimation, e.g. Kalman filtering and its many 2 Most piezoelectric sensors are simple pressure-field measurement devices, while vector sensors measure both magnitude and direction from the particle velocity component of the acoustic wave. Since mammalian ears do not have a means of measuring particle velocity [131, 132], this additional information is explicitly omitted from further consideration. 30 adaptive and non-linear variants [138, 139]; statistical processors, like Markov chains and support vector machines [140, 141]; or neural networks, including classical firing- rate based and dynamical spiking neural models [142, 39]. The model-based approach may be used for creating new technological systems, or to better understand an ex- isting physical system. In the context of biosonar, we are interested in using the model-based approach for both purposes – gaining insight about animal echolocation and applying this toward development of new innovative acoustic imaging systems. 2.3.1 Auditory Modeling Insights and Oversights with Filter Banks Auditory modeling has embraced the idea of using parallel banks of linear filters to mimic the frequency selectivity of the cochlea’s mechanical response. To construct these filter banks, hearing researchers began mimicking the physiological and psycho- logical findings from various auditory studies in humans, cats, and guinea pigs. 
Early attempts to capture the critical bandwidth and asymmetrical roll-off characteristics of hearing used low-order band-pass Roex and Gammatone filters [49], which have an infinite impulse response (IIR). These filter designs are purely linear and time- invariant models of the cochlear mechanics. As neurophysiology provided new insight about the active non-linear feedback processes of the OHCs, more complicated filter shapes emerged; such as the Gammachirp and Dual-Resonance Non-Linear (DRNL) filters [143, 144]. These filter types expanded upon the existing models by including time-variant compression that is based upon the amplitude of the acoustic stim- uli [17]. Filter banks have become ubiquitous in many aspects of auditory research, from human audition to bat echolocation, and they remain a highly valuable tool for learning about how the auditory system encodes acoustic information. Using filter bank models, the benefit of decades of linear systems theory can be applied. There are unfortunately some drawbacks to this tool as well. One problem with using a filter bank model of the cochlea is the phase response. Great care has been taken to capture the exact amplitude response of these band-pass 31 auditory filters, yet little or no attention has been paid to the phase response of the filter and, perhaps most importantly, the implications for group delay. Figure 2.4 shows the frequency response of an auditory filter bank for the ultrasonic range of frequencies between 20 kHz and 100 kHz; those relevant to biosonar hearing in bats and cetaceans. Filters are usually spaced on a logarithmic frequency axis to reflect the distribution of neurons in the auditory system [4, 145]. The magnitude response matches fairly close to what has been found in other mammals at reasonable sound intensities. The phase response varies predictably near the poles and zeros of each band-pass filter such that the phase response changes most rapidly within the pass- band. The group-delay of a filter is simply the negative derivative of the phase response and can be understood as the literal time-delay of a signal passing through the filter. Signals passed through the filter bank will be amplified or attenuated based on the magnitude response, but delayed in time according to the group delay. If the group delay varies over frequency, signals with any bandwidth will become dispersive in time. This artifact becomes important when modeling the auditory system’s response to complex acoustic signals. In many cases, using an auditory filter bank is an appropriate model of the cochlea; however, accounting for phase is especially important when modeling a broadband system like the bat’s that can process information down to the microsecond [74, 146] or even nanosecond scale [147, 148]. 2.3.2 Signal Processing Models for High-Resolution Range Estimates Some of the earliest computational modeling work related to bat echolocation was de- veloped to explain the results of hyper-resolution experiments on range-discrimination [74, 146, 147]. These behavioral experiments were highly controversial [75], because they showed that bats were clearly achieving timing resolution well beyond what was thought possible (at the time) by neural coding in the auditory system [150]. Many questions about the neural mechanisms remain unanswered, even decades later. 
Nev- 32 Gammatone Filterbank Frequency Response Gammatone Filterbank Group Delay Magnitude (dB) 300 100 0 A C 85 −20 250 −40 72 Group Delay (µs) −60 62 200 −80 53 0 25 50 75 100 125 150 fc (kHz) 150 45 0 38 B 100 Phase (°) −180 32 −360 28 50 −540 23 −720 0 20 0 25 50 75 100 125 150 0 25 50 75 100 125 150 Frequency (kHz) Frequency (kHz) Figure 2.4. The gammatone filter bank is an example of an auditory cochlear model that is commonly used in hearing and echolocation research. Each band-pass filter represents the vibratory motion at a single physical point along the basilar membrane (BM) of the cochlea. This location is where numerous afferent AN fibers synapse with each local cluster of IHCs and translates BM displacement to neural spikes. (a) The magnitude response shows the logarithmic spacing of a gammatone filter bank designed from 20 to 100 kHz. The bandwidth-to-center-frequency ratio is normally kept constant to match the widening of the auditory critical bands at higher frequencies. This consistent ratio also ensures a constant overlap between filter channels. Only 11 channels are shown here for illustration, but practical models of bat echolocation require at least 80 channels per ear [149]. (b) The phase response, φ(f ), varies significantly in the pass-band of each filter. (c) Group delay, which is the negative derivative of phase (− dφdf ), is a commonly overlooked artifact of using a linear filter model. The consequence of non-constant group-delay is that broadband signals become dispersive within and between channels – that is, they are delayed in time by different amounts depending on frequency. This effect can have unknown consequences for auditory modeling, especially since the interaural time delay for a bat (0 to 40 µs) is one to two orders of magnitude lower than the group delay for a gammatone filter bank. Color is used to separate overlapping lines and corresponds to the center frequency of each filter channel. ertheless, signal processing models were developed to understand how animals might be achieving hyper-acuity in the range dimension. The Spectrogram Correlation and Transformation (SCAT) receiver [78, 80] is a biosonar model that mimics the echolocating bat’s hyper-resolution of a closely spaced pair of point scatterers. SCAT was the first known computational model that attempted to mimic bat echolocation based upon experimental evidence of neural information processing. SCAT has served as the basis for many later models of bat echolocation, and therefore requires a short description of how it functions. Figure 2.5 shows a block diagram of the monaural model, which includes a constant-Q filter bank to separate time series auditory input into multiple narrowband channels and convert time series waveforms into neural spikes. Following the cochlear filter bank 33 are two distinct spectrogram functions (correlation and transformation) that operate in parallel across all frequency channels. Figure 2.5. Block diagram of the Spectrogram Correlation and Transformation (SCAT) receiver model. Time series data enters the model through the cochlear filter bank, which consists of 2nd order Butterworth band-pass filters (hyperbolically spaced) followed by half-wave rectification, non- linear compression, and low-pass filtering (RCF) for each frequency channel. Neural spikes are produced at the output of each frequency channel to mimic information encoded by the auditory nerve fibers. 
The spectrogram correlation block produces a response with course echo resolution for detection. Once an echo is detected, the spectrogram transformation block is triggered to split this echo into multiple high-resolution echoes by a process of spectral deconvolution. The result is a hyper-resolution receiver that exceeds the resolution of a conventional cross-correlation receiver. The spectrogram correlation block takes the narrowband spike events and per- forms the neural equivalent to a parallel cross-correlation in time. When an echolo- cation pulse is emitted, it triggers a broadband onset response across all frequencies. Any echoes received will also produce a broadband onset response at the appropriate time delay. The coincidence of spikes across multiple channels indicates the reception of one or more target echoes. Due to the inherent time-delay in the FM signals, some narrowband frequency channels will spike earlier than others. This apparent incoherence (or time separation) across channels will match the incoherence between the outgoing pulse and any received echoes, thereby eliminating the need to de-chirp received signals. Although the detection of a single pulse-echo pair is sufficient to estimate target range, the SCAT receiver goes further to deconvolve the spectral information into hyper-resolution images. Closely spaced point targets will produce acoustic echoes that overlap in the time-frequency plane. When this occurs, deterministic interference patterns arise in the form of spectral notches. Each pair of echoes separated in time 1 by ∆T produces the first notch at f0 = 2∆T , and subsequent notches at intervals of 34 1 fj = fj−1 + ∆T for j = 1, 2, 3 . . . For signals with bandwidth between 20 and 100 kHz, these spectral notches occur for ∆T > 5µs = 1.7 mm until the echoes no longer overlap in the time-frequency plane. Unlike a traditional cross-correlation receiver, the spectrogram transformation block uses this additional spectral information to produce fine delay estimates. In the original SCAT model, the spectrogram transformation block is imple- mented as a “voting mechanism” with a set of cosine basis functions. Each frequency channel contains its own unique basis function with a period proportional to the center frequency of the filter. The amplitude of a basis function was scaled by the received echo level in each frequency channel. Despite its simplicity and lack of biological rel- evance, the summation across all channels produces impulses at the correct locations of two overlapping spikes. As pointed out by Peremans and Hallam [151], the SCAT model incorrectly estimates the times of two echoes having different amplitudes and produces artificial phantom echoes. Even with these nonlinearities, the SCAT model remains one of several models to date that can replicate bats’ hyper-resolution images of two-point targets. A recent review by Park and Allen [152] has likened the spectrogram transforma- tion process to a pattern recognition problem, where notches are actively detected and matched to corresponding time delays. This is in contrast to the original model that detects spectral energy and simply ignores the contributions from channels containing spectral notches. The cosine basis functions in the spectrogram transformation block produce many oscillatory peaks that can be incorrectly classified as point targets. Park and Allen proposed a method to suppress these unwanted peaks by predicting their locations and canceling them out. 
The goal of this process is comparable to the way interference cross-terms in a Wigner-Ville time-frequency distribution are smoothed [153]. Just as in Wigner-Ville smoothing, we sacrifice some resolution for reduced cross-term interference. Since SCAT was first published, other models have emerged that take on the 35 idea of spectral deconvolution for hyper-resolution range estimates. For example, Sanderson and Neretti used auditory filter bank models to address the question of biological relevance of the SCAT model [77, 76, 154]. By modifying the low-pass smoothing parameters at the RCF stage, they found that despite the low-temporal resolution of higher cortical areas in auditory system, there is indeed sufficient infor- mation across the time-frequency representation to register the interference patterns of two or more closely spaced echoes. Matsuo has applied Gaussian chirplet filter- banks [155] to the two-point resolution problem without relying upon an acoustic- to-neural transduction component [156, 157, 158]. More recently, Sharma and Buck proposed the variable resolution detection receiver (VRDR) without requiring filter banks [159, 160]. The VRDR model approaches the ideal impulse resolution of an inverse filter while maintaining a stable filter that can adapt to noise levels using a tuning parameter. Many of these modeling developments have focused on the prob- lem of achieving greater range resolution based on the hyper-resolution exemplified by echolocating bats. An equally intriguing problem is how echolocating animals are able to achieve hyper-acuity in angle. 2.3.3 Models for Angular Target Localization and Acoustic Imaging A binaural version of SCAT, named Artificial SCAT, was created to reconstruct two- dimensional images of simple objects in the range-azimuth plane [79]. The superior range resolution allowed two separate SCAT processes to be used to localize in az- imuth by comparing ITD. Echoes from wires and spheres were recorded using a pair of microphones and a loudspeaker. The stereo time series recordings were presented to the SCAT processing model one channel at a time and triangulation with intersecting ellipses generated the 2-dimensional images from each time series signal. Although implementation details were not published, some of the range-azimuth imaging re- sults were made available [80]. Other binaural sonar models that explicitly use ITD for angular imaging have appeared in the literature [158, 161, 162]. These models 36 take advantage of the large bandwidth that yields improved range resolution, but additional spectral information is useful to improve azimuthal performance and is absolutely necessary for localization in elevation. Only recently have models begun to include spectral cues in the source localization process, including azimuth and el- evation [11, 163, 93], but many of these models abandon the filter bank approach in favor of more traditional signal processing tools. 2.3.4 Mathematical Models of Echolocation Performance Taking a systems of systems approach to biosonar modeling and not concerning our- selves with the complexities of the brain can prove useful. There have been several interesting mathematical models published that aim to provide an explanation of echolocation performance by animals. 
In one of the earliest (and possibly most il- luminating) mathematical studies on a binaural sonar system, Altes calculated the Cramer-Rao lower bound (CRLB) for azimuth and elevation, and derived the max- imum likelihood estimator based on these results [91]. This analytical model found that azimuth localization accuracy is not only a function of ITD and SNR, but also of the gradient (i.e. sensitivity) of the magnitude and phase of broadband beam patterns versus angle. Since this work was ahead of its time, it did not include a numerical analysis with any measured biosonar beam patterns that have become available. Although the spectral effects for both, transmit and receive beam patterns were considered, none of the frequency-dependent effects in signal propagation were included. This particular study was limited to the accuracy of angular localization rather than resolution, which is required for acoustic imaging in densely cluttered environments. Altes does briefly comment on the subject of resolution, “Accurate unambiguous azimuth resolution can be obtained with only two transducers, even if the beam patterns of the transducers are very broad. It is only necessary to utilize a wide-band signal with an autocorrelation width that is narrow relative to the distance between transducers.” 37 With advances in computed-tomography and computational power, finite-element methods were pioneered to estimate the complex spectral properties of HRTFs [164]. With these new techniques, high-resolution HRTF models of bats’ pinnae and nose- leaves can be quickly assembled into libraries [47]. The HRTF libraries can be used for high-fidelity acoustic simulations, or quantifying the spectral information by the CRLB [165] or information theory [166, 167]. The information theoretic approach has also been used to evaluate performance of bio-inspired processing with conventional transducers [168]. 2.3.5 Hardware Prototypes as Exploratory Models As stated previously, modeling can lead to many insights into a problem if done properly. Unfortunately, models may also mask the true phenomenon of interest. In this vein, taking real acoustic measurements and constructing biomimetic prototype systems are necessary to test and verify models in the real world. Hardware prototypes are also the first step toward creating autonomous biomimetic sensors that can operate in real-time3 . Over the past 15 years, biomimetic sonar models have appeared on integrated circuits [169, 170, 171]. All-digital field-programmable gate arrays (FPGA) are ap- pealing for the real-time implementation of auditory filter banks, because of the sheer number of parallel computations required [172]. Unfortunately, neural information processing on digital hardware is computationally expensive and makes inefficient use of resources. This is the primary reason that very-large scale integrated (VLSI) analog circuits have appeared for various bio-inspired computations (e.g. echo ranging with delay lines [173, 174, 175], azimuthal localization using IID cues [176, 177, 178, 179], binaural comparison of spectral cues [180], and spike-based neural information pro- cessing [181, 182]). 3 Real-time has many interpretations that depend on the context. For a biosonar signal processor, real- time should be defined as having sufficient data throughput such that a bottleneck is never reached and latency that allows adequate response time to real-world events. 
38 Various bio-inspired robotic sonar systems have been developed, which can be grouped by the basic set of information used for localization. Kuc used ITD with a simple pair of circular aperture receive transducers to localize and classify objects in realistic environments [183, 184]. Although only ITD was used for localization, the transducers were oriented off-axis so that a comparison between the broadband time-based signals could be used to perform classification. Schillebeeckx and Pere- mans have applied Bayesian probabilistic techniques [185] and maximum likelihood estimation (MLE) [186] to the localization problem from binaural HRTF. Using the spectrum of an emitted sound in a different manner, Guarato et al. showed that es- timating source orientation is possible [187]. Combining the concept of sparse arrays and bio-inspired processing, Steckel and Peremans used bandwidth to average out grating lobes over multiple frequency octaves [188, 189, 190]. A model and hardware processor was also created for simultaneous localization and mapping for guidance and control of a robot [191]. Each hardware prototype has individual merit, but together they demonstrate the clear advantages of biosonar acoustic imaging. References [1] W. Gerstner, R. Kempter, J. Van Hemmen, and H. Wagner, “A neuronal learn- ing rule for sub-millisecond temporal coding”, Nature 383, 76–78 (1996). [2] W. Au and J. Simmons, “Echolocation in dolphins and bats”, Phys. Today 60, 40–45 (2007). [3] C. J. Sumner, R. Meddis, and I. M. Winter, “The role of auditory nerve inner- vation and dendritic filtering in shaping onset responses in the ventral cochlear nucleus”, Brain Res. 1247, 221–234 (2009). [4] E. Covey and J. H. Casseday, “The lower brainstem auditory pathways”, in Hearing by bats, 235–295 (Springer, New York, NY) (1995). [5] N. Suga, E. Gao, Y. Zhang, and X. Ma, “The corticofugal system for hearing: Recent progress”, Proc. Natl. Acad. Sci. U.S.A. 97, 11807–11814 (2000). [6] E. Covey, “Neurobiological specializations in echolocating bats”, Anat. Rec. Part A 287, 1103–1116 (2005). 39 [7] E. Covey and J. Casseday, “Timing in the auditory system of the bat”, Annu. Rev. Physiol. 61, 457–476 (1999). [8] J. Casseday, “The monaural nuclei of the lateral lemniscus in an echolocating bat: Parallel pathways for analyzing temporal features of sound”, J. Neurosci. 11, 3456–3470 (1991). [9] R. Meddis, “Simulation of auditory-neural transduction: Further studies”, J. Acoust. Soc. Am. 83, 1056–1063 (1988). [10] R. Meddis, “Simulation of mechanical to neural transduction in the auditory receptor”, J. Acoust. Soc. Am. 79, 702–711 (1986). [11] B. Fontaine and H. Peremans, “Bat echolocation processing using first-spike latency coding”, Neural Networks 22, 1372–1382 (2009). [12] P. Heil, H. Neubauer, M. Brown, and D. Irvine, “Towards a unifying basis of auditory thresholds: Distributions of the first-spike latencies of auditory-nerve fibers”, Hearing Res. 238, 25–38 (2008). [13] P. Heil, H. Neubauer, D. Irvine, and M. Brown, “Spontaneous activity of auditory-nerve fibers: Insights into stochastic processes at ribbon synapses”, J. Neurosci. 27, 8457–8474 (2007). [14] P. Heil, “First-spike latency of auditory neurons revisited”, Curr. Opin. Neuro- biol. 14, 461–467 (2004). [15] R. Meddis, “Auditory-nerve first-spike latency and auditory absolute threshold: A computer model”, J. Acoust. Soc. Am. 119, 406–417 (2006). [16] P. Heil and D. Irvine, “First-spike timing of auditory-nerve fibers and compar- ison with auditory cortex”, J. 
Neurophysiol. 78, 2438–2454 (1997). [17] “Computational Models of the Auditory System”, Springer, New York (2010). [18] A. R. Moller, Hearing, Anatomy, Physiology, and Disorders of the Auditory System, 2nd edition (Academic Press, Burlington, MA) (2006). [19] N. S. Harper and D. McAlpine, “Optimal neural population coding of an audi- tory spatial cue”, Nature 430, 682–686 (2004). [20] T. Cover and J. Thomas, Elements of Information Theory, Wiley Series in Telecommunications and Signal Processing, 2nd edition (Wiley-Interscience, Hoboken, NJ) (2006). [21] D. Oertel, “The role of timing in the brain stem auditory nuclei of vertebrates”, Annu. Rev. Physiol. 61, 497–519 (1999). [22] D. Oertel and E. Young, “What’s a cerebellar circuit doing in the auditory system?”, Trends Neurosci. 27, 104–110 (2004). 40 [23] D. Oertel, S. Wright, X. Cao, and M. Ferragamo, “The multiple functions of T stellate/multipolar/chopper cells in the ventral cochlear nucleus”, Hearing Res. 276, 61–69 (2011). [24] P. H. S. Jen, “Adaptive mechanisms underlying the bat biosonar behavior”, Front. Biol. 5, 128–155 (2010). [25] M. Abeles, G. Hayon, and D. Lehmann, “Modeling compositionality by dynamic binding of synfire chains.”, J Comput. Neurosci 17, 179–201 (2004). [26] P. Dayan and L. Abbott, Theoretical Neuroscience: Computational and Math- ematical Modeling of Neural Systems (MIT Press, Cambridge, MA) (2001). [27] J. C. R. Licklider, “A duplex theory of pitch perception”, Experientia 7, 128– 134 (1951). [28] S. Shamma, “On the role of space and time in auditory processing”, Trends Cogn. Sci. 5, 340–348 (2001). [29] P. Joris, P. Smith, and T. Yin, “Coincidence detection minireview in the audi- tory system: 50 years after Jeffress”, Neuron 21, 1235–1238 (1998). [30] S. Dear and N. Suga, “Delay-tuned neurons in the midbrain of the big brown bat”, J. Neurophysiol. 73, 1084–1100 (1995). [31] J. F. Olsen and N. Suga, “Combination-sensitive neurons in the medial genic- ulate body of the mustached bat: encoding of target range information.”, J. Neurophysiol. 65, 1275–1296 (1991). [32] J. A. Simmons and J. E. Gaudette, “Biosonar echo processing by frequency- modulated bats”, IET Radar Sonar Navig. 6, 556–565 (2012). [33] N. Tishby, F. Pereira, and W. Bialek, “The information bottleneck method”, Arxiv Preprint Physics 1–16 (2000). [34] L. Buesing and W. Maass, “A spiking neuron as information bottleneck”, Neural Comput. 22, 1961–1992 (2010). [35] D. Johnson, “Information Theory and Neural Information Processing”, IEEE Trans. Inf. Theory 56, 653–666 (2010). [36] T. Lu and X. Wang, “Information content of auditory cortical responses to time-varying acoustic stimuli”, J. Neurophysiol. 91, 301 (2004). [37] W. Bialek, F. Rieke, R. R. de Ruyter van Steveninck, and D. Warland, “Reading a neural code.”, Science 252, 1854–1857 (1991). [38] E. M. Izhikevich, “Polychronization: Computation with spikes”, Neural Com- put. 18, 245–282 (2006). [39] E. M. Izhikevich, Dynamical systems in neuroscience, the geometry of excitabil- ity and bursting (MIT Press, Cambridge, MA) (2007). 41 [40] P. Chadderton, J. P. Agapiou, D. Mcalpine, and T. W. Margrie, “The Synaptic Representation of Sound Source Location in Auditory Cortex”, J. Neurosci. 29, 14127–14135 (2009). [41] F. L. Wightman and D. J. Kistler, “Monaural sound localization revisited.”, J. Acoust. Soc. Am. 101, 1050–1063 (1997). [42] R. A. Butler and R. A. Humanski, “Localization of sound in the vertical plane with and without high-frequency spectral cues.”, Percept. Psychophys. 
51, 182– 186 (1992). [43] R. A. Butler, R. A. Humanski, and A. D. Musicant, “Binaural and monaural localization of sound in two-dimensional space”, Perception 19, 241–256 (1990). [44] H. Neubauer and P. Heil, “A physiological model for the stimulus dependence of first-spike latency of auditory-nerve fibers”, Brain Res. 1220, 208–223 (2008). [45] B. J. Fischer, L. J. Steinberg, B. Fontaine, R. Brette, and J. L. Pe˜ na, “Effect of instantaneous frequency glides on interaural time difference processing by auditory coincidence detectors”, Proc. Natl. Acad. Sci. U.S.A. 108, 18138– 18143 (2011). [46] A. Brand, O. Behrend, T. Marquardt, D. Mcalpine, and B. Grothe, “Precise inhibition is essential for microsecond interaural time difference coding”, Nature 417, 543–547 (2002). [47] J. Ma and R. M¨ uller, “A method for characterizing the biodiversity in bat pin- nae as a basis for engineering analysis”, Bioinspiration Biomimetics 6, 026008 (2011). [48] N. H. Fletcher and S. Thwaites, “Obliquely truncated simple horns: Idealized models for vertebrate pinnae”, Acustica 65, 194–204 (1988). [49] E. Lopez-Poveda, “Spectral processing by the peripheral auditory system: Facts and models”, Int. Rev. Neurobiol. 70, 7–48 (2005). [50] R. M¨uller, “A numerical study of the role of the tragus in the big brown bat”, J. Acoust. Soc. Am. 116, 3701–3712 (2004). [51] M. Aytekin, E. Grassi, M. Sahota, and C. Moss, “The bat head-related transfer function reveals binaural cues for sound localization in azimuth and elevation”, J. Acoust. Soc. Am. 116, 3594–3605 (2004). [52] J. A. Simmons and A. Megela Simmons, “Bats and frogs and animals in be- tween: Evidence for a common central timing mechanism to extract periodicity pitch”, J. Comp. Physiol. A 197, 585–594 (2010). [53] D. Griffin, Listening in the Dark, The Acoustic Orientation of Bats and Men (Cornell University Press, London) (1958). 42 [54] N. Veselka, D. D. Mcerlain, D. W. Holdsworth, J. L. Eger, R. K. Chhem, M. J. Mason, K. L. Brain, P. A. Faure, and M. B. Fenton, “A bony connection signals laryngeal echolocation in bats”, Nature 463, 939–942 (2010). [55] G. Neuweiler, The Biology of Bats (Oxford University Press, New York, NY) (2000). [56] T. W. Cranford, M. Amundin, and K. S. Norris, “Functional morphology and homology in the odontocete nasal complex: Implications for sound generation”, J. Morphol. 228, 223–285 (1996). [57] J. L. Aroyan, “Three-dimensional numerical simulation of biosonar signal emis- sion and reception in the common dolphin”, Ph.D. thesis, University of Califor- nia at Santa Cruz, Santa Cruz, CA (1996). [58] T. W. Cranford, P. Krysl, and J. A. Hildebrand, “Acoustic pathways revealed: Simulated sound transmission and reception in Cuvier’s beaked whale (Ziphius cavirostris)”, Bioinspiration Biomimetics 3, 016001 (2008). [59] W. E. Evans, “Echolocation by marine delphinids and one species of fresh-water dolphin”, J. Acoust. Soc. Am. 54, 191–199 (1973). [60] Y. Yovel, B. Falk, C. F. Moss, and N. Ulanovsky, “Optimal localization by pointing off axis”, Science 327, 701–704 (2010). [61] Q. Zhuang and R. M¨ uller, “Noseleaf furrows in a horseshoe bat act as resonance cavities shaping the biosonar beam”, Phys. Rev. Lett. 97, 218701 (2006). [62] D. Vanderelst, F. De Mey, H. Peremans, I. Geipel, E. Kalko, and U. Firzlaff, “What noseleaves do for FM bats depends on their degree of sensorial special- ization”, PLoS ONE 5, e11893 (2010). [63] A. Surlykke and C. F. 
Moss, “Echolocation behavior of big brown bats, Eptesi- cus fuscus, in the field and the laboratory”, J. Acoust. Soc. Am. 108, 2419–2429 (2000). [64] R. Altes and E. Titlebaum, “Bat signals as optimally Doppler tolerant wave- forms”, J. Acoust. Soc. Am. 48, 1014–1020 (1970). [65] R. Altes, “Ubiquity of hyperacuity”, J. Acoust. Soc. Am. 85, 943–952 (1989). [66] F. C. Fraser and P. E. Purves, “Hearing in cetaceans”, Bulletin of the British Museum (Natural History) (1954). [67] F. C. Fraser and P. E. Purves, “Hearing in cetaceans: Evolution of the accessory air sacs and the structure and function of the outer and middle ear in recent cetaceans”, Bulletin of the British Museum (Natural History) (1960). [68] K. S. Norris, “Some problems of echolocation in cetaceans”, in Marine bioa- coustics, edited by W. N. Tavolga, 316–336 (Pergamon Press, New York, NY) (1964). 43 [69] K. S. Norris, “The evolution of acoustic mechanisms in odontocete cetaceans”, in Evolution and environment, edited by E. T. Drake, 297–324 (Yale University Press, New Haven, CT) (1968). [70] K. S. Norris, “The echolocation of marine mammals”, in The biology of marine mammals, edited by H. T. Anderson, 391–423 (Academic Press, New York, NY) (1969). [71] R. L. Brill, M. L. Sevenich, T. J. Sullivan, J. D. Sustman, and R. E. Witt, “Be- havioral evidence for hearing through the lower jaw by an echolocating dolphin (Tursiops truncatus)”, Marine Mammal Science 4, 223–230 (1988). [72] A. Rihaczek, Principles of High-Resolution Radar (Artech House, Norwood, MA) (1996). [73] M. I. Skolnik, Introduction to Radar Systems, 3rd edition (McGraw-Hill, Boston, MA) (2001). [74] J. A. Simmons, “The resolution of target range by echolocating bats”, J. Acoust. Soc. Am. 54, 157–173 (1973). [75] D. Menne and H. Hackbarth, “Accuracy of distance measurement in the bat Eptesicus fuscus: Theoretical aspects and computer simulations”, J. Acoust. Soc. Am. 79, 386–397 (1986). [76] M. I. Sanderson, N. Neretti, N. Intrator, and J. A. Simmons, “Evaluation of an auditory model for echo delay accuracy in wideband biosonar”, J. Acoust. Soc. Am. 114, 1648–1659 (2003). [77] N. Neretti, M. Sanderson, N. Intrator, and J. Simmons, “Time-frequency model for echo-delay resolution in wideband biosonar”, J. Acoust. Soc. Am. 113, 2137– 2147 (2003). [78] P. Saillant, J. Simmons, S. Dear, and T. McMullen, “A computational model of echo processing and acoustic imaging in frequency-modulated echolocating bats: The spectrogram correlation and transformation receiver”, J. Acoust. Soc. Am. 94, 2691–2712 (1993). [79] J. Simmons, P. Saillant, and S. Boatright, “Biologically inspired SCAT sonar receiver for 2-D imaging”, J. Acoust. Soc. Am. 102, 3153 (1997). [80] P. A. Saillant, “Neural Computations for Biosonar Imaging in the Big Brown Bat”, Ph.D. thesis, Brown University, Providence, RI (1995). [81] L. N. Kloepper, P. E. Nachtigall, M. J. Donahue, and M. Breese, “Active echolo- cation beam focusing in the false killer whale, Pseudorca crassidens”, J. Exp. Biol. 215, 1306–1312 (2012). [82] L. N. Kloepper, P. E. Nachtigall, C. Quintos, and S. A. Vlachos, “Single-lobed frequency-dependent beam shape in an echolocating false killer whale (Pseu- dorca crassidens)”, J. Acoust. Soc. Am. 131, 577–581 (2012). 44 [83] J. Simmons, C. Moss, and M. Ferragamo, “Convergence of temporal and spec- tral information into acoustic images of complex sonar targets perceived by the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 166, 449–470 (1990). [84] M. Sanderson and J. 
Simmons, “Neural responses to overlapping FM sounds in the inferior colliculus of echolocating bats”, J. Neurophysiol. 83, 1840–1855 (2000). [85] M. Sanderson and J. Simmons, “Selectivity for echo spectral interference and delay in the auditory cortex of the big brown bat Eptesicus fuscus”, J. Neuro- physiol. 87, 2823–2834 (2002). [86] B. K. Branstetter, S. J. Mevissen, L. M. Herman, A. Pack, and S. P. Roberts, “Horizontal angular discrimination by an echolocating bottlenose dolphin tur- siops truncatus”, Bioacoustics 14, 15–34 (2003). [87] J. Wotton and J. Simmons, “Spectral cues and perception of the vertical po- sition of targets by the big brown bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 107, 1034–1041 (2000). [88] J. Wotton, T. Haresign, M. Ferragamo, and J. Simmons, “Sound source ele- vation and external ear cues influence the discrimination of spectral notches by the big brown bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 100, 1764–1776 (1996). [89] Z. M. Fuzessery, “Monaural and binaural spectral cues created by the external ears of the pallid bat”, Hearing Res. 95, 1–17 (1996). [90] W. M. Masters, A. J. Moffat, and J. A. Simmons, “Sonar tracking of horizontally moving targets by the big brown bat Eptesicus fuscus”, Science 228, 1331–1333 (1985). [91] R. Altes, “Angle estimation and binaural processing in animal echolocation”, J. Acoust. Soc. Am. 63, 155–173 (1978). [92] R. M¨ uller, “Numerical analysis of biosonar beamforming mechanisms and strategies in bats”, J. Acoust. Soc. Am. 128, 1414–1425 (2010). [93] J. Reijniers and H. Peremans, “Biomimetic sonar system performing spectrum- based localization”, IEEE Trans. Robot. 23, 1151–1159 (2007). [94] B. K. Branstetter and E. Mercado, III, “Sound Localization by Cetaceans”, International Journal of Comparative Psychology 19, 26–61 (2006). [95] S. S¨ umer, A. Denzinger, and H.-U. Schnitzler, “Spatial unmasking in the echolo- cating Big Brown Bat, Eptesicus fuscus”, J. Comp. Physiol. A 195, 463–472 (2009). [96] J. A. Simmons, S. A. Kick, B. D. Lawrence, C. Hale, C. Bard, and B. Escudie, “Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 153, 321–330 (1983). 45 [97] M. E. Bates, S. A. Stamper, and J. A. Simmons, “Jamming avoidance response of big brown bats in target detection”, J. Exp. Biol. 211, 106–113 (2008). [98] M. Warnecke, M. E. Bates, V. Flores, and J. A. Simmons, “Spatial release from simultaneous echo masking in bat sonar”, J. Acoust. Soc. Am. 135, 1–9 (2014). [99] M. E. Bates, J. A. Simmons, and T. V. Zorikov, “Bats use echo harmonic structure to distinguish their targets from background clutter”, Science 333, 627–630 (2011). [100] R. M. Pope and E. S. Fry, “Absorption spectrum (380-700 nm) of pure water. II. Integrating cavity measurements”, Applied optics 36, 8710–8723 (1997). [101] G. E. Becker and S. H. Autler, “Water vapor absorption of electromagnetic radiation in the centimeter wave-length range”, Physical Review 70, 300–307 (1946). [102] X. Lurton, An Introduction to Underwater Acoustics, Principles and Applica- tions (Springer, New York) (2002). [103] H. S. Maxim, A New System for Preventing Collisions at Sea (Cassell and Company, London) (1912). [104] R. Urick, Principles of Underwater Sound, 3rd edition (Pennsylvania Publica- tions, Los Altos, CA) (1983). [105] W. Burdic, Underwater Acoustic System Analysis, 2nd edition (Pennsylvania Publications, Los Altos, CA) (2003). [106] B. Maranda, “Efficient digital beamforming in the frequency domain”, J. Acoust. Soc. 
Am. 86, 1813–1819 (1989). [107] M. Bono, B. Shapo, P. McCarty, and R. Bethel, “Subband energy detection in passive array processing”, Technical Report ADA405484, Univ. of Texas at Austin. Applied Research Labs., Austin, TX (2000). [108] V. Valimaki and T. Laakso, “Principles of fractional delay filters”, in Proc. IEEE ICASSP ’00, 3870–3873 (2000). [109] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques (Prentice Hall PTR, Upper Saddle River, NJ) (1993). [110] D. Abraham, “Short Course on Array Signal Processing for Sonar”, in 166th Meeting of the Acoustical Society of America (San Francisco, CA) (2013). [111] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Pro- cessing, 2nd edition (Prentice Hall PTR, Englewood Cliffs, NJ) (1999). [112] P. P. Vaidyanathan and P. Pal, “Sparse coprime sensing with multidimensional lattice arrays”, Digital Signal Processing Workshop IEEE 425–430 (2011). [113] M. J. Hinich, “Processing spatially aliased arrays”, J. Acoust. Soc. Am. 64, 792–794 (1978). 46 [114] K. Drakakis, “A review of Costas arrays”, J. Appl. Math. 2006, 1–32 (2006). [115] J. Costas, “A study of a class of detection waveforms having nearly ideal range— Doppler ambiguity properties”, Proc. IEEE 72, 996–1009 (1984). [116] M. P. Hayes and P. T. Gough, “Synthetic aperture sonar: A review of current status”, IEEE J. Ocean. Eng. 34, 207–224 (2009). [117] A. Bellettini and M. A. Pinto, “Theoretical accuracy of synthetic aperture sonar micronavigation using a displaced phase-center antenna”, IEEE J. Ocean. Eng. 27, 780–789 (2002). [118] M. Pinto, “Use of frequency and transmitter location diversities for ambiguity suppression in synthetic aperture sonar systems”, in OCEANS ’97. MTS/IEEE Proc., 363–368 (1997). [119] K. F. Nieman, K. A. Perrine, T. L. Henderson, K. H. Lent, T. J. Brudner, and B. L. Evans, “Wideband monopulse spatial filtering for large receiver arrays for reverberant underwater communication channels”, in Proc. IEEE OCEANS 2010 MTE, 1–8 (IEEE) (2010). [120] E. Mosca, “Angle estimation in amplitude comparison monopulse systems”, IEEE Trans. Aerosp. Electron. Syst. AES-5, 205–212 (1969). [121] G. Llort-Pujol, C. Sintes, and D. Gueriot, “Analysis of Vernier interferometers for sonar bathymetry”, in Proc. IEEE OCEANS ’08, 1–5 (IEEE) (2008). [122] G. Llort-Pujol, C. Sintes, and X. Lurton, “A new approach for fast and high- resolution interferometric bathymetry”, in Proc IEEE OCEANS ’06, 1–7 (2006). [123] R. G. Lorenz and S. P. Boyd, “Robust minimum variance beamforming”, IEEE Trans. Signal Process. 53, 1684–1696 (2005). [124] J. Capon, “High-resolution frequency-wavenumber spectrum analysis”, Proc. IEEE 57, 1408–1418 (1969). [125] J. W. Odendaal, E. Barnard, and C. W. I. Pistorius, “Two-dimensional super- resolution radar imaging using the MUSIC algorithm”, IEEE Trans. Antennas Propagat. 42, 1386–1391 (1994). [126] R. Schmidt, “Multiple emitter location and signal parameter estimation”, IEEE Trans. Antennas Propagat. 34, 276–280 (1986). [127] A. Baggeroer and W. Kuperman, “An overview of matched field methods in ocean acoustics”, IEEE J. Ocean. Eng. 18, 401–424 (1993). [128] A. Baggeroer, W. Kuperman, and H. Schmidt, “Matched field processing: Source localization in correlated noise as an optimum parameter estimation problem”, J. Acoust. Soc. Am. 83, 571–587 (1988). [129] R. L. Thompson, J. Seawall, and T. Josserand, “Two dimensional and three dimensional imaging results using blazed arrays”, in Proc. IEEE OCEANS ’01, 985–988 (2001). 
47 [130] R. L. Thompson and W. J. Zehner, “Frequency-steered acoustic beam forming system and process”, US Patent Office 5,923,617 (1999). [131] M. Hiipakka, T. Kinnari, and V. Pulkki, “Estimating head-related transfer functions of human subjects from pressure–velocity measurements”, J. Acoust. Soc. Am. 131, 4051–4061 (2012). [132] V. A. Gordienko, V. I. Il’ichev, and L. N. Zakharov, Vector-phase methods in acoustics (George Washington University, Seattle, WA) (1989). [133] D. M. Donskoy and B. A. Cray, “Acoustic particle velocity horns”, J. Acoust. Soc. Am. 131, 3883 (2012). [134] A. Nehorai and E. Paldi, “Acoustic vector-sensor array processing”, IEEE Trans. Signal Process. 42, 2481–2491 (1994). [135] J. V. Candy, Model-Based Signal Processing (John Wiley & Sons, Hoboken, NJ) (2005). [136] L. B. Jackson, Digital Filters and Signal Processing with MATLAB Exercises, 3rd edition (Klewer Academic Publishers, Norwell, MA) (1995). [137] S. S. Haykin, Adaptive Filter Theory, 5th edition (Prentice Hall, Upper Saddle River, NJ) (2013). [138] D. Simon, Optimal State Estimation, Kalman, H Infinity, and Nonlinear Ap- proaches (John Wiley & Sons, Hoboken, NJ) (2006). [139] R. Van der Merwe and E. Wan, “The square-root unscented Kalman filter for state and parameter-estimation”, in IEEE ICASSP ’01 Proc., 3461–3464 vol.6 (2001). [140] D. Gamerman and H. F. Lopes, Markov Chain Monte Carlo, Stochastic Sim- ulation for Bayesian Inference, Second Edition, 2nd edition (CRC Press, Boca Raton, FL) (2006). [141] I. Steinwart and A. Christmann, Support Vector Machines (Springer, New York) (2008). [142] S. S. Haykin, Neural Networks and Learning Machines (Prentice Hall, Upper Saddle River, NJ) (2009). [143] T. Irino and R. Patterson, “A time-domain, level-dependent auditory filter: The gammachirp”, J. Acoust. Soc. Am. 101, 412–419 (1997). [144] C. Sumner, L. O’Mard, E. Lopez-Poveda, and R. Meddis, “A nonlinear filter- bank model of the guinea-pig cochlear nerve: Rate responses”, J. Acoust. Soc. Am. 113, 3264–3274 (2003). [145] E. Covey and J. H. Casseday, “Connectional basis for frequency representation in the nuclei of the lateral lemniscus of the bat Eptesicus fuscus”, J. Neurosci. (1986). 48 [146] J. A. Simmons, M. B. Fenton, and M. J. O’Farrel, “Echolocation and pursuit of prey by bats”, Science 203, 16–21 (1979). [147] Ferragamo, M. Sanderson, and J. Simmons, “Phase sensitivity of auditory brain- stem responses in echolocating big brown bats”, J. Acoust. Soc. Am. 112, 2288 (2002). [148] J. A. Simmons, M. Ferragamo, C. F. Moss, S. B. Stevenson, and R. A. Altes, “Discrimination of jittered sonar echoes by the echolocating bat, Eptesicus fus- cus: The shape of target images in echolocation”, J. Comp. Physiol. A 167, 589–616 (1990). [149] R. Roverud, “Complex sound analysis in the lesser bulldog bat: Evidence for a mechanism for processing frequency elements of frequency modulated signals over restricted time intervals”, J. Comp. Physiol. A 174, 559–565 (1994). [150] M. Ferragamo, T. Haresign, and J. Simmons, “Frequency tuning, latencies, and responses to frequency-modulated sweeps in the inferior colliculus of the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 182, 65–79 (1997). [151] H. Peremans and J. Hallam, “The spectrogram correlation and transformation receiver, revisited”, J. Acoust. Soc. Am. 104, 1101–1110 (1998). [152] M. Park and R. Allen, “Pattern-matching analysis of fine echo delays by the spectrogram correlation and transformation receiver”, J. Acoust. Soc. Am. 128, 1490–1500 (2010). 
[153] W. Martin and P. Flandrin, “Wigner-Ville spectral analysis of nonstationary processes”, IEEE Trans. Acoust., Speech, Signal Process. 33, 1461–1470 (1985). [154] M. I. Sanderson, “The representation of temporal and spectral information cor- responding to target range in the auditory system of the big brown bat”, Ph.D. thesis, Brown University, Providence, RI (2002). [155] S. Mann and S. S. Haykin, “The chirplet transform: physical considerations”, IEEE Trans. Signal Process. 43, 2745–2761 (1995). [156] I. Matsuo, K. Kunugiyama, and M. Yano, “An echolocation model for range dis- crimination of multiple closely spaced objects: Transformation of spectrogram into the reflected intensity distribution”, J. Acoust. Soc. Am. 115, 920–928 (2004). [157] I. Matsuo and M. Yano, “An echolocation model for the restoration of an acous- tic image from a single-emission echo”, J. Acoust. Soc. Am. 116, 3782–3788 (2004). [158] I. Matsuo, J. Tani, and M. Yano, “A model of echolocation of multiple targets in 3D space from a single emission”, J. Acoust. Soc. Am. 110, 607–624 (2001). [159] N. S. Sharma, J. R. Buck, and J. A. Simmons, “Trading detection for resolution in active sonar receivers”, J. Acoust. Soc. Am. 130, 1272 (2011). 49 [160] N. S. Sharma and J. Buck, “A generalized linear filter approach for sonar re- ceivers”, in IEEE DSP/SPE 2009, 507–512 (2009). [161] I. Matsuo, “Localization and tracking of moving objects in two-dimensional space by echolocation”, J. Acoust. Soc. Am. 133, 1151–1157 (2013). [162] S. E. Forsythe, H. A. Leinhos, and P. R. Bandyopadhyay, “Dolphin-inspired combined maneuvering and pinging for short-distance echolocation”, J. Acoust. Soc. Am. 124, EL255–EL261 (2008). [163] L. Wiegrebe, “An autocorrelation model of bat sonar”, Biol. Cybern. 98, 587– 595 (2008). [164] R. M¨uller and J. C. T. Hallam, “Knowledge mining for biomimetic smart an- tenna shapes”, Rob. Autom. Syst. 50, 131–145 (2005). [165] R. M¨uller, H. Lu, and J. Buck, “Sound-diffracting flap in the ear of a bat generates spatial information”, Phys. Rev. Lett. 100, 108701 (2008). [166] D. Vanderelst, J. Reijniers, J. Steckel, and H. Peremans, “Information gener- ated by the moving pinnae of Rhinolophus rouxi : Tuning of the morphology at different harmonics”, PLoS ONE 6, e20627 (2011). [167] J. Reijniers, D. Vanderelst, and H. Peremans, “Morphology-induced information transfer in bat sonar”, Phys. Rev. Lett. 105, 148701 (2010). [168] D. Vanderelst, J. Reijniers, F. Schillebeeckx, and H. Peremans, “Evaluat- ing three-dimensional localisation information generated by bio-inspired in-air sonar”, IET Radar Sonar Navig. 6, 516–525 (2012). [169] T. Horiuchi, “A systems view of a neuromorphic VLSI echolocation system”, IEEE ISCAS 2008 (2007). [170] T. Horiuchi, “Seeing in the dark: Neuromorphic VLSI modeling of bat echolo- cation”, IEEE Signal Process. Mag. 22, 134–139 (2005). [171] G. Cauwenberghs, R. Edwards, Y. Deng, R. Genov, and D. Lemonds, “Neuro- morphic processor for real-time biosonar object detection”, IEEE ICASSP ’02 Proc. 4, 3984–3987 (2001). [172] C. Clarke and L. Qiang, “Bat on an FPGA: A biomimetic implementation of a highly parallel signal processing system”, in Proc. IEEE ACSSC ’04, 456–460 (2004). [173] T. Horiuchi, “A spike-latency model for sonar-based navigation in obstacle fields”, IEEE Trans. Circuits Syst. I, Reg. Papers 56, 2393–2401 (2009). [174] T. Horiuchi, “A neural model for sonar-based navigation in obstacle fields”, IEEE ISCAS 2008 605–608 (2006). [175] M. Cheely and T. 
Horiuchi, “A VLSI model of range-tuned neurons in the bat echolocation system”, IEEE ISCAS 2003 4, 872–875 (2003). 50 [176] T. Horiuchi, “A neuromorphic VLSI model of bat interaural level difference pro- cessing for azimuthal echolocation”, IEEE Trans. Circuits Syst. I, Reg. Papers 54, 74–88 (2007). [177] T. Horiuchi, “A VLSI model of the bat dorsal nucleus of the lateral lemniscus for azimuthal echolocation”, IEEE ISCAS 2005 5, 4217–4220 (2005). [178] R. Z. Shi and T. K. Horiuchi, “A VLSI model of the bat lateral superior olive for azimuthal echolocation”, in IEEE ISCAS ’04, 900–903 (2004). [179] T. Horiuchi, “Spike-based VLSI modeling of the ILD system in the echolocating bat”, Neural Networks (2001). [180] T. Horiuchi, “Binaural spectral cues for ultrasonic localization”, IEEE ISCAS 2008 2110–2113 (2008). [181] H. Abdalla and T. K. Horiuchi, “Spike-based acoustic signal processing chips for detection and localization”, in 2008 IEEE Biomedical Circuits and Systems Conference, 225–228 (IEEE) (2008). [182] T. Horiuchi, “An ultrasonic filterbank with spiking neurons”, IEEE ISCAS 2008 (2005). [183] R. Kuc, “Biomimetic sonar and neuromorphic processing eliminate reverbera- tion artifacts”, IEEE Sensors J. 7, 361–369 (2007). [184] R. Kuc, “Biomimetic sonar locates and recognizes objects”, J. Ocean. Eng., IEEE 22, 616–624 (1997). [185] F. Schillebeeckx, J. Reijniers, and H. Peremans, “Probabilistic spectrum based azimuth estimation with a binaural robotic bat head”, in 2008 Fourth Inter- national Conference on Autonomic and Autonomous Systems (ICAS), 142–147 (IEEE) (2008). [186] F. Schillebeeckx and H. Peremans, “Biomimetic sonar: 3D-localization of mul- tiple reflectors”, in IEEE/RSJ International Conference on Intelligent Robots and Systems, 3079–3084 (2010). [187] F. Guarato, L. Jakobsen, D. Vanderelst, A. Surlykke, and J. Hallam, “A method for estimating the orientation of a directional sound source from source direc- tivity and multi-microphone recordings: Principles and application”, J. Acoust. Soc. Am. 129, 1046–1058 (2011). [188] J. Steckel, A. Boen, and H. Peremans, “Broadband 3-D sonar system using a sparse array for indoor navigation”, IEEE Trans. Robot. 29, 161–171 (2013). [189] J. Steckel and H. Peremans, “A novel biomimetic sonarhead using beamform- ing technology to mimic bat echolocation”, IEEE Tran. Ultrason., Ferroelectr., Freq. Control 59, 1369–1377 (2012). [190] J. Steckel, F. Schillebeeckx, and H. Peremans, “Biomimetic sonar, outer ears versus arrays”, in Sensors, 2011 IEEE, 821–824 (2011). 51 [191] J. Steckel and H. Peremans, “BatSLAM: Simultaneous Localization and Map- ping Using Biomimetic Sonar”, PLoS ONE 8, e54076 (2013). 52 Chapter 3 Multi-Component Separation and Analysis of Bat Echolocation Calls Abstract The vast majority of animal vocalizations contain multiple FM components with vary- ing amounts of non-linear modulation and harmonic instability. This is especially true of biosonar sounds where precise time-frequency templates are essential for neural in- formation processing of echoes. Understanding the dynamic waveform design by bats and other echolocating animals may help to improve the efficacy of man-made sonar through biomimetic design. Bats are known to adapt their call structure based on the echolocation task, proximity to nearby objects, and density of acoustic clutter. To interpret the significance of these changes, a method was developed for component separation and analysis of biosonar waveforms. 
Techniques for imaging in the time- frequency plane are typically limited due to the uncertainty principle and interference cross-terms. This problem is addressed by extending the use of the fractional Fourier transform to isolate each non-linear component for separate analysis. Once separated, Empirical Mode Decomposition (EMD) can be used to further examine each compo- nent. The Hilbert transform may then successfully extract detailed time-frequency information from each isolated component. This multi-component analysis method is The contents of this chapter were published in the Journal of the Acoustical Society of America. 2013 January; 133(1):538–546. [DOI: 10.1121/1.4768877]. 53 applied to the sonar signals of four species of bats recorded in-flight by radiotelemetry along with a comparison of other common time-frequency representations. 3.1 Introduction The active sonar call of the big brown bat (Eptesicus fuscus) contains multiple non- linear FM components that are harmonically related [1]. The scale invariant proper- ties of this species’ echolocation signals [2, 3] implies that cross-correlation between the signal and the echo returns are insensitive to in-flight Doppler shifts. Furthermore, the call of E. fuscus is a multi-component signal that naturally increases the effective bandwidth and consequently improves range resolution. Despite the advantages for active sonar pulse design, these non-linear and multi-component characteristics make it difficult to precisely localize energy in the time-frequency plane. Animal vocalizations are typically described using conventional spectrograms, which have intrinsically low time-frequency resolution. Alternative representations may better capture the information that animals actually use, particularly since bats manifest greater time-frequency acuity. Small details in the call signal structure may appear subtle and unimportant, but could actually lead to statistically significant ob- servations of the animals’ behavior. An example of nearly indistinct, yet intentional adaptive pulse design by E. fuscus is described in Hiryu et al. [4]. Using the spectro- gram, they found that bats shifted echolocation frequencies by several kHz (< 4-8% of total bandwidth) to avoid pulse-echo ambiguity in dense clutter. Most interesting is the fact that temporal cross-correlation between the pulse-echo pairs are nearly iden- tical, which strongly suggests that these bats do not simply use conventional matched filtering for echo processing. Many different time-frequency representations (TFR) are used to process multi- component, linear, quadratic, and higher-order FM signals. If the signal is stationary, the Fourier Transform (FT) is an effective tool for analyzing the frequency content. 54 0 0 120 120 A FM3 B 100 −5 100 −5 FM2 Frequency (kHz) Frequency (kHz) 80 −10 80 −10 60 FM1 −15 60 −15 40 −20 40 −20 20 −25 20 −25 0 −30 0 −30 0 0.5 1 1.5 2 2.5 3 3.5 0 0.5 1 1.5 2 2.5 3 3.5 Time (ms) Time (ms) 0 0 120 120 C D 100 −5 100 −5 Frequency (kHz) Frequency (kHz) 80 −10 80 −10 60 −15 60 −15 40 −20 40 −20 20 −25 20 −25 0 −30 0 −30 0 0.5 1 1.5 2 2.5 3 3.5 0 0.5 1 1.5 2 2.5 3 3.5 Time (ms) Time (ms) Figure 3.1. Four different time-frequency distributions of an FM echolocation call from E. fuscus. (a) The spectrogram shows that this species of bat produces at least two prominent harmonic com- ponents (labeled FM1, FM2, etc.), which is a common characteristic among many echolocating bats. 
(b) The Wigner-Ville distribution (WVD) provides very good resolution, but interference cross-terms incorrectly place energy within and between components. (c) Cross-terms are effectively removed at the cost of resolution in the Smoothed Pseudo WVD. (d) The reassignment method [7] (computed on c) is a highly effective technique for improving the readability of any TFR. Reassignment works by remapping the energy distributed in a TFR onto its center-of-gravity; however, it cannot show details that are unresolved in the base representation. All plots are shown on a normalized decibel scale. However, the FT provides little insight into the nature of signals from nonstationary or nonlinear systems. For instance, quadratic phase (linear FM) signals are poorly represented by the FT because it is a transform from time to frequency, i.e. not a joint distribution in time and frequency. A common way around this issue is to take the FT of short moving windows of the signal in time, thus providing frequency information as a function of time. This leads us to the short-time Fourier transform and its squared modulus, the spectrogram. The difficulty with this approach is that the window must be small enough in time to provide good time resolution and wide 55 enough in the bandwidth sense to provide good frequency resolution. These simul- taneous conflicting objectives lead to leakage of the spectral energy and a generally smeared appearance in the time-frequency plane. Use of the spectrogram has be- come ubiquitous due to its fast computation, simple interpretation, and widespread software integration; however, it is very difficult to resolve fine details from the spec- trogram alone, especially if attempting to automate the process. Fig. 3.1 illustrates the spectrogram of an example echolocation call by E. fuscus alongside other TFRs, including the Wigner-Ville distribution (WVD) [5], the smoothed psuedo-WVD [6], and the reassignment method [7]. Many different methods have been used to visualize biosonar signals beyond the common TFRs. These include time-scale analysis [8], the Fractional Fourier Transform (FrFT) [9, 10, 11], wavelets [12], and the minimum variance estimator [13]. In these methods only a small number of signals were analyzed to show the processing technique. For practical applications, it is important to consider how well a method can automatically extract waveform parameters in a large set of data. Recently, a host of TFR tools based on the idea of polynomial phase signal models have appeared [14, 15, 16, 17]. They generally rely on adaptations to the ambiguity function including multiple products, lagged versions, higher orders, or some combination thereof [18, 19, 20]. It is not surprising that this approach has received a great deal of attention, as the ambiguity function is itself the characteristic function of the WVD [21]. In other words, the WVD and ambiguity function form a Fourier pair [22]. A significant reason for not adopting these more mathematically rigorous parametric models is that often they are defended with the caveat that the amplitude be constant or slowly varying in time. This condition cannot be guaranteed for biosonar signals which contain unique amplitude modulations that change with each emitted pulse. Unfortunately there is no single time-frequency technique that is optimized for all situations. 
While meaningful insight can be gleaned using parametric models or 56 appropriate TFRs for a specific signal, the notion of using an adaptive or empirical decomposition is attractive due to the complexity and nonlinearity of the bat’s sound production system. Imaging techniques to improve time-frequency fidelity would more easily identify small differences in call structure (as found in Hiryu et al.). Resolving these differences is critical, however, to understand how these changes are actually perceived by the bat. This paper extends the use of the FrFT and applies several techniques to sepa- rate and analyze nonlinear harmonic components in biosonar signals. The methodol- ogy should be easily extrapolated to other highly variable, multi-component signals, such as calls of other bat species, marine mammal calls and whistles, insect commu- nication, and voiced-speech. 3.2 Data Collection The algorithm was developed and refined using a single E. fuscus call recorded at high signal-to-noise ratio (Fig. 3.1). An ultrasonic free-field microphone (Series 4139, Br¨ uel & Kjær) was placed directly in front of the bat on a stationary platform at approximately 20 cm. A recording was made while the bat performed a 2-choice discrimination test. The echolocation signal was recorded with a digital audio recorder (ISC-16, R.C. Electronics) at a 250kHz sampling rate [23]. Typical of this species of bat, the signal is non-linearly modulated, with two principal harmonics FM1 and FM2 along with a partial 3rd harmonic, FM3. To evaluate the utility of our method, we analyzed a body of existing data. This consisted of biosonar sounds recorded from four species of bats using a radio microphone (“Telemike”) carried by the flying bat [4, 24, 25, 26]. The Telemike includes an electret condenser microphone (FG Series, Knowles Acoustics, IL, USA) positioned above the bat’s head and attached to a miniature radio transmitter used to record the sounds without the acoustic artifacts that normally occur when a moving 57 bat is recorded by a stationary microphone. The data set included calls from E. fuscus, the eastern bent-winged bat (Miniopterus fuliginosus), the Japanese house bat (Pipistrellus abramus), and the greater horseshoe bat (Rhinolophus ferrumequinum). For each species the time series contained multiple biosonar signals recorded while the animal was navigating through a flight room used for testing their responses to clutter. The flights and recordings were conducted in the laboratories of Hiroshi Riquimaroux and Shizuko Hiryu at Doshisha University (Kyotanabe, Japan) or at Brown University. During recording, the signals were digitally sampled at either 384 kHz or 192 kHz [4, 24]. 3.3 Methods The multi-component analysis presented here is a two-part process: separation of harmonic components followed by mono-component decomposition. Component sep- aration includes a new use of the FrFT to find a rough approximation of instantaneous frequency, fi (t), time-varying demodulation centered about fi (t), and a zero-phase filtering technique that will not affect the phase or group delay of the signal compo- nent. Mono-component decomposition consists of applying analysis techniques such as Empirical Mode Decomposition (EMD) and Hilbert spectral analysis. The re- sulting decomposition produces highly resolvable images of each component in the time-frequency plane. 
The reader is referred to the Appendices for an overview of our definitions of a multi-component waveform and how the Hilbert spectral analysis can be used to extract this information. 3.3.1 Separation of Harmonic Components Component separation may be performed in a variety of ways; however, the follow- ing demonstrates a robust approach that combines the use of the Fractional Fourier Transform, demodulation, and zero-phase filtering. The FrFT provides an easy way 58 to approximate a component’s instantaneous frequency. We apply a time-varying bandpass filter along this estimate to isolate the component. Subtracting the result from the original signal allows the process to be repeated until all components have iteratively been separated. 3.3.1.1 Fractional Fourier Transform The fractional Fourier transform (FrFT) and Radon-Wigner transform (RWT) are both fractional rotations of a signal from the time domain to the frequency domain in the time-frequency plane. The FrFT can be defined in its more familiar integral form [27] as π φ e−i( 4 − 2 ) Z 1 2 +u2 ) cot(φ) tu F rF T (φ, u) = p x(t)e 2 i(t e−i sin(φ) dt (3.1) 2π sin(φ) The parameter φ is the angle of rotation in radians and u is the fractional dimension between time and frequency. Letting φ = α π2 , a rotation of α = 0 is simply the time series itself and a rotation of α = 1 is a traditional FT, any non-integer rotation will produce a fractional FT. This can be accomplished easily by forming the Fourier unitary matrix, raising it to an arbitrary power, α, then multiplying the FT of the original signal with the matrix. Repeatedly applying the FT to a signal is equivalent to raising this matrix to an integer power. For example, raising the matrix to 0, 1, 2 and 3, results in the original time series, the FT, the time-reversed series, and the FT of the time-reversed signal, respectively. The RWT is the Radon transform of the WVD. Geometrically, the RWT is a tomographic transform that combines a rotation of the WVD with a projection onto a one dimensional axis at some angle of rotation φ. Like the WVD, the RWT results in a 2D distribution. Unlike the WVD, the RWT provides intensity information not as a function of time and frequency, but rather as a function of frequency and angle 59 of rotation of the WVD. As a result, the relationship between the RWT and FrFT follows RW T (φ, u) = |F rF T (φ, u)|2 (3.2) That is, the RWT is equivalent to the squared modulus of the FrFT [28, 29, 30]. It should be noted that, like the conventional FT, the FrFT is a linear operator. The WVD, and therefore the RWT, are both bilinear operators on the signal. As a result, the FrFT is a TFR which does not produce the cross-term interference asso- ciated with bilinear TFRs. Because the RWT is a projection onto a one-dimensional axis through a line integral at angle α, the two-dimensional, bilinear (quadratic) rep- resentation loses the cross term interference during the projection, thus preserving the relationship between the RWT and the FrFT [29]. 3.3.1.2 Rough Approximation of Instantaneous Frequency This method uses a discrete implementation of the Fractional Fourier Transform (FrFT) [31] to compute the RWT of the analytic signal, x˜(t). Fig. 3.2 shows the signal from Fig. 3.1 in the rotation-fraction domain. Each column in the image is formed by computing the RWT of x˜(t) for a specific angle of rotation, α. Computing the RWT at more angles leads to better α resolution and zero-padding or interpolating the signal will increase resolution in u. 
Every (α, u) pair corresponds to a specific line in the time-frequency plane. For a linear FM signal, fi (t) = f0 + kt can be precisely estimated by finding its peak in the rotation-fraction plane and solving for the constants f0 and k as 60 1 0 0.8 −5 −10 Fraction (u) 0.6 −15 0.4 −20 0.2 −25 0 −30 −1 −0.5 0 0.5 1 Rotation (α) Figure 3.2. Rotation-fraction domain of the E. fuscus signal. The FrFT is computed on the analytic time series signal at incremental rotation values, α. The squared modulus, |F rF T |2 , produces the vertical slices of the rotation-fraction domain. Each (α, u) point in the image corresponds to a unique line cutting across the time-frequency plane. Once the global peak on the surface is found, points along the local ridge (inset) represent lines passing through subsections of the nonlinear component in the time-frequency plane. A polynomial curve is fit to the intersection points of adjacent lines which results in a rough estimate of fi (t) for one component. fs2 π k =− cot(α ) (3.3) T 2 1 π fc =fs (u − )csc(α ) (3.4) 2 2 T f0 =fc − k (3.5) 2 where fs is the sampling frequency, T is the period of the signal, and fc is the frequency at the midpoint of the line [32]. Since the bat’s signal consists of nonlinear FM components, there is no single peak, but a continuous ridge where multiple (α, u) pairs correspond to lines that pass through subsections of a component. We make use of this fact by normalizing the RWT to the highest peak, detecting local points along the ridge above a thresh- old, then finding the intersection points of the lines from adjacent (α, u) pairs. This 61 generates points in the time-frequency plane along the most prominent component. The end points can be extended by projecting out from the first and last intersec- tion points. Fitting a polynomial or spline curve to these points provides a rough approximation to fi (t) for one component without a priori information on any FM parameters. 3.3.1.3 Zero-Phase Component Filtering A time-varying bandpass filter is effectively applied to the analytic signal along the instantaneous frequency approximation. This is achieved by first integrating fi (t) to find the phase law, φi (t), as in Eq. (3.13) and demodulating the signal as xˇ(t) = x˜(t)e−jφi (t) (3.6) The demodulated complex signal, xˇ(t), is then lowpass filtered to remove unwanted harmonics and reverberation. The filter bandwidth can be adjusted depending on the accuracy of the initial fi (t) estimate. Note that a zero-phase forward-backward filter is required to minimize phase distortions and avoid introducing group delay: Yˇ (ejωT ) = H(e−jωT )H(ejωT )X(e ˇ jωT ) (3.7) The signal is then remodulated using the negative of the phase law: y˜(t) = yˇ(t)ejφi (t) (3.8) Each step is shown in Fig. 3.3 for the 2nd component, FM2. The process of rough approximation and zero-phase filtering is repeated for subsequent components (i.e. FM1 and FM3) once the isolated component, y˜(t), is subtracted from the analytic signal, x˜(t). After each harmonic component has been effectively isolated, this opens the door for a variety of different processing options. 62 x ˜(t) x ˇ(t) 0 0 100 100 −10 −10 Frequency (kHz) Frequency (kHz) 50 50 0 −20 0 −20 −50 −50 −30 −30 −100 A −100 B −40 −40 0 1 2 3 0 1 2 3 Time (ms) Time (ms) yˇ(t) y˜(t) 0 0 100 100 −10 −10 Frequency (kHz) Frequency (kHz) 50 50 0 −20 0 −20 −50 −50 −30 −30 −100 C −100 D −40 −40 0 1 2 3 0 1 2 3 Time (ms) Time (ms) Figure 3.3. 
Overview of FM2 component separation using a least-squares cubic approximation of fi (t). Negative frequencies are shown to accommodate the frequency warping caused by demodula- tion. (a) The analytic signal, x˜(t), with approximate fi (t) curve for FM2. (b) FM2 is now clearly separable by frequency after demodulation to 0 Hz (ˇ x(t)). (c) A zero-phase lowpass filter is applied to remove other components (ˇ y (t)). (d) FM2 is modulated back using the negative phase law, re- sulting in y˜(t). Through the process of component separation, the resulting component is free from non-overlapping echoes, reverberation, and background noise. 3.3.2 Monocomponent Decomposition 3.3.2.1 Empirical Mode Decomposition Empirical mode decomposition (EMD) is a useful technique for analyzing nonlinear FM signals due to its robustness in handling nonstationary, nonlinear data. The EMD separates a time-series signal into multiple decompositions known as intrinsic mode functions (IMFs). An IMF is defined only if (1) the number of extrema and the number of zero-crossings are equal or at most differ by one, and (2) the mean of the envelope of the maxima and the envelope of the minima is zero at all points. 63 This works due to the tacit relationship between zero-crossings and the frequency spectrum of a signal [33]. IMFs have properties conducive to signal processing, namely that they are linear and have well behaved Hilbert transforms. Additionally, the EMD forms a basis which is complete, approximately orthogonal, local, and adaptive. The orthogonal property of the IMFs ensures that the energy associated with the distribution is positive, a critical designation for a time-frequency representation. −70 −60 −50 −70 −60 −50 100 A B 50 IMF 1 IMF 2 0 Frequency (kHz) −40 −30 −20 −20 −10 0 100 C D 50 IMF 3 IMF 4 0 −50 −40 −30 −70 −60 −50 100 E F 50 IMF 6−13 IMF 5 0 1 2 3 1 2 3 Time (ms) Figure 3.4. Shown here are results of the empirical mode decomposition on the separated second harmonic, FM2, from E. fuscus (Fig. 3.3.1.3). Since the EMD works strictly in the time-domain, interpolation beyond the Nyquist rate is necessary to achieve good performance. FM2 was inter- polated by a factor of 8 before EMD to avoid aliasing artifacts. Spectrograms for IMF 1 through 5 (a-e) illustrate how energy is distributed amongst the IMFs. High frequency noise is contained largely in IMFs 1 and 2 (a and b). IMFs 3 and 4 (c and d) contain the strongest parts of the signal with a weaker part found in IMF 5 (e). Residual low frequency energy is found in IMFs 6 through 13 (combined in f). IMFs 4-6 may be summed and passed on to later processing stages. Since the decomposition forms a complete basis, summation across all IMFs will result in the original signal. The color scale depth is set to 30 dB on all plots. The result of the EMD is similar to that of passing the signal of interest through 64 a filter bank [34]. The key differences are that filtering is not stationary nor restricted to separation in the time-frequency plane. In this regard, the IMF that results from the decomposition is composed of the same time-varying frequency modulation of the original signal with much of the non-coherent signals (noise) and riding waves (DC to very low-frequency) suppressed. Spectrograms of the IMFs generated from FM2 are shown in Fig. 3.4. 3.3.2.2 Hilbert Spectral Analysis Computing instantaneous frequency and amplitude from the mono-component signals provides very useful information that cannot be easily found by other methods. 
In the discrete-time implementation, ai (t) is a straightforward absolute value calculation of the complex analytic signal. Finding fi (t) involves numerical integration and therefore requires some approximation. Calculation of fi (t) for a filtered analytic component, y˜(t), can be accomplished directly in discrete-time by fs fi [k] = y [k + 1] y˜∗ [k − 1]) ∠(˜ (3.9) 2π for k = 2, 3, 4 . . . N − 1 where k is the discrete-time sample number, N is the total number of sample points, and fs is the sampling rate [35]. This is immediately recognized as the central finite difference [36]. The resulting fi (t) and ai (t) functions (Fig. 3.5a-b) may optionally be smoothed to compensate for low signal-to-noise ratio using the least-squares Savitzky-Golay filter [37]. If applied, care should be taken to avoid over-smoothing by using a short filter length and a sufficient polynomial order. Each component is then combined to form a precise and high-resolution TFR (Fig. 3.5c). 65 Freq. (kHz) 100 A 50 FM1 FM2 FM3 0 0 0.5 1 1.5 2 2.5 3 3.5 −20 B Amp. (dB) −60 FM1 FM2 FM3 −100 0 0.5 1 1.5 2 2.5 3 3.5 −25 Freq. (kHz) 100 C −35 50 −45 −55 0 0 0.5 1 1.5 2 2.5 3 3.5 Time (ms) Figure 3.5. Hilbert spectral analysis results showing ai (t) and fi (t) for each harmonic component of the E. fuscus call (a-b). Each component has its own fi (t) and ai (t) function. (a) and (b) are combined to form the time-frequency representation shown in (c). The instantaneous amplitude is plotted on a decibel scale in (b) and is shown with intensity in (c). Line thickness has been increased in all plots to improve visibility. 3.3.3 Waveform Synthesis and Ground Truth An important aspect to these mono-component decomposition techniques is that all of the original signal information is retained. This implies that recorded biosonar signals can be decomposed, modified in some way, and finally synthesized into a noise-free replica of the recorded waveform for detailed acoustic simulations or computational models of auditory neural processing. This step is also useful to perform a ground truth by subtracting the synthesized signal from the original. When the initial phase φ0 (see Sec. B) is properly adjusted, results show negligible error in the time-frequency plane with only the broadband noise and non-interfering echoes removed from the signal. 66 3.4 Results 3.4.1 Telemike Data Series Echolocation signals from E. fuscus and three East Asian bat species were processed to show the method’s flexibility and ease of use. Data from various Telemike experi- ments were used in all four cases [4, 25, 26]. First, the biosonar calls were separated using a simple energy detector and then individually run through multi-component analysis. The spectrogram of the full time series are shown side-by-side with the analysis results for each bat in Figure 3.6a-d. Spectrogram of Telemike Data Overlaid Analysis Results 0 100 100 −10 kHz 50 50 −20 0 A 0 E −30 0 0.1 0.2 0.3 0.4 0.5 0.6 0 1 2 3 90 90 0 −10 kHz 45 45 −20 Frequency F 0 B 0 −30 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0 1 2 90 90 0 −10 kHz 45 45 −20 G 0 C 0 −30 0 0.02 0.04 0.06 0.08 0.1 0.12 0 1 2 90 90 0 −20 kHz 45 45 D −40 0 0 H 0 0.1 0.2 0.3 0.4 0.5 0 10 20 30 Time (seconds) Time (ms) Figure 3.6. Multi-component analysis was performed on call sequences from radiotelemetry record- ings of E. fuscus and three Asian bat species. The spectrogram for the entire time series are shown for E. fuscus (a), P. abramus (b), M. fuliginosus (c), and R. ferrumequinum (d). 
3.4 Results

3.4.1 Telemike Data Series

Echolocation signals from E. fuscus and three East Asian bat species were processed to show the method's flexibility and ease of use. Data from various Telemike experiments were used in all four cases [4, 25, 26]. First, the biosonar calls were separated using a simple energy detector and then individually run through multi-component analysis. The spectrograms of the full time series are shown side-by-side with the analysis results for each bat in Figure 3.6a-d.

Figure 3.6. Multi-component analysis was performed on call sequences from radiotelemetry recordings of E. fuscus and three Asian bat species. The spectrograms for the entire time series are shown for E. fuscus (a), P. abramus (b), M. fuliginosus (c), and R. ferrumequinum (d). The analysis results for each call are aligned and overlaid in the time-frequency plane (e-h). The color scales are the same across each row. Pairs of pulses, known as strobe groups, can be identified by short inter-pulse timing in the cases of E. fuscus, P. abramus, and R. ferrumequinum. Both P. abramus and M. fuliginosus emit mono-component non-linear FM waveforms. Although their calls are nearly identical in time-frequency structure (f and g), only P. abramus is known to emit strobe groups. R. ferrumequinum uses relatively long constant-frequency tones with short FM tails at the beginning and end of each call. The color depth was extended to −50 dB for R. ferrumequinum (d and h) to show the first harmonic, which is approximately 20 dB weaker than the second in this species. The E. fuscus data set was collected by Hiryu et al. [4] and the remaining data sets were collected by Riquimaroux et al. and Hiryu et al. [25, 26].

The Telemike data from E. fuscus (Fig. 3.6a) contains 13 echolocation signals emitted as it entered a densely cluttered array of chains. This data set is the same as the example shown in Hiryu et al. [4]. Figs. 3.6b and 3.6c show spectrograms of the Telemike data from the Japanese house bat (Pipistrellus abramus) and the eastern bent-winged bat (Miniopterus fuliginosus). Fig. 3.6d shows eight calls emitted by the greater horseshoe bat (Rhinolophus ferrumequinum).

Figure 3.7 shows the results from E. fuscus in more detail. The pulse-to-pulse time intervals were used to identify strobe groups, which are closely spaced pairs of calls with short time intervals [1, 4]. The figure shows the strobe groups identified with brackets. It is worth noting that FM1 is stronger than FM2 by approximately 8 dB due to the off-axis microphone placement of the Telemike. In this data set, the first four pulses were emitted early in the clutter field where pulse-echo ambiguity was present. The last four pulses were emitted after pulse-echo ambiguity subsided. Hiryu et al. found that when pulse-echo ambiguity was strong, the bats shifted the tail-end frequency for each strobe group pair. This behavior was absent when pulse-echo ambiguity was not present. The results from our method confirm that this occurred in the example data set, and the effect is significantly more pronounced than when viewing the spectrogram alone.

3.4.2 Synthesized Multi-Component FM Analysis

To demonstrate how the proposed technique can adapt to small time-frequency perturbations, a multi-component linear FM waveform is generated with a small sinusoidal modulation. The combined FM signal can be defined using

$$\phi(t) = f_0 t + \frac{\mu_0}{2}t^2 + \frac{B}{4\pi f_m}\sin 2\pi f_m t \qquad (3.10)$$

where $f_0$ is the initial frequency, $\mu_0$ is the linear sweep rate, B is the amplitude of sinusoidal modulation (in Hz), and $f_m$ is the modulation frequency. This phase law is used directly in Eq. (3.12) to construct the discrete-time noiseless components, which are then added together.
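A minimal synthesis of this test signal might look as follows. The parameter values are illustrative assumptions (the chapter states only that the riding wave varies by ±2.5 kHz, i.e., B = 5 kHz); the phase law of Eq. (3.10) is in cycles, hence the 2π in the complex exponential.

```matlab
% Sketch of the synthetic test waveform of Sec. 3.4.2: two linear-FM
% components with a sinusoidal riding wave per Eq. (3.10).
% All parameter values below are illustrative assumptions.
fs = 500e3;  t = (0:1/fs:3e-3).';
f0  = [20e3 40e3];                  % component start frequencies (Hz)
mu0 = [10e6 20e6];                  % linear sweep rates (Hz/s)
B   = 5e3;  fm = 2e3;               % riding wave: +/- B/2 = 2.5 kHz
s = zeros(size(t));
for n = 1:2
    phi = f0(n)*t + (mu0(n)/2)*t.^2 ...
        + (B/(4*pi*fm))*sin(2*pi*fm*t);   % Eq. (3.10), in cycles
    s = s + exp(1j*2*pi*phi);             % noiseless analytic component
end
```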
Figure 3.7. E. fuscus was previously found to use slight frequency shifts to avoid pulse-echo ambiguity. Multi-component analysis results are plotted separately for FM2 (a) and FM1 (b). The time duration of each pulse component matches the scale bar, but the inter-pulse interval time is artificially compressed. This was done to show the fine detail in each call, which cannot be easily seen in the overlaid plot (Fig. 3.6e).

The results reveal a clear distinction between the lowest frequency in each harmonic component for strobe groups 1 and 2. As noted in Hiryu et al., this separation becomes insignificant when pulse-echo ambiguity is no longer a problem, as shown circled in strobe groups 4 and 5.

Fig. 3.8a shows the desired $f_i(t)$ functions used to synthesize a multi-component sinusoidal FM riding on a linear FM. The sinusoidal riding wave varies by ±2.5 kHz, but neither the Wigner-Ville distribution nor the reassignment method (Fig. 3.8b-c) can resolve these variations. The proposed component separation and Hilbert spectral analysis faithfully reproduce the original $f_i(t)$ curves (Fig. 3.8d).

3.5 Discussion

Many decompositions, including Hilbert spectral analysis and EMD, do not perform well on multi-component signals. In fact, unless the multi-component signal is first decomposed into the corresponding mono-component signals, the concepts of $f_i(t)$ and $a_i(t)$ lose physical meaning [38, 39, 40, 41].

Figure 3.8. (a) The original $f_i(t)$ functions used to synthesize two linear plus sinusoidal FM components, (b) Wigner-Ville distribution, (c) smoothed pseudo-WVD, and (d) results after separation of components with the proposed method. Despite having better resolution than the spectrogram, the WVD is only perfectly localized for up to a second-order phase law, such as a linear FM or a constant tone. This synthetic FM signal demonstrates that methods we consider "high fidelity" may not resolve small, but significant, features in natural signals such as biosonar calls. For cases where the signal generation mechanism is unknown or not well understood, it is best not to assume any TFR is optimal.

How does one define the instantaneous frequency of a signal that has overlapping functions of frequency at a single point in time? Therefore, these signals must first be separated into mono-components and analyzed individually. Using such a technique, signal parameter estimation is restricted neither to the coarse resolution of a spectrogram nor to the interference cross-terms that plague other high-resolution methods.

We have presented a technique for isolating and processing individual components of the call from E. fuscus based on the fractional Fourier transform, time-varying demodulation, EMD, and Hilbert spectral analysis. The method can be applied to any frequency-modulated multi-component signal provided a rough estimate of the instantaneous phase is achievable and the components are separable in the time-frequency plane. Algorithm parameters can be adjusted to automate the processing of various signal types. Ultimately, we arrive at a TFR that is highly localized in both time and frequency.

The EMD has important insights to offer in the realm of biological sonar. It was asserted [13] that the EMD technique is not generally efficient for estimating $f_i(t)$ of bat calls. We do not believe that is accurate. When recording an E. fuscus echolocation signal along the main response axis, the dominant signal energy typically transitions from the first to the second harmonic. This was offered as a reason to avoid the EMD, as the decomposition tracks the strongest energy in the signal.
We have shown that a simple technique for isolating and separating the components can and does provide effective relief from this problem. Second, the EMD is not solely designed to break a multi-component signal into mono-components. The property of most importance is its similarity to a time-varying constant-Q filter bank. In this way, the EMD is more similar to the minimum variance estimator (MVE) technique, which Kopsinis et al. endorse. This is due to the strong relationship between zero-crossings and spectral content [33].

Since its inception, the EMD has provided insights into a great many systems characterized by nonlinear and nonstationary signals. However, the problems with EMD have been well documented [13, 34, 42]. The lack of mathematical rigor and definition related to the EMD is often identified as a source of criticism. If the EMD is applied carefully and the results scrutinized, this concern can be effectively mitigated by applying known techniques to serve as a model for comparison.

Recent advances have been made with empirical-based methods. The normalized Hilbert transform, the normalized amplitude Hilbert transform, and their relationship to the signal quadrature help to mitigate some of the restrictions imposed by Bedrosian and Nuttall [43, 44, 45]. In certain instances, the error between the approximated Hilbert transform and the quadrature can produce spectral artifacts in the Hilbert spectral analysis. In other instances, the EMD can highlight the issue of undersampling.

In conclusion, higher-resolution time-frequency techniques are necessary for understanding biosonar. This paper describes one possible solution to the problem of multi-component time-frequency analysis. Further developments in empirical decomposition techniques will enable new ways of evaluating non-linear processes.

3.6 Acknowledgments

This work was funded through internal investments by the Naval Undersea Warfare Center, Division Newport, RI and ONR grant N00014-09-1-0691. The authors wish to thank Hiroshi Riquimaroux and Shizuko Hiryu for providing time series data from recordings using the Telemike recording system, Ivars Kirsteins and Lee Estes for discussions on the fractional Fourier transform, and Laura Kloepper and Andrea Simmons for editorial suggestions. Figures showing the WVD, smoothed pseudo-WVD, and reassignment method for comparison were produced using the Time-Frequency Toolbox for MATLAB [46].

A Multi-Component Frequency-Modulated Waveforms

Many bat echolocation signals consist of components (usually harmonics) with a varying degree of amplitude and phase modulation. The multi-component version of the big brown bat's echolocation call is a summation of each individual FM waveform, or

$$s(t) = \sum_{n=1}^{N} \tilde{x}_n(t) \qquad (3.11)$$

for N independent harmonically related components, $\tilde{x}_n(t)$. Given this assumption, each component has its own time-dependent amplitude and frequency; more precisely, each is an instantaneous function of time. A signal component can be defined in its analytic form as

$$\tilde{x}(t) = a_i(t)\,e^{j\phi_i(t) + j\phi_0} \qquad (3.12)$$

where $a_i(t)$ is the instantaneous amplitude, $\phi_i(t)$ is the instantaneous phase modulation (or phase law), and $\phi_0$ is the initial phase of the complex exponential. The phase law is related to the instantaneous frequency, $f_i(t)$, by

$$\phi_i(t) = 2\pi \int_0^{t} f_i(\tau)\,d\tau \qquad (3.13)$$

In this manner, we assume that the bat's multi-component FM waveforms can be completely described by defining $a_i(t)$, $f_i(t)$, and $\phi_0$ for each harmonic component.
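In discrete time this model reduces to a few lines. The sketch below assumes ai and fi are matrices whose columns hold the per-component $a_i(t)$ and $f_i(t)$ samples at rate fs, with phi0 a vector of initial phases; all names are illustrative.

```matlab
% Sketch of Eqs. (3.11)-(3.13): numerically integrate each f_i(t) to get
% the phase law, form the analytic component, and sum.
s = zeros(size(fi,1), 1);
for n = 1:size(fi,2)
    phin = 2*pi*cumtrapz(fi(:,n))/fs;             % Eq. (3.13)
    s = s + ai(:,n) .* exp(1j*(phin + phi0(n)));  % Eqs. (3.12) and (3.11)
end
```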
B Hilbert Spectral Analysis of Modulated Waveforms

We present the formulation below in continuous time for the purpose of familiarity. Assume for computational purposes that the signal of interest, x[n], is obtained by sufficiently sampling a band-limited signal x(t) such that x[n] = x(nT), where $T = 1/f_s$ is the sampling interval chosen to avoid aliasing.

If a real mono-component signal, x(t), fits the criteria for a modulated waveform, then we can extract the parameters of interest directly from estimates of $f_i(t)$ and $a_i(t)$. This requires first converting the original mono-component signal into its complex analytic form using the Hilbert transform, $\mathcal{H}$, and is achieved with

$$\tilde{x}(t) = x(t) + j\hat{x}(t) \qquad (3.14)$$

where x(t) is the purely real signal under consideration and $j\hat{x}(t)$ is the purely imaginary part, with $\hat{x}(t) = \mathcal{H}\{x(t)\}$. This is calculated as follows

$$\hat{x}(t) = \int_{-\infty}^{\infty} x(\tau)\,h(t-\tau)\,d\tau \qquad (3.15)$$

with $h(t) = \frac{1}{\pi t}$. The integral can be evaluated as a Cauchy principal value; however, it should be noted that many simple approximations exist for a discrete-time implementation. The resulting analytic signal will consist only of positive spectral components in the frequency domain. This signal representation is convenient since it provides the information to fully describe a mono-component modulated signal. Once in this form, estimates of the instantaneous amplitude, phase, and frequency are given by

$$a_i(t) = |\tilde{x}(t)| = \sqrt{\mathrm{Re}\{\tilde{x}\}^2 + \mathrm{Im}\{\tilde{x}\}^2} \qquad (3.16)$$

$$\phi_i(t) = \angle\tilde{x}(t) = \arctan\!\left(\frac{\mathrm{Im}\{\tilde{x}\}}{\mathrm{Re}\{\tilde{x}\}}\right) \qquad (3.17)$$

and, using the relation in (3.13),

$$f_i(t) = \frac{1}{2\pi}\frac{d}{dt}\phi_i(t) \qquad (3.18)$$

The issue of finding the constant $\phi_0$ in (3.12) can be resolved by optimizing the phase alignment in time between the original signal and a waveform synthesized using the estimated parameters, but the value depends largely on the arbitrarily defined time origin. As long as $\phi_0$ is consistent between harmonics, it need not be exact for analysis purposes. The general use of the Hilbert transform in estimation of $f_i(t)$ and $a_i(t)$ has been termed elsewhere Hilbert spectral analysis [42].

References

[1] A. Surlykke and C. F. Moss, “Echolocation behavior of big brown bats, Eptesicus fuscus, in the field and the laboratory”, J. Acoust. Soc. Am. 108, 2419–2429 (2000).
[2] B. Harris and S. Kramer, “Asymptotic evaluation of the ambiguity functions of high-gain FM matched filter sonar systems”, in Proc. IEEE, 2149–2157 (1968).
[3] R. Altes and E. Titlebaum, “Bat signals as optimally Doppler tolerant waveforms”, J. Acoust. Soc. Am. 48, 1014–1020 (1970).
[4] S. Hiryu, M. E. Bates, J. A. Simmons, and H. Riquimaroux, “FM echolocating bats shift frequencies to avoid broadcast-echo ambiguity in clutter”, Proc. Natl. Acad. Sci. 107, 7048–7053 (2010).
[5] S. Kay and G. Boudreaux-Bartels, “On the optimality of the Wigner distribution for detection”, in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP ’85, 1017–1020 (1985).
[6] W. Martin and P. Flandrin, “Wigner-Ville spectral analysis of nonstationary processes”, IEEE Trans. Acoust., Speech, Signal Process. 33, 1461–1470 (1985).
[7] F. Auger and P. Flandrin, “Improving the readability of time-frequency and time-scale representations by the reassignment method”, IEEE Trans. Signal Process. 43, 1068–1089 (1995).
[8] B. Ristic and B. Boashash, “Scale domain analysis of a bat sonar signal”, Time-Frequency and Time-Scale Analysis, 1994, Proc. of the IEEE-SP International Symposium on, 373–376 (1994).
[9] C. Capus, Y. Rzhanov, and L. Linnett, “The analysis of multiple linear chirp signals”, Time-scale and Time-Frequency Analysis and Applications (Ref. No. 2000/019), IEE Seminar on, 4 (2000).
[10] C. Capus and K. Brown, “Short-time fractional Fourier methods for the time-frequency representation of chirp signals”, J. Acoust. Soc. Am. 113, 3253–3263 (2003).
[11] C. Capus, Y. Pailhas, K. Brown, D. M. Lane, P. W. Moore, and D. Houser, “Bio-inspired wideband sonar signals based on observations of the bottlenose dolphin (Tursiops truncatus)”, J. Acoust. Soc. Am. 121, 594–604 (2007).
[12] S. Olhede and A. Walden, “A generalized demodulation approach to time-frequency projections for multicomponent signals”, Proc. R. Soc. A 461, 2159 (2005).
[13] Y. Kopsinis, E. Aboutanios, D. Waters, and S. McLaughlin, “Time-frequency and advanced frequency estimation techniques for the investigation of bat echolocation calls”, J. Acoust. Soc. Am. 127, 1124–1134 (2010).
[14] S. Peleg and B. Friedlander, “The discrete polynomial-phase transform”, IEEE Trans. Signal Process. 43, 1901–1914 (1995).
[15] S. Peleg and B. Friedlander, “Multicomponent signal analysis using the polynomial-phase transform”, IEEE Trans. Aerosp. Electron. Syst. 32, 378–387 (1996).
[16] P. Wang, I. Djurovic, and J. Yang, “Instantaneous frequency rate estimation based on the robust cubic phase function”, in Acoustics, Speech and Signal Processing (ICASSP ’06) Proceedings, IEEE International Conference on, 89–92 (2006).
[17] P. O’Shea, “A fast algorithm for estimating the parameters of a quadratic FM signal”, IEEE Trans. Signal Process. 52, 385–393 (2004).
[18] S. Barbarossa and V. Petrone, “Analysis of polynomial-phase signals by the integrated generalized ambiguity function”, IEEE Trans. Signal Process. 45, 316–327 (1997).
[19] S. Barbarossa, A. Scaglione, and G. Giannakis, “Product high-order ambiguity function for multicomponent polynomial-phase signal modeling”, IEEE Trans. Signal Process. 46, 691–708 (1998).
[20] C. Ioana, “Time-frequency analysis using warped-based high-order phase modeling”, EURASIP J. Applied Signal Processing, 2856–2873 (2005).
[21] L. Cohen, Time-Frequency Analysis: Theory and Applications (Prentice Hall PTR, Englewood Cliffs, NJ) 299 (1995).
[22] F. Hlawatsch and G. Boudreaux-Bartels, “Linear and quadratic time-frequency signal representations”, IEEE Signal Process. Mag. 9, 21–67 (1992).
[23] J. A. Simmons, M. Ferragamo, C. F. Moss, S. B. Stevenson, and R. A. Altes, “Discrimination of jittered sonar echoes by the echolocating bat, Eptesicus fuscus: the shape of target images in echolocation”, J. Comp. Physiol. A 167, 589–616 (1990).
[24] S. Hiryu, Y. Shiori, T. Hosokawa, H. Riquimaroux, and Y. Watanabe, “On-board telemetry of emitted sounds from free-flying bats: compensation for velocity and distance stabilizes echo frequency and amplitude”, J. Comp. Physiol. A 194, 841–851 (2008).
[25] H. Riquimaroux and S. Hiryu, “Findings on bat sonar through Telemike system”, J. Acoust. Soc. Am. 131, 3422 (2012).
[26] S. Hiryu, N. Matsuta, S. Mantani, E. Fujioka, H. Riquimaroux, and Y. Watanabe, “On-board telemetry of biosonar sounds from free-flying bats”, J. Acoust. Soc. Am. 131, 3522 (2012).
[27] L. E. Estes, “Revisiting an eigenfunction perspective on the ordinary and fractional Fourier transforms”, NUWC-TM-12-010, NUWC Division Newport, RI (2012).
[28] J. Wood and D. Barry, “Radon transformation of time-frequency distributions for analysis of multicomponent signals”, IEEE Trans. Signal Process. 42, 3166–3177 (1994).
[29] A. W. Lohmann and B. H. Soffer, “Relationships between the Radon-Wigner and fractional Fourier transforms”, J. Opt. Soc. Am. A 11, 1798–1801 (1994).
[30] O. Akay and G. Boudreaux-Bartels, “Fractional convolution and correlation via operator methods and an application to detection of linear FM signals”, IEEE Trans. Signal Process. 49, 979–993 (2001).
[31] H. M. Ozaktas, O. Arikan, M. A. Kutay, and G. Bozdagt, “Digital computation of the fractional Fourier transform”, IEEE Trans. Signal Process. 44, 2141–2150 (1996).
[32] R. Jacob, T. Thomas, and A. Unnikrishnan, “Applications of fractional Fourier transform in sonar signal processing”, IETE J. Res. 55, 16 (2009).
[33] R. Kumaresan and Y. Wang, “On the duality between line-spectral frequencies and zero-crossings of signals”, IEEE Trans. Speech Audio Process. 9, 458–461 (2001).
[34] P. Flandrin, G. Rilling, and P. Goncalves, “Empirical mode decomposition as a filter bank”, IEEE Signal Process. Lett. 11, 112–114 (2004).
[35] S. Kay, “A fast and accurate single frequency estimator”, IEEE Trans. Acoust., Speech Signal Process. 37, 1987–1990 (1989).
[36] J. H. Mathews and K. D. Fink, Numerical Methods Using MATLAB, 3rd edition (Prentice Hall PTR, New York) 662 (1998).
[37] A. Savitzky and M. J. E. Golay, “Smoothing and differentiation of data by simplified least squares procedures”, Anal. Chem. 36, 1624–1639 (1964).
[38] B. Boashash, “Estimating and interpreting the instantaneous frequency of a signal. I. Fundamentals”, in Proc. IEEE, 520–538 (1992).
[39] B. Boashash, “Estimating and interpreting the instantaneous frequency of a signal. II. Algorithms and applications”, in Proc. IEEE, 540–568 (1992).
[40] P. Oliveira and V. Barroso, “On the concept of instantaneous frequency”, in Acoustics, Speech and Signal Processing, 1998, Proceedings of the 1998 IEEE International Conference on, 2241–2244 (1998).
[41] R. Kumaresan and A. Rao, “Model-based approach to envelope and positive instantaneous frequency estimation of signals with speech applications”, J. Acoust. Soc. Am. 105, 1912 (1999).
[42] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N. C. Yen, C. C. Tung, and H. H. Liu, “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis”, Proc. R. Soc. A 454, 903–995 (1998).
[43] N. E. Huang, Z. Wu, S. R. Long, K. Arnold, X. Chen, and K. Blank, “On instantaneous frequency”, Adv. Adapt. Data Anal. 1, 177–229 (2009).
[44] E. Bedrosian, “A product theorem for Hilbert transforms”, Proc. IEEE 51, 868–869 (1963).
[45] A. Nuttall and E. Bedrosian, “On the quadrature approximation to the Hilbert transform of modulated signals”, Proc. IEEE 54, 1458–1459 (1966).
[46] F. Auger, P. Flandrin, P. Goncalves, and O. Lemoine, “Time-frequency toolbox”, Technical Report (1996), http://tftb.nongnu.org; last accessed February 1, 2012.

Chapter 4

High Resolution Acoustic Measurement System and Beam Pattern Reconstruction Method for Bat Echolocation Emissions

Abstract

Measurements of the transmit beam patterns emitted by echolocating bats have previously been limited to cross-sectional planes or averaged over multiple signals using sparse microphone arrays. To date, no high-resolution measurements of individual bat transmit beams have been reported in the literature. Recent studies indicate that bats may change the time-frequency structure of their calls depending on the task, and suggest that their beam patterns are more dynamic than previously thought.
To investigate beam pattern dynamics in a variety of bat species, a high-density reconfigurable microphone array was designed and constructed using low-cost ultrasonic microphones and custom electronic circuitry. The planar array is 1.83 meters wide by 1.42 meters tall with microphones positioned on a 2.54 cm square grid. The system can capture up to 228 channels simultaneously at a 500 kHz sampling rate. Beam patterns are reconstructed in azimuth, elevation, and frequency for visualization and further analysis. Validation of the array measurement system and post-processing functions is shown by reconstructing the beam pattern of a transducer with a fixed circular aperture and comparing the result with a theoretical model. To demonstrate the system in use, transmit beam patterns of the big brown bat, Eptesicus fuscus, are shown.

The contents of this chapter were published in the Journal of the Acoustical Society of America, 2014 January; 135(1):513–520. [DOI: 10.1121/1.4829661]

4.1 Introduction

Approximately 1,200 species of bats exist worldwide and nearly 1,000 of these rely primarily on the active probing of echolocation to gather information about their surroundings [1]. Many bat species appear to have evolved different strategies for hunting, foraging, and navigation. The ultrasonic echolocation signals used by bats are generally classified as constant frequency, frequency-modulated (FM), or a combination of the two. Depending on the species, sound is emitted either through the bat's mouth or through a noseleaf, both of which have unique and highly complex structural properties. These reflective surfaces direct the sound in a manner that is highly frequency dependent [2]. The spatial directivity of the echolocation sound is known as the transmit beam pattern. Combined with the receive patterns of the ears, these beam patterns control the spatial information that is fundamental to echolocation.

The most commonly reported measurements of biosonar transmit beam patterns are made in controlled laboratory environments. Beam measurements from a single echolocation signal are traditionally limited to cross-sections with line arrays arranged in azimuth, elevation, or both. Another common approach combines the signals received over multiple echolocation calls. This averaging technique works well if the beam pattern is guaranteed to remain constant throughout the experiment.

One of the earliest reported observations of transmit beam directivity of echolocating bats was by Griffin [3]. Following these early studies of little brown bats (Myotis lucifugus), Simmons published the beam patterns of the mustached bat (Pteronotus parnellii) and the big brown bat (Eptesicus fuscus) while the bats were stationed on a platform with a four-channel microphone array [4]. Detailed measurements for E. fuscus were later published by Hartley and Suthers, who used a single microphone and combined the measurements over multiple echolocation calls [5]. They found that a reasonable approximation to the big brown bat's beam in azimuth was a circular piston transducer with an acoustic aperture comparable to the width of the mouth (4.7 mm radius). Ghose and Moss [6] reconstructed the beams used by E. fuscus in flight using the envelope of a narrow frequency band centered at 35 kHz, which corresponds to the strongest peak in the fundamental harmonic component. Interestingly, some [5, 7] have noted that E. fuscus emits a beam with two distinct vertical lobes. In a field study, Surlykke et al.
recorded signals emitted by Myotis daubentonii, estimated the beam pattern by tracking a single bat, and averaged over multiple approaches toward a four-channel microphone array [8].

Despite the many different approaches to measuring the bat's echolocation beam, until recently these studies have assumed a static transmit beam. Investigations by Yovel et al. noted that the Egyptian fruit bat (Rousettus aegyptiacus) points its echolocation beam slightly off axis to simultaneously optimize both target detection and localization during flight [9]. Matsuta et al. designed a 31-channel microphone array to measure the dynamic beams of Rhinolophus ferrumequinum [10]. This array included an "O-shaped" planar dimension in addition to the horizontal and vertical planes. Transmit beam measurements have also recently been reported from various vespertilionid species [11, 12, 13], two emballonurid bats [14], and Trachops cirrhosus [15].

In addition to empirical measurements of biosonar beam patterns, computational methods are now becoming practical. Müller developed a numerical technique based on finite element modeling that predicts the transmit and receive beams from computed tomography scans of the noseleaf and external ears (pinnae and tragus) for numerous bat species [2, 16, 17]. A large library of transmit and receive beam patterns has been assembled; however, this work currently excludes transmit beams of species producing echolocation sounds through the mouth. In several recent papers, Vanderelst et al. have used the finite element method to estimate bats' transmit and receive beams [18, 19, 20, 21].

Biosonar measurement systems are also being pioneered using underwater arrays of hydrophones. Investigations with echolocating marine mammals such as the bottlenose dolphin (Tursiops truncatus) and false killer whale (Pseudorca crassidens) have demonstrated with planar arrays that their beams are dynamic and may be shaped and/or steered depending on the echolocation task [22, 23]. Multi-element, high-resolution, underwater hydrophone arrays continue to provide information on the shape and dynamics of beam patterns of odontocetes [24, 25].

Although aerial and undersea echolocating mammals have evolved unique acoustic structures and waveforms, they do exhibit similar performance characteristics [26]. It may be useful to quantitatively compare the adaptive beamforming techniques between bats and cetaceans. These investigations into the dynamics of beam formation would provide insight into biosonar target localization and tracking strategies that may have significant implications for improving man-made sonar and radar systems. In this paper we introduce and describe a new apparatus and method for recording bat echolocation beams in the laboratory using low-cost, commercially available microphones, providing unprecedented resolution and accuracy for measuring bats' dynamic echolocation beams. Beam measurements from both a man-made projector and the big brown bat (E. fuscus) are shown for system demonstration.

4.2 Data Collection

A large reconfigurable microphone array was designed and constructed using low-cost ultrasonic microphones and custom analog and digital interface electronics. The microphone units are silicon integrated circuits based on micro-electro-mechanical systems (MEMS) technology (SPM0404UD5, Knowles Acoustics, Itasca, IL). The array backplane is 1.42 m tall by 1.83 m wide and consists of 16 printed circuit board panels attached to a machined aluminum frame.
The surface of the array is covered with 2.54 cm thick acoustic foam panels (Class A™ Melamine Foam, American Micro Industries, Chambersburg, PA) with cutouts for the 2.0 cm by 1.3 cm microphone preamplifier circuit boards. The foam panels reduce echo backscatter by approximately 15 dB across the frequency bands used by echolocating bats. Measurements show that the presence of the surrounding foam does not affect the omni-directional response of the MEMS microphones.

Sensors can be placed anywhere on the planar array on a 2.54 cm pitch grid. For a sound source at 1 meter, centered normal to the array plane, the maximum beam coverage in azimuth and elevation is 84° and 70°, respectively. The microphones were initially positioned uniformly on the planar array; the angular spacing between elements therefore varied with angle. Based on the geometry, a minimum element spacing of 3.39° horizontal and 5.18° vertical occurred at the edges of the array, and a maximum element spacing of 5.80° horizontal and 7.24° vertical at the center of the array.

Acoustic signals transduced by the microphones are band-pass filtered between 10 kHz and 120 kHz, amplified, and synchronously sampled on 228 channels at a programmable rate up to 500 kHz. Digital signals are collected with a custom high-speed data recorder based on field programmable gate array (FPGA) technology. The FPGA's parallel interface design enables data to be simultaneously sampled from each channel's analog-to-digital converter. Figure 4.1 shows the fully assembled array and microphone preamplifier circuit boards.

Data can be recorded either continuously or in short bursts triggered by an echolocation call received on a separate microphone. When using the trigger system, the amplitude envelope of a monitored microphone is compared with a threshold in real time, which enables the recording system for 10–20 ms. This duration is sufficient to capture the signal on each of the microphones in the array while providing the ability to record a much longer experiment without exceeding data processing and storage limits.

Figure 4.1. (a) Photograph of the fully constructed microphone array. Acoustic foam (not pictured) with microphone cutouts was placed on the face of the array to reduce echo backscatter during beam pattern measurements. With acoustic foam installed, the reflected energy was attenuated by approximately 15 dB across the entire frequency band of 10 kHz to 100 kHz. (b) Close-up view of a microphone preamplifier circuit board showing the integrated MEMS microphone unit. Preamplifier and filter circuitry are located on the back of the circuit board. Mechanical alignment, power distribution, and signal routing are provided by the backplane.

4.3 Methods

4.3.1 Beam Pattern Reconstruction

Recorded acoustic data are post-processed using functions written in MATLAB. Figure 4.2 shows a flow diagram of the entire beam reconstruction process. Each data channel is first mapped to its known planar array coordinates. Sonar signals are then identified by an energy detector. The next series of steps is performed iteratively over each identified sonar signal. To accurately reconstruct the beam pattern from each signal, array data must be mapped from the planar array coordinates to a spherical coordinate system centered at the source of the sound (see Figure 4.3). The echolocation calls are localized in azimuth, elevation, and range using time difference of arrival (TDOA) to triangulate the sound's point of origin [27].
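The localization step can be posed as a linear least-squares problem once the TDOAs are converted to range differences. The sketch below is a generic linearized solver in this spirit, not necessarily the closed-form algorithm of [27]; P, tau, and c are illustrative names.

```matlab
% Sketch of TDOA localization: linearize the range-difference equations
% about a reference microphone and solve by least squares. P is M-by-3
% microphone positions (m), tau is (M-1)-by-1 delays relative to mic 1 (s),
% and c is the sound speed (m/s). Not necessarily the method of [27].
function src = tdoa_localize(P, tau, c)
    d  = c * tau(:);                 % range differences re microphone 1
    p1 = P(1,:);  Pi = P(2:end,:);
    A  = [2*(Pi - p1), 2*d];         % unknowns: [x y z r1], r1 = range to mic 1
    b  = sum(Pi.^2, 2) - sum(p1.^2) - d.^2;
    u  = A \ b;                      % least-squares solution
    src = u(1:3).';                  % estimated point of origin
end
```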
Once the position of the source is known, signals from each channel are aligned in time. The echolocation calls emitted by most species of bats consist of multiple non-stationary harmonic components, and the multi-component signals produced by some bat species overlap in frequency; therefore, each harmonic component was analyzed individually. Components are first identified in the data by application of the fractional Fourier transform (FrFT) [28] and then extracted using a time-variant zero-phase filter as outlined in [29]. Separation of harmonics is necessary to examine the possibility that the beams change over time within the same broadcast. Another benefit of this approach is that it improves signal-to-noise ratio by removing non-interfering echoes and reverberation from the data.

Although the MEMS sensor-to-sensor magnitude and phase responses agree very well due to tightly controlled fabrication processes, the frequency response is not flat. To correct for this variability, each channel is digitally equalized with a zero-phase auto-regressive moving-average (ARMA) filter [30] that inverts the frequency response of the microphones, preamplifiers, and digital converter circuitry based on calibration data from each microphone. A zero-phase filter is necessary to avoid introducing frequency-dependent phase shifts and group delay effects. Additional details of the calibration procedure are discussed in the following section.

Figure 4.2. Flow chart describing the signal processing steps to reconstruct each beam. After identifying each echolocation call in a data set, calls are processed iteratively to reconstruct the beam patterns. Once complete, the reduced set of beam data can be visualized and analyzed.

The Euclidean distance between the sound source and each microphone varies significantly at close range. Furthermore, frequency-dependent absorption becomes dominant for the high frequencies considered here at only one to two meters in distance. Given the distance, d, between each microphone and the sound's point of origin, transmission loss effects due to both spherical spreading and frequency-dependent absorption are estimated and corrected by computational means. Spherical spreading losses contribute an overall attenuation in pressure proportional to 1/d, independent of frequency. The atmospheric absorption coefficient, α, varies significantly as a function of frequency and is dependent upon the environmental conditions of ambient temperature, relative humidity, and atmospheric pressure [31].

Figure 4.3. Diagram showing microphone sensor positions mapped to spherical coordinates with the sound source positioned at the origin. In the example shown, the planar array is located 1 m from a point source centered about the middle of the array coordinates. Beam pattern coverage and resolution depend upon the sound source position relative to the array.
Once absorption has been calculated in conventional units of dB/m, the combined transmission loss at a specific distance, d, can be estimated as a function of frequency:

$$TL(d, f) = 20\log_{10}\!\left(\frac{d}{d_0}\right) + \alpha(f)(d - d_0) \qquad (4.1)$$

where $d_0$ is the reference distance of the sound source (typically 0.1 m for bat sonar). The desired magnitude response to correct for transmission losses in pressure measurements at a particular distance, d, is therefore

$$H_d(f) = 10^{TL(d,f)/20}. \qquad (4.2)$$

Attenuation in physical systems implies the presence of dispersion and phase shift to guarantee causality [32]; however, the effects on phase can be ignored at the acoustic frequencies of interest here. Therefore, transmission losses exhibit a low-pass filter response that is well modeled by a zero-phase moving-average (MA) filter [14]. This transfer function model is unique for each source-sensor pair and is not pre-calculated as for microphone channel equalization.

The final steps in reconstructing the beam pattern are to estimate the frequency spectrum at each microphone position and interpolate along the angular coordinates. The magnitude and phase response of each harmonic component is extracted through spectral analysis via the fast Fourier transform (FFT). The values at each spatial angle are interpolated across a fine uniform grid with 1° resolution to simplify visualization and data analysis. Interpolation is achieved using the natural neighbor interpolation method on linear units of amplitude [33].

4.3.2 Microphone and System Calibration

A detailed calibration of each microphone channel and supporting electronics was performed to ensure meaningful acoustic beam measurements. A custom electrostatic transducer with a fixed circular aperture of 2.0 cm was used as a broadband sound source to validate the array measurement system [34]. For a reference measurement, the projector was positioned 10 cm from a calibrated 1/4" ultrasonic microphone (Series 4135, Brüel & Kjær, Nærum, Denmark) in the free field. The projector emitted ten identical linear FM chirps with 2 ms duration from 110 kHz down to 10 kHz. Each microphone channel on the array was then tested individually with the same set of ten pulses from 10 cm. The projector's distance and orientation were physically constrained to minimize measurement error, and acoustic foam was installed on the array prior to taking the measurements.

Time series data from the reference measurement, x[k], and each array microphone channel, $y_i[k]$, were processed to estimate the frequency spectra of the transduced signals. The FFT of each signal provided an estimate of the frequency spectrum. Spectra from multiple pulses on a given microphone were averaged together. The transfer function, $H_i(z)$, for each sensor and its supporting electronics was then calculated as

$$H_i(z) = \frac{Y_i(z)}{X(z)}, \quad i = 1, 2 \ldots N \qquad (4.3)$$

where $Y_i(z)$ is the frequency spectrum of each of N array microphone channels and X(z) is the frequency spectrum of the reference microphone. Frequencies not covered by the FM pulse contained only noise and were forced to unity.
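A minimal sketch of this per-channel estimate, assuming the K repeated calibration pulses have been time-aligned and stored as columns of x (reference) and y (channel i); all names are illustrative.

```matlab
% Sketch of Eq. (4.3): average the spectra of K time-aligned calibration
% chirps, divide the channel spectrum by the reference spectrum, and force
% out-of-band bins to unity. x and y are L-by-K matrices; fs in Hz.
Nfft = 2^nextpow2(size(x,1));
X = mean(fft(x, Nfft, 1), 2);            % averaged reference spectrum
Y = mean(fft(y, Nfft, 1), 2);            % averaged channel-i spectrum
H = Y ./ X;                              % Eq. (4.3)
f = (0:Nfft-1).' * fs/Nfft;
band = (f >= 10e3 & f <= 110e3) | (f >= fs-110e3 & f <= fs-10e3);
H(~band) = 1;                            % out-of-band bins carry only noise
```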
Given the magnitude and phase response of a linear time-invariant system, an ARMA model can be created for that system to mimic or reverse its response. The advantage of using such a model over a simpler all-zero model is that it can directly model the physical resonances in a system using its poles, while the zeros match nulls in the response and reduce any residual error. For an equalizer, the desired model is the inverse of the system's frequency response, $H_i^{EQ}(z) = H_i^{-1}(z)$. In this case, special care must be taken to ensure that the inverse filter remains stable or can be made stable. The procedure to generate a zero-phase ARMA model is based on an initial estimate using Prony's method [30] followed by iterative refinement with a frequency-domain Steiglitz-McBride algorithm [35]. Once the model coefficients are defined for each channel, data are equalized by passing through the zero-phase filter.
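In MATLAB, the zero-phase application of such an inverse model might be sketched as below; b_i and a_i are illustrative names for the fitted numerator and denominator coefficients of channel i. Note that filtfilt applies a filter forward and backward, so its magnitude correction is applied twice; to realize an exact magnitude target this way, the model would be fit to the square root of the desired response.

```matlab
% Sketch of zero-phase equalization with an inverse ARMA channel model.
% Swapping numerator and denominator applies 1/H_i(z); filtfilt makes the
% net phase response zero (squaring the magnitude response, see above).
% The inverted model is stable only if the zeros of b_i lie inside the
% unit circle; reflect any outliers before filtering.
assert(all(abs(roots(b_i)) < 1), 'inverse model is unstable');
y_eq = filtfilt(a_i, b_i, y);    % equalized, zero-phase channel data
```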
An important specification for the acoustic measurement system is the full-scale sound pressure level (SPL) at the face of the array, which is the equivalent RMS sound pressure level that would produce saturation of the analog-to-digital converters. Based on the calibration measurements and the derived voltage-to-acoustic conversion factor, the full-scale sound pressure level at the face of the array is 127 dB SPL (re 20 µPa). For reference, acoustic source levels of some representative bat echolocation signals are typically 110 dB SPL (re 20 µPa @ 0.1 m) for aerial feeding bats such as E. fuscus and 100 dB SPL for smaller "whispering bats" such as Artibeus jamaicensis [1, 26, 36]. The loudest bat species to have been reported, the lesser and greater bulldog bats (Noctilio albiventris and Noctilio leporinus, respectively), produce echolocation sounds up to 140 dB SPL (re 20 µPa @ 0.1 m) [37]. Another important specification, instantaneous dynamic range, is approximately 110 dB across the entire frequency range including signal processing gain. A programmable gain up to 30 dB can be applied to all channels to record less intense signals.

4.4 Results

4.4.1 Example Beam Pattern of a Circular Electrostatic Projector

The same electrostatic projector and FM waveform used for calibration were also used to validate the acoustic beam measurements and post-processing functions. The 2 cm diameter projector provides a symmetrical circular beam that is highly repeatable and may be quantitatively compared with the expected beam from a theoretical model of the transducer. For this validation measurement, the projector was moved normal to the center of the array at 1 meter distance. A sampling rate of 235 kHz was used during these measurements, which provided sufficient frequency coverage without aliasing effects.

Figure 4.4. Aspect view and contour plot of the reconstructed transmit beam pattern of a 2 cm diameter transducer at its resonant frequency of 60 kHz. The transducer was centered normal to the array at 1 m distance and emitted a 2 ms broadband linear FM pulse from 110 kHz down to 10 kHz. The dB units are normalized to the peak at the maximum response axis.

The half-power beam width, $\beta_{3dB}$, is defined as the angular width of the beam pattern at the 3 dB cutoff points. As with any fixed-aperture transducer, the beam width varies inversely with frequency. $\beta_{3dB}$ was measured for the projector along the horizontal axis to be 47.0°, 21.9°, and 12.3° at 30 kHz, 60 kHz, and 90 kHz, respectively. Sidelobe peak levels were approximately 15 dB below the main lobe at 60 kHz. Sidelobes could not be verified at 30 kHz due to coverage limitations and at 90 kHz due to the projector's limited source level above 60 kHz. Figure 4.4 shows the reconstructed beam pattern for the transducer at its resonant frequency of 60 kHz.

The theoretical model of a piston transducer with an infinite baffle is defined [38] as

$$D_\omega(\theta) = \left(\frac{2J_1\!\left(\pi\frac{d}{\lambda}\sin\theta\right)}{\pi\frac{d}{\lambda}\sin\theta}\right)^{\!2} \qquad (4.4)$$

where $D_\omega(\theta)$ is the one-dimensional beam pattern at acoustic frequency ω and angle θ, d is the diameter of the transducer, λ is the wavelength in the medium at frequency ω, and $J_1$ is the Bessel function of the first kind and order 1. This model is useful to verify several characteristics of the measured beam pattern. Specifically, $\beta_{3dB}$ of the main lobe can be quantitatively compared with measured results for a given frequency, and the sidelobe levels can be verified. Figure 4.5 shows the theoretical beam patterns at several different frequencies for a piston transducer with an infinite baffle.

Figure 4.5. Theoretical beam pattern of a piston transducer with 2 cm diameter in air. The beam pattern is frequency dependent such that the beam width scales inversely with frequency. Side-lobes are also predicted by the model. The first and second side-lobes are approximately 17 dB and 24 dB lower than the main lobe, respectively. The beam model is axially symmetric and the main response axis is normal to the transducer at all frequencies.

An approximation to $\beta_{3dB}$ can be made as follows:

$$\beta_{3dB} = 2\sin^{-1}\!\left(0.514\,\frac{\lambda}{d}\right). \qquad (4.5)$$

This model predicts $\beta_{3dB}$ = 34.3°, 16.9°, and 11.24° at 30, 60, and 90 kHz for a 2 cm diameter. Based on the projector's measured $\beta_{3dB}$, the data align well with an effective aperture of 1.6 cm, 20% less than its physical aperture of 2.0 cm. This discrepancy is likely due to a combination of the smaller diameter of the active components internal to the membrane (sintered disk) and the added stiffness at the edges of the transducer where the membrane is held securely in place. The beam pattern depends directly on the wavelength of the sound, which in turn depends on the speed of sound in the medium. Although sound speed does change with temperature, its sensitivity is minimal at room temperature (1% for a 10°C change) and would not explain the 20% difference in beam width.
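Eqs. (4.4) and (4.5) are simple to evaluate numerically; the sketch below reproduces the predicted 60 kHz curve and beam width, assuming air at room temperature (c = 343 m/s, an assumed value).

```matlab
% Sketch of the piston-in-an-infinite-baffle model, Eqs. (4.4)-(4.5).
c = 343;  d = 0.02;  f = 60e3;      % sound speed (m/s), aperture (m), Hz
lambda = c/f;
theta  = (-40:0.1:40) * pi/180;
u = pi*(d/lambda)*sin(theta);
D = (2*besselj(1, u) ./ u).^2;      % Eq. (4.4)
D(u == 0) = 1;                      % removable singularity at broadside
beta3dB = 2*asind(0.514*lambda/d);  % Eq. (4.5): about 16.9 deg at 60 kHz
plot(theta*180/pi, 10*log10(D)); ylim([-30 0]);
```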
4.4.2 Example Beam Pattern of the Big Brown Bat, Eptesicus fuscus

Echolocation calls from the big brown bat, E. fuscus, were recorded and processed to demonstrate the ability of the measurement system to record biosonar beam patterns. Three bats were trained to perform a target detection task while stationary on a platform 1 m from the array, and all emitted sonar signals from each trial were recorded. The reconstructed beam pattern for one example call is shown in Figure 4.6 for the frequencies of 40 kHz, 60 kHz, and 80 kHz. Results appear comparable to previous beam pattern measurements of E. fuscus [5, 6] and reasonably match the theoretical beam widths of $\beta_{3dB}$ = 56.1°, 36.5°, and 27.2° at 40, 60, and 80 kHz produced by a piston transducer with a 9.4 mm diameter.

Figure 4.6. Aspect view and 6 dB contour plot of the reconstructed beam patterns for a single E. fuscus transmit pulse. The frequency-dependent beam magnitudes at 40 kHz (a), 60 kHz (b), and 80 kHz (c) are shown on a normalized magnitude (dB) scale. Beam widths at these frequencies appear consistent with past measurements for this species and can be approximated by a circular piston transducer with a fixed 4.7 mm radius. Color scale is used to reinforce the vertical axis.

4.5 Discussion

With this array system we have constructed a tool of fine spatial resolution, high sampling rate, and rapid data collection that allows investigations into the bat's dynamic sonar beam. The relatively recent development of MEMS technology allows mass production of low-cost sensors. The microphones used here have an extremely small acoustic aperture that ensures omni-directionality; however, current MEMS microphones introduce significant variability in the frequency response that must be equalized through careful calibration. Validation of the array was performed with a custom-built piston transducer and compared against a theoretical model. An example beam pattern from a single E. fuscus call was shown to demonstrate the usefulness of the array in capturing biosonar beams. The array is primarily intended to be used with echolocating bats in a controlled laboratory environment, although measurements in the field would be possible after modifications to the mechanical assembly. As described in Sec. 4.4, beam measurements are being made of bats performing on a static platform. Future experiments are being planned with bats flying through an obstacle course.

Systems containing a large number of sensors and supporting electronics are inherently more complex. Every additional sensor channel and electronic component reduces the mean time between failures, so it is common for multi-element arrays to exhibit failed or degraded components over their supported lifetime. Two possible solutions to this problem are to 1) ensure a high degree of quality in component selection and manufacturing processes, and 2) design for maintainability. Our system aims to be a relatively low-cost solution to high-resolution acoustic measurement. Although high-quality ultrasonic microphones are commercially available, it is not yet feasible to use hundreds of these in a dense array due to their significant cost. To address reliability and repeatability concerns, the microphone boards were assembled by automation rather than by hand. The array was also designed with maintainability in mind: the pluggable circuit boards containing the microphone preamplifiers are easily replaced.

Another difficulty in developing measurement systems with high channel counts is that massive amounts of data must be simultaneously stored, processed, validated, and analyzed. This is where a parallel computing platform (i.e., the FPGA) outperforms even the fastest digital signal processor. Consistent growth in data storage and throughput has been a necessary enabler for the amounts of digital data captured by a measurement system with 100 or more channels. Validating and analyzing large amounts of information requires automation to perform data reduction and signal processing tasks. The trend toward more measurement sensors will continue to be facilitated by keeping pace with advances in sensing and computational technologies.

Beyond the hardware, advanced signal processing techniques are used to approach the beam reconstruction in a novel way. Newly developed algorithms to separate multiple harmonic components of biosonar signals were used. These techniques improve signal-to-noise ratio and allow more accurate tracking of energy across time and frequency.
Beam pattern measurements are conventionally performed with Fourier analysis; however, separating multiple components allows other decompositions (such as Hilbert spectral analysis) to be used that may be better suited for certain signals [29]. ARMA and MA modeling were introduced to better equalize the response of each microphone and reverse frequency-dependent absorption effects. These zero-phase filters eliminate the frequency-dependent group delay that is inherent in any causal linear, time-invariant model. Although non-causal, this approach works well for post-processed data and reduces phase errors to within machine precision.

Experimental design played a significant role in the measurement accuracy. Initial data collected with a static platform showed significant spatial interference patterns in the form of frequency-dependent vertical notches. It was found that direct-path signals received at the array were combined with a slightly delayed interfering reflection off the platform itself. Modifications were made to the platform to reduce its length, tilt it forward, and cover it with more attenuative fabric. The echolocation experiment was also modified to ensure that animals were echolocating from the front of the platform rather than the rear. Bats naturally perch upside-down, so it may be feasible to eliminate platform echoes by training some species to echolocate while inverted from a wire.

The goal of any measurement system is to sense without interfering with the phenomenon being sensed. The microphone array was primarily designed to look at beam pattern emissions from echolocating bats during psychophysical experiments. The potential behavioral disturbance of introducing a large panel directly in front of the area to be measured was of concern. To mitigate this potential problem, the array was covered with acoustic foam, minimizing backscatter from the array. This proved to be effective, and animals were trained successfully in a detection task on a stationary platform at 1 m facing the array. The experimental design required that the detection object be located between the array and the stationary platform. Preliminary data demonstrate that proximity to the array does not impact echolocation ability.

The array location during in-flight experiments also needs to be carefully considered. Many experiments in a controlled flight room are designed to test the animal's echolocation ability while surrounded by dense clutter. The unfortunate problem is that physical objects used to alter the echolocation behavior also interfere with the free-field reception of the transmit beam. Only careful experimental design and data analysis can ensure no physical objects are interfering with the beam measurement.

Although E. fuscus was previously found to emit two distinct ventral lobes, data collected with this array do not contain any evidence for multiple lobes. This does not constitute sufficient evidence against E. fuscus emitting dual lobes. Rather, this characteristic was simply not observed under the circumstances of one particular stationary echolocation task. Given prior evidence of adaptive beam patterns, it would not be surprising if this ventral lobe were selectively used where beneficial to echolocation and suppressed otherwise. Additional experiments will be carried out to explore this discrepancy further.

Future work with this measurement system is already underway.
By combining high-resolution acoustic measurements of bats' transmit beams with high-resolution, high-speed video during psychophysical experiments, we are investigating the dynamics of bat echolocation and the relationship of beam adjustments to target detection, localization, and tracking.

4.6 Acknowledgments

The authors thank Leland Jackson (U. Rhode Island) for many interesting discussions on ARMA modeling and John Buck (U. Mass. Dartmouth) for helpful suggestions on beam pattern reconstruction. This work was supported by internal investment funding from the Naval Undersea Warfare Center, Division Newport, RI, ONR Grant No. N00014-09-1-0691, and NSF Grant No. DBI-1202833.

References

[1] G. Neuweiler, The Biology of Bats (Oxford University Press, New York, 2000), p. 320.
[2] R. Müller, “Numerical analysis of biosonar beamforming mechanisms and strategies in bats”, J. Acoust. Soc. Am. 128, 1414–1425 (2010).
[3] D. Griffin, Listening in the Dark, The Acoustic Orientation of Bats and Men (Cornell University Press, London, 1958), p. 415.
[4] J. A. Simmons, “Acoustic radiation patterns for the echolocating bats Chilonycteris rubiginosa and Eptesicus fuscus”, J. Acoust. Soc. Am. 46, 1054–1056 (1969).
[5] D. Hartley and R. Suthers, “The sound emission pattern of the echolocating bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 85, 1348–1351 (1989).
[6] K. Ghose and C. Moss, “The sonar beam pattern of a flying bat as it tracks tethered insects”, J. Acoust. Soc. Am. 114, 1120–1131 (2003).
[7] K. Ghose, C. Moss, and T. Horiuchi, “Flying big brown bats emit a beam with two lobes in the vertical plane”, J. Acoust. Soc. Am. 122, 3717–3724 (2007).
[8] A. Surlykke, S. Boel Pedersen, and L. Jakobsen, “Echolocating bats emit a highly directional sonar sound beam in the field”, Proc. R. Soc. B 276, 853–860 (2009).
[9] Y. Yovel, B. Falk, C. F. Moss, and N. Ulanovsky, “Optimal localization by pointing off axis”, Science 327, 701–704 (2010).
[10] N. Matsuta, S. Hiryu, E. Fujioka, Y. Yamada, H. Riquimaroux, and Y. Watanabe, “Adaptive beam-width control of echolocation sounds by CF-FM bats, Rhinolophus ferrumequinum nippon, during prey-capture flight”, J. Exp. Biol. 216, 1210–1218 (2013).
[11] L. Jakobsen, J. M. Ratcliffe, and A. Surlykke, “Convergent acoustic field of view in echolocating bats”, Nature 493, 93–96 (2014).
[12] L. Jakobsen, S. Brinkløv, and A. Surlykke, “Intensity and directionality of bat echolocation signals”, Front. Physiol. 4, 1–9 (2013).
[13] L. Jakobsen and A. Surlykke, “Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit”, Proc. Natl. Acad. Sci. U.S.A. 107, 13930–13935 (2010).
[14] L. Jakobsen, E. K. V. Kalko, and A. Surlykke, “Echolocation beam shape in emballonurid bats, Saccopteryx bilineata and Cormura brevirostris”, Behav. Ecol. Sociobiol. 66, 1493–1502 (2012).
[15] A. Surlykke, L. Jakobsen, E. K. V. Kalko, and R. A. Page, “Echolocation intensity and directionality of perching and flying fringe-lipped bats, Trachops cirrhosus (Phyllostomidae)”, Front. Physiol. 4, 1–9 (2013).
[16] R. Müller, “A numerical study of the role of the tragus in the big brown bat”, J. Acoust. Soc. Am. 116, 3701–3712 (2004).
[17] R. Müller and J. C. T. Hallam, “Knowledge mining for biomimetic smart antenna shapes”, Rob. Autom. Syst. 50, 131–145 (2005).
[18] D. Vanderelst, F. De Mey, H. Peremans, I. Geipel, E. Kalko, and U.
Firzlaff, “What noseleaves do for FM bats depends on their degree of sensorial specialization”, PLoS ONE 5, e11893 (2010).
[19] D. Vanderelst, J. Reijniers, J. Steckel, and H. Peremans, “Information generated by the moving pinnae of Rhinolophus rouxi: tuning of the morphology at different harmonics”, PLoS ONE 6, e20627 (2011).
[20] D. Vanderelst, R. Jonas, and P. Herbert, “The furrows of Rhinolophidae revisited”, J. R. Soc. Interface 9, 1100–1103 (2012).
[21] D. Vanderelst, Y. Lee, I. Geipel, E. Kalko, Y. M. Kuo, and H. Peremans, “The noseleaf of Rhinolophus formosae focuses the frequency modulated (FM) component of the calls”, Front. Physiol. 4, 1–8 (2013).
[22] P. W. Moore, L. A. Dankiewicz, and D. S. Houser, “Beamwidth control and angular target detection in an echolocating bottlenose dolphin (Tursiops truncatus)”, J. Acoust. Soc. Am. 124, 3324–3332 (2008).
[23] L. N. Kloepper, P. E. Nachtigall, M. J. Donahue, and M. Breese, “Active echolocation beam focusing in the false killer whale, Pseudorca crassidens”, J. Exp. Biol. 215, 1306–1312 (2012).
[24] J. Starkhammar, M. Amundin, J. Nilsson, T. Jansson, S. A. Kuczaj, M. Almqvist, and H. W. Persson, “47-channel burst-mode recording hydrophone system enabling measurements of the dynamic echolocation behavior of free-swimming dolphins”, J. Acoust. Soc. Am. 126, 959–962 (2009).
[25] J. W. Shaffer, D. Moretti, S. Jarvis, P. Tyack, and M. Johnson, “Effective beam pattern of the Blainville’s beaked whale (Mesoplodon densirostris) and implications for passive acoustic monitoring”, J. Acoust. Soc. Am. 133, 1770–1784 (2013).
[26] W. Au and J. Simmons, “Echolocation in dolphins and bats”, Phys. Today 60, 40–45 (2007).
[27] M. Gillette and H. Silverman, “A linear closed-form algorithm for source localization from time-differences of arrival”, IEEE Signal Process. Lett. 15, 1–4 (2008).
[28] O. Akay and G. Boudreaux-Bartels, “Fractional convolution and correlation via operator methods and an application to detection of linear FM signals”, IEEE Trans. Signal Process. 49, 979–993 (2001).
[29] J. DiCecco, J. E. Gaudette, and J. A. Simmons, “Multi-component separation and analysis of bat echolocation calls”, J. Acoust. Soc. Am. 133, 538–546 (2013).
[30] L. B. Jackson, Digital Filters and Signal Processing with MATLAB Exercises, 3rd ed. (Kluwer Academic, Norwell, MA, 1995), pp. 323–372.
[31] “ANSI S1.26-1995 (R2009) Method for Calculation of the Absorption of Sound by the Atmosphere”, American National Standards Institute, New York (2009).
[32] W. I. Futterman, “Dispersive body waves”, J. Geophys. Res. 67, 5279–5291 (1962).
[33] T. Bobach and G. Umlauf, “Natural neighbor concepts in scattered data interpolation and discrete function approximation”, in Proceedings of Visualization of Large Unstructured Data Sets, 23–35 (2007).
[34] J. A. Simmons, M. B. Fenton, W. R. Ferguson, M. Jutting, and J. Palin, Apparatus for Research on Animal Ultrasonic Signals (Royal Ontario Museum, Toronto, 1979), p. 10.
[35] L. B. Jackson, “Frequency-domain Steiglitz-McBride method for least-squares IIR filter design, ARMA modeling, and periodogram smoothing”, IEEE Signal Process. Lett. 15, 49–52 (2008).
[36] S. Brinkløv, E. K. V. Kalko, and A. Surlykke, “Intense echolocation calls from two ‘whispering’ bats, Artibeus jamaicensis and Macrophyllum macrophyllum (Phyllostomidae)”, J. Exp. Biol. 212, 11–20 (2009).
[37] A. Surlykke and E. K. V. Kalko, “Echolocating bats cry out loud to detect their prey”, PLoS ONE 3, e2036 (2008).
[38] R.
Chapter 5

Modeling Bio-Inspired Broadband Sonar for High-Resolution Angular Imaging

Abstract

Echolocating mammals perceive images of targets with hyper-resolution and navigate seamlessly through obstacles in complex acoustic environments. The biological solution to imaging with sound is vastly different from man-made sonar. The most prominent difference is that instead of imaging with narrow beams and large apertures, bats ensonify a large spatial region and exploit broadband echo information to acoustically focus with approximately one degree of angular resolution. Using the additional information in the spectrum, angular localization may therefore be redefined as a spectral pattern matching problem. Because imaging is performed with wide beams, this remarkable performance requires only a single broadband transmitter and two receive elements. Our computational modeling work provides new insight into the salient spatial information encoded by the bat's auditory system. Although the highly complex baffle structures found in biological sonar can increase the available information, we show they are not theoretically required for good spatial resolution. Replicating bio-inspired acoustic processing techniques in man-made systems can reduce sonar array aperture requirements by two orders of magnitude for a variety of aerial and underwater acoustic sensing applications. Modeling and simulation results show the feasibility of designing a bio-inspired broadband sonar system as a compact, high-resolution acoustic imaging solution. Also presented is a method for quantifying the theoretical limit to the resolving power for a given set of operating conditions and directivity patterns.

5.1 Introduction

This chapter begins by describing the environmental acoustics and transducer beam patterns relevant to biosonar. We then characterize the echo spectrum from reflective scatterers in the range-azimuth plane based on these models. This numerical result effectively quantifies the unique information contained in an echo arriving from any point in the range-azimuth plane. In an appendix, the physics-based model is adapted to the underwater environment. Elevation is omitted for simplicity, but including the additional dimension is straightforward. Our approach serves as a computational template for the design of a sonar system using micro-aperture broadband acoustic technology, or µBAT.

5.2 Modeling Broadband Acoustic Information

There are three fundamental ways in which broadband acoustic signals are transformed in the context of biosonar: 1) the physical environment, 2) transducer directivity patterns, and 3) reflective scatterer structure and composition. Each of these is explored independently and integrated at the end of this section.

5.2.1 Environmental Acoustics

5.2.1.1 The Transformation of Broadband Information in the Physical Environment

The first aspect to be modeled is the physical environment, specifically its impact on broadband signal propagation. A physics-based model was constructed that accounts for the two most prominent characteristics of acoustic transmission loss in the medium: geometrical spreading and acoustic absorption. Geometrical spreading losses are caused by acoustic waves propagating over an increasing volume. Since the amount of acoustic energy is finite, it spreads evenly over the surface area of the expanding wavefront.
In the simplest case of free-field propagation, sound spreads spherically in all directions and its energy density decays as a quadratic function of distance, d, proportional to 1/d² (a 6 dB loss per doubling of distance). There are cases in which sound energy cannot spread evenly in all directions (e.g., surface boundary layers, physical obstructions), but spherical spreading is the upper bound on this type of transmission loss. Because geometrical spreading is frequency independent, we consider here only free-field propagation and direct most of our attention to absorption losses.

Absorption of sound by the atmosphere is highly frequency dependent and generally imposes a low-pass filter effect on broadband acoustic waves. Unlike geometrical spreading, absorption is an exponential function of distance, proportional to $10^{-d\alpha/10}$, where distance, d, is specified in meters and the absorption coefficient, α, is defined in units of dB/m. It is caused by a combination of factors that become dominant in different frequency regions. The existing models of absorption are complicated functions of several environmental parameters including ambient temperature, T, atmospheric pressure, ρ, and relative humidity, h_r [1, 2, 3, 4]. The equations that describe these models have been rearranged in this section to make their dependence on frequency explicit. Doing so allows us to determine which aspects of absorption are most important to echolocation, and also to quantify the sensitivity of absorption to changing environmental parameters.

The primary components affecting sound absorption in air are attributed to "classical" physics (e.g., viscosity, heat conduction, and diffusion) and to molecular interactions with oxygen and nitrogen. Given T, ρ, and h_r, the attenuation coefficient in dB/m can be written as a function of frequency:

$$\alpha(f) = \alpha_{cr}(f) + \alpha_{vib,O}(f) + \alpha_{vib,N}(f). \quad (5.1)$$

Here, α_cr is the absorption component due to classical physics and molecular rotational relaxation; α_vib,O and α_vib,N are the components due to molecular vibrational relaxation of oxygen and nitrogen, respectively. Figure 5.1 shows how these components contribute to the total absorption coefficient.

Broadband echolocation signals occur on a rapid time scale and over relatively short distances. Therefore, we can treat the environmental parameters as constants and rewrite Equation 5.1 strictly as a function of frequency:

$$\alpha(f) = \hat{\alpha}_{cr}\, f^2 + \hat{\alpha}_{vib,O}\, \frac{f^2 F_{rO}}{f^2 + F_{rO}^2} + \hat{\alpha}_{vib,N}\, \frac{f^2 F_{rN}}{f^2 + F_{rN}^2}. \quad (5.2)$$

Parameters F_rO and F_rN are the scaled relaxation frequencies for oxygen and nitrogen and also depend on T, ρ, and h_r. The individual α̂ components are computed as

$$\hat{\alpha}_{cr} = 1.598\times10^{-10} \left(\frac{T}{T_0}\right)^{1/2} \left(\frac{\rho}{\rho_0}\right)^{-1}, \quad (5.3)$$

$$\hat{\alpha}_{vib,O} = 1.107\times10^{-1} \left(\frac{T}{T_0}\right)^{-5/2} e^{-2239.1/T}, \quad (5.4)$$

$$\hat{\alpha}_{vib,N} = 9.277\times10^{-1} \left(\frac{T}{T_0}\right)^{-5/2} e^{-3352.0/T}, \quad (5.5)$$

where the constant T₀ is the standard temperature of 293.15 K and ρ₀ is the reference pressure of 1 atm. The equations for F_rO and F_rN are

$$F_{rO} = \frac{\rho}{\rho_0} \left(24 + 4.04\times10^{4}\, \frac{0.02h + h^2}{0.391 + h}\right), \quad (5.6)$$

$$F_{rN} = \frac{\rho}{\rho_0} \left(\frac{T}{T_0}\right)^{-1/2} \left(9 + 280h\, e^{4.170\left(1 - (T/T_0)^{-1/3}\right)}\right), \quad (5.7)$$

where T₀ and ρ₀ are defined as above and h is the molar concentration of water vapor computed from the estimates of relative humidity and temperature. With the frequency dependence factored out in Equation 5.2, these components reduce to constants.
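To make the model concrete, the following minimal sketch (in Python with NumPy; all function names are illustrative, not from the original work) evaluates Equations 5.2 through 5.7. The estimate of h from relative humidity uses the saturation vapor pressure approximation found in the absorption standards cited above; the text itself leaves that step implicit, so it should be treated as an assumption.

    import numpy as np

    T0 = 293.15   # reference temperature (K), Eqs. (5.3)-(5.7)

    def absorption_coeff(f, T=293.15, rho=1.0, hr=50.0, h=None):
        """Atmospheric absorption coefficient alpha(f) in dB/m, Eq. (5.2).

        f   : frequency in Hz (scalar or array)
        T   : ambient temperature (K)
        rho : atmospheric pressure (atm)
        hr  : relative humidity (%)
        h   : molar concentration of water vapor (%); if None it is
              estimated from hr via the ISO 9613-1 saturation-pressure
              approximation (an assumption; the text omits this step)
        """
        if h is None:
            psat = 10.0 ** (-6.8346 * (273.16 / T) ** 1.261 + 4.6151)
            h = hr * psat / rho

        # Scaled relaxation frequencies for O2 and N2, Eqs. (5.6)-(5.7)
        FrO = rho * (24.0 + 4.04e4 * (0.02 * h + h ** 2) / (0.391 + h))
        FrN = rho * (T / T0) ** -0.5 * (
            9.0 + 280.0 * h * np.exp(4.170 * (1.0 - (T / T0) ** (-1.0 / 3.0))))

        # Frequency-independent components, Eqs. (5.3)-(5.5)
        a_cr = 1.598e-10 * (T / T0) ** 0.5 / rho
        a_O = 1.107e-1 * (T / T0) ** -2.5 * np.exp(-2239.1 / T)
        a_N = 9.277e-1 * (T / T0) ** -2.5 * np.exp(-3352.0 / T)

        # Total absorption, Eq. (5.2)
        f2 = np.asarray(f, dtype=float) ** 2
        return (a_cr * f2
                + a_O * f2 * FrO / (f2 + FrO ** 2)
                + a_N * f2 * FrN / (f2 + FrN ** 2))

At the nominal conditions used below (20°C, 50% relative humidity, 1 atm), this sketch reproduces the constants and relaxation frequencies quoted in the next paragraph.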
At nominal environmental conditions (T = 20°C, h_r = 50%, and ρ = 1 atm), we calculate α̂_cr = 1.598×10⁻¹⁰, α̂_vib,O = 5.334×10⁻⁵, α̂_vib,N = 1.004×10⁻⁵, F_rN = 332.1 Hz, and F_rO = 35.45 kHz. For frequencies below about 1 kHz, α_vib,N dominates; between approximately 1 kHz and 100 kHz, α_vib,O is dominant; and above 100 kHz, α_cr is dominant. Figure 5.2 plots the absorption coefficient vs. frequency calculated at temperatures between 0°C and 40°C at 1 atm.

Figure 5.1. The total absorption effect in air is the combined result of three individual components that dominate in different frequency regions. Investigating these components through superposition enhances our understanding and allows us to organize this complex process into simpler models. F_rN and F_rO are the parameters that determine the cutoff frequencies where α̂_vib,N and α̂_vib,O saturate. From this separation of parameters, we see that α_cr and α_vib,O are the dominant characteristics in ultrasonic echolocation signals.

Figure 5.2. Absorption vs. frequency at 50% relative humidity plotted for temperatures between 0°C and 40°C in steps of 5°. The absorption coefficient curves are normalized to standard atmospheric pressure (1 atm). Echolocating bats operate their sonar at ultrasonic frequencies within the transition region of α_vib,O and α_cr.

5.2.1.2 Application of Broadband Transmission Loss to the Active Sonar Equation

We are interested in understanding the total spectral effect on a broadband signal as it propagates over distance. Therefore, we define the transmission loss, TL, as a function of frequency and distance based upon spherical spreading and absorption (see Fig. 5.3):

$$TL(f, d) = TL_{spr}(d) + TL_{abs}(f, d) \quad (5.8)$$

with

$$TL_{spr}(d) = 20 \log_{10} \frac{d}{d_0}, \quad (5.9)$$

$$TL_{abs}(f, d) = \alpha(f)\,(d - d_0). \quad (5.10)$$

Here d₀ is taken to be the reference distance for sound pressure level (SPL re 1 µPa @ d₀) in the sonar equations [5]. For the relatively short distances considered in bat sonar, d₀ = 0.1 m is generally considered a reasonable value. When modeling distances d ≫ d₀, Equation 5.10 typically reduces to

$$TL_{abs}(f, d) \approx \alpha(f) \times d. \quad (5.11)$$

Figure 5.3. (a) Spherical spreading loss is a quadratic function of distance, TL_spr(d). (b) Frequency-dependent absorption losses are an exponential function of distance, TL_abs(f, d). Nominal values of 20°C, 50% rh, and 1 atm were chosen for the environmental parameters in calculating the absorption coefficient. (c) The combined transmission loss due to both spreading and absorption, TL(f, d).
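Transmission loss then follows directly from Equations 5.8 through 5.10; a short continuation of the sketch above (again illustrative, reusing absorption_coeff):

    def transmission_loss(f, d, d0=0.1, T=293.15, rho=1.0, hr=50.0):
        """One-way transmission loss TL(f, d) in dB, Eq. (5.8).

        Spherical spreading (Eq. 5.9) referenced to d0 = 0.1 m plus
        frequency-dependent atmospheric absorption (Eq. 5.10).
        """
        TL_spr = 20.0 * np.log10(d / d0)
        TL_abs = absorption_coeff(f, T=T, rho=rho, hr=hr) * (d - d0)
        return TL_spr + TL_abs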
Applying the sonar equation for an active broadband system, the echo strength (ES) at the face of the receive sensors can be estimated as a function of frequency and distance as

$$ES(f, d) = SL(f) - 2\,TL(f, d) + TS(f) \quad (5.12)$$

where knowledge of the source level (SL) and target strength (TS) for each scattering reflector is either given a priori or estimated through iteration. Figure 5.4 shows the relative echo strength for an ideal point reflector having a flat 0 dB TS across the entire spectrum. The consequence is that there exists a unique transfer function between the source and each point scatterer located at some distance, d, based on the two-way transmission loss in the physical environment. This range-dependent transfer function effectively predicts the physical environment's impact on any broadband signal propagating through the medium.

Figure 5.4. Relative echo strength (ES) vs. distance at different frequencies for an ideal 0 dB point reflector. Spreading losses dominate at close range; because absorption scales exponentially with distance while spreading scales quadratically, absorption becomes significant beyond approximately 0.5 to 1.0 m of distance traveled across the broad range of frequencies applicable to biosonar.
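The active sonar equation of Equation 5.12 composes these pieces; a one-function continuation of the sketch (the SL and TS defaults are placeholders, not measured levels):

    def echo_strength(f, d, SL=100.0, TS=0.0, **env):
        """Relative echo strength ES(f, d) in dB at the receiver, Eq. (5.12).

        SL and TS may be scalars or frequency-dependent arrays matched
        to f; the two-way transmission loss imposes the range-dependent
        transfer function described in the text.
        """
        return SL - 2.0 * transmission_loss(f, d, **env) + TS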
5.2.2 Transducer Directivity Patterns

5.2.2.1 Broadband Spectral Information in Conventional Transducers

Another important aspect of modeling broadband acoustic information is the directivity pattern. Directivity (or beam) patterns of a sonar system are explicitly controlled by the transducer construction for signal transmission and/or reception. Whereas transmission losses impose a range-dependent frequency spectrum, a transducer's directivity pattern imposes an angle-dependent frequency spectrum. These patterns may be designed to be as simple or as complex as is necessary to achieve the performance goals. Most man-made sonar systems utilize arrays of many basic piezoelectric elements to construct narrow beams. Directivity is achieved by the relative positioning of these elements and by applying different amplitude scaling factors and/or phase delays between them [6]. These elements are typically capable of both transmitting and receiving acoustic waves, but are sometimes used exclusively for one mode in concert with another transducer array. An important distinguishing characteristic of many such systems is that they are designed to operate over a relatively small bandwidth-to-center-frequency ratio. Consequently, standard beam patterns and array geometries are either designed for one particular frequency or constrained to have a constant beam width over many frequencies.

Directivity patterns are defined by the magnitude response vs. angle. The phase response of individual elements is almost always ignored except when checking for mechanical consistency between elements. This is because absolute phase is irrelevant for conventional beamforming: regardless of the exact scatterer distance, non-stationarity and non-linearities in the medium cause the acoustic signal's phase to converge on a uniform random variable (i.e., a non-coherent receiver). Although the phase variation with angle may be negligible under usual circumstances, it inherently exists in any real system with mechanical damping. Ignoring phase is unfortunate when considering the broadband directivity patterns that are ubiquitous in biosonar. The fundamental idea is not that echolocating animals have coherent receivers; it is that the broadband waveforms traveling through the physical medium retain their relative phase over the wide range of frequencies. When the emitted biosonar signals contain multiple harmonics, this is known as harmonic coherence [7].

The directivity of a transducer element is intimately related to its geometrical structure and operating frequencies. For a transducer with a fixed physical aperture, the beam pattern will scale with frequency due to the close interaction between aperture and wavelength. Piezoelectric elements are usually constructed out of simple shapes such as cylinders or rectangular blocks. When the entire crystal surface vibrates in unison along the acoustic axis, the instantaneous pressure and particle velocity waves propagate outward from each point on the surface at a constant speed. The group interference of this continuum of waves causes the angle-dependent amplitude that is manifested as the far-field directivity pattern. Ignoring any backward propagation (assuming an infinite baffle), the theoretical directivity pattern for a piston transducer is [8, Ch. 11]

$$D(k, \theta) = \left( \frac{2 J_1\!\left(k \frac{d}{2} \sin\theta\right)}{k \frac{d}{2} \sin\theta} \right)^{2}, \quad (5.13)$$

where J₁ is the Bessel function of the first kind and order 1, k = 2π/λ is the acoustic wavenumber, d is the piston's diameter, and θ is the off-axis angle. The defining parameter of the piston's directivity pattern is the aperture-to-wavelength ratio, d/λ. As expected, this beam response is radially symmetric. Figure 5.5 shows the transducer's amplitude vs. angle in air for a 1 cm diameter piston across the frequency range of 10 to 100 kHz. Long wavelengths relative to the aperture produce a broad beam, while short wavelengths produce narrower beams and many sidelobes. Plotting the linear amplitude rather than the magnitude emphasizes that each alternating sidelobe exhibits a phase reversal of 180°.

Figure 5.5. Theoretical directivity pattern for a piston transducer in air with a fixed circular aperture of 0.94 cm. Low frequencies remain omni-directional since λ ≫ d, whereas high frequencies contain sidelobes that alternate between positive and negative amplitudes. If the wavenumber, k, is made complex it can account for damping in the mechanical system, and phase becomes continuous between reversals [9, p. 145].
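A sketch of the piston response (continuing the earlier code, using SciPy's first-order Bessel function) returns the signed pressure amplitude, whose square is Equation 5.13, so that the sidelobe phase reversals of Figure 5.5 remain visible:

    from scipy.special import j1

    def piston_amplitude(f, theta, diam=0.0094, c=344.0):
        """Signed far-field pressure amplitude of a circular piston.

        The power directivity of Eq. (5.13) is the square of this
        quantity; keeping the sign preserves the 180-degree phase
        reversals between adjacent sidelobes seen in Figure 5.5.
        f in Hz, theta in radians; either may be an array.
        """
        f = np.asarray(f, dtype=float)
        theta = np.asarray(theta, dtype=float)
        k = 2.0 * np.pi * f / c                       # acoustic wavenumber
        x = np.atleast_1d(k * (diam / 2.0) * np.sin(theta))
        out = np.ones_like(x)                         # on-axis limit is 1
        nz = np.abs(x) > 1e-12
        out[nz] = 2.0 * j1(x[nz]) / x[nz]
        return out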
5.2.2.2 Bio-Acoustic Baffle Structures and Implications for Modeling

In biosonar, transmitted patterns are defined by the geometry of the mouth or noseleaf in different bat species [10] and by the melon in odontocetes. Received directivity patterns arise through a complex yet minimal set of acoustic baffles (two ears in bats and the mandibular structure in odontocetes [11, 12, 13, 14]) that change the magnitude and phase response dramatically over frequency and angle. No closed-form expressions are available for the directivity pattern of such complex baffles; however, finite element methods have successfully predicted the directivity of many biosonar structures [15, 16, 11]. The transmit directivity pattern in azimuth of E. fuscus has been approximated by a piston transducer of 9.4 mm diameter, with reasonable matching of 3 dB beam widths over most of the frequency range. To date, no directivity patterns have been published that rely on the finite element method applied to the oral cavity.

Obliquely truncated horn models appear to be good approximations of receiving ears in biosonar [17]. Acoustic horns amplify sound produced or received and have a frequency-dependent magnitude response along the radial axis [18]. The angular directivity of a horn is similar to that of a piston transducer, where higher frequencies have a narrower main lobe and sidelobes that scale inward (Figure 5.6). A defining characteristic of the truncation angle is that at low frequencies the MRA is shifted off-axis, normal to the angle of truncation, and it moves toward the radial cone axis at higher frequencies. This characteristic is present in the azimuthal receive measurements of E. fuscus (refer back to Fig. 1.2). In addition to the magnitude response, a phase response is also present. The presence of spatial structure in the phase of an acoustic baffle makes intuitive sense, because the sound path through the baffle to the pressure-sensing element depends on the angle of incidence and the wavelength. The difference in sound path length vs. angle may be relatively large compared to the wavelengths (e.g., E. fuscus ear length is about 8 mm; λ = 3.4 mm at 100 kHz). By reciprocity, the same arguments apply for an acoustic baffle used in sound transmission.

What is interesting here is not the fact that echolocating animals have complicated baffle structures, nor the distinct directivity patterns they provide, but instead what they can achieve by having a spectral pattern that is unique across angle. Although biosonar beam patterns are typically complicated functions of frequency and angle, we later show that the concept of spatial imaging through spectral pattern matching can actually be performed using standard piezoelectric transducers. To demonstrate this, in Section 5.3 a simple biosonar array of three circular-aperture piston transducers is shown to achieve fine angular resolution without the convoluted structures found in biological sonar systems.

Figure 5.6. Example beam pattern data measured from an obliquely truncated horn. (a) The geometry of a simple truncated horn can be described by several parameters: ℓ_t and ℓ_m are the diameters of the throat and mouth before truncation, L is the length of the horn, α is the conical angle, and β is the angle of truncation. (b) Data were collected with a truncated horn constructed out of a flexible rubber sheet (ℓ_t = 0.4 cm, ℓ_m = 3 cm, α = 20°, β = 45°). A projector at the throat of the horn emitted linear FM chirps from 100 to 10 kHz, and a mechanically aligned microphone received the signal at 3° increments in azimuth and elevation. The mouth of the horn acts as a circular aperture of diameter ℓ_m that can be closely approximated by a piston transducer response, with a frequency-dependent magnitude defined by the acoustic gain of the horn's mouth-to-throat ratio.
The main response axis (MRA) of a standard conical horn would remain along the cone's radial axis (0°) for all frequencies; however, for an obliquely truncated horn the MRA shifts off-axis, normal to the truncation, at low frequencies [18]. The data support this finding, but also show that the harmonic frequency shifts its MRA to match the fundamental. (c) The magnitude response of the constructed horn at 47.6 kHz shows a clear main lobe centered around 16° in elevation. (d) Interestingly, the phase response shows a significant amount of spatial structure, which was unexpected. In the main lobe, the phase varies only slightly, whereas off-axis the phase varies significantly across frequency. Measured acoustic horn data were provided courtesy of Mittu Pannala and Rolf Müller of Virginia Tech.

5.2.3 Reflective Scatterer Structure and Composition

An ideal point scatterer reflects all incident energy with a flat magnitude spectrum and zero phase and group delay. Real acoustic objects are usually modeled as having multiple ideal point reflectors and a constant target strength (the ratio of attenuation or gain of the reflected to incident energy) that may also depend upon the aspect angle [19]. This model is reasonable when 1) the object consists of one or more dominant points of specular reflection or surface protuberances having some spatial extent, and 2) the frequencies of interest have a relatively small bandwidth-to-center-frequency ratio (e.g., most man-made active sonar systems). In the case of a broadband biosonar system, the first assumption may be reasonable; a good example would be the wingtips of an insect. However, the second assumption requires more careful consideration, and the frequency dependence of target strength must be further examined.

According to theory, any convex surface that is rigid and smooth reflects energy independent of frequency if the following conditions are met [5, p. 291]:

• ka₁, ka₂ ≫ 1, and
• the object is in the acoustic farfield.

Here, k = ω/c = 2π/λ is the wave number and a₁, a₂ are the principal radii of the convex curvature. For frequencies down to 20 kHz in air, this requires a surface with radii a₁ and a₂ ≫ 2.7 mm. To satisfy the second constraint, for a 1 cm diameter transducer operating up to 100 kHz the object must be no closer than the Fraunhofer distance [9, Ch. 8] of 2d²/λ ≈ 6 cm. Under these conditions the target strength can be expressed as

$$TS = 10 \log_{10} \frac{a_1 a_2}{4}. \quad (5.14)$$

For natural and man-made objects with flat surfaces, target strength can be categorized under the two distinct situations shown in Table 5.1. The target strength of objects having effectively infinite dimensions relative to the proximity of the sonar (such as a wall or long cable) does not depend on frequency when ka₁,₂ ≫ 1. The target strength of objects with finite linear dimensions relative to the sonar beam (such as a cylinder or small plate) increases with frequency. This class of objects actually has the opposite effect from the natural low-pass filtering caused by atmospheric absorption and the off-axis directivity of a fixed-aperture transducer.

  TS independent of f                       TS increases with f
  TS_convex = 10 log10(a1 a2 / 4)
  TS_sphere = 10 log10(a^2 / 4)
  TS_plate∞ = 10 log10(r^2 / 4)             TS_plate = 10 log10(A^2 / λ^2)
  TS_cyl∞   = 10 log10(a r / 2)             TS_cyl   = 10 log10(a L^2 / 2λ)

Table 5.1. Target strength of various simple geometrical objects. Objects that have either convex surfaces or large dimensions relative to the sonar field are theoretically frequency independent. Spheres, for example, are often used as sonar test targets due to their frequency and aspect independence and predictable target strength. Objects that have finite linear dimensions depend strongly on frequency, with target strength actually increasing with frequency to form a high-pass transfer function. Legend: a, spherical radius; r, circular radius; A, surface area; L, cylinder height; λ, wavelength.
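For reference, the closed-form entries of Table 5.1 translate directly into code (a continuation of the earlier sketch; the formulas are valid only under the ka ≫ 1 and farfield conditions above, and broadside incidence is assumed for the finite cylinder):

    def ts_convex(a1, a2):
        """TS of a rigid, smooth convex surface, Eq. (5.14); radii in m."""
        return 10.0 * np.log10(a1 * a2 / 4.0)

    def ts_sphere(a):
        """TS of a large sphere of radius a; frequency independent."""
        return 10.0 * np.log10(a ** 2 / 4.0)

    def ts_plate(A, f, c=344.0):
        """TS of a finite flat plate of area A; increases with frequency."""
        return 10.0 * np.log10((A * f / c) ** 2)

    def ts_cylinder_finite(a, L, f, c=344.0):
        """TS of a finite cylinder of radius a and height L (broadside)."""
        return 10.0 * np.log10(a * L ** 2 * f / (2.0 * c))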
In reality, objects rarely consist of simple geometrical shapes. Nonetheless, real acoustic objects can be deconstructed into numerous simpler structures based on the dominant reflective scatterers. In water, for example, the target strength of fish at dorsal aspect has been empirically derived [5, p. 315] to depend primarily on length, L, with a small correction factor for frequency, f, as

$$TS(f) = 19.1 \log_{10} L - 0.9 \log_{10} f - 62$$

with length in units of cm (Fig. 5.7). Due to the large acoustic impedance mismatch, the gas-filled swim bladder is usually the primary source of reflection, and most of the remaining body is acoustically transparent [20]. Even though there is a small correction factor for frequency, target strength remains relatively independent of frequency, implying that the swim bladder forms a generally convex reflective surface.

Figure 5.7. The target strength of an individual fish at dorsal aspect is strongly correlated with its length. A minor correction term is added for a slight frequency dependence, but target strength drops by less than 1 dB from 10 kHz to 100 kHz; the size of the fish is therefore the most important predictor of a reflected echo. These values were found to be valid over the range 0.7 < L/λ < 90 [5].

So far, we have only considered the geometrical structure of reflective scatterers. The physical composition of these objects also matters when acoustic impedance may cause effects such as resonance of the object. Resonance greatly affects the target strength at a particular frequency, but its impact should be local to the resonance. Spheres, for instance, have been well characterized for use as ideal reflectors due to their frequency-independent properties and minimal dependence on aspect angle. At low frequencies (i.e., ka₁,₂ < 1), creeping waves, internal reflections, and other secondary artifacts add to the overall echo structure [5]. Marine mammals have been shown to easily detect a hollow versus a filled cylinder [21]. In fact, determining the composition of objects is of high interest for a variety of maritime applications and is the source of information most useful for classification or automatic target recognition (ATR). For the purposes of localization and imaging, however, the directly reflected wavefront is the primary echo component of interest, not the secondary wave artifacts that follow.

5.2.4 The Broadband Echo Spectrum in the Range-Azimuth Plane

The integration of all three broadband factors (environmental acoustics, directivity patterns, and target strength) is straightforward. Since range-dependent transmission loss (Fig. 5.4) and angle-dependent directivity patterns (Fig. 5.5) are independent phenomena, the spectra at any particular range and angle are simply multiplied to produce a 3-dimensional volume of relative echo intensity across range, azimuth, and frequency.
The full-spectrum target strength of an object at some location would be applied in the same manner. For demonstration purposes, a single ideal scatterer with constant target strength is assumed. Additionally, including directivity patterns for elevation would require a fourth dimension, which is omitted here for simplicity. With the physics-based models and assumptions in place, Figure 5.8 shows the expected magnitude spectrum of an echo at any particular point in the range-azimuth plane. Assuming these models of the physics are correct, an echo arriving from a specific location in space would have a spectrum matching the line cut vertically across the frequency dimension. Thus, the multi-dimensional data set shown here corresponds to a look-up table for the range- and angle-dependent transfer function imposed by the environment and transducers. These data could be used to implement a broadband matched-field processing algorithm and scan the acoustic space in a manner similar to a beamformer. Conversely, the spectrum of a discrete echo received at some unknown angle may be compared and matched to the spectra across the entire space while minimizing some error function. The latter scenario would form the basis for a machine learning classifier or regression algorithm.

Figure 5.8. The relative intensity of an echo is shown as a function of range, azimuth, and frequency. Acoustic transmission loss and the composite transmit-receive beam patterns are independent functions of frequency. By combining these independent functions for each receive element, the echo spectrum can be estimated a priori for any point in the range-azimuth plane. This idea extends to the dimension of elevation, but is restricted to range-azimuth here for visualization of the data.
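As one way to assemble a look-up table of the kind shown in Figure 5.8, the sketch below combines the two-way transmission loss with a composite transmit-receive piston directivity (a single element used for both, as in Section 5.3.2 further on; the indexing and normalization choices here are illustrative, not prescribed by the text):

    def echo_template(freqs, ranges, angles, diam=0.0094, c=344.0, **env):
        """Predicted echo spectrum (dB) over the range-azimuth plane.

        Returns an array indexed as [range, angle, frequency]. A single
        piston element is assumed for both transmit and receive, so the
        composite two-way beam is the square of the one-way amplitude.
        """
        tmpl = np.zeros((len(ranges), len(angles), len(freqs)))
        for i, d in enumerate(ranges):
            # Range-dependent spectrum: two-way spreading and absorption
            es = echo_strength(freqs, d, SL=0.0, TS=0.0, **env)
            for j, th in enumerate(angles):
                # Angle-dependent spectrum: transmit x receive directivity
                b2 = piston_amplitude(freqs, th, diam=diam, c=c) ** 2
                tmpl[i, j, :] = es + 20.0 * np.log10(np.abs(b2) + 1e-12)
        return tmpl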
5.3 Extraction of Broadband Spatial Information from Echoes

5.3.1 Quantifying the Angular Resolution Limit

The accuracy of spectral localization in biosonar has been studied to varying degrees using the Cramer-Rao lower bound (CRLB) [22, 23, 24, 25, 26]. However, the angular resolution achievable with a biosonar solution has only been investigated in animals [27]. Resolution is conventionally defined as the minimum distance between two signals that arrive concurrently at a receiver while still being resolved as distinct objects [28]. The general definition applies to the spatial domain of range as well as angle, but here we are most interested in angular resolution. This critical piece of information is necessary to begin developing bio-inspired broadband sonar systems for real-world applications, which must appropriately separate on-axis target echoes from simultaneously arriving off-axis clutter. In conventional beamforming, the resolution of an array is easily determined as the half-power width of the summed beam pattern (see Section 2.2). For biosonar, this quantity is much more difficult to calculate, because the beam pattern width is not what determines imaging performance. What follows is a simplified approach to demonstrate the acoustic resolving power of using broadband spectral information in addition to the time delay between receive elements.

To evaluate the information carried by broadband echoes from anywhere across the range-azimuth plane, we use a simplified error metric, the minimum L1 distance, to quantify the ability to discriminate between the predicted spectrum (or transfer function) of a single focal point and the predicted spectra across all other points in the range-azimuth plane. Plotting the L1 distance over range and azimuth produces an error surface from which the characteristics of the spectrum can be discriminated by selecting an error threshold. The error surfaces are shown in units of time, since both the mammalian auditory system and our bio-inspired model encode the frequency-dependent amplitude of sound logarithmically into time. This logarithmic translation is known as amplitude-latency trading (ALT) and was first observed as a psychoacoustic effect during behavioral experiments [29]. ALT is a psychological shift in relative time delay that amounts to approximately 16 µs/dB. For example, an echo that is 6 dB louder will appear to arrive ≈ 96 µs earlier. The perceptual limit for echo range discrimination in bats has been measured to be within 2-3 µs [30], or even less than 0.5 µs in 180° phase reversal experiments [31]. These experimental results imply that the biological sonar system would have little trouble resolving targets above a discrimination threshold on the order of 10 µs. For a man-made system, such timing constraints are controllable and fairly easy to meet or exceed with careful design; the fundamental limitation for these systems would likely be the noise introduced by scatterer echoes.

It is unknown exactly where ALT originates in the auditory system; however, it is likely a narrowband physiological effect beginning with the inner hair cells (IHC) of the cochlea and contributed to at every incremental neural stage afterward. Its significance to our receiver model is that relative differences in echo amplitude across frequency directly correspond to a decorrelation of broadband echoes in time. Therefore, echoes can be localized by selectively adjusting the timing parameters across individual frequency channels. Likewise, echoes that fall outside of a focal region (e.g., selective attention) can simply be filtered out and passed to a peripheral imaging process.

5.3.2 Broadband Acoustic Focusing with a Single Piston Transducer

As a simple example, when using a single transducer for both transmit and receive, the two beam patterns are identical (e.g., Fig. 5.5). In this case, the entire frontal hemisphere is ensonified with an arbitrary broadband waveform, but acoustically focusing to 0° results in a region that is approximately as narrow as the beam width of the highest frequency used. Figure 5.9a shows that approximately 30° resolution is achievable for a 0.94 cm piston transducer when focusing at 4.5 m range and 0° azimuth. Interestingly, the region of focus is highly range dependent and also shows a slight bias toward closer ranges for off-axis echoes, because both transducer directivity and acoustic absorption exhibit a low-pass filtered response. This error surface does not incorporate any of the high-resolution range information available from cross-correlation, but would in practice be used to restrict the biosonar algorithm to search possible angles within a single range. An important caveat with these results is that they assume an ideal point reflector and infinite signal-to-noise ratio.
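A sketch of the resulting error surface follows, with the 16 µs/dB amplitude-latency trading ratio used to express spectral differences in units of time. The averaging over frequency is an assumption; the text does not specify whether the L1 distance is summed or normalized.

    ALT_US_PER_DB = 16.0   # amplitude-latency trading ratio (us per dB)

    def focus_error_surface(tmpl, i_focus, j_focus):
        """L1 spectral distance (us) from one focal point to all others.

        tmpl is the [range, angle, frequency] template in dB from the
        earlier sketch; per-frequency amplitude differences are mapped
        to time through the 16 us/dB ALT ratio and averaged over
        frequency.
        """
        ref = tmpl[i_focus, j_focus, :]
        err_db = np.abs(tmpl - ref[None, None, :]).mean(axis=2)
        return ALT_US_PER_DB * err_db

    # The region of focus is then everything under a timing threshold,
    # e.g.: err = focus_error_surface(tmpl, i0, j0); focus = err < 20.0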
Figure 5.9. The region of focus after applying the L1 spectral distance around 4.5 m at 0° azimuth (a) and 25° off-axis (b) for a single transmit-receive transducer. The color depth shows the distance metric in units of µs, which is a direct logarithmic conversion from amplitude. The 0.94 cm bidirectional transducer is modeled with the full bandwidth from 10 to 100 kHz. Despite having a nearly omni-directional beam at low frequencies, the range- and angle-dependent spectral characteristics are significant enough to distinguish between echoes arriving from the focal region and elsewhere in the range-azimuth plane. A single broadband transducer could therefore serve as a very cost-effective obstacle avoidance sensor without requiring a large aperture.

Acoustic focusing through L1 minimization can be applied for any point in space, given sufficient signal-to-noise ratio. Figure 5.9b demonstrates the region of focus applied off-axis to 4.5 m and 25°. Due to the symmetry of the piston transducer's beam pattern, the spectral distance is zero between ±25° left and right. In fact, this symmetry persists around the entire radial axis unless the beam pattern symmetry can be broken in some manner. More interesting, though, is that the large angular width of the focus region is significantly reduced. The reason focusing off-axis improves resolution is that the beam patterns have an area of highest sensitivity off-axis, where the derivative of the beam with respect to angle (the spatial gradient) is significantly larger than on-axis at 0° [23, 32]. These results can be applied iteratively for multiple ranges to form a sector scan of the frontal region, or selectively when an echo has been detected. The actual implementation will depend on the intended application and operating environment.

5.3.3 Broadband Acoustic Focusing with a Bio-Inspired Array

Although demonstrating angular localization with only a single broadband transducer is impressive, a pair of identical piston transducers can be used to eliminate the off-axis ambiguities. Furthermore, by orienting each sensor off-axis by approximately ±25° we establish much higher angular resolution along the main response axis. Each transducer still has wide spatial coverage, but the difference in beam pattern orientations provides additional information to be gleaned. Figure 5.10 shows a bio-inspired conceptual array based on the dimensions of the mouth and ears of an adult E. fuscus. The horn-shaped baffles and complex mouth cavity have been replaced by standard piston transducers to show that a bio-inspired broadband sonar does not require complex beam patterns to localize and resolve echoes with precision.

Focusing on a point at center (0°, 4.5 m) with this binaural configuration significantly reduces the region of focus to several degrees; however, it causes angular ambiguity as seen before (Fig. 5.9). Figure 5.11 shows the improved resolution for each transmit-receive transducer pair and evidence of the problem of left-right ambiguity.

Figure 5.10. A bio-inspired broadband sonar array is proposed utilizing only three circular piston-like elements. A single broadband transmit element is used with the main response axis pointed directly forward. Two broadband receive elements are oriented off-axis by 25°. The array geometry approximates the size and relative locations of the acoustic baffles, the ears and mouth, in E. fuscus. The information available to the sonar system is the absolute pulse-echo time delay for range estimates, the relative time delay between receive sensors for rough horizontal angle estimates, and the spectral information for the left-right transmit-receive pairs for precise, but ambiguous, angle and range estimates.
Figure 5.11. The region of focus after applying the L1 spectral distance around 4.5 m at 0° azimuth for a single transmitter and a pair of identical receive transducers. The receive elements are oriented outward by 25°. Color depth shows the distance metric in units of µs. Each transmit-receive pair shows improved resolution performance, but ambiguous regions to the left (a) and right (b). These ambiguities can be resolved by comparing the spectral distances for each receive element.

With a pair of receive elements, additional time delay information is also available. One method of reducing ambiguity is to use the time-difference of arrival (TDOA) between the sensors. Figure 5.12 plots the relative time delay between receive elements spaced 1.4 cm apart. In air, the difference in echo arrival time falls between ±40 µs. For an echo arriving at 0°, the localization accuracy would not be sufficient to achieve high-resolution imaging [32]. Although TDOA is relatively insensitive across angle, it can be used to eliminate the ambiguity shown in Figure 5.11. TDOA also has biological significance, because the interaural time delay (ITD) is a primary auditory cue for localization in azimuth by hearing mammals [33].

Figure 5.12. The time difference of arrival (TDOA) between two receiving transducers separated by 1.4 cm. As expected, TDOA information is range-independent. Although it has been shown to be a useful approach to biomimetic localization [34, 35], the lack of sensitivity with angle renders it highly inaccurate when used as the only source of angular information.
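The far-field TDOA between the two receivers reduces to a one-line geometric relation (an illustrative sketch; spacing and sound speed as in the text):

    def tdoa(theta, spacing=0.014, c=344.0):
        """Far-field time difference of arrival between two receivers (s).

        Range-independent, as in Figure 5.12; for 1.4 cm spacing in air
        the full sweep of theta spans roughly +/- 40 microseconds.
        """
        return spacing * np.sin(theta) / c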
For example, a predictable pattern of spectral notches appears in the over- lapping echoes from two or more closely spaced point reflectors. The bat’s neural circuitry exploits these temporal interference patterns and has been shown to be the mechanism responsible for achieving hyper-resolution in range [37, 38, 39, 40, 41]. Several possible solutions may be used by the bat to cope with echoes arriving simultaneously from multiple locations: 1. It is certainly possible that interference patterns by echoes arriving from sepa- rate angles have a distinct pattern and can be deconvolved by the same neural circuitry for hyper-resolution in range. Reconstructing the coincidence of echoes in this manner would be enough to perform localization through spectral pattern matching on the separated echoes. 2. With the very high repetition rates of pulse emissions, the simultaneous coinci- dence of two echoes in clutter may be overcome by rejecting these echoes and waiting until the subsequent pulse’s echoes arrive. Therefore, mutual interfer- ence of two echoes could be interpreted as a single invalid echo without achieving 124 sufficient coincidence to register in the high-resolution display. 3. The possibility that two echoes overlap with perfect coherence at both ears at exactly the same time is unlikely, even in dense clutter. Statistical averaging over many pulses may be the simplest solution. 4. If and when interference persists, the animals may resolve the ambiguity by changing the shape of the beams through adaptive methods, such as receive beam movements. In fact, these beam pattern dynamics are just being discov- ered and appear to be intentional [42, 43]. Regardless of the actual mechanisms by which bats handle interfering echoes ar- riving concurrently, observation and laboratory experiments with these animals have shown that deconvolution and subsequent clutter rejection is not only possible, but also reliably consistent [44]. Likewise, successfully dealing with mutually interfering scatterers is critical to the success of any bio-inspired sonar system, and failure to implement a working deconvolution process will restrict the sonar’s operation to a subset of trivial scenarios without clutter. Resolution, after all, is defined using the mutual interference of two or more scatterers arriving simultaneously. Without this angular resolution, the bio-inspired sonar system remains a research project. 5.4 Performance Comparison with Conventional Acoustic Imaging One of the common points of contention for researchers studying biosonar is a lack of sufficient comparison metrics with existing approaches. In many respects, this is a difficult comparison to make due to 1) a weak understanding of exactly how animals process acoustic signals, and 2) the fundamental difference in the information con- tent being processed. This section addresses these concerns by providing a baseline comparison with conventional narrowband beamforming, while still including the ad- 125 vantage of signal bandwidth. Conventional beamforming techniques utilize only the relative time delay between elements to perform angular imaging. Given the same basic set of sonar array geometry, element beam patterns, and acoustic signals avail- able to the bat, the results here clearly show the advantages of combining time delay and spectral information over conventional delay-and-sum beamforming alone. 
5.4.1 Processing Broadband Signals with Suboptimal Element Spacing

There are many possible ways to compare conventional acoustic imaging systems with the biosonar imaging approach. Since we claim that signal bandwidth is the critical enabler of biosonar's additional resolving power, it is appropriate to include the same bandwidth in conventional beamforming for a fair comparison. The main difficulties are that biosonar elements are widely spaced relative to the wavelengths in air (d = 1.4 cm; 0.34 cm ≤ λ ≤ 3.4 cm between 10 and 100 kHz) and that the beam patterns vary significantly over the broad range of frequencies used. To resolve these issues, broadband signals can be processed with multiple narrowband beams that are then combined additively. This is precisely the solution proposed by Hinich to perform broadband array signal processing while removing the angular ambiguity caused by insufficient array spacing (when d > λ/2) [45].

The phase delay beamformer is a standard narrowband method for producing multi-beam acoustic images (see Section 2.2 for an overview). For every frequency, f, and steer angle, θ, the phase delay beamformer response of an N-element array is computed as

$$Y(f, \theta) = \mathbf{d}_f(\theta)\, \mathbf{W}\, \mathbf{x}_f^T \quad (5.15)$$

where $\mathbf{d}_f(\theta)$ is the 1 × N steering vector of complex phase delays steered to angle θ, $\mathbf{W}$ is the diagonal N × N aperture shading matrix, and $\mathbf{x}_f^T$ is the transposed N × 1 complex frequency data vector for a single time or range bin. To apply broadband processing via the Hinich method, the steered beam response at each frequency is summed over M discrete frequencies, f = {f₁, f₂, ..., f_M}, as

$$Y_{sum}(\theta) = \sum_{i=1}^{M} |Y(f_i, \theta)| \quad (5.16)$$

where the choice of f might be aligned with the FFT bins containing the signal.

To illustrate this technique, Figure 5.14 shows the narrowband and summed beamformer responses to an ideal target (i.e., x_f = d_f(ψ) with a point target at angle ψ) for an array of N = 10 elements spaced by d = 1.4 cm. In the case of suboptimally spaced elements, the main lobe of every beam points to the steered angle while the grating lobe angles vary with frequency. Thus, main lobes add coherently while grating lobes are reduced by averaging with side lobes. This example shows that grating lobes can effectively be suppressed at the cost of increased side lobe levels. The angular resolution of the beam approaches the half-power beam width of the highest frequency beam, although the energy tapers off more slowly with angle.

For the case of a biosonar system, the receive array from Figure 5.10 consists of N = 2 elements spaced by d = 1.4 cm. With only two elements, application of aperture shading coefficients becomes impossible and W is simply the 2 × 2 identity matrix. Figure 5.15 plots the beam response at several narrowband frequencies and the combined beam pattern using the Hinich approach. This simplified array creates a dipole beam pattern, which is aliased above 12 kHz. The narrowband beam patterns no longer contain any side lobes and consist entirely of a large main lobe and repeating grating lobes. Summing beams across frequency as in Equation 5.16 does not provide any obvious benefit for directivity or resolution.
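A direct sketch of Equations 5.15 and 5.16 follows (uniform shading, so W = I; the N = 10 curves in Figure 5.14 additionally apply Chebyshev weights, which are omitted here; phase-centering the steering vectors is a choice made so the response stays real-valued for symmetric arrays, anticipating the coherent summation of Equation 5.17 in the next subsection):

    def steering_vector(f, theta, n_elem, spacing, c=344.0):
        """Complex phase delays for a uniform line array steered to theta.

        Element positions are centered on the array's phase center.
        """
        n = np.arange(n_elem) - (n_elem - 1) / 2.0
        return np.exp(-2j * np.pi * f * n * spacing * np.sin(theta) / c)

    def hinich_beam_response(freqs, thetas, n_elem=2, spacing=0.014,
                             psi=0.0, c=344.0, coherent=False):
        """Broadband beam response to an ideal point target at angle psi.

        Eq. (5.15) gives the narrowband response Y(f, theta) with W = I;
        Eq. (5.16) sums magnitudes over frequency (Hinich), while
        coherent=True keeps the sign, as in Eq. (5.17).
        """
        resp = np.zeros(len(thetas))
        for f in freqs:
            x = steering_vector(f, psi, n_elem, spacing, c)  # ideal target
            for i, th in enumerate(thetas):
                d = steering_vector(f, th, n_elem, spacing, c)
                y = np.vdot(d, x) / n_elem                   # Y(f, theta)
                resp[i] += y.real if coherent else abs(y)
        return resp / len(freqs)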
This analysis could be extended to include directivity of individual elements; however, the resulting conclusions would be the same – conventional array signal processing with a N = 2 element array has insufficient resolution and large an- 127 Beam Response (ψ=0°, N=10, d=1.4cm) Combined Beam Response (ψ=0°, N=10, d=1.4cm) 10 10 A C Mag. (dB) Mag. (dB) 0 0 −10 −10 −20 −20 −80 −60 −40 −20 0 20 40 60 80 −80 −60 −40 −20 0 20 40 60 80 10 kHz 60 kHz 100 kHz 1 1 B D Amplitude Amplitude 0.5 0.5 0 0 −0.5 −0.5 −1 −1 −80 −60 −40 −20 0 20 40 60 80 −80 −60 −40 −20 0 20 40 60 80 Bearing Angle, θ (deg.) Bearing Angle, θ (deg.) Figure 5.14. The beam patterns of an array with N = 10 omni-directional elements spaced at d = 1.4 cm. Chebychev aperture shading coefficients for 20 dB side lobes are applied to the beams. Beam patterns for several narrowband frequencies (a and b) and the combined beam pattern across all frequencies (c and d) are shown for both magnitude (top) and linear amplitude (bottom) for illustration. In this example, the steered angle is ψ = 0◦ , but the concept works for any steered angle. The design frequency of the uniform line array, fd = λ/2, is approximately 12.3 kHz in air (given c = 344 m/s). Note that the beam patterns at higher frequencies scale inward by cos(θ) and are spatially aliased for frequencies above fd . When the beam patterns across all frequencies in the decade from 10 to 100 kHz are combined in steps of 1 kHz, grating lobes can effectively be suppressed. The abrupt change in sideband levels occur at the grating lobe locations for the highest frequency beam. Beam Response (ψ=0°, N=2, d=1.4cm) Combined Beam Response (ψ=0°, N=2, d=1.4cm) 10 10 A C Mag. (dB) Mag. (dB) 0 0 −10 −10 −20 −20 −80 −60 −40 −20 0 20 40 60 80 −80 −60 −40 −20 0 20 40 60 80 10 kHz 60 kHz 100 kHz 1 B 1 D Amplitude Amplitude 0.5 0.5 0 0 −0.5 −0.5 −1 −1 −80 −60 −40 −20 0 20 40 60 80 −80 −60 −40 −20 0 20 40 60 80 Bearing Angle, θ (deg.) Bearing Angle, θ (deg.) Figure 5.15. The beam patterns of an array with N = 2 omni-directional elements spaced apart by d = 1.4 cm. No aperture shading is possible with only 2 elements. The natural (ψ = 0◦ ) beam response at 10, 60, and 100 kHz is plotted in log-magnitude (a) and linear amplitude (b) units. The combined beam response is shown for the same linearly spaced frequencies between 10 and 100 kHz as in Figure 5.14 (c and d). With no side lobes present, the grating lobes are averaged together and only results in approximately 3 dB sideband suppression, which is not useful for high-resolution angular imaging. 128 gular ambiguities. Most importantly, this minimal array configuration will always have complete ambiguity in elevation unless additional vertical elements are added or broadband spectral information is included in the processing. 5.4.2 Coherent Summation of Broadband Signals Narrowband signals received by a sonar are not typically coherent and the environ- ment will force the phase of incident waves to behave as a uniform random variable. At first glance, the destruction of phase information by wave propagation implies that only the magnitude information persists at the receiver. However, even when the environment causes the absolute phase of a signal to become random, the rela- tive coherence of a broadband signal may still persist if an acoustic wave travels in unison across the same ray propagation path. 
Acoustic dispersion would be direct evidence to the contrary, but it does not appear to be a significant factor across the frequencies or short distances relevant to biosonar, either in air or water. If these assumptions hold true, then the inclusion of phase information in combined beam patterns is warranted and provides new information absent from the original Hinich approach.

Equation 5.16 summed the absolute value of each frequency-dependent beam pattern. Instead, phase information in the form of alternating positive and negative amplitudes can be included in the summation as

$$Y_{sum}(\theta) = \sum_{i=1}^{M} Y(f_i, \theta) \quad (5.17)$$

where f is selected as before. Since the grating lobes alternate between positive and negative amplitudes, the exact method for choosing f becomes more important. Figure 5.16 shows the effect of summing linearly across frequencies (e.g., consecutive FFT bins) compared with summing logarithmically (e.g., a constant-Q filterbank or wavelets). In either case, coherent broadband processing produces reasonable sidelobe suppression. As before, the summed beamformer resolution approaches the width of the highest frequency beam. The reason logarithmic frequencies produce better sideband suppression is that there is more balanced cancellation between positive and negative grating lobes.

Figure 5.16. Summed beam patterns for a simple array of N = 2 elements spaced apart by d = 1.4 cm. Combining the narrowband beams with relative phase information intact provides better results than simply summing the magnitudes. The manner in which frequencies are selected also appears to be significant. Summing linearly across frequencies from 10 to 100 kHz in 1 kHz bins results in good suppression of grating lobes overall, but high side lobes (a and b). Summing logarithmically across the same frequency range produces a slightly larger main lobe, but a significantly suppressed sideband between 20° and 40° (c and d). As in the N = 10 case, the best achievable angular resolution approaches the main lobe width of the highest frequency bin.

In summary, coherent addition of narrowband beam patterns provides a significantly improved beam response over incoherent addition. In fact, for the same N = 2 array configuration, acoustic angular imaging is not possible using conventional phase-delay beamforming. Even with the angular resolving power of coherent summation, the theoretical angular resolution of approximately 12° to 16° (for linear and logarithmic coherent summation, respectively) is still an order of magnitude larger than the 1.5° resolution demonstrated by including spectral pattern matching in the acoustic imaging process (Section 5.3). It should be restated that this form of coherent addition remains valid only under the assumption that relative coherence across all frequencies is maintained throughout signal transmission, propagation, reflection, and reception.
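Reusing the sketch above, comparisons in the style of Figure 5.16 differ only in the frequency grid and the summation rule (hypothetical usage):

    thetas = np.radians(np.arange(-90.0, 90.5, 0.5))
    f_lin = np.arange(10e3, 100e3 + 1.0, 1e3)                 # 1 kHz bins
    f_log = np.logspace(np.log10(10e3), np.log10(100e3), 91)  # log-spaced

    y_mag = hinich_beam_response(f_lin, thetas)                     # Eq. (5.16)
    y_coh_lin = hinich_beam_response(f_lin, thetas, coherent=True)  # Eq. (5.17)
    y_coh_log = hinich_beam_response(f_log, thetas, coherent=True)  # Eq. (5.17)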
5.4.3 Limitations to Conventional Beamforming Comparisons

The broadband Hinich approach provides a reasonable performance comparison between conventional active array signal processing and bio-inspired broadband sonar. By using a decade of bandwidth, both methods outperform conventional narrowband processing for a simple two-element receiver. Furthermore, exploiting spectral information for acoustic imaging provides the significant advantage of bio-inspired broadband sonar over current broadband sonar techniques. The results shown in this section are representative of the current practice of frequency-domain beamforming in high-resolution acoustic imaging; however, some caution in interpreting these results is warranted.

The most significant drawbacks of applying phase delay beamforming to the biosonar array geometry are the small aperture-to-wavelength ratio and the insufficient element spacing. Given that the N = 2 elements are spaced at d = 1.4 cm, the aperture-to-wavelength ratio, L/λ, is between 2.5 and 0.25. Applying conventional array design techniques, an aperture-to-wavelength ratio of L/λ ≥ 46 is required for 1° of angular resolution (Eq. 2.6). Furthermore, the element spacing of d = 1.4 cm ≫ λ/2 (≈ 0.17 cm at 100 kHz) causes significant spatial aliasing in the form of grating lobes, which prevent the two-element receive array from achieving any directivity unless signals remain coherent across all frequencies (i.e., coherent addition). These problems prevent a direct comparison of the acoustic information being processed and instead indicate that the frequency-domain approach violates the assumptions under which it was originally derived.

To make a fair technical comparison between biosonar and convention, an alternate view of the problem is required. Grating lobes are aliased spatial images of the main lobe and only appear in the sonar field of view when the array element spacing is designed inadequately. The ambiguities introduced with grating lobes are merely an artifact of the processing itself and are therefore not implicit in the information being extracted. If beamforming is instead implemented directly in the time domain rather than the frequency domain, then spatial aliasing does not occur. In fact, despite being considered inefficient, time-domain delay-and-sum beamforming is a standard technique for sound source localization with sparse arrays [46, 47, 48].

A better comparison than the Hinich approach might be the matched filter response (the cross-correlation of a transmitted signal with received signals) for two closely spaced scatterers ensonified with a broadband signal. This performance bound is the criterion stated by Altes to address the issue of resolution in azimuth [22]. In this situation, resolution could easily be determined by finding the minimum angular spacing before the cross-correlation peaks of two closely spaced scatterers overlap and merge into one. With this approach, the wider the spacing between the sensors, the better the angular resolution that can be achieved. At some point, however, correlation between the receiving sensor elements will degrade and overall performance will diminish [49].

5.5 Discussion

Acoustic dispersion, or the frequency-dependent speed of sound, is notably absent from this analysis. For aerial biosonar, dispersive effects are not significant at the ultrasonic frequencies considered. In gases such as air, dispersion only becomes relevant "at such high frequencies that the wavelength of the sound wave is smaller than the mean free path of the molecules" [4]. Furthermore, we have assumed isovelocity (line-of-sight) sound propagation paths.
This seems reasonable since, as a gas, air is typically able to diffuse freely and create a locally homogenous environment in the region aerial biosonar would be used. Dispersion and absorption characteristics do change at high altitudes, but not near the surface where bats operate [50]. Un- derwater sound propagation is a much different scenario and these effects must be considered for naval applications (see Appendix A). Although non-linear sound prop- agation paths, significantly reduced absorption losses, and various inhomogeneities 132 persist in the underwater realm, echolocating marine mammals do not necessarily require precision imaging at several kilometers. Instead, since these animals are for- aging at close range and using short click impulses, these issues may not significantly affect the biosonar imaging process where high-resolution is needed. Environmental parameters cannot be actively controlled by any sonar system; however, changes in the environment do occur on relatively slow time scales. There- fore, a sonar system emitting hundreds or thousands of pulses per minute should be able to adapt to these changes through iterative feedback or a self-calibration pro- cedure. With current technology, this task might be well suited for adaptive linear filtering or a variety of machine learning techniques [51]. To make a familiar analogy, human vision generally requires time to readjust to changing light levels and focal depth. In biosonar, this focal adjustment period invokes the proper combination of the time-frequency waveform structure, control of respiratory exhalation, movement of the head and pinnae, and memory of prior conditions that led to a sharper image. By the same manner, we propose that echolocating animals must perform a con- tinuous self-calibration to environmental operating conditions and dynamic beam pat- terns by fine-tuning both auditory neural circuitry and motor control circuitry [52, 53]. Neural networks in the auditory system are well known for their sensitivity to single spikes, or more specifically single spike events encoded by populations of neurons [39], and also selectivity to specific waveform features in the auditory cortex [54]. In the bat’s brain, the mechanisms enabling this precision calibration likely occur on two very different time-scales. For example, long-term potentiation (LTP) and depression (LTD) would be the synaptic mechanisms responsible for forming a persistent mem- ory of the time-frequency signature of an echolocation signal for an individual [55, 56]. Short-term synaptic plasticity, on the other hand, is necessary to make the fine-scale adjustments in coincidence detection when there is an abrupt shift in signal char- acteristics caused by either the changing external environment or modified transmit or receive beam patterns. Spike-timing-dependent plasticity (STDP) [57] operates 133 on a fast enough time-scale and is sensitive enough to adjust the perturbed time- frequency signatures from pulse-to-pulse ensuring a well-focused spatial image (refer to Section 2.1.2). Observing this adaptation in bats during infantile development would be an ideal time frame to study, because it forms the neural basis for the abil- ity of an adult bat to cope with the constantly changing environment and the diverse set of echolocation operating modes. Future work in this area will include empirical analyses of how broadband spec- tral information changes in realistic aerial and underwater environments. 
In addition, the sensitivity of broadband information (and therefore sonar system performance) to changing environmental parameters, directivity patterns, interference from multiple scatterers, and intrinsic/extrinsic noise will be evaluated computationally. This could be accomplished through local perturbation methods by examining one variable at a time at a fixed operating point. Sensitivity, in this sense, can be computed by evalu- ating the partial derivative of the information or performance metric with respect to a single changing parameter (i.e. ∂Y /∂Xi for an output result Y and input param- eter set Xi ). To address uncertainties in a) the directivity patterns of sensors and b) the target strength of complex structured and unstructured objects, Bayesian or variance-based methods would be more appropriate given the appropriate probabilis- tic models and allows for the full exploration of the input parameter space to include interactions and nonlinear responses [58, 59, 60]. Information theoretic approaches are an appealing way to quantify the channel capacity of the acoustic environment and have already been applied to estimate the spatial information encoded by biosonar directivity patterns [25, 23]. Regardless of the method that is used to quantify the information carried in broadband echoes, the goal is to explore the fundamental lim- itations of biosonar that will lead to new and exciting solutions to acoustic imaging with micro-apertures. 134 5.6 Acknowledgments The authors would like to thank Andrew Hull (NUWC) and Dimitri Donskoy (Stevens Institute of Technology) for their helpful guidance on acoustics and transducer mod- eling. We also acknowledge Michael Medeiros, Adam Mirkin, Robert Carpenter, and Ashwin Sarma from NUWC for numerous lengthy discussions on bioacoustics and array signal processing. A Applying Biosonar Modeling to Underwater Acous- tic Imaging There are very significant differences between sound propagation in air and water. The speed of sound is approximately 4.3 times faster in water than air; cwater ≈ 1, 470 m/s at the ocean’s mean temperature (4◦ C) versus cair ≈ 344 m/s at room tempera- ture (22◦ C). Wavelengths are therefore 4.3 times shorter in water. Dispersion, non- homogeneities, and other non-linearities are also more prominent. Despite these diffi- culties, underwater animal models of biosonar (cetaceans) thrive in this environment and prove that mammalian echolocation can not only function, but exceeds some of the limitations that exist in air. The most important modification to the previous results is an updated ab- sorption model. Transducer design and target physics will scale with the changes in wavelength and acoustic impedance, but these models are not fundamentally different than in air. Therefore, this appendix serves to modify the absorption modeling for seawater by rearranging and presenting the equations in the same form. In water, absorption is also a monotonically increasing function of frequency that depends heavily upon environmental parameters [61]. Several important differ- ences from air exist, including the fact that at any particular frequency absorption is on the order of 100 times weaker per unit of distance than in air, so sound waves 135 of the same acoustic frequency will travel over much longer distances. Bottlenose dolphins, for example, are capable of detecting a sphere the size of a ping pong ball at distances of 100 meters [21]. 
By comparison bats have a limited echolocation range of 10–20 m and may rely upon their spatial memory for more global navigation [62]. The model equations for predicting absorption in water require temperature, T (◦ C); depth (an indirect measure of the pressure), D (m); salinity, S (ppt); and acidity, pH. Similar to the equation in air, absorption can be split into several dominant components: f 2 FrB f 2 FrM     2 α(f ) = α ˆ cr (f )f + α ˆ vib,B (f ) 2 +α ˆ vib,M (f ) 2 (5.18) f 2 + FrB f 2 + FrM where α ˆ cr is the absorption component due to classical physics, α ˆ vib,B is absorption due to the vibrational relaxation in boric acid and α ˆ vib,M is the same for magnesium sulfate (MgSO4 ). These components are further defined: αcr = 4.9e-4 × e−(T /27+D/17) , (5.19) αvib,B = 0.106 × e(pH−8)/0.56 , and (5.20)    T S αvib,M = 0.52 × 1 + e−D/6 . (5.21) 43 35 The relaxation frequencies are   12 S FrB = 0.78 eT /26 , and (5.22) 35 FrM = 42.0 eT /17 . (5.23) The resulting absorption coefficient in water may be applied to Equation 5.10 as before 136 (Figure 5.17). The overall impact of this change on T L is that spherical spreading will dominate for much farther distances out to approximately 100 m. Therefore, frequency spectra of echoes in water do not depend on range as much as they do in air. It is worth mentioning that the reference units for sound differ (re 20 µPa in air and re 1 µPa in water), but this does not impact the relative units used for absorption. Absorption in water (T=−5−35°C, D=0m, S=35ppt, pH=8) 3 10 -5◦ C 0◦ C Absorption Coefficient α (dB / km) 5◦ C 10◦ C 2 15◦ C 10 20◦ C 25◦ C 30◦ C 35◦ C 1 10 0 10 −1 10 −2 10 3 4 5 6 10 10 10 10 Frequency (Hz) Figure 5.17. Absorption coefficient in water vs. frequency at various temperatures between -5◦ C and 35◦ C, depth of 0 m, salinity of 35 ppt, and acidity of 8.0 pH. The same general trends exist for frequency-dependent absorption in water as well as air. Both environments enforce a general low-pass filter, are monotonic functions of frequency, and depend upon environmental conditions such as temperature, pressure, and molecular concentrations. One significant difference, however, is that absorption in water is three orders of magnitude lower. Therefore, sound not only travels faster in water, but also much farther for the same set of frequencies. References [1] “ANSI S1.26-1995 (R2009) Method for Calculation of the Absorption of Sound by the Atmosphere”, American National Standards Institute, New York (2009). 137 [2] H. Bass, L. Sutherland, A. Zuckerwar, D. Blackstock, and D. Hester, “Atmo- spheric absorption of sound: Further developments”, J. Acoust. Soc. Am. 97, 680–683 (1995). [3] H. E. Bass, L. C. Sutherland, and A. J. Zuckerwar, “Atmospheric absorption of sound: Update”, J. Acoust. Soc. Am. 88, 2019–2021 (1990). [4] A. B. Bhatia, Ultrasonic absorption: An introduction to the theory of sound absorption and dispersion in gases, liquids, and solids (Oxford University Press, New York) (1985). [5] R. Urick, Principles of Underwater Sound, 3rd edition (Pennsylvania Publica- tions, Los Altos, CA) (1983). [6] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Tech- niques (Prentice Hall PTR, Upper Saddle River, NJ) (1993). [7] M. E. Bates and J. A. Simmons, “Effects of filtering of harmonics from biosonar echoes on delay acuity by big brown bats (Eptesicus fuscus)”, J. Acoust. Soc. Am. 128, 936–946 (2010). [8] S. N. Rschevkin, “A Course of Lectures on the Theory of Sound”, MacMillan, New York (1963). [9] L. 
Kinsler, A. Frey, A. Coppens, and J. Sanders, Fundamentals of Acoustics, 4th edition (Wiley, New York) (1999). [10] Q. Zhuang and R. M¨ uller, “Noseleaf furrows in a horseshoe bat act as resonance cavities shaping the biosonar beam”, Phys. Rev. Lett. 97, 218701 (2006). [11] T. W. Cranford, P. Krysl, and J. A. Hildebrand, “Acoustic pathways revealed: Simulated sound transmission and reception in Cuvier’s beaked whale (Ziphius cavirostris)”, Bioinspiration Biomimetics 3, 016001 (2008). [12] T. A. Mooney, M. Yamato, and B. K. Branstetter, Hearing in Cetaceans: From Natural History to Experimental Biology, volume 63, 1st edition (Elsevier Ltd.) (2012). [13] J. L. Aroyan, “Three-dimensional modeling of hearing in Delphinus delphis”, J. Acoust. Soc. Am. 110, 3305–3318 (2001). [14] M. Yamato, D. R. Ketten, J. Arruda, S. Cramer, and K. Moore, “The auditory anatomy of the minke whale (Balaenoptera acutorostrata): A potential fatty sound reception pathway in a baleen whale”, Anat. Rec. 295, 991–998 (2012). [15] R. M¨ uller and J. C. T. Hallam, “Knowledge mining for biomimetic smart antenna shapes”, Rob. Autom. Syst. 50, 131–145 (2005). [16] D. Vanderelst, F. De Mey, H. Peremans, I. Geipel, E. Kalko, and U. Firzlaff, “What noseleaves do for FM bats depends on their degree of sensorial special- ization”, PLoS ONE 5, e11893 (2010). 138 [17] J. Ma and R. M¨ uller, “A method for characterizing the biodiversity in bat pinnae as a basis for engineering analysis”, Bioinspiration Biomimetics 6, 026008 (2011). [18] N. H. Fletcher and S. Thwaites, “Obliquely truncated simple horns: Idealized models for vertebrate pinnae”, Acustica 65, 194–204 (1988). [19] P. M. Morse and K. U. Ingard, Theoretical Acoustics (Princeton University Press, New Jersey) (1986). [20] K. G. Foote, “Importance of the swimbladder in acoustic scattering by fish: A comparison of gadoid and mackerel target strengths”, J. Acoust. Soc. Am. 67, 2084–2089 (1980). [21] W. W. Au and K. J. Snyder, “Long-range target detection in open waters by an echolocating Atlantic Bottlenose dolphin (Tursiops truncatus)”, J. Acoust. Soc. Am. 68, 1077–1084 (1980). [22] R. Altes, “Angle estimation and binaural processing in animal echolocation”, J. Acoust. Soc. Am. 63, 155–173 (1978). [23] R. M¨uller, H. Lu, and J. Buck, “Sound-diffracting flap in the ear of a bat gener- ates spatial information”, Phys. Rev. Lett. 100, 108701 (2008). [24] J. Reijniers and H. Peremans, “Biomimetic sonar system performing spectrum- based localization”, IEEE Trans. Robot. 23, 1151–1159 (2007). [25] J. Reijniers, D. Vanderelst, and H. Peremans, “Morphology-induced information transfer in bat sonar”, Phys. Rev. Lett. 105, 148701 (2010). [26] D. Vanderelst, J. Reijniers, F. Schillebeeckx, and H. Peremans, “Evaluat- ing three-dimensional localisation information generated by bio-inspired in-air sonar”, IET Radar Sonar Navig. 6, 516–525 (2012). [27] J. A. Simmons, S. A. Kick, B. D. Lawrence, C. Hale, C. Bard, and B. Escudie, “Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 153, 321–330 (1983). [28] A. Rihaczek, Principles of High-Resolution Radar (Artech House, Norwood, MA) (1996). [29] M. E. Bates, J. A. Simmons, and T. V. Zorikov, “Bats use echo harmonic struc- ture to distinguish their targets from background clutter”, Science 333, 627–630 (2011). [30] J. A. Simmons, “The resolution of target range by echolocating bats”, J. Acoust. Soc. Am. 54, 157–173 (1973). [31] C. Moss and J. 
Simmons, “Acoustic image representation of a point target in the bat Eptesicus fuscus: Evidence for sensitivity to echo phase in bat sonar”, J. Acoust. Soc. Am. 93, 1553–1562 (1993). 139 [32] S. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation The- ory (Prentice Hall PTR, Upper Saddle River, NJ) (1993). [33] A. Brand, O. Behrend, T. Marquardt, D. Mcalpine, and B. Grothe, “Precise inhibition is essential for microsecond interaural time difference coding”, Nature 417, 543–547 (2002). [34] F. Schillebeeckx and H. Peremans, “Biomimetic sonar: 3D-localization of multi- ple reflectors”, in IEEE/RSJ International Conference on Intelligent Robots and Systems, 3079–3084 (2010). [35] R. Kuc, “Biomimetic sonar locates and recognizes objects”, J. Ocean. Eng., IEEE 22, 616–624 (1997). [36] R. P. Feynman, R. B. Leighton, and M. Sands, The Feynman Lectures on Physics, Mainly Mechanics, Radiation, and Heat, volume 1 (Addison-Wesley, Reading, MA) (1963). [37] M. Park and R. Allen, “Pattern-matching analysis of fine echo delays by the spectrogram correlation and transformation receiver”, J. Acoust. Soc. Am. 128, 1490–1500 (2010). [38] M. I. Sanderson, N. Neretti, N. Intrator, and J. A. Simmons, “Evaluation of an auditory model for echo delay accuracy in wideband biosonar”, J. Acoust. Soc. Am. 114, 1648–1659 (2003). [39] M. Sanderson and J. Simmons, “Selectivity for echo spectral interference and delay in the auditory cortex of the big brown bat Eptesicus fuscus”, J. Neuro- physiol. 87, 2823–2834 (2002). [40] M. Sanderson and J. Simmons, “Neural responses to overlapping FM sounds in the inferior colliculus of echolocating bats”, J. Neurophysiol. 83, 1840–1855 (2000). [41] P. Saillant, J. Simmons, S. Dear, and T. McMullen, “A computational model of echo processing and acoustic imaging in frequency-modulated echolocating bats: The spectrogram correlation and transformation receiver”, J. Acoust. Soc. Am. 94, 2691–2712 (1993). [42] L. Gao, S. Balakrishnan, W. He, Z. Yan, and R. M¨ uller, “Ear deformations give bats a physical mechanism for fast adaptation of ultrasonic beam patterns”, Phys. Rev. Lett. 107, 214301 (2011). [43] L. Jakobsen and A. Surlykke, “Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit”, Proc. Natl. Acad. Sci. U.S.A. 107, 13930–13935 (2010). [44] M. Warnecke, M. E. Bates, V. Flores, and J. A. Simmons, “Spatial release from simultaneous echo masking in bat sonar”, J. Acoust. Soc. Am. 135, 1–9 (2014). [45] M. J. Hinich, “Processing spatially aliased arrays”, J. Acoust. Soc. Am. 64, 792–794 (1978). 140 [46] M. Gillette and H. Silverman, “A linear closed-form algorithm for source lo- calization from time-differences of arrival”, IEEE Signal Process. Lett. 15, 1–4 (2008). [47] M. Brandstein, J. Adcock, and H. Silverman, “Microphone-array localization error estimation with application to sensor placement”, J. Acoust. Soc. Am. 99, 3807–3816 (1996). [48] H. Do, H. Silverman, and Y. Yu, “A real-time SRP-PHAT source location im- plementation using stochastic region contraction (SRC) on a large-aperture mi- crophone array”, IEEE ICASSP 2007 Proc. 1, I–121–I–124 (2007). [49] J. F. Lynch, T. F. Duda, and J. A. Colosi, “Acoustical Horizontal Array Coher- ence Lengths and the Carey Number”, Acoustics Today 10, 10–17 (2014). [50] H. E. Bass, C. H. Hetzer, and R. Raspet, “On the speed of sound in the atmo- sphere as a function of altitude and frequency”, J. Geophys. Res. 112, D15110 (2007). [51] S. S. 
Haykin, Neural Networks and Learning Machines (Prentice Hall, Upper Saddle River, NJ) (2009). [52] M. Wehr and A. M. Zador, “Balanced inhibition underlies tuning and sharpens spike timing in auditory cortex”, Nature 426, 442–446 (2003). [53] W. M. Masters, A. J. Moffat, and J. A. Simmons, “Sonar tracking of horizontally moving targets by the big brown bat Eptesicus fuscus”, Science 228, 1331–1333 (1985). [54] C.-Q. Ye, M.-M. Poo, Y. Dan, and X.-H. Zhang, “Synaptic mechanisms of direc- tion selectivity in primary auditory cortex”, J. Neurosci. 30, 1861–1868 (2010). [55] P. Dayan and L. Abbott, Theoretical Neuroscience: Computational and Mathe- matical Modeling of Neural Systems (MIT Press, Cambridge, MA) (2001). [56] E. L. Bienenstock, L. N. Cooper, and P. W. Munro, “Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex”, J. Neurosci. 2, 32–48 (1982). [57] S. Song, K. Miller, and L. Abbott, “Competitive Hebbian learning through spike- timing-dependent synaptic plasticity”, Nature Neuroscience 3, 919–926 (2000). [58] A. Saltelli, P. Annoni, I. Azzini, F. Campolongo, M. Ratto, and S. Tarantola, “Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index”, Comput. Phys. Commun. 181, 259–270 (2010). [59] A. Saltelli, K. Chan, and E. M. Scott, Sensitivity Analysis (John Wiley & Sons, New York, NY) (2000). [60] K. Chan, A. Saltelli, and S. Tarantola, “Sensitivity analysis of model output: Variance-based methods make the difference”, in IEEE WSC 1997, 261–268 (IEEE Computer Society) (1997). 141 [61] M. A. Ainslie and J. G. McColm, “A simplified formula for viscous and chemical absorption in sea water”, J. Acoust. Soc. Am. 103, 1671–1672 (1998). [62] J. R. Barchi, J. M. Knowles, and J. A. Simmons, “Spatial memory and stereotypy of flight paths by big brown bats in cluttered surroundings”, J. Exp. Biol. 216, 1053–1063 (2013). 142 Chapter 6 Discussion, Applications, Future Directions, and Concluding Re- marks 6.1 Discussion The research objectives of this dissertation were two-fold: 1) To improve our under- standing of biosonar from an engineering perspective, and 2) to apply this perspective toward the development of a bio-inspired broadband sonar system. To address the first objective, Chapters 3 and 4 developed an advanced set of methods and tools for studying bat echolocation. The second objective was realized in Chapter 5 through computational modeling and simulation of broadband acoustic information that is relevant to mammalian echolocation. Specifically, Chapter 3 addresses the need for new high-resolution time-frequency techniques in the field of bio-acoustics. The bat’s auditory system itself encodes an acoustic time-frequency representation (TFR) that is not hindered by the presence of multiple harmonic components; however, many existing techniques we would consider to have “high resolution” in fact suffer from cross-component interference or smearing of energy across the time-frequency plane. These multi-harmonic waveforms highlight many of the difficulties faced in existing TFR techniques and is one reason why the short multi-harmonic pulses emitted by echolocating bats are frequently used as a 143 gold standard for any new TFR or transform. The new approach in Chapter 3 ex- ploits the recently developed fractional Fourier transform (FrFT) in order to separate multiple harmonic components. 
A combination of signal processing ideas, such as empirical mode decomposition (EMD) and Hilbert spectral analysis, are applied to the individual components for the detailed analysis of biosonar signals. The developments in Chapter 3 were based on the original idea of analyzing multi-component signals using the EMD due to its efficacy in analyzing signals from other non-linear systems. Preliminary analysis of bat echolocation signals revealed peculiarities in the EMD results, because the decomposed energy distributions con- tained information from two separate harmonic components. To address this issue, the ensemble EMD was used to process the signals without switching between harmon- ics. The reliability of estimating instantaneous frequency using the FrFT also posed a serious problem. New image processing algorithms were written that exploited the ridge structure found in each harmonic in the rotation-fraction plane. Along with these new methods, their utility was demonstrated by the large databases of signals that were successfully processed. The aggregate of these research contributions were published in the Journal of the Acoustical Society of America. Chapter 4 documents “Arrayzilla” – a high-density reconfigurable microphone array for conducting studies of bats’ transmit beam patterns during echolocation tasks. A considerable amount of thought and attention to detail went into the ar- ray design, including everything from the original concept, through the electronics design phase, to the many acoustic considerations. The mechanical framing was designed in collaboration with SEEMS, LLC (Warwick, RI). Construction, testing, and verification of the array required a substantial amount of assistance from many graduate and undergraduate students in our lab. Alongside this hardware appara- tus is a multi-element beam reconstruction method that utilizes the time-frequency analysis techniques in Chapter 3 and includes new processing methods for correcting frequency dependent transmission losses and microphone variability. The result of 144 synchronously sampling the acoustic space with such high-fidelity and reconstructing bats’ consecutive transmit beam patterns is unprecedented. Demonstrating the use- fulness of this new measurement tool for studying bat echolocation required training several animals to perform echolocation experiments in front of the array. This diffi- cult and time consuming task and was led by Dr. Laura Kloepper and Michaela War- necke. In summary, “Arrayzilla” has proven to be a novel approach to the problem of bioacoustic beam pattern analysis and will remain an invaluable tool for conducting echolocation studies in our lab. In Chapter 5, a numerical model of the physical acoustics is presented to understand the rich set of information available in broadband acoustic pulses and echoes. Through computational modeling and acoustic simulation, this study quan- tifies the performance achievable by a bio-inspired broadband sonar system. The first development reformulates localization of broadband echoes as a spectral pattern matching problem. This alternative description as an echo classifier allows a simpli- fied spectral distance metric to discriminate between echoes arriving from any spatial direction. Other studies have used the Cramer-Rao lower bound (CRLB) [1, 2] or information theory [3] to quantify localization performance, but no previous study has been found that sufficiently addresses the problem of how angular resolution is achieved with broad beam patterns. 
Adequate resolution is required for acoustic imaging, by definition. Several hypotheses are made regarding how bats resolve mul- tiple closely spaced echoes in angle, but this remains an open question. In addition to characterizing the resolving power of this new imaging approach, this study is the first known to examine sensitivity of broadband spectral localization to changing environmental parameters; critical to the success of any practical bio-inspired broad- band sonar system. Furthermore, the issue of frequency-dependent target strength is not well studied in the literature. The work in Chapter 5 also provides a theoretical examination of broadband target strength for a variety of objects. Together, these developments show the feasibility of constructing and optimizing a micro-aperture 145 broadband sonar system for potential future applications in air and underwater. 6.2 Applications 6.2.1 Multi-Component Signals and Time-Frequency Analysis The multi-component analysis technique described in Chapter 3 has been applied to a variety of bio-acoustic analysis problems. First and foremost, the beam pattern reconstruction method in Chapter 4 relies upon this technique to automate the signal analysis of each emitted echolocation call across hundreds of channels. Despite the variability of emitted signals within and between individual bats, the technique proved to be highly reliable and adaptable to the tens of thousands of signals captured. Multi-component analysis has also been used in investigations into the unique timing patterns of strobe groups emitted by echolocating bats [4]. Several months of data were recorded and collected during a target detection task from a stationary platform. The harmonic components emitted by the bats were isolated and functions for instantaneous frequency and amplitude were extracted. Statistical analyses of strobe group timing (e.g. inter-pulse interval (IPI)) in relation to signal parameters such as call duration, instantaneous energy, frequency span, etc. was based on the information made available through multi-component analysis. Although the multi-component analysis techniques developed in this disserta- tion were originally developed for the signals of Eptesicus fuscus and related species of bats, the approach is applicable to a host of alternative signal types beyond echolo- cation or even acoustics. Neurophysiological data were collected from E. fuscus using an electrode inserted into the cochlear nucleus [5]. Linear FM pulses were played through a loudspeaker directed at the anesthetized animal, received through the ex- ternal ears, and transduced into electrical signals by the piezoelectric effect of the cochlea’s inner hair cells. The signal received at the electrode was proportional to the 146 amplitude of the acoustic pulse and the majority of the signal was surprisingly found to be in-phase with little distortion effects or group delay as might be expected for a mechanical traveling wave. 6.2.2 Beam Pattern Measurement Instrumentation and Techniques A large database of in-flight echolocation calls and post-processed flight tracks has been aggregated throughout the course of many different obstacle avoidance experi- ments in clutter. These experimental data were collected in a controlled flight-room in our laboratory using ultrasonic audio instrumentation and thermal infrared stereo- scopic cameras. 
The beam reconstruction methods developed in Chapter 4 are provid- ing new uses for these data, namely the estimation of in-flight beam patterns using the sparse array of 24 wall-mounted microphones (the original microphone preamplifier circuit boards that were later redesigned for use on the large reconfigurable array). This method has already been used by several colleagues to estimate beam width of new flight-room data [6]. One of several big brown bats, E. fuscus, were released at the start of a long corridor of hanging plastic chains, which served as dense acoustic clutter. Animals were localized and tracked using their emitted acoustic pulses as they flew through the corridor. The width of the narrow channel changed to one of four settings for each experimental trial: 40, 70, 100, and 140 cm. The beam measurement methods were used to determine if and how the beam shape changed as it progressed through the dense chain array for each corridor width. These data show that transmitted beam widths remained relatively fixed throughout various clutter conditions; however, the total amplitude of the emitted beams were reduced by several decibels at the most difficult setting. This reduction in signal intensity could potentially be an active strategy for reducing the levels of received reverberation backscatter. These pilot experimental results are to be followed up with a more rigorous study using “Arrayzilla.” 147 6.3 Future Directions 6.3.1 Time-Frequency Analysis of Bio-Acoustic Signals There are many different ways to employ the FrFT to extract time-frequency infor- mation. Capus et. al developed a short-time FrFT whereby a sliding window was used in a similar manner to the STFT [7]. The principle idea was to recreate a higher-resolution TFR by finding an optimal FrFT rotation angle, α, at each time instant and displaying |F rF T (α, t)|2 for that column. This works well for FM mono- component signals; however, multi-component signals are not guaranteed to have one optimal rotation for each point in time. Furthermore, the use of a sliding time window spreads energy over time and may not reliably predict instantaneous amplitude. A logical extension to the FrFT-based technique presented in Chapter 3 is cur- rently under development to overcome these limitations. An implicit assumption is made that each component is an approximately linear FM within a short time win- dow. Rather than constructing a global rotation-fraction plane for the entire signal, short time windows are applied in the usual way to analyze overlapping segments of the signal. For each time segment, individual components are identified using local maxima in the rotation-fraction plane. Therefore, no restriction is made on having a single optimal rotation angle, α. Instead, each component has its own unique slope and amplitude at one particular time instant. These line segments are matched to the nearest component in the neighboring time instants. In this way, an improved TFR for each component may be constructed by plotting |F rF T (α, t)|2 , or alternatively each component may be isolated using the time-variant filter method in Chapter 3. This new approach may be more appropriate for cases where the signals and number of components are unknown a priori, because fewer parameters are required to automate the process. A potential downside to this approach is that it is com- putationally more expensive and may preclude real-time operation until computing speed improves or more efficient FrFT methods are developed. 
148 The field of time-frequency analysis is considered to be fairly mature; however, currently no single method exists that can simultaneously overcome all of the problems associated with existing methods. Researchers do recognize this shortfall and biosonar signals such as the bat’s echolocation pulse will remain the gold standard with which to compare performance. Looking to the mammalian brain, and the auditory system in particular, will inevitably inspire new ideas and approaches on this front. 6.3.2 Acoustic Measurement and Visualization of the Multi-Dimensional Sound Field Research with the large microphone array is ongoing and future experiments will likely dictate new requirements for the array hardware and signal processing algorithms. This may include non-uniform arrangements, integration with other tools, or increased acoustic sensitivity for other species. Fortunately, the array design was intended to be modular and innovations with micro electro-mechanical systems (MEMS) sensor technology can be taken advantage of by upgrading the microphone circuit boards with these new devices. For example, the current generation of Knowles MEMS ultrasonic microphones (SPU0410LR5H, Knowles Acoustics, Itasca, IL) introduced a back-mounted “zero-height” packages where the devices are mounted on the non- acoustic side of the board and acoustic sensing is performed through a drilled out hole in the printed circuit board. Drilling with a sufficiently small diameter guarantees an omni-directional beam pattern in the front hemisphere since there are no protrusions obstructing the sound path at steep angles. Aside from acoustic directivity, MEMS microphones will continue to improve their frequency response to the point that little or no correction is necessary. Furthermore, it seems feasible that coupling MEMS accelerometers with the existing ultrasonic pressure sensors could result in new types of vector sensors – opening a new door in bio-acoustics research. Beyond any future improvements to the microphone hardware and the beam reconstruction algorithms, it is our hope that “Arrayzilla” will have broader impact. 149 This high-density, modular approach to sensing and visualizing the multi-dimensional sound field may inspire other researchers to create their own innovative measurement systems. Some common scientific applications that could benefit from these ideas include beam pattern measurements of underwater marine mammals, conference room acoustics for future-generation business communications, and acoustic holography measurements of machinery noise in the near-field. 6.3.3 Bio-Inspired Broadband Sonar for Micro-Aperture Imaging Toward the development of a bio-inspired broadband sonar system, the physical prin- ciples and ideas developed in Chapter 5 need to be demonstrated experimentally in a real acoustic environment. Before finding some optimal method to extract broadband information for acoustic imaging, we need to better understand how the system might fail to produce accurate or reasonable results. The potential difficulty with uncer- tainties in environmental parameters is addressed to some extent through sensitivity analysis in Chapter 5. A more serious problem is that for this system to be imple- mented successfully, it must appropriately handle multiple concurrent echoes. We have addressed the problem of mutual-interference by two scatterers, but operating in real environments warrants a more statistical view of the problem. 
For example, is there a probabilistic pattern in the pulse-to-pulse spectral interference of multiple scatterers that can be used in deconvolving the echoes? The solutions to these prob- lems lie in further understanding the neural networks of the bat’s auditory system. Echolocating animals have proven that this method of broadband imaging does work and that solutions to these difficulties already exist. It is only a matter of finding how biosonar solves the problems and what we might do to replicate and improve upon their solutions. This mode of acoustic sensing represents a significant departure from what is current practice in array signal processing, whereby improving resolution traditionally requires higher operating frequencies or larger array apertures. The implication for 150 demonstrating a bio-inspired broadband sonar is that array hardware only requires a handful of sensing elements and supporting electronics which results in a system that is orders of magnitude more compact. The bio-inspired approach to acoustic sensing effectively transfers the complexity of acoustic imaging from the physical hardware into a signal processing domain that continues to become smaller, faster, and more affordable with time. A surprising result of the modeling work in Chapter 5 was that broadband acoustic imaging does not necessarily require the complex acoustic baffle structures found in many echolocating animals; simple piezoelectric elements may meet perfor- mance objectives by properly designing and orientating beam patterns. This finding may ease the transition of bio-inspired sonar techniques into the realm of man-made sonar imaging and sensing systems. 6.4 Concluding Remarks The research contained in this dissertation represents a bottom-up approach to biomimetic design. Many engineers and physicists attempt to force existing signal processing so- lutions onto an explanation of how bats achieve bio-acoustic imaging. The bat’s sonar imaging process is a complex system that requires a thorough examination of what salient information survives the acoustic-to-neural transduction from sound emission to reception and interpretation. There exists a vast body of research that was lever- aged to understand the fundamental mechanisms of biosonar. Evidence in the form of neurophysiological studies of the bat’s brain and empirical data from behavioral experiments serve to piece together the echolocation story. As these individual pieces come together they will lead to a new unified approach to acoustic imaging. Despite decades of research in this area, there are still many important ques- tions about biosonar that remain unanswered. For example, as mentioned in Chap- ter 3, how does a slight perturbation of the time-frequency structure disambiguate 151 the barrage of overlapping echoes when all the signals and echoes remain highly cor- related [8]? Another mystery lies in the dynamics of beam patterns. We are now developing an understanding of the interplay between broadband beam patterns and complex targets, but how does a rapidly changing beam pattern assist in forming images [9]? Even more intriguing is the extreme tolerance to interference and jam- ming [10, 11]. How can bats fly in extremely dense clutter within close proximity to tens or even hundreds of other bats, all simultaneously operating with nearly identi- cal sonar signals? 
The answers to these questions lie in part with future technologies for imaging the bat’s brain, but perhaps most importantly in the creativity of future behavioral and neurophysiological experiments with the animals. A more generalized question is how does a bat’s neural information processing differ from their underwater echolocating mammalian counterparts. As stated by Whitlow Au in “The Sonar of Dolphins” [12, Ch. 11]: There are many obvious differences—in fact, hardly any similarities— between bats and dolphins in general. [...] A seemingly endless list of differences between the two classes of animals can be compiled, compared with only a few similarities, among those being that both are mammals and echolocators. Although a common and important sonar function of both animals involves the capture of prey, there are large differences in the physical characteristics and behavior of prey types as well as in the environment they inhabit. Therefore it would not be surprising to find vast differences in the functioning, characteristics, and capabilities of the two sonar systems. Despite the obvious physical differences between these classes of animals and their respective environments, all mammals have the same basic organization of neural structures in the brain. Specializations do exist and are quite frequently found across species; however, the fundamental physics of acoustic wave propagation does not drastically change across the air-water boundary aside from the differences mentioned in Chapter 5. With this point of view, the broadband acoustic information available to all echolocators looks remarkably alike. Therefore, how this broadband acoustic information is used by these different classes of echolocators may be more similar 152 than we ever realized. References [1] R. M¨uller, H. Lu, and J. Buck, “Sound-diffracting flap in the ear of a bat gener- ates spatial information”, Phys. Rev. Lett. 100, 108701 (2008). [2] R. Altes, “Angle estimation and binaural processing in animal echolocation”, J. Acoust. Soc. Am. 63, 155–173 (1978). [3] D. Vanderelst, J. Reijniers, F. Schillebeeckx, and H. Peremans, “Evaluat- ing three-dimensional localisation information generated by bio-inspired in-air sonar”, IET Radar Sonar Navig. 6, 516–525 (2012). [4] L. N. Kloepper, J. E. Gaudette, J. R. Buck, and J. A. Simmons, “Influence of mouth opening and gape angle on the transmitted signals of big brown bats, Eptesicus fuscus”, J. Acoust. Soc. Am. in prep. (2014). [5] J. Knowles, J. A. Simmons, J. Barchi, J. E. Gaudette, S. S. Horowitz, and A. M. Simmons, “Cochlear processing in biosonar: Modeling sound transduction and the cochlear microphonic in echolocating bats”, in Society for Neuroscience, 1–1 (Washington, DC) (2011). [6] I. Matsuo, A. R. Wheeler, L. N. Kloepper, J. E. Gaudette, and J. A. Simmons, “3D acoustic tracking of bats in clutter environments from microphone arrays”, in Acoustics, 1–1 (Tokyo, Japan) (2013). [7] C. Capus and K. Brown, “Short-time fractional Fourier methods for the time- frequency representation of chirp signals”, J. Acoust. Soc. Am. 113, 3253–3263 (2003). [8] S. Hiryu, M. E. Bates, J. A. Simmons, and H. Riquimaroux, “FM echolocating bats shift frequencies to avoid broadcast-echo ambiguity in clutter”, Proc. Natl. Acad. Sci. U.S.A. 107, 7048–7053 (2010). [9] L. Gao, S. Balakrishnan, W. He, Z. Yan, and R. M¨ uller, “Ear deformations give bats a physical mechanism for fast adaptation of ultrasonic beam patterns”, Phys. Rev. Lett. 107, 214301 (2011). [10] M. E. Bates, S. A. 
Stamper, and J. A. Simmons, “Jamming avoidance response of big brown bats in target detection”, J. Exp. Biol. 211, 106–113 (2008). [11] M. Warnecke, M. E. Bates, V. Flores, and J. A. Simmons, “Spatial release from simultaneous echo masking in bat sonar”, J. Acoust. Soc. Am. 135, 1–9 (2014). [12] W. W. Au, The Sonar of Dolphins (Springer, New York) (1993). 153 Appendix A Modeling of Precise Onset Spike Timing for Echolocation Abstract This Appendix describes a biophysical model of the echolocating bat’s auditory pe- ripheral system and cochlear nucleus to explore the timing precision of echo infor- mation within the bat’s brainstem. In particular, this study focused on the Meddis auditory model and a recurrent network of integrate-and-fire coincidence detection neurons. Many details of the bat’s primary ascending auditory system have yet to be understood; however, included here is an attempt at simulating the critical parts of the peripheral auditory stage and early neuronal transduction that confer precise timing of pulses and echoes. Results of this modeling study were presented at the Acoustical Society of America [1]. A.1 Motivation for a Biophysical Model Auditory computational models are not a new concept. On the contrary, they have been around for at least as long as computing power has been available [2, 3]. The current state-of-the-art in auditory system modeling includes detailed implementa- tions based on the biophysical processes they are trying to mimic. Detailed models of the mechanical-to-neural sound transduction now exist and can successfully account 154 for a vast array of psychoacoustic behavior [4, 5, 6, 7, 8, 9, 10]. Auditory processing in the bat is also beginning to be understood [11, 12, 13, 14, 15], but physiological work on echolocating animals does require more emphasis to understand these highly specialized systems. At present, existing computational biosonar models of bats use trivial functions for auditory neural transduction and processing. Many attempts at creating a com- putational model of bat echolocation end up looking more like a typically engineered sonar signal processing system than the biological “wetware” they sought to mimic. For instance, the head related transfer function (HRTF) and external/middle ear fre- quency shaping characteristics that are essential for localizing sound in azimuth and elevation are often neglected in lieu of using interaural timing difference (ITD) as the primary horizontal cue. Another common problem of existing echolocation models is the over-use of the classical matched-filter (a.k.a. replica correlation). Furthermore, the spike generation process is typically either removed completely, or performed ad hoc without considering the detailed biological processes. The one biological aspect of bat echolocation models that appears consistent amongst researchers is segmenting the incident sound channel into multiple frequency channels or bins using 1) a linear or non-linear filter bank, 2) the short-time Fourier transform (i.e. spectrogram), or 3) an alternative time-frequency representation commonly used in signal processing systems. There is a very good reason for these simplifications to occur. As fast as they are, the computational power of today’s digital signal processors cannot feasibly calculate all of the differential equations necessary to replicate how the brain functions on a large scale. 
When there are additional time restrictions on computations, calculations must be either simplified or farmed out in parallel as they are in the brain. Therefore, over-simplification of the biological processes in conventional computing is necessary to implement many signal processing auditory models. The additional demand for real-time echolocation processing will inevitably 155 require 1) an exceedingly fast microprocessor or 2) a massively parallel grid of com- putations. In either case, computations are expensive and designers must carefully trade off performance with power, size, and cost. It is therefore essential to model and understand the critical processes of bat echolocation before making any attempts to replicate the underlying processing. A.1.1 Coincidence Detection and Population Coding in the Auditory Sys- tem The auditory system consists of a typical structure amongst all mammals, non- echolocating and echolocating alike. Therefore, the differentiating factors that enable echolocation appear to lie in the precise details of the brain’s neural networks and connectivity. With the exception that a bat’s cochlea operates up to two octaves higher in frequency than humans and many other mammals, the peripheral sensory organ appears to function identically in response to auditory stimuli. We must then ask the question, why can’t a guinea pig or other mammal1 learn to echolocate? Fig- ure A.1 shows typical coincidence detection behavior of bushy cells located in the anteroventral cochlear nucleus (AVCN) of a rat. Here, timing jitter is reduced down to approximately 300 µs (less than the width of a single neural spike) using population coding techniques employed in the cochlear nucleus. Bats’ ability to discriminate differences in shape down to 2 µs delay (millimeter precision) and 20 ns in phase has been criticized as being infeasible given the amount of timing jitter still present in the auditory system, even after population coding. The existence of gap junctions in the CN of bats may be an answer to this unexplained phenomenon. As an initial examination to this possibility, I created a computational model of the well studied Bushy cells in the AVCN of mammals. Since parameters for the Meddis IHC model have not yet been determined, the guinea-pig parameters for 1 There have been rare cases where blinded humans have developed the basic ability to echolocate using clicks, similar to dolphins. One hypothesis for this surprising adaptation might be that learning to use broadband sound to echolocate at an early developmental stage for the brain causes a significant change in functional organization. Perhaps electrical synapses, or gap junctions, in these humans are retained from early development for use in echolocation. 156 Figure A.1. Action potentials recorded from a rat when presented with a low frequency sinusoidal stimulus. (left) The ANC (auditory nerve) and AVCN (bushy cell) exhibit a significant reduction in timing jitter due to populations of coincidence detection neurons. (right) Post-stimulus time histograms (PSTH) relative to stimulus peaks showing the distribution of neural spikes relative to the sinusoidal stimulus. Adopted from [16, p. 110]. medium spontaneous-rate (MSR) fibers are used as in Reijniers and Peremens [17]. If a guinea-pig is capable of echolocation, the timing information of FM sweeps produced by this model would be sufficient for sub-millisecond discrimination. 
Rejecting this null hypothesis will be an important step toward proving bats must have and use sharper timing information in the brain. 157 A.2 Methods The proposed model architecture is shown in Figure A.2, below. The input signal consists of synthetic generated FM sweeps comparable to the rate of bat calls. The monaural signal enters the cochlear block where it is split into N different frequency channels using a Gammatone filter-bank [9]. The basilar membrane (BM) movement at each channel is then individually passed through the IHC peripheral model [4, 5], which is described in more detail below. The output of the model is essentially a probability of firing a spike on each auditory fiber. Emulating a realistic number of auditory fibers (30:1 ANC-to-IHC ratio) is easily accomplished by generating as many random sequences as ANC on the same IHC spiking probability. Figure A.2. Proposed neural network architecture of the auditory population coding. The random spike processes generated in the peripheral system are then con- nected to the cochlear nucleus block, which consists of a line array of M leaky integrate-and-fire (IaF) bushy cells. These cells are innervated by a tonotopic distri- bution of auditory fibers, overlapping in input frequency by an unknown amount. The neurons can be connected in a strictly feedforward manner or using any configuration 158 of recurrent topology for experimentation with the model. Here, it is assumed that there are approximately 200-500 of such cells in the bat’s AVCN. The output from each IaF neuron consists of a discrete spike train representing a reduction of spike timing jitter. A.2.1 Peripheral System A.2.1.1 Outer and Middle Ear The sound waves passing through the outer ear (OE) and middle ear (ME) are shaped by both, mechanical damping and resonance. Since this system is entirely mechanical, we use a linear time-invariant filter to model the band-pass frequency shaping effects. This model uses the following set of cascaded band-pass filter responses for the OE and ME model, following Reijniers and Peremens [17]: • 2nd order filter, f3dB = (4 kHz – 80 kHz) • 3rd order filter, f3dB = (700 Hz – 100 kHz) A.2.1.2 Cochlea and Basilar Membrane Mechanical vibration of sound projected onto the basilar membrane (BM) is modeled using a gammatone linear filter-bank. The output from the filter-bank simulates the vibration of the BM at each of the N inner hair cells. In most mammals, N is approximately 1,000 cells, but has never been quantified in the bat. The gammatone filter type has been shown to provide a fairly accurate tuning curve for the inner hair cells (IHC) for many mammals [9]. Computational limitations prevent simulation of the thousands of hair cells in the cochlea, so the number of channels is limited to approximately 100. The bandwidth of each IHC overlaps the nearest neighbors such that a subset of channels (10 here) can be substituted with little loss of information [17]. 159 A.2.1.3 Meddis Auditory Peripheral Model The Meddis [4, 5] auditory model of basilar membrane movement to inner hair cell (IHC) neurotransmitter release is used for the first stage of neural spike generation. There are 3 differential equations and 9 parameters that define the dynamical be- havior. Modifying these parameters, it is straightforward to create high, medium, and low spontaneous-rate (HSR, MSR, and LSR) nerve fibers as demonstrated by Sumner et. al. [7, 18]. 
The output of each IHC frequency channel passed through the Meddis model is the available amount of neurotransmitter between the IHC and auditory nerve cell (ANC) synaptic cleft. Using a uniform random process, U(0, 1), we can create the spikes given a probability of spike occurrence dependent upon past and present stimuli. The time-varying movement of the BM, S(t), is the stimulus that directly affects the membrane permeability, k(t), in a nonlinear manner by applying the equation  g[S(t)+A]  S(t)+A+B for S(t) + A > 0 k(t) = (A.1)  0 otherwise Parameters g, A, and B can be set to increase or decrease this non-linear compression to achieve various dynamic ranges, maximum rate, and spontaneous spiking rates. Figure A.3. Block diagram of the Meddis IHC model. Adapted from [4, 5] From Figure A.3 we see that IHC neurotransmitter (i.e. glutamate) is man- ufactured by the factory on demand until full. The neurotransmitter available for 160 release, q(t), is transferred through the cell membrane into a synaptic cleft at a rate proportional to the cell permeability, k(t). q(t) is defined by the differential equation dq = y(M − q(t)) + x · w(t) − k(t) · q(t) (A.2) dt where M is the maximum amount of neurotransmitter available for uptake at any given time, y is the neurotransmitter production constant, w(t) is the reprocessing store for released neurotransmitter, and x is the constant that determines release rate of reprocessed neurotransmitter. The amount of neurotransmitter in the reprocessing store is also a differential equation defined as dw = r · c(t) − x · w(t) (A.3) dt where r is the rate constant of neurotransmitter re-uptake and c(t) is the total amount of neurotransmitter in the synaptic cleft. Lastly, c(t) is defined as dc = k(t) · q(t) − l · c(t) − r · c(t) (A.4) dt and the only new parameter is l, which is the amount of neurotransmitter lost. The intuition behind the Meddis model is that some transmitter is eventually lost into the surrounding fluid and the remaining amount is reabsorbed for repro- cessing. The reprocessing store acts as a temporary cache that releases broken down neurotransmitter molecules at a rate, x · w(t), so that it reflects the longer time delay before transmitter can be reused. The probability of firing a spike is directly propor- tional to the amount of neurotransmitter currently in the synaptic cleft, c(t). h is the number of vesicles in the synapse and mathematically is just a probability multiplier. A.2.1.4 Spike Refractory Equations Sumner et al. [7, 18] modeled the refractoriness in the inner hair cells by modifying the probability of an action potential by setting 161   1 − cτ e−(t−tl −RA )/sτ for (t − tl ) ≥ RA p(t) = . (A.5)  0 otherwise RA is an absolute refractory period, cτ is the maximum amount of relative refrac- toriness, and sτ is the exponentially decaying time constant for refractoriness. In addition, t is the current time instant and tl is the time of the last spike. This model replicates this method using the same parameters as in Sumner (2002), with RA = 0.75 ms, cτ = 0.55, and sτ = 0.8 ms. A.2.2 Cochlear Nucleus A.2.2.1 Leaky IaF Model Due to the nature of the frequency modulated (FM) sweeps used by the bat, it would be impossible to implement a classical feedforward or recurrent firing-rate neural network model [19] with neurons producing an average of only 1.2 spikes per echo [15]. Therefore, a model is required that accounts for a reasonable amount of accuracy in the neuronal spiking process. 
A good trade-off between model accuracy and complexity is the well known leaky IaF neuron model [19]. This model tracks the sub-threshold membrane potential of neurons by inte- grating the excitatory and inhibitory post-synaptic potentials (EPSPs and IPSPs) over time, and triggers a spike when the membrane potential reaches a predefined threshold. After a spike occurs, the membrane potential is set to an optional reset level below resting potential. This model requires three differential equations and two update equations. The membrane potential, V , is updated with an exponential decay time constant of τm as dV τm = Vrest − V + gex (t)(Eex − V ) + gin (t)(Ein − V ) (A.6) dt where Vrest is the inactive resting potential, Eex and Ein are the excitatory and in- 162 hibitory presynaptic potentials, and gex and gin are the excitatory and inhibitory synaptic conductances. The conductance is further defined by the simple differential equations dgex τex = −gex (A.7) dt dgin τin = −gin (A.8) dt and gex → gex (t) + ∆gex (A.9) gin → gin (t) + ∆gin (A.10) which relax exponentially with time constants τex and τin . Through this set of differ- ential equations, excitatory and inhibitory presynaptic events will trigger a temporary increase of the two independent synaptic conductances. This model expanded the LiF equations to include recurrent synaptic connec- tions between neurons in the same stage. This required adding an additional excita- tory synaptic transconductance, gre , with time constant, τre , as implemented just as in the equations for gex and gin above. When an LiF neuron fires an action poten- tial, an immediate increase in conductance is added to the recurrent synapses of all neurons within some distance, D. Matrix, R, was used to describe synaptic weights from the M neurons to each of the other M neurons. For example, Rij = 1 refers to a strong excitatory synapse from neuron j to neuron i, such that all action potentials generated by i will cause an excitatory post-synaptic potential (EPSP) in j. Likewise, Rij = −1 refers to a strong inhibitory synapse from neuron j to neuron i, such that all action potentials generated by i will cause inhibitory post-synaptic potential (IPSP) in j. To avoid artificial positive feedback, a self-connected neuron (i = j) should be 163 avoided. A.3 Results A.3.1 Auditory Stimuli The signals used in this model can generally consist of a synthetic or recorded bat echolocation transmission followed by 1 or 2 overlapping echoes, which constitute a “glint.” Figure A.4 shows the signals used for this particular example simulation. Note that overlapping FM waveforms exhibit spectral interference as a function of the delay (30 µs and 60 µs for the second and third signal, respectively). Figure A.4. Time series (left) and spectrogram (right) of a synthetic linear FM and 2 pairs of echoes. Note that the echoes have spectral notches introduced by the spectral interference pattern that correspond to two echoes spaced apart by only 30 and 60 µs, respectively. The cochlear model consists of the constant bandwidth gammatone filter bank. This filter bank creates a time-frequency representation that mimics how the cochlea splits up incident sound into multiple narrow-band channels. For demonstration, Fig- ure A.5 shows the magnitude and phase response of four of these filter channels. The time series signals in Figure A.6 are generated by running the signal from Figure A.4 through this filterbank. 
A.3 Results

A.3.1 Auditory Stimuli

The signals used in this model generally consist of a synthetic or recorded bat echolocation transmission followed by one or two overlapping echoes, which together constitute a "glint." Figure A.4 shows the signals used for this particular example simulation. Note that overlapping FM waveforms exhibit spectral interference as a function of the delay (30 µs and 60 µs for the second and third signal, respectively).

Figure A.4. Time series (left) and spectrogram (right) of a synthetic linear FM transmission and two pairs of echoes. Note that the echoes have spectral notches introduced by the interference pattern corresponding to two echoes spaced apart by only 30 and 60 µs, respectively.

The cochlear model consists of a constant-bandwidth gammatone filter bank. This filter bank creates a time-frequency representation that mimics how the cochlea splits incident sound into multiple narrow-band channels; a code sketch of this front end is given at the end of this section. For demonstration, Figure A.5 shows the magnitude and phase response of four of these filter channels. The time series in Figure A.6 are generated by running the signal from Figure A.4 through this filterbank. Note that the filterbank preserves the linear frequency modulated (LFM) sweep rate of the synthetic chirp.

Figure A.5. Magnitude and phase plot of 4 channels in a gammatone filterbank between 25 kHz and 100 kHz.

Figure A.6. Example gammatone filterbank output using the signal shown above, generated at 4 arbitrary frequencies. (left) Entire time sequence and (right) close-up of the transmission signal at 100 ms.

A.3.1.1 Meddis Auditory Peripheral Model

The results from the Meddis inner hair cell (IHC) model are shown in Figure A.7 for the same signal and filter bank shown in the last section. The membrane permeability, k(t), rises sharply as a function of the signal amplitude. This allows release of neurotransmitter from the free pool, q(t), into the synaptic cleft, c(t). Lastly, neurotransmitter is reprocessed, w(t), before it is placed back into the free pool. The output generated from the Meddis model is a matrix of spike trains, and the model allows a fanout of multiple nerve fibers per frequency channel. In Figure A.7 there are 4 frequency channels and a fanout of 10, creating 40 nerve fibers that are passed on to the cochlear nucleus stage.

Figure A.7. Internal states of the Meddis model (k, q, c, & w) in response to the stimulus in Figure A.4.

A.3.1.2 IaF Neurons

The cochlear nucleus stage uses leaky IaF neurons with feedforward or recurrent synaptic connections. Spike generation is highly dependent on the number and activity levels of presynaptic inputs, though parameters can be adjusted to obtain an appropriate input sensitivity. Figure A.9 demonstrates the case of 4 independent feedforward neurons accepting the presynaptic input shown in Figure A.8. As we can see, the spike timing is fairly accurate, but additional model validation needs to be performed before any statistical tests can be run on these results.

Figure A.8. P_spike (left) and the resulting spike trains (right) for 40 LSR auditory nerve fibers.

Figure A.9. Membrane potential (and spikes) of 4 IaF neurons.

The effect of recurrence on the network depends largely upon the amount of synaptic input from the auditory nerve stage. If the neurons are firing regularly due to synaptic bombardment, then recurrent synapses will not have a large effect. However, when the synaptic input is low enough to cause irregular spiking patterns, recurrent connections can trigger spikes in neurons that would otherwise remain below threshold (see Figure A.10 for an example).

Figure A.10. IaF neurons (M = 4) with random but overlapping synaptic input (N = 100). (left) With a strictly feedforward network, only one neuron fires an AP. (right) In this excitatory recurrent network, all three neighboring neurons receive a strong EPSP from the first neuron, which is just enough to cause APs in two of the three cells. Simulations were performed using identical pseudo-random generator seeds.

A.3.1.3 Integration with BiSCAT

BiSCAT is a biosonar simulation tool developed primarily in MATLAB. Its intended use is to quickly and efficiently compare various monaural and binaural auditory processing models and parameters at each stage of the auditory pathway. Figure A.11 shows the layout of the three tabbed panels of the GUI.

Figure A.11. Layout of each of the three tabbed panels in the BiSCAT GUI.
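As referenced above, the following sketch shows one way to realize a constant-bandwidth gammatone front end by direct FIR convolution with sampled gammatone impulse responses. The sample rate, center frequencies, bandwidth, and filter order are illustrative assumptions, not the values used to generate Figures A.5 and A.6.

```matlab
% Constant-bandwidth gammatone filterbank sketch via FIR convolution with
% sampled impulse responses g(t) = t^(n-1) exp(-2*pi*b*t) cos(2*pi*fc*t).
% Sample rate, channel placement, and bandwidth are assumptions.
fs = 500e3;  dt = 1/fs;
t  = (0:round(5e-3*fs)-1)' * dt;          % 5 ms impulse response support
fc = [25e3 50e3 75e3 100e3];              % channel center frequencies (assumed)
b  = 5e3;                                 % constant bandwidth parameter (assumed)
n  = 4;                                   % gammatone filter order

% Synthetic 3 ms downward LFM chirp from 100 kHz to 25 kHz, zero-padded
tc = (0:round(3e-3*fs)-1)' * dt;
k  = (25e3 - 100e3) / 3e-3;               % sweep rate in Hz/s
x  = [cos(2*pi*(100e3*tc + 0.5*k*tc.^2)); zeros(round(2e-3*fs),1)];

y = zeros(numel(x), numel(fc));
for ch = 1:numel(fc)
    g = t.^(n-1) .* exp(-2*pi*b*t) .* cos(2*pi*fc(ch)*t);
    g = g / max(abs(g));                  % rough per-channel normalization
    y(:,ch) = filter(g, 1, x);            % narrow-band channel output
end
% Each column of y is one cochlear channel, analogous to Figure A.6.
```

Because b is held fixed across channels rather than scaled with fc, the bank is constant-bandwidth, which is what preserves the LFM sweep rate across channels noted above.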
A.4 Discussion

The auditory system is one of the few places in the brain where the role of precise timing is definitive. Acoustic waves are naturally abundant, but they typically carry information in relatively sparse packets. For animals to extract any useful information from sound, the auditory system must therefore respond within a very short time scale. Auditory neurons, particularly in the CN, are tuned to extremely small time windows, on the order of one-tenth the width of an individual spike. For these reasons, the auditory systems of mammals, birds, and insects are ideal animal models for exploring the function and dynamics of spiking neural networks.

The model presented in this chapter shows an example of coincidence detection neurons based upon a detailed biophysical model of the cochlea and a subsequent network of IaF neurons with locally recurrent connections to adjacent frequency channels. The model is oversimplified in several ways. First, there are numerous tunable parameters in the Meddis model that adjust the sensitivity of auditory nerve cells and membrane permeability. These parameters have never been measured in bats, so the model is prone to error and over-fitting. The Heil model requires fewer parameters than the Meddis model, yet still encodes the onset response of acoustic waves [20, 21]. This approach has been taken by Reijniers and Peremans [22]; however, that study focused on passive localization cues with a general mammalian anatomy rather than active echolocation.

Another important simplification is the use of IaF neurons in the subsequent neural network layer. These models cannot account for realistic spiking dynamics, because the biological neuron dynamics are absent from the basic set of differential equations governing subthreshold behavior. By replacing the IaF network with an Izhikevich model [23, 24], a broader range of spike timing behavior can be accounted for; a sketch of this model is given at the end of this discussion. One of the benefits of the Izhikevich model is that, unlike the biophysical Hodgkin-Huxley model, the exact concentrations and types of chemical neurotransmitters need not be known. Instead, the realistic dynamics of auditory neurons can be measured empirically using patch-clamp methods on a sub-population of neurons in the CN and elsewhere in the auditory brainstem, and then translated directly to the neural network in the form of generalized parameters for the reduced-order differential equations. Even without physiological measurements, the Izhikevich neural network is a more appropriate model than IaF for studying neural information processing via precise onset spike timing.

The interconnectivity of neurons within the CN is highly complex, and the function of each type of neuron has yet to be identified [25]. Although octopus and bushy cells have been shown to perform coincidence detection, the degree of connectivity within and across broader frequency bands (e.g., T-stellate and multipolar cells in the ventral CN) is unknown at this time. The morphology and associated computational function become even more convoluted within the IC, because many different afferent and efferent fibers connect through this neural complex [26, 14]. The present model assumes a very primitive form of a coincidence detection network of homogeneous cells. Although cells are frequency selective due to the bandpass filter bank mimicking the cochlea, no automatic training is performed on the cells' synaptic weights. Even the most realistic neuron model must have realistic connections, including the degree of connectivity within and between the various cell layers as well as an accurate description of dendritic delay (e.g., delay-sensitive octopus cells). As techniques for probing cell morphology improve, so too will the capability of computational neuroscience to replicate these auditory networks in simulation and silicon.
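As referenced above, the sketch below implements the standard two-variable Izhikevich equations [23, 24] for a single neuron. The (a, b, c, d) values are the published regular-spiking defaults, and the step input current is an arbitrary assumption; neither is fitted to CN physiology.

```matlab
% Single Izhikevich neuron, forward-Euler sketch of the reduced-order model
% [23, 24]. Parameters are the published regular-spiking defaults; the input
% current is an arbitrary assumption, not a fit to CN recordings.
a = 0.02; b = 0.2; c = -65; d = 8;        % regular-spiking parameter set
dt = 0.1;                                 % time step (ms)
T  = 0:dt:200;                            % 200 ms simulation
I  = 10 * (T >= 50 & T <= 150);           % step input current (assumed)

v = -65;  u = b*v;                        % initial membrane state
vtrace = zeros(size(T));  spikes = false(size(T));
for n = 1:numel(T)
    % Quadratic spike initiation plus slow recovery variable u
    v = v + dt*(0.04*v^2 + 5*v + 140 - u + I(n));
    u = u + dt*a*(b*v - u);
    if v >= 30                            % spike peak cutoff and reset
        vtrace(n) = 30;  spikes(n) = true;
        v = c;  u = u + d;
    else
        vtrace(n) = v;
    end
end
fprintf('Fired %d spikes\n', nnz(spikes));
```

Other published (a, b, c, d) sets reproduce bursting, chattering, and onset-type responses, which is what makes this reduced-order model attractive for matching patch-clamp data from CN neurons.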
References

[1] J. E. Gaudette and J. A. Simmons, “Modeling of precise onset spike timing for echolocation in the big brown bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 127, 1861 (2010).

[2] J. C. R. Licklider, “A duplex theory of pitch perception”, Experientia 7, 128–134 (1951).

[3] P. Joris, P. Smith, and T. Yin, “Coincidence detection in the auditory system: 50 years after Jeffress”, Neuron 21, 1235–1238 (1998).

[4] R. Meddis, “Simulation of mechanical to neural transduction in the auditory receptor”, J. Acoust. Soc. Am. 79, 702–711 (1986).

[5] R. Meddis, “Simulation of auditory-neural transduction: Further studies”, J. Acoust. Soc. Am. 83, 1056–1063 (1988).

[6] M. Hewitt and R. Meddis, “An evaluation of eight computer models of mammalian inner hair-cell function”, J. Acoust. Soc. Am. 90, 904 (1991).

[7] C. Sumner, E. Lopez-Poveda, L. O’Mard, and R. Meddis, “A revised model of the inner-hair cell and auditory-nerve complex”, J. Acoust. Soc. Am. 111, 2178–2188 (2002).

[8] C. J. Sumner, R. Meddis, and I. M. Winter, “The role of auditory nerve innervation and dendritic filtering in shaping onset responses in the ventral cochlear nucleus”, Brain Res. 1247, 221–234 (2009).

[9] E. Lopez-Poveda, “Spectral processing by the peripheral auditory system: Facts and models”, Int. Rev. Neurobiol. 70, 7–48 (2005).

[10] R. Meddis, “Auditory-nerve first-spike latency and auditory absolute threshold: A computer model”, J. Acoust. Soc. Am. 119, 406–417 (2006).

[11] S. Haplea, E. Covey, and J. Casseday, “Frequency tuning and response latencies at three levels in the brainstem of the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 174, 671–683 (1994).

[12] S. Dear and N. Suga, “Delay-tuned neurons in the midbrain of the big brown bat”, J. Neurophysiol. 73, 1084–1100 (1995).

[13] H. L. Hawkins, T. A. McMullen, A. N. Popper, and R. R. Fay, eds., Auditory Computation, volume 6 of Springer Handbook of Auditory Research (Springer, New York) (1995).

[14] M. Sanderson and J. Simmons, “Neural responses to overlapping FM sounds in the inferior colliculus of echolocating bats”, J. Neurophysiol. 83, 1840–1855 (2000).

[15] M. Sanderson and J. Simmons, “Target representation of naturalistic echolocation sequences in single unit responses from the inferior colliculus of big brown bats”, J. Acoust. Soc. Am. 118, 3352–3361 (2005).

[16] A. R. Moller, Hearing: Anatomy, Physiology, and Disorders of the Auditory System, 2nd edition (Academic Press, Burlington, MA) (2006).

[17] J. Reijniers and H. Peremans, “On population encoding and decoding of auditory information for bat echolocation”, Biol. Cybern. 102, 311–326 (2010).

[18] C. Sumner, E. Lopez-Poveda, L. O’Mard, and R. Meddis, “Adaptation in a revised inner-hair cell model”, J. Acoust. Soc. Am. 113, 893–901 (2003).

[19] P. Dayan and L. Abbott, Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems (MIT Press, Cambridge, MA) (2001).

[20] P. Heil and D. Irvine, “First-spike timing of auditory-nerve fibers and comparison with auditory cortex”, J. Neurophysiol. 78, 2438–2454 (1997).

[21] P. Heil, “First-spike latency of auditory neurons revisited”, Curr. Opin. Neurobiol. 14, 461–467 (2004).

[22] B. Fontaine and H. Peremans, “Bat echolocation processing using first-spike latency coding”, Neural Networks 22, 1372–1382 (2009).

[23] E. M. Izhikevich, “Hybrid spiking models”, Philos. T. Roy. Soc. A 368, 5061–5070 (2010).
[24] E. M. Izhikevich, Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting (MIT Press, Cambridge, MA) (2007).

[25] D. Oertel and E. Young, “What’s a cerebellar circuit doing in the auditory system?”, Trends Neurosci. 27, 104–110 (2004).

[26] E. Covey and J. Casseday, “Timing in the auditory system of the bat”, Annu. Rev. Physiol. 61, 457–476 (1999).