Potential landscape perspectives on roaming: Insights on formaldehyde from geodesic paths by D. Vale Cofer-Shabica Sc. B., Brown University, 2009 A dissertation submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in the Department of Chemistry at Brown University Providence, Rhode Island May 2018 © Copyright 2017, 2018 by D. Vale Cofer-Shabica This dissertation by D. Vale Cofer-Shabica is accepted in its present form by the Department of Chemistry as satisfying the dissertation requirement for the degree of Doctor of Philosophy. Date Richard M. Stratt, Director Recommended to the Graduate Council Date Peter M. Weber, Reader Date Brenda M. Rubenstein, Reader Approved by the Graduate Council Date Andrew G. Campbell Dean of the Graduate School iii Vita Dylan Vale Cofer-Shabica was born in Miami, Florida, in January of 1987. In addition to understanding and helping others to understand new things, he loves to bike, to cook, and to make music—most often by singing. He aspires to leave the world a better place than he found it. Education Ph.D. Physical Chemistry Brown University, Providence, RI, 2012–2018 Sc.B. Chemical Physics Brown University, Providence, RI, 2005–2009 Publications, Talks, & Posters D. V. Cofer-Shabica, R. M. Stratt. What is special about how roaming chemical reactions traverse their potential surfaces? Differences in geodesic paths between roaming and non-roaming events. The Journal of Chemical Physics 2017, 146, 214303 J. M. Budarz, M. P. Minitti, D. V. Cofer-Shabica, B. Stankus, A. Kirrander, J. B. Hastings, P. M. Weber. Observation of femtosecond molecular dynamics via pump-probe gas phase x-ray scattering. Journal of Physics B: Atomic Molecular and Optical Physics 2016, 49 D. V. Cofer-Shabica, R. M. Stratt, The geometries of potential energy landscapes imply dynamical signatures for roaming reactions, American Chemical Society, 250th National Meeting, Boston, MA, 2015, PHYS 554 M. P. Minitti, J. M. Budarz, A. Kirrander, J. Robinson, T. J. Lane, D. Ratner, K. Saita, T. Northey, B. Stankus, V. Cofer-Shabica, J. Hastings, P. M. Weber. Toward structural iv femtosecond chemical dynamics: Imaging chemistry in space and time. Faraday Discussions 2014, 171, 81–91 D. V. Cofer-Shabica, Wandering Molecules, Brown University, Research Matters, Provi- dence, RI, 2014. Invited Talk. https://www.youtube.com/watch?v=X3xyMP9EAco Awards Brown University Department of Chemistry Providence RI Elaine Chase Award for Leadership and Service 2017 William T. King Prize for Teaching 2014 Teaching Fellow 2013, 2014, 2015 Service Brown University Providence, RI Diversity and Inclusion Action Committee 2016–2018 WE Teach STEM Discussion Group 2015–2018 Stand Up for Graduate Student Employees 2013–2017 Blackstone Academy High School Pawtucket, RI Exhibition Night Judge 2013–2018 Teaching Graduate Teaching Assistant 2012–2015 Brown University Providence, RI Math Teacher 2009–2011 Blackstone Academy Charter School Pawtucket, RI The Metropolitan Regional Career and Technical Center Providence, RI Affiliations Member of the American Chemical Society 2015–Present v Acknowledgments The completion of my doctorate marks the end of my second degree at Brown—the university for which I moved to and made a home of Providence, Rhode Island. This place is the only in which I have lived as an adult and, as such, it has shaped me greatly. More, even, than Brown or Providence, I have been shaped, mentored, loved, helped-along, and cared-for by many people. Without them I would not be writing nor would I have had the ability to complete my degree. To all those named below and to those who I do not name, I extend my great and heartfelt thanks. My work with Richard Stratt made my graduate education. He drove me to render half-baked ideas concrete and to consider carefully the linkages in the chains of my arguments. Rich shared his commitments to teaching and, I hope, passed on some of the wisdom he has accumulated about how to tell a scientific story. How rare and how lucky to have a mentor who cares as much about word-choice and flow as I. I am grateful beyond measure for all that he has taught me. All the members of Rich’s group contributed to my education—providing me with ideas and with places to try out new thoughts. I am particularly indebted to Layne Frechette for his friendship and for his ideas during the early parts of my project. The other members of my committee all played important parts in my development as scientist and teacher. Peter Weber took me in and employed me when I was wounded and wondering what I would do with myself. Brenda Rubenstein has been a much appreciated advocate and ally in the department. Matthew Zimmt listened and acted to make space for me when I told him of teaching assignments, “We can do better.” Even though he passed away the year before I completed my degree, Jimmie Doll was a force throughout my education: he taught my first course in chemistry as well as my last three. I am grateful for his encouragement and his stories and his gentleness when I vi asked him about results that were not even wrong. The staff of the Chemistry department has done so much to make my decade-plus stay productive and warm. Ken Talbot and Randy Goulet gave my bike a place in the machine shop even after I became a theorist. Rose Barreira, Sheila Quigley, and Lynn Rossi all helped me put out fires in my personal and academic life. Margaret Doll, Carol Defeciani, and David Blair shepherded our cluster of computers. Ana Maria Fortes made sure that our offices were clean. I am especially grateful for my friendship with Eric Ferrara. My relationships with all the members of Stand Up for Graduate Student Employees (SUGSE) have given me meaning and purpose during my time at Brown. Struggling with you to make Brown a better place to live and work for everyone has been an honor and privilege and taught me more about how to be in community and to work for change than I could have imagined. I want to thank Kyle Trenshaw for his constant support and friendship, and for founding and inviting me to participate in WE-Teach-STEM—a much needed space for discussing the patriarchy and other systems of domination in the academy. I learned so much by listening. Christopher Brook and Carolyn Trombi literally kept me in fighting shape and helped me hold to what I pass off as sanity in our bi-weekly Tae Kwon Do class. Your friendship has sustained me through many storms of the heart and mind. The Providence Sacred Harp singers have given me community to call home and a place to sing. My soul soars with our voices. In addition to sharing songs, Jesse Polhemus, Laura Hodges, and Mike Richards have become close friends and confidants by sharing of themselves. Tara Mulder and I loved each other; she supported me and made sure I completed my degree when I thought that I might leave. Cassie Houtz is an unexpected joy who fell into my life and made it possible to finish by feeding me bodily and emotionally. I am immensely thankful for my family; without their encouragement I would be with nothing. My parents, Nancy and Stephen Cofer-Shabica, have supported me more than I can express since I left home. Knowing that I have their confidence is one of my deepest sources of strength. Molly Shabica and Rachel Shabica are each amazing sisters who taught me much and cheered me along the way. In the last three years of vii my degree, my relationship with Betsy Cofer-Shabica, my youngest sister, grew-up and became one of mutual care. Thank you for always being ready for a phone-cry and for being a conduit to my playful and silly self. This dissertation is dedicated to my grandparents. Grandpa was an organic chemist and a story-teller who fed my early love of science and of art. Grandma taught me to bake and gave me the voice with which I sing. Granny spoiled me rotten and knew my favorite dessert even as the penultimate of thirteen grandchildren. Granddaddy was a professor of geology, whose sharp mind continues to inspire and who has encouraged me to think broadly about how to play a part in the healing of the world. Thank you. viii Contents List of Tables xii List of Figures xiii 1 Introduction 1 1.1 Organization of this document . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Roaming and formaldehyde . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.1 Roaming observed in a broad class of systems . . . . . . . . . . 5 1.2.2 Previous theoretical efforts towards explaining roaming . . . . 7 1.3 Geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.3.1 An idea from condensed systems . . . . . . . . . . . . . . . . . 10 1.3.2 Why geodesics for roaming formaldehyde? . . . . . . . . . . . 11 2 Methods 13 2.1 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.1.1 Formaldehyde . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.1.2 The potential energy landscape ensemble . . . . . . . . . . . . 16 2.2 Geodesic paths on a molecular energy surface . . . . . . . . . . . . . 17 2.2.1 Center of mass conservation . . . . . . . . . . . . . . . . . . . 22 2.2.2 Geodesic parameters . . . . . . . . . . . . . . . . . . . . . . . 31 2.2.3 Trapped at a fixed point in 2 dimensions . . . . . . . . . . . . 34 2.3 Defining and sampling regions of interest . . . . . . . . . . . . . . . . 36 2.3.1 Monte Carlo for microcanonical reactants . . . . . . . . . . . . 37 2.3.2 Molecular dynamics for products . . . . . . . . . . . . . . . . 38 2.3.3 Monte Carlo random walk for the roaming region . . . . . . . . 39 ix 2.3.4 Geodesic itineraries . . . . . . . . . . . . . . . . . . . . . . . . 42 2.4 Analyzing geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 2.4.1 Total lengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 2.4.2 Diatomic Vibration . . . . . . . . . . . . . . . . . . . . . . . . 44 2.4.3 Diatomic Rotation . . . . . . . . . . . . . . . . . . . . . . . . . 45 2.4.4 Boundary fraction . . . . . . . . . . . . . . . . . . . . . . . . . 46 2.5 Estimating geodesic lengths . . . . . . . . . . . . . . . . . . . . . . . . 47 2.5.1 Total Kinematic Length . . . . . . . . . . . . . . . . . . . . . . 47 2.5.2 Vibrational Kinematic Length . . . . . . . . . . . . . . . . . . . 49 2.6 A search for saddles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 2.6.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 2.7 Numerical methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 2.7.1 With respect to integration . . . . . . . . . . . . . . . . . . . . 52 2.7.2 Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 2.7.3 Hessian matrices & vibrational frequencies . . . . . . . . . . . 54 2.8 Computing the entropy of a sampled distribution . . . . . . . . . . . . 58 3 Results and Discussion 60 3.1 An image of the formaldehyde potential . . . . . . . . . . . . . . . . . 61 3.2 Locating a 2nd order saddle on the Bowman energy surface . . . . . . 63 3.2.1 Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 3.2.2 Search results . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 3.2.3 No “roaming saddle” to be found . . . . . . . . . . . . . . . . 66 3.3 Observations from molecular dynamics . . . . . . . . . . . . . . . . . 69 3.3.1 Bird’s nest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 3.3.2 Molecular dynamics branching ratios . . . . . . . . . . . . . . 74 3.3.3 An estimate of the energy-dependent incubation period . . . . 78 3.3.4 Existence of high JCO roamers . . . . . . . . . . . . . . . . . . 81 3.4 Geodesics are consistent with estimate . . . . . . . . . . . . . . . . . . 84 3.5 Geodesics are inherent dynamics . . . . . . . . . . . . . . . . . . . . 86 3.6 Observations from geodesics . . . . . . . . . . . . . . . . . . . . . . . 90 3.6.1 Implications of the bird’s nest . . . . . . . . . . . . . . . . . . 91 3.6.2 Key observations at a single energy . . . . . . . . . . . . . . . 92 x 3.6.3 The null effect of dynamical endpoint correlation . . . . . . . . 98 3.6.4 Trends with increasing landscape energy . . . . . . . . . . . . 100 3.6.5 Convergence of entropy of distributions with energy . . . . . . 108 3.7 Roaming as an entropically mediated phenomenon . . . . . . . . . . . 112 4 Concluding remarks 113 4.1 New methods, new observations . . . . . . . . . . . . . . . . . . . . . 113 4.2 Future directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 Bibliography 116 A Derivation of path-integral form of the equation for free diffusion 125 A.1 Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 A.2 Unconditional minimization . . . . . . . . . . . . . . . . . . . . . . . 129 B Two more ways of preserving the center of mass during escape steps 133 B.1 Swatting a fly without moving: Follow the inverse-mass-weighted gradient134 B.2 Laser-cannon: A general projection . . . . . . . . . . . . . . . . . . . 137 C Microcanonical sampling for the anharmonic oscillator 143 C.1 Normal Mode Hamiltonian . . . . . . . . . . . . . . . . . . . . . . . . 143 C.2 Construction of Normal Mode Coordinates . . . . . . . . . . . . . . . 145 C.3 Corrections for Anharmonicity and Selection of Total Angular Momentum147 D Monte Carlo sampling the landscape ensemble 150 D.1 General requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 D.2 Implementation specifics . . . . . . . . . . . . . . . . . . . . . . . . . 152 E Integrating externally-supplied FORTRAN code 157 E.1 Form of switching functions . . . . . . . . . . . . . . . . . . . . . . . 164 ? Parts of this dissertation are reproduced from D. V. Cofer-Shabica, R. M. Stratt. What is special about how roaming chemical reactions traverse their potential surfaces? Differences in geodesic paths between roaming and non-roaming events. The Journal of Chemical Physics 2017, 146, 214303 with the permission of AIP Publishing. xi List of Tables 2.1 Atomic masses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.2 Comparison of step sizes in geodesic algorithm . . . . . . . . . . . . . 32 2.3 Microcanonical and landscape energies . . . . . . . . . . . . . . . . . 33 2.4 Roaming region constants . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.1 Properties of stationary points on the Bowman surface . . . . . . . . . 65 3.2 Coordinates of second order saddle on Bowman’s surface . . . . . . . 67 3.3 Coordinates of global minimum on Bowman’s surface . . . . . . . . . 67 3.4 Coordinates of cis-HCOH minimum on Bowman’s surface . . . . . . . 67 3.5 Coordinates of trans-HCOH minimum on Bowman’s surface . . . . . . 68 3.6 Coordinates of transition state for cis-trans isomerization . . . . . . . . 68 3.7 Coordinates of H2 CO −−→ trans-HCOH transition state . . . . . . . . 68 3.8 Coordinates of tight transition state on Bowman’s surface . . . . . . . . 68 3.9 Coordinates of proposed roaming saddle . . . . . . . . . . . . . . . . 69 3.10 Dispositions of molecular dynamics trajectories with energy . . . . . . 75 3.11 Fraction of roaming trajectories with high JCO . . . . . . . . . . . . . . 83 3.12 Success rate of computed geodesics by channel and energy . . . . . . 91 3.13 Comparison to dynamically correlated (paired) geodesics . . . . . . . . 99 3.14 Mean total kinematic length for geodesics by channel and energy . . . 103 3.15 Portion of identically zero boundary fraction vs. energy . . . . . . . . 107 3.16 Mean boundary fraction vs. energy . . . . . . . . . . . . . . . . . . . 107 3.17 Entropy of total length distributions vs. energy . . . . . . . . . . . . . . 108 D.1 Energy-specific acceptance ratios and seed configurations . . . . . . . 156 xii List of Figures 1.1 Schematic representation of roaming . . . . . . . . . . . . . . . . . . . 3 1.2 Geodesics on a simple potential energy surface . . . . . . . . . . . . . 9 1.3 Geodesics on a simple potential energy surface . . . . . . . . . . . . . 11 2.1 Energy scales for formaldehyde energy surface . . . . . . . . . . . . . 15 2.2 The potential energy landscape ensemble . . . . . . . . . . . . . . . . 17 2.3 Free and escape steps wind around boundaries . . . . . . . . . . . . . 20 2.4 Müller-Brown potential . . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.5 Fixed point on the Müller-Brown potential . . . . . . . . . . . . . . . . 36 2.6 Hessian parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3.1 A projection of the potential surface . . . . . . . . . . . . . . . . . . . 62 3.2 Orientation to the H2 CO surface . . . . . . . . . . . . . . . . . . . . . 71 3.3 Six roaming trajectories . . . . . . . . . . . . . . . . . . . . . . . . . . 73 3.4 Molecular dynamics channels (reactive) . . . . . . . . . . . . . . . . . 76 3.5 Molecular dynamics channels . . . . . . . . . . . . . . . . . . . . . . 77 3.6 Rates of reaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 3.7 Failure of statistical distribution of reaction times . . . . . . . . . . . . 80 3.8 A “cold” shoulder in the JCO distribution . . . . . . . . . . . . . . . . 82 3.9 Roamers exhibiting high JCO . . . . . . . . . . . . . . . . . . . . . . . 84 3.10 Distribution of lengths over full geodesic paths . . . . . . . . . . . . . 85 3.11 Inherent dynamics pictograph . . . . . . . . . . . . . . . . . . . . . . 87 3.12 Geodesic as inherent dynamics of formaldehyde . . . . . . . . . . . . 89 3.13 Total length distributions . . . . . . . . . . . . . . . . . . . . . . . . . 94 3.14 Component length distributions . . . . . . . . . . . . . . . . . . . . . 95 xiii 3.15 Boundary fraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 3.16 Total length distributions in the high-energy limit . . . . . . . . . . . . 101 3.17 Convergence of total length distributions . . . . . . . . . . . . . . . . 102 3.18 Component length distributions in the high-energy limit . . . . . . . . 104 3.19 Boundary fraction in the high-energy limit . . . . . . . . . . . . . . . . 105 3.20 Convergence of distribution of boundary fractions . . . . . . . . . . . 106 3.21 Entropy of total length distributions . . . . . . . . . . . . . . . . . . . 110 3.22 Entropy of component length distributions . . . . . . . . . . . . . . . . 111 A.1 Discretizing a Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 B.1 A projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 xiv CHAPTER 1 INTRODUCTION Roaming is a novel reaction mechanism [6] characterized by fragments that take long excursions, far from the reacting center before returning and dissociating. This strange dynamical feature turns out to occur in a large number of systems [7] and has generated considerable experimental and theoretical effort around the detection and explanation of the phenomenon in the last decade. Formaldehyde, H2 CO—being one of the simplest systems to exhibit roaming and being the first known to do so—provides us with an attractive arena within which to understand this novel dynamical phenomenon. The geodesic formalism [8] is a tool to understand the inherent dynamics of a classical system by characterizing the most efficient pathways through the energy landscape. The inherent dynamics of a system are the dominant, most important modes of motion—instead of describing the fine details, the inherent dynamics give the overall behaviors and sequencing of events. Likewise, geodesics, rather than being concerned with the enumeration of minima and transition states between them, provide a global view of the potential surface. As theorists, one of our primary tasks is to tell stories that provide durable intuition to those who would understand or reason about the systems we study. Oftentimes, the plots of those stories are driven by experimental observations and/or insights gained through analytic equations or numeric calculations. The work presented here is another attempt to tell the story of roaming as it manifests in formaldehyde. Experiments by van Zee and co-workers [9] first hinted at a hole in our understanding of formaldehyde’s photodissociation dynamics; extant theories of reaction dynamics, such as transition 1 2 state theory [10], neither account for their observations nor enable us to reason about the roaming mechanism. We use the theoretical framework provided by geodesics to make a new window into the inherent dynamics of roaming in formaldehyde. 1.1 Organization of this document In the thesis that follows, we make a thorough investigation of roaming in the context of formaldehyde using geodesics as our primary lens. The present chapter resumes by providing background on roaming in the context of formaldehyde and other notable systems. We also survey the various theoretical thrusts towards understanding roaming in general. Moving on, we then discuss the group’s development of geodesics as a tool for understanding the inherent dynamics of condensed systems and make an argument for why geodesics in a small-molecule system might be applicable to the problem of roaming. Chapter 2 details our methods of study and analysis, beginning with the models we use and assumptions we make about formaldehyde and roaming. We then move to a description of geodesics in the context of a small molecule, discussing new wrinkles and our resolutions to them. In Chapter 2, we also describe the molecular dynamics and Monte Carlo methods used to sample regions of the formaldehyde potential. The chapter concludes with a description of several of the numerical methods used throughout our studies. In Chapter 3, we present and discuss our major results. We begin by looking through the frame of stationary points and molecular dynamics, but quickly exhaust their utility and move on to geodesics. Our discussion first takes up a single energy, as in our paper [1], and then moves to an energy-dependent investigation. Our observations lead us to conclude that the roaming mechanism is an entropically mediated phenomenon in formaldehyde. We discuss the implications of this conclusion. The final chapter, 4, is summative. We recapitulate the major findings as well as the technical developments that enable our study. We also make suggestions about future directions for research. 3 Figure 1.1: Schematic representation of roaming Roaming is a route to products that involves a fragment taking long excursions from a dissociating parent before returning and abstracting another component (often, but not exclusively, an atom). In the case of formaldehyde, roaming provides another route to close-shelled molecular products when excited near the radical dissociation threshold. A single hydrogen may “roam” up to several angstroms away from the formyl parent for several picoseconds before returning and abstracting another hydrogen: H2 CO −−→ HCO···H −−→ H2 + CO. 1.2 Roaming and formaldehyde Because of its relative simplicity, ease of handling, and relevance to fundamental reaction mechanics, formaldehyde, H2 CO, has served as a test bed for theorists and experimentalists interested in fragmentation dynamics for many years; indeed by the early 1980s, formaldehyde was already considered a well beaten horse [11]. The great amount of scrutiny to which the four-atom molecule has been subjected makes it perhaps surprising that a totally unknown reaction mechanism lay hidden amongst its dynamics1 for so long. In the mid 1990s, van Zee and co-workers presaged roaming after they observed a strange shift as a function of energy in the distribution of the product rotational states of CO when photolyzing H2 CO under collisionless conditions near 1 We will address the following point later in this chapter, but it is worth indicating now that in most of the experiments treating formaldehyde, H2 CO is excited to its first electronic state, S1 , which then non-radiatively decays to the ground state, S0 , where vibrationally hot, close-shelled products are formed. 4 the radical dissociation threshold [9]. Dissociation via a slightly energetically higher vibrational state, yielded a fraction of substantially slower CO rotation. The “cold shoulder” that appeared in the distribution of JCO product states when exciting near the radical threshold suggested to them the opening of a second channel to closed-shell molecular products. It was not until over a decade later, in a joint theoretical-experimental effort, that Townsend and colleagues [6] revealed how bizarre the origin of the low-energy bump in van Zee’s distribution was. Near the radical dissociation threshold, one of the hydrogens may be ejected from the parent, but not actually depart. This roaming hydrogen may linger several angstrom’s away for up to several picoseconds before returning and abstracting another hydrogen. Figure 1.1 shows this schematically. In departing by this route, the H2 is typically highly vibrationally excited and, as a consequence of energy conservation, the CO is left with much less internal energy. Because the excited vibronic states of CO are not heavily populated, most of the energy comes from rotation; hence the greater population in low JCO states. The authors of that first paper characterizing roaming [6] immediately recognized that the phenomenon was not at all mediated by the well-known tight transition state to molecular products2 . Aside from the great strangeness of the mechanism, a major piece of evidence was that molecular dynamics studies initiating trajectories from the transition state to molecular products revealed no evidence of roaming [13, 14]. If roaming did proceed via the transition state, surely these studies would have encountered the mechanism. A note about energy surfaces and scales: After excitation to the first excited elec- tronic state, H2 CO decomposes into either molecular (scheme 1.1) or radical products (scheme 1.3). Molecular products are formed on the electronic ground state, S0 , and radical products on the S0 and T1 (first excited triplet) states [15]. Roaming to molecular products (scheme 1.2) occurs on the S0 surface [6]. Roaming, however, is not the primary channel for the formation of molecular products in formaldehyde. Estimates range from 15% to 40% as a function of energy [16]; but, roaming remains an ill-defined 2 We refer to a “tight” transition state in the sense that it involves the formation of an activated complex with an energetic barrier [12]. 5 process and these estimates should be taken as order-of-magnitude only. Experimen- tally, the onset of roaming is somewhere between 100 cm–1 and 200 cm–1 below the the radical dissociation threshold [17]. The relevant experimental energy scales3 for processes on the S0 surface are given here: H2 CO −−→ H2 + CO (direct) 27720 cm–1 [9] (1.1) −−→ HCO···H −−→ H2 + CO (roaming) ≈ 30200 cm–1 [17] (1.2) −−→ H · + HCO · (radical) 30328 cm–1 [18] (1.3) Townsend and colleagues [6] demonstrated that roaming in formaldehyde occurs on a single ground electronic state under classical dynamics and microcanonical initial conditions with total angular momentum of 0. Therefore, even though quantum effects surely play a role in dynamics of energy transfer leading up to roaming, roaming can be understood from a purely classical perspective. This classical viewpoint will be our frame of reference through the investigation. 1.2.1 Roaming observed in a broad class of systems While roaming was first observed and characterized in formaldehyde, researchers quickly sought and found many other systems where roaming occurs [7]; within the first decade of observation, over 20 systems were reported to roam [19]. Many early investigations identified roaming by multi-component product state distributions as in formaldehyde. Out of all the systems, a collection of characteristics of roaming fragments came together: large amplitude, wandering motion on a flat region of potential and time- scale separation between that motion and the other vibrations of the system. Roaming has been reported in a wide range of unimolecular as well as bimolecular reactions. Here we briefly describe a few of the systems exhibiting roaming to give a sense of the great plurality of systems that roam. Acetaldehyde, CH3 HCO—which is simply formaldehyde with a hydrogen replaced by was a methyl group, CH3 —was a natural second place to find roaming [20]. Here the relatively massive methyl group roams away from the formyl before abstracting another 3 After this chapter, our investigation will be exclusively based on a high-accuracy, full-configuration- space, model potential for formaldehyde. The experimental values given here are connected to energies on that surface by the addition of the zero point energy; see Section 2.1.1. 6 hydrogen (though rarely a hydrogen will roam and abstract the methyl). Roaming seems to be the dominant channel for formation of molecular products in acetaldehyde and can account for as much as 80% of the flux [21]. Roaming on multiple electronic surfaces was first reported on the D0 and D1 surfaces of NO3 [22, 23, 24]. These authors also report that the NO3 −−→ NO + O2 reaction proceeds exclusively by roaming [23]. It seems that many roaming systems also feature radical-radical abstraction path- ways [7] and at least one author has speculated that wherever there exists such a barrierless route to abstraction leading to excited, closed-shell products, roaming will be present too [25]. This idea was indeed borne out in a large number of H + Cn H2n+1 alkane reactions [26]. Even though roaming can be described in purely classical terms, much speculation has been made about the quantum manifestations of roaming. There are not enough studies to address this point in generality, but the roaming mediated reaction, H + MgH −−→ Mg + H2 offers at least one finding: Just below break-up energy of the complex, the vibrational wave functions of bound states seemed to be highly delocal- ized [27]. This is not at all surprising given the great variety of roaming pathways seen in other systems [17]. “Roaming” or something like it has even been reported in the liquid phase [28]. Mereshchenko et al. claim that CHBr3 exhibits a roaming-mediated isomerization to BrHCBr – Br while solvated. In this system, “roaming” occurs on a 100 fs time-scale, with large amplitude motion and on a flat region of the potential energy surface. While 100 fs is an order of magnitude faster than roaming in most of the other systems described here, the other properties: flat potential and unpredictable wandering are certainly consistent. Roaming has been presented as a failure of transition state theory and a challenge to our understanding of reaction dynamics in general [7, 19]. Indeed roaming does present such an opportunity and has rightly inspired considerable interest in a theoreti- cal framework for understanding the mechanism—the present work is no exception. However, we would be remiss if we failed to point out that roaming is only another example in a long history of systems the dynamics of which are not well understood in terms of barriers and extrema [25, 29]. We suspect that a broader perspective on the 7 energy landscape is necessary to provide insight into the dynamics of these systems. 1.2.2 Previous theoretical efforts towards explaining roaming Previous attempts to provide a theoretical explanation for roaming can be broadly divided into two camps based on whether they take a local or global perspective on the configuration- or phase-space. Local descriptions often rely on the identification of a reaction coordinate and/or a dividing surface and then make some kind of transition state calculation to compute a rate or branching ratio. Global methods seek to apprehend the large scale (phase- or configuration-space) structures that guide the dynamics of roaming. In this section we will give a brief exposition of the major ideas advanced in each camp. “Roaming transition states” Klippenstein, Harding, and colleagues have done considerable work to advance a statistical, transition state theory of roaming [30]. They were the first group to identify a “roaming” transition state, or a saddle point resembling roaming, in formaldehyde and acetaldehyde [31]. In both cases, the saddles were located 30–70 cm–1 below the radical dissociation threshold. They present a multi-path transition state theory with transition states constructed to take into account dynamical effects as well as the features of the energy surface. Their predictions compare favorably to reduced dimension trajectories in acetaldehyde [32] at energies less than 350 cm–1 above the radical asymptote, but suffer substantially above this limit, where roaming is still prevalent [21, 33]. In the same family of theories, Andrews, Kable and Jordan [34] present a two param- eter model coupled to a variational transition state theory calculation that successfully fits roaming vs. radical branching ratios over a narrow range of energies for H2 CO, CH3 HCO, and NO3 . Unfortunately, they are not able to speak to the portion of the molecular channel arising from the tight transition state or provide much insight into how to think about the mechanism. The Bowman group has also weighed in on the subject of roaming saddles [35] and contends that, at least in the aldehydes studied, the saddles are insufficiently 8 energetically separated from the tight transition state to be treated independently in a statistical theory (this judgment is disputed [36]). They also have observed [19] that in all attempts to define intrinsic reaction coordinates [37] by way of roaming saddles [38, 39], the dynamics show massive deviations from these paths, euphemistically called “large amplitude motion” in the vicinity of the saddles. Such deviations are perhaps at the root of the terminally frustrated attempts at a statistical theory for the prediction of branching ratios. Phase space structures governing roaming dynamics In a recent series of papers [40, 41, 42, 43, 44], the Bristol-Cornell-Crete collaboration offers a rather different perspective on roaming in the language of non-linear dynamical systems theory. In a variety of model systems, the authors identify phase-space structures that guide the dynamics of roaming. In their scheme, roaming trajectories are given their bounds by phase-space dividing surface constructed by computing the normally hyperbolic invariant manifolds of the space [41]. They also identify periodic orbits— which function like attractors—and suggest that the origins of roaming behavior are in a family of periodic orbits within what they identify as the roaming region of phase-space. Of particular interest is the authors’ observation of the fractal nature of phase-space bands of initial conditions leading to different products, which suggests that the subspace leading to roaming is highly variegated [42]. The collaboration also identifies these periodic orbits and unstable manifolds in a reduced-dimensional model of formaldehyde [43]. They propose that there exists a particular periodic orbit associated with roaming that bounds all roaming (but not radical) trajectories and guides them back to molecular products [43, 44]. While the results they achieve do much to address the “whys” of roaming, their approach is unfortunately limited to two degree of freedom systems because of technical (though not insurmountable) hurdles to computing normally hyperbolic invariant manifolds in larger systems. 9 Figure 1.2: Geodesics on a simple potential energy surface In the figures above, R and P correspond respectively to the reactant and product wells of the Müller-Brown potential [45]. A steepest-descent reaction path connecting them is shown in the squares (). We also show geodesic paths, circles ( ), between the reactants and products (right) and points labeled A and B (left). In addition to showing how the geodesic path differs from the reaction path, the figure on the right also shows a geodesic in several stages of refinement using the algorithms discussed in Section 2.2. Figure modified from [8]. 1.3 Geodesics Similar in spirit to the global methods employed by the Bristol-Cornell-Crete collabora- tion, our geodesic formalism [8, 46] is a means for studying the large-scale structures of a dynamical system. We seek to understand the inherent dynamics—the most im- portant components of the dynamics, smoothed and stripped of noise. Geodesics are decidedly not concerned with the basins, transition states, or stationary points of a potential surface, but with the properties of the major thoroughfares by which a system traverses configuration space4 . Figure 1.2 shows the contrast between a geodesic path and a steepest-descent reaction path connecting the basins. We will formalize our definition of geodesics later, in Chapter 2, but looking ahead, our notion of shortest paths (geodesics) as most important paths comes from a formal 4 Thatgeodesics are constructed entirely in the space of configurations rather than the phase-space of the system is another distinction between our methods and those of the Bristol-Cornell-Crete collaboration. We are still able to pose fruitful questions of the potential surface so long as we probe the properties of paths rather than of points. 10 rendering of a solution to the Fokker-Planck equation for free diffusion as a path integral: Z R ,t " Z t  2 # f –1 dR G R0 → Rf , t = e D R(τ ) exp (1.4)    e dτ e e R0 ,0 e 4D 0 dτ e G R0 → Rf , t is the probability density for diffusing from R0 to Rf in a time t and  D R(τ ) indicates an integral over the space of all possible paths, R(τ ). D is the diffusion e  e e e constant. The insight comes from the integral term in the exponential. When diffusion e e is slow, D is small and the only paths that will contribute are those that minimize the 2 integral over dR/dτ , i.e., those that minimize the force-free action. Those same paths are also the shortest paths, or geodesics of the system. There is a quantitative e connection between the rate of diffusion in a system and the length of its geodesics; as they elongate, the dynamics slow down [8]. Geodesic theory is related conceptually and in aim to transition path theory [47], a scheme for understanding rare-events in potential systems by deriving average trajectory flows. The techniques provide a prescription for constructing, among other things, the “current” of reactive trajectories between regions of configuration space. The flow lines of such currents are filtered objects, with unimportant motions washed out, much like our geodesics and notion of inherent dynamics. In contrast, however, where the methods of transition path thory are difficult to apply to systems with many degrees of freedom, there are simple algorithms [8] for computing geodesics in systems with 2 to 1000s of degrees of freedom. 1.3.1 An idea from condensed systems The initial ideas for geodesics and our notions of most efficient paths were developed in the context of condensed systems [8, 46, 49, 50, 51, 52]. Geodesics have enabled a number of remarkable successes including quantitative prediction of the diffusion- percolation transition [8], revealing the underlying dynamics of preferential solvation in a binary mixture of competing solvents [49], and, more recently, providing insight into the glass transition in hard spheres [51]. What all of these phenomena have in common is that they manifest in systems where small variations in their energy landscapes are unimportant. Indeed, in the extreme case of hard spheres, the potential has no fine variation—it only takes on two values: 0 or ∞. 11 Figure 1.3: Geodesics on a simple potential energy surface Geodesic path (solid red) between endpoints (filled, black circles) in a simple model of a liquid. In this two-dimensional example, it is apparent how geodesics are like the shortest-path solutions to a maze. Figure modified from [48]. Even in the case of ordinary liquids at room temperature, the well-depths of most of the minima are far below the few kB T per degree of freedom (∼ 200 cm–1 ) available to the system. Rather than being mediated by gathering enough energy into a special coordinate and surmounting a barrier, these processes are driven by the system’s ability to navigate a disordered landscape (see Figure 1.3). These entropic considerations require us to probe the global structure of energy landscape instead of searching for special stationary points. 1.3.2 Why geodesics for roaming formaldehyde? Much like the perspective taken by the members of the Bristol-Cornell-Crete collabora- tion [43, 44], we are firmly of the view that understanding the peculiar dynamics of roaming will not come from the properties of stationary points on the energy surface. We also have a different goal than many of the other theoretical projects in Section 1.2.2. Rather than fiddling with branching ratios, we seek to understand the essential properties 12 of the formaldehyde energy landscape that give rise to roaming. In deploying geodesics to understand roaming, we implicitly assume that the minor variations in the potential are not central to the dynamics—indeed, we take the minima to be irrelevant. Our perspective is in stark contrast to the basin-driven lens offered by Wigner’s transitions state theory [10] and even Wales’ network-of-basins view on energy landscapes [53]. We hypothesize that such disregard for local features of the surface is not only appropriate, but essential because photolysis injects such enormous amounts of energy into the system. For many regions of the landscape, fluctuations in the potential will be small in comparison to the kinetic energy—the system will skate high above any dips or rises. Given this great energetic excess, it would be impossible to pin down the backbone of the dynamics in terms of just minima and barriers. In the particular case of roaming, the potential in the region of the landscape where roaming occurs is quite flat and the roaming hydrogen strays far from the reacting center. The global exploration of the landscape suggested by its wanderings is just the kind of problem to which geodesics are suited. Another way of distinguishing our perspective is by thinking in terms of low-energy versus high-energy limits on the energy landscape. In the low-energy limit, it makes sense to prioritize stationary points because excursions away from them are rare. In contrast, in the high-energy limit, the system’s dynamics will be driven by global connectivity, which renders the details of stationary points unimportant. In the case of roaming formaldehyde, we are clearly in the high energy limit and wide exploration of the space is called for; the holistic mode of analysis offered by geodesics, insensitive to the details of the potential, fits the problem well. In the next chapter, we develop the tools to unambiguously specify geodesics on a small-molecule potential surface. We discuss the landscape ensemble, the space within which geodesics are defined, in Section 2.1.2, and then give a description of how they are computed in Section 2.2. CHAPTER 2 METHODS In this chapter we describe our models and the assumptions that underlie them. We then proceed to a description of our algorithms, sampling procedures, and other methods. The initial sections deal with computing geodesics on a small-molecule potential energy surface. This task is related to but different from previous work done by the group in several key ways. The remainder treat the search for a saddle on such a surface, numeric methods, and a technique for estimating the entropy of an unknown distribution. 2.1 Models 2.1.1 Formaldehyde To treat roaming in the photodissociation of Formaldehyde (H2 CO), we use a single classical, ground-state potential surface. This is reasonable at the energy scales of interest because formaldehyde dissociates on the ground state after excitation to and relaxation from the first excited singlet state [15]. The potential surface we use was generously provided to us by Joel Bowman, of Emory University and is described in detail by Zhang et al. [54]. A summary of the development and features of the surface follows. The surface is global in that it covers the space of all configurations at energies from the ground state all the way up to triple fragmentation (H2 CO −−→ H + H + CO). In addition to explicitly treating molecular dissociation, the authors constructed it with an eye towards studying the abstraction 13 14 reaction H + HCO −−→ H2 + CO; this because of its importance to roaming. The surface is assembled from independently constructed fits to 6 different regions of configuration space in internal coordinates. Regions expected to be well described by single reference methods were computed with coupled clusters [CCSD(T)]. Where these were expected to fail, multi-reference, configuration-interaction (MR-CI) with complete active space, self-consistent field (CASSCF) methods were used (with 6 active orbitals and 6 active electrons). Roughly 170,000 single point energy calculations went into the fits, which were then joined by quintic switching functions1 . Fits were validated against the ab initio calculations, adjusted, and generally found to be in quite good agreement. The authors note that the surface is suitable for dynamics studies, but not for those demanding “spectroscopic accuracy” because of generally modest energetic deviations (there was, however, a large, 3.23%, one at a single stationary point) and discrepancies in computed frequencies. Likewise, the surface is sufficient for our purposes. The most important property of the surface is that it exhibits roaming; indeed it is this surface that first contributed to the discovery of roaming by Townsend and co- workers [6]. As part of our validation of the potential, we implemented the molecular dynamics scheme described in Section 2.3.2 and successfully reproduced the product distributions reported in their paper. That roaming is exhibited on this surface is a point not to be minimized or underes- timated. As our investigation targets the inherent dynamics of roaming and photodis- sociation in formaldehyde, we do not need a perfectly accurate potential. We require only one that has the essential features that give rise to roaming. Classical treatment of the Bowman surface meets this criteria. While quantum effects may play a role in the photodissociation of formaldehyde (and they certainly do guide energy flow and deposition through the electronic states), we know that roaming manifests classically on this surface and so we pursue a purely classical treatment. For the purposes of making contact with experimental energies, the zero-point energy of formaldehyde must be added to the energy of any excitation laser. Zhang and co-workers give a harmonic estimate of the zero-point energy [55] as: Ezpe = 5844 cm–1 (2.1) The potential is defined in a 6-degree-of-freedom space of inter-atomic distances, but 1 The published form deviates from that used in their source code; see Appendix E.1. 15 Figure 2.1: Energy scales for formaldehyde energy surface This scale is for the fitted energy surface used in our investigations. Plotted also is the experimental limit of detection for roaming. All energies are given in wavenumbers. Figure modified from [54]. all of our simulations (and path-finding routines) are carried out in the 12-dimensional, Cartesian lab frame. A detailed description of the way we interface with the FORTRAN code provided by the Bowman group is in Appendix E. The relevant energy scales for the problem and the energy surface are given in figure 2.1. Our investigations probed energies from 35000 cm–1 to 45000 cm–1 , corresponding to excitation wavelengths2 between 349 nm and 255 nm; see Table 2.3 for all energies used. This range starts just above the radical dissociation threshold (Erad –1 µ > 33240 cm ) allows us to compare the three channels of interest: direct dissociation to molecular products, roaming dissociation to molecular products, and radical dissociation. Roaming is not observed experimentally above 41060 cm–1 [17]. A natural system of units for the study of phenomena in the energy range of electronic excitations is that of Hartree atomic units and we use them throughout our investigation. The values of the electron’s rest mass, me , its charge, e, and the reduced Planck constant, ¯h, are all set to unity. The fundamental unit of length is the Bohr radius, a0 = 0.529 Å; 2 The upper end of our energy range is non-physical as the same electronic pathways are not accessed at this wavelength. We can, however, make useful comparison between molecular dynamics and geodesic results on this surface even at experimentally inaccessible energies. 16 Element Mass / me * H 1837.15 C 21874.66 O 29156.95 Table 2.1: Atomic masses * me : rest mass of the electron the unit of energy is the Hartree, EH = 27.211 eV = 2.1947 × 105 cm–1 ; and time passes in units of ¯h/EH = 0.0242 fs. The masses of the atoms relevant to the simulation are given in Table 2.1. 2.1.2 The potential energy landscape ensemble What we call the inherent dynamics of a system can be thought of as the system’s classical dynamics stripped of noise (e.g. high frequency vibrations or rotations) that are not essential to the motions of a system. They describe the dominant thoroughfares of configuration space—the major pathways followed during a chemical or physical transformation. In a sense, the inherent dynamics are the most efficient routes through configuration space. To understand the nature of roaming, we take formaldehyde as a model system and examine the topology and connectivity of its energy landscape. There are certainly many ways one could study the space of allowed paths, but a conceptually simple route is a binary partition of configuration space into allowed and disallowed regions: those where the high energy of photo-excitation makes the variegation of the potential irrelevant and those where the potential is so large as to be practically inaccessible. We do this formally by introducing the potential energy landscape ensemble [46]. The landscape ensemble is a statistical mechanical ensemble like the canonical or microcanonical ensembles. However, instead of a fixed temperature or energy, its thermodynamic variable of control is a fixed maximum potential energy, called the landscape energy, EL . Within this ensemble, all configurations such that the potential is less than the landscape energy, V(R) ≤ EL (2.2) e 17 Figure 2.2: The potential energy landscape ensemble The landscape energy, EL , provides the thermodynamic variable of control in the landscape ensemble. This fixed maximum potential energy plays a similar role to the microcanonical energy or the canonical temperature. In the figure, clear valleys are accessible, V(R) ≤ EL , while the shaded mountains are forbidden. Figure from [46]. e are admitted with equal probability; see Figure 2.2. Equiprobability is a result of the same entropy maximization arguments that lead to equiprobable states in the microcanon- ical ensemble. In the thermodynamic limit, the landscape ensemble gives the same configuration space probability densities any other ensemble (e.g. micro-canonical, canonical) gives. In some ways, this binary division is the simplest partitioning that one could make given a space with a scalar field on it3 . Such a divided space gives us a frame within which to locate the most efficient paths. 2.2 Geodesic paths on a molecular energy surface The group’s initial development of the ideas behind efficient paths occurred in the context of diffusing liquids [8]. Writing the solution for the Fokker-Planck equation for free diffusion as a path integral4 provides our guiding principle. Z R ,t Z t –1   f G R0 → Rf , t = e D R(τ ) exp T R(τ ) dτ (2.3)     e e R0 ,0 e 4D 0 e e 3 Rather,it is the simplest partitioning of a scalar field that leaves us with any chance of the space remaining connected. We could have also demanded V(R) = EL , but this space would have volume of zero measure and would surely be hopelessly disconnected e for all but the simplest potentials. 4 See Appendix A for a full derivation. 18 Here, G R0 → Rf , t is the probability density for diffusing from R0 to Rf in a time t  and D R(τ ) indicates an integral over the space of all possible paths, R(τ ), respecting  e e e e whatever boundary conditions we impose. T is the kinetic energy and D is the diffusion e e constant. We examined and were then motivated by eq. 2.3 with the goal of understanding the inherent dynamics or most efficient paths of systems under study. There are many ways that we could formulate a notion of most efficient path, but an expedient one comes from combining eq. 2.3 with the potential energy landscape ensemble discussed in Section 2.1.2. The landscape ensemble partitions configuration space into allowed and dis-allowed regions and provides boundary conditions for eq. 2.3. When diffusion is slow (D is small), the paths that dominate are those that minimize the force-free action in the exponential. Z t S R(τ ) = dτ T(R(τ )) (2.4)   e 0 e We require that we this functional be minimized in the space of paths respecting the landscape criterion, where V(R(τ )) ≤ EL for all τ . These same paths minimizing the action are also shortest paths or geodesics, which minimize the kinematic length (to be e introduced in Section 2.4.1): Z t `= dτ [2T]1/2 (2.5) 0 Note the dimension of T is [Energy] and so the integral in eq. 2.5 has dimension of [Length] × [Mass]1/2 . The paths that minimize the functional are therefore technically mass-weighted shortest paths. We will confront this distinction in Section 2.2.1. From the landscape perspective, the dynamics of a system are not classical paths in phase space, but much simpler objects, geodesic paths in configuration space. These geodesics provide information about the connectivity and structure of the energy landscape. This is our main result: that the inherent dynamics are given by geodesics in landscape ensemble. Casting the diffusion equation in the path-integral manner of eq. 2.3 is so powerful because it frees us from having to solve what is otherwise a horribly complicated and specific problem—diffusion over a potential is a different (possibly insoluble) differential equation for every potential. We integrate over all possible paths, composed of many 19 infinitesimally short steps. Because the potential does not vary dramatically over such short scales, we can use the equation for free diffusion. But, the impact of the potential does not go away; its presence is encoded in the, perhaps quite stringent, boundary conditions imposed by the landscape criterion, V(R) ≤ EL . The condition in eq. 2.4 has form of a non-holonomic constraint.5 Finding solutions e to problems with non-holonomic constraints is often difficult, but the form of such solutions is specified by the Kuhn-Tucker theorem [56]. The theorem holds that the desired solution will be the union of paths that unconditionally minimize the functional in eq. 2.4 and those where strict equality holds. The Kuhn-Tucker theorem points to the need for two components of an algorithm for computing geodesics: free-steps that unconditionally minimize the path length and steps that trace along the boundary of the obstacles defined by V(R) = EL contours. For an N atom system, the kinetic energy can be written in terms of the lab-frame, Cartesian coordinates for the centers of each e atom: 1 ˙T ˙ T= R MR (2.6) 2e e Where RT = (x1 , y1 , z1 , x2 , y2 , . . . , zN ) is the total configuration-space vector with R˙ its time derivative, and M is the mass matrix, given by: e e (M)iµ,jν = δi,j δµ,ν mi i, j = 1 . . . N ; µ, ν = x, y, z (2.7) With mi is the mass of the ith particle in the N-particle system. The path between a target vector, Rf , and a starting position, R0 , that unconditionally minimizes eq. 2.4 in time t for such a kinetic energy is then the straight-line path given e e by:  τ τ  R(τ ) = 1 – R + R (2.8) e t e0 t ef Discretizing this path is straight-forward6 : R – Ri Ri+1 = Ri + δs e f e i (2.9) e e |Rf – R | e e 5A holonomic constraint is one that can be expressed as an equality relationship between a constant and a function of the parameters of its domain, e.g. coordinates and time, as in: 0 = f(q1 , q2 , . . . , t). 6 Consider R(τ + ∆τ ) e 20 Figure 2.3: Free and escape steps wind around boundaries From top to bottom: A free step causes system to enter forbidden region, which it then escapes to the boundary. The repeated action of free steps into the forbidden region and escape steps out trace the boundary of the obstacle in accordance with the Kuhn-Tucker theorem. Figure from [8]. where δs is a step length, chosen to be small relative to the features of the potential7 . This is the same procedure used previously by the group [8]. After each step we check to make sure that the landscape criterion, V(R) ≤ EL , is satisfied. If it is, we accept the step. If it is not, we are in a forbidden region and must escape. The repeated combination of e the free, straight-line, steps and escape steps will cause the geodesic to wind around the obstacles; see Figure 2.3. Typically, the group has used a Newton-Raphson root search to find the nearest solution to V(R) = EL . Rn+1 = Rn – ζ ∇V Rn (2.10) e e e e e 7 See Section 2.2.2. 21 where V(Rn ) – EL ζ = e 2 (2.11) ∇V Rn e e By repeated application of eq. 2.10, we would take steps following the gradient to the edge of the V(R) = EL boundary. However, in the present case, there is an additional complication. Formaldehyde is a system that has heterogeneous masses and so moves e along the gradient (as the above, naïve Newton-Raphson step would take) will displace the center of mass. This problem had never been encountered and its exposition is treated alone in Section 2.2.1. Resolution is found in a projected Newton-Raphson root search that prevents center of mass perturbation via a uniform back-translation. Escape steps are then found by iteratively following the projected gradient until the landscape criterion is satisfied: Rn+1 = Rn – ζ (1 – M) · ∇V Rn (2.12) e e e e where ζ is defined as before in eq. 2.11 and the projection, (1 – M), is a 3N × 3N matrix, to be derived in Section 2.2.1. Geodesics between endpoints are therefore found on the formaldehyde surface by applying eq. 2.9 and, as necessary, eq. 2.12 until the system is within 2δs of the target. Optimization While the geodesics found by the deterministic portion of our algorithm satisfy the Kuhn-Tucker theorem, they are not necessarily shortest paths—the theorem establishes necessary but not sufficient conditions of the form of the result. We therefore apply a simple optimization scheme to reduce the lengths. During the optimization phase, we construct geodesics using the previously developed algorithm and so all the properties of those geodesics hold here as well. The optimization round proceeds as follows: 1. A random configuration along the R0 → Rf path is selected: Ri , not one of the end-points e e e 2. Ri is perturbed: 0 Ri = Ri ± δp (1 – M) · Eˆ e e e Where the sign is chosen randomly between + and – with equal probability, δp = δs is the length of a small, perturbative step, and Eˆ is a configuration-space 22 unit vector along a randomly chosen, atomic, lab-frame axis. (1 – M) is the same projection we use to enforce a constant center of mass (see Section 2.2.1). 0 3. If V(Ri ) > EL , the configuration is forbidden; a new Ri is then chosen and the 0 0 process repeated until V(Ri ) ≤ EL and Ri is in the allowed region. e e e e 0 0 4. Two new geodesics, R0 → Ri and Ri → Rf , are computed. e e e e 0 5. If the total R0 → Ri → Rf path is shorter than the original, the new path replaces the old; otherwise the old path is retained. e e e The above procedure is repeated many times until the path length is converged; explicit termination criteria are given in Section 2.2.2. 2.2.1 Center of mass conservation In systems with heterogeneous masses, a gradient-following Newton-Raphson root search erroneously introduces center of mass translation. Here we explore the ramifica- tions of this and derive a projected root search that does not. This topic is also taken up in Appendix B, where two other methods are described. The results derived in this section allow us to construct geodesics that are guaranteed not to displace the center of mass while traversing configuration space. This is of particular importance to the present work, where the center of mass motion contributed a non-trivial excess to the total kinematic length. We considered enforcing other conservation properties, such as the Eckart conditions for rotations [57]. But the molecular-dynamics-derived endpoints do not obey these conditions8 and so there is no way to impose them on the geodesics. Notation In the analysis of molecular systems with N atoms in d physical dimensions, it is sometimes desirable to picture the system as specified by a single vector in the Nd- dimensional configuration space and at other times as N spatial vectors. We use the following definitions to transition between representations. 8 nor should they, though they do conserve angular momentum. 23 • Let ~xi ∈ Rd be the column vector specifying the spatial coordinates of the ith atom; i = 1 . . . N. • Let R ∈ RdN be the column vector representing the total configuration of the system; RT = (~xT1 , ~xT2 , . . . , ~xTN ) e e • Let mi be the mass of the ith atom. We can extract spatial vectors from the configuration space vector by defining the d × dN matrix, C ~ such that ei ~ ·R ~xi = C (2.13) ei e For instance, in 3 spatial dimensions,   1 ~ = 1   C e1    1 ... In general,   ↔ ~ C i j = δi,j 1 , j = 1...N (2.14) e ↔ Where 1 is the d × d unit matrix. In the other direction we have: N ~ T · ~xi X R= C (2.15) e i=1 e i We compute the center of mass of the system as follows: N 1 X ~xcm = mi~xi Mtotal i=1 N 1 X ~ ·R = mi C Mtotal ei e (2.16) i=1   N 1 X ~  = mi C · R Mtotal ei i=1 e ~ ·R ≡M e e 24 PN ~ is a d × dN dimensional, where Mtotal = i=1 miis the total mass of the system and M block-diagonal matrix composed of N ordered, d × d blocks with mi (i = 1 . . . N) along e the diagonals. ~ = mi 1   ↔ M , i = 1...N (2.17) e i Mtotal Should the Center of Mass be Conserved? Our intuition is that center of mass translation should increase the length of a path between two regions of configuration space which have the same center of mass; this is indeed the case as proved below. The length we seek to minimize (eq. 2.5) is √ Z ` = dτ 2T(τ ) (2.18) where T(τ ) is the kinetic energy as a function of progress along the path, defined (equivalently to eq. 2.6) as: d~xi 2 N X  2T(τ ) = mi (2.19) dτ i Discretizing the integral in eq. 2.18 (see eq. 2.117 in Section 2.7.1) gives the contribution from a single small step, v uN m ∆~x 2 uX ∆` = t i i (2.20) i where ∆~xi is the displacement of the ith center in a time ∆τ . Suppose the {∆~xi } are decomposed into 2 components: ∆~xi = ∆~xi0 + ∆~x (2.21) where ∆~x corresponds to net center of mass translation and the ∆~xi0 preserve the  center of mass. That is: N mi ∆~xi0 X ~0 = (2.22) i 25 Inserting eq. 2.21 into eq. 2.20 yields: v uN 2 mi ∆~xi0 + ∆~x uX ∆` = t i v   u uXN N N 2 mi ∆~xi0 + ∆~x 2 mi ∆~xi0  · ~x (2.23) X X =t mi +  u i i i v uN 2 mi ∆~xi0 + M∆~x 2 uX =t i where the last term of the middle line is 0 by eq. 2.22. Identifying, v uN 2 mi ∆~xi0 uX ∆`0 = t (2.24) i as the length in the absence of center of mass translation, we then have: q ∆` = ∆`20 + M∆~x 2 > ∆`0 (2.25) Thus completing the proof that net center of mass translation increases the kinematic length. Is the Center of Mass Conserved? Having established that to be “shortest”, paths through configuration space must preserve the center of mass in real space, we turn to the algorithm to compute geodesics in the potential energy landscape ensemble presented in the previous section. There are two components of the algorithm, which treat “free” and “escape” steps respectively. We now analyze each in turn. Free Steps We give eq. 2.9 for computing free (straight-line) steps between the current position, Rt , and the target, Rf : e e R – Rt Rt+1 = Rt + δR e f e t (2.26) e e Rf – R e e 26 Collapsing the scalars into the constant γ yields: Rt+1 = Rt + γ Rf – Rt (2.27)  e e e e We can impose the requirement that our boundaries have the same center of mass (recall eq. 2.16) by demanding: ~ · R0 = M M ~ ·R (2.28) f e e e e And this is indeed the case for endpoints generated by molecular dynamics. We then ask if R1 , computed by eq. 2.27 will have the same center of mass as R0 . e e ? ~ · R0 = M ~ · R1 M (2.29) ~ · R0 + γ R – R0 e e e he  i =M f e e e e Canceling the common term and the scalar leaves: ~ · R – R0 ,   ~0 = M (2.30) f e e e which is a restatement of the limits on our boundary conditions, eq. 2.28. This proves, therefore, that the first step does not perturb the center of mass. By induction, we conclude: ~ · R0 = M M ~ · Rt = M ~ ·R (2.31) f e e e e e e for all t. And so the center of mass is not perturbed by the free steps of the algorithm. Escape Steps When the system enters a region where the potential is greater than the landscape energy, V(R) > EL , An escape step is required. Usually this entails a Newton-Raphson root search—iteratively following the gradient until the landscape e energy criterion is satisfied: V(Rn ) – EL Rn+1 = Rn– e 2 · ∇V Rn (2.32) ∇V Rn e e e e e e Such steps do no preserve the center of mass; here we show how. As in the case of the free steps, we collapse all the scalars into a constant term, ζ, leaving: Rn+1 = Rn – ζ ∇V Rn (2.33) e e e e 27 Again our question is, Does the iterated map preserve the center of mass? We encode this as (recall eq. 2.16): ~ · Rn =? ~ M M · Rn+1 (2.34) e e e e ! ? ~ = M · Rn – ζ ∇V n e e e R e Canceling the common term and the scalar leaves us with the following expression: ? ~ ~0 = (2.35) M · ∇ V R n e e e which doesn’t look too good. Does it Hold? Proving that eq. 2.35 holds in general would be quite difficult9 , but invalidating it requires only one counter-example. All the problems treated by the group’s analysis of geodesics to date, involve translationally invariant potentials, so we confine our attention to a particularly simple one: the pair-potential between two mutually interacting centers, labeled 1 and 2, in 3 dimensions. The form of our potential is then: V = u(~x1 – ~x2 ) (2.36) Taking ~x = ~x1 – ~x2 We compute the derivative in eq. 2.35, arriving at:   ∂u + ∂~x  ∇V =     (2.37) – ∂u e ∂~x Inserting eq. 2.37 into eq. 2.35 gives ~ ∂u ∂u M · ∇ V n = m1 – m2 e e R ∂~x ∂~x ∂u (2.38) e = m1 – m2  ∂~x ? = ~0 Showing that eq. 2.35 fails for m1 6= m2 . Therefore, a simple, gradient-following root search does not preserve the center of mass with heterogeneous masses, which 9 and would require mistakes, as it does not. 28 erroneously lengthens the path. On the flip-side, eq. 2.35 does hold when the masses are the same. It can be shown that eq. 2.35 holds for a sum of such pair potentials with equal masses and therefore the group’s previous results are safe from this issue. Enforcing Center of Mass Conservation Analytically In this section, we derive a transformation to displacements in configuration space, which leaves them unmodified except for removing any center of mass translation they would have introduced. We effectively accomplish this via a uniform back-translation. This general result is then verified. The method presented here is the one used in our work though is only one of several. Two others, of staggeringly different complexity, are presented in Appendix B. Suppose we have a general displacement in configuration space, ∆R, and would like to remove center of mass motion. We therefore seek P such that: f e ~0 = M~ · ∆R + P (2.39)  e f e This leads to the under-determined relation: ~ · P = –M M ~ · ∆R (2.40) e e e f There are many solutions to this equation (including P = –∆R, which wouldn’t help us much!), but we can choose the following convenient approach to constructing a e f useful solution: Let P be the configuration space vector which displaces each center by the same quantity such that eq. 2.40 is satisfied. This quantity, of course, will be the e negative of the net center of mass displacement introduced by ∆R and is given by: f ~ cm = –M ∆x ~ · ∆R (2.41) e f We can build P from ∆x ~ cm using eq. 2.15 as follows: e   N ~ T  · ∆x ~ cm X P= C i (2.42) i=1 e e ~ T, Defining the summed matrices as C e  T ↔ ~ C =1 , i = 1...N (2.43) e i 29 we can write ~T · M P = –C ~ · ∆R (2.44) e e e f Further, defining T M=C ~ ·M ~ (2.45) e e we can write P = –M · ∆R (2.46) e f So the configuration space representation of ∆R, which includes no center of mass translation is: f ∆R? = ∆R + P = (1 – M) · ∆R (2.47) f f e f Where 1 is the dN × dN identity. The dN × dN matrix M has a band structure defined by: mj ↔ Mi,j = 1 , i, j = 1 . . . N (2.48) Mtotal For instance, in three spatial dimensions, d = 3, we have:   m1 0 0 m2 0 0 mN 0 0  0 m1 0 0 m2 0 · · · 0 mN 0      0 0 m1 0 0 m2 0 0 mN       1 0 0 m2 0 0 m  1  ..  M= (2.49)   0 m 1 0 0 m 2 0 .  Mtotal   0 0 m1 0 0 m2           .. ..    . .   Armed with this result we can re-write the equation for escape steps (eqs. 2.32 and 2.33) as: Rn+1 = Rn – ζ (1 – M) · ∇V Rn (2.50) e e e e where V(Rn ) – EL ζ = e 2 (2.51) ∇V Rn e e as before. 30 Verification We would like to verify that our expression (eq. 2.50) conserves the center of mass. We proceed as before, using our new result. ? ~ ~ · Rn = M M · Rn+1 (2.52) e e e e ! ? ~ =M · Rn – ζ (1 – M) · ∇V n e e e R e After, rearranging and canceling the common term as well as the scalar, we have: ? ~ ~ · ∇V n = M (2.53) M R M · · ∇ V n R e e e e e e Inserting the definition for M (eq. 2.45) gives: ? ~T · M     ~ · ∇V n = ~ ·C ~ · ∇V n (2.54) M R M R e e e e e e e e which can only hold if ~ =M M ~ ·C ~T · M ~ (2.55) ~T e e e e We can see that this does indeed hold by inserting the component definitions of C ~ (eq. 2.17): (eq. 2.43) and M e e   N ~T · M ~ T · M h i ~ ·C ~ = ~ C ~ X M M i i e e e i e e e   N X mi ↔ ↔ ~ = 1 · 1 · M Mtotal i e    (2.56) N 1   X ↔ = 1  ~ mi  · M Mtotal i e h↔ i = 1 ·M ~ ~T · M e ~ ·C M ~ =M ~ e e e e What of the Newton-Raphson Root Search? From the above, eq. 2.50 clearly pre- serves the center of mass. However, the point of eq. 2.32 was a root search to ensure V(R) = EL . Does the sequence of projected Newton-Raphson steps retain the property e 31 that they locate the V(R) = EL boundary of the forbidden region? Yes it has; the con- struction of M began with P in eq. 2.42. In which we specified displacing all centers by e the same amount, ∆x~ cm . This amounts to a net translation, under which our potential is e invariant. In Appendix B.2, we take up the general problem of projected root searches. If our problem contained a field and the potential was not translationally invariant or if the boundary configurations did not have the same center of mass, all of this would go out the window. In particular if it were not the case that the potential was translationally invariant, we would not be able to rationalize there being something “wrong” with center of mass translation in the geodesic as we did earlier. 2.2.2 Geodesic parameters With respect to step sizes, we take steps during the optimization phase of length equal to the steps during the deterministic phase, δp = δs = 5 × 10–2 a0 = 2.645 × 10–2 Å. We chose this value because we expected variations in the potential to be small on this scale and smaller values markedly increased computation time without substantively increasing the length of the path found. Lengthening of the path corresponds to skipping fewer obstacles and/or more closely tracing curves—thereby more accurately exploring the landscape. See Table 2.2 for a comparison of lengths over a range of step sizes. With regards to termination criteria, we aggressively optimize of the length of the paths: requiring 103 trials resulting in no shortening. This means that on average, optimizing a single geodesic requires computing order 104 paths. In most cases, the optimization phase offers modest (5%) reductions in length of the computed geodesics, however there are some extreme cases where it yields length reductions of order one half. There are also times when the relative positions of endpoints prevent our algorithm from finding a connecting path between them10 —it is quite possible that one does not even exist.Pairs of endpoints were excluded from analysis when the algorithm failed to find a path within a “reasonable” (orders of magnitude greater than typical) amount of time. Specifically, if the algorithm required more than 102 more steps than 10 For details and rates of failure, see Table 3.12 on page 91. 32 δs /a0 Computed Path Length / a0 Execution Time / s 5 × 10–1 16.6827 0.06 5 × 10–2 17.2255 0.33 5 × 10–3 17.2570 3.51 5 × 10–4 17.2620 35.81 Table 2.2: Comparison of step sizes in geodesic algorithm Comparison of Euclidean path lengths (without optimization) and execution times to compute geodesics for different step sizes on the formaldehyde potential with EL = 34928.7 cm–1 . In this example, the Cartesian distance between the products and reactants is 15.923 a0 . An attempt was made with δs = 5 × 10–5 , but it did not terminate on its own within 1000 seconds. We postulate that this is because the step size was smaller than the scale used in computing the gradient (see Section 2.7.2), which rendered the algorithm inaccurate. an unobstructed path would require, the search was abandoned: $ % R – R0 2 imax = 10 · e e f (2.57) δs where b·c is the floor function. We also set the cutoff at 103 , but this served only to lengthen execution time and did not appreciably decrease the rate of failure. Selecting a landscape energy for a small molecule system Another concern of importance is the selection of the landscape energy. In the group’s previous work, which has focused on macroscopic systems, the average value of the potential over the course of a simulation was used [8, 49]. Indeed, in the thermodynamic limit, hVi = EL exactly. In the present case, we are not in the thermodynamic limit and such a solution would be of little use. The system spends much of its time in the reactive well, where the majority of the internal energy is kinetic. The moment of passage through or around any transition state is but fleeting and will not meaningfully contribute to the average. Another technique is simply to select some probable, “thermodynamic” value for the energy. This is what we did in our first paper treating formaldehyde [1]: the landscape energy was taken from the peak of the distribution of values of the potential while a hydrogen was far-removed from the formyl over many molecular dynamics simulations. 33 Eµ / cm–1 excitation wavelength / nm 34500 349.0 ∗35000 343.0 35500 337.2 ∗36000 331.6 36223 329.2 37000 321.0 ∗37600 314.9 37688 314.0 38000 311.0 38688 304.5 ∗38814 303.3 ∗41010 284.4 ∗45000 255.4 Table 2.3: Microcanonical and landscape energies These energies were selected to span the range of energies over which roaming manifests in formaldehyde and to allow maximal overlap with the studies of Houston et al. [58]. Energies above were used for sampling reactant well and for molecular dynamics studies. An experimental realization of these energies would correspond a laser with energy Eµ – Ezpe ; Ezpe = 5844 cm–1 , see Section 2.1.1. An asterisk (*) indicates energies for which geodesics were computed. The horizontal line shows the approximate limit at which roaming is observed experimentally (41060 cm–1 [17]). While this is fine, it is still somewhat arbitrary in that it depends on the choices for “far-removed”. It could also conceivably forbid paths that are indeed classically energetically accessible on the energy surface. In small systems, with few degrees of freedom, it is the partitioning of energy into different degrees of freedom that controls the particulars of the dynamics. The key observation is this: the microcanonical energy, Eµ , already sets a limit on the maximum potential energy, V(R) ≤ Eµ . For rare processes, such as roaming, energy may need to be partitioned among the various degrees of e freedom very specifically. By taking the landscape energy equal to the microcanonical energy, EL = Eµ , all such partitionings and transformations between them which are classically accessible, are accessible to the geodesics we construct in the landscape ensemble. Desiring such properties, this is just what we do. For this work we compute geodesics at the energies given in Table 2.3; see Figure 2.1 for comparison to the relevant energies in formaldehyde. While all the major conclusions 34 described in Chapter 3 rest upon geodesics constructed with EL = Eµ , we do, at times, appeal to analyses conducted for our previous work [1]. In these cases, Eµ = 37600 cm–1 and EL = 34928.7 cm–1 . In that earlier work, we also repeated our analysis of direct and roaming geodesics with the landscape energy set slightly below the radical dissociation threshold11 , EL = 33238.7 cm–1 < 33240 cm–1 = Erad µ . We did not repeat the sampling procedure for reactants or products described below in Section 2.3, but did for the roaming region. Our results and conclusions were virtually identical for this lower bound on the landscape energy. 2.2.3 Trapped at a fixed point in 2 dimensions As part of getting up to speed with the group’s methods, we re-implemented Wang and Stratt’s early work on geodesics [8]. In the process we encountered a previously unreported failure mode of the geodesic algorithm described in Section 2.2. The failure is only of practical concern in systems with two degrees of freedom and presumably lay undetected because the group typically treats problems 1 to 3 orders of magnitude larger. The Müller-Brown potential [45] is a 2-dimensional model surface that captures the key features of typical chemically-relevant energy landscapes. It has two major basins, an intermediate, and two saddle points—these correspond to reactant, product, inter- mediate, and transition states between them. Formed as a superposition of Gaussians, it is simple to compute and the surface is often used as test case for new optimization or path-finding algorithms. So it was used by our group. The potential is shown in figure 2.4. The major surprise while implementing the algorithm came during the testing phase. When finding geodesics from the upper (“reactant”) basin to the lower (“product”) basin, the program became stuck in an infinite loop for landscape energies below -34.06. In the opposite direction, the same thing happened at an even higher energy: -29.93. Both of these energies are comfortably above the highest saddle connecting the basins, -40.7, so there should have left plenty of space for a transition. The error was not in the implementation, it is a feature of the algorithm when 11 Radical geodesics cannot be computed with landscape energy below the threshold for radical dissociation. 35 Figure 2.4: Müller-Brown potential In the arbitrary units of the original paper, this depiction spans an energy range of roughly -150 to +150. Basins have depths of -146.7, -80.8, and -108.1 (diagonally from upper left) and saddles are at energies of -40.7 and -72.3 (diagonally from upper left). operating in two dimensions. The failure is more likely to occur the longer the algorithm is confined to a basin with concave boundaries. En route to escape, the trial point, Rt , winds around the edge of the basin. As it does, the unit vector pointing at the target e smoothly rotates in one direction. At the same time, the direction of steepest descent, the unit gradient, rotates smoothly in the opposite direction. If the particle does not escape the basin in time, at some iteration, the two are parallel and point directly at each other. Then the algorithm encounters a fixed point, and will forever take steps back-and-forth in the same location: Rt+1 = Rt ; see Figure 2.5. The fixed point is en stable because following the gradient in the Newton-Raphson search (eq. 2.12) on one e side and the trial step towards the target (eq. 2.9) on the other will drive the system back towards the problem location and thus the errors associated with computing in finite precision will be of no help in escape. The problem manifests for two degrees of freedom, but not for more because the rotating unit vectors trace out the unit circle, S1 . Given long enough time and if they rotate in “opposite directions”, they are guaranteed to be coincident. On unit spheres of greater dimension, there is more than one angular degree of freedom and the target and gradient vectors may wander without encountering each other. Thus for systems 36 Figure 2.5: Fixed point on the Müller-Brown potential The highlighted contour is at the landscape energy. The initial path of the algorithm is shown in black. The the dashed lines point to the product, RProduct , from the current position, R(t) . The gradient is shown with arrows. A fixed point occurs at the › when e the vectorepointing at the product is parallel to the gradient: Rf – Rt k ∇V R Rt  e e e e with n > 2 degrees of freedom, the probability of coincidence on Sn–1 is infinitesimal and the problem does not occur. 2.3 Defining and sampling regions of interest Our algorithm generates paths that give us information about the connectivity between regions of configuration space. Rather than being concerned with the fine details of minima, we sample the volumes of configuration space corresponding to the species whose inter-conversion we wish to study: reactants, molecular and radical products, and roaming intermediates. As geodesics are paths between points, we need a way to sample these volumes of configurations space. The following section details our methods of drawing points from the regions 37 of configuration space we explore. To compute geodesics, we select endpoints by randomly drawing points from the appropriate distributions. Some of our sampling techniques produce points in phase space (including kinetic degrees of freedom). As our algorithm only requires points in configuration space as input, we simply project out the momenta when necessary. 2.3.1 Monte Carlo for microcanonical reactants To microcanonically sample region of the reactants we use the method of normal mode approximation with anharmonic corrections described by Peslherbe and coworkers [59]; we describe the high-points here and in more detail in Appendix C. First the potential is approximated by normal modes constructed from a diagonalization of the Hessian matrix at the global minimum. Energy is then deposited in these modes according the microcanonical distribution for a collection of harmonic oscillators with total energy, Eµ . In our studies, Eµ takes on the values given in Table 2.3 on page 33. As we desire to fix total angular momentum to 0, excess angular momentum is then removed by subtracting the net instantaneous rotation of the molecule. First, we form the spurious angular momentum (this is simply the total angular momentum of the molecule), N ~js = X ~ri × mi~vi (2.58) i=1 Where the sum is over atomic centers and ~ri , ~vi , and mi give the Cartesian position, velocity, and mass of the ith atom. To correct for this term we add ~j = –~js to the molecule by computing the vector angular velocity: ω = I–1~j (2.59) where I–1 is the inverse of the moment of inertia tensor [60]. We then add this angular velocity to each center: ~vi0 = ~vi + ω × ~ri (2.60) As we deposit energy in the normal modes, there will automatically be no net translational momentum. Finally, to correct for anharmonicity of the potential, the coordinates are iteratively displaced and the momenta scaled until the phase space point reflects the desired total microcanonical energy (see Appendix C.3). 38 Using this method, 4 × 104 such phase space points were generated and then used as inputs to molecular dynamics described in the next section. They were also used as initial configurations for our algorithm for constructing geodesics on the energy landscape. Of note: only those configurations, whose phase space points lead to molecular products were used as input to the geodesic algorithm. We have no reason to believe that such dynamical correlation would impact our results; see Section 3.6.3. 2.3.2 Molecular dynamics for products To generate configurations corresponding to molecular and radical products, the reactant phase space points are propagated via molecular dynamics. At each time step forces act along the negative of the gradient. A 4th order Runge-Kutta integrator [61, equation 25.5.10] implemented in the GNU Scientific Library [62] advances the state of the system according to Hamilton’s equations for the potential. This point in phase space is propagated forward in steps of at most12 5 ·¯h/Eh = 0.121 fs until one of the following occurs: 1. The center of mass distance between H2 and CO is greater than 12 Bohr (12 a0 = 6.350 Å), indicating dissociation (molecular or radical), or 2. 12.1 ps of simulation time (5×105¯h/Eh ) have elapsed without the above occurring, indicating no reaction. During the trajectory, the state of the system is recorded every 100¯h/EH = 2.42 fs. These termination conditions and time step choices are commensurate with those used by Bowman and coworkers [55] when working on the same surface. Once a reactive trajectory has been identified, the H – H bond distance is probed to assign the trajectory to radical or molecular products. We assign molecular products for bond distances less than 6 a0 (3.18 Å) and radical products otherwise. We do not address the triple-fragmentation channel as it is energetically inaccessible. For each of the energies listed in Table 2.3 on page 33, 4 × 104 such trajectories were simulated; their final dispositions are discussed in Section 3.3. The final states of 12 Ifthe implementing library deems the accuracy (estimated through step-doubling [63]) with the given time-step insufficient, the time-step is reduced by 20%. This is repeated to a minimum of 0.84¯h/Eh at which point the trajectory is discarded as a failure. 39 the reactive trajectories were then used as endpoints for geodesics13 . 2.3.3 Monte Carlo random walk for the roaming region One of the difficulties associated with studying roaming is unambiguously deciding when it happens. Given the high vibrational excitation necessary to elicit roaming in H2 CO, C – H bonds may often be quite stretched during a molecular dynamics trajectory and a hydrogen might often “look like” it is or is about to roam. The situation is in little danger of improving as there is no consensus in the literature defining roaming. Our notion of roaming is based on two components. The first is a static, configuration space requirement. Like other workers [30, 42], we make use of a roaming region, which is described below. The second is a dynamical feature of a trajectory, namely that after excitation, the roaming hydrogen visits this region and then returns to abstract the other H from the formyl and to dissociate to close-shelled molecular products. Because roaming is a rare event, the molecular dynamics studies described in the previous section do not produce enough configurations to adequately sample the roaming region at a given energy. In this section, we also describe the random walk Monte Carlo method used to sample the roaming region in the landscape ensemble. Defining the roaming region A succinct definition for the roaming region remained elusive for some time. Key to our effort adding a force-based constraint to the otherwise geometric considerations. The force criterion allows us to ensure that a dissociating hydrogen is indeed on a potential energy plateau (small forces if and only if small gradients), characteristic of roaming dynamics. For the geometric criteria, we err on the side of over-constraint so as to avoid mis-classifying states of radical dissociation as roaming. A configuration-space point is in the roaming region per our definition if the following criteria are satisfied; see Table 2.4 for constants: 13 While the ends of all reactive trajectories were used as endpoints for geodesics, only those initial phase space points that resulted in molecular products were used as start-points. We discuss the impacts of dynamical correlation in detail in Section 3.6.3, but suffice it to say that we expect this particular choice to have no impact on our results. 40 1. The restoring force on the roaming hydrogen is of small magnitude14 : FH(0) < Fmax (2.61) 2. The hydrogens are separated by more than a minimum and less than a maximum distance: dmin < ~rHH(0) < dmax (2.62) The roaming hydrogen, designated H(0) , is uniquely identified in configuration space as the hydrogen with the greatest Euclidean separation from the total center of mass. We define the restoring force on the roaming hydrogen as: ~ (0) V(R) FH(0) = –ˆrH(0) –CM · ∇ (2.63) H e where   ~ (0) = ∂ ∂ ∂ ∇ H , , (2.64) ∂xH(0) ∂yH(0) ∂zH(0) and ˆrH(0) –CM = ~rCM – ~rH(0) / ~rCM – ~rH(0) (2.65)  As usual, V(R) is the formaldehyde potential energy function and ~rH(0) and ~rCM are the real-space coordinates of the roaming hydrogen and total center of mass respectively. e One may notice that under the above definitions, attractive forces are positive and repulsive forces are negative (eq. 2.65 results in this inversion of convention). We hope that our choice allows us to express the bounds on the force more intuitively as it is most important the restoring force be small in magnitude14 . We place a maximum on the force because in roaming trajectories, there are large, persistent H – HCO separations, which require small forces on the wayfaring hydrogen. Bounds on hydrogen-separation are required because the force criterion alone would mis-classify equilibrium and transition state configurations, clearly not representative of roaming, as such. We use the hydrogen separation as opposed to H – HCO separation because large hydrogen-formyl separation can also correspond to 14 Ifthe restoring force is to be attractive, it must be small otherwise the hydrogen is not on a plateau and will plunge back into the reactive center. For simplicity, we place no bound on the extent to which the force is repulsive; the geometric and energetic constraints already cover this case. 41 Parameter Value Fmax -103 cm–1 /Å dmin 3.10 Å dmax 4.76 Å Table 2.4: Roaming region constants dissociation to molecular products15 . The upper bound excludes radical dissociation, which otherwise comprises a majority of the space. One benefit of the criteria we employ is the ability to distinguish a wide variety of roaming behavior. Had we simply relied on a dynamical signature like “low rotational quantum number after dissociation”, e.g. JCO ≤ 16, we would have missed trajectories that manifestly roam with JCO much higher; see Section 3.3.4 for more. Sampling the roaming region To sample the roaming region we use a Monte Carlo random-walk technique adapted from Brady and coworkers [64], which was originally designed to facilitate micro- canonical sampling of Lennard-Jones clusters. Here, we use it to sample configurations in the potential energy landscape ensemble for EL with the additional requirement that the configurations be in the roaming region described above. The details are given in Appendix D. Our probability density is a function which equals 1 if the trial configuration is in the roaming region and meets the landscape criterion and otherwise equals 0. Because the codomain of this density is binary, the acceptance probability is as well—there are no marginally better or worse moves, only allowed and disallowed. To generate trial moves for the random walk, we displace a single random atom, j, along a random coordinate, by a random amount, ∆ ~j =µ ˆj (ξ – 0.5)δx . Where ξ is a uniformly distributed, random variable on [0, 1), δx is a scaling factor and µ ˆj ∈ {ˆx, yˆ, zˆ} is along a randomly selected Cartesian axis of the jth atom. After each such step, an additional move to preserve the center of mass is made by moving all atoms by the 15 Consider,for distantly separated H2 and CO, the relative centers of mass of a single hydrogen and a formally defined “formyl” composed of the other hydrogen and the CO. 42 same, appropriate amount. Thus the total move for each atom i is: mj   ∆~ri = µˆj (ξ – 0.5)δx δi,j – ; i = 1...N (2.66) M where M = N i=1 mi is the total mass of the system. P To select δx , we computed several “burn-in” sequences of 103 steps at EL = 37600 cm–1 during which δx was scaled up and down in 5% steps until we achieved an acceptance ratio between 0.1 and 0.9. During the data collection phase, δx was pinned to its final value of 0.27562 a0 , which yielded acceptance ratios between 0.80 and 0.95. To seed the random walk we took configurations satisfying both the landscape and roaming criteria from molecular dynamics trajectories that went to molecular products (roaming trajectories); this yielded between 200 and 9000 configurations at each energy. Using each configuration as a different starting position, we generated random walks of length 105 steps, recording a configuration every 100 moves. This gave a list of 105 –107 points in the roaming region, which were then shuffled and used as inputs to the geodesic algorithm. For energy specific details, see Table D.1. 2.3.4 Geodesic itineraries Once points in each of the regions were generated at each of the landscape energies, we construct itineraries for the geodesics as follows: • A reactant configuration is drawn from the those phase space points microcanon- ically sampled from the reactant well as described in Section 2.3.1 that later dissociates to closed shell products. • A radical product and a molecular product endpoints are each selected from the products of the molecular dynamics simulation described in Section 2.3.2 • An intermediate, roaming point, is drawn from points sampled via a random walk over the roaming region and the landscape ensemble as described in Section 2.3.3. Geodesics are then computed as described in Section 2.2 between points as given here: direct: reactant → molecular product  S  roaming: reactant → roaming point roaming point → molecular product 43 radical radical: reactant → product The union (∪) specifies that roaming geodesics are the concatenation of two geodesics. Tens of thousands of unique routes are thus planned and executed at each energy. 2.4 Analyzing geodesics Having computed some hundreds of thousands of geodesics, we would like to analyze them in aggregate. Our goal is to extract the key features of the inherent dynamics of the of each of the dissociation channels—and in particular to understand the genesis of roaming. 2.4.1 Total lengths Given that the geodesics are characterized by a minimum kinematic length, measuring the lengths of the paths is a natural place to start. The total kinematic length of a path gives us information about the total amount of motion required along the geodesic. This length is defined as follows: Z t `Total = dτ [2 · T(τ )]1/2 (2.67) 0 where T is the kinetic energy, τ is a progress variable on [0, t], specifying the limits of integration are over the entire path. To find the total kinematic length, we insert T T = 12 R˙ MR˙ and have: Z t h i1/2 T e e `Total = dτ R˙ MR˙ (2.68) 0 e e dR where R˙ = dτe is the rate of change of the configuration space vector R and M is the diagonal mass matrix: e e   m  1    m1   m1   M= (2.69)      m2     ...     mN 44 or (M)iµ,jν = δi,j δµ,ν mi i, j = 1 . . . N ; µ, ν = x, y, z (2.70) Where mi is the mass of the ith particle in the N-particle system. (Recall Table 2.1 for the relevant masses.) In Section 2.7, we will discuss the finite difference expressions used to calculate the integral in eq. 2.68. The reduction provided by eq. 2.113 yields, for n steps: n–1 h i1/2 T δRi MδRi X `Total = (2.71) i=0 e e where δRi = Ri+1 – Ri . The expression for the length is simply the mass weighted sum of the steps along the path. e e e As the kinetic energy can be partitioned between many degrees of freedom, we can interrogate kinematic lengths corresponding to a subset of them by constructing the appropriate component of the kinetic energy (e.g.: rotation or vibration) and integrating as before. In doing so we can probe the contributions of different kinds of motion to the inherent dynamics. In general, we can construct arbitrary kinematic lengths provided that we can construct a kinetic energy, Tα , such that: X H= Tα + V α Tα = Tα (R, R˙ ) e e Though the significance of such lengths will be limited to our ability to interpret them physically. In the case of the diatomic rotations and vibrations we treat, the motions are well-defined and physically significant even if the species are not chemically isolated. 2.4.2 Diatomic Vibration If we wish to understand the vibrating motions of a pair of atoms in our system, we can construct the vibrational kinematic length. The vibrational kinetic energy for 2 particles, α and β is: 1 2 TVib = µr˙ (2.72) 2 m ·m Where r = k~rk = ~rα – ~rβ is the separation between the centers and µ = mαα+mβ is β the reduced mass. In analogy with eq. 2.67, we insert eq. 2.72 into the expression for 45 the length (eq. 2.67) and have: Z t h i1/2 `Vib = dτ µr˙ 2 (2.73) 0 Z t √ = µ dτ kr˙ k (2.74) 0 The resulting sum is: n–1 √ X i `Vib = µ δr (2.75) i=0 where δri = ri+1 – ri . 2.4.3 Diatomic Rotation We can perform a similar calculation for the rotational degrees of freedom. The rotational kinetic energy for a pair of particles, α and β, is: 1 2 TRot = I~ω (2.76) 2 1 2 dˆr 2   = µr (2.77) 2 dτ Where ω~ is the angular velocity, I is the moment of inertia, and r and µ are defined as  2 ~ 2 = dτ above. The unit orientation vector is ˆr = ~rr and it is true that ω dˆr [60]. Again, we insert the expression for kinetic energy into eq. 2.67 and have:  #1/2 dˆr 2 Z t "  √ `Rot = µ dτ r2 (2.78) 0 dτ Now we compute: ~r   dˆr d = (2.79) dτ dτ k~rk ~r˙ ~r d   = – k~rk (2.80) r r2 dτ ! ~r˙ ~r ~r · ~r˙ = – (2.81) r r2 r 46 With some re-arrangement, we arrive at a more compact form: ! ˆ dr ~ ˙ r ~ r˙ = – ˆr ˆr · (2.82) dτ r r 1 ˙ h  i = ~r – ˆr ˆr · ~r˙ (2.83) r ˙~r  = · 1 – ˆrˆr (2.84)  r where 1 is the identity and we have used the dyadic identity ~x(~x · ~y) = ~y · ~x~x. dˆr in hand, we can compute the argument to the square root in our integral: With dτ 2 " #T " #  dˆr ~˙ r ~˙ r r2 = r2 · I – ˆrˆr · I – ˆrˆr (2.85)   · dτ r r T T = ~r˙ · I – ˆrˆr · I – ˆrˆr · ~r˙ (2.86)  T = ~r˙ · I – ˆrˆr · ~r˙ (2.87)  The tensor terms in the penultimate step can be reduced by simple distribution and the identity: (ˆxˆx)2 = ˆxˆx, which holds for all unit vectors ˆx; see eq. B.21 for derivation. The reduction also follows from the idempotency of projections. Algebra out of the way, we can write a final expression for the rotational kinematic length: Z t 1/2 T  √ `Rot = dτ ~r˙ · I – ˆrˆr · ~r˙ (2.88)  µ 0 which is then reduced to the sum: n–1 i1/2 √ X h iT  i i  i `Rot = µ δr · I – r r · δr ~ ˆ ˆ ~ (2.89) i=0 where δ~r i = ~r i+1 – ~r i and ˆr i = ~r i /r i . 2.4.4 Boundary fraction In the geodesic approach, the potential energy surface enters into the dynamics only as a constraint on otherwise straight-line paths. It manifests as obstacles in the configuration space, which the geodesics then trace around. In our previous work, studying systems 47 with many hundreds of degrees of freedom, we found that the geodesics hugged these obstacles for the near-entirety (>99%) of the path [8]. Formaldehyde, with only six internal degrees of freedom, is sufficiently low-dimensional that the obstacles are sparse and geodesics ricochet between them. By measuring the fraction of path spent along the boundaries of these obstacles, we can understand how tortured and constrained the routes taken by geodesics are. We define the boundary-fraction of a geodesic, hΘi, as the length-fraction of the path that is very near the landscape energy, EL : ]1/2 Θ[R(τ ); EL ] Rt dτ [2T hΘi = 0 R t total (2.90) [2T ] 1/2 e 0 dτ total where:  1 EL – V(R) < δEL Θ[R; EL ] = e (2.91) e 0 otherwise for suitably small δ = 10–4 , Θ is finite only for configurations “on the boundary”. With δ = 10–5 our results are indistinguishable. While we used the total kinematic length as our weighting function, we could have easily used any of the other kinetic energies: diatomic rotation or vibration. When computing the boundary fraction in this other way, the distributions are nearly to within statistical noise. 2.5 Estimating geodesic lengths To verify our calculations it is desirable to make an analytic estimate of the sizes of the kinematic lengths involved. Here we use a simple model to make such an estimate. These estimates should be considered lower bounds for a given pair of endpoints because they assume no potential obstacles—that the landscape energy is effectively infinite. 2.5.1 Total Kinematic Length We conceive of our system as having only 2 components: a mass of mH2 = 2 · mH located at ~rH2 and a mass of mCO = mC + mO located at ~rCO . Further we will content ourselves to deal only with the translational motions of these fragments and therefore 48 restrict ourselves to a line. Finally, we assume that all motion is barrier-free—that no potential obstacles intervene, or that for a straight-line path, V(R) ≤ EL . Are these reasonable simplifications? They are if the contribution to the kinematic e length from translation is much greater than from internal rotation or vibration of H2 and CO. In particular, we consider only direct dissociation paths, which are relatively unhindered. Recalling our expression for kinematic length (eq. 2.67), the required kinetic energy of the simplified system is: 2T = mH2~r˙H22 + mCO~r˙CO 2 (2.92) If we assume constant velocity during dissociation and a total time t, we can write: ∆~rH2 2 ∆~rCO 2 2T = mH2 + mCO (2.93) t t where ∆~rCO and ∆~rH2 are the total distances traveled by CO and H2 respectively. Fixing our total momentum to zero (implying zero net motion of the center of mass) allows us to write: 0 = ~pH2 + ~pCO (2.94) which implies: ∆~rCO ∆~rH2 mCO = –mH2 (2.95) t t and therefore: mH2 ∆~rCO = –∆~rH2 (2.96) mCO Inserting this relation into our expression for 2T, eq. 2.93 yields: ∆~rH2 2 mH2     2T = mH2 1+ (2.97) t mCO Now we only need an expression for ∆~rH2 , which we can find in terms of ∆~rH2 –CO , the change in the center of mass separation of H2 and CO. ∆~rH2 –CO = ~rH2 – ~rCO f – ~rH2 – ~rCO i (2.98)   = ∆~rH2 – ∆~rCO (2.99) mH2   = ∆~rH2 1 + (2.100) mCO 49 where we have used eq. 2.96 to eliminate ∆~rCO . Using eq. 2.100, we eliminate ∆~rH2 from 2T in eq. 2.97, which yields: ∆~rH2 –CO 2 mH2   2T =  m  (2.101) t 1 + H2 mCO 2 ∆~rH2 –CO  = µH2 –CO (2.102) t where µH2 –CO is the reduced mass for H2 and CO, defined in the usual way. Now our expression for the total length is: Z t ! ∆~rH –CO √ 2 `Total = dτ µH2 –CO (2.103) 0 t which is readily integrated to yield: √ `Total = ∆~rH2 –CO µH2 –CO (2.104) Using the appropriate masses from table 2.1 gives µH2 –CO = 3427.5 me . We estimate ∆~rH –CO to be of order 10 a0 because the molecular dynamics trajectories 2 used to produce endpoints for the geodesics were terminated when the centers were separated by 12 a0 . Taken together, we estimate the lower bound on total kinematic length to be in the range of: √ `Total & 585 a0 me (2.105) 2.5.2 Vibrational Kinematic Length An identical analysis can be preformed for vibrational degrees of freedom. Using the expression for TVib from eq. 2.72 we have: 2T = µr˙ 2 (2.106) Again using the assumption of constant velocity yields: ∆r r˙ = (2.107) t Which we can combine with the expression for vibrational kinematic length, eq. 2.73: Z t  k∆rk √  `Vib = dτ µ (2.108) 0 t 50 The expression is readily integrated to give: √ `Vib = k∆rk µ (2.109) That this result for diatomic vibration is quite similar to the estimate we made for total kinematic length, eq. 2.104, is no accident. The previous analysis of the total kinematic length effectively treated formaldehyde as single large dissociating diatomic. Such a simplification was reasonable because we expect the internal motions to be of secondary importance relative to the overall motion of the dissociating fragments. In the specific case of H2 vibration, we have µ = 918.56 me . We posit k∆~r k—the total straight-line motion of the vibrating H2 —is of order 1 a0 and therefore estimate the vibrational length as: √ `Vib & 30 a0 me (2.110) 2.6 A search for saddles The existence of a S0 /S1 conical intersection for formaldehyde was first reported by Araujo et al. in 2008 [65] and brought to our attention by one of Harding and Kilp- penstein’s papers [36] investigating the putative “roaming” saddle. We were curious if this conical intersection would be evident in the fitted ground state potential surface for formaldehyde provided to us by Joel Bowman [54]. If it were, we would expect to find a second order saddle (a saddle with 2 “downhill” directions or 2 negative force constants) in its vicinity. The saddle does indeed exist; in this section, we detail the methods of its finding and of our unsuccessful search for the roaming saddle on the Bowman potential. We describe the characterization of the saddles by frequency analysis in Section 2.7.3. 2.6.1 Methods There a variety of methods for locating saddles of arbitrary order on potential energy surfaces. The more sophisticated schemes such as Miller’s “eigenvector following” [66] rely on inverting the Hessian matrix. This poses difficulties when working in Cartesian coordinates due to the zero-frequency modes associated with translation. These can be worked-around using projection operators [67], but complicates the matter further. 51 A method for finding saddles, which in general is bad—but in certain, limited cases may be appropriate—is to simply minimize the square of the gradient of the potential. 2 This works at all because ∇V(R) = 0 for all stationary points of a potential, V(R). This 2 is a bad idea for large, global searches because there may be local minima of ∇V(R) , e e e which interfere with locating the global extrema, all identically 0. In our case, the idea e e isn’t terrible because we expect to start with a good guess for the location of the saddle. It even turns out to work. Minimization Function & Algorithm The object of investigation is the square of the gradient of the potential function provided by Joel Bowman, but we lack an analytic form of this fit or its gradient. To compute the gradient, we proceed via finite-differencing as described in Section 2.7.2. 2 To minimize such a finite-differenced function as ∇V(R) , it is wise to avoid a method requiring a further gradient. This avoids accumulation of errors by multiple e e numeric differencing schemes. Nelder & Mead’s simplex algorithm [68] is a good, general purpose algorithm which does not require a gradient. We employ the faithful implementation of their algorithm in the GNU Scientific Library [62]. While the simplex method of minimization is performant and was effective, after the fact, I noticed the following, perhaps-useful relation. Some straightforward vector algebra gives the gradient of the square of the gradient of a function as the simple product of the gradient and the Hessian: 2 h i ∇ (∇V) = ∇ ∇V · ∇V = 2∇∇V · ∇V (2.111)   e e e e e ee e One can see that ∇∇V is the Hessian by interpreting ∇∇ as a dyad. Of course this requires estimating the Hessian at each step, which isn’t cheap. And while this is ee ee better than twice applying a finite difference, it is still the (noisier) product of two noisy (finite-differenced) objects. Points of departure To locate the second order saddle associated with Araujo’s conical intersection, we applied the simplex minimization scheme described in the previous section starting from the published coordinates (given in Table 3.2). To search for the roaming saddle, 52 we generated 105 starting coordinates by taking randomly oriented steps of random length uniformly distributed between 0 and 3 a0 away from the proposed location of the roaming saddle (given in Table 3.9) and then applied the same search to the given location and the 105 other trial points. Section 3.2 details our findings and reports on the newly located saddle. 2.7 Numerical methods As is typical of numeric calculations on machines with finite precision, comparison operations were made to within specified tolerances. In all cases the condition for equality was less than or equal to one part in 106 . Double precision (64-bit) floats were used throughout. Vector operations were performed using the BLAS linear algebra subroutines via the GNU Scientific Library [62] in the C language with double precision on an x86_64 machine. Data analysis was implemented in a mixture of C, Python (with NumPy [69]), and Haskell [70] (using the hmatrix package for vector operations). Wherever required, random numbers were produced using the MT19937 variant [71] of the Mersenne Twister algorithm implemented in the GNU Scientific Library. Parallel data processing was aided by the use of GNU Parallel [72]. A considerable amount of the source code has been placed in the Brown University Digital Repository [73]. The remainder of this section gives the details of the finite-difference calculus employed throughout the study. 2.7.1 With respect to integration Computing properties of the geodesic often requires computing integrals over functions of the kinetic energy. For example, the total kinematic length, `total , in terms of the total kinetic energy, Ttotal , is: Z t 1/2 `total = dτ 2Ttotal (2.112)  0 where τ is an arbitrary progress variable. Because of the algorithm we use to compute the geodesics (Section 2.2), we have configurations of the system, ~x, for unevenly spaced 53 {τi }. All such integrals were performed using the following Riemann sum: Z τn h i Xn–1 h i ~ ~˙ dτ f x(τ ), x(τ ) ≈ ~ ~˙ δτ i f xi , xi (2.113) τ0 i=0 d~x ≈ (~x where δτ i = τi+1 – τi and ~x˙ i = dτ i+1 – ~xi )/(τi+1 – τi ) = δ~xi /δτ i is the rate of change of the configuration space vector ~x computed via simple forward difference. The geometric picture of this simple technique corresponds to summing the distances between adjacent configuration space points with the appropriate mass weighting. While higher order techniques for integration and the required derivatives exist, we have directly verified that the errors introduced by this method are several orders of magnitude smaller than the statistical noise associated with averaging over many paths. Additionally, these “higher order” methods loose the simple geometric significance of summing over mass-weighted path lengths. Given that there is no physical content in τ , properties of the geodesics should be independent of it. In all integrals for the kinematic length, δτ i drops out. We show this T explicitly in the case of the total kinematic length. Insert Ttotal = 12 R˙ MR˙ and eq. 2.112 becomes: e e Z i1/2 T h `total = dτ R˙ MR˙ (2.114) e e where M is the diagonal mass matrix given in eq. 2.69. Expressing our integral (eq. 2.114) as a sum (via eq. 2.113) yields: n–1 i1/2 T X h `Total = δτ i R˙ MR˙ (2.115) i=0 ei ei 1/2 T  n–1 δR δR X i i = δτ i  e M e  (2.116) δτ i δτ i i=0 n–1 h i1/2 T X = δRi MδRi (2.117) i=0 e e And we are indeed left with an expression independent of τ . 2.7.2 Gradients While the potential described in Section 2.1.1 has an analytic form, extracting it from the source code would have been prohibitively tedious. We therefore treat it as a black 54 box and compute gradients numerically. Partial derivatives are computed along each of the 12 Cartesian, lab-frame coordinates defining the configuration space. We use the 5-point formula from Abramowitz and Stegun [61, eq. 25.3.6]: 1 ∂f h5   = – 8f + 8f – + O (2.118)  f f ∂xi ~x 12 –2 –1 +2 +2 where fj = f ~x + jhˆxi . We take the inter-abscissa spacing, h = 10–4 a0 .  2.7.3 Hessian matrices & vibrational frequencies While Abramowitz and Stegun give a 5th order expression for ∂ 2 F/∂x2 , the off diagonal terms are not treated with the same level of precision. David Eberly gives a clear description of finite differencing schemes and their derivation for uni- and multi-variant functions [74], which was followed to construct the approximation we use here. The key result is that repeated application of orthogonal finite differencing methods (e.g., first along x then y) give rise to expressions where the coefficients are the outer products of the coefficients for each approximation. 2  ∂ 2F 4 1 X 1,1 F  = O(h ) + C ij (2.119) ∂x∂y 144h2 ij i,j=–2 where Fi,j = F(x + ih, y + jh) and the coefficients are given by:   1 –8 0 8 –1 –8 64 0 –64 8   1,1   C = 0 0 0 0 0 (2.120)      8 –64 0 64 –8   –1 8 0 –8 1 Indexes on C1,1 are relative to the center. As hinted, the coefficients for this mixed derivative expression arise from the outer product of those for a first derivative: C1,1 = C1 ⊗ C1 where C1 = [1, –8, 0, 8, –1]. And indeed, to 5th order, e e e 2 dF 5 1 X  1 = O(h ) + C Fi dx 12h e i i=–2 55 with Fi = F(x + ih). The diagonal elements of the Hessian are taken from Abramowitz and Stegun, eq. 25.3.24 [61]: 2   ∂ 2F 4) + 1 C 2 Fi X 2 = O(h (2.121) ∂x 12h e i i=–2 with Fi = F(x + ih, y) and C2 = [–1, 16, –30, 16, –1]. To calculate frequencies, we diagonalize the mass weighted Hessian given by, e ∂ 2V 1 , k, l = 1 . . . N α, β = x, y, z (2.122) ∂rkα ∂rlβ m1/2 m1/2 k l where rkα and rlβ are the lab-frame, Cartesian coordinates of the N-atom system. The eigenvalues are the mass-weighted force constants, ki , and can be converted to frequencies in wavenumbers, ν˜i , by the relation p ki ν˜i = 2πc where c is the speed of light. For formaldehyde, there are six vibrational modes and so the six frequencies with largest magnitude are assigned to vibrational modes. We assign the lowest three to net translation and the intervening three to net rotation. We use the 4th order expressions for all elements of the Hessian matrix given in eqs. 2.119 and 2.121. A step-size of h = 7 × 10–3 was empirically determined to minimize the magnitude of translational and rotational modes (see Figure 2.6). If we were trying to be still more careful, we would choose h such that it could be represented exactly in base-2 (and therefore exactly representable on a binary computer). The implementation of these methods was verified against analytic calculations for Morse and harmonic oscillators with a variety of frequencies. 56 Switching functions? To stitch together the multi-reference and single-reference portions of their fit to the formaldehyde potential, the Bowman group employed switching functions of the form16 :     1 x≤0  w(x) = 1 – 10x3 + 15x4 – 6x5 0 < x < 1   0  1≤x where x is a dimensionless variable corresponding to the scaled distance of the farthest hydrogen from its nearest neighbor (see [54] for details). At x = 0, 1, w(x) is only differentiable to second order and so higher derivatives of V are discontinuous across the hyper-planes defined by x = 0 and x = 1. Accordingly, approximations to derivatives of V relying on such continuities (e.g. eqs. 2.119 and 2.121) will fail if these planes intersect the sampling region of the approximation, a hyper-cube of side length 4h centered on the point where the derivative is to be approximated. The regions where we compute frequencies are not expected to be near the boundaries of the switching function. If we were computing the hessian in the roaming region (perhaps in the vicinity of a roaming saddle), we would need to be more careful. In one instance, computed frequencies differed slightly (< 2.5%) from the published values (see Section 3.2.1). To rule out interference from the switching function, we repeated the calculation evaluating the elements of the Hessian by a 2nd order approxi- mation from Abramowitz and Stegun [61] and found the computed frequencies to be within 0.01% of the first. 16 See Appendix E.1 for a note on a mistake in their published version. 57 104 103 102 101 100 Vibration 10−1 Rotation Translation 10−2 −5 10 10−4 10−3 10−2 10−1 +994.85 +0.05 0.0 -0.05 -0.10 -0.15 10−5 10−4 10−3 10−2 10−1 Figure 2.6: Hessian parameters Results from computing normal modes at the second order saddle via the Hessian, eqs. 2.119 and 2.121. Shown are geometric averages of the vibrational, rotational, and translational normal modes (in wavenumbers) as a function of abscissa-spacing, h in a0 . The vertical line shows the selected value of h = 7.00 × 10–3 a0 , which is used throughout the investigation. The lower panel is a very tight zoom on the vibrational modes and shows that the vibrational modes are converged and stabilized for this value of h. Recall, from the upper panel, that the vibrational modes are grossly well-behaved over the whole range of values. 58 2.8 Computing the entropy of a sampled distribution Part of our analysis will involve computing the entropy of continuous distributions of unknown form, but from which we can draw samples. The entropy of a continuous distribution is given by [75]: Z S/kB = – dxρ(x) ln [ρ(x)] (2.123) Supposing that one had no access to the distribution, ρ(x), but could sample it, there are a variety of ways that one could imagine constructing an estimate of S/kB from the samples {xi }. One general approach is to construct a model of the distribution (via histogramming or kernel density estimation) and then to compute a numeric integral for eq. 2.123. This is attractive because it yields the intermediate representation of the distribution itself. However, these methods require tuning of parameters (bin or kernel width) and the computed entropy is sensitive to them. The estimate will also be sensitive to the particulars of the numeric integration employed. An alternative approach, which directly constructs the entropy from the samples, is given by Henri Theil [76]. Given a set of n observations of real numbers, {xi }, it is possible to construct the distribution satisfying a maximum entropy principle for the observations. That is, the distribution with maximum entropy consistent with the observations. Qualitatively, this maximum entropy distribution will be composed of many intervals of uniform density. The entropy of this maximum entropy distribution can then be had analytically as: n 2 – 2 ln 2 1 X h n  i+1 i–1 i S0 /kB = + ln x –x (2.124) n n 2 i=1 where x0 = x1 , xn+1 = xn and x1 < x2 < · · · < xi < · · · < xn is the ordered sequence (or order statistics) of the set {xi }. The method is only appropriate for samples drawn from a continuous distribution. The Theil method, along with numeric integration over histograms and kernel density estimates were compared to the analytic entropies for densely sampled Gaussian and uniform distributions of many thousands of widths. The method given in eq. 2.124 had lower relative error in all cases and is the method we use to compute the entropies of length distributions throughout. It should be noted that eq. 2.124 takes the log of 59 differences of adjacent order statistics. If any two samples are identical, the result will be divergent17 . We use double precision numbers throughout this work and with 64 bits of precision, the probability of violating the strict inequality requirement is infinitesimal. For any dimensioned distribution, ρ(x) will require normalization in the ln of eq. 2.123. The change of dimension will introduce an unknown constant offset in the entropy. This requirement is related to the reason that there is no absolute entropy scale classically—without quantized observables, there is no absolute “smallest” division and therefore no way to count up states. As a consequence, the absolute entropies themselves will not be meaningful, but differences will be. 17 Suchdivergence, of course, is consistent with the maximum entropy principle used to construct the estimate. CHAPTER 3 RESULTS AND DISCUSSION In this chapter we describe the results of our investigations into the dynamics of roaming on the formaldehyde potential energy surface. The chapter is split sharply between two different perspectives: a local view grounded in classical trajectories and stationary points, and a global view, aiming to reveal the inherent dynamics in terms of geodesics and their properties. The first section reports on our search for several stationary points of the surface. We successfully locate a previously unreported second order saddle. However, we could not locate another, previously reported stationary point—the “roaming saddle”. We discuss the implications of their respective presence and absence. We then move on to observations provided by molecular dynamics techniques, which serve as a point of departure and comparison for understanding the global information carried by geodesics. In Section 3.4, we begin our discussion of geodesics by validating some of their basic properties and showing how the information they contain aligns with our notion of inherent dynamics. The geodesics themselves provide novel insight to the roaming problem on the formaldehyde surface. Our major result has to do with the entropy asso- ciated with the each of the dynamical channels available to the fragmenting molecule. From the relative entropy of these channels we are able to predict the decline in the incidence of roaming at high energy. 60 61 3.1 An image of the formaldehyde potential To orient ourselves to the space, in Figure 3.1, we plot a slice of the potential surface along with several stationary points. We constructed this particular projection in terms of a set of Jacobi coordinates: {rH–H , rH2 –CO , rC–O , θ1 , θ2 , φ} where rH–H , rH2 –CO , and rC–O are H – H, H2 – CO, and C – O separation respectively, θ1 is the angle between the vectors ~rH–H and ~rH2 –CO , θ2 is the angle between the vectors ~rC–O and ~rH2 –CO , and φ is the out-of-plane torsional angle. To compute the image shown, we varied rH–H and rH2 –CO and fixed the other coordinates at their equilibrium values: rC–O = 2.2883 a0 , θ1 = π/2, θ2 = π, φ = 0. Note that because of our choices for the projection, some features of the surface are not visible. For instance, this particular “shadow” of the 6-dimensional object shows neither the cis nor trans isomers of HCOH even though they are present on the surface. For a similar reason, it appears that some of the locations of the stationary points do not match up with the potential. The tight transition state to molecular products, , is displaced slightly to the left of the saddle clearly visible on the surface. In two dimensions, saddles of degree two appear as the peaks of hills, and the shadow of the second order saddle we locate in Section 3.2, is in fact the hill to the right of its marker, . The Global minimum, , appears accurate to the potential contours because all coordinates are fixed to the global minimum at that point. The putative roaming saddle, 4, is located in an extremely flat region—the spacing between contours is an order of magnitude closer there than on the rest of the surface. 62 12 420 10 435 00 00 Roaming 500 Region 00 8 rH−H / a0 00 420 6 4 40000 35000 25000 2 15000 Bird’s 5000 Nest 0 0 2 4 6 8 10 12 rH2 −CO / a0 Figure 3.1: A projection of the potential surface Vertical and horizontal axes are respectively H – H and H2 – CO separation in atomic units. The circle, , is the global minimum of the surface. The square, , is the tight transition state to molecular products from the global minimum. The triangle, 4, is the location of the putative roaming saddle. The diamond, , is the newly-located saddle of degree two associated with the S0 /S1 conical intersection; see Section 3.2. The colors of the contours do not change much in the upper right quadrant; this is intentional because the potential is so flat in this region. The roaming region is defined in Section 2.3.3; the bird’s nest will be defined and discussed in Section 3.3.1. 63 3.2 Locating a 2nd order saddle on the Bowman energy surface In Section 2.6, we describe the simplex-minimization-based technique to search for saddles on the Bowman potential energy surface. Our targets are two saddles that had not been previously located on the potential: 1. We posit the existence of and search for a saddle of degree two in the vicinity of the S0 /S1 conical intersection1 first reported by Araujo and co-workers [65]. 2. We also search for the so-called “roaming saddle”, reported by Harding et al. [31], which some believe (see Section 1.2.2) governs the rates of roaming reaction in the usual fashion of saddles in transition state theory. If such a saddle is essential to roaming, we expect to find one on the Bowman surface, upon which roaming certainly manifests. We seek the “roaming saddle” because of the significance it takes in some theories of roaming (again, see Section 1.2.2). If it is not present on the Bowman surface, such a feature cannot be essential to our understanding. We search for the second order saddle because, if it exists, we expect it to form part of ridge bounding the bird’s nest (see Section 3.3.1). 3.2.1 Verification We are fortunate to have available for comparison, literature values for the properties several stationary points on the Bowman surface. The description of the surface from Zhang et al. [54] lists stable minima and first order saddles with their energies and normal frequencies. For each of the stationary points given in their paper, we confirm the location by way of the minimization scheme described in Section 2.6 and we give published and computed values for frequencies and energy. As measures of 1 Conical intersections break the degeneracy of the intersecting states along two coordinates [77]. When constructing a high-fidelity adiabatic energy surface for the lower state, the intersection may manifest as a stationary point with two imaginary frequencies (two “down hill” directions). In a 1- or 2- degree-of-freedom system, such a feature would appear as a maximum, but in our case, there are 4 other internal degrees of freedom. These will have real frequency and so the stationary point is a saddle. See [36] for more in the case of formaldehyde. 64 accuracy, we also give the magnitude of the gradient and the root sum squared atomic displacements between our initial guess and the final location, for each stationary point. We define, v u iµ 2   uX iµ ∆R = R – R0 = t R – R0 µ = {x, y, z}, i = 1 . . . N (3.1) i,µ e e where R0 is the published configuration from which the search was initiated and R is the point we located. ∆R thereby provides a measure of the distance from the location e e of the start of the search to the stationary point which was found. Table 3.1 gives the named stationary points (St.P) along with computed and pub- lished [54] values for comparison. In each pair of rows, the upper row holds the computed results and the lower row holds the published values (in parentheses). In all instances where computed frequencies differ from those published (excepting for TS3 , see caption), we suspect it is because of rounding; had we simply truncated the values, they would be identical. While the frequency deviations for the transition state to molecular products are small (< 2.5%), we were still perplexed and investigated two possible sources of error. It is possible that the choice for inter-abscissa spacing, h = 7 × 10–3 , was improper for TS3 on the potential surface. However, we discarded this possibility after constructing graphs similar to those in Figure 2.6 to test for convergence—the frequencies at TS3 were even better converged with our choice of h than for the second order saddle. It is also possible that there would be interference from the switching function (see page 56) if the hyper-volume within which the function was evaluated contained a discontinuity in the 4th derivative (as exists at the boundaries). We also dispensed with this theory after using a 2nd order approximation to evaluate the elements of the Hessian; the computed frequencies were within 0.01% of the first. We suspect that the frequencies we provide in Table 3.1 are correct for the surface and the published values are in error. 3.2.2 Search results Beginning our search from the location of the S0 /S1 conical intersection reported 2 by Araujo et al. [65] and minimizing ∇V(R) , our simplex minimization method unambiguously locates a second order saddle point. We converge such that the vertices e e Stationary Point V(R)/cm–1 ∇V(R) /(cm–1 /Å) ∆R/Å {˜ νi }/cm–1 e e e Global Min 3.36 × 10–6 2.05 × 10–3 7.62 × 10–6 1165, 1337, 1624, 1807, 2811, 2950 (0.00) (1164, 1337, 1623, 1806, 2810, 2949) trans-HOCH 17983.37 0.595 × 10–3 3.66 × 10–4 1108, 1282, 1369, 1715, 2927, 4099 (17983.37) (1108, 1281, 1369, 1714, 2927, 4098) cis-HOCH 19517.27 1.14 × 10–3 1.12 × 10–3 1221, 1343, 1440, 1609, 2741, 3839 (19517.27) (1221, 1342, 1440, 1608, 2741, 3839) TS1 27335.32 1.63 × 10–3 5.82 × 10–4 1074i, 967, 1208, 1465, 2542, 3531 (27335.33) (1074i, 968, 1208, 1465, 2541, 3531) TS2 30140.00 1.17 × 10–3 1.02 × 10–3 2068i, 726, 1260, 1454, 2599, 2993 (30140.00) (2068i, 725, 1259, 1454, 2598, 2992) TS3 30583.66 0.778 × 10–3 7.30 × 10–4 1825i, 612, 659, 1357, 1864, 2994 (30583.66) (1824i, 598, 652, 1351, 1861, 3039) 2nd order saddle 33777.20 2.42 × 10–3 0.560 541.2i, 1165.3i, 313.0, 935.9, 1862.0, 2818.5 Table 3.1: Properties of stationary points on the Bowman surface Comparison of various published [54] and computed properties (see text) of stationary points on Bowman’s surface. The first row, are our results. The second rows give are published values within parentheses. The configurations of the stationary points were published [54] in terms of angles and bond lengths to a precision of one-tenth degree and one-ten-thousandth Angstrom. TS1 is between the cis and trans isomers; TS2 is between the global minimum and the trans isomer; and TS3 is between the global minimum and molecular products, the “tight” transition state. TS3 is the only example of departure from the expected values; deviations larger than 1% are bolded. Energies all agree with published values to 6 significant figures. We give ∇V(R) as a measure of the accuracy with which we have located the stationary points; it is quite good in all cases. The location e eof the conical intersection from which we found the second order saddle was published in [65]; this stationary point is the only for which the search distance, ∆R (eq. 3.1), is non-trivial. 65 66 of the simplex are within, 10–14 a0 , of their centroid2 . At the saddle, the gradient’s magnitude is: ∇V(R) = 5.83 × 10–9 Eh /a0 = 2.42 × 10–3 cm–1 /Å. This value is consistent with our findings for other, previously published stationary points on the e e surface (see Table 3.1). The presence of the second order saddle in the vicinity of a conical intersection is inherently a multi-electronic-surface result that manifests under the use of the Born-Oppenheimer approximation in the construction of our adiabatic potential surface. It is non-trivial that the Bowman group’s fit [54] is able to capture the feature. The saddle’s energy, gradient, and frequencies are listed in Table 3.1, which shows there are indeed two imaginary frequencies. The other six normal modes, corresponding to net -translation and -rotation, all have frequency of magnitude less than 1.0 cm–1 /Å, which is of the same magnitued as the rotational and translational modes for the other, previously published stationary points. The coordinates of the second order saddle are given in Table 3.2 along with those of the conical intersection. The coordinate system is chosen such that all of the free variables are 0. We place C at the origin, O along the y axis, and orient such that HCO is in the x – y plane. The hydrogen excluded from the formyl is the farthest away from the total center of mass. Distances are given in Bohr. Coordinates for all other stationary points of interest are given in the same coordinate system in Tables 3.3–3.9. 3.2.3 No “roaming saddle” to be found Using the same methods, we searched the Bowman surface extensively for the reported “roaming saddle” [31]. We initiated searches from the proposed location [35] of the roaming saddle and perturbed its position by taking random steps in a 12-dimensional ball of up to several a0 from the starting position (see page 51). However, even in trials of 105 such searches, we were unable to converge the gradient to less than 102 cm–1 /Å, which we deemed insufficiently flat—indeed, the gradients at the other stationary points we located have magnitude 5 orders of magnitude smaller. Such searches also suffered in the following ways: • Of those points with “shallowly” sloping gradient, none unambiguously possessed 2 Themeasure of vertex displacement from centroid is an indicator of the degree of convergence of our algorithm—in this case, quite high relative to features of any molecular energy landscape. This value is consistent with those we found for the other stationary points on the surface. 67 2nd order saddle x y z ~rH(1) = 1.672984 –1.508401 3.683196 ~rO = 0.0 2.222303 0.0 ~rC = 0.0 0.0 0.0 ~rH(2) = 1.485300 –1.472777 0.0 S0 /S1 conical intersection x y z ~rH(1) = 0.935657 –1.296518 3.332712 ~rO = 0.0 2.257969 0.0 ~rC = 0.0 0.0 0.0 ~rH(2) = 1.238146 –1.679863 0.0 Table 3.2: Coordinates of second order saddle on Bowman’s surface Given first are the coordinates of the saddle, which we located located on Bowman’s surface [54]. Below, are the published coordinates [65] (rotated and translated into the same reference frame) of the conical intersection from which the search for the saddle was initiated. x y z ~rH(1) = –1.793899 –1.124608 0.0 ~rO = 0.0 2.288201 0.0 ~rC = 0.0 0.0 0.0 ~rH(2) = 1.793899 –1.124608 0.0 Table 3.3: Coordinates of global minimum on Bowman’s surface Location refined and coordinates transformed from the published configuration [54]. x y z ~rH(1) = 2.030988 –0.502804 0.0 ~rO = 0.0 2.513517 0.0 ~rC = 0.0 0.0 0.0 ~rH(2) = 1.726049 3.153715 0.0 Table 3.4: Coordinates of cis-HCOH minimum on Bowman’s surface Location refined and coordinates transformed from the published configuration [54]. 68 x y z ~rH(1) = –2.070234 –0.327514 0.0 ~rO = 0.0 2.513931 0.0 ~rC = 0.0 0.0 0.0 ~rH(2) = 1.747948 2.997965 0.0 Table 3.5: Coordinates of trans-HCOH minimum on Bowman’s surface Location refined and coordinates transformed from the published configuration [54]. x y z ~rH(1) = 0.208602 –0.724582 2.013021 ~rO = 0.0 2.510365 0.0 ~rC = 0.0 0.0 0.0 ~rH(2) = 1.516407 3.562945 0.0 Table 3.6: Coordinates of transition state for cis-trans isomerization Also on the Bowman Surface; in their papers, referred to as TS1 . Location refined and coordinates transformed from the published configuration [54]. x y z ~rH(1) = –1.893152 –0.852520 0.0 ~rO = 0.0 2.490956 0.0 ~rC = 0.0 0.0 0.0 ~rH(2) = 1.920528 1.378915 0.0 Table 3.7: Coordinates of H2 CO −−→ trans-HCOH transition state Also on the Bowman surface; intheir papers, referred to as TS2 . Location refined and coordinates transformed from the published configuration [54]. x y z ~rH(1) = 2.909270 –1.080449 0.0 ~rO = 0.0 2.212928 0.0 ~rC = 0.0 0.0 0.0 ~rH(2) = 0.638885 –1.953390 0.0 Table 3.8: Coordinates of tight transition state on Bowman’s surface In Bowman’s papers, referred to as TS3 ; location refined and coordinates transformed from the published configuration [54]. 69 x y z ~rH(1) = 2.331282 –0.475906 6.482761 ~rO = 0.0 2.235156 0.0 ~rC = 0.0 0.0 0.0 ~rH(2) = 1.720165 –1.206485 0.0 Table 3.9: Coordinates of proposed roaming saddle Coordinates transformed from the published configuration [35]; saddle first reported by Harding and co-workers [31]. No such saddle could be located on Bowman’s surface. a single imaginary vibrational frequency (most often there were 2 imaginary frequencies in the top 6 modes). • Further, the vibrational and rotational modes were frequently difficult to separate, there being 4–5 modes of similar magnitude (of order 10 cm–1 ; this was the case over several orders of magnitude of h when calculating the Hessian). It is unclear if these were indeed zero-frequency modes, which would have indicated that the algorithm had located a ridge. It is possible that more sophisticated methods such as eigen-vector following [66] would succeed in locating such a roaming saddle. It is also possible, and this is our conjecture, that the Bowman surface simply does not capture this feature. If nothing else, it is clear that roaming does not require a well-defined saddle. Attempts to define a roaming transition state3 in terms of such a feature are dubious at best. 3.3 Observations from molecular dynamics In this section, we discuss a the results of the molecular dynamics simulations described in Chapter 2. Before we begin, Figure 3.2 provides an instructive map of space we explore. Formaldehyde has 6 intramolecular degrees of freedom4 , but this particular 2- dimensional projection allows us to see distinctions between each of the three channels 3 “Transition state” used here in the sense of a minimum-flux dividing surface in configuration or phase space for an RRKM rate calculation [78]. 4 Recallthat even though there are 3 real-space coordinates for each of the 4 atoms, the potential is invariant under net rotation (about 3 axes) and translation (along 3 axes). The system therefore has 3 × 4 – 3 – 3 = 6 intramolecular degrees of freedom. 70 in which we are interested: direct dissociation, radical dissociation, and roaming- mediated dissociation to molecular products. We show three trajectories, one from each pathway. The coordinate along the abscissa is separation between the H2 center-of- mass and CO center-of-mass. Motion to the right of the figure indicates the separation of H2 from CO; one can indeed see that the direct dissociation (dotted red) and roaming (solid black) pathways terminate in the lower right, as separated molecular products. The ordinate gives the separation between the two hydrogens in the system. The radical dissociation trajectory (dashed blue) moves from reactants up the axis, indicating H – H separation, just as we expect. In addition to the trajectories, we show several actual and proposed stationary points of the surface; see caption. The roaming region depicted in the upper half of the figure is the region defined in eqs. 2.62 and 2.61. The horizontal lines follow trivially from the bounds on Hydrogen separation in eq. 2.62. The curve bounding the right side of the region comes from the force constraint (eq. 2.61). We obtain this boundary by projecting the configurations from the Monte Carlo random walk for sampling the roaming region (described in Section 2.3.3) into the rH–H –rH2 –CO space and drawing a bounding edge. The figure immediately reveals some of the key features of configuration space and distinctions of the roaming trajectories. The roaming trajectory (solid black) spends a long time with the loosely associated hydrogen far from the formyl parent. This is consistent with the findings of other workers, who observe that roaming hydrogens may spend up to several picoseconds a distance of several Angstroms from the parent [17]. To get an idea of what an extraordinarily long time this is—and to what extent the relevant timescales differin roaming—consider that the typical vibrational period of the H – H or H – C stretch is of the order of 10 femtoseconds—two orders of magnitude faster. The many wiggles in the black curve while in the roaming region are variations in H – H separation caused by H – C vibrations in the formyl! We can also see the distinction in product H2 vibrational excitation for the roaming and direct pathways. By observing the H – H oscillations once H2 has dissociated from CO (along the lower right), we see how much more vibrationally excited is the roaming H2 than the directly dissociating H2 as evidenced in the amplitude of the oscillations. Vibrationally hot H2 is in line with energy-conservation requirements for rotaionally cold CO. 71 12 10 Roaming Region 8 rH−H / a0 6 4 2 Bird’s Nest 0 0 2 4 6 8 10 12 rH2 −CO / a0 Figure 3.2: Orientation to the H2 CO surface Vertical and horizontal axes are respectively H – H and H2 – CO separation in atomic units. The circle, , is the global minimum of the surface. The square, , is the tight transition state to molecular products from the global minimum, TS3 . The triangle, 4, is the location of the putative roaming saddle. The diamond, , is the newly-located saddle of degree two associated with the S0 /S1 conical intersection; see Section 3.2. A typical directly dissociating trajectory is shown in dotted red. A typical radical dissociation trajectory is shown in dashed blue. A typical roaming trajectory is shown in solid black. In this and all other figures with axes of dimension length, the axes are measured in atomic units. The trajectories shown were computed with a microcanonical energy, Eµ = 37600 cm–1 measured from the global minimum (as will be all Eµ ). 72 Though the roaming trajectory appears to pass quite close to the proposed location of a roaming saddle (4), recall that this figure projects flat 4 degrees of freedom and the roaming trajectory may pass quite far “above” or “below” the region of configuration space where, for other potentials, a saddle is reported. To show the diversity of roaming behavior and emphasize that not all roaming trajectories pass by the proposed roaming saddle, we offer Figure 3.3, which shows six randomly selected roaming paths, 3 three of which avoid the roaming saddle’s location entirely. As an aside, this particular projection may be of general utility for visualizing roaming reactions, many of which have the same general scheme: ABC −−→ AB···C −−→ A + BC (3.2) Where ···C indicates the roaming species. In the case of formaldehyde A is CO and B and C are both H. For the case of acetaldehyde (CH3 HCO), all is the same except that C is now the methyl group, CH3 . The projection we use in Figures 3.2 and 3.3 is the A – BC center of mass separation along one axis and B – C center of mass separation along the other. We will make use of the general scheme in eq. 3.2 again when we visualize geodesics in Section 3.5. 3.3.1 Bird’s nest The region in the lower right of both of Figures 3.2 and 3.3, the “Bird’s Nest”, turns out to be key to our analysis. It is so named for the jumbled (nest like?) appearance of the traces of the trajectories around the reactant well. We defined the extent of the bird’s nest to be mutually exclusive with the roaming region, which bounds the bird’s nest from above rH–H < 5.867 a0 . The right bound is the was selected to create the smallest region which still contained the bulk of the pre-dissociation (and pre-roaming) configurations rH2 –CO < 3.420 a0 . This selection was made without regard for the stationary points identified on the surface5 , but we note that the edge of the bird’s nest as defined runs along the ridge, which connects the first order saddle corresponding to the transition state to molecular products and the second order saddle associated with the S0 /S1 conical intersection. The ridge is visible in the projection of the potential shown in Figure 3.1. We will discuss its significance to geodesic analysis in Section 3.6.1 5 In fact, we defined the bird’s nest prior to identifying the relevant stationary point. 73 12 10 Roaming Region 8 rH−H / a0 6 4 2 Bird’s Nest 0 0 2 4 6 8 10 12 rH2 −CO / a0 Figure 3.3: Six roaming trajectories To drive home the point that not all roaming trajectories pass through the neighborhood of the roaming saddle, we here show six randomly selected roaming trajectories. They show a diversity of behavior, but all have in common that they enter the roaming region and then return to dissociate to molecular products. Each trajectory was computed with a microcanonical energy, Eµ = 37600 cm–1 . 74 3.3.2 Molecular dynamics branching ratios Though our analysis is not concerned with detailed prediction of branching ratios for the H2 CO dissociation reaction, we can estimate them for the classical surface under study. Table 3.10 gives the energy-dependent results of molecular the molecular dynamics runs. The ratios are also plotted in Figures 3.4 and 3.5; the former shows only reactive trajectories and the latter shows all trajectories. Recall that these plots and figures should not be taken as representative of what formaldehyde would do in an experiment as we perform no semi-classical filtering or selection of trajectories6 , but rather of the classical dynamics of the Bowman potential surface (see Section 2.1.1). We are concerned with the dynamics of roaming—a phenomenon which manifests purely classically. Figure 3.4 tells the bulk of the story as it shows ratios normalized with respect to trajectories that succeeded in reacting. All trajectories are initiated at an energy above the radical dissociation threshold (Erad –1 µ > 33240 cm ). Even though it is accessible, the radical channel is not a dominant component of the flux until energies of 38814 cm–1 and higher, when it achieves parity with the direct dissociation channel. Of particular interest to us is the behavior of the roaming channel. Lower energies show a rocky rise in the fraction of roaming trajectories and then a decline that begins at 38814 cm–1 . We should note that above 41060 cm–1 , roaming is not observed experimentally [17]. It is likely that above this energy, the same electronic pathway that—at lower energies— allows the deposition of enormous amouts of vibrational energy into the S0 , ground state surface is no longer accessible [17]. That we use the results above 41060 cm–1 for comparison to geodesic results is sensible because we are seeking to understand the way the inherent dynammics of this surface shift with energy. All of these observations come with a caveat that for lower energies, they are biased towards fast processes. We know that each trajectory has sufficient energy to dissociate, but given that we stop integration after roughly 12 ps, not all do. We do not know the relative time scales on which reactions through each channel occur, but we certainly suspect that they are different, especially at high energies (see how much slower is roaming in Figure 3.6). 6 Ourpurely classical treatment does not discard states for violating zero-point energy or other “semi- quantum” concerns; see [58]. 75 Energy / cm–1 unsuccessful no reaction direct radical roaming 34500 184 38441 1097 260 18 35000 326 36967 1825 813 69 35500 507 34637 2943 1806 107 36000 778 31231 4533 3283 175 36223 943 29295 5304 4226 232 37000 1331 22329 8549 7415 376 37600 ∗1 ∗18253 11211 10016 519 37688 1814 15750 11340 10553 543 38000 1963 13101 12747 11590 599 38688 2279 8042 15161 13761 757 38814 2322 7359 15462 14071 786 41010 2544 993 18549 17040 874 45000 2782 24 17671 18711 812 Table 3.10: Dispositions of molecular dynamics trajectories with energy For each energy, we give the final disposition of the 40000 molecular dynamics trajec- tories we computed on the formaldehyde energy surface (details given in Sections 2.3.1 and 2.3.2). Unsuccessful trajectories were deemed insufficiently accurate for the time step at some point during integration and discarded. Trajectories are classified as de- scribed in Section 2.3.2. Except for the energy column, in units of wavenumbers, all figures give absolute number of trajectories. *: Trajectories at 37600 cm–1 were computed without properly detecting consistency failure and so some “unsuccessful” trajectories are erroneously categorized as “no reaction”. Assuming a failure rate equal to that of 37688 cm–1 , the error will skew branching ratios down by no more than 5% of the expected value. 76 100 Fraction of reactive MD trajectories Direct Radical 10−1 Roaming / mol. products Roaming / total products 10−2 35000 37500 40000 42500 45000 Energy / cm-1 Figure 3.4: Molecular dynamics channels (reactive) Final states of reactive molecular dynamics trajectories. Trajectories not reacting within 12.1 ps are excluded from the figure and the relative abundance of each channel is shown (see Figure 3.5 for all trajectories). The magenta curve, “Roaming / mol. products”, shows the fraction of molecular products that arose from roaming: roaming/(roaming + direct). The black curve gives roaming as a fraction of all products: roaming/(roaming +direct+ radical). Error bars giving the two standard-deviation estimate for the proportions plotted would be smaller than the symbols used. Vertical lines indicate energies for which geodesics were computed. 77 100 Direct Fraction of total MD trajectories Radical 10−1 No reaction Roaming 10−2 10−3 10−4 35000 37500 40000 42500 45000 Energy / cm-1 Figure 3.5: Molecular dynamics channels Disposition of molecular dynamics trajectories as a function of excitation energy. In con- trast to Figure 3.4, the figure includes trajectories that do not react (“No reaction”), but does not include those trajectories that failed for numerical reasons (see Table 3.10). Er- ror bars giving the two standard-deviation estimate for the proportions plotted would be smaller than the symbols used for all points except the roaming fraction at 34500 cm–1 , which is 4.5 · 10–4 ± 2.1 · 10–4 . The vertical lines indicate the energies for which geodesics were also computed. 78 3.3.3 An estimate of the energy-dependent incubation period We were interested in computing the times to reaction for each of the dissociation channels: direct, radical, and roaming. To determine this “incubation period”, we use a scheme similar to Houston and coworkers [58]: counting frames of the stored configurations in reverse until the criteria for dissociation no longer hold. We define our criteria in terms of the same space we visualized in Figure 3.2. For trajectories going to radical products, we take the last frame (counted in reverse) for which rH––H > 9 a0 . This corresponds to just after the last time the system exits the roaming region in terms of hydrogen separation. For trajectories going to molecular products (directly or by way of roaming) we take the last frame, again counted in reverse, for which ~rCO – ~rH > 7 a0 . 2 This is well past the moment that the trajectory exits the bird’s nest, but the delay is necessary to avoid mis-classifying roaming trajectories as already dissociated. From Figure 3.3, it can be seen that we would have made such an error for roaming trajectories if we had simply taken the limits of the bird’s nest to demark the moment of dissociation. To treat the large number of trajectories that did not dissociate, we computed the time by averaging the rate (inverse reaction time). Those trajectories that did not dissociate required effectively infinite time therefore contribute a rate of 0 to the mean. In Figure 3.6 we plot the inverse of the mean of inverses, that is the time. The incubation period for each channel at each energy is computed as: N 1 X –1 t–1 incubation = ti (3.3) N i where the {ti } are the set of N reaction times as defined in the previous paragraph. The major salient observation is that roaming is consistently a slower event than direct or radical dissociation. At present we have not been able to quantitatively connect this fact to the analysis afforded by geodesics, but it is an area for further study. A technique we tried, but found to be unsuccessful for estimating the rate of reaction is instructive in its failure. Houston and coworkers [58] estimated the total rate of formaldehyde decomposition by fitting the distribution of lifetimes, t, to the distribution: ρ(t) ∝ exp(–kt) (3.4) 79 106 Radical Direct Incubation period / h¯ /EH Roaming 105 104 103 35000 37500 40000 42500 45000 Energy / cm-1 Figure 3.6: Rates of reaction Incubation periods in atomic units for each reaction channel as a function of excitation energy. We treat non-reactive trajectories as having effectively 0 rate of reaction when contributing to the mean. 80 direct channel, 38000 cm–1 roaming channel, 36223 cm–1 Figure 3.7: Failure of statistical distribution of reaction times In both figures, the ordinate is natural log of the number of trajectories as a function of lifetime in atomic units (along the abscissa). If eq. 3.4 were a good description of the rates, then the slope of the line of best fit would be –k, the rate constant for the mechanism. The plot on the left is for the direct channel at 38000 cm–1 and shows the prediction succeeding quite well. The figure on the right is for the roaming channel at 36223 cm–1 and is showcases one of the many failure modes of the channel; this instance is particularly spectacular as it predicts a negative rate, which is nonsense. Such a distribution results from the simple assumptions that [79]: • The system is ergodic—that all states with sufficient energy will eventually react. • All vibrational states have equal probability of reacting per unit time; i.e. vibra- tional energy redistribution is fast relative to the reaction. While this method yielded credible fits at most energies for the direct and radical channels, it was not suitable for roaming, even at energies for which there were many samples available. The failure points to the non-trivial consequences roaming has for theories of unimolecular dissociation. See Figure 3.7. 81 3.3.4 Existence of high JCO roamers In formaldehyde, the spectroscopic signature that first hinted at the existence of roam- ing was the presence of a low-energy, “cold”, shoulder in the product CO rotational distribution [9]. In retrospect, this comes as no surprise. One would certainly imagine that, given the long excursion taken by the roaming hydrogen, its plunging reentry and abstraction of the other hydrogen would leave the product H2 highly vibrationally excited. By necessity of energy conservation, the CO would be less rotationally excited than if the dissociation had proceeded via the tight transition state to molecular products, which features far less dramatic hydrogen-hydrogen separation. This “cold shoulder” has been used in the identification of roaming in other systems [7, 19] and it is tempting to assume that the phenomenon and the observation are inseparably coupled. In the present case, we find just such a “cold shoulder” in the JCO distribution for products of molecular dynamics. We estimate JCO via the crude quantization: ~ LCO ≈ ¯h JCO (3.5) where ~LCO is the moment of inertia of CO about its center of mass. We then take j k JCO = ~LCO /¯h (3.6) where b·c is the floor function. This analysis is certainly sufficient as a heuristic and for order-of-magnitude observations, but not for a detailed analysis that invoked small differences. The results of such an analysis from three representative energies are shown in Figure 3.8. Our approximation represents a small deviation from full expression: ~L2 = JCO JCO + 1 ¯h  CO Defining ~L∗CO ≡ ~LCO /¯h, a better semi-classical expression would be: 1 q  JCO = ~ ∗2 LCO + 1/4 – 2 Better, but only marginally better, especially for large ~L∗CO , which we have here. Indeed for ~L∗ = 10, the fractional deviation is only 5% between our approximation and the CO exact result prior to applying the floor. A more rigorous analysis would also demand a 82 Figure 3.8: A “cold” shoulder in the JCO distribution Histogram of JCO made by combining the molecular products from three energies: 37000 cm–1 , 37600 cm–1 , and 38000 cm–1 , corresponding to a 330 nm photolysis laser. Molecular dissociation accounted for 23414 of 119997 trajectories (39999 at each energy). The black curve is symmetric about its mean and serves as a guide to the eye. The low-energy shoulder to the distribution is often taken as an indicator of roaming. We cannot tell from the above figure, but there do exist roaming trajectories giving rise to product states with JCO > 30. 83 Energy / cm–1 35000 36000 37000 38814 41010 45000 JCO ≥ 30 0.188 0.259 0.309 0.377 0.398 0.266 JCO ≥ 60 0 0.0058 0.0133 0.0115 0.0206 0.0185 Table 3.11: Fraction of roaming trajectories with high JCO The table above gives the fraction of roaming trajectories with JCO ≥ 30 and JCO ≥ 60 for several energies. For the given energies, at least 15% of the roaming trajectories hide in the symmetric, JCO ≥ 30 portion of the distribution comparable to Figure 3.8. Up to 2% of the roaming trajectories have JCO ≥ 60. sampling scheme that properly treated the quantum nature of the system; here we use JCO simply as a proxy for the magnitude of the rate of angular rotation. Seeing the low-energy component of the distribution in Figure 3.8 was one of our first global indicators that the surface supplied to us functioned as it should. When we examined the trajectories corresponding to low JCO , we observed that they manifestly roamed. Our early definitions of roaming even exploited this, calling all trajectories with JCO ≤ 15 “roaming” and all those with JCO ≥ 30 “directly dissociating”. The excluded middle was an acknowledgment of the inherent the ambiguity in the definition of roaming though was of little consequence as we were not seeking branching ratios at that stage. However, we discovered that this simple classification doesn’t tell the whole story. In Section 2.3.3, we give a definition of roaming that relies only on passage through and return from a special region of configuration space. The trajectories satisfying the our definition based on the “roaming region” do indeed “look” like roaming, but some exhibit decidedly higher JCO . As seen in Figure 3.9, there are roaming trajectories with JCO even higher than 60! See Table 3.11 for the explicit fractions. This surprising and complicating fact must be borne in mind when describing and defining roaming or we risk missing large components of the distribution of roaming dynamics. Later, we will see that roaming is indeed enabled by just such a wide distribution in features. 84 Figure 3.9: Roamers exhibiting high JCO Distribution of product JCO for roaming trajectories at all microcanonical energies, Eµ , studied. Each bin along the abscissa is a range of 5 quanta, e.g. 0 corresponds to 0–4, 20 to 20–24, etc. While slow rotation of the CO fragment has been regarded as a hallmark of roaming behavior, clearly this simplistic, energy-conservation story does not capture the range over which energy is deposited. 3.4 Geodesics are consistent with estimate In Section 2.5, we established a method to estimate geodesic total and vibrational lengths. Though we deliberately do not typically analyze geodesics over their entire length (see the discussion to come of the “bird’s nest” in Section 3.6.1), our calculations yield accurate estimates for lower-bounds on both the vibrational and total length of the entire path. In Figure 3.10, we show distributions of the total and H2 vibratonal length computed at EL = 37600 cm–1 . From eq. 2.105, we have an estimate of total length, `Total : √ `Total & 585 a0 me (3.7) and the H2 vibrational length, `H2 –Vib , as: √ `H2 –Vib & 30 a0 me (3.8) Comparing to Figure 3.10, we see that both estimates are indeed near the lower bound √ of their respective distributions. Quantitatively, `Total = 585 a0 me corresponds to the √ 0.166th percentile of ρ(`Total ) and `H2 –Vib = 30 a0 me corresponds to the 0.067th 85 ×10−3 ×10−3 Probability Density ×a0 me Probability Density ×a0 me 10 √ √ 2.5 8 2.0 1.5 6 1.0 4 0.5 2 0.0 0 0 200 400 600 800 0 40 80 120 160 √ 200 240 √ 10001200 ℓTotal / a0 me ℓH2 vib / a0 me 8 Probability Density 6 4 2 0 0.05 0.10 0.15 0.20 0.25 0.30 ℓH2 vib / ℓTotal Figure 3.10: Distribution of lengths over full geodesic paths Distributions composed of 19321 direct geodesics computed at EL = 37600 cm–1 . Lengths calculated as described in Sections 2.4.1 and 2.4.2 over the full extent of the paths. The lower-middle distribution is of the ratios of the total and vibrational length for each geodesic. The dashed, vertical lines give our (quite accurate) estimates for the lower bounds of each distribution. Note that our estimate for `H2 vib /`Total is not intended to be a lower bound as we take the low estimates for both distributions rather than a lower bound for the numerator and an upper bound for the denominator. 86 percentile of ρ(`H2 –Vib ). These extremes makes sense because we ignored many of the motions involved; modulo our physically reasonable guesses for ∆~rH –CO and 2 k∆~r k, the estimates should be lower bounds! Combining the two results allows us to estimate the vibrational fraction to be: `H2 –Vib /`Total ≈ 30/585 = 0.05 (3.9) which is also within the tail of the distribution shown in Figure 3.10, the 0.756th percentile. 3.5 Geodesics are inherent dynamics In Section 2.1.2 we defined the inherent dynamics of a system as a filtered view of its classical dynamics—revealing the essential motions stripped away of noise. A sketch of the relationship between both such dynamics is shown in Figure 3.11. We then make the case (in Section 2.2) that geodesics in the landscape ensemble are these inherent dynamics. While the group has been computing geodesics and using them to understand the inherent dynamics of systems under study for over a decade, the geodesics themselves have resisted direct visualization except in the two-degree of freedom case [48]. The systems treated in the past simply have too many degrees of freedom to allow their rendering after the fashion of Figure 3.11. Unlike simple liquids, which have many hundreds or thousands of degrees of freedom, formaldehyde has only 6—still non-trivial, but perhaps geodesics on H2 CO are amenable to visualization. The general scheme for roaming reactions given in eq. 3.2 offers three suggestive coordinates. Recall that we have ABC forming A + BC by way of roaming C. Specifically CO is A and the two H are B and C. Forming coordinates from the inter-center-of-mass distances for each of the ABC centers gives three coordinates: AB, BC, CA, which we can plot simultaneously. Figure 3.12 shows a single roaming molecular dynamics trajectory overlaid with two geodesics, one that proceeds directly from reactants to products and another that is constrained to pass through one of the points along the trajectory in the roaming region—a geodesic that roams. The degree of similarity between Figure 3.11 (a cartoon) and Figure 3.12 (real data from a physical surface) and is remarkable—the geodesics 87 Figure 3.11: Inherent dynamics pictograph We sketch the inherent dynamics (dotted black curve) of a process connecting (R)eactants to (P)roducts and show the relationship to the classical dynamics (solid red). The classical curve shows the oscillations of the molecular dynamics trajectories within the reactive tube of possible paths (solid black, bounding curves). The Inherent dynamics are the smooth, dotted path that washes out the non-essential features of the molecular dynamics. 88 clearly match our model of the inherent dynamics. The roaming geodesic, in blue, exits the bird’s nest with the trajectory and then swings towards the products. Even the direct geodesic, which is not constrained to pass through the roaming region, captures the essence of the trajectory after dissociation. This image, or one like it was the inspiration for the the naming of the “bird’s nest” (see Section 3.3.1). It is visible as the dark scribble in the center and results from pre-dissociation vibration of the molecule. While this picture is evocative, we will not dwell on images like it. As we discuss in Section 3.6.3, there is naught to be gained by restricting ourselves to dynamically correlated7 endpoints as we do here. Indeed we can more widely explore the energy landscape when freed from such constraints. Figure 3.12 is useful for confirming our notion of what geodesics represent, but the power in using geodesics lies in the way they globally sample the energy landscape and comes from the inferences we can draw by studying them in aggregate. 7 Thisdynamical correlation refers to the fact that each of the boundary configurations are connected by a single classical trajectory. 89 14 12 10 8 CA 6 4 2 0 15 10 AB 5 1 0 4 3 2 0 7 6 5 BC Figure 3.12: Geodesic as inherent dynamics of formaldehyde The black curve is a classically computed molecular dynamics trajectory, which roams; Eµ = 37600 cm–1 . The geodesics, direct (in red) and roaming (in blue), were computed as part of our previous study [1] with EL = 34928.7 cm–1 . Incidentally, this is very nearly equal to the value of the potential at the point in the roaming region. The red curve is an unconstrained geodesic between the starting and ending points of the molecular dynamics trajectory. The blue curve is the geodesic between the same points, constrained to pass through a point on the trajectory in the roaming region. The axes are defined in terms of the center of mass separations between components of the general ABC system defined in eq. 3.2. The BC axis is H – H separation and the AB and CA axes are CO – H separation for each hydrogen, B and C. 90 3.6 Observations from geodesics Having established that the geodesics we compute are reasonable in their magnitude and that, in at least in some sense, they match our mental model of inherent dynamics, we can interrogate them to understand the dissociation dynamics of formaldehyde. Recall that geodesics are global measures of the surface that do not depend on the fine details of minima and transition states; rather they give us information about the connectivity and structure of the large-scale features of the energy surface. For each of the six landscape energies we attempted to compute at least 12000 geodesics of each class (see Table 3.12). On our in-house cluster (ted), which mostly runs Intel Xeon E5440 processors at 2.83 GHz, the average time to compute a single, optimized geodesic is approximately 1 core-hour8 . Nodes have at least 512 MB of memory per core and computation is always processor- rather than memory-bound. The geodesics computed for the present study represent on the order of a quarter of a million core-hours of computational effort. Table 3.12 gives the rate of success for our algorithm in computing the different classes of geodesics with energy. The broad observation is that direct and radical geodesics are generally successfully completed at all energies. Roaming geodesics, on the other hand vary in the their rate of completion: 69–76% at low energies and not approaching 100% until the upper end of our scale. The algorithm will fail if there is no path connecting the endpoints which respects the landscape criterion. The prevalence of failures, then suggests9 that at the lower energies under study, configuration space is disconnected—and that the dynamics are not ergodic. This is a non-trivial observation: surely the dynamics are not ergodic at energies so low that the system is confined to the reactive well. However, even the lower end of our energy range is nearly 2000 cm–1 above the radical dissociation threshold, Erad –1 µ = 33240 cm . At these, relatively high, energies non-ergodicity is surprising. We must draw this inference because the landscape energy is equal to the micro- canonical energy, EL = Eµ . By choosing the landscape energy, EL , to be equal to the microcanonical energy, Eµ , we have the property that all configurations that result 8 Using more modern hardware, such as the cluster at Brown’s Center for Computation and Visualiza- tion improved this time by a rough factor of two. 9 provided that our algorithm is free from more-fundamental sources of failure 91 Energy / cm–1 direct radical roaming 35000 11998 11993 8256 36000 12000 12000 8352 37600 ∗19321 12000 8816 38814 12000 12000 9135 41010 11998 12000 11949 45000 12000 11997 11712 Table 3.12: Success rate of computed geodesics by channel and energy Here we give the number of successfully computed geodesics by channel and energy. *: For all channel/energy combinations except direct geodesics at 37600 cm–1 , exactly 12000 geodesics were attempted. For direct geodesics at 37600 cm–1 , 19321 were attempted; all were completed. from classical trajectories computed at Eµ are members of the corresponding landscape ensemble at EL = Eµ . Therefore, there cannot be regions of the surface explored by molecular dynamics or by the random walk sampling the roaming region which are energetically permitted in the landscape ensemble. This disconnection is consistent with our observation, in Section 3.6.3, that the roaming geodesics for dynamically correlated boundary points have a broader distribution than those that are not. The disconnect also points back to the failure of a simple statistical theory to capture the rate of reaction for roaming events in Section 3.3.3; indeed one of the required assumptions for the statistical theory was a connected, ergodic space. 3.6.1 Implications of the bird’s nest The next property of the geodesics we will examine is their length. This is natural as geodesics are defined in terms of a minimization condition on their length and the length gives information about the relative amounts of motion involved in the different channels. Throughout this chapter, we’ve been hinting at the significance of the region about the reactive well, which we call the bird’s nest. One of the striking observations revealed by Figure 3.2 is how different are trajectories from each channel. All go their own way and have little overlap outside of the region labeled bird’s nest. That the trajectories spend much time together in this tangle of pathways (resembling, to the author’s eyes, the nest of a bird) is key to our analysis. 92 By construction, the geodesics for different channels go to manifestly different regions of configuration space—indeed we demand that roaming geodesics detour before heading towards molecular products. It would then be no surprise if the lengths were different. To make a just comparison, we therefore confine our attention to a special region of overlap—the bird’s nest. We see that the system oscillates there, about the global minimum, for some time no matter the products or route taken to them. This is the incubation period during which energy is redistributed among the different degrees of freedom and the ultimate trajectory of the system is determined. The bird’s nest provides a region where we can ask the same questions of different classes of paths and make equitable comparison between them. This method does not require us to identify a reaction coordinate, but only to locate a region where all trajectories spend significant time before diverging. To use the bird’s nest when analyzing geodesics, we identify as tb the moment when a geodesic first exits the bird’s nest and then measure the length (or other property) of the geodesic only until departure. For example, we modify the form of the total length as given in eq. 2.67 to be: Z t b `Total = dτ [2 · T(τ )]1/2 (3.10) 0 where the limits of integration have changed to only include the portion of the geodesic in the bird’s nest. We make similar modification to our expressions for the component kinematic lengths (eqs. 2.73 and 2.88) and for the boundary fraction (eqs. 2.90 and 3.11). Given that trajectories for each channel overlap strongly in the bird’s nest, there is no reason so expect the geodesics for different channels to have markedly different properties there. Any differences we do observe then, should be attributed to the unique dynamical features associated with each pathway. 3.6.2 Key observations at a single energy We can use geodesics to understand the differences between the dynamical pathways available to the formaldehyde molecule as it dissociates. The distributions of total lengths of geodesics give us information about the relative distances the system has to traverse to transform from one species into another. By partitioning the kinetic energy into other, physically interesting degrees of freedom such as diatomic vibration or 93 rotation we can monitor the role that those motions have in the reaction as well. For example, the distribution of `H2 vib. along geodesics gives an indication of the amount of H2 stretching required within the bird’s nest along this minimum-motion path. We will also examine the amount of time spent encountering obstacles, the boundary fraction, which indicates how constrained are the paths taken by geodesics. We will come to the energy-dependent properties of geodesics in Section 3.6.4, but it is instructive and will allow us to orient our thinking, to first consider a single, extremal landscape energy. In this section, we take up geodesics10 with EL = 35000 cm–1 , an energy soon after the onset of roaming in formaldehyde to see what they can tell us about how and why roaming occurs. Broad roaming distribution in length at low energy Figure 3.13 serves as our principle point of departure, showing the distributions of total kinematic lengths lengths in the bird’s nest for each reaction channel. Here, even when confined exclusively to the bird’s nest, we see a striking difference between the roaming paths and either the direct or radical paths. While the distributions for direct and radical paths are highly peaked, the roaming distribution is smeared across the abscissa. This broad distribution of minimal transit lengths points to a plurality of routes to the roaming region. We are able to take a more detailed look at the distributions shown in Figure 3.13 by examining the component kinematic lengths in Figure 3.14. The internal degrees of freedom plotted here show the same motif played out in each partitioning of the kinetic energy. Direct and radical path-length distributions are sharply peaked near zero and the roaming path distributions are considerably broader with longer tails. These component length distributions do not give us information about the relative product yields in formaldehyde photodissociation. Rather, the component length distributions yield information about the extent to which kinetic energy is focused in into different degrees of freedom during the incubation period prior to dissociation. For example, it is remarkable that there is more H2 stretching for geodesics heading to 10 Though the distributions we examine in this chapter are quite similar to those in our paper [1], they are computed with a single landscape and geodesics energy, EL = Eµ = 35000 cm–1 . In that paper, the landscape energy was lower than the microcanonical excitation energy, EL = 34928.7 cm–1 < Eµ = 37000 cm–1 . 94 Probability Density × a0 me × 103 12 Radical 10 Direct √ 8 Roaming 6 4 2 0 150 300 450 600 750 √ 35000 cm-1 : Length / a0 me Figure 3.13: Total length distributions Total kinematic length probability distributions for geodesics computed at EL = 35000 cm–1 within the bird’s nest. We have no reason to suspect that, when confined to this region, the geodesics for each channel would be different from each other. However, the distributions for geodesics directly between reactants and molecular products (direct) and from reactants to radical products (radical) are substantively more peaked than for those that go to molecular products by way of the roaming region (roaming). Though they appear nearly indistinguishable to those distributions in our recent paper [1] (where we used EL = 34928.7 cm–1 unequal to Eµ = 37000 cm–1 ) the distributions in the figure were calculated with landscape and microcanonical energies equal, EL = Eµ = 35000 cm–1 . 95 Probability Density × a0 me × 103 15 CO CO Rotation Vibration 10 √ 5 0 H2 H2 15 Rotation Vibration 10 5 0 0 100 200 300 0 100 200 √ 300 35000 cm-1 : Length / a0 me Figure 3.14: Component length distributions Probability distributions for component kinematic lengths computed at EL = 35000 cm–1 . The figure provides a more detailed look at Figure 3.13. As there, direct geodesics are in dotted red, radical geodesics are in dashed blue, and roaming geodesics are in solid black. Again, the geodesics are only measured while in the bird’s nest around the reactive well. As an aside, even though the chemically distinct species indicated by our labels (H2 , CO) do not meaningfully exist, we can still measure motion in the corresponding degrees of freedom, e.g.: H – H stretching or change in C – O orien- tation. As in Figure 3.13 for total kinematic length, the roaming geodesics have a much broader range of motions available to them. Though they appear nearly indistinguish- able to those distributions in our recent paper [1] (where we used EL = 34928.7 cm–1 unequal to Eµ = 37000 cm–1 ) the distributions in the figure were calculated with landscape and microcanonical energies equal, EL = Eµ = 35000 cm–1 . 96 the roaming region than to radical products. This suggests that roaming is not simply aborted radical dissociation. Even though geodesics cannot see into the future, they reveal sharp distinction between the bird’s-nest-dynamics for the roaming channel and for direct or radical dissociation. The distributions in Figures 3.14 and 3.13 indicate that of the three channels, only roaming admits a wide array of partitionings of kinetic energy. In contrast, the direct and radical channels have narrowly defined requirements on the pre-dissociation distribution of kinetic energy. Boundary fraction We can understand why the roaming channel has this broad distribution of kinetic energies through distribution of boundary fractions, hΘi. Recall, from eq. 2.90 in Section 2.4.4 that we define the boundary fraction as: R tb dτ [2TTotal ]1/2 Θ[R(τ ); EL ] hΘi = 0 (3.11) 1/2 Rt 0 dτ [2TTotal ] e where tb is the moment of exit from the bird’s nest and where Θ[R; EL ] is finite for the portions of the geodesic, R(τ ) encountering an obstacle: e e  1 E – V(R) < δE L L Θ[R; EL ] = e e 0 otherwise We take the fractional deviation from the landscape energy, δ = 10–4 ; reducing δ to 10–5 produces no changes to the distribution. In Figure 3.15 we find another stark difference between roaming and the direct or radical pathways. All three classes of pathway have a peak in the density at zero; i.e. for each channel, there are at least some geodesics that escape from the bird’s nest via the straight-line, no-obstacles path. However, is much more pronounced for direct and radical paths than for roaming. Of the radical distribution, 56% of the density is identically 0, as is 74% for direct dissociation11 . In contrast, less than 14% of the roaming geodesics escape the bird’s nest without encountering an obstacle. Aside from this binary observation, we see that the distribution for roaming geodesics is heavily 11 See Table 3.15 for these figures as a function of landscape energy. 97 3.0 Direct 2.5 Roaming Probability Density 2.0 Radical 1.5 1.0 0.5 0.0 0.25 0.50 0.75 1.00 35000 cm-1 : Boundary Fraction, hΘi Figure 3.15: Boundary fraction Probability distributions for the fraction of length spent tracing boundaries of the potential surface within the bird’s nest for geodesics computed at EL = 35000 cm–1 . The distributions for all three channels include some geodesics that make no contact with obstacles before exiting the bird’s nest. This delta function contribution is shown here with finite width because of our binning. In contrast to direct and radical geodesics, the majority of the density for roaming geodesics is finite, and skewed towards contacting obstacles half of the time. Indeed the mean boundary fraction for roaming geodesics is 0.45, whereas for radical and direct geodesics the figures are respectively 0.15 and 0.057 (see Table 3.16). Though they appear nearly indistinguishable to those distributions in our recent paper [1] (where we used EL = 34928.7 cm–1 unequal to Eµ = 37000 cm–1 ) the distributions in the figure were calculated with landscape and microcanonical energies equal, EL = Eµ = 35000 cm–1 . 98 weighted towards higher boundary fraction: the most probable being hΘi = 0.70±0.05. Winding around obstacles in this way is certainly consistent with the longer paths evidenced in Figure 3.13. It is important to note that the obstacles encountered by roaming geodesics are regions of quite high potential energy; they must be of the order of the landscape energy, which, at 35000 cm–1 , is already larger than the radical dissociation threshold. Taken together, the observation of broad length distributions and obstacle-lined pathways to roaming offer a picture of the potential energy landscape where roaming occurs. By comparison to radical or direct dissociation, the paths are tortured and constrained. These obstacles give rise to a broad, unfocused distribution kinematic lengths. The distinction between paths leading to direct or radical dissociation and those leading to roaming is how much more narrowly constrained are the former than the wandering latter. The relative freedom afforded roaming paths at this energy points to an entropic advantage for the channel. We will explore this notion of an entropic advantage quantitatively in Section 3.6.5. 3.6.3 The null effect of dynamical endpoint correlation The results in this section were computed in conjunction with our earlier paper [1] and correspond to a different set of geodesics than the rest described in this chapter. The major difference is that, unlike everywhere else, the microcanonical and landscape energies are not the same. Rather the landscape energy, EL = 34928.7 cm–1 , is considerably lower than the microcanonical energy, Eµ = 37600 cm–1 . In that paper [1] and for this subsection, we are nevertheless able to draw useful conclusions about the nature of geodesics on the formaldehyde energy surface. If, in this section, the numbers of molecular dynamics trajectories or computed geodesics seem unfamiliar, this is also why. We compute geodesics as described in Section 2.3.4: from reactants to molecular and radical products, perhaps by way of the roaming region. In all cases, we uses endpoints that are not dynamically correlated—that are not connected by classical trajectories12 . We do this to increase the number of geodesics we can compute from a 12 Whileall the boundary- (and intermediate roaming-) points are computed via classical trajectories, the boundaries for a given geodesic do not necessarily come from the same classical trajectory. 99 unpaired paired 92 ± 36 (49,060) 94 ± 36 (5,965) direct 95 ± 37 (1,491) roaming 358 ± 146 (36,608) 224 ± 117 (275) Table 3.13: Comparison to dynamically correlated (paired) geodesics Total lengths are reported in units of a0 m1/2 e as mean ± standard-deviation with the number of samples (in parentheses). In keeping with the rest of our analysis, lengths are computed only within the bird’s nest. given number of trajectories (N trajectories yield N2 /2 endpoint pairs). To verify that that our results are unaffected by this randomization, we also compute direct and roaming geodesics between “paired” endpoints—configuration space points linked by a single molecular dynamics trajectory. For direct geodesics we select 6,000 of the 7,595 directly dissociating trajectories13 at Eµ = 37600 cm–1 , and attempt to compute geodesics between their starting and ending configurations. Our algorithm successfully locates geodesics for 5,965 of them. In the case of roaming trajectories— which are rare events—the requirement of dynamical connection drastically reduces our sampling statistics14 . Of the 381 roaming trajectories13 at Eµ = 37600 cm–1 , 292 have points in the roaming region that also satisfy the landscape criterion, EL = 34928.7 cm–1 . For each roaming trajectory, we randomly select a single configuration in the roaming region and satisfying the landscape criterion and then pair this with the starting and ending configurations of the trajectory. The results of theses calculations are summarized in Table 3.13 in the column for paired geodesics. To determine if the breadth of the paired, direct distribution is an artifact of our statistics, we select a random subset of 1/4 of the geodesics and repeat the measurement. If the breadth is statistical, we expect the standard deviation to double. However, we find that the mean and variance are essentially identical, suggesting that the origin of the width is indeed physical. Our primary finding is that dynamical correlation has no impact on the distribution 13 Thenumber of trajectories is different than that reported in Table 3.10 because this is a different set of molecular dynamics trajectories; see above. 14 And for this reason do we go through all the trouble described in Section 2.3.3 to sample the roaming region via a random walk. 100 of total length for direct geodesics. The data suggest that there is perhaps a modest impact for the distribution of total lengths for roaming geodesics. That the dynamically correlated (paired) endpoints produce a broader distribution of roaming lengths than the unpaired endpoints may be tied to in the degree to which the space is connected (see Section 3.6). Our algorithm successfully finds roaming geodesics between uncorrelated endpoints less often than for correlated endpoints. The success rate for paired geodesics here is 94%, while we would expect closer to 70% for the same energy with unpaired endpoints (see Table 3.12). It is likely that all sub-spaces of the roaming region are not uniformly connected. The possibility of disconnect aside, qualitatively, we see that the distribution of lengths of roaming geodesics is much broader than for direct geodesics even when computed between dynamically linked endpoints. This is the essential observation upon which our analyses rest and provided that we are consistent in our methods, we expect the impacts of dynamical correlation to be minimal. 3.6.4 Trends with increasing landscape energy At low energies, we observe that the geodesics corresponding to roaming pathways have a broad distribution of kinematic lengths. Direct and radical geodesics seem much more narrowly constrained and have sharply peaked distributions in kinematic length. We conclude then that roaming has an entropic advantage over the other dissociative channels at low energy. In this section we explore what happens as energy increases. Figure 3.16 gives the distributions for the total kinematic length for geodesics in the bird’s nest at the upper end of our energy scale, 45000 cm–1 (in analogy to Figure 3.13, which shows low-energy results). The roaming distribution at high energy could not be more changed from its low-energy form. It has narrowed and the mean kinematic length has shortened considerably. The distribution of kinematic lengths for roaming geodesics at 45000 cm–1 more closely resembles the radical distribution at all other sampled energies than it resembles the roaming distribution at 35000 cm–1 . It seems that the singular aspect of the roaming pathway has diminished at this high energy. In Figure 3.17, we see distributions of kinematic length for each of the three channels at all of the energies under study. The distributions for energies intervening between 35000 cm–1 and 45000 cm–1 effectively interpolate between the extremal distributions. 101 Probability Density × a0 me × 103 12 Roaming 10 Radical √ 8 Direct 6 4 2 0 150 300 450 600 750 √ 45000 cm-1 : Length / a0 me Figure 3.16: Total length distributions in the high-energy limit Probability distributions for total kinematic lengths of geodesics calculated at EL = 45000 cm–1 within the bird’s nest. The figure is a high-energy analog of Figure 3.13, which was for EL = 35000 cm–1 . In stark contrast to the low-energy figure, here we see the distribution of roaming kinematic collapsed towards those of radical and direct dissociation. The primary observation is that with increase in landscape energy, the roaming distribu- tions of lengths converge towards the distribution of radical lengths. The distributions for the radical and direct pathways are largely unchanged. It appears as though the entropic advantage possessed by the roaming channel at low energy is lost as the energy increases. The information in the figure is recapitulated in Table 3.14, which gives the mean and standard deviation of the distributions by energy and channel. Though it is not easy to discern from Figure 3.17, the tabulated lengths show that the mean length of the direct geodesics does indeed decrease with rising landscape energy. In the component length distributions (Figure 3.18), we see much the same high 102 Figure 3.17: Convergence of total length distributions Total kinematic length distributions as a function of energy for geodesics while confined to the bird’s nest. The distributions of lengths of direct geodesics are shown in dotted red. The dashed blue curves give the radical distributions. Roaming distributions are the black fields with solid, yellow bounds. While the direct and radical distributions are relatively static with increasing energy, the roaming distribution undergoes a dramatic shift towards a narrower distribution with smaller mean. The collapse towards the radical length distribution signals that the roaming channel no longer has the entropic advantage it did at lower energies. 103 Energy / cm–1 direct radical roaming 35000 95.79 ± 38.47 149.2 ± 79.05 300.0 ± 132.4 36000 93.37 ± 39.35 148.9 ± 78.24 283.2 ± 131.1 37600 86.29 ± 36.49 141.8 ± 73.58 283.1 ± 133.7 38814 89.89 ± 39.21 144.1 ± 73.81 273.7 ± 135.3 41010 89.80 ± 40.18 141.2 ± 72.64 220.2 ± 124.7 45000 86.64 ± 38.83 126.9 ± 69.56 158.1 ± 91.4 Table 3.14: Mean total kinematic length for geodesics by channel and energy We give the mean and standard deviation for all the distributions of total kinematic length given in Figures 3.17 (and 3.13 and 3.13). As there, these are lengths for the geodesics while in the bird’s nest. Kinematic lengths are reported as mean ± standard- deviation in units of a0 m1/2 e . From the table as from the figures, we see the pronounced contraction of the distribution of roaming kinetic lengths with increasing energy. energy trend. The roaming distributions have moved toward the radical distribution. For the H2 degrees of freedom, the roaming and radical distributions are nearly coincident. While the overlap for the CO degrees of freedom is less pronounced, comparison with the distribution at 35000 cm–1 , (Figure 3.14) shows that the reorganization is still quite dramatic. The picture coming together is that at high energies, roaming dynamics during the predissociation incubation period mirror those of radical dissociation, particularly in the terms of the motions of the hydrogen atoms. We can again look to the distribution of boundary fractions to understand the way that the energy surface influences these dynamics. Figure 3.19 shows the high energy result in detail and Figure 3.20 shows all the energies in our study. At 45000 cm–1 , the distribution of roaming boundary fractions shows marked depression, a trend also visible in Table 3.16. From the figures and from Table 3.15 we also see that much of the density has moved into the delta peak at hΘi = 0. The roaming boundary fraction density at high energy, most closely resembles the distribution for the radical boundary fraction at 35000 cm–1 (Figure 3.15). The large obstacles that broadened the roaming length distributions at lower energies are gone and have lost their influence. 104 Probability Density × a0 me × 103 15 CO CO Rotation Vibration 10 √ 5 0 H2 H2 15 Rotation Vibration 10 5 0 0 100 200 300 0 100 200 √ 300 45000 cm-1 : Length / a0 me Figure 3.18: Component length distributions in the high-energy limit Offering a more detailed view of Figure 3.16, we show the component kinematic lengths for geodesics computed at EL = 45000 cm–1 in the bird’s nest. Direct geodesics are in dotted red, radical geodesics are in dashed blue, and roaming geodesics are in solid black. Comparing the roaming distributions above to their low-energy analogs in Figure 3.14, we see the same dramatic change that was evident in the distribution of total kinematic lengths in Figure 3.17. Here, as in Figure 3.16, the roaming distributions now bear striking similarity (rather than difference) to the radical distribution. 105 3.0 Direct 2.5 Radical Probability Density 2.0 Roaming 1.5 1.0 0.5 0.0 0.25 0.50 0.75 1.00 45000 cm-1 : Boundary Fraction, hΘi Figure 3.19: Boundary fraction in the high-energy limit Probability distributions for the fraction of length spent tracing boundaries of the potential surface within the bird’s nest for geodesics computed at EL = 45000 cm–1 . In this high-energy limit, the contrast evident in Figure 3.15, between the roaming distribution and those of the direct and radical channels, has faded. The fraction of each distribution contributing to the delta function at hΘi = 0 has grown for all channels, but by the largest amount for the roaming channel. The portion of the density identically 0 for roaming geodesics has climbed to 0.48; for radical and direct geodesics the figures are respectively 0.76 and 0.84 (see Table 3.15). 106 Figure 3.20: Convergence of distribution of boundary fractions Boundary fraction distributions as a function of energy for geodesics while confined to the bird’s nest. The distributions for direct geodesics are shown in dotted red. The dashed blue curves give the radical distributions. Roaming distributions are the black fields with solid, yellow bounds. Similarly to the total length distributions shown in Figure 3.17, the direct and radical distributions change little as energy increases, but the roaming distribution essentially inverts it’s center: over the energy range we study, the mean boundary fraction for roaming shifts from hΘi = 0.45 at EL = 35000 cm–1 to hΘi = 0.17 at EL = 45000 cm–1 . At higher landscape energies, there is a paucity of large obstacles for the roaming geodesics to impact as they make their way to the roaming region. The high-energy landscape encountered by roaming paths is more similar to that of the radical and direct paths than it is different. 107 Energy / cm–1 direct radical roaming 35000 0.741 0.556 0.134 36000 0.766 0.540 0.177 37600 0.828 0.587 0.182 38814 0.799 0.593 0.210 41010 0.810 0.640 0.293 45000 0.843 0.758 0.476 Table 3.15: Portion of identically zero boundary fraction vs. energy As shown in Figures 3.15, 3.19, and 3.20, all of the distributions of boundary fraction, hΘi, have some non-trivial portion of the density identically 0. That is, at all energies, for all channels, some geodesics exit the bird’s nest without encountering an obstacle. The table above gives the proportion of each distribution that makes up the delta function contribution at hΘi = 0 by channel and energy. Energy / cm–1 direct radical roaming 35000 0.0572 0.149 0.446 36000 0.0539 0.155 0.391 37600 0.0383 0.134 0.375 38814 0.0450 0.131 0.351 41010 0.0438 0.111 0.284 45000 0.0369 0.0707 0.165 Table 3.16: Mean boundary fraction vs. energy For the boundary fractions shown in Figures 3.15, 3.19, and 3.20, we give the means of the distributions as a function of energy by channel. Evident are modest and barely perceptible shifts for the radical and direct channels as well as the transformation in the boundary fraction distribution for roaming geodesics. 108 Energy / cm–1 Sdirect /kB Sradical /kB Sroaming /kB 35000 4.6448 5.3179 5.9962 36000 4.6744 5.3312 5.9507 37600 4.6439 5.2765 5.9977 38814 4.7565 5.2986 5.9996 41010 4.7763 5.2935 5.8328 45000 4.7739 5.2168 5.4787 Table 3.17: Entropy of total length distributions vs. energy 3.6.5 Convergence of entropy of distributions with energy We suspect that the broad, roaming distributions in kinematic length are of higher entropy than the sharply peaked distributions for the direct or radical channels. Recall that the entropy of a continuous distribution, ρ(`), is given by (eq. 2.123): Z S/kB = – d`ρ(`) log [ρ(`)] (3.12) Recall also, for dimensioned ρ(`), there is always an undetermined and constant off- set, which render the absolute magnitudes of entropy without meaning, though the differences do convey information. Following the method in Section 2.8 to estimate the entropy of a sampled distribution, we compute and compare the entropies of the length distributions as a function of energy. Figures 3.21 and 3.22 give the results of such analysis for total length and component distributions within the bird’s nest respectively; Table 3.17 gives the numeric results for the total lengths. The qualitative observations of the previous section are largely borne out quantita- tively. Firstly, the roaming distributions for total length, have entropy of substantively larger absolute magnitude than the distributions for direct or radical dissociation. This is consistent with our notion of roaming having some kind of entropic advantage. We also see that the roaming channel has the greatest shift in entropy. Over the 10000 cm–1 that span our energies of study, roaming entropy drops by roughly 0.5 entropy units. Over the same energy window, the direct and radical channels have much smaller change: on the order of 0.1 units. The magnitude of the change for the roaming channel is substantial. We also see the roaming entropy converging towards that of the radical distribution for total length. In the component distributions for the CO and H2 degrees 109 of freedom, we see much the same story with higher roaming entropy, and entropies for the roaming channel converging towards that of the radical channel. 110 6.00 Roaming 5.75 5.50 Radical S/kB 5.25 5.00 4.75 Direct 35000 37500 40000 42500 45000 Energy / cm-1 Figure 3.21: Entropy of total length distributions Absolute magnitude of entropy of total kinematic length distributions of geodesics in the bird’s nest. Here we see that the entropy of the roaming distributions is greater than for either of the other channels. We also see that change in magnitude is much larger for roaming than for the direct pathway or the radical paths. As the entropy of the roaming distribution decreases with energy, we see that the entropy of the roaming distribution does indeed approach that of the radical distribution. 111 6 5 4 3 2 CO Rotation CO Vibration S/kB 6 5 4 H2 Rotation 3 2 H2 Vibration 35 40 45 35 40 45 3 EL / 10 cm -1 Figure 3.22: Entropy of component length distributions Entropies for distributions of component lengths for the three channels while in the bird’s nest: roaming in solid black, radical in dashed blue, and direct in dotted red. Distributions are formed for the lengths, `α as described in Sections 2.4.2 and 2.4.3. Again, we use the component kinematic lengths to provide more detail to the picture from the total kinematic length. As in Figure 3.21, we see that the absolute magnitude of the entropy of the roaming distributions is higher than for geodesics for either the direct or radical channels. We also see that in the CO as well as H2 degrees of freedom, entropy of the roaming distributions approaching that of the radical channel. The distinction between the CO and H2 degrees of freedom is particularly of note—such a divergence is usually not present. The CO entropies tell much the same story as the entropy of the total distributions. However, in the H2 components, the direct pathway is widely separated from the radical and roaming channels. We also see decline in the H2 entropies for the direct pathway at low energies before leveling off; roaming entropies, in contrast, are roughly constant at low energies before declining at higher energies. 112 3.7 Roaming as an entropically mediated phenomenon The foregoing observations, based on the statistical properties of geodesics on the formaldehyde energy surface lead us to conclude that, at least in formaldehyde, roaming is an entropically mediated phenomenon. At lower energies, below 38814 cm–1 , dynamical effects dominate—the competition between the radical and direct channels is evident in this energy range in Figure 3.4. Once the dynamical effects have stabilized, statisitical effects become dominant. At energies of 38814 cm–1 and highter, the branghing ratios for the direct and radical channels are roughly constant, but roaming begins a decline, roughly halving in fre- quency up to 45000 cm–1 . Just as the incidence of roaming drops from 38814 cm–1 , so too does the entropy of the distribution of lengths of roaming geodesics decline; see Figure 3.21. The entropy of distributions of geodesic lengths in the bird’s nest predict the direction of change in the branching ratio to roaming. This is a remarkable result. Even though the geodesics have no knowledge of the future course the system will take, their distribution through the bird’s nest reveals the relative change in frequency of roaming. CHAPTER 4 CONCLUDING REMARKS At the outset, our goal in this work was to contribute to the understanding of the story of roaming in formaldehyde. We take the perspective that such understanding means apprehending the dynamical peculiarities that give rise to roaming and comes from characterizing the inherent dynamics of the formaldehyde system. The geodesic formalism enables us to globally probe the potential energy landscape of formaldehyde. Through this window, we see that roaming dynamics are of greater entropy—are far less constrained—than the pathways giving rise to direct or radical dissociation. Our observation may shine some light on why roaming seems to be prevalent in many systems. From the landscape perspective, all that is required for roaming-like (broadly distributed) geodesics is a sufficiently flat, asymptotic region of the potential in a system of high enough dimensionality that arriving there is an entropic (navigational) challenge. It is then not at all surprising that roaming is a recurrent phenomenon. 4.1 New methods, new observations We have provided evidence that roaming in formaldehyde is an entropically mediated mechanism. The roaming system has a far greater range of dynamical options than when dissociating via either the direct pathway to molecular products or by the radical channel. This entropic advantage correlates with the incidence of roaming in formaldehyde. Indeed, as the entropy difference between distributions of geodesic lengths for the radical and roaming channels decreases with energy, the frequency of roaming in the 113 114 system also decreases. In our work, we also expand the range of dynamical systems to be treated with geodesics to include small molecules. We derive highly accurate lower bounds on the lengths of these small-molecule geodesics. We also visualize geodesic paths such that their correspondence to the inherent dynamics is clear. Our study is the first time geodesics have been used in a system with heterogeneous masses. We confront and resolve the attendant problems associated with center of mass conservation and present two other general methods for doing so. 4.2 Future directions Clearly the first angle of attack for extending our work involves quantitatively connecting the entropic relationships we have established to other observables. It may be that the mutual information between the distributions of roaming, radical, and direct geodesic lengths can be tied to the relative incubation periods or branching ratios for each channel. Such a connection would be of interest, not just to those theorists and experimentalists desiring to calculate these properties, but also to those wishing to understand the ways that geodesics carry information about the dynamics of small-molecule systems. The theory of large deviations [80] or even ideas of surprisal analysis [81] may be fruitful sources of direction. Another important avenue of inquiry is repeating some of this work in other roaming systems. Acetaldehyde, CH3 HCO, is an obvious target because of its structural similarity to formaldehyde and because a global potential surface has already been developed [33]. As an aside, such a potential is not required as our methods are amenable to “on-the-fly” calculations of potentials and gradients from first principles. While the notion of roaming as aborted radical dissociation is not useful for un- derstanding the dynamical influences at play, it does make a useful point about the energetics involved: roaming fragments have insufficient translational energy to es- cape the parent. Surely, during the incubation period prior to dissociation, energy is distributed throughout different degrees of freedom of formaldehyde. This internal energy redistribution is part of what controls whether or not the system will roam. By centering our analysis in the bird’s nest, we have focused our attention squarely on 115 this critical period. One might wonder what geodesics can say about the organization of energy flow during this time. Even though geodesics are static objects and exist in configuration-space, where there is no notion of time or speed, it may be that changes in their direction are correlated with energy transfer. Some hope for this thought is provided by Miller’s work on the Reaction Path Hamiltonian [67], where the curvature of an intrinsic reaction coordinate appears as a term in the kinetic energy. In recent years, there has been some hand-wringing [82, 83] over the notion of “post transition state bifurcations”—instances where multiple, chemically distinct products arise after traversing what can be identified only as a single transition state. The authors are surely sensible in pointing out that dynamical factors and not just potential barriers and basins are influencing the outcomes of these reactions. However, they may be premature in suggesting that the apparent futility of studying the structure of the surface leaves us no tools other than trajectory simulations. As we have seen, even though individual geodesics cannot forecast the future, by globally characterizing the H2 CO surface and analyzing their properties while confined to the bird’s nest we are still able to discriminate between the reactive channels, which are yet-to-be-chosen. The entire history and future of a classical trajectory are specified by a single point in phase space and knowledge of the potential. Because geodesics offer a global perspective on that potential, they may be well positioned to shed light on what happens after a transition state by looking before it. Bibliography [1] D. V. Cofer-Shabica, R. M. Stratt. What is special about how roaming chemical reactions traverse their potential surfaces? Differences in geodesic paths between roaming and non-roaming events. The Journal of Chemical Physics 2017, 146, 214303. [2] J. M. Budarz, M. P. Minitti, D. V. Cofer-Shabica, B. Stankus, A. Kirrander, J. B. Hastings, P. M. Weber. Observation of femtosecond molecular dynamics via pump-probe gas phase x-ray scattering. Journal of Physics B: Atomic Molecular and Optical Physics 2016, 49. [3] D. V. Cofer-Shabica, R. M. Stratt, The geometries of potential energy landscapes imply dynamical signatures for roaming reactions, American Chemical Society, 250th National Meeting, Boston, MA, 2015, PHYS 554. [4] M. P. Minitti, J. M. Budarz, A. Kirrander, J. Robinson, T. J. Lane, D. Ratner, K. Saita, T. Northey, B. Stankus, V. Cofer-Shabica, J. Hastings, P. M. Weber. Toward structural femtosecond chemical dynamics: Imaging chemistry in space and time. Faraday Discussions 2014, 171, 81–91. [5] D. V. Cofer-Shabica, Wandering Molecules, Brown University, Research Mat- ters, Providence, RI, 2014. Invited Talk. https://www.youtube.com/watch?v= X3xyMP9EAco. [6] D. Townsend, S. A. Lahankar, S. K. Lee, S. D. Chambreau, A. G. Suits, X. Zhang, J. Rheinecker, L. B. Harding, J. M. Bowman. The Roaming Atom: Straying from the Reaction Path in Formaldehyde Decomposition. Science 2004, 306, 1158–1161. 116 117 [7] J. M. Bowman, B. C. Shepler. Roaming Radicals. Annu. Rev. Phys. Chem. 2011, 62, 531–553. [8] C. Wang, R. M. Stratt. Global perspectives on the energy landscapes of liquids, supercooled liquids, and glassy systems: Geodesic pathways through the potential energy landscape. The Journal of Chemical Physics 2007, 127, 224504. [9] R. D. van Zee, M. F. Foltz, C. B. Moore. Evidence for a second molecular channel in the fragmentation of formaldehyde. J. Chem. Phys. 1993, 99, 1664–1673. [10] E. Wigner. The transition state method. Trans. Faraday Soc. 1938, 34, 29–41. [11] W. M. Gelbart, M. L. Elert, D. F. Heller. Photodissociation of the formaldehyde molecule: does it or doesn’t it? Chemical Reviews 1980, 80, 403–416. [12] K.-M. Weitzel. On the distinction between tight and loose transition states in unimolecular dissociations. Chemical Physics Letters 1991, 186, 490 – 494. [13] X. Li, J. M. Millam, H. B. Schlegel. Ab initio molecular dynamics studies of the photodissociation of formaldehyde, H2 CO −−→ H2 + CO: Direct classical trajectory calculations by MP2 and density functional theory. The Journal of Chemical Physics 2000, 113, 10062–10067. [14] J. Rheinecker, X. Zhang, J. Bowman. Quasiclassical trajectory studies of the dy- namics of H2CO on a global ab initio-based potential energy surface. Molecular Physics 2005, 103, 1067–1074. [15] P. Zhang, S. Maeda, K. Morokuma, B. J. Braams. Photochemical reactions of the low- lying excited states of formaldehyde: T1/S0 intersystem crossings, characteristics of the S1 and T1 potential energy surfaces, and a global T1 potential energy surface. J. Chem. Phys. 2009, 130, 114304. [16] S. A. Lahankar, S. D. Chambreau, X. Zhang, J. M. Bowman, A. G. Suits. Energy dependence of the roaming atom pathway in formaldehyde decomposition. The Journal of Chemical Physics 2007, 126, 044314. 118 [17] S. A. Lahankar, V. Goncharov, F. Suits, J. D. Farnum, J. M. Bowman, A. G. Suits. Further aspects of the roaming mechanism in formaldehyde dissociation. Chem. Phys. 2008, 347, 288–299. [18] A. C. Terentis, S. E. Waugh, G. F. Metha, S. H. Kable. HCO (N,Ka,Kc,J) distributions from near-threshold photolysis of H2CO (J,Ka,Kc). The Journal of Chemical Physics 1998, 108, 3187–3198. [19] J. M. Bowman. Roaming. Mol. Phys. 2014, 112, 2516–2528. [20] P. L. Houston, S. H. Kable. Photodissociation of acetaldehyde as a second example of the roaming mechanism. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 16079– 16082. [21] B. R. Heazlewood, M. J. T. Jordan, S. H. Kable, T. M. Selby, D. L. Osborn, B. C. Shepler, B. J. Braams, J. M. Bowman. Roaming is the dominant mechanism for molecular products in acetaldehyde photodissociation. Proceedings of the Na- tional Academy of Sciences USA 2008, 105, 12719–12724. [22] M. P. Grubb, M. L. Warter, A. G. Suits, S. W. North. Evidence of Roaming Dynamics and Multiple Channels for Molecular Elimination in NO3 Photolysis. J. Phys. Chem. Lett. 2010, 1, 2455–2458. [23] M. P. Grubb, M. L. Warter, H. Xiao, S. Maeda, K. Morokuma, S. W. North. No Straight Path: Roaming in Both Ground- and Excited-State Photolytic Channels of NO3 −−→ NO + O2 . Science 2012, 335, 1075–1078. [24] B. Fu, J. M. Bowman, H. Xiao, S. Maeda, K. Morokuma. Quasiclassical Trajectory Studies of the Photodissociation Dynamics of NO3 from the D-0 and D-1 Potential Energy Surfaces. J. Chem. Theory Comput. 2013, 9, 893–900. [25] D. L. Osborn. Exploring Multiple Reaction Paths to a Single Product Channel. Adv. Chem. Phys. 2008, 138, 213–265. [26] L. B. Harding, S. J. Klippenstein. Roaming Radical Pathways for the Decomposition of Alkanes. J. Phys. Chem. Lett. 2010, 1, 3016–3020. 119 [27] A. Li, J. Li, H. Guo. Quantum Manifestation of Roaming in H + MgH −−→ Mg + H2 : The Birth of Roaming Resonances. J. Phys. Chem. A 2013, 117, 5052–5060. [28] A. S. Mereshchenko, E. V. Butaeva, V. A. Borin, A. Eyzips, A. N. Tarnovsky. Roaming- mediated ultrafast isomerization of geminal tri-bromides in the gas and liquid phases. Nature Chemistry 2015, 7, 562–568. [29] B. Carpenter. Intramolecular Dynamics for the Organic Chemist. Accts. Chem. Res. 1992, 25, 520–528. [30] S. J. Klippenstein, Y. Georgievskii, L. B. Harding. Statistical Theory for the Kinetics and Dynamics of Roaming Reactions. J. Phys. Chem. A 2011, 115, 14370–14381. [31] L. B. Harding, S. J. Klippenstein, A. W. Jasper. Ab initio methods for reactive potential surfaces. Phys. Chem. Chem. Phys. 2007, 9, 4055–4070. [32] L. B. Harding, Y. Georgievskii, S. J. Klippenstein. Roaming Radical Kinetics in the Decomposition of Acetaldehyde. J. Phys. Chem. A 2010, 114, 765–777. [33] B. C. Shepler, B. J. Braams, J. M. Bowman. “Roaming” Dynamics in CH3CHO Photodissociation Revealed on a Global Potential Energy Surface. J. Phys. Chem. A 2008, 112, 9344–9351. [34] D. U. Andrews, S. H. Kable, M. J. T. Jordan. A Phase Space Theory for Roaming Reactions. J. Phys. Chem. A 2013, 117, 7631–7642. [35] B. C. Shepler, Y. Han, J. M. Bowman. Are Roaming and Conventional Saddle Points for H2CO and CH3CHO Dissociation to Molecular Products Isolated from Each Other? J. Phys. Chem. Lett. 2011, 2, 834–838. [36] L. B. Harding, S. J. Klippenstein, A. W. Jasper. Separability of Tight and Roaming Pathways to Molecular Decomposition. J. Phys. Chem. A 2012, 116, 6967–6982. [37] K. Fukui. Formulation of the reaction coordinate. J. Phys. Chem. 1970, 74, 4161– 4163. [38] H. Xiao, S. Maeda, K. Morokuma. Global ab Initio Potential Energy Surfaces for Low-Lying Doublet States of NO3. J. Chem. Theory Comput. 2012, 8, 2600–2605. 120 [39] Z. Homayoon, J. M. Bowman. Quasiclassical Trajectory Study of CH3NO2 De- composition via Roaming Mediated isomerization Using a Global Potential Energy Surface. J. Phys. Chem. A 2013, 117, 11665–11672. [40] H. Waalkens, S. Wiggins. Geometrical models of the phase space structures governing reaction dynamics. Regular and Chaotic Dynamics 2010, 15, 1–39. [41] F. A. Mauguière, P. Collins, G. S. Ezra, S. C. Farantos, S. Wiggins. Multiple transition states and roaming in ion-molecule reactions: A phase space perspective. Chem. Phys. Lett. 2014, 592, 282–287. [42] F. A. Mauguière, P. Collins, G. S. Ezra, S. C. Farantos, S. Wiggins. Roaming dynam- ics in ion-molecule reactions: Phase space reaction pathways and geometrical interpretation. J. Chem. Phys. 2014, 140. [43] F. A. Mauguière, P. Collins, Z. C. Kramer, B. K. Carpenter, G. S. Ezra, S. C. Faran- tos, S. Wiggins. Phase Space Structures Explain Hydrogen Atom Roaming in Formaldehyde Decomposition. The Journal of Physical Chemistry Letters 2015, 6, 4123–4128. [44] F. A. Mauguière, P. Collins, Z. C. Kramer, B. K. Carpenter, G. S. Ezra, S. C. Farantos, S. Wiggins. Roaming: A Phase Space Perspective. Annual Review of Physical Chemistry 2017, 68, 499–524. [45] K. Müller, L. D. Brown. Location of saddle points and minimum energy paths by a constrained simplex optimization procedure. Theoretica chimica acta 1979, 53, 75–93. [46] C. Wang, R. M. Stratt. Global perspectives on the energy landscapes of liquids, supercooled liquids, and glassy systems: The potential energy landscape ensemble. The Journal of Chemical Physics 2007, 127, 224503. [47] W. E, E. Vanden-Eijnden. Transition-Path Theory and Path-Finding Algorithms for the Study of Rare Events. Annu. Rev. Phys. Chem. 2010, 61, 391–420. [48] C. N. Nguyen, J. I. Isaacson, K. B. Shimmyo, A. Chen, R. M. Stratt. How dominant is the most efficient pathway through the potential energy landscape of a slowly diffusing disordered system? The Journal of Chemical Physics 2012, 136, 184504. 121 [49] C. N. Nguyen, R. M. Stratt. Preferential solvation dynamics in liquids: How geodesic pathways through the potential energy landscape reveal mechanistic details about solute relaxation in liquids. The Journal of Chemical Physics 2010, 133, 124503. [50] D. Jacobson, R. M. Stratt. The inherent dynamics of a molecular liquid: Geodesic pathways through the potential energy landscape of a liquid of linear molecules. The Journal of Chemical Physics 2014, 140, 174503. [51] Q. Ma, R. M. Stratt. Potential energy landscape and inherent dynamics of a hard- sphere fluid. Phys. Rev. E 2014, 90, 042314. [52] L. Frechette, R. M. Stratt. The inherent dynamics of isotropic- and nematic-phase liquid crystals. The Journal of Chemical Physics 2016, 144, 234505. [53] D. Wales, Energy Landscapes: Applications to Clusters, Biomolecules and Glasses, of Cambridge Molecular Science, Cambridge University Press, 2003. [54] X. Zhang, S. Zou, L. B. Harding, J. M. Bowman. A Global ab Initio Potential Energy Surface for Formaldehyde. The Journal of Physical Chemistry A 2004, 108, 8980–8986. [55] X. Zhang, J. L. Rheinecker, J. M. Bowman. Quasiclassical trajectory study of formaldehyde unimolecular dissociation: H2CO → H2+CO, H+HCO. The Jour- nal of Chemical Physics 2005, 122, 114313. [56] H. W. Kuhn, A. W. Tucker, Nonlinear Programming in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, California, pp. 481–492. [57] C. Eckart. Some Studies Concerning Rotating Axes and Polyatomic Molecules. Phys. Rev. 1935, 47, 552–558. [58] P. L. Houston, R. Conte, J. M. Bowman. Roaming Under the Microscope: Trajectory Study of Formaldehyde Dissociation. The Journal of Physical Chemistry A 2016, 120, 5103–5114. 122 [59] G. H. Peslherbe, H. Wang, W. L. Hase, Monte Carlo Sampling for Classical Trajec- tory Simulations, Vol. 105 of Advances in Chemical Physics, John Wiley & Sons, Inc., 1999, pp. 171–201. [60] H. Goldstein, Classical Mechanics, of Addison-Wesley series in physics, Addison- Wesley Publishing Company, Reading, MA, USA, 1980. [61] M. Abramowitz, I. Stegun, Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables, of Applied mathematics series, Dover Publica- tions, 1964. [62] M. Galassi, J. Davies, J. Theiler, B. Gough, G. Jungman, P. Alken, M. Booth, F. Rossi, GNU Scientific Library Reference Manual 3rd ed., Network Theory, Ltd., 2009, http://www.gnu.org/software/gsl. [63] U. M. Ascher, L. R. Petzold, Computer Methods for Ordinary Differential Equa- tions and Differential-Algebraic Equations, SIAM: Society for Industrial and Ap- plied Mathematics, 1998. [64] J. W. Brady, J. D. Doll, D. L. Thompson. Cluster dynamics: A classical trajectory study of A*n → An-1+A. The Journal of Chemical Physics 1981, 74, 1026–1028. [65] M. Araujo, B. Lasorne, M. J. Bearpark, M. A. Robb. The photochemistry of formalde- hyde: Internal conversion and molecular dissociation in a single step? Journal of Physical Chemistry A 2008, 112, 7489–7491. [66] C. Cerjan, W. Miller. On finding transition-states. Journal of Chemical Physics 1981, 75, 2800–2806. [67] W. H. Miller, N. C. Handy, J. E. Adams. Reaction path Hamiltonian for polyatomic molecules. The Journal of Chemical Physics 1980, 72, 99–112. [68] J. Nelder, R. Mead. A simplex-method for function minimization. Computer Journal 1965, 7, 308–313. [69] S. van der Walt, S. C. Colbert, G. Varoquaux. The NumPy Array: A Structure for Efficient Numerical Computation. Computing in Science & Engineering 2011, 13, 22–30. 123 [70] S. Peyton Jones, et al.. The Haskell 98 Language and Libraries: The Revised Report. Journal of Functional Programming 2003, 13, 0–255. [71] M. Matsumoto, T. Nishimura. Mersenne twister: a 623-dimensionally equidis- tributed uniform pseudo-random number generator. ACM Transactions on Mod- eling and Computer Simulation 1998, 8, 3–30. [72] O. Tange. GNU Parallel - The Command-Line Power Tool. ;login: The USENIX Magazine 2011, 36, 42–47. [73] D. V. Cofer-Shabica, Source code from ‘Potential landscape perspectives on roam- ing: Insights on formaldehyde from geodesics paths’, C & Haskell source code, Brown University, 2018, doi:10.7301/Z0FT8JJ6. [74] D. Eberly, Derivative Approximation by Finite Differences, Tech. Rep., Geo- metric Tools, LLC, 2001, https://www.geometrictools.com/Documentation/ FiniteDifferences.pdf. [75] T. Cover, J. Thomas, Elements of Information Theory, Wiley, 2006. [76] H. Theil. The entropy of the maximum entropy distribution. Economics Letters 1980, 5, 145 – 148. [77] D. R. Yarkony. Nuclear dynamics near conical intersections in the adiabatic repre- sentation: I. The effects of local topography on interstate transitions. The Journal of Chemical Physics 2001, 114, 2601–2613. [78] D. G. Truhlar, B. C. Garrett. Variational transition-state theory. Accounts of Chem- ical Research 1980, 13, 440–448. [79] W. L. Hase, Dynamics of Unimolecular Reactions, Vol. 2 of Modern Theoretical Chemistry, Plenum Press, 1976, chapter 3. [80] H. Touchette. The large deviation approach to statistical mechanics. Physics Re- ports 2009, 478, 1–69. [81] R. D. Levine, R. B. Bernstein, Thermodynamic Approach to Collision Processes, Vol. 2 of Modern Theoretical Chemistry, Plenum Press, 1976, chapter 7. 124 [82] D. H. Ess, S. E. Wheeler, R. G. Iafe, L. Xu, N. Çelebi-Ölçüm, K. N. Houk. Bifur- cations on Potential Energy Surfaces of Organic Reactions. Angewandte Chemie International Edition 2008, 47, 7592–7601. [83] Y. J. Hong, D. J. Tantillo. Biosynthetic consequences of multiple sequential post- transition-state bifurcations. Nature Chemistry 2014, 6, 104–111. [84] F. W. Wiegel, Introduction to Path-integral Methods in Physics and Polymer Sci- ence, World Scientific Publishing Co Pte Ltd, Singapore, 1986. [85] J. G. Simmonds, A Brief on tensor analysis, of Undergraduate texts in mathematics, Springer-Verlag, New York, NY, USA, 1982. [86] G. Golub, C. Van Loan, Matrix Computations, of Matrix Computations, Johns Hopkins University Press, 2012. [87] W. K. Hastings. Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika 1970, 57, 97–109. [88] Information Technology Laboratory, Secure Hash Standard (SHS), Tech. Rep. FIPS PUB 180-4, National institute for Standards and Technology, Gaithersburg, MD 20899-8900, 2012, http://csrc.nist.gov/publications/fips/fips180-4/ fips-180-4.pdf. APPENDIX A DERIVATION OF PATH-INTEGRAL FORM OF THE EQUATION FOR FREE DIFFUSION We present a derivation and discussion of Wang and Stratt’s path-integral expression for the diffusion propagator [8], which suggests geodesics on energy landscapes may be a way of getting at the inherent dynamics of a chemical system. The propagator is given as: Z R,t " Z t  2 # 1 dR G R0 → R, t = e D R(τ ) exp – e dτ , (A.1)    e e R0 ,0 e 4D 0 dτ e which specifies the probability of transitioning from state R0 to R in time t for a system under free diffusion (no forces) as an integral over the space of all paths, R(τ ). D is the e e diffusion constant and R is the configuration-space vector specifying the state of the e system. This appendix contains two parts: a derivation of the above expression and e then a review of the calculus of variations to derive the unconditionally-minimized, straight-line portion of the geodesics. 125 126 A.1 Derivation This section closely follows the first chapter of Weigel’s excellent introduction to path integrals [84]. In the absence of external forces1 , diffusion is a process characterized by the following dynamical relation: d G(R, t) = D∇2 G(R, t) , (A.2) dt e e where G(R, t) is the probability of finding a system in configuration R at time t and D is the diffusion constant, which governs the intrinsic rate of diffusion. e e With the initial condition G(R, t = 0) = δ(R – R0 ), one can find the solution for a Cartesian system of α degrees of freedom to be: e e e 2 " # (R – R 0 ) G(R0 → R, t) = (4πDt)–α/2 exp – e e , (A.3) e e 4Dt as can be verified by substitution. Suppose we are interested in a particular path2 between R0 and R. We could represent such a path by discretizing it over (N + 1) bits of time, each of duration ε, e e such that: (N + 1)ε = t. For notational ease, we define, ti = ε · i Ri = R t i ,  e e giving the discrete time and system configuration along the path. We also have tN+1 = t. This procedure is illustrated in figure A.1. To compute the probability of tracing such a path, we rely on the independence of the probabilities of taking each such step. The probability of the path is then the probability of propagating from R0 to R1 in time ε multiplied by the probability of e e 1 Studying free diffusion allows us to treat complicated systems from another perspective. We could solve the diffusion equation in the presence of a complicated potential, but this is neither easy nor transferable. Free diffusion is much more straight-forward. However, we still don’t get a free lunch: the potential will impose stringent boundary conditions in a yet unspecified manner. 2 This may seem an odd question. Why are we creating more work if we have solved already the diffusion equation? The answer lies in the formal expression to which we arrive if we study the solution as a function of paths. 127 Figure A.1: Discretizing a Path Note that while this figure traces the time evolution of a single spatial coordinate, paths of arbitrary dimension are amenable to this decomposition. Figure modified from [84]. propagating from R1 to R2 in time ε and so on. By inserting eq. A.3 for each step, we have: e e YN G(R0 → R, t; {R1 , R2 . . . . RN }) = G(Ri → Ri+1 , ε) e e e e e i=0 e e – Ri ) 2   iN+1 N (R i+1 = (4πDε)–α/2 h X exp – e  , (A.4) 4Dε e i=0 which gives the probability of diffusing from R0 to R in a time t via the N ordered points {R1 , R2 . . . . RN }. We could recover our original expression (eq. A.3) by integrating3 e e eq. A.4 over the domain of each Ri : e e e e Z Z Z G(R0 → R, t) = dR1 dR2 . . . dRN G(R0 → R, t; {R1 , R2 . . . . RN }) (A.5) e e e e e e e e e e Suppose we increased N until the path appeared smooth; in the continuous limit, 3 Recall the Gaussian integral: Z +∞ r –ax2 π dx e = –∞ a 128 N → ∞ and ε → 0. Focusing for the moment on the exponential, we can write4 : 2   N X i+1(R – R i ) lim exp – e = (A.6) 4Dε e ε→0 N→∞ i=0  !2  1 X N R i+1 – R i lim exp – ε = (A.7) 4D ε e e ε→0 N→∞ i=0 Z t dR !2   1 exp – dτ  , (A.8) 4D 0 dτ e where R(τ ) is a continuous function on [0, t] specifying the path. Now we can write: e G(R0 → R, t) = dR 2   (A.9) e e ! Z t iN+1 Z 1 Z Z –α/2 h lim (4πDε) dR1 dR2 . . . dRN exp – dτ  4D 0 dτ e ε→0 N→∞ e e e To express this more compactly, we define the following operator5 : Z R,t Z Z Z e D[R(τ )] ≡ lim (4πDε)–α(N+1)/2 dR1 dR2 . . . dRN , (A.10) R0 ,0 e ε→0 e e e e N→∞ which explicitly specifies the boundary values: R0 at t = 0 and R at time t. This is a path integral. With this notation in hand, we arrive at our original equation: e e Z R,t " Z t  2 # 1 dR G(R0 → R, t) = e D[R(τ )] exp – e dτ , (A.11) e e R0 ,0 e 4D 0 dτ e 4 Using the definition of a derivative: df f(t + ε) – f(t) = lim . dt ε→0 ε And the approximation for an integral: N X Z b lim f (a + i∆τ ) ∆τ ≈ dτ f(τ ) , N→∞ a i=0 where ∆τ = b–a N . 5 Nota bene, this “operator” is really just notational shorthand for the procedure we followed to arrive here. Take care that any subsequent manipulations respect with the discretization—they must! 129 which expresses the diffusion propagator as an integral over the space of all possible paths connecting the boundary values. From eq. A.11, we can observe that when diffusion is slow, D is small and therefore the paths that will dominate must minimize the integral within the exponential6 . Those paths will be the shortest paths. The key observation is the following: the extremization of any quantity of the form: Z I= dτ f(τ ) (A.12) can also be effected by extremizing7 : Z I0 = dτ H f(τ ) (A.13)   where H[x] is a strictly positive function with non-negative derivative, defined on the range of f(τ ). Thus, minimizing dR 2 Z t   dτ (A.14) dτ e 0 √ can be achieved by taking H[x] = x and minimizing s Z t  2 dR dτ (A.15) dτ e 0 Bringing the differential into the root, eq. A.15 reduces to Z tq Z t dR · dR = (A.16) dR 0 e e 0 e which is clearly the length of the path. A.2 Unconditional minimization In this section, we follow Goldstein [60] to find the form of the (shortest) paths, which unconditionally minimize the action. Again, we will work in Cartesian coordinates for 6 That is, the dominant paths will obey the classical mechanical principle of least action: Z δS R(τ ) = 0, S R(τ ) = dτ (2T) .     e e From here it is a simple extension to systems with unequal masses and/or non-Cartesian bases. 7 Consider the chain rule with the variational derivative. 130 an otherwise unconstrained system. See Jacobson and Stratt [50] for work dealing with free rotations and Frechette and Stratt [52] for a case where a constraint is applied to the free steps. Using calculus to find the extrema of a function, for instance h(x), is a familiar operation. By equating the derivative of h(x) with 0, we locate values of x for which h(x) is unchanging or stationary; these locations often correspond to local extrema of the function. The problem treated by the calculus of variations is analogous. We seek to extremize an object that takes a function (in our case, a path) as its argument. That is, we find some path such that the object is stationary with respect to variations in that function. From eq. A.11, we seek to minimize an integral over kinetic energy, which in general is a function of coordinates and their derivatives. Working in one dimension8 , we seek to find the path R(τ ) that extremizes: Z t J= dτ f R, R˙ (A.17)  0 That is, we seek R(τ ) such that J is unchanged for infinitesimal variations in R. We can represent the paths neighboring the correct (extremal) path, R(τ ) by adding an arbitrary function, η(τ ) in proportion to a small variable parameter, α: R(τ , α) = R(τ ) + αη(τ ) (A.18) As we are interested in R such that it connects R(0) to R(t), we require that: η(0) = η(t) = 0 (A.19) J in eq. A.17 is now a function of α as well: Z t J(α) = dτ f R(τ , α), R(τ ˙ , α) (A.20)  0 We can encode the requirement for stationary J in the following familiar form9 : dJ 0= (A.21) dα α=0 8 This procedure extends to systems with an arbitrary number of degrees of freedom. 9 We require that the derivative be evaluated at α = 0 because this corresponds to the unperturbed path we seek. 131 We can bring the α derivative inside the integral in eq. A.20 and applying the chain rule, we have: ! Z t dJ ∂f ∂R ∂f ∂ R˙ = dτ + dα 0 ∂R ∂α ∂ R˙ ∂α From eq. A.18, we have that ∂α ∂R = η and ∂ R˙ = η, ˙ the arbitrary function of τ and its ∂α time derivative. This leaves: Z t   dJ ∂f ∂f = dτ η+ η˙ (A.22) dα 0 ∂R ∂ R˙ Consider the second term in the integral: Z t ∂f dτ η˙ 0 ∂ R˙ This expression is susceptible by integration by parts 10 Re-arranging leaves us with: ∂f t Z t Z t   ∂f d ∂f dτ η˙ = η – dτ η 0 ∂ R˙ ∂ R˙ 0 0 dτ ∂ R˙ Because of the bounds on η, η(0) = η(t) = 0, the surface terms vanish! Inserting back into eq. A.22 and factoring out η gives: Z t    dJ ∂f d ∂f = dτ – η=0 (A.23) dα α=0 0 ∂R dτ ∂ R˙ where evaluating the derivative at α = 0 gives R and R˙ as functions of τ alone. Since η is an arbitrary function, the integral is stationary only if ∂f d ∂f – =0 (A.24) ∂R dτ ∂ R˙ which gives the conditions on R to minimize J11 . 10 Recall the general form, Z b Z b udv = uv|ba – vdu a a And identify:   ∂f d ∂f u= du = dτ ∂ R˙ dτ ∂ R˙ dv = ηdτ ˙ v=η 11 Notice that inserting the Lagrangian for f gives S = J, the action, and yields Lagrange’s equations of motion. 132 Minimizing The Force-Free Action  2 dR From eq. A.11, we have f = dτe in eq. A.24. Proceeding, we find that the path which minimizes eq. A.11 is specified by the differential equation: ¨=0 R (A.25) e which, when integrated and solved for its boundary values, yields the equation for a straight line (and the geodesic!):  τ τ  R(τ ) = R(0) 1 – + R(t) (A.26) e e t e t this amounts to linearly interpolating between R(0) and R(t) in time t for each of the 3N coordinates. To actually compute such paths we discretize12 eq. A.26 as follows: e e R – Ri Ri+1 = Ri + δs e f e i (A.27) e e |Rf – R | e e Where Rf = R(t) and δR is a small step-size. Small here is relative to features of the landscape. In practice, δR should be as large as it can be such that the lengths of e e computed geodesics are still converged to a maximum; see discussion in Section 2.2.2. 12 Consider R(τ + ∆) – R(τ ). e e APPENDIX B T WO MORE WAYS OF PRESERVING THE CENTER OF MASS DURING ESCAPE STEPS Vanilla Newton-Raphson When the procedure for free steps described in Section 2.2 (eq. 2.9) moves the system such that V(R(τ )) > EL , we need a way back to the boundary. That is, we seek the nearest solution to V(R(τ )) = EL . A classic strategy for solving this problem is a Newton-Raphson e root search. e Here, we expand the function of interest to first order and run downhill: V(R) ≈ V(R0 ) + ∇V · (R – R0 ) = EL (B.1) e e e R0 e e e Where R0 was the first offending step in eq. A.27. We make the guess that heading downhill via the steepest path will be optimal; this suggests a solution of the form: e (R – R0 ) = C ∇V R ≡ ∆R (B.2) e e e e0 e 133 134 Which immediately yields the constant1 : EL – V(R0 ) C= e 2 ∇V R0 e e This procedure is then iterated until the V(R) = EL boundary is found. It may look like eq. B.2 renders our problem solved, but as we prove in Section 2.2.1, this naïve e root search fails to preserve the center of mass and spuriously lengthens geodesics computed with it. In this appendix, we present two wildly different methods of coping with the problem. The first is the essence of simplicity, though limited in its generality. By comparison, the second technique is a swatting a fly with a laser cannon, but demonstrates a useful and general principle. B.1 Swatting a fly without moving: Follow the inverse- mass-weighted gradient Never let it be said that a little physical intuition won’t go a long way. It turns out that the techniques discussed in Section 2.2.1 and to be discussed in Section B.2 solve a much harder problem than is necessary—they will both project our center-of-mass perturbing motion from any configuration-space displacement. They are included because of their generality and because the result in Section 2.2.1 is what we actually implemented in our investigations of geodesics on formaldehyde. Unfortunately physical insight doesn’t always occur prior to pages of vector algebra or hundreds of thousands of core-hours of computer time2 . However, the result may still be useful in the future and so we provide a pedagogical exposition here; if we’re not careful, the lead-up will take longer than the proof. Our task is to follow a gradient downhill without perturbing the center of mass. That sounds awfully similar to the way that classical mechanics works; surely the general purpose projection operators are doing more than is required. If we re-write Newton’s 1 The solution should be familiar from Section 2.2, eq. 2.11, from which we have ζ = –C. 2 which is not to say that we could have avoided all those hours if the insight had struck earlier. See the notes on performance at the end of Appendix E on page 164; this section does not address the bottle-neck. 135 second law for Cartesian coordinates in terms of the influence of a potential [60], we have: F=M·a (B.3) e e = –∇V (B.4) e Where F is the force experience by the system due to the potential V, a is the system’s acceleration and M is the diagonal N × N mass matrix defined in eq. 2.69: e e (M)iµ,jν = δi,j δµ,ν mi i, j = 1 . . . N ; µ, ν = x, y, z (B.5) In the absence of external fields, classical mechanics never changes the center of mass3 . Suppose a dynamical system is momentarily at rest; in what direction does it first move? Instantaneously, it moves in the direction of its acceleration (later this becomes complicated by the fact that it is already moving). That acceleration is: a = –M–1 ∇V (B.6) e e where M–1 is simply the diagonal inverse-mass matrix given by: 1 M–1   = δi,j δµ,ν (B.7) iµ,jν mi Equation B.6 suggests that we can escape center-of-mass perturbation simply following the inverse-mass-weighted gradient. Could this really be? In Section 2.2.1, we arrived at eq. 2.35 right before showing that the “vanilla” Newton-Raphson root search displaced the center of mass: ? ~ ~0 = (B.8) M · ∇ V R n e e e where M~ is the 3 × 3N dimensional, block-diagonal matrix composed of N ordered, 3 × 3 blocks with mi /Mtotal along the diagonals. e ~ = mi 1 , i = 1 . . . N   ↔ M (B.9) e i Mtotal 3 This is a restatement of Newton’s first law. 136 If, instead of following the gradient, we followed the inverse-mass-weighted gradient suggested by eq. B.6, we would require that: ~ · M–1 · ∇V ~0 = M (B.10) n e e R e = ~1 · ∇V n (B.11) e e R e where ~1 is the 3 × 3N dimensional, block-diagonal matrix composed of N ordered, 3 × 3 identity blocks4 . e   ↔ ~1 = 1 , i = 1 . . . N (B.12) e i Equation B.11 is a sum along each coordinate, {x, y, z}, and can be rewritten as: N  ? X  0= ∇V Rn ; µ = x, y, z (B.13) i e e µ Which can be cast in a more familiar form as the requirement that the sum of forces in an isolated system vanish: N = ~Fi X 0= (B.14) i This is proven in many classical mechanics texts [60] and follows from ~F12 = –~F21 . Namely that the force exerted on one object by another is equal and opposite to the force exerted by the second on the first—Newton’s third law. A Newton-Raphson root search based around eq. B.6 will therefore preserve the center of mass of the system. Solving eq. B.1 with the suggested guess, ∆R = C · M–1 · ∇V R yields: e e e0 EL – V R  C= (B.15) ∇V R · M–1 · ∇V R e e e0 e e0 And so our center-of-mass respecting analogue of the steepest-descent Newton-Raphson search takes steps like: EL – V R  ∆R = · M–1 · ∇V R (B.16) ∇V R · M–1 · ∇V R e e0 e e e0 e e0 e 4 We T ~ . saw this matrix earlier in eq. 2.43 under the name C e 137 B.2 Laser-cannon: A general projection Suppose we modify eq. B.1 to respect a constraint? After all eq. B.2 was based on a guess about following the gradient. If, instead, we chose to follow a path constrained to be parallel to some other direction, n ˆ , eq. B.2 would have the following form: (R – R0 ) = Cn ˆ (B.17) e e and inserting into eq. B.1 would yield: V(R) ≈ V(R0 ) + C ∇V · n ˆ = EL (B.18) e e e R0 e Solving for C yields: EL – V(R0 ) C= e ∇ V R · n ˆ e e0 While this is well an good, it doesn’t quite solve our problem. We need to be able to impose a constraint that is orthogonal to some directions, those of center of mass motion. What if we follow the path of steepest descent subject to the constraint that it is orthogonal to some configuration-space vector which would move the center of mass, ξ?ˆ To construct such a step we take a brief detour to discuss dyads. The following treatment uses the notation of real-space vectors, but is applicable to vectors of arbitrary dimension (such as our configuration space vectors). This treatment is reprised and expanded in [85]. A dyad (or dyadic) is a 2nd order tensor formed from the direct product of 2 vectors. The dyadic ~u~v is defined such that the action of this tensor on a vector w ~ is: ~u~v(w ~ ) = ~u(~v · w ~) (B.19) The Cartesian components are (~u~v)α,γ = uα vγ or   u v u v u v  1 1 1 2 1 3 ~u~v = u2 v1 u2 v2 u2 v3    u3 v1 u3 v2 u3 v3 138 Given a unit vector, u ˆ , the definition in eq. B.19, makes it easy to construct an operator which projects another vector onto u ˆ . For example: ˆu u ˆ (w ~)=u ˆ (u ~ ) = Projuˆ w ˆ·w ~ (B.20) gives the projection of w ~ on u ˆ . See Figure B.1 for a geometric interpretation. Figure B.1: A projection Geometric realization of the projection of w ~ onto u ˆ . Modified from [85]. A useful property of projection operators is their idempotency. From the geometric significance, it must follow that repeat application of any projection, is equivalent to single application. We show that this holds for u ˆuˆ as follows: (u ˆuˆ )2 = u ˆuˆu ˆuˆ (B.21) =u ˆ (u ˆ )u ˆ·u ˆ (B.22) =u ˆ (1)u ˆ (B.23) =u ˆuˆ (B.24) By induction, we can reason that (u ˆuˆ )n = u ˆuˆ for all n ≥ 1. In addition to projection on u ˆ , we can construct the projection off of (or orthogonal to) u ˆ . Such a projection is given by: 1 – uˆuˆ (B.25)  The orthogonality of the two projections is easily shown: (1 – u ˆuˆ )u ˆuˆ (B.26) =u ˆ– u ˆ 2 (B.27)  ˆu ˆu =u ˆuˆ–u ˆuˆ (B.28) =0 (B.29) 139 Armed with such a projection operator, the analogy to eq. B.17 for an orthogonal downhill direction is clear. We now take a step down and project out all motion along ˆ ξ:   (R – R0 ) = C 1 – ξ ξ · ∇V R ˆˆ (B.30) e e e e e0 This, we again insert into eq. B.1, yielding:   V(R) ≈ V(R0 ) + C ∇V · 1 – ξ ξ · ∇V = EL ˆˆ (B.31) e e e R0 e e R0 e e Solving for C yields: EL – V(R0 ) C=  e ∇V R · 1 – ξˆξˆ · ∇V R e e0 e e e0 To enforce a stationary center of mass, we can introduce a coordinate transformation to separate center of mass motion from internal coordinates: PN m ~R N ~Rc.m. ≡ Pj=1 j j ≡ 1 X N mj~rj (B.32) m M j=1 j j=1 where mj is the mass of the jth particle. The internal coordinates are then: ~rj ≡ ~Rj – ~Rc.m. (B.33) Note that our system of 3N degrees of freedom is now specified with 3N+3 coordinates, which have the following 3 constraints: N mj ~Rj = 0 X (B.34) j=1 one for each spatial dimension. We encode these constraints as vector operations in configuration space as:   ξµ ≡ δµ,ν mj (B.35) e jν =⇒ ξ µ · R = 0 ∀ µ ∈ {x, y, z} (B.36) e e 140 What does ξ µ look like? By way of example: e   m1  0       0       m2     0    ξx =   0    .  e    ..      mN     0    0 The magnitude of ξ µ is the root of the sum of the squares of the masses: e v N u uX ξ µ = t m2j (B.37) u e j=1 and so the unit vector pointing along ξ µ is: e   δµ,ν mj ξˆµ = qP (B.38) jν N 2 j=1 mj Constructing the corresponding dyad, we have:   mj mk ˆ ˆ ξµ ξµ = PN δ δ (B.39) jγ,kα 2 µ,γ µ,α j=1 mj and the projection operator which eliminates center of mass motion along the µ direction is then:   mj mk 1 – ξˆµ ξˆµ = δγ,α δj,k – δµ,γ δµ,α PN 2 (B.40) jγ,kα e j=1 mj   mj mk = δγ,α δj,k – δµ,γ δµ,α PN (B.41) 2  m j=1 j 141 Where we have made use of: 1 = δγ,α δj,k  e jγ,kα and note that in eq. B.40 δγ,α δµ,γ δµ,α = δµ,γ δµ,α as the term is finite only for α = γ = µ with µ in the range of α and γ. To eliminate center of mass motion along each Cartesian direction, we form the product of all of them: Y  1–P ≡ 1 – ξˆµ ξˆµ (B.42)  e e µ e This product reduces to a sum using the following observation:      ˆ ˆ ˆ ˆ ˆ ˆ ξµ ξµ ξν ξν = ξµ ξµ δµ,ν which is true because ξˆµ and ξˆν are orthogonal. This leaves: ! 1–P = 1– X ξˆµ ξˆµ (B.43)  e e e µ Inserting eq. B.39 and using the same identities we did for eq. B.40, we arrive at:   mm 1 – P jγ,kα = δγ,α δj,k – PNj k 2  (B.44)  e e j=1 jm With M2 ≡ PN 2 the projection defined by eq. B.44 has the following symmetric j=1 mj structure in 3 spatial dimensions: m21   m1 m2 1– M – M 2 2 m21   m m 2 1– M – M 1 · · ·    2 2 m21   m1 m2 1 – –     M2 M2   2  1 – P = –  m1 m2 m 1– M 2   (B.45) M 2 2 2 e e   m m 1 2 m – 1 – 2     M2 M2 m22   m1 m2 – M 1– M     2 2 .. ..   . . 142 Returning to the Newton-Raphson problem, we proceed in analogy to eq. B.30 and posit that the appropriate solution is to follow the gradient while projecting out center of mass motion: (R – R0 ) = C 1 – P · ∇V R (B.46)  e e e e e e0 This, we again insert into eq. B.1, yielding:   V(R) ≈ V(R0 ) + C ∇V · 1 – P · ∇V = EL (B.47) e e e R0 e e e R0 e e Solving for C yields: EL – V(R0 ) C= ∇ V R · 1 – P · ∇ V R e e e0 e e e e0 Which we insert back into eq. B.46 to find: V R0 – EL 0  (R – R0 ) = – e 2 ∇ V R (B.48) 0 e e0 ∇ V R0 e e e e where we have defined ∇0 ≡ 1 – P · ∇ as the center-of-mass invariant gradient  operator and made use of the idempotent property of projections. e e e e Applying eq. B.48 iteratively leads to the relation (compare to eq. 2.50): V Rn – EL 0  Rn+1 = Rn – e 2 ∇ V R (B.49) 0 e en ∇ V Rn e e e e where R0 is the first R(t) with V R(t) > EL from eq. A.27. This procedure will escape   the forbidden region while preserving the center of mass. e e e APPENDIX C MICROCANONICAL SAMPLING FOR THE ANHARMONIC OSCILLATOR This section follows and is expanded from the work of Peslherbe, Wang, and Hase [59]. C.1 Normal Mode Hamiltonian In this section we give a description of the normal mode Hamiltonian and demonstrate how to microcanonically sample its degrees of freedom. For a system of n (separable) harmonic oscillators, the Hamiltonian of the system is given as: n n 1X 2 Pi + ωi2 Q2i X H= Ei = (C.1) 2 i=1 i=1 where the ωi are the frequencies of the oscillators and the normal coordinates, {Qi , Pi }, are related to the Cartesian coordinates and momenta, xi and pi , by the transformations: p Pi = √ i mi √ Qi = mi (xi – x0 i ) (C.2) where ~x0 is the equilibrium position of the oscillator. 143 144 For such a system, the distribution function to microcanonically sample phase space is an exactly obtainable object. We show its development in the next section. Assignment of Energies to Normal Modes Equivalent to choosing N values for Qi and Pi , we can assign to the N modes, N energies and N phases. In this section we treat the energies. Supposing that we wish to inject a total microcanonical energy E0 into the N normal modes, the probability that mode 1 has energy E1 is proportional to the number of ways the energy (E0 – E1 ) can be distributed to the remaining (N – 1) modes. This is given by the density of states for the oscillators. We then have: (E – E1 )N–1–1 P(E1 ) = R E 0 (C.3) 0 N–1–1 dE 0 (E0 – E1 ) 1 We can extrapolate to the general expression by noticing that the number of states is proportional to the remaining energy raised to the power of one less than the number of remaining oscillators. With the denominator ensuring normalization we have: Pi–1 N–1–i (E0 – Ei – j=1 Ej ) P(Ei ) = Pi–1 (C.4) R E0 – j=1 Ej Pi–1 N–1–i 0 (E0 – Ei – j=1 Ej ) dEi Sampling equation C.4 directly would be a quite difficult. However the cumulative distribution function allows sampling an arbitrary probability distribution as follows: Z x C(x) = P(x0 )dx0 = Ri (C.5) –∞ Where Ri ∈ [0, 1) is a uniformly distributed random variable. For this energy distribution specifically: Z Ei C(Ei ) = P(E0i )dE0i = Ri (C.6) 0 Integrating eqn. C.4 and solving for Ei yields:   i–1 Ej  1 – R1/(N–i) X   E i = E 0 – i (C.7) j=1 Equation C.7 can be used do assign the first (N – 1) modes. The Nth mode is assigned EN = E0 – N–1j=1 Ej with probability 1. P 145 Assignment of Random Phases to Normal Modes Having assigned the energies of the normal modes, we assign to each normal mode, a random phase. We uniformly sample the angle, by choosing a new random variable, Ri ∈ [0, 1), for each mode and taking: √ 2Ei Q0i =+ cos 2πRi (C.8)  ωi 0 Pi = – 2Ei sin 2πRi (C.9) p  Where Q0i and Pi0 are the initial trial coordinate and momentum for the ith normal mode. In this way we have specified the 2N coordinates of phase space with 2N random variables. We can verify that this yields the target energy by simple substitution into the normal-mode Hamiltonian, eqn. C.1: n 1X 2 H= Pi + ωi2 Q2i (C.10) 2 i=1 √ 1 n h i2 2Ei  2  + ωi2 X = – 2Ei sin 2πRi cos 2πRi (C.11) p 2 ωi i=1 n 1X 2Ei sin2 2πRi + cos2 2πRi h i = (C.12)  2 i=1 n X = Ei (C.13) i=1 =H (C.14) C.2 Construction of Normal Mode Coordinates In this section we construct a transformation to normal coordinates for an arbitrary potential. We proceed by approximating the Hamiltonian in the vicinity of a potential minimum as a collection of harmonic oscillators. Then we diagonalize the approx- imated Hamiltonian, finding the normal modes and their frequencies. Though the normal modes will be an approximation for an anharmonic potential, the coordinate transformation is exact. 146 In the neighborhood of a potential minimum1 , ~x0 , the Hamiltonian for an arbitrary potential with N degrees of freedom can be brought into the form of the normal mode Hamiltonian, eqn. C.1. We make a Taylor series expansion of the potential and truncate all terms of greater than quadratic order: N 1 X ∂ 2 V   V(~x) ≈ V|~x0 + xi – x0 i xj – x0 j (C.15) 2 ∂xi ∂xj i6=j ~x0 ∂V Here we use the fact that the potential has a minimum at ~x0 to guarantee ∂x = 0 ∀ i, i ~x0 which eliminates the first order terms. Suitable shifting of the zero of the potential eliminates the zeroth order (constant) term and application of equation C.2 transforms to mass weighted coordinates. To eliminate the cross terms, we diagonalize the hessian matrix, H, given by: 2 2 2 " # ∂ V ∂ V ∂xi ∂xj ∂ V –1/2 Hij = = = mi mj (C.16) ∂Qi ∂Qj ∂xi ∂xj ∂Qi ∂Qj ∂xi ∂xj 0 ~x0 ~x0 We discuss the evaluation of such an object in Section 2.7.3. Diagonalization [86, Chapter 8.2] yields a matrix, Ωij = δij ωi2 , containing the frequencies of the normal modes and the eigenvectors (modes) associated with them. The matrix L, whose columns are the eigenvectors, allows transformation from normal coordinates back to Cartesian coordinates as in equation C.18. L ≡ ~e1 , ~e2 , ~e3 , . . . (C.17)   where ~ei is the column eigenvector associated with the frequency ωi . In general (for systems with N > 2) there will be 6 normal modes with frequencies approaching 0 corresponding to net translation in three dimensions and net rotation about three axes (see also Section 2.7.3). With these new coordinates and frequencies in hand, we follow the procedure of section C.1 in the assignment of energies and phases. 1 More generally, this procedure can be applied anywhere ∇V = 0, whether a minimum, a maximum, or a saddle point. 147 Transformation Back to Cartesian Coordinates Once the normal mode phases and energies have been assigned (see sections C.1 and C.1), it is a simple matter to transform back to Cartesian coordinates. 1 ~ ~x =~x0 + M 2 LQ (C.18) 1 ~p =M 2 L~P (C.19) 1 1 where Mij2 = δij mi2 is the matrix giving the masses associated with each coordinate and ~ and ~P are vectors giving the previously assigned normal coordinates. Q C.3 Corrections for Anharmonicity and Selection of Total Angular Momentum The normal modes only represent an approximation for our true Hamiltonian and therefore there arises a spurious angular momentum and a deviation from the target energy, E0 . To correct for this, we iteratively apply the following transformation until convergence2 is reached. 1. Form the spurious angular momentum (this is simply the total angular momentum of the molecule), N ~js = X ~ri × mi~vi (C.20) i=1 Where the sum is over atomic centers and ~ri , ~vi , and mi give the Cartesian position, velocity, and mass of the ith atom. To correct for this term we add ~j = –~js to the molecule3 by computing the vector angular velocity: ω = I–1~j (C.21) where I–1 is the inverse of the moment of inertia tensor and adding this angular velocity to each center. ~vi0 = ~vi + ω × ~ri (C.22) 2 We place the following constraints on total energy, angular momentum, and center of mass translation: | E–E ~ ~ ~ ~˙ –8 E0 | < δE where E = H(x, p), |j| < δj, and |rcm | < δ rcm ; all tolerances are of the order of 10 . 0 ˙ 3 This procedure could be extended to select arbitrary angular momentum by adding the term ~j = ~j0 –~js where ~j0 is the desired angular momentum. 148 2. To correct the total energy, let E = H(~x, ~p) and scale the coordinates and momenta (velocities would be fine too) as: E0 (1/2)   ~x0 =~x0 + (~x – ~x0 ) (C.23) E  (1/2) E ~p0 =~p 0 (C.24) E I have observed that this procedure can yield a fixed point (an infinitely repeated sequence) for certain input parameters. Explanation of Anharmonic Correction The algorithm presented in the previous section may appear opaque, but this need not be the case. Step 1 simply treats the molecule as a rigid body. Equation C.20 is the expression for the total angular momentum and equation C.21 is an inversion of another standard expression for the angular momentum of a rigid body ~j = Iω (C.25) Where I is the moment of inertia tensor [60]. As for step 2, it is exact to the extent that the Hamiltonian is harmonic. Consider a system with a single harmonic degree of freedom (eqn. C.1 for N = 1) and target total energy E0 . For convenience, we choose a shifted coordinate system such that the minimum of the harmonic oscillator is at the origin. This has the effect of setting ~x0 = ~0 in eqn. C.23. Our present coordinates, Q, P yield an energy E: H(Q, P) = E 6= E0 (C.26) and we desire that the transformed coordinates, Q0 , P0 yield the energy E0 : ? H(Q0 , P0 ) = E0 (C.27) 149 Proceeding by substitution of the transformed coordinates into the Hamiltonian, we have: 1  02 2 P + ω 2 Q0  H(Q0 , P0 ) = (C.28) 2   1 2 E0   2 2 E0 = P +ω Q (C.29) 2 E E 1 2   E0 P + ω 2 Q2 h i = · (C.30) E 2   E0 = ·E (C.31) E = E0 (C.32) Which is as intended. APPENDIX D MONTE CARLO SAMPLING THE LANDSCAPE ENSEMBLE The first part of this appendix closely follows [64], whose methods we imitate. We seek to generate a set of points which samples the roaming region (see Section 2.3.3) of the formaldehyde potential energy surface in the landscape ensemble (Section 2.1.2). Such sets at the appropriate landscape energies are the buckets from which the intermediate, roaming points, are drawn when constructing roaming geodesics (Section 2.3.4). Metropolis-Hastings sampling [87] is a Monte Carlo procedure for evaluating en- semble averages of the form: dxN P(x)f x R  hfi = R Ne e (D.1) dx P(x) where f is a property of the N-dimensional space determined by x and P(x) is the e probability of a given state, x. The Metropolis algorithm is of particular use when P(x) e e is difficult to sample directly or to normalize. Its output is a set of states, xi , such that e  e the average in eq. D.1 can be replaced by: e 1X hfi = (D.2)  f xi N i e The set, xi , is generated by a biased random walk in the space which compares  the relative probabilities, P, for the current and succeeding states. In this way the e probability of being in a particular state at a particular step only depends explicitly on the prior step; such a sequence is called a “Markov chain”. 150 151 D.1 General requirements Given an initial state, x0 , a (potentially un-normalized) probability distribution, P(x), 0 and a propagator for generating trial moves, F : xn → xn+1 , then the sequence of the e e random walk is given recursively as follows: e e x0 = x0 (choosen or otherwise given) e e  0  P(x ) F(xn ) , with probability: min 1, en+1  P(xn ) (D.3) xn+1 = e e , otherwise  x n e e The probabilistic component of eq. D.3 can be handled by comparing a uniform, random variable on [0, 1), ξ, to the target probability, p, and accepting if ξ < p. If this is confusing, think of the limiting cases: p = 0 and p = 1. F, the propagator can take many forms, but here, we use a center-of-mass preserving variant of the one used in Brady et al. [64]. To generate trial moves for the random walk, we displace a single random atom, j, along a random coordinate, by a random amount, ∆~j =µ ˆj (ξ – 0.5)δx . Where ξ is a uniformly distributed, random variable on [0, 1), δx is a scaling factor and µ ˆj ∈ {ˆx, yˆ, zˆ} is along a randomly selected Cartesian axis of the jth atom. After each such step, an additional move to preserve the center of mass is made by moving all atoms by the same, appropriate amount. Thus the total move for each atom i is: mj   ∆~ri = µ ˆj (ξ – 0.5)δx δi,j – ; i = 1...N (D.4) M PN where M = i=1 mi is the total mass of the system. We encode this move through the equation: F(x) = x + ∆R (D.5) e e e where ∆R = (∆~r1 , ∆~r2 . . . ∆~rN ) is the configuration-space vector of all single-atom displacements defined in eq. D.4. e There is a bit of lore surrounding Markov chains, which indicates that an acceptance ratio (i.e., the fraction of accepted trial steps) of 50% is “best”. Conversations with Jimmie Doll (one of the authors of [64]) indicated that this is very rough and that a better guide is perhaps “bigger than 10% and less than 90%”. One can adjust the 152 scaling factor, δx , to achieve the desired acceptance ratio during a trial phase. However, once sampling for the set in eq. D.2, δx must remain fixed or the chain will fail to satisfy detailed balance and the intended distribution will not be sampled. We describe a naïve, but effective implementation of a scheme to pick δx in section a later section on page 154. D.2 Implementation specifics We now give the specifics of our implementation of the foregoing procedure to sample points from the roaming region of the formaldehyde potential energy surface. A Probability-like Function In the first section we stated that we could use un-normalized probability distributions. This is because the distribution, P only appears in eq. D.3 in ratio with itself. This is quite convenient because it obviates the need to compute what would be—in this case and many others—a complicated, many-dimensional configurational integral. Our definition of the roaming region is explained in detail in Section 2.3.3, but recall that the structure of the region is defined in terms of requirements that 1. The restoring force on the roaming hydrogen is of small magnitude: FH(0) < Fmax (D.6) 2. The hydrogens are separated by more than a minimum and less than a maximum distance: dmin < ~rHH(0) < dmax (D.7) Hydrogen-hydrogen separation is given by ~rHH(0) and we define the restoring force as: ~ (0) V(R) FH(0) = –ˆrH(0) –CM · ∇ (D.8) H e where   ~ (0) = ∂ ∂ ∂ ∇ H , , (D.9) ∂xH(0) ∂yH(0) ∂zH(0) 153 and ˆrH(0) –CM = ~rCM – ~rH(0) / ~rCM – ~rH(0) (D.10)  These criteria can be formally encoded in a function as follows: Proaming (R; Fmax , dmin , dmax ) ∝ Θ Fmax – FH(0) (D.11)  e ·Θ ~rHH(0) – dmin (D.12)  ·Θ dmax – ~rHH(0) (D.13)  where Θ is the step function:  1, x>0 Θ(x) = (D.14) 0, otherwise To sample the potential energy landscape ensemble ([46] and Section 2.1.2), we can use a similar scheme: Ppele (R; EL ) ∝ Θ EL – V(R) (D.15)  e e where EL is the landscape energy. Combining equation D.15 with D.11: P(R) = Ppele (R; EL ) × Proaming (R; Fmax , dmin , dmax ) (D.16) e e e allows us to sample the portion of the roaming region that is also in the landscape ensemble for a given EL . It should be noted that since our combined function has binary outputs, the proba- bilities in eq. D.3 will always be 0 or 1. This is a special case of the more general class of problems than the Metropolis-Hastings algorithm is capable of handling—within the allowed region, we seek uniform sampling rather than a variable density. Put another way, because we have no notion of “sort of roaming” or “sort of in the potential energy landscape ensemble”, the random walk will never do something “sort of bad” i.e., will never take “uphill” steps. When Brady and co-workers[64] implemented this scheme, it was to sample the energy shell of the microcanonical ensemble. While there is no notion of “sort of the right energy” in the microcanonical ensemble, they used a pre-limit form of the delta function for their probability, which allowed their walker to meander back and forth across the energy shell. 154 Picking a Scaling Factor For the linear propagator described in section D.1, we would like to optimize the scaling factor, δx , such that the acceptance ratio falls within the desired bounds. The binary probability function could introduce some added complications, but we avoid them under the following assumption: • The set to be sampled is dense; that is, points in the set are arbitrarily close to others in the set This condition allows us to assume that for sufficiently small steps, moves will always fall within the acceptance region. This is important because it allows us to take the δx → 0 limit as yielding an acceptance ratio of 1. In the case of a binary probability function, the other limit, δ → ∞, implies an acceptance ratio equal to the value hP(~x)i. In the case of the roaming region of formaldehyde, this average is 0 because the volume of the roaming region is finite1 . It would be convenient to assume that the set is connected; that is, it is possible to translate between any two points in the set while remaining in the set. This would actually be a stronger requirement than the first assumption and would allow us to assume that the entire set is reachable from any x0 . But, in the potential energy landscape ensemble (or microcanonical ensemble), we have no guarantees that such e an assumption will hold2 . The general scheme for selecting δx is this: 1. Guess an initial value for δx 2. Compute a Markov chain of length k while keeping track of the acceptance ratio 3. If the ratio is larger than desired, scale by (1 + α) to increase the length. Likewise, if the ratio is smaller than desired, scale by (1 – α) to decrease the step. 4. Go-to 2 and repeat M times. 1 orat least grows more slowly than the volume of the allowed regions of the landscape ensemble with EL . 2 Indeed, we strongly suspect that it does not; see Section 3.6. 155 If α is too large, the results will be unstable as δx oscillates in and out of the acceptable range. Too small and convergence will take a very long time. Similar observations can be made about k. We find acceptable results with α = 0.05 and k = 1000 at EL = 37600 cm–1 Using this plan for picking δx We achieved an acceptance ratio of 89.5% with a scaling factor of δx = 0.27562 a0 after 1.1 × 106 total steps. This step size was then used for sampling the roaming region at all energies. All parameters used to select δx are given in the table below. Parameter Value δx (guess) 0.01 a0 k 1000 target acceptance ratio 10% to 90% α 0.05 M 1100 δx (final) 0.27562 a0 acceptance ratio 89.5% The Formaldehyde Potential Given that we have no reason to expect that the roaming region is self-connected, we should initiate our random walk from as many x0 as possible. To seed the random walk we took configurations satisfying both the landscape and roaming criteria from e molecular dynamics trajectories (see Section 2.3.2) that went to molecular products (roaming trajectories). These criteria yielded 200–9000 points in the roaming region for each landscape energy, EL (see Table D.1 for details). Using each of the points as a different value for x0 , we generated Markov chains of length 105 , recording values every 100 moves. These points were then shuffled and used as intermediate roaming e points in the geodesic path-finding algorithm; see Section 2.3.4 156 EL / cm–1 acceptance ratio seed configurations 34500 0.827227 248 ∗35000 0.849454 2071 35500 0.865258 2588 ∗36000 0.875344 3379 36223 0.878148 4040 37000 0.889417 6025 ∗37600 0.895281 7627 37688 0.895886 7149 38000 0.898983 7168 38688 0.903714 8884 ∗38814 0.903394 8723 ∗41010 0.904021 7319 ∗45000 0.942785 6432 Table D.1: Energy-specific acceptance ratios and seed configurations We give acceptance ratios and the number of seed configurations for all energies for which we sampled the roaming region. For all walks, δx = 0.27562 a0 was fixed as determined at EL = 37600 cm–1 . For each seed configuration, we computed walks of 105 steps, recording a configuration every 100 moves. An asterisk (*) indicates energies for which geodesics were actually computed. APPENDIX E INTEGRATING EXTERNALLY- SUPPLIED FORTRAN CODE Joel Bowman’s group at Emory University provided us with the global potential energy surface (PES) for formaldehyde [54]. Briefly, the function was formed by splicing together several fits to two sets of high level (CCSD(T)/aug-cc-PVTZ and MR-CI/aug-cc-PVTZ) ab initio calculations. Bowman transmitted the function as a set of “zipped” FORTRAN source code files. When decompressed, the files represent about 5000 lines of code. For future checks of the integrity of these files, their sha256 checksums [88] are given here: % sha256sum *.f f708238435ffaa0ec30088327440d4ac6fcfdb3fe55d9e67ff08fc6e50525437 bonds.f 82f074e9c3d350e76c8ca152425b4c37258914bb6e1f12fd9014a9437f259cf0 getpot.f 3eea2fd4a3a5c52df4e803460700debfd00e59638653eb846c63839c03133f22 h2co-ccsdt.f 181db4ab7fe3aec03eb0ebaa6a3c397755d1b2ba6aceea31db744b5a9e10c39b h2co-mrci.f d4d01e238d90375e4e43fc0756e5a552c4cd15d66db3821693bd8b624d0c63a8 main.f % As the provided code was written in FORTRAN and the project implemented in C, some effort had to be expended to interpolate between the two. Primarily, we want to be able to call the subroutine Getpot, contained in the file getpot.f. 157 158 This function and relevant declarations are given below. The declarations are taken from the file main.f, which calls Getpot. REAL(8), DIMENSION(36,3) :: xx REAL(8), DIMENSION(36) :: xmass REAL(8) :: v,v1,v2,s INTEGER :: natom subroutine Getpot (natom,xmass,xx,v,v1,v2,s) This snippet declares two double precision arrays of dimension 36 × 3 and 36 × 1 called xx and xmass respectively as well as 4 double precision floating point values and an integer. The subroutine Getpot takes these variables as parameters. A call to getpot reads the contents of xx, xmass, and natom then sets the values of v, v1, v2, and s. The variable v contains the value of the potential at the position specified by xx; the other variables can be used internally. The following comment in the file h2co-mrci.f: c note h1 and h2 are not equivalent, it is assumed that c h2 is further away from the other atoms than h1 suggest that the two hydrogens used internally are not equivalent. Indeed, one sees differences on the order of a few percent when permuting the hydrogens if one is far removed from the formyl. Our best-guess interpretation of the comment is that the hydrogen most removed from the total center of mass should be input to Bowman’s function as ‘h2’. We respect this stricture by including a check on the condition in our wrapper around Bowman’s code. When it was not satisfied, we swap the hydrogens in the external function call while preserving our internal representation. In this way, the hydrogens are indeed identical to our wrapped function. The two arrays are of length 36 because the code was written with reuse in mind. The natom parameter is used to specify the portion of the arrays used. The code accepts positions in Cartesian coordinates (in the order H1 , O, C, H2 ), which are then converted to internal, rotationally and translationally invariant coordinates. The 4 atom system has 12 Cartesian coordinates and 12 – 6 = 6 internal coordinates. The internal coordinate 159 system in the Bowman-group code is the set of all inter-atomic distances1 . The sources were compiled to object files using the command gfortran -O3 -c bonds.f getpot.f h2co-ccsdt.f h2co-mrci.f. The flag -O3 indicates the highest level of compiler optimizations and the flag -c indicates that the files should be compiled, but not linked. Once compiled, the object files can be inspected with the command nm, e.g.: % nm --defined-only getpot.o 0000000000000000 T getpot_ 0000000000000010 d rech.1042 0000000000000008 d rehh.1049 0000000000000000 d reoh.1050 % Here, we can see that the function included in the file getpot.o is getpot_ (indi- cated by T). Note the added underscore; the naming convention is compiler-dependent2 . This is the name of the function we call from C code. To call this function in a C source file, we declare an external, void function using the name found above: extern void EXTGETPOT(const int * natom, const double xmass[36] ,double xx[3][36], double * v, double * v1 ,double * v2, double * s); Several things are noteworthy. First we constant, EXTGETPOT, so that the name of the FORTRAN function can be specified at compile time (for gfortran, from the GNU com- piler collection, the appropriate call is getpot_). Second, all variables in FORTRAN are passed by reference. Therefore C variables that are not already pointers (arrays are fine) should be passed as pointers. Finally, FORTRAN’s memory layout for multidimensional arrays is the transpose of C’s. For this reason, xx is declared as 3 × 36 instead of 36 × 3. With an external function declared, it can be used in code! The following function takes a gsl_vector* describing the position of all centers and returns the potential 1 Were there more than 4 centers, this set would over-determine the system in 3D. N(N–1) > 3N – 6 2 for N > 4. 2 The xl compilers used on IBM Blue Gene systems, for instance do not add the underscore. 160 energy of the configuration via a call to EXTGETPOT (perhaps getpot_) while respecting difference between h1 and h2. Comments in the Bowman group’s code suggest that the return value is in wavenumbers, but it is not (we verified this by comparison to their published values [54]). The value is returned in atomic units or Hartrees (1 Ha = EH = 2.19474 × 105 cm–1 ; see also Section 2.1.1). double potential_formaldehyde(const gsl_vector * r){ assert(r->size == 12); static const int natom = 4; static const double xmass[36]={ 1837.15 // H ,29156.95 // O ,21874.66 // C ,1837.15 // H }; //will initialize the first 4 compnents and leave the others //double xx[3][36]; static double v = 0, v1 = 0, v2 = 0, s = 0; static double xx[3][36]; //transpose of fortran's 36x3 for (size_t coor = 0; coor < 3; coor++){ for (size_t atom = 0; atom < 4; atom++){ xx[coor][atom]=gsl_vector_get(r,(atom*3)+coor); } } if (HOCHgetCMDist(r,formaldehydeOffset.H1) > HOCHgetCMDist(r,formaldehydeOffset.H2)){ double temp; const int H1 = 0; const int H2 = 3; 161 for (size_t coor = 0; coor < 3; coor++){ temp = xx[coor][H1]; xx[coor][H1] = xx[coor][H2]; xx[coor][H2] = temp; } } EXTGETPOT(&natom, xmass, xx, &v, &v1, &v2, &s); return v; } The following constants and helpers are also defined in a separate header file. #ifndef EXTGETPOT #define EXTGETPOT getpot_ #endif const int formaldehydeNumCenters = 4; const size_t formaldehydeNumDimensions = 12; const double formaldehydeMassTotal = 54705.91; const struct offset_t formaldehydeOffset = { .H1 = 0, .O = 3, .C = 6, .H2 = 9 }; const struct mass_t formaldehydeMass = { .H1 = 1837.15, 162 .O = 29156.95, .C = 21874.66, .H2 = 1837.15 }; const double formaldehydeMassArray[4] = { 1837.15, 29156.95, 21874.66, 1837.15}; const size_t formaldehydeOffsetArray[4] = {0,3,6,9}; Using the constants defined above, we compute the Hydrogen to center-of-mass separation in the following function: /* gives the distance of the offset-specified atom from the total center of mass. */ double HOCHgetCMDist(const gsl_vector *r, size_t offset){ const size_t space = 3; double cm_array[space]; gsl_vector_view cm = gsl_vector_view_array(cm_array, space); gsl_vector_set_zero(&cm.vector); for (int i = 0; i < formaldehydeNumCenters; i++){ gsl_vector_const_view v = gsl_vector_const_subvector( r,formaldehydeOffsetArray[i],space); gsl_blas_daxpy( formaldehydeMassArray[i] / formaldehydeMassTotal, &v.vector, &cm.vector); } 163 gsl_vector_const_view atom = gsl_vector_const_subvector( r, offset, space); gsl_vector_sub(&cm.vector, &atom.vector); return gsl_blas_dnrm2(&cm.vector); } The C file is compiled to object code in the usually way: % gcc -std=c99 -Werror -pedantic -Wall -W -Wmissing-prototypes \ -Wstrict-prototypes -Wconversion -Wshadow -Wpointer-arith -Wcast-qual \ -Wcast-align -Wwrite-strings -Wnested-externs -fshort-enums -fno-common \ -g -O2 -DHAVE_INLINE -I/share/apps/include -c potential_formaldehyde.c % A number of errors are checked by enabling the compiler warnings and is recom- mended by the GNU Scientific Library (GSL) project [62]. Note the -I flag, which specifies the location of the GSL headers; these may be in an alternate location. As of this writing, the version installed on ted is 1.14; version 1.99 is installed under /home/vale/toor/. With C and FORTRAN object files in hand, the program can be linked: % gcc bonds.o getpot.o h2co-ccsdt.o h2co-mrci.o \ potential_formaldehyde.o # ... other C objects follow \ -lgsl -lm -lgfortran -lgslcblas -L/share/apps/lib \ % The last line of flags link the GSL, C math, and GCC FORTRAN libraries respectively. The flag -lgslblas specifies using the GSL Basic Linear Algebra Subprograms (BLAS) library. Another, optimized library could also be used. We attempted using the TACC implementation, GotoBLAS23 , using the flags -lgoto2 -L/share/apps/GotoBLAS2, but performance was substantively harmed4 . The location of the GSL library must be 3 See https://www.tacc.utexas.edu/tacc-projects/gotoblas2. 4 Note to future (student) readers, it might be worth trying the ATLAS implementation too. 164 specified with an -L flag. Two quick notes about performance: Upon profiling the code we used to compute geodesics, we found that approximately 95% of the program’s effort was spent in the externally supplied FORTRAN code. This does not imply that the Bowman code is inefficient, but just that fitting the formaldehyde potential is computationally demanding. One result of this observation is that the program to find geodesics could have been written in just about any reasonably performant language and we would have retained the same order-of-magnitude performance as long as it was able to efficiently call the Bowman code. Also, modern compilers are good things to use as compiler technology has improved and can generate faster code. The version of gcc on ted is ancient—4.4 came out nearly a decade ago. By installing newer versions (e.g. 6.2) we were able to achieve an order of magnitude speed-up in pure C code written for analysis. E.1 Form of switching functions The switching functions employed by Bowman’s group in constructing the fit are of the form given below:     1 x≤0  w(x) = 1 – 10x3 + 15x4 – 6x5 0