Thesis of doctorate degree

Infrared Spectroscopy of Small Biomolecules in the Gas Phase

Åke Andersson

Department of Physics
University of Gothenburg
Gothenburg, Sweden 2025

Doctoral Dissertation in Physics
Department of Physics
University of Gothenburg
412 96 Gothenburg, Sweden

Main supervisor: Prof. Vitali Zhaunerchyk, Department of Physics, University of Gothenburg
Examiner: Prof. Dag Hanstorp, Department of Physics, University of Gothenburg
Opponent: Prof. Timothy S. Zwier, Department of Chemistry, Purdue University

© Åke Andersson, 2025
ISBN: 978-91-8115-306-4 (printed)
ISBN: 978-91-8115-307-1 (online)
URL: http://hdl.handle.net/2077/87128

Cover: The dominant conformer AAFA-1 of the neutral capped tetrapeptide Ace-Ala-Ala-Phe-Ala-NH2, with H-bonds drawn as dotted cyan lines.

Printed by Stema Specialtryck AB, Borås
Typeset in XeLaTeX with Times Ten and Akzidenz-Grotesk

Abstract

Proteins are macromolecules essential to all forms of life. Their biological function is related to how they fold, which is determined by their primary sequence, but the exact relationship is not understood. Physical modeling offers a bottom-up answer to this question, but the models need calibration on experimental data of similar biomolecules.

Infrared (IR) spectroscopy is a powerful tool for determining molecular structure, and in turn validating physical models. The infrared range corresponds to molecular vibrations, which are intimately related to the geometric structure. Thus, by performing IR spectroscopy on biomolecules, it becomes possible, in the aggregate, to judge physical models.

Simulations of molecules are best done in an isolated environment. To match this, the spectroscopic experiments should be performed in the gas phase, where molecules are isolated and their intrinsic properties can be studied. The low density of the gas phase requires the use of dedicated action spectroscopy techniques.
This thesis focuses on action spectroscopy techniques for biomolecules in the gas phase, and the theory required to describe them. Four such techniques are applied to small biomolecules: IR multiple photon dissociation (IRMPD) of ions, IRMPD of neutrals combined with ionizing vacuum ultraviolet radiation (IRMPD–VUV), ultraviolet (UV) resonance-enhanced multiple photon ionization (REMPI), and conformer-specific IR–UV ion dip spectroscopy derived from REMPI. In addition, an experimental setup for the last two techniques has been constructed.

The established IRMPD technique is applied to proton-bound dimers of asparagine. Using partial isotope labeling, the IR resonances are assigned to vibrational modes. Corresponding quantum chemical calculations are undertaken to predict the spectrum and infer the structure.

Also, the recently developed IRMPD–VUV technique is applied to the pentaalanine peptide, in the expectation that it forms a miniature helix. Both experiment and theory indicate multiple populated conformers. Although their structures cannot be determined, the chemical environment of the carboxyl (COOH) group can be inferred.

Finally, the conformer-specific IR–UV ion dip spectroscopy technique is applied to phenylated polyalanine peptides. The phenyl group allows for REMPI and ion dip spectroscopy. In contrast to pentaalanine, the experiment shows only one conformer, which is supported by theory.

Summary (Sammandrag)

Proteins are macromolecules that are necessary for all forms of life. Their biological function is linked to how they fold, which is determined by their primary sequence, but the exact relationship is not understood. Physical modeling offers a solution from the ground up to this question, but the models need to be calibrated with experimental data from similar biomolecules.

Infrared (IR) spectroscopy is a powerful tool for determining molecular structure and thereby validating physical models. The infrared range corresponds to molecular vibrations, which are closely related to the geometric structure. By performing IR spectroscopy on biomolecules it is, taken together, possible to assess physical models.

Simulations of molecules are best done in an isolated environment. To match this, spectroscopic experiments should be performed in the gas phase, where molecules are isolated and their intrinsic properties can be studied. The low density of the gas phase requires that special action spectroscopy techniques be used.

This thesis focuses on action spectroscopy techniques for biomolecules in the gas phase and all theory required to describe them. Four such techniques are applied to small biomolecules: IR multiple photon dissociation (IRMPD) of ions, IRMPD of neutral molecules combined with ionizing vacuum ultraviolet radiation (IRMPD–VUV), ultraviolet (UV) resonance-enhanced multiple photon ionization (REMPI), and conformer-specific IR–UV ion dip spectroscopy based on REMPI. In addition, an experimental setup for the last two techniques has been constructed.

The established IRMPD technique is applied to proton-bound dimers of asparagine. Through partial isotope labeling, the IR resonances are assigned to vibrational modes. Corresponding quantum chemical calculations are carried out to predict the spectrum and draw conclusions about the structure.

Furthermore, the recently developed IRMPD–VUV technique is applied to the peptide pentaalanine, in the belief that it will form a miniature helix. Both experiment and theory indicate several populated conformers. Although their structures cannot be determined, the chemical environment around the carboxyl group (COOH) can be read off.

Finally, the conformer-specific IR–UV ion dip spectroscopy technique is applied to phenylated polyalanine peptides. The phenyl group enables REMPI and ion dip spectroscopy. In contrast to pentaalanine, the experiment shows only one conformer, which is supported by theory.

Preface

This thesis is a compilation of papers written during my time as a PhD student.
Chapter 1 provides a foundational overview accessible to university students, and may be skipped by readers familiar with spectroscopy. The remaining chapters follow the IMRaD standard for scientific writing.

Abbreviations

ADMP     Atom-centered Density Matrix Propagation
B3LYP    Becke's 3-parameter Lee–Yang–Parr (functional)
BOMD     Born–Oppenheimer Molecular Dynamics
DFM      Difference Frequency Mixing
DFT      Density Functional Theory
FEL      Free Electron Laser
FELIX    Free Electron Laser for Infrared eXperiments
FWHM     Full Width at Half Maximum
GU       University of Gothenburg
HF       Hartree–Fock
IR       InfraRed
IRMPD    InfraRed Multi-Photon Dissociation
IVR      Intramolecular Vibrational energy Redistribution
MALDI    Matrix-Assisted Laser Desorption/Ionization
MCP      Micro-Channel Plate (detector)
MD       Molecular Dynamics
Nd:YAG   Neodymium-doped Yttrium Aluminum Garnet
OPA      Optical Parametric Amplifier
OPO      Optical Parametric Oscillation
PES      Potential Energy Surface
RDG      Reduced Density Gradient
REMPI    Resonance-Enhanced Multi-Photon Ionization
RMSD     Root Mean Square Distance
SHG      Second Harmonic Generation
TOF      Time-Of-Flight
UV       UltraViolet
VPT2     2nd-order Vibrational Perturbation Theory
VUV      Vacuum UltraViolet

List of papers

This thesis consists of an introductory text and the following papers:

I    IRMPD Spectroscopy of Homo- and Heterochiral Asparagine Proton-Bound Dimers in the Gas Phase
     Åke Andersson, Mathias Poline, Kas J. Houthuijs, Rianne E. van Outersterp, Giel Berden, Jos Oomens, and Vitali Zhaunerchyk
     Published in: The Journal of Physical Chemistry A, 125, pp. 7449–7456, 2021, doi: https://doi.org/10.1021/acs.jpca.1c05667.
     My contributions: Partaking in the experiment, experimental data analysis, some theoretical analysis, and writing the whole manuscript.

II   Indication of 3₁₀-Helix Structure in Gas-Phase Neutral Pentaalanine
     Åke Andersson, Vasyl Yatsyna, Mathieu Linares, Anouk Rijs, and Vitali Zhaunerchyk
     Published in: The Journal of Physical Chemistry A, 127, pp. 938–945, 2023, doi: https://doi.org/10.1021/acs.jpca.2c07863.
     My contributions: Experimental analysis, theoretical analysis, and writing the whole manuscript.

III  IR Spectroscopic Studies of Gas-Phase Peptides
     Åke Andersson and Vitali Zhaunerchyk
     Submitted to publisher in 2024 as a review chapter.
     My contributions: Literature search and writing the majority of the manuscript.

IV   IR-UV Ion-Dip Spectroscopy of Capped Phenylated Polyalanines in the Gas Phase
     Åke Andersson, Piero Ferrari, Imre Bakó, and Vitali Zhaunerchyk
     In manuscript.
     My contributions: Project planning, partaking in the experiment, experimental analysis, theoretical analysis, and writing the whole manuscript.

Additional papers

Additionally, I have contributed to the following papers:

V    Structure of Proton-Bound Methionine and Tryptophan Dimers in the Gas Phase Investigated with IRMPD Spectroscopy and Quantum Chemical Calculations
     Åke Andersson, Mathias Poline, Meena Kodambattil, Oleksii Rebrov, Estelle Loire, Philippe Maître, and Vitali Zhaunerchyk
     Published in: The Journal of Physical Chemistry A, 124, pp. 2408–2415, 2020, doi: https://doi.org/10.1021/acs.jpca.9b11811.
     My contributions: Partaking in the experiment, experimental data analysis, and writing the whole manuscript.

VI   Single-Photon Hot-Electron Ionization of C70
     Åke Andersson, Luca Schio, Robert Richter, Michele Alagia, Stefano Stranges, Piero Ferrari, Klavs Hansen, and Vitali Zhaunerchyk
     Published in: Physical Review A, 107, p. 013103, 2023, doi: https://doi.org/10.1103/PhysRevA.107.013103.
     My contributions: Partaking in the experiment and experimental data analysis.

VII  Comment on "Cumulant Mapping As the Basis of Multi-Dimensional Spectrometry" by Leszek J. Frasinski, Phys. Chem. Chem. Phys., 2022, 24, 20776–20787
     Åke Andersson
     Published in: Phys. Chem. Chem. Phys., 25, pp. 32723–32725, 2023, doi: https://doi.org/10.1039/D3CP02525J.
     My contributions: All of it.
VIII Parametric Cumulant Mapping: A Multidimensional Correlation Method Intended for Experiments with Fluctuating Event Rates
     Anna Brandt, Åke Andersson, and Vitali Zhaunerchyk
     Submitted to Journal of Physics B.
     My contributions: Development of the theory.

Contents

1 Foundation
  1.1 Motivation
  1.2 Biomolecules
  1.3 Spectroscopy
  1.4 Computation
2 Introduction
  2.1 The Protein Folding Problem
  2.2 IR Spectroscopy in the Gas Phase
  2.3 Action Spectroscopy Techniques
  2.4 Prediction of Vibrational Spectra
3 Experimental Methods
  3.1 Objective
  3.2 Spectroscopy Techniques
    3.2.1 IRMPD
    3.2.2 REMPI
    3.2.3 IR–UV Ion dip
  3.3 Setup Components
    3.3.1 Molecular Sources
    3.3.2 Tunable Light Sources
    3.3.3 Mass Spectrometers
  3.4 Local GU Laboratory
    3.4.1 Source Chamber
    3.4.2 Interaction Cross
    3.4.3 Mass Spectrometer
4 Computational Methods
  4.1 Goal and Overview
  4.2 Molecular Formats
  4.3 Conformer Searching
    4.3.1 Conformer Distance
    4.3.2 Energy-Blind Sampling
    4.3.3 Energy-Guided Sampling
    4.3.4 Dynamical Sampling
  4.4 Energy Methods
    4.4.1 Wavefunction Approximations
    4.4.2 Density Functionals
    4.4.3 Empirical Methods
  4.5 Frequency Methods
    4.5.1 Harmonic Approximations
    4.5.2 Anharmonic Perturbation
    4.5.3 Semiclassical Dynamics
  4.6 Hydrogen Bonds
5 Results and Discussion
  5.1 IRMPD Spectroscopy of Proton-Bound Asparagine Dimers
  5.2 IRMPD–VUV Spectroscopy of Neutral Pentaalanine
  5.3 IR–UV Ion Dip Spectroscopy of Phenylated Polyalanines
6 Conclusions and Outlook
Acknowledgements
Bibliography

1 Foundation

This chapter is meant to give an accessible explanation of the field and its basic concepts. Terms written in italics are defined in that sentence.

1.1 Motivation

Proteins are versatile macromolecules, essential to all known forms of life. They are structured like long chains, consisting of linked amino acids. A protein is characterized by its sequence of constituent amino acids, which remarkably determines its folded shape and biological function.
Understanding these relationships is a great unsolved problem in computational biology.

Physical simulations of proteins could solve the folding problem. Although an adequate physical theory at the atomic level has been known for a century, its colossal complexity renders a proper simulation of protein folding computationally intractable. Consequently, physicists have needed to come up with simpler theories, sacrificing accuracy while chipping off complexity. At the extreme of simplicity lie parameterized models, which do not model electrons directly and must therefore be calibrated to account for their effects.

Parameterized models are typically calibrated to large sets of results of complex computations or experiments on isolated molecules. The final calibrated model is simple, but only valid on molecules similar to those in the calibration set, and limited in accuracy by the original results. Thus the desire to simulate proteins implies a demand for accurate experiments on isolated molecules similar to proteins.

What kind of experiment should be performed to satisfy this demand? The title of this work, infrared spectroscopy of small biomolecules in the gas phase, is my answer to this. I will explain why by analyzing its parts.

Small biomolecules refers to the objects of study in this work: proton-bound or peptide-linked oligomers of amino acids. They are subsets of proteins and therefore representative in every aspect except for size. Being considerably smaller, they are much easier to study, both experimentally and theoretically.

Spectroscopy is the study of interaction between matter and light. In this work, the matter is a molecule and the light is infrared or ultraviolet. Infrared light in particular is used to reveal the vibrational frequencies of the molecule, which relate to its structure and energy landscape and are therefore useful in model calibration. Ultraviolet light is also used, to ionize the molecule when desired.
Gas-phase spectroscopy studies the molecule in the gas phase, in the absence of environmental effects. This makes the experiment directly comparable with theoretical predictions, which almost always neglect the environment for reasons of complexity. Delivering large molecules to the gas phase is an art that requires both the low-pressure environment of a vacuum chamber and a soft evaporation technique (see Section 2.2).

To summarize, gas-phase spectroscopy of small biomolecules gives us accurate experimental results of molecules in the absence of environmental effects. These results can then be used to calibrate parameterized models for similar molecules. And those models can then be applied to simulate the folding of proteins and other biological processes at the atomic level.

1.2 Biomolecules

Amino acids are the building blocks of the biomolecules considered in this work. As the name hints, they are carboxylic acids, that is, molecules with a carboxyl (COOH) group, that also contain an amino (NH2) group. Additionally, every amino acid species has a characteristic side chain (R) which determines its properties. A total of 22 species occur naturally in proteins. In each of those the three groups meet at a single point, the α-carbon, as depicted in Fig. 1.1. This makes the molecule chiral (unless R = H), meaning that it is distinct from its mirror image. The two possible configurations are called enantiomers L and D. Only L occurs naturally.

Figure 1.1: A natural amino acid. The pink ball R is its side chain.

The carboxyl group of one amino acid can react with the amino group of another to form a peptide link, binding the two together. Thus every amino acid can bond with two neighbors, forming a peptide-linked chain. If such a chain contains more than a few dozen amino acids, it is called a protein; otherwise it is only a peptide. A peptide or protein is characterized by its sequence of linked amino acids, known as the primary structure. Figure 1.2 shows how a peptide is formed from amino acids.

Figure 1.2: Reaction scheme of peptide formation. Three amino acids condense into one tripeptide by forming peptide links between the carboxyl and amino groups.

Figure 1.3: Related propanol molecules (panels: propanol (gauche), propanol (trans), isopropanol, methoxyethane). All molecules above have the same formula C3H8O, and are therefore structural isomers. Furthermore, the two molecules on the left differ by only rotations along single bonds, and are thus conformers.

The three-dimensional structures of a molecule, called conformations, are generally not stable over time. At room temperature, single bonds are often free to rotate, changing the structure. This is called conformational isomerism, and stable structures that differ only by rotations about single bonds are called conformers. More distantly related are structural isomers, often just called isomers, which have the same chemical formula but distinct bond structures. Figure 1.3 shows conformers and isomers of propanol.

The central -[N-C-C]- sequence of a protein is referred to as its backbone. Despite containing only single bonds, the backbone structure is relatively stable. This stability arises from intramolecular interactions, which constrain the molecule and prevent rotations about single bonds. Examples of intramolecular interactions include hydrogen bonding, steric repulsion, and hydrophobic attraction. These interactions stabilize the protein into rigid patterns known as secondary structures. Figure 1.4 shows the two most common secondary structures: the α-helix and the β-sheet.

Beyond secondary structures there are also tertiary and quaternary structures, although they are out of scope for this thesis. The tertiary structure comprises the full three-dimensional structure of the protein, stabilized by hydrophobic interactions and hydrogen bonds of topologically distant groups. Multiple proteins may combine and stabilize into more complex enzymes. Such a combination is called a quaternary structure.

Figure 1.4: Examples of secondary structures. Left: an α-helix. Right: an antiparallel β-sheet. Shaded green strips have been added to show the curving of the backbone.

Figure 1.5: Protolysis scheme of dimer formation. The hydronium cation transfers its surplus proton to an amino acid, enabling it to form a hydrogen bond to the other amino acid.

Amino acids and peptides can be protonated, meaning they contain one extra proton, which usually sits on the amino group. Such protonated amino groups tend to participate in strong hydrogen bonds because of the surplus charge. In the gas phase, protonated and neutral amino acids can form clusters stabilized by such bonds. The simplest example is the proton-bound dimer, shown in Fig. 1.5, which consists of one protonated and one neutral amino acid. While not naturally occurring, it is interesting as a model system for hydrogen bonding.

Some spectroscopy schemes that will be introduced later require the molecule to resonantly absorb light in a particular wavelength range (ultraviolet, more on this later) at which biomolecules seldom do. The solution is then to add a chromophore, a functional group which narrowly absorbs the desired wavelength. The most popular choice in this field is the phenyl group, which naturally occurs in the amino acid phenylalanine. With such an absorption feature, it will be possible to perform spectroscopy on and deduce the three-dimensional structure of a specific conformer.

1.3 Spectroscopy

Spectroscopy is broadly defined as the study of interaction between matter and light. There are three major types of it: emission, scattering, and absorption spectroscopy, corresponding to the creation, reflection, and destruction of photons, which carry energy proportional to their frequency.

Figure 1.6: Abstract diagram of general absorption spectroscopy (labels: Light, Matter, Absorption, Unaffected matter, Affected matter, Transmitted light). Light and matter interact, resulting in some matter absorbing light and becoming affected. In an experiment, at least three of the five quantities above are measured to deduce the absorption rate.

This work is confined to absorption spectroscopy, where experiments typically study the absorption rate as a function of wavelength by colliding molecules with photons, see Fig. 1.6. In such experiments, three signal amounts are observed: relaxed matter, excited matter, and transmitted light. In the context of gas-phase spectroscopy of biomolecules, the rate of absorption is relatively low, which makes the amount of transmitted light an unreliable variable. Instead, the amounts of relaxed and excited matter are measured. This is called action spectroscopy because it measures the action of light upon matter.

We are used to thinking of light frequency (or wavelength) as a scale from red to violet, but that is just the visible part. Light is really electromagnetic radiation, which can have any frequency. Examples include (Fig. 1.7) infrared (IR) light at 60 THz (2000 cm−1), visible orange light at 500 THz (600 nm), and ultraviolet (UV) light at 1200 THz (250 nm). In this thesis, a wide array of light frequencies will be used, from 10.5 THz (350 cm−1) to 2540 THz (118 nm).

Figure 1.7: Light frequency regions, from far-IR to vacuum UV (regions: far-IR, mid-IR, near-IR, visible, UV, VUV; frequency axis from 10 THz to 3000 THz, with markers at 800 cm−1, 4000 cm−1, 700 nm, 400 nm, and 200 nm).

The way light interacts with matter depends on its frequency. Infrared light is resonant with the vibrational modes in molecules. In particular, the mid-IR range of 24–120 THz (800–4000 cm−1) contains resonances with local vibrations involving a few nuclei, while the far-IR range of 0.3–24 THz (10–800 cm−1) contains the frequencies of molecule-wide vibrational modes.
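The frequencies, wavenumbers, and wavelengths quoted in this section are three ways of expressing the same quantity, linked by the speed of light. As a minimal sketch (the function names are chosen here for illustration and do not come from the thesis), the conversions can be written as:

```python
# Speed of light in vacuum in m/s (exact by definition in the SI).
C = 299_792_458.0


def wavenumber_to_thz(wavenumber_per_cm: float) -> float:
    """Convert a wavenumber in cm^-1 to a frequency in THz."""
    # frequency = c * wavenumber, with 1 cm^-1 = 100 m^-1 and 1 THz = 1e12 Hz
    return C * wavenumber_per_cm * 100.0 / 1e12


def wavelength_nm_to_thz(wavelength_nm: float) -> float:
    """Convert a vacuum wavelength in nm to a frequency in THz."""
    # frequency = c / wavelength, with 1 nm = 1e-9 m
    return C / (wavelength_nm * 1e-9) / 1e12


def wavenumber_to_wavelength_nm(wavenumber_per_cm: float) -> float:
    """Convert a wavenumber in cm^-1 to a vacuum wavelength in nm."""
    # wavelength = 1 / wavenumber, with 1 cm = 1e7 nm
    return 1e7 / wavenumber_per_cm


# Values quoted in the text: 2000 cm^-1, 350 cm^-1, and 118 nm.
print(round(wavenumber_to_thz(2000.0), 1))    # close to 60 THz
print(round(wavenumber_to_thz(350.0), 1))     # close to 10.5 THz
print(round(wavelength_nm_to_thz(118.0)))     # close to 2540 THz
```

Working in wavenumbers is convenient for the IR because the values stay in a compact range of tens to thousands, while wavelengths in nm are the more common choice for UV light.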
The set of resonances in the range of 100–1400 cm−1 is virtually unique for a given conformer, and is therefore referred to as its molecular fingerprint. By performing spectroscopy in these IR frequency ranges and cross-validating with theoretical analyses, the molecular geometry can be inferred.

In contrast to IR, UV photons carry enough energy to excite the electronic state of the molecule, or even ionize it by knocking off an electron. For example, the previously mentioned phenyl group can be electronically excited by a UV photon of frequency 1122 THz (267.3 nm), and then ionized by another. Higher-energy UV photons (above 1500 THz, or equivalently below 200 nm) can ionize molecules directly, including those in air, and must therefore be transported in vacuum. Hence, such UV photons are also called vacuum UV (VUV) photons.

By combining IR and UV photons, even more information can be extracted. Most notably, conformer-specific spectroscopy becomes possible when a chromophore is used. The end result is then not just a single molecular geometry, but rather one molecular geometry per conformer. Descriptions of different spectroscopy schemes, including this one, are given in Section 3.2.

1.4 Computation

In order to infer the structure from a vibrational spectrum, theoretical predictions for each plausible structure must be computed. Such plausible structures are called stable conformers, and finding them is a difficult task. When they are found, their predicted stability and vibrational spectrum can be calculated. This is also a hard task, but it has established solutions. Finally, the experimental spectrum can be compared to predictions, in order to confirm a structure.

Finding stable conformers is difficult, because the space of candidate conformers has a high dimension and cannot be exhaustively covered. Instead, strategies for conformer searching rely on randomness and adaptation. Even then, assessing the completeness of a search is as hard as the searching problem itself.
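To give a feeling for why exhaustive coverage fails, the sketch below counts the candidate structures on a regular grid of torsion angles. It is an illustration only: the choice of three settings per rotatable bond and the bond counts in the usage lines are assumptions made for this example, not values taken from the molecules studied in this thesis.

```python
def grid_size(n_rotatable_bonds: int, settings_per_bond: int = 3) -> int:
    """Number of candidate structures on a regular torsion-angle grid.

    Assumes, for illustration only, that every rotatable single bond is
    sampled at the same number of discrete torsion settings.
    """
    if n_rotatable_bonds < 0:
        raise ValueError("number of rotatable bonds must be non-negative")
    if settings_per_bond < 1:
        raise ValueError("need at least one setting per bond")
    # Each bond multiplies the number of combinations by settings_per_bond.
    return settings_per_bond ** n_rotatable_bonds


# Hypothetical bond counts, chosen only to show the growth.
for n_bonds in (5, 10, 20):
    print(n_bonds, grid_size(n_bonds))
```

The count grows exponentially with the number of rotatable bonds, which is why practical searches sample the space instead of enumerating it.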
Some strategies for finding conformers are described in Section 4.3, along with a discussion of their strengths and weaknesses.

The probability p of a molecule adopting a certain conformer at a given temperature and pressure is directly related to its Gibbs energy G, through the Boltzmann equation p ∝ exp(−G/(k_B T)). Accurately determining the energy (Gibbs or any other) of a molecule is a hard but well-studied problem that requires a solution based on quantum mechanics. Section 4.4 discusses this in detail, but a softer introduction is given below.

Quantum mechanics provides a fundamental framework for understanding the behavior of atoms and molecules. It discards the classical-mechanics concepts of precise position and velocity, replacing them with a wavefunction that assigns a probability to each possible state of the physical system. This wavefunction depends on the coordinates of all particles, resulting in a computational complexity that scales exponentially with system size. Therefore exact solutions are intractable for practical purposes, and approximate methods are necessary to study actual molecules.

To reduce the complexity of a molecule, the Born–Oppenheimer approximation is always used. It relies on the fact that electrons have a much smaller mass than the nuclei, and thus "see" them as fixed when solving the Schrödinger equation. Together with the adiabatic approximation, which assumes that electronic states adjust instantaneously to nuclear motion, the Schrödinger equation for the nuclei can be solved separately. In some contexts the semi-classical approximation is made, meaning that the nuclei are treated classically.

Even with the above approximations made, the complexity of the wavefunction is an exponential function of the number of electrons. Density functional theory (DFT) handles this problem by replacing the wavefunction with the electron density, whose complexity does not inherently scale with the number of electrons.
This is possible because the ground-state properties of a system depend in principle only on the electron density, as guaranteed by the Hohenberg–Kohn theorem. In practice, determining properties from only the electron density is difficult and requires approximations.

Practical implementations of DFT are quite technical and beyond the scope of this thesis. What is worth knowing is that DFT functionals, methods for determining the energy from the electron density, can be placed on a ladder of increasing complexity and accuracy. At the bottom is the local density approximation, in which the energy of each small volume element is obtained by comparison with a homogeneous electron gas of the same density. Above this level lie generalized gradient approximations, which consider not only the local density, but also its gradient. Higher up still are hybrid methods, which additionally consider nonlocal effects.

2 Introduction

This chapter introduces the field of gas-phase IR spectroscopy, starting from its connection to protein folding.

2.1 The Protein Folding Problem

Proteins are essential molecules present in all forms of life. They are long polymers of amino acids, folded into particular shapes determined by the primary structure. The shape of a protein then determines its biological function. For example, the protein in Fig. 2.1 has folded into a restriction enzyme that cleaves DNA at a specific site. Understanding how primary structure, protein shape, and biological function relate is known as the protein folding problem, and is still an open area of research. [9]

Figure 2.1: The restriction endonuclease enzyme EcoRI, which consists of two identical folded proteins. Its function is to cleave DNA (red) at specific sites, and it can do this because of its particular shape, which is stabilized by the relatively stiff secondary structures: α-helices (magenta) and β-strands (cyan).

Attempts to solve the protein folding problem by accurately simulating the dynamics of folding are hindered by complexity. The laws of quantum physics are not in question, but simply intractable for molecules as large as proteins. It has therefore been necessary to develop simpler molecular force field models such as AM1, [10, 11] MM3, [12] and more recently PM6. [13, 14] Over time, computational costs have decreased exponentially, enabling simulation of the folding of proteins. [15, 16]

Simplified models of molecular forces have many free parameters which need to be set, typically by comparison with experimental data. It is also possible to fit the parameters to computed properties of a trusted method, but such a method is trusted because of its good agreement with experiments. In the end, accurate empirical data on similar molecules are required to develop the methods.

Separate from physical simulations of proteins, there are machine learning solutions to the protein folding problem, notably AlphaFold. [17–19] Such methods apply statistical reasoning to determine the final protein shape, and thus also require empirical data.

2.2 IR Spectroscopy in the Gas Phase

Spectroscopy methods study matter through its interaction with electromagnetic radiation. IR spectroscopy in particular uses frequency-tunable IR light, which covers the frequency range of molecular vibrations. The vibrational frequencies of a molecule are sensitive to the molecular structure, meaning that the latter can be inferred from the former when combined with theoretical predictions. Thus, IR spectroscopy can satisfy the need for empirical data on protein-like molecules. [20]

The IR spectrum of a biomolecule consists of resonances corresponding to vibrational modes, which lie in the frequency range of 10–4000 cm−1. Figure 2.2 shows this frequency range and the amide modes, which are local vibrational modes common to all peptides. As a general rule, modes involving few nuclei have high frequencies, such as the amide A stretching mode. Conversely, modes at low frequencies (less than, say, 1400 cm−1) involve many nuclei. This makes them sensitive functions of the molecular structure, and they are therefore called molecular fingerprints. [21] Going further, the far-IR range corresponds to completely delocalized vibrational modes, which are sensitive enough to identify the specific conformer. [22, 23]

Figure 2.2: IR frequency region with the locations of amide vibrational modes.

Proteins are naturally found in aqueous solution, where they interact with the environment. While there are IR spectroscopy techniques that study proteins under natural conditions, [24, 25] the properties of such proteins depend on the environment in a way that requires advanced modeling. [26, 27]

In contrast to the above, gas-phase IR spectroscopy studies molecules in an isolated environment, and therefore gives insights into intrinsic molecular properties. [21, 28] This is much closer to electronic structure simulations, which are done in vacuum by default. However, most biomolecules are too fragile to simply be boiled into the gas phase. Instead, specific soft evaporation techniques such as electrospray ionization [29] or laser desorption [30] are used. The result is that the biomolecules are delivered to the low-pressure gas phase.

An arguable downside to gas-phase spectroscopy is that it does not include effects from the environment. However, this is not completely true. To understand the effects of the solvent, it is possible to progressively add solvent molecules to the object of study, a practice known as microsolvation. [31, 32]

2.3 Action Spectroscopy Techniques

Because biomolecules are delivered to the gas phase at a low pressure and density, some spectroscopy techniques are not available. The low density makes the total absorption cross-section quite small, meaning most photons will not interact with any molecule.
This makes transmission spectroscopy infeasible, as it depends on the fraction of transmitted photons, which cannot be measured with a good signal-to-noise ratio. Instead, action spectroscopy must be applied, which measures the fraction of molecules that experience some action caused by photon absorption. This thesis will demonstrate four different action spectroscopy techniques, and discuss their strengths and weaknesses.

The first technique is IR multiple-photon dissociation (IRMPD) spectroscopy of trapped ions, which has been widely employed to study biomolecules. [1, 5, 33–38] In IRMPD, the detectable action is the dissociation of trapped parent ions into fragments, triggered by the absorption of multiple IR photons. Due to charge conservation, only one fragment inherits the charge of the parent ion. Thus the action is detectable in the ion mass spectrum, both as a decrease in the parent count and as an increase in the total fragment count. Combining the two gives an accurate measure of the fragmentation rate, which is closely related to the IR absorption rate. [39] However, it is known that the IRMPD spectrum can be slightly shifted compared to single-photon IR absorption. [40]

A limitation of the IRMPD studies mentioned so far is the requirement of an ion trap, and in turn a charged state of the studied molecule. It was recently demonstrated that IRMPD can be performed in a molecular beam of neutral molecules by using the intense and long-lasting IR pulses of a free electron laser. [41] To distinguish and count the parent and fragment neutrals, they are ionized with a vacuum ultraviolet (VUV) laser pulse and accelerated by an electric field into a mass spectrometer. The method is therefore known as the IRMPD–VUV method, and has since been applied to neutral peptides. [42, 43] Because the IRMPD–VUV method does not use a trap, and therefore must fragment molecules with a single pulse, it should in theory be applicable only to molecules up to a maximal size.
One of the papers in this thesis applies the method to the largest molecule yet: the pentaalanine peptide. [2]

Conformer-Sensitive Techniques. Some biomolecules adopt several conformers, each with a distinct IR spectrum. An experiment using the IRMPD techniques above can only obtain a weighted sum of those IR spectra, which complicates the comparison with predictions. One way to perform conformer-specific IR spectroscopy builds on resonance-enhanced multi-photon ionization (REMPI) using UV photons. [21]

REMPI spectroscopy counts the molecules ionized by a tunable UV laser. It is not possible with all biomolecules, but rather requires the presence of a UV-absorbing functional group. The group, called a chromophore, has an excited electronic state with a long lifetime. Such electronic states are conformer-sensitive, meaning that the REMPI spectrum of a biomolecule consists of resolved resonances belonging to its populated conformers.

Potential options for the chromophore include aromatic rings, with the simplest example being the phenyl group, a six-carbon ring. Three of the amino acids naturally found in proteins contain an aromatic ring: phenylalanine, [44] tyrosine, [45] and tryptophan. [46] It is also possible to put a phenyl-based cap on the end of a peptide. [47]

Conformer-specific IR spectroscopy can be achieved by adding an IR laser to the REMPI experiment. The absorption of an IR photon will then reduce the effectiveness of REMPI, causing the ion signal to dip. This technique is therefore called double-resonance IR–UV ion dip spectroscopy, [48] but also resonant ion-dip IR (RIDIR) spectroscopy. [49] Such studies are popular because they result in one IR spectrum per conformer, which can then be compared with predictions to validate not only the spectra, but also the relative abundances.
[44–47, 50, 51]

2.4 Prediction of Vibrational Spectra

The IR spectrum obtained from an experiment cannot by itself be inverted into a molecular structure (conformer), although it does indicate the presence or absence of important functional groups. Instead, the experimental spectrum must be compared to predicted spectra of likely conformers. Generating the conformers and predicting their spectra are two separate tasks.

Generating the conformers amounts to choosing values for the torsional angles in a molecule whose bond structure is already known. The space of conformers is therefore high-dimensional and impossible to fully explore. Instead, a sparse sampling strategy must be employed. There are a few algorithms for this purpose, based on independently randomized angles, [52] randomized internuclear distances, [53] mutation of known conformers, [54] and molecular dynamics simulations. [55] The output of such an algorithm is a long list of conformers, and the vibrational spectra are computed for a subset of them. It is hard to gauge whether sufficiently many conformers have been generated, but agreement between a predicted spectrum and experiment is taken as evidence of correctness.

Given a specific conformer, its energy can be computed using density functional theory (DFT), which is well established. [56] This is used to rule out high-energy conformers, but also relates to the vibrational spectrum. By considering the derivatives of the energy with respect to the positions of the nuclei, it is possible to predict the vibrational spectrum. [57] Such predictions are reasonably accurate, and improve if higher-order derivatives are considered. [58, 59]

An alternative method for vibrational spectra prediction is the Born–Oppenheimer molecular dynamics (BOMD) method. [60] Rather than considering a fixed geometry, it is based on molecular dynamics simulations, thereby exploring the surrounding part of the conformational space.
Although it has correctly predicted the spectra of some peptides, [61] the BOMD method has some fundamental flaws that will be discussed. In particular, its treatment of anharmonic effects and temperature dependence will be examined critically.

3 Experimental Methods

This chapter explains how action spectroscopy is performed in practice, down to every significant component.

3.1 Objective

The objective of every experiment in this thesis is to obtain some spectrum of a molecular species in the gas phase. At the very minimum, an experimental setup for action spectroscopy must contain three components: a source that produces gas-phase molecules, a wavelength-tunable laser that produces photons to collide with said molecules, and a detector that counts the molecules afterwards in order to determine the absorption rate. All of these are placed in a vacuum chamber, which requires pumps for operation.

The exact realization of these components depends on the spectroscopy technique. The relevant spectroscopy techniques are listed in the next section, and the realization of the components is described in the subsections of Section 3.3.

3.2 Spectroscopy Techniques

There are a few spectroscopy techniques used in this thesis, differentiated by what kind of photons the target molecules absorb.

3.2.1 IRMPD

Infrared multiple-photon dissociation (IRMPD) is an action spectroscopy technique in which the molecule sequentially absorbs IR photons until it dissociates. [21] It is accomplished by exposing the molecules to a sufficiently strong IR laser beam. The ratio of fragmented to parent molecules can then be used to estimate the absorption rate.

Figure 3.1: Schematic representation of IRMPD. Upon absorbing a photon, the resonant vibrational mode becomes excited and unable to absorb from the monochromatic laser beam. After some energy redistribution, the molecule can absorb again. Eventually, the molecule dissociates.
To understand the rate of IRMPD, one must take anharmonic effects into account. In a harmonic oscillator the vibrational energy levels would be equidistant, but in reality the spacing decreases as the mode energy increases. The consequence for spectroscopy is that after a vibrational mode absorbs a photon, its resonance frequency decreases and no longer aligns with the laser frequency, making further absorption unlikely. However, another anharmonic effect allows the excited mode to relax by transferring its energy to other modes, a phenomenon known as intramolecular vibrational energy redistribution (IVR). [62] After IVR has occurred, which takes about 100 fs, [63] the molecule can absorb once again. After a few (typically 5–10) cycles of absorption and IVR, the molecule dissociates because the added energy is sufficient to break a bond. Figure 3.1 gives a schematic representation of the entire IRMPD process.

Because IRMPD involves relatively slow IVR cycles, the molecules must be exposed to IR radiation for a sufficiently long time. This can be accomplished in two ways: with a long pulse or with an ion trap. Free electron laser macropulses are about 10 µs long, which is sufficient. Electromagnetic traps such as Penning traps can be used to hold ions in place, allowing multiple pulses of shorter duration to cause IRMPD.

A common experimental setup for IRMPD spectroscopy is to have an electrospray ionization source pointing into a Penning trap. [39] The trapped ions can then be exposed to a laser beam long enough to cause IRMPD, causing some ions to fragment. The fraction of fragmented ions can then be inferred from the cyclotron radiation of the trapped ions. More details are given in the relevant subsections of Section 3.3.

As a rule of thumb, spectroscopic events involving $n$ photons scale with the laser pulse energy $E$ like $E^n$.
This has been shown not to be the case for IRMPD; the fragment yield is zero when the pulse energy is below a threshold and increases nonlinearly beyond it. [39] An explanation for this behavior is that the IRMPD rate is limited by the first absorption. The nonlinear scaling makes normalization with respect to pulse energy a challenge.

For a fixed pulse energy, the IRMPD rate $A$ at some frequency $\nu$ can be calculated from the amounts of fragmented and parent ions. If there are $N_0$ parent ions initially, and during the pulse each remaining parent ion has a constant chance of fragmentation, then $N_\mathrm{pa} = N_0 \exp(-At)$ parent ions will remain after time $t$, and $N_\mathrm{fr} = N_0 (1 - \exp(-At))$ fragment ions will have been created. Solving for $A$ gives the IRMPD rate

\[ A(\nu) = \frac{1}{t} \ln\left(1 + \frac{N_\mathrm{fr}(\nu)}{N_\mathrm{pa}(\nu)}\right). \tag{3.1} \]

This formula becomes unstable when $N_\mathrm{pa}$ is small, which physically means that the laser pulse is strong enough to fragment most ions. The solution is then to lower the pulse energy.

Figure 3.2 shows how the pulse energy affects the IRMPD spectrum of the protonated methionine dimer Met2H+. The highly IR-active modes around 1400 and 1750 cm−1 involve bending and stretching of the carboxyl functional group, respectively. Low pulse energy (about 5 mJ) is required to resolve these features; a higher pulse energy would saturate and broaden the signal. On the other hand, the less IR-active modes below 1000 cm−1 are best seen with high pulse energy (about 20 mJ). At 1070 cm−1 both effects are seen: the shoulder peak is most visible at mid pulse energy (about 12 mJ) because at higher energies it becomes subsumed by the larger peak at 1150 cm−1.

Figure 3.2: IRMPD spectrum of the protonated methionine dimer, taken from Ref. [5]. [Plot: IRMPD rate versus frequency (600–1800 cm−1) for Met2H+ at high, mid, and low pulse energy.]
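As an illustration of Eq. (3.1), the following minimal Python sketch computes the IRMPD rate from ion counts and checks it against the exponential decay model it was derived from. The function name `irmpd_rate` and the numerical values are hypothetical and chosen only for this example; they are not taken from the analysis code used in the thesis.

```python
import math


def irmpd_rate(n_fragment: float, n_parent: float, pulse_time: float = 1.0) -> float:
    """Return A = ln(1 + N_fr / N_pa) / t, following Eq. (3.1)."""
    if n_parent <= 0:
        # Eq. (3.1) is unstable when (almost) no parent ions survive;
        # in practice the pulse energy should then be lowered.
        raise ValueError("no parent ions left, rate is undefined")
    return math.log1p(n_fragment / n_parent) / pulse_time


# Round trip with made-up numbers: simulate the decay model, then recover A.
true_rate = 0.3      # assumed rate (arbitrary units)
exposure = 2.0       # assumed exposure time (arbitrary units)
n_initial = 1000.0   # assumed initial number of parent ions

n_parent = n_initial * math.exp(-true_rate * exposure)
n_fragment = n_initial - n_parent
recovered_rate = irmpd_rate(n_fragment, n_parent, exposure)
```

Using `math.log1p` keeps the estimate accurate when the fragment count is much smaller than the parent count, which is the regime of weakly IR-active modes.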
IRMPD–VUV. Neutral molecules are not as easily trapped as ions because they do not respond to electromagnetic fields, but it is still possible to perform IRMPD spectroscopy on a neutral molecular beam. [41–43] The lack of a trap places a requirement on pulse duration, which is satisfied by free electron lasers. In order to separately count the fragments and parent molecules, they are ionized by high-energy, so-called vacuum ultraviolet (VUV) photons and accelerated into a time-of-flight mass spectrometer. This setup is called IRMPD–VUV spectroscopy for that reason.

Although the intended purpose of the VUV pulse is to ionize the fragments and parent molecules, there is a side effect of additional fragmentation. This is evidenced by the fact that fragments are detected when only the VUV pulse is applied. One way to compensate for this effect is to subtract the VUV-only signal from the main signal. Using subscripts to denote whether the IR laser is on or off, the IRMPD rate can be estimated as

\[ A(\nu) = \ln\left(1 + \frac{N_\mathrm{fr,on}(\nu)}{N_\mathrm{pa,on}(\nu)}\right) - \ln\left(1 + \frac{N_\mathrm{fr,off}}{N_\mathrm{pa,off}}\right). \tag{3.2} \]

The derivation of this formula is given in Ref. [2]. It relies on some assumptions whose validity needs further exploration, notably that the ionization probability of the parent molecule is equal to the sum of the ionization probabilities of its fragments, averaged over fragmentation patterns.

It is tempting to think that because the right term in Eq. (3.2) does not depend on the IR frequency $\nu$, it only needs to be measured once. However, because the initial number of parent molecules is difficult to control, the IR pulses are fired with every second VUV pulse, so that the IR-on and IR-off signals are measured alternately.

3.2.2 REMPI

Resonance-enhanced multi-photon ionization (REMPI) is an action spectroscopy technique which selectively ionizes molecules using UV photons. In the context of this thesis, two photons of the same frequency are used to ionize the molecule.
The rate of ionization is greatly increased when the photon energy is resonant with an intermediate electronic state. Thus the electronic structure of a molecule can be probed by a UV laser with tunable frequency.

Molecules containing chromophores such as a phenyl group have an intermediate state with a long lifetime, corresponding to a quality factor of more than 10 000. [4] This makes REMPI very efficient and selective; it is possible to distinguish not only isomers but also conformers from their REMPI spectra. [64] This useful property, shown in Fig. 3.3, enables conformer-specific spectroscopy, which is the focus of the next subsection.

Figure 3.3: Schematic representation of REMPI spectroscopy. REMPI occurs when the photon energy is resonant with an intermediate electronic state. Such states are sharp enough to resolve conformers and vibrational quanta.

The REMPI rate as a function of pulse energy is often approximated as a linear function, despite REMPI being a two-photon process. Consider a three-state model where at time $t$ there are many molecules in the ground state, $e(t)$ in the excited intermediate state, and $i(t)$ in the ionized state. Molecules in the ground state are excited with rate $A$, and excited molecules are ionized with rate $Be$ and relax with rate $Ce$. This gives the equations

\[ \frac{\mathrm{d}e}{\mathrm{d}t} = A - Be - Ce; \quad \frac{\mathrm{d}i}{\mathrm{d}t} = Be \quad \text{and} \quad e(0) = i(0) = 0. \tag{3.3} \]

Assuming $A$, $B$, and $C$ are proportional to the pulse power, the ion yield is a function of the pulse energy $E$:

\[ \mathrm{IonYield}(E) = \frac{ab}{(b+c)^2} \left( \exp(-(b+c)E) + (b+c)E - 1 \right), \tag{3.4} \]

where $a$, $b$, and $c$ are proportionality constants. This function is quadratic at low energies ($(b+c)E \ll 1$) but becomes asymptotically linear at high energies ($(b+c)E \gg 1$). Figure 3.4 shows empirical evidence of this nonlinear behavior.
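The two limits of Eq. (3.4) can be checked numerically. The sketch below is a minimal Python illustration in which the function name `ion_yield` and the values of the constants $a$, $b$, and $c$ are arbitrary assumptions made for this example; they are not fitted to any measurement.

```python
import math


def ion_yield(energy: float, a: float, b: float, c: float) -> float:
    """Ion yield of the three-state model, Eq. (3.4)."""
    k = b + c
    x = k * energy
    # expm1(-x) = exp(-x) - 1, so expm1(-x) + x = exp(-x) + x - 1.
    return a * b / k**2 * (math.expm1(-x) + x)


# Arbitrary illustrative constants (assumptions, not fitted values).
a, b, c = 2.0, 3.0, 1.0

# Low-energy limit, (b + c) E << 1: the yield approaches a*b*E^2 / 2.
low_energy = 1e-3
low_ratio = ion_yield(low_energy, a, b, c) / (a * b * low_energy**2 / 2)

# High-energy limit, (b + c) E >> 1: the yield approaches the straight
# line a*b/(b + c) * (E - 1/(b + c)).
high_energy = 100.0
linear_asymptote = a * b / (b + c) * (high_energy - 1 / (b + c))
high_ratio = ion_yield(high_energy, a, b, c) / linear_asymptote
```

Both ratios approach one in their respective limits, which is the quadratic-to-linear crossover described above. The offset of the straight line, $-ab/(b+c)^2$, explains why a purely proportional fit slightly overestimates the signal at intermediate pulse energies.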
Figure 3.4: REMPI signal as a function of pulse energy, measured on phenylated capped tetraalanine (AAFA). The relation is approximately linear but better described by Eq. (3.4). [Plot: ion signal versus pulse energy (0–10 mJ), showing the AAFA data and the model fit.]

One flaw of Eq. (3.4) is that it does not handle saturation correctly. It shows linear behavior in the high-energy limit, but this is unphysical as no more than 100% of the molecules can be ionized. A solution is to modify the derivation of the model by adding another variable $g$ for the number of ground state molecules, and replacing $A$ with $Ag$.

3.2.3 IR–UV Ion Dip

IR–UV ion dip is an action spectroscopy technique based on REMPI that allows the IR spectra of specific conformers to be measured. It exploits the fact that REMPI is selective enough to only ionize a specific conformer from its ground state. By exposing the molecule to IR photons, the ground state is depopulated, causing the ion signal to dip. By measuring and comparing the ion signal with and without IR light, the IR absorption can be calculated. Figure 3.5 shows an overview of the technique.

A common setup for ion dip experiments involves a neutral molecular beam, tunable IR and UV lasers, and some ion detector. Before the main experiment, the UV range of the chromophore is explored. For the phenyl group, this range is typically within 37 400–37 700 cm−1. [4, 65] This exploration reveals the REMPI resonances of the molecule, each belonging to a specific conformer. In the actual ion dip experiment, the UV frequency is kept fixed at a REMPI resonance while the IR laser is scanned over a wide range. The result is a single-photon IR absorption spectrum for each resonance.

A simple quantitative model for IR–UV ion dip spectroscopy exists, and its derivation will be given here. Initially there are $N_0$ molecules in the ground state.
If a molecule is exposed to an IR pulse of energy $E$, it will remain in the ground state with probability $\exp(-a(\nu)E)$, where $a$ is the absorbance that depends on the IR frequency $\nu$. Then, REMPI will ionize a fraction $r$ of the ground state molecules. The two measured quantities are the ion signal with IR, $N_\mathrm{on}(\nu) = rN_0 \exp(-a(\nu)E)$, and without, $N_\mathrm{off} = rN_0$. The absorbance $a$ can then be found as

\[ a(\nu) = \frac{1}{E} \ln \frac{N_\mathrm{off}}{N_\mathrm{on}(\nu)}. \tag{3.5} \]

Figure 3.5: Schematic and data representation of IR–UV ion dip spectroscopy. REMPI lets two UV photons resonantly ionize a specific conformer in the ground state. Any IR absorption depopulates the ground state, causing the ion signal to dip. The absorption spectrum can then be calculated from the dip.

Just like with the IRMPD–VUV method, $N_\mathrm{off}$ does not depend on the IR frequency $\nu$ and could in principle be measured once. However, because $N_0$ is difficult to control, $N_\mathrm{on}$ and $N_\mathrm{off}$ are measured equally often by firing IR pulses with every second UV pulse.

3.3 Setup Components

This section describes the crucial components used in the experimental setups in this thesis, categorized as molecular sources, tunable light sources, and mass spectrometers. It also describes the setup in the local laboratory in Fysikforskarhuset, Gothenburg.

3.3.1 Molecular Sources

The role of a molecular source is to produce gas-phase molecules for the spectroscopic experiment. Biomolecules in particular are fragile and require a soft delivery to the gas phase. There are a few ways to do this depending on the type of molecule and the desired charge state.

Electrospray Ionization. Electrospray ionization (ESI) is a soft ionization technique used to transfer molecules from a liquid solution into the gas phase as positive (or negative) ions. [29] First, the sample to be analyzed is dissolved in an acidic (or basic) solution.
This solution is then passed through a capillary needle held at a positive (or negative) electric potential on the order of kV. The needle points toward a grounded electrode, causing a strong electric field, which pulls the protonated (or deprotonated) ions to the tip of the needle. As the solution is pushed through the needle, it ejects charged droplets that quickly evaporate and fission due to Coulomb repulsion, until only single molecules and small clusters remain. The gentle nature of ESI makes it ideal for analyzing biomolecules like peptides, which are otherwise prone to denaturation.

Laser Desorption. Laser desorption, meaning firing a brief laser pulse onto the sample, is a relatively soft evaporation technique that produces mostly neutral but also ionized molecules. [66] The rapid and localized heating by the pulse vaporizes the sample, delivering molecules to the gas phase without significant fragmentation due to the short duration of the pulse. The laser wavelength can be in the UV range [67] or in the IR range. [30] For the experiments in this thesis, laser pulses with a wavelength of 1064 nm and an energy on the order of 1 mJ are used.

To improve the yield, the sample is often mixed with a compound that absorbs well at the laser wavelength; the technique is then called matrix-assisted laser desorption/ionization (MALDI). In the experimental setups in this thesis, carbon powder was used as a matrix compound because of its efficient absorption of 1064 nm light.

Supersonic Jets. Although not a source, supersonic jet expansion can be used to cool gas-phase molecules. The basic setup is a high-pressure reservoir with a controllable nozzle into a vacuum chamber. [68] When the nozzle opens for a few hundred µs, the gas sprays into the chamber at supersonic speeds in the shape of a jet, causing it to cool. The modeling of supersonic flows is quite complex and given elsewhere, [69] but a simple argument for the cooling exists.
Because the expansion is rapid, it can be assumed to be adiabatic, implying that $pV^\gamma$ is constant, where $\gamma$ is the heat capacity ratio, equal to 5/3 for monatomic and 7/5 for diatomic gases. Combined with the ideal gas law $pV = Nk_\mathrm{B}T$, one finds $T \propto V^{1-\gamma}$, implying that the temperature decreases as the volume increases.

Gas-phase molecules of study can be cooled using supersonic jet expansion in at least two ways. The first is direct mixing with the carrier gas in the reservoir, before the expansion happens. This is only possible if the molecule is naturally in the gas phase. The second is a combination with MALDI: by timing the nozzle opening just after the laser pulse, the expansion will blow onto the MALDI fumes and mix the two gases during expansion.

A supersonic jet expansion is typically combined with a skimmer, which is an inverse funnel used to select the central part of the jet, which becomes a molecular beam. Because location and velocity are strongly correlated in a cold jet, the molecular beam is collimated.

3.3.2 Tunable Light Sources

This subsection describes the tunable light sources used in this thesis. Free electron lasers have a flexible frequency down to the far-IR range, and a relatively large pulse energy. Table top laser systems have superior frequency resolution, but cannot cover a wide range without reconfiguration.

Figure 3.6: Anatomy of FELIX pulsing. Every 100 ms, FELIX fires a train of micropulses. [Timing diagram labels: 100 ms between pulse trains, about 7 µs per train, 1 ns between micropulses, about 1 ps per micropulse.]

Infrared Free Electron Lasers. The free electron laser (FEL) FELIX at Radboud University, Nijmegen was used for three experiments in this thesis. The design and theory of FELs are described elsewhere, [70, 71] and only a brief description will be given here, with a focus on the produced radiation.

An FEL generates light from bunches of electrons that oscillate when passing through a static but periodically alternating magnetic field.
The frequency of the light is a function of the magnetic field strength, and can thus be varied continuously. The linewidth (frequency resolution) is inversely proportional to the number of periods.

FELIX has a frequency range of 66–3600 cm−1 (until 2022 the upper limit was 2000 cm−1 [72]), and a typical linewidth just under 1%. Its IR pulses are emitted every 100 ms, see Fig. 3.6. Those pulses last for about 7 µs and consist of micropulses separated by 1 ns intervals. The fact that the pulses are relatively long has positive implications for IRMPD spectroscopy; between micropulses, the vibrational energy has time to redistribute itself.

Table Top Laser Systems. The first part of any laser system is the pump laser, which in all experiments of this thesis has been an Nd:YAG laser, which outputs pulses of light with wavelength 1064 nm ($\nu_0 = 9400$ cm−1) at a rate of 10 or 20 Hz.

In a linear optical component, the frequency of light is preserved. But there are also nonlinear crystals such as KD2PO4 that enable processes which change the frequency of light. The simplest example is probably second harmonic generation (SHG), in which two photons of frequency $\nu$ are consumed to produce a photon of frequency $2\nu$. This process requires a phase-matching condition, which in practice means that the crystal must be precisely angled relative to the laser beam. An SHG KD2PO4 crystal is often installed in Nd:YAG lasers to allow for generation of 532 nm ($2\nu_0 = 18\,800$ cm−1) light.

Another nonlinear process is sum and difference frequency mixing (SFM and DFM), in which two incoming frequencies $\nu_1$ and $\nu_2$ are converted into either $\nu_1 + \nu_2$ or $|\nu_1 - \nu_2|$. For example, Nd:YAG lasers often also contain an SFM KD2PO4 crystal that outputs light of wavelength 355 nm ($3\nu_0 = 28\,200$ cm−1). DFM AgGaSe2 crystals can be used to mix long-wavelength light up to 18 µm (555 cm−1).

Figure 3.7: Schematic setup for the generation of IR light from an Nd:YAG laser. Colors are not to scale, but in the correct order. A part of the 1064 nm fundamental is doubled into 532 nm and fed to an OPO crystal. The output OPO idler becomes the input signal in the OPA, pumped by the fundamental. The resulting OPA idler can cover the vibrational stretching range 2000–4000 cm−1. With other crystals, the signal and idler from the OPA can be combined in a DFM crystal to generate light in the wide range of 560–2500 cm−1.

Yet another nonlinear process is optical parametric oscillation (OPO), which can be thought of as SFM in reverse, inside an optical resonator. Starting from a single pump frequency, two lower frequencies are produced. By convention the greater output frequency is called the signal, and the smaller the idler. The frequency of the signal (and hence the idler) depends on the crystal-to-beam angle. An OPO crystal can also be used outside the resonator as a single-pass optical parametric amplifier (OPA). The input is then the pump and the signal, and the output is an amplified signal and some idler.

Figure 3.7 shows how all of the above can be used to cover a large part of the vibrational spectrum, including stretching and bending modes. [73, 74] Starting with an Nd:YAG, both the fundamental with frequency 9400 cm−1 (1064 nm) and its second harmonic are produced. The latter is fed into an OPO crystal, which produces an idler tunable in the range of 5400–7400 cm−1 (1850–1350 nm). Another crystal is used as an OPA to subtract this frequency from the fundamental, resulting in an OPA idler of 2000–4000 cm−1 (5000–2500 nm). Already, this range covers the OH and NH stretching modes that exist in peptides. Finally, the two outputs of the OPA are mixed using an AgGaSe2 DFM crystal to yield a frequency of 1530–4190 cm−1. This additionally covers the C=O stretching mode of peptides. Using a similar setup, it is possible to achieve a range of 18 000–600 nm.
[74]

In the system shown in Fig. 3.7, the OPO unit can be replaced with a dye laser and a DFM crystal. Similar to an OPO, a dye laser needs a pump as input and gives a signal (but no idler) as output. The OPO idler would then be replaced by the output of the DFM crystal when fed the dye laser output and the second harmonic.

A simpler setup can be used to generate UV light using a dye laser, see Fig. 3.8. Again, starting with an Nd:YAG, its internal KD2PO4 optics produce the third harmonic of frequency 28 200 cm−1 (355 nm). This is then used to pump a dye laser filled with Coumarin 153, whose output is tunable in the 17 400–19 300 cm−1 (574–518 nm) range. An SHG BaB2O4 crystal is used to double this range into 34 800–38 600 cm−1 (287–259 nm). This interval includes the absorption of several chromophores, notably phenylalanine at approximately 267 nm.

Figure 3.8: Schematic setup for the generation of UV light from an Nd:YAG laser. Colors are not to scale, but in the correct order. The 1064 nm fundamental is doubled and then summed with itself to yield its 355 nm third harmonic, which is fed to a dye laser filled with Coumarin 153. The dye laser emits tunable light in the range of 518–574 nm, which is doubled to 259–287 nm.

To summarize, table top lasers can be used to generate mid-IR light in the range of 1530–4190 cm−1 and UV light in the range of 34 800–38 600 cm−1. These two ranges in combination enable ion dip spectroscopy.

Laser Scanning Algorithms. For nonlinear optical crystals to function, they must be angled such that a phase-matching condition is fulfilled. [75] The tolerance of this angle is on the order of 100 µrad, and the angle depends on the light frequency. Thus, scanning the laser requires all crystals to be precisely rotated. This is accomplished by using stepper motors and specialized scanning algorithms.
One approach to scanning is to first construct a table where each row contains a laser frequency and the appropriate crystal angles. The laser can then be set to a specific frequency by looking up the closest rows in the table and interpolating between them. In practice such a table only works for scanning in one direction, because of backlash in the stepper motors. Construction of these tables requires another scanning algorithm.

A table-less approach is to use a fraction of the beam to infer the angle error. Because the beam profile has finite width, it has some variance in Fourier space. When the crystal is angled with a small error, it will function better for some non-central wavevectors, and the converted beam will therefore stray to the side after some distance. By employing a differential two-pixel photodiode, it is therefore possible to infer the angle error and correct it using linear feedback.

In some contexts, the two-pixel photodiode cannot be used and must be replaced with a one-pixel photodiode. Linear feedback then fails, because the photodiode signal is an even function of the angle error, resembling a Gaussian. For scanning in this case, an algorithm developed during this thesis called Zigzag can be used. A simplified presentation of Zigzag in Python is

```python
# Simplified Zigzag; `motor`, `diode`, and `patience` come from the setup.
# Assumption: motor[0] steps the crystal angle forward, motor[1] backward.
for index in range(patience):
    direction = index % 2  # alternate the scan direction on every pass
    top_energy = 0
    while True:
        motor[direction].increment_angle()
        energy = diode.get_energy()
        if energy > top_energy:
            top_energy = energy  # still climbing: raise the reference level
        elif energy < 0.9 * top_energy:
            break  # fell below 90% of the peak: turn around
```

It works by reversing direction when the energy falls below a threshold, which is continuously updated.

3.3.3 Mass Spectrometers

In spectroscopic experiments the molecule of study is not the only species present; there are also fragments of it and the background. For this reason, the detector must be sensitive to mass and produce a mass spectrum rather than a total count. Two types of such spectrometers are employed in the experiments of this thesis.
Ion Cyclotron. For spectroscopy of ions, a cylindrical Penning trap is used. It confines the ions in the radial direction with a strong homogeneous magnetic field, and in the axial direction with a linear electric field. The strong magnetic field is usually accomplished with a superconducting electromagnet, which requires liquid helium cooling. The motion of an ion in the Penning trap is a combination of cyclotron oscillation with frequency

\[ \omega = \frac{qB}{m} \tag{3.6} \]

and a negligible oscillation due to the electric field. Hence mass is related to frequency, and by measuring the outgoing near-field radiation of the trap and taking the Fourier transform, it is possible to obtain a mass spectrum. This principle of mass measurement is known as Fourier-transform ion cyclotron resonance (FT-ICR, see Fig. 3.9), and can reach a relative accuracy better than 1 in 1 000 000. [76]

Figure 3.9: Schematic presentations of FT-ICR and TOF mass spectrometers. Left (FT-ICR): ions are trapped by a strong homogeneous magnetic field. The resulting cyclotron radiation is measured and Fourier transformed into an inverse mass spectrum. Right (TOF): molecules are ionized and immediately accelerated by a static electric field, and drift towards a detector. The optional midway reflectron extends the drift time by creating another point where the flight time is independent of ionization position.

Time-of-flight. On the other hand, neutral molecules cannot be directly guided by electromagnetic fields. Therefore, spectroscopy techniques on neutrals such as IRMPD–VUV and IR–UV ion dip produce ions after the IR absorption has occurred. In time-of-flight (TOF) mass spectrometers, these ions are accelerated upon ionization using a static electric field, giving them a velocity of

\[ v = \sqrt{\frac{2qU}{m}}, \tag{3.7} \]

where $U$ is the electric potential at the point of ionization.
The ions then drift through a grounded tube for a time proportional to inverse velocity, or equivalently to the square root of mass. At the end of the drift path, single ions are detected with precise time resolution using a micro-channel plate (MCP) detector.

The accuracy of TOF mass spectrometers is limited not by time resolution (which could be compensated for by extending the drift path), but by the spread in acceleration potential due to the width of the molecular beam. Ions created further away from the drift region, where the potential is higher, initially lag behind but eventually catch up with and surpass those created at the center of the beam. The point where they catch up is a time focus and the best place for the detector.

By introducing a reflectron as part of the drift path, it is possible to delay the ions with high kinetic energy, and thereby create a time focus further away. This increases the relative accuracy of the instrument, because the drift time is increased without affecting the time resolution. A typical relative mass accuracy of 1 in 2000 is reachable with a reflectron added. Figure 3.9 (right) shows a TOF mass spectrometer with a reflectron.

Figure 3.10: The local experimental setup, affectionately known as the Giraffe, with the major components labeled. 1: The source chamber containing the nozzle for supersonic jet expansion, the skimmer, and the MALDI setup. On its opposite side (not visible), the chamber connects to the sluice and the sample delivery system. 2: Interaction cross where the ionization happens. Two CaF2 windows allow UV and IR light to enter the cross. 3: Tube for time-of-flight mass spectrometry. On the top (not visible) sits the reflectron. L: Nd:YAG laser used for laser desorption. G: Gate valve for isolating the high-vacuum part of the chamber. E: Electric field acceleration plates for the mass spectrometer. The same piece holds the extraction plates, which reach down into the cross.
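The scaling relations of the TOF instrument can be checked numerically. The following is a minimal sketch, assuming an idealized single-stage instrument in which the ion gains the full energy $qU$ and then drifts a field-free length $L$; the values of `U` and `L` and all function names are hypothetical and are not taken from the local setup. It illustrates that the drift time scales as the square root of mass, and that a timing uncertainty $\Delta t$ therefore corresponds to a resolving power $m/\Delta m = t/(2\Delta t)$.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge (C)
DALTON = 1.66053906660e-27   # atomic mass constant (kg)

def drift_time(mass_da, potential_v, drift_length_m, charge=1):
    """Field-free drift time t = L / v, with v = sqrt(2 q U / m) as in Eq. (3.7)."""
    m = mass_da * DALTON
    q = charge * E_CHARGE
    v = math.sqrt(2.0 * q * potential_v / m)
    return drift_length_m / v

def mass_from_time(time_s, potential_v, drift_length_m, charge=1):
    """Invert drift_time: m = 2 q U (t / L)^2, returned in Da."""
    q = charge * E_CHARGE
    return 2.0 * q * potential_v * (time_s / drift_length_m) ** 2 / DALTON

def resolving_power(time_s, time_spread_s):
    """Since m is proportional to t^2, dm/m = 2 dt/t, so m/dm = t / (2 dt)."""
    return time_s / (2.0 * time_spread_s)

# Hypothetical instrument parameters, chosen only for illustration.
U = 3000.0   # acceleration potential (V)
L = 1.0      # drift length (m)

t92 = drift_time(92.0, U, L)   # toluene
t93 = drift_time(93.0, U, L)   # its first 13C isotopologue
print(f"t(92 Da) = {t92 * 1e6:.3f} us, t(93 Da) = {t93 * 1e6:.3f} us")
print(f"time ratio = {t93 / t92:.6f}, sqrt(93/92) = {math.sqrt(93 / 92):.6f}")
```

Because the mass enters only through $t^2$, doubling the drift time at fixed timing spread doubles the resolving power, which is the motivation for the reflectron given above.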
3.4 Local GU Laboratory

One of the goals set at the start of this PhD project was to build an experimental setup capable of spectroscopic experiments in our laboratory in Fysikforskarhuset, Gothenburg. The chosen design was inspired by a vacuum chamber setup for REMPI and IR–UV ion dip spectroscopy at Radboud University, Nijmegen. It consists of three major parts (Fig. 3.10): a source chamber, an interaction cross, and a time-of-flight mass spectrometer.

3.4.1 Source Chamber

The source chamber, shaped like a rectangular block, is designed to generate a neutral molecular beam directed into the interaction region. The two most important components for this purpose are the skimmer and the source. The skimmer is fixed and mounted to the side of the source chamber that faces the interaction region, see Fig. 3.11. The source is mounted on an XY translational stage on the opposite side. By design it is detachable, and at present there are two sources to choose from: oven and pulsed valve. The oven is used to evaporate molecules that tolerate heating, such as fullerene. The pulsed valve is used in tandem with MALDI to achieve supersonic jet cooling of fragile molecules such as peptides.

Figure 3.11: Virtual cross-cut of the source chamber. Important components are labeled. N: Pulsed valve and nozzle through which a carrier gas expands in a supersonic jet. It is mounted to an XY translational stage on the left chamber wall. S: Skimmer that selects the central part of the jet, and leads into the interaction cross. M: MALDI setup; the blue and yellow part is the quickly replaceable sample holder. The grey linear feedthrough is seen in the background. V: Vertical translational stage, used to adjust where the MALDI fumes enter the jet expansion. H: Horizontal translational stage used to move the sample bar. It rests on a blue shelf which is mounted to the chamber wall.
Figure 3.12: Virtual and real image of the sample delivery system. M: MALDI setup, with the sample holder colored blue. G: Gate valve. S: Sluice chamber for fast venting. F: Linear feedthrough that fits in the sluice in its shortened state, and reaches the MALDI setup in its elongated state. h: MALDI sample bar holder. s: Leaf spring that gently fixes the sample bar holder. a: Adapter that the linear feedthrough can dock to using an M6 screw. f: Linear feedthrough arm in extended state, docked to the sample bar holder.

In the source chamber, below the pulsed valve, is a MALDI setup. Its purpose is to deliver fragile molecules into the gas phase. It sits on a shelf mounted to the wall of the source chamber, with two linear translation stages in between. The vertical stage is needed because the efficiency of cooling is sensitive to the distance between the desorption point and the gas expansion. The horizontal stage moves the setup such that the desorption point contains new molecules to desorb. A small 1064 nm Nd:YAG laser sits on top of the source chamber, and emits the necessary pulses to the desorption point.

The MALDI sample bar lasts for just under an hour, and must then be replaced. To minimize time spent pumping down, a sluice with a linear feedthrough is installed, see Fig. 3.12. The feedthrough connects to the sample bar holder and can be used to pull it out into the sluice. The sluice gate can then be closed and the sluice opened on the other side, allowing the sample bar to be replaced. A dedicated pump evacuates the sluice within minutes, and the sample bar holder can then be reinstalled.

3.4.2 Interaction Cross

The interaction cross is, as the name suggests, shaped like a six-directional cross. The neutral molecular beam enters it from one direction and intersects the laser beams that enter through windows on the sides perpendicular to the molecular beam.
Any ions created are accelerated upwards into the mass spectrometer tube. The turbo pump connects to the bottom side.

Figure 3.13 (axes: mass in Da, count in arbitrary units; inset annotation: FWHM = 0.043 Da): Mass spectrum of toluene, obtained from the local time-of-flight mass spectrometer. The pink inset shows the peaks of toluene and its first ¹³C isotopologue. The full width at half maximum of the peaks is measured to be 0.043 Da.

3.4.3 Mass Spectrometer

The mass spectrometer used in our laboratory is a slightly adapted commercial solution from Jordan TOF. Specifically, it is a reflectron time-of-flight mass spectrometer. It consists of four parts: the extraction plates, the time-of-flight tube, the reflectron, and the microchannel plate (MCP) detector. The extraction plates reach down into the interaction cross because their purpose is to accelerate ions into the tube.

Figure 3.13 shows the REMPI mass spectrum of toluene measured by the mass spectrometer in the local laboratory. At a mass of 92–93 Da, the full width at half maximum is 0.043 Da, implying a relative accuracy better than 1 in 2000 (92.5/0.043 ≈ 2150). Figure 3.14 shows the corresponding REMPI spectrum.

Figure 3.14 (axes: UV frequency in cm⁻¹, from 37200 to 37550; ion yield in arbitrary units): The REMPI spectrum of toluene, as defined by the pink region in Fig. 3.13.

4 Computational Methods

This chapter is meant to give an overview of the methods employed in this thesis to simulate spectroscopic results.

4.1 Goal and Overview

A spectroscopic experiment typically results in a spectrum. To interpret this spectrum and extract information about the underlying molecular properties, candidate hypotheses are simulated and compared to the experimental result. If a simulation matches the experiment, its hypothesis is deemed more likely.
Candidate hypotheses for IR spectra typically include three parts: one or more conformers that are believed to be present in the experiment, an energy surface model believed to describe those conformers, and a model for describing the vibrational frequencies. While the latter two models can easily be found "off-the-shelf", finding the relevant conformers of a molecule in practical time is a hard problem. Section 4.3 will categorize the available methods, with a focus on those used in my work.

Searching for conformers in the vast conformational space is indeed a hard problem. The result of a conformer search is a large number of conformers, potentially more than 100 000. These are then further optimized with a hierarchy of methods to obtain accurate three-dimensional (3D) structures and their energies. Section 4.4 will describe those methods and rank them on the hierarchy of accuracy and complexity.

Finally, Section 4.5 will describe some ways the vibrational frequency spectrum can be computed, including the recent Born–Oppenheimer molecular dynamics method.

    SMILES    XMol XYZ                  Gaussian Z-matrix
    ------    ----------------------    --------------------
    FC(F)F    5                         0 1
              fluoroform
              C  0.00  0.00  0.00       C
              H  1.09  0.00  0.00       H 1 1.09
              F -0.48  1.26  0.00       F 1 1.35 2 111
              F -0.48 -0.63  1.09       F 1 1.35 2 111 3 120
              F -0.48 -0.63 -1.09       F 1 1.35 2 111 4 120

Figure 4.1: Data representations of fluoroform (CHF3). The SMILES format is brief and omits hydrogens. The XMol XYZ format includes the Cartesian coordinates. The Gaussian Z-matrix format specifies each nucleus relative to up to three previous ones.

4.2 Molecular Formats

Before diving into the various computational methods and software, it is worth describing how molecules are represented on computers. There are at least three important classes of formats: string, Cartesian, and Z-matrix. Conversion between formats is possible with the wonderful Open Babel tool.
[77]

String formats like SMILES (Simplified Molecular Input Line Entry System) are very compact descriptions of molecules, but they do not specify exact positions of the nuclei, and therefore contain no conformational data. They are useful as a starting point before the conformational search. SMILES in particular has an intuitive tree-like syntax, exemplified with fluoroform in Fig. 4.1.

Cartesian formats like XMol XYZ are text files that fully specify the positions of all nuclei. They consist of a small header followed by one row for each nucleus. The row representation includes at least element and position, but some formats also include neighborhood information and chemical properties.

Z-matrix formats like Gaussian Z-matrix also use one row per nucleus to specify the positions. They differ from Cartesian formats in that they use internal coordinates, meaning bond lengths, bond angles, and dihedral angles. This representation is more practical when handling conformers, because conformers are defined by their dihedral angles. Another advantage of Z-matrix over Cartesian formats is that $N$ nuclei require only $3N - 6$ coordinates rather than $3N$, because the origin and orientation of the basis are never chosen.

4.3 Conformer Searching

When an experiment is performed on a molecular species, its general bond structure is known but its preferred conformer is not. The first step of a simulation is therefore to find a large set of plausible conformers, which live in the conformational space generated by dihedral angle rotations around single bonds. This space is exponentially large in the number of rotatable bonds and thus infeasible to exhaust for any peptide. Consequently, sparse sampling strategies are required to explore the conformational space.

4.3.1 Conformer Distance

When searching the conformational space, it is possible to find the same conformer twice, represented in two different ways.
In order to mitigate this inefficiency, the algorithm needs to define and check for duplicates. A good criterion for two conformers being duplicates is that the conformer distance, to be defined in the following paragraph, is less than some threshold value on the order of 1 Å.

The established conformer distance, often simply called root-mean-square distance (RMSD), is the least square distance between point sets, taken over the group generated by rotations and translations. [78] To be explicit, if two conformers have position matrices $A$ and $B$ of size $N \times 3$, the distance is

\[ \mathrm{RMSD}(A,B) = \frac{1}{\sqrt{N}} \min_{Q,v} \lVert AQ + \mathbf{1}v - B \rVert_F, \tag{4.1} \]

where $Q$ is a $3 \times 3$ rotation matrix, $\mathbf{1}$ is an $N \times 1$ vector of ones, $v$ is a $1 \times 3$ translation vector, and $\lVert \cdot \rVert_F$ is the Frobenius norm. This has an explicit solution

\[ \mathrm{RMSD}(A,B) = \frac{1}{\sqrt{N}} \lVert \tilde{A}U - \tilde{B}V \rVert_F, \tag{4.2} \]

where

\[ U S V^T = \tilde{A}^T \tilde{B}, \tag{4.3} \]

\[ \tilde{A} = A - \frac{1}{N}\mathbf{1}\mathbf{1}^T A \quad\text{and}\quad \tilde{B} = B - \frac{1}{N}\mathbf{1}\mathbf{1}^T B. \tag{4.4} \]

Equation (4.3) means that $U$ and $V$ are given by the singular value decomposition of the right hand side, and Eq. (4.4) subtracts the centroid from each point set.

The RMSD is directly applicable to Cartesian formats, which Z-matrix can be converted into. It is zero between identical conformers, and naturally invariant with respect to translations and orthogonal rotations, but not row permutations. Therefore, physically identical conformers that differ only in the labeling of equivalent nuclei can still have nonzero RMSD, for example if they are related by a 120° methyl rotation. This issue can be resolved by taking the smallest RMSD across methyl rotations or the like.

4.3.2 Energy-Blind Sampling

The most straightforward approach to generating conformations is to randomly select dihedral angles from a distribution on the domain [−180°, 180°]. This can be an efficient way to sample small molecules such as dipeptides. [42, 43] Not all angles need to be varied; for example, the peptide link H-N-C=O dihedral angle is almost always 180°, because of the nearby double bond.
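As a minimal sketch of energy-blind sampling, the following Python function draws dihedral angle vectors uniformly at random while holding selected dihedrals fixed, as one might do for peptide links. The function name, its parameters, and the indexing of dihedrals are hypothetical; converting the angles into Cartesian coordinates (for example via a Z-matrix) is left out.

```python
import random

def sample_dihedrals(n_dihedrals, n_conformers, fixed=None, seed=0):
    """Draw dihedral angle vectors (in degrees) uniformly from [-180, 180].

    `fixed` maps a dihedral index to a constant value, e.g. a peptide-link
    dihedral held at 180 degrees. A seeded generator makes runs repeatable.
    """
    rng = random.Random(seed)
    fixed = fixed or {}
    conformers = []
    for _ in range(n_conformers):
        angles = []
        for i in range(n_dihedrals):
            if i in fixed:
                angles.append(fixed[i])                    # not varied
            else:
                angles.append(rng.uniform(-180.0, 180.0))  # energy-blind draw
        conformers.append(angles)
    return conformers

# Example: four dihedrals, of which number 1 is a peptide link kept at 180 degrees.
for angles in sample_dihedrals(4, 3, fixed={1: 180.0}):
    print([round(a, 1) for a in angles])
```

Fixing angles in this way reduces the dimension of the sampled space, which matters because the number of distinct conformers grows exponentially with the number of varied dihedrals.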
Random sampling is easily implemented in high-level languages such as Matlab or Python, but there is also an open source implementation called Confab [52] that is integrated into Open Babel.

Independently randomizing angles can lead to unphysical self-intersecting conformations, especially when applied to long molecules such as peptides. Distance geometry embedding addresses this by first establishing upper and lower bounds for all atom-pair distances using bond information. A candidate distance matrix is then randomly generated within these bounds, and the conformation is fitted to match this distance matrix. An implementation of distance geometry embedding is freely available in the software package RDKit. [53]

4.3.3 Energy-Guided Sampling

Because low-energy conformers are more interesting than high-energy ones, it is natural to employ coarse energy calculations in order to steer the search towards lower energies. There are some simple and fast energy models for this purpose, which are detailed in Section 4.4. Even if they have very poor correlation with DFT for stable conformers, they help avoid unphysical behaviors such as self-intersection.

The most famous example of energy-guided sampling is the Metropolis–Hastings algorithm, [79, 80] in which the system is iteratively mutated such that the trajectory obeys Boltzmann statistics. Specifically, the logic loop can be written as

    from math import exp
    from random import random as uniform  # uniform() draws from [0, 1)

    state = initiate_state()
    for _ in range(patience):
        new_state = mutate_state(state)
        delta = energy(new_state) - energy(state)
        # Downhill moves are always accepted; uphill moves are accepted
        # with the Boltzmann probability exp(-beta*delta).
        if delta <= 0 or uniform() < exp(-beta * delta):
            state = new_state
        save_state(state)  # rejected moves count the old state again

This loop is known to generate states following a Boltzmann distribution if the function mutate_state is symmetric in the sense that for all X and Y, the probabilities of Y = mutate_state(X) and X = mutate_state(Y) are equal.
For mutations of dihedral angles, which are the natural choice for conformer generation, this is satisfied by adding a random variable with a symmetric distribution.

The beta parameter is inversely proportional to temperature and must be chosen carefully. Too high a value of beta makes the program find the closest energy minimum and stay there, while too low a value yields unphysical high-energy conformers. There is a Goldilocks zone of decent values of beta, but it is also possible to start with a low value and increase it over time. This results in a different distribution of conformers than running in the Goldilocks zone, analogous to how annealed metals differ from quenched ones. [81]

A downside of pure Metropolis–Hastings sampling is that low-energy conformers are frequently revisited. This redundancy can be mitigated by so-called poling, meaning adding repulsive potential terms centered on past conformers. [82] Another way of mitigating redundancy is to exclude entire regions based on previously generated conformers. [83]

Another algorithm entirely, which avoids redundancy by design rather than as an afterthought, is the basin-hopping algorithm. Its main idea is to sequentially explore the neighborhood of every optimized state once. A simplified version can be written

    states = [initiate_state()]
    for state in states:  # the list grows while it is being iterated over
        for _ in range(num_attempts):
            new_state = optimize_state(mutate_state(state))
            # Keep only new, sufficiently low-energy minima.
            if energy(new_state) < max_energy and new_state not in states:
                states.append(new_state)
                save_state(new_state)

Rather than an explicit patience constant, the theoretical runtime is related to the values of num_attempts and max_energy. Roughly speaking, the former affects the density of conformers and the latter the volume of the feasible space, both of which increase runtime. In practice the runtime is difficult to estimate, and the algorithm is terminated externally.
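To make the basin-hopping idea concrete, here is a minimal runnable sketch on a toy problem. Everything in it is an assumption made for illustration: the state is a single dihedral angle, the energy is a made-up torsional potential with three minima, optimization is plain gradient descent, mutation is a Gaussian kick, and duplicates are detected with an angular distance threshold instead of an exact membership test.

```python
import math
import random

def wrap(phi):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi

def energy(phi):
    """Toy torsional energy (arbitrary units) with three minima."""
    return math.cos(3.0 * phi) + 0.5 * math.cos(phi)

def gradient(phi):
    """Analytical derivative of the toy energy."""
    return -3.0 * math.sin(3.0 * phi) - 0.5 * math.sin(phi)

def optimize_state(phi, step=0.05, iterations=500):
    """Gradient descent to the bottom of the basin that phi lies in."""
    for _ in range(iterations):
        phi -= step * gradient(phi)
    return wrap(phi)

def angular_distance(a, b):
    """Distance between two angles, accounting for periodicity."""
    return abs(wrap(a - b))

def basin_hopping(num_attempts=30, max_energy=0.0, tol=0.05, seed=1):
    rng = random.Random(seed)
    states = [optimize_state(rng.uniform(-math.pi, math.pi))]
    for state in states:  # newly found minima are explored in turn
        for _ in range(num_attempts):
            trial = optimize_state(state + rng.gauss(0.0, 1.5))
            is_new = all(angular_distance(trial, s) > tol for s in states)
            if energy(trial) < max_energy and is_new:
                states.append(trial)
    return states

for phi in basin_hopping():
    print(f"phi = {math.degrees(phi):8.2f} deg, energy = {energy(phi):.4f}")
```

The threshold-based duplicate check plays the role of the conformer distance of Section 4.3.1. In a real search it would be replaced by an RMSD comparison between optimized structures.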
An implementation of the basin-hopping algorithm is included in the software package Tinker. [54] The implementation is named scan and makes some context-specific alterations to the code above. For example, rather than independently mutating the molecule num_attempts times, scan chooses num_attempts distinct vibrational modes and excites the molecule by adding momentum along each of them. The mutated molecule is then obtained by simulating the system with Newtonian dynamics for a while. Energies and forces are calculated from cheap molecular force field methods that can be externally supplied.

4.3.4 Dynamical Sampling

A physics-inspired approach to generating conformers is through classical molecular dynamics (MD) simulations. The equations of motion are often solved with the Verlet integrator. [84] In theory, the molecule should eventually find a low-energy conformer, just as in reality. Unfortunately this is inefficient in practice because the integration time step (about 1 fs) is much shorter than the time between distinct conformers (about 10 ps), meaning most computation is wasted on details within conformer basins.

There are some unphysical tricks that make MD better at generating conformers. Replica-exchange MD [85, 86] starts by running multiple MD simulations in parallel, but at different temperatures or with different Hamiltonians. Periodically, coordinates of parallel runs are exchanged according to Metropolis logic. Such exchanges make conformer generation more efficient by overcoming energy barriers. The CREST software designed for conformer searching uses a replica-exchange MD algorithm that additionally includes poling, meaning the addition of repulsive bias in the MD potential for each previously identified conformer. [55]

4.4 Energy Methods

Arguably, the most fundamental problem in molecular physics is to determine the energy of a given system.
For this task there is a plethora of methods, which can be classified on a scale from cheap (short runtime) to accurate. These methods synergize because they have different use-cases; an energy-guided conformer search would use a cheap force field method like MM3, but the final energy evaluation of a conformer would use a composite method like G4 if possible.

4.4.1 Wavefunction Approximations

The main difference between cheap and accurate methods is what approximations they make. Most methods used in this thesis make some common approximations that will be described here.

In principle, the energy of any quantum-physical system can be found by solving the time-independent Schrödinger equation

\[ \hat{H} |\psi\rangle = E |\psi\rangle, \tag{4.5} \]

which is an eigenvalue problem for the energy scalar $E$, given a Hamiltonian operator $\hat{H}$. In the case of a molecule consisting of $n_\mathrm{e}$ electrons and $n_\mathrm{n}$ nuclei with coordinates $r_i$ and $R_i$, the wavefunction depends on all of them, $\psi = \psi(r_1, \dots, r_{n_\mathrm{e}}, R_1, \dots, R_{n_\mathrm{n}})$, and the Hamiltonian is

\[ \hat{H} = \hat{T}_\mathrm{e} + \hat{T}_\mathrm{n} + \hat{V}_\mathrm{ee} + \hat{V}_\mathrm{en} + \hat{V}_\mathrm{nn}, \tag{4.6} \]

where $\hat{T}_\mathrm{e}$ and $\hat{T}_\mathrm{n}$ are the kinetic operators of the electrons and nuclei, and $\hat{V}_{\cdot\cdot}$ are Coulomb potentials.

In practice, the complexity and size of the wavefunction $\psi$ is too great for the equation to be solved analytically or numerically, and the problem must be broken down successively into simpler forms.

An implicit approximation for light nuclei such as those in biomolecules is the neglect of relativistic effects, implying that $\psi$ is scalar and complex valued, and that the kinetic operators can be written

\[ \hat{T}_\mathrm{e} = -\sum_{i=1}^{n_\mathrm{e}} \frac{\hbar^2}{2m_\mathrm{e}} \nabla^2_{r_i} \quad\text{and}\quad \hat{T}_\mathrm{n} = -\sum_{i=1}^{n_\mathrm{n}} \frac{\hbar^2}{2M_i} \nabla^2_{R_i}. \tag{4.7} \]

The most important approximation is the Born–Oppenheimer one, which separates the solution of the nuclear and electronic parts. [87] First, the nuclei are considered fixed at positions $\vec{R}$ and the electronic Schrödinger equation

\[ \left( \hat{T}_\mathrm{e} + \hat{V}_\mathrm{ee} + \hat{V}_\mathrm{en}(\vec{R}) + \hat{V}_\mathrm{nn}(\vec{R}) \right) |\psi_\mathrm{e}(\vec{R})\rangle = E_\mathrm{e}(\vec{R}) \, |\psi_\mathrm{e}(\vec{R})\rangle \tag{4.8} \]

is solved, yielding the electronic energy as a function of nuclear positions $E_\mathrm{e}(\vec{R})$, often called the potential energy surface (PES). Then, the nuclear terms are reintroduced and solved for in the nuclear Schrödinger equation

\[ \left( \hat{T}_\mathrm{n} + E_\mathrm{e} \right) |\psi_\mathrm{n}\rangle = E |\psi_\mathrm{n}\rangle. \tag{4.9} \]

The Born–Oppenheimer approximation thus reduces the Schrödinger equation to two eigenvalue problems with much smaller wavefunctions. The electronic problem is the more challenging one, because there are typically far more electrons than nuclei in a molecule.

One important approximation that significantly reduces the complexity of the wavefunction is the single Slater determinant approximation, which is famously used in the Hartree–Fock (HF) method. The idea is that the many-electron wavefunction is a product of one-electron wavefunctions, up to anti-symmetrization. Explicitly, the wavefunction is written

\[ \psi(r_1, \dots, r_{n_\mathrm{e}}) = \frac{1}{\sqrt{n_\mathrm{e}!}} \begin{vmatrix} \phi_1(r_1) & \phi_2(r_1) & \cdots & \phi_{n_\mathrm{e}}(r_1) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_1(r_{n_\mathrm{e}}) & \phi_2(r_{n_\mathrm{e}}) & \cdots & \phi_{n_\mathrm{e}}(r_{n_\mathrm{e}}) \end{vmatrix}. \tag{4.10} \]

To appreciate how greatly this form limits $\psi$, count the dimension of its space. Assuming the one-electron wavefunctions belong to some vector space $V$ of large dimension $d$, the space of many-electron wavefunctions is the $n_\mathrm{e}$th exterior power of $V$, which has dimension $\binom{d}{n_\mathrm{e}} \approx \frac{1}{n_\mathrm{e}!} d^{n_\mathrm{e}}$. On the other hand, the subset of the form of Eq. (4.10) is parameterized by $V^{n_\mathrm{e}}$, which has dimension $n_\mathrm{e} d$. This counting does not take normalization into account, but the point remains that the number of degrees of freedom in $\psi$ becomes a linear function of $d$.

The downside of such a strong approximation is that correlation information is lost, because the Slater determinant is essentially a product approximation, which in terms of probabilities means independence. Despite this, the HF method accounts for around 99% of the total electronic energy.
[88] Capturing the remainder, which is due to correlation effects, requires relaxing this approximation, which is what so-called post-HF methods do.

The one-electron wavefunctions are in principle normalized functions from real 3D space to the complex numbers. In practice, they are restricted to finite-dimensional subspaces with some useful property. Slater-type orbitals, inspired by the hydrogen system, have basis functions with a radial factor $R(r) = r^n e^{-ar}$. On the other hand, Gaussian-type orbitals use $R(r) = r^n e^{-ar^2}$, which makes inner products between them computable analytically. It is common to combine the strengths of both types, by using as basis functions fixed linear combinations of Gaussian-type orbitals that approximate Slater-type orbitals. Two such implementations are the Pople [89] and Dunning [90, 91] basis sets.

4.4.2 Density Functionals

Another approach to determining the energy of a system is density functional theory (DFT), which replaces the wavefunction $\psi$ with the electron probability density

\[ \rho(r) = \int \mathrm{d}^3 r_2 \cdots \int \mathrm{d}^3 r_{n_\mathrm{e}} \, |\psi(r, r_2, \dots, r_{n_\mathrm{e}})|^2. \tag{4.11} \]

Although this seems like a massive loss of information, the ground state wavefunction and energy can in principle be determined from the corresponding electron density. Formally, the second Hohenberg–Kohn theorem [92] states that there exists a universal functional $F$ such that the functional

\[ E[\rho] = F[\rho] + \int V(r) \rho(r) \, \mathrm{d}^3 r \tag{4.12} \]

has a global minimum $(\rho_0, E[\rho_0])$ consistent with the ground state solution to the electronic Schrödinger equation with external (nuclear-electronic) potential $V$. This implies that any evaluation of $E$ gives an upper bound on the true energy $E[\rho_0]$. Unfortunately, $F$ is not explicitly defined in the proof of the theorem, but must rather be approximated.
A popular way to use DFT in practice is with the Kohn–Sham formalism, [56] which splits the universal functional into

\[ F[\rho] = T_\mathrm{s}[\rho] + J[\rho] + E_\mathrm{xc}[\rho], \tag{4.13} \]

where $T_\mathrm{s}[\rho]$ is the Kohn–Sham kinetic energy that can be found by solving an HF-like problem, $J[\rho]$ is the Coulomb energy

\[ J[\rho] = \frac{1}{2} \frac{e^2}{4\pi\epsilon_0} \int \mathrm{d}^3 r_1 \int \mathrm{d}^3 r_2 \, \frac{\rho(r_1)\rho(r_2)}{|r_1 - r_2|}, \tag{4.14} \]

and $E_\mathrm{xc}[\rho]$ is the exchange-correlation functional. There are many competing approximations for $E_\mathrm{xc}[\rho]$, often consisting of separate parts for the exchange and the correlation. For example, one could use Becke's 1988 functional [93] for the exchange, and Lee, Yang, and Parr's functional [94] for the correlation. This combined method is known as B-LYP, and this naming convention is generally upheld.

The DFT functionals used in the papers of this thesis include B3LYP, [95] M06-2X, [96] and ωB97X-D. [97, 98] Although not purely DFT, composite methods such as CBS-4M [99, 100] and G4(MP2) [101] have also been used.

Many DFT functionals do not correctly model London dispersion forces. While two distant atoms should experience a weak attractive force corresponding to a $C r^{-6}$ potential, the DFT interaction energy instead decays exponentially. A popular solution is then to add an empirical dispersion term. The papers of this thesis consistently use Grimme's dispersion [102] with Becke–Johnson damping, [103] which for a diatomic molecule simplifies to

\[ E_\mathrm{disp}(r) \propto -\frac{C_6}{1 + (r/r_6)^6} - \frac{C_8}{1 + (r/r_8)^8}, \tag{4.15} \]

where $C_n$ and $r_n$ depend on the atomic numbers of the nuclei. For larger molecules, the equation above is applied to all pairs of nuclei, and there is also a three-body term. [102]

4.4.3 Empirical Methods

The simplest methods for calculating energies are linear combinations of analytical expressions of bond lengths, bond angles, and dihedral angles. Because they can easily be differentiated to give forces, they are known as molecular force fields. Examples include AMBER, [104] CHARMM, [105] and MM3.
[12] The main differences between them are the expressions used and the data used to fit the coefficients. Force fields have poor accuracy, being almost uncorrelated with DFT. [106] Thus, they are mostly used in energy-guided sampling, where the absolute energy is not crucial.

Semiempirical methods are more complex than force fields. They start from wavefunctions and Hamiltonians, but neglect some integrals or replace them with empirical expressions. The recently developed PM7 method [107] implemented in the freely available MOPAC software by Stewart [108] is one such method. In the papers of this thesis, PM7 is often used as a middle step between the conformational search and DFT, because it correlates decently (20–40%) with DFT.

4.5 Frequency Methods

The vibrational frequency spectrum of a molecule is intimately related to spectroscopic experiments, and is a sensitive function of molecular structure. Therefore, much effort has gone into developing accurate methods for computing the frequency spectrum from the 3D structure.

Conventional frequency calculations are based on energy derivatives at a specific 3D structure. These are used to fit a simpler model to the potential energy surface, such that the Schrödinger equation for nuclear motion can be solved on this model.

More recently, frequency methods based on molecular dynamics have risen in usage. They only need first-order derivatives of the potential energy surface, and instead explore many points around the equilibrium. This strategy along with its (dis)advantages will be discussed in Subsection 4.5.3.

4.5.1 Harmonic Approximations

One of the simplest quantum systems for which the Schrödinger equation can be analytically solved is the harmonic oscillator. Explicitly, the one-dimensional (1D) equation is

\[ \left( \frac{\hat{p}^2}{2m} + \frac{1}{2} k \hat{x}^2 \right) |\psi\rangle = E |\psi\rangle, \tag{4.16} \]

and has the eigenvalues $E_n = \hbar\omega(n + \frac{1}{2})$ for non-negative integers $n$, where $\omega = \sqrt{k/m}$.
The corresponding eigenstates $|n\rangle$ are Gaussian functions multiplied by Hermite polynomials. An important property of the eigenstates is that the matrix element

\[ \langle n' | \hat{x} | n \rangle = \int_{-\infty}^{\infty} \psi_{n'}(x) \, x \, \psi_n(x) \, \mathrm{d}x \tag{4.17} \]

is nonzero precisely when $n' = n \pm 1$. This value corresponds to the probability amplitude of an operator proportional to $\hat{x}$ changing the number of vibrational quanta from $n$ to $n'$. In the context of absorption spectroscopy, this operator is the electric dipole, and the integral statement implies that a harmonic system can only absorb photons that increase the number of quanta by one. Thus the absorption energy is given by $E_{n+1} - E_n = \hbar\omega$.

The $d$-dimensional quantum harmonic oscillator is intuitively similar to the 1D oscillator. In fact, it can be split into $d$ independent 1D oscillators by a linear change of coordinates. The new basis vectors are called vibrational modes and can be found by solving an eigenvalue problem involving the mass and stiffness matrices. The absorption energies of the whole system are $\{\hbar\omega_m\}$, where the $d$ angular frequencies $\omega_m$ come from the eigenvalue problem.

For a molecule with $N$ nuclei, there are $3N$ degrees of freedom. Define $3N$ coordinates $\vec{x} = \{x_i\}$ such that $\vec{x} = \vec{0}$ corresponds to the equilibrium geometry, assuming that it is known. Now, the potential energy $V$ is approximated by its second-order Maclaurin expansion around this point:

\[ V(\vec{x}) \approx V(\vec{0}) + \sum_{i=1}^{3N} V'_i(\vec{0}) \, x_i + \frac{1}{2} \sum_{i,j=1}^{3N} V''_{ij}(\vec{0}) \, x_i x_j. \tag{4.18} \]

Because the expansion is made around the equilibrium geometry, the first-order derivatives of the potential vanish. The Schrödinger equation on this harmonic potential can now be solved as previously described, with the set of second-order derivatives $V''$ as the stiffness matrix.

Although there are $3N$ coordinates, 6 (5 for linear molecules) of these describe translations and rotations. For this reason, only $3N - 6$ computed vibrational mode frequencies are nonzero, within numerical accuracy.
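The eigenvalue problem can be illustrated on a system small enough to check by hand. The sketch below is a toy model, not a molecular calculation: it assumes a linear chain of three masses (outer masses `m_outer`, central mass `m_center`) joined by two identical springs of stiffness `k` and restricted to motion along the chain axis. The squared angular frequencies are the eigenvalues of the mass-weighted stiffness matrix $D_{ij} = V''_{ij}/\sqrt{m_i m_j}$, obtained here with a small Jacobi rotation solver so that only the standard library is needed. All names and parameter values are hypothetical.

```python
import math

def jacobi_eigenvalues(matrix, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-30:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q]: tan(2*theta) = 2*a_pq/diff.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q (A -> A J)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q (A -> J^T A)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def normal_mode_eigenvalues(m_outer, m_center, k):
    """Squared angular frequencies of the 1D three-mass chain."""
    masses = [m_outer, m_center, m_outer]
    stiffness = [[k, -k, 0.0], [-k, 2.0 * k, -k], [0.0, -k, k]]
    weighted = [[stiffness[i][j] / math.sqrt(masses[i] * masses[j])
                 for j in range(3)] for i in range(3)]
    return jacobi_eigenvalues(weighted)

omega_sq = normal_mode_eigenvalues(16.0, 12.0, 1.0)
print("omega^2:", omega_sq)
```

For this chain the eigenvalues are known in closed form: zero for the overall translation, $k/m_\mathrm{outer}$ for the symmetric stretch, and $k(1/m_\mathrm{outer} + 2/m_\mathrm{center})$ for the antisymmetric stretch. The vanishing eigenvalue is the 1D analogue of the translational and rotational modes that are removed from the $3N$ coordinates of a molecule.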
Each of these modes contributes to the vibrational spectrum with an intensity proportional to the squared derivative of the electric dipole with respect to the mode coordinate.

In summary, the harmonic frequency method computes the vibrational spectrum by comparison with a quantum harmonic oscillator. Two harmonic approximations are made: that the restoring force on the nuclei is a linear function of their coordinates, and that the electric dipole of the molecule is also a linear function of the same coordinates.

The computed harmonic frequency spectra of molecules, when compared to experimental data, consistently overestimate vibrational frequencies by a few percent. By multiplying the computed result with an empirical scaling factor, agreement is improved and the typical error is around 30 cm⁻¹. [58] However, the need for this scaling factor suggests that the harmonic approximation (of force) is too simplistic.

4.5.2 Anharmonic Perturbation

When a Maclaurin approximation is inaccurate, the natural solution is to include more terms. Adding only third-order terms to the potential would make it unbounded in a pathological way, so the next logical step is to add third- and fourth-order terms. This family of potentials is not analytically solvable. What can be done is to start from the harmonic theory and treat the higher-order terms as small perturbations. This is the rationale for second-order vibrational perturbation theory (VPT2). [57]

VPT2 splits the Hamiltonian as $\hat{H} = \hat{H}_\mathrm{HO} + \hat{H}_\mathrm{extra}$, where $\hat{H}_\mathrm{HO}$ corresponds to the understood harmonic oscillator, and $\hat{H}_\mathrm{extra}$ to higher-order terms but also a Coriolis term. The perturbed energy for a given state with quantum number vector $\vec{n}$ is then obtained as

$$ E_{\vec{n}} = \langle\vec{n}|\hat{H}_\mathrm{HO}|\vec{n}\rangle + \langle\vec{n}|\hat{H}_\mathrm{extra}|\vec{n}\rangle + \sum_{\vec{m}:\,\vec{m}\neq\vec{n}} \frac{\langle\vec{n}|\hat{H}_\mathrm{extra}|\vec{m}\rangle\,\langle\vec{m}|\hat{H}_\mathrm{extra}|\vec{n}\rangle}{\langle\vec{n}|\hat{H}_\mathrm{HO}|\vec{n}\rangle - \langle\vec{m}|\hat{H}_\mathrm{HO}|\vec{m}\rangle}, \tag{4.19} $$

which with some VPT2-specific assumptions simplifies to [57]

$$ E_{\vec{n}} = G_0 + \sum_{r=1}^{3N-6} \hbar\omega_r\left(n_r + \tfrac{1}{2}\right) + \sum_{r=1}^{3N-6}\sum_{s=1}^{r} \hbar\chi_{rs}\left(n_r + \tfrac{1}{2}\right)\left(n_s + \tfrac{1}{2}\right). \tag{4.20} $$

In the expression above, $G_0$, $\omega_r$, and $\chi_{rs}$ are constants that can be computed from the geometry and the potential derivatives of order up to four. In fact, only the fourth-order derivatives of the form $V''''_{rrst}$ need to be computed, which reduces the computational cost. The ground state absorption spectrum can then be obtained by subtracting the ground state energy $E_{\vec{0}}$ from every energy. In theory any transition is allowed, but in practice only excitations of up to two quanta are computed. The one-quantum excitations are called fundamentals and have (angular) frequencies

$$ \nu_r = \hbar^{-1}\left(E_{\vec{e}_r} - E_{\vec{0}}\right) = \omega_r + \frac{3}{2}\chi_{rr} + \frac{1}{2}\sum_{s=1}^{3N-6}\chi_{rs}, \tag{4.21} $$

where $\vec{e}_r$ is a unit vector in coordinate $r$. The two-quanta excitations are called first overtones if both quanta lie in the same mode, and combination bands otherwise. Their frequencies are

$$ \nu_{rr} = \hbar^{-1}\left(E_{2\vec{e}_r} - E_{\vec{0}}\right) = 2\nu_r + 2\chi_{rr} \quad\text{and} \tag{4.22} $$

$$ \nu_{rs} \overset{r\neq s}{=} \hbar^{-1}\left(E_{\vec{e}_r+\vec{e}_s} - E_{\vec{0}}\right) = \nu_r + \nu_s + \chi_{rs}. \tag{4.23} $$

The corresponding intensities involve derivatives of the electric dipole and are described in Ref. [57].

A known problem with VPT2 is that the expression $\omega_r+\omega_s-\omega_t$ appears in some denominators of the $\chi_{rs}$. Such ratios become excessively large when $\omega_r+\omega_s \approx \omega_t$, a condition known as Fermi resonance, leading to absurdly large contributions to the vibrational spectrum. There are several strategies for removing Fermi resonances. [57, 109, 110]

The computational complexity of VPT2 is naturally higher than that of a harmonic frequency analysis, which only requires second-order derivatives of the potential at a point. The bottleneck of VPT2 is the computation of all potential derivatives of the form $V''''_{rrst}$. Most functionals allow up to second-order derivatives to be computed analytically at a point.
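How cubic and quartic force constants follow from second derivatives can be sketched in one dimension. The snippet assumes a hypothetical potential $V(x) = \tfrac{1}{2}x^2 + Ax^3 + Bx^4$ whose second derivative is available analytically. It recovers $V'''(0)$ and $V''''(0)$ from $V''$ evaluated at the reference point and at two displaced points. The coefficients, the step `h`, and the function names are illustrative choices.

```python
A, B = 0.05, 0.02   # hypothetical cubic and quartic coefficients

def second_derivative(x):
    """Analytic V''(x) for V(x) = x**2 / 2 + A * x**3 + B * x**4."""
    return 1.0 + 6.0 * A * x + 12.0 * B * x**2

def higher_derivatives(d2, h=0.01):
    """Central differences of V'' around x = 0, using three evaluations."""
    plus, zero, minus = d2(h), d2(0.0), d2(-h)
    third = (plus - minus) / (2.0 * h)             # approximates V'''(0)
    fourth = (plus - 2.0 * zero + minus) / h**2    # approximates V''''(0)
    return third, fourth

third, fourth = higher_derivatives(second_derivative)
print(third, 6 * A)     # finite difference vs. analytic V'''(0) = 6A
print(fourth, 24 * B)   # finite difference vs. analytic V''''(0) = 24B
```

Applied along one mode $r$ to the full matrix of second derivatives $V''_{st}$, the second stencil yields exactly the derivatives of the form $V''''_{rrst}$.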
Further derivatives must be numerically computed with a finite displacement, implying that all second-order derivatives must be computed at $1 + 2(3N - 6)$ points. There are economical versions of VPT2 that reduce the cost by computing only a subset of the fourth-order derivatives, making the method applicable to larger systems. [111, 112]

VPT2 with many options is implemented in the Gaussian software. [113] Benchmarking studies [59, 114] show a typical error of around 15 cm⁻¹, an improvement over harmonic theory. However, the presence of Fermi resonances sometimes makes VPT2 inapplicable. [5]

4.5.3 Semiclassical Dynamics

A recently popularized idea for computing vibrational spectra is the Born–Oppenheimer molecular dynamics (BOMD) method. Unlike harmonic frequency analysis and VPT2, which analyze the potential at a point, the BOMD method calculates a spectrum based on the trajectory of a BOMD simulation. [115, 116]

The semiclassical BOMD simulation treats the nuclei as classical particles affected by a potential found by solving the Schrödinger equation with the Born–Oppenheimer approximation. This is possible thanks to the Hellmann–Feynman theorem, which states that [88, 117]

$$ \hat{H}_\lambda|\psi_\lambda\rangle = E_\lambda|\psi_\lambda\rangle \quad\text{implies}\quad \frac{\mathrm{d}E_\lambda}{\mathrm{d}\lambda} = \left\langle\psi_\lambda\middle|\frac{\mathrm{d}\hat{H}_\lambda}{\mathrm{d}\lambda}\middle|\psi_\lambda\right\rangle. \tag{4.24} $$

Setting $\lambda$ to a nuclear coordinate gives an expression for the corresponding force. The motion of the nuclei is then computed with an integration scheme for Newton's equations, such as the Verlet integrator. [84] The result is a trajectory of the molecule, notably including the electric dipole.

A common time-saving trick is to estimate the initial wavefunction or electron density based on values from the previous time step. In particular, the atom-centered density matrix propagation (ADMP) method [118–120] implemented in Gaussian [113] gives the electron density fictitious inertia such that its dynamics can be included in the simulation. ADMP is thus a performance improvement over plain BOMD.
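A minimal sketch of the Verlet scheme follows, applied to a 1D harmonic oscillator in arbitrary units rather than to a molecule. The force function, step size, and zero-crossing estimate of the period are illustrative choices, not the setup used in this work. For this scheme and system the simulated angular frequency obeys $\omega_\mathrm{sim}\Delta t = 2\arcsin(\tfrac{1}{2}\omega_0\Delta t)$, which the snippet checks numerically.

```python
import math

def verlet_trajectory(force, x0, v0, dt, n_steps, mass=1.0):
    """Positions from the position-Verlet recurrence x[n+1] = 2x[n] - x[n-1] + a*dt^2."""
    xs = [x0, x0 + v0 * dt + 0.5 * force(x0) / mass * dt**2]
    for _ in range(n_steps - 1):
        x_prev, x = xs[-2], xs[-1]
        xs.append(2.0 * x - x_prev + force(x) / mass * dt**2)
    return xs

def angular_frequency(xs, dt):
    """Estimate the angular frequency from linearly interpolated zero crossings."""
    crossings = []
    for i in range(len(xs) - 1):
        if xs[i] * xs[i + 1] < 0.0:
            frac = xs[i] / (xs[i] - xs[i + 1])
            crossings.append((i + frac) * dt)
    half_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return math.pi / half_period

omega0, dt = 1.0, 0.2                      # arbitrary units, deliberately coarse step
xs = verlet_trajectory(lambda x: -omega0**2 * x, 1.0, 0.0, dt, 20000)
omega_sim = angular_frequency(xs, dt)
omega_expected = 2.0 * math.asin(0.5 * omega0 * dt) / dt
print(omega_sim, omega_expected)           # both lie slightly above omega0
```

Inverting the relation, $\omega_0\Delta t = 2\sin(\tfrac{1}{2}\omega_\mathrm{sim}\Delta t)$, recovers the true frequency from the simulated one.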
Given a trajectory of the electric dipole $\vec{\mu}$, the BOMD method proceeds to compute the absorption spectrum as the inverse Fourier transform of the autocorrelation of the centered electric dipole $\tilde{\mu} = \vec{\mu} - \langle\vec{\mu}\rangle$:

$$ \sigma(\omega) \propto \omega^2 \int_{-\infty}^{\infty} \mathrm{d}\tau\, e^{i\omega\tau} \int_{-\infty}^{\infty} \mathrm{d}t\, \tilde{\mu}(t)\cdot\tilde{\mu}(t+\tau). \tag{4.25} $$

This formula can be derived from linear response theory. [121, 122] Another way to think about it is through the Wiener–Khintchine theorem, which states that the (inverse) Fourier transform of the autocorrelation of a signal equals the squared absolute value of the (inverse) Fourier transform of the signal itself. Thus Eq. (4.25) relates the absorption spectrum to the Fourier spectrum of the dipole trajectory.

Figure 4.2: Principle and example of frequency shift compensation in the BOMD method. Left: by simulating a harmonic oscillator (HO) with finite time step, a correction formula can be derived and then applied to an anharmonic oscillator (AO). Right: AO frequency as a function of simulation time step, raw and with correction. (Labels in the figure: correction $\omega_\mathrm{cor}\Delta t = 2\sin(\tfrac{1}{2}\omega_\mathrm{sim}\Delta t)$; potential $V(x) = \tfrac{1}{2}x^2 + \tfrac{1}{50}x^4$; horizontal axis: time step $\omega_0\Delta t$.)

When running BOMD simulations in practice, some parameters must be chosen. The temperature or average mode energy is the most important, because anharmonic effects scale linearly with simulation temperature. [123] The energy distribution across modes can also be chosen in a few ways. [124] The duration of the simulation determines the spectral resolution of the Fourier transform. Finally, the time step must be small enough to accurately describe the dynamics. If the time step is far too large, energy conservation breaks down. However, if the time step is only slightly too large, the consequence is a systematic frequency shift. [123] This frequency shift depends only on the time step and the integration scheme, but not on the molecule.
For example, using Verlet integration with time step $\Delta t$ to simulate a harmonic oscillator with angular frequency $\omega_0$ results in a trajectory with angular frequency $\omega_\mathrm{sim}$ such that $\omega_\mathrm{sim}\Delta t = 2\arcsin(\tfrac{1}{2}\omega_0\Delta t)$. [123] This knowledge can be used to run simulations with a relatively large time step, and the resulting frequency shift can be compensated for afterwards. Figure 4.2 shows this trick applied to a 1D anharmonic oscillator simulated at various large time steps.

Contrary to popular belief, the BOMD method does not adequately model anharmonic frequency shifts. This is easily seen by considering a toy system of a 1D harmonic oscillator with a small quartic perturbation. The trajectory will then be periodic with a fundamental frequency that depends on the energy. To match the exact fundamental frequency of this quantum system, the temperature must satisfy $k_\mathrm{B}T \approx \hbar\omega$, which is 1–2 orders of magnitude too high for diatomic molecules. [125] Furthermore, the BOMD method predicts that the first overtone is exactly twice the fundamental, contrary to VPT2 (see Eq. (4.22)) and experiment. [126]

The strength of the BOMD method lies rather in handling floppy systems, which do not have a unique equilibrium geometry. An example of this is microsolvated clusters, where the water molecules can sit at many sites. [127, 128] Another notable example where the BOMD method performed better than harmonic analysis was on a capped dipeptide. [61]

4.6 Hydrogen Bonds

Hydrogen bonds (H-bonds) are attractive non-covalent interactions, most often between hydrogen and either nitrogen, oxygen, or fluorine. There is no agreed-upon formal definition of an H-bond. Johnson and coauthors [129] suggest a classification of non-covalent interactions based on the electron density and its derivatives.
Specifically, three values are computed at every point: the density $\rho$, the reduced density gradient (RDG) $\propto \rho^{-4/3}|\nabla\rho|$, and the median eigenvalue $\lambda_2$ of $\nabla\nabla^{\top}\rho$. H-bonds are then said to occur where the density is large, the RDG is small, and the median eigenvalue is negative.

Figure 4.3 shows an example analysis of triglycine. The threshold value of 0.4 is chosen for the RDG, selecting a subset of 3D space. The subset is colored according to its $\mathrm{sign}(\lambda_2)\rho$ value, and consists of four components, each marked with a colored boundary in both halves of the figure. Two of them (gold and red) have a high density and are considered H-bonds, while the other two (purple and blue) are considered van der Waals interactions.

Figure 4.3: Non-covalent interaction analysis of a triglycine conformer. Left: scatterplot of electron density and reduced density gradient (RDG). Some regions are highlighted. Right: 3D structure of the molecule, together with the highlighted regions.

5 Results and Discussion

This chapter summarizes the original results found in the papers.

5.1 IRMPD Spectroscopy of Proton-Bound Asparagine Dimers

This section summarizes the findings of Ref. [1], which used IRMPD to study proton-bound dimers of asparagine (Asn). Specifically, the homo- and heterochiral dimers were analyzed with the intent of detecting minute chiral differences. Asparagine has four functional groups that can form hydrogen bonds (two amino, one carbonyl, and one carboxyl), and therefore it was believed that the two diastereomers (the homo- and the heterodimer) would adopt significantly different structures (see Fig. 5.1) stabilized by chiral-specific intermolecular interactions, and that this would be visible in the IR spectrum. Because the laser intensity fluctuates somewhat, the two diastereomers were analyzed simultaneously to rule out false differences.
Theoretical The theoretical investigation started with a conformer search, which consisted of many parallel MD simulations in the DFTB+ software using the density functional-based tight binding method. Many different initial geometries were obtained by randomly generating dihedral angles without regard to energy, and then optimizing the structure. The MD simulations were then started with velocities drawn from a 298 K Maxwell–Boltzmann distribution. Frames of interest from the resulting trajectories were structurally optimized into conformers with the B3LYP functional and the 6-311++G** Pople basis set, with which their harmonic spectra were also calculated. Anharmonic VPT2 spectra were also calculated, but are omitted because of problems with Fermi resonances. Finally, their energies were computed with the Gaussian G4MP2 method. The most stable (lowest Gibbs energy) conformer of each diastereomer is seen in Fig. 5.1.

Figure 5.1: The lowest Gibbs energy conformers of dd-Asn₂H⁺ (Conf. DD-B1) and dl-Asn₂H⁺ (Conf. DL-A1), according to the G4MP2 method on a structure optimized with B3LYP-GD3BJ/6-311++G**. Inter- and intramolecular interactions are shown as dashed lines annotated with their length in Å. The conformer names contain A or B in reference to which functional group the protonated amino group binds to.

Three different types of conformers were found, differing in which functional group the protonated alpha amino group formed H-bonds with. [130] Type A implies an H-bond with the alpha amino group of the other moiety, and type B with the carboxyl group of the other moiety. Type Z is similar to type B, but also implies that the neutral moiety is a zwitterion, meaning the proton on the carboxyl has moved to the alpha amino group. Significantly, the most stable conformer of the homodimer was type B, while the heterodimer was predicted to adopt type A, a clear chiral effect seen in Fig. 5.1.
Experimental To make the results more understandable, a brief summary of the experiment is given. The experiment was conducted at Radboud University, using the free electron laser FELIX in the frequency range of 500–1875 cm⁻¹ and a table-top OPO laser in the frequency range of 3000–3600 cm⁻¹. A solution of both enantiomers (d-Asn and ¹⁵N-labeled l-Asn) was prepared and delivered to the gas phase using electrospray ionization. This produced dimers of three kinds: dd, dl, and ll. Because the l-enantiomers were isotopically labeled, the three kinds of dimers were mass resolved at $m$ = 265, 267, and 269 Da. A quadrupole filter was used to direct two of the three kinds into a Penning trap. By means of Fourier transform ion cyclotron resonance, the mass spectrum of the ions inside the trap was continuously measured. The absorption spectra of both diastereomers could then be determined by irradiating the trapped ions and measuring the amounts of whole and fragment molecules.

Figure 5.2: Experimental IRMPD spectra of the three kinds of asparagine dimers: (top) dd-Asn₂H⁺, (middle) dl-Asn₂H⁺, and (bottom) ll-Asn₂H⁺. The intensity scale is in arbitrary units, but the horizontal grid lines have constant spacing in all three frequency regions. The IRMPD rate is averaged over several scans to increase precision. Features of interest are highlighted with black triangles. The frequency range of 3000–3350 cm⁻¹ is omitted because it contained no sharp features, only a very broad NH stretching.

Figure 5.2 shows the minutely different IRMPD spectra of all three types of asparagine dimers. The most noticeable effect is that some peak frequencies vary linearly with the dimer mass. For example, the frequency of NH-stretching is 3415, 3411, and 3407 cm⁻¹ for dd, dl, and ll respectively.
This frequency shift is due to the ¹⁵N-labeling and thus proof of the presence of nitrogen in the observed vibrational mode. Conversely, the frequency of OH-stretching is 3540 cm⁻¹ for all three kinds, proving that nitrogen does not participate in the corresponding vibrational mode. Following such logic, it is possible to assign vibrational modes to the observed peaks.

Analysis Figure 5.3 shows a comparison of the experimental spectra and the predicted spectra of the most stable conformer of each type. Theoretical frequencies are scaled with appropriate fixed constants. [58] The conformer type (A, B, or Z) has direct implications for the spectrum, for example the carboxyl stretching mode in the range of 1750–1850 cm⁻¹. In this range, type A conformers have two close bands, because the two carboxyl groups have similar chemical environments. Type B conformers, on the other hand, have one band redshifted, corresponding to the carboxyl group that participates in the intermolecular H-bond. Finally, type Z has only one band in this range, because the band of the deprotonated carboxyl group has shifted all the way to 1660 cm⁻¹. In experiment, two peaks were seen in the 1800–1900 cm⁻¹ range, ruling out type Z as a dominant conformer.

Figure 5.3: Predicted spectra of stable asparagine conformers compared with experiment. The B3LYP functional with the N07D basis set is used for calculating the vibrational frequencies, which are then scaled and broadened. Two scaling constants are used: one below 2000 cm⁻¹ and one above.
By combining the vibrational spectra of predicted conformers with the aforementioned isotope shifts, it was possible to assign vibrational modes to all frequencies greater than 1500 cm⁻¹, and to many below. The full assignment with proof is given in the discussion of the original paper. [1] The colors of the bands in Fig. 5.3 refer to this assignment. Red is used to indicate movement of an oxygen, such as the OH stretching at 3540 cm⁻¹. Green indicates movement of the nitrogen in the side chain amino group, which occurs in the symmetric and antisymmetric stretching at 3407 and 3519 cm⁻¹. Blue is used in the modes where the nitrogen of the alpha amino group moves, such as the umbrella motion of the protonated amino group at 1500 cm⁻¹. Colors are combined when multiple apply, for example the carboxyl group at 1790 cm⁻¹ is colored both red (oxygen) and blue (alpha nitrogen).

Figure 5.4: Quantified difference between experimental and predicted spectra. The top half shows the root-mean-square error of predictions, and the bottom half shows $\sqrt{1 - S^2}$, where $S$ is the Pearson correlation coefficient. (Methods on the horizontal axis: B3LYP/N07D, B3LYP/aug-cc-pVDZ, B3LYP/6-311++G**, M062X/6-311++G**, and ωB97XD/6-311++G**.)

The assignment of vibrational modes can be used to infer the conformer type in a more systematic fashion. By contrasting the experimental frequencies of a set of modes with the theoretical modes of a conformer, it is possible to give a grade to the combination of theoretical method and conformer. Figure 5.4 shows precisely this for five popular theoretical methods and the most stable conformer of each type, using two measures. The first measure is the root-mean-square error, and the second is $\sqrt{1 - S^2}$, where $S$ is the Pearson correlation coefficient.
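The two measures can be sketched as follows. The frequency lists are hypothetical placeholders, not the data behind Fig. 5.4, and the function names are chosen here for illustration only.

```python
import math

def rmse(experiment, theory):
    """Root-mean-square error between paired frequencies."""
    n = len(experiment)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(experiment, theory)) / n)

def pearson(x, y):
    """Pearson correlation coefficient S of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def decorrelation(experiment, theory):
    """Second measure, sqrt(1 - S^2); clipped at zero against rounding."""
    s = pearson(experiment, theory)
    return math.sqrt(max(0.0, 1.0 - s * s))

# Hypothetical mode frequencies in cm^-1 (illustrative values only)
experiment = [1100.0, 1300.0, 1450.0, 1600.0, 1750.0]
theory = [1108.0, 1291.0, 1462.0, 1595.0, 1760.0]
print(rmse(experiment, theory), decorrelation(experiment, theory))
```

Because the Pearson coefficient is unchanged by a linear rescaling of either list, the second measure does not depend on the choice of a single scaling constant, whereas the root-mean-square error does.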
The theoretical frequencies are scaled with appropriate fixed constants. [58]

Figure 5.4 shows that the B3LYP functional with the relatively small N07D basis set is marginally better than other methods, and that type B conformers best predict the experimental spectra of both diastereomers. For the homodimer this is expected, but the most stable conformer of the heterodimer is DL-A1. However, there is one more conformer of the heterodimer, DL-B3, which is within $k_\mathrm{B}T$ of DL-A1. The presence of both would explain the mixed indications from the experiment: that type A is better around 1900 cm⁻¹ but type B is better around 1400 cm⁻¹.

Summary The existence of a chiral effect could not be proven experimentally. While the initial belief was that the spectra would look different due to the homodimer adopting a type B conformer and the heterodimer type A, in reality the experiment showed very similar spectra. The biggest observed difference was in the shoulder feature at 1450 cm⁻¹. However, this difference is quite small and is arguably an effect of the ¹⁵N labeling. Future experiments at cryogenic temperatures, using techniques like cryogenic traps or storage rings, would increase the resolution of features and enhance the capacity for chiral differentiation.

In summary, the homo- and heterochiral proton-bound dimers of asparagine were studied with simultaneous IRMPD spectroscopy in the frequency ranges of 500–1875 cm⁻¹ and 3000–3600 cm⁻¹. By using ¹⁵N labeling and theoretical methods, it was possible to assign vibrational modes to most frequencies. No chiral effect was proven to exist.

5.2 IRMPD–VUV Spectroscopy of Neutral Pentaalanine

This section summarizes the results of Ref. [2], which investigated neutral pentaalanine (Ala5) with IRMPD–VUV spectroscopy. In proteins, long sequences of alanine residues are known to fold into alpha helices, and so it was of interest to see whether gas-phase Ala5 would do the same.
This experiment was also interesting because Ala5 was the largest molecule to which IRMPD–VUV spectroscopy had yet been applied. It was not obvious at the time that fragmentation would occur, because in IRMPD–VUV the molecular beam is irradiated only for a short time, and larger molecules naturally take longer to fragment.

Experimental The experiment took place at Radboud University, and used FELIX in the frequency range of 340–1820 cm⁻¹. The sample of Ala5 in dehydrated form was delivered to the gas phase using MALDI. A supersonic jet expansion of argon gas cooled down the molecules and swept them towards a skimmer. Only the central, coolest part (approximately 10 K) passed through the skimmer and formed a molecular beam. That beam was then exposed to two laser pulses: first an IR pulse from FELIX of duration 7 µs and energy 30–80 mJ¹ (depending on frequency), and then a VUV pulse of duration 3 ns and energy 1 µJ. The resulting ions were then counted by a reflectron-type time-of-flight mass spectrometer.

¹ Ref. [2] incorrectly writes the unit as mW.

Because FELIX pulses are subdivided into micropulses separated by 1 ns periods, the 7 µs FELIX pulse comprises 7000 micropulses, between which IVR occurs. IVR has a timescale of 100 fs, so it is safe to assume that the absorbing mode is relaxed after each 1 ns period of darkness.

Figure 5.5 shows the experimental IRMPD–VUV spectrum of Ala5 in the frequency range of 340–1820 cm⁻¹. The spectrum will be discussed after the theory.

Figure 5.5: Experimental IRMPD–VUV spectrum of neutral Ala5, obtained with FELIX.

Figure 5.6: Definition of dihedral angles of Ala5. The part in brackets should be read as repeated four times.
There are four types of dihedral angles: C-N-C-C ($\varphi$), N-C-C-N ($\psi$), C-C-N-C ($\omega$), and N-H-H-C ($\chi$). Some exceptions occur at the endpoints, for example the $\varphi_\mathrm{N}$ angle is defined by H̄NCC, where H̄ is the center between the amino hydrogens. The inset shows the rule for classifying dihedral angles as cis (C), gauche (G), anticlinal (A), or trans (T). Because the $\omega$ angles are locked into trans configuration, and the $\chi$ angles only affect non-interacting methyl groups, conformers can be defined from only the $\varphi$ and $\psi$ angles.

Theoretical Ala5 has a simple linear structure; neglecting methyl rotations, the conformer is determined by 10 dihedral angles of the backbone. Figure 5.6 shows the definition of all dihedral angles in the molecule.

Because Ala5 is a relatively large molecule, it is not feasible to sample its conformational space with energy-blind methods. Instead, the basin-hopping algorithm scan from the Tinker software was employed to generate conformers. A low-energy subset of these was further optimized with the B3LYP functional and the Jun-cc-pVTZ basis set, and their harmonic frequencies were calculated with the same method. The CBS-4M method was used to compute the Gibbs energies at 400 K, an estimate of the temperature after laser desorption. To quantitatively determine the existence and location of hydrogen bonds (H-bonds), the NCI method [129] was employed.

Figure 5.7 shows the structure of the 8 most stable conformers, which have a combined abundance of 95% and are named A1–A8 after their abundance rank at 400 K. Additionally, Fig. 5.7 shows the H-bonds of each conformer. Most of the listed conformers form a closed loop by connecting the carboxyl group to the amino or carbonyl group of the other end. This is not realistic for a protein, and detracts from Ala5 as a model system for helices.
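Assuming that the abundances are Boltzmann weights of the Gibbs energies (an assumption made for this sketch; the formula is not spelled out in this section), the relative populations at 400 K can be estimated from the energies listed in Fig. 5.7. Only the eight listed conformers are included in the normalization, so the values come out slightly higher than the quoted abundances, which also account for less stable conformers. The function name is illustrative.

```python
import math

R = 8.314462618e-3  # molar gas constant in kJ/(mol K)

def boltzmann_populations(gibbs_kj_mol, temperature_k):
    """Normalized Boltzmann weights for a list of relative Gibbs energies."""
    weights = [math.exp(-g / (R * temperature_k)) for g in gibbs_kj_mol]
    total = sum(weights)
    return [w / total for w in weights]

# Relative Gibbs energies of A1-A8 at 400 K in kJ/mol, as listed in Fig. 5.7
names = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"]
gibbs = [0.0, 0.7, 1.6, 2.9, 3.0, 4.4, 5.2, 5.6]
populations = boltzmann_populations(gibbs, 400.0)
for name, p in zip(names, populations):
    print(f"{name}: {100.0 * p:.1f}%")
```

The small energy gaps relative to $RT \approx 3.3$ kJ/mol at 400 K are what make several conformers comparably populated.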
Figure 5.7: Structure and hydrogen bonds (H-bonds) of the 8 most stable conformers of Ala5. The left column shows the name, Gibbs energy at 400 K in kJ/mol, abundance, and backbone dihedral angles of each conformer. The right column shows the locations of the H-bonds of the corresponding structures. Their strengths are indicated by the color (scale from 0.018 to 0.043 $a_0^{-3}$), and non-covalent interactions just short of the H-bond definition ($\rho \in [15, 18]\times 10^{-3}\,a_0^{-3}$) are dashed. Left-column entries:
A1 (0.0), 25%: T C G⁻ A⁺ G⁺ G⁺ G⁺ C G⁻ A⁺
A2 (0.7), 20%: A⁻ C A⁻ A⁺ G⁺ G⁻ G⁻ G⁺ G⁺ A⁻
A3 (1.6), 15%: A⁻ C G⁻ G⁺ G⁺ G⁻ G⁻ G⁺ G⁺ G⁺
A4 (2.9), 10%: T C A⁻ C G⁻ G⁺ G⁺ G⁻ G⁻ A⁺
A5 (3.0), 10%: A⁻ C A⁻ C G⁻ G⁺ G⁺ G⁻ G⁻ A⁺
A6 (4.4), 6%: A⁺ C G⁻ G⁺ G⁺ A⁻ G⁻ C A⁻ G⁺
A7 (5.2), 5%: A⁻ C G⁻ C A⁻ G⁺ G⁺ G⁻ G⁻ T
A8 (5.6), 5%: A⁻ G⁻ G⁻ G⁺ G⁺ A⁻ G⁻ C A⁻ G⁺

Figure 5.8: Harmonic spectra of the 8 most abundant conformers as colored lines, compared with the experimental IRMPD–VUV spectrum as black lines. The top of the figure lists important vibrational modes at their frequency. The abbreviations used are: wagging (wa), twisting (tw), bending (be), rocking (ro), amide (am), umbrella (um), and scissoring (sc).
Analysis The harmonic spectra of these conformers can be seen in Fig. 5.8, together with the experimental spectrum for comparison. Of particular interest is the carboxyl C=O stretching mode at 1760 cm⁻¹, clearly resolved in experiment but predicted by only A1, A3, and A7. In A1 and A7 the carboxyl group is free to move, and in A3 it is only weakly bound by an H-bond. In all other conformers, the carboxyl group is connected by a strong H-bond to some other group, which causes its stretching frequency to decrease. Another feature of interest is the carboxyl COH bending mode at 1135 cm⁻¹, which is correctly predicted by A1 and A7, but shifted to 1190 cm⁻¹ in A3. Thus there is strong evidence for the presence of A1 and/or A7, with the former being more likely due to the computed energies.

A1 and A7 cannot, however, explain the wide peak at 630 cm⁻¹, likely a combination of several modes due to its width. A3 has many modes in the vicinity, but A2, A6, A7, and A8 are also options. Another feature not explained by any of the mentioned conformers is the lack of absorption at 820 cm⁻¹, where only A4 (and to a lesser degree A8) has a band. It is therefore likely that the experiment contains many conformers, a conclusion consistent with the computed abundances.

Dynamical Spectra In addition to its harmonic analysis, the spectrum of A1 was also determined with the BOMD method using the same functional, though with a smaller basis set: N07D. The BOMD simulations consisted of 30 runs using a time step of 0.5 fs, each lasting for 2–3 ps. The runs were made with Gaussian using the ADMP keyword.

A consequence of the relatively short run duration $\tau$ is that the spectral frequency resolution is limited by $1/(\sqrt{3}\,\tau)$. In the worst case of $\tau$ = 2 ps, this resolution becomes 10 cm⁻¹. Below 1000 cm⁻¹, this value is greater than the 1% linewidth assumed for FELIX, effectively broadening the far-IR BOMD spectra more than experiment.
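The quoted resolution follows from a unit conversion, sketched below. The expression $1/(\sqrt{3}\,\tau)$ is taken from the text as given; the function name is illustrative.

```python
import math

C_CM_PER_S = 2.99792458e10  # speed of light in cm/s

def resolution_wavenumber(duration_s):
    """Frequency resolution 1 / (sqrt(3) * tau), expressed in cm^-1."""
    return 1.0 / (math.sqrt(3.0) * duration_s) / C_CM_PER_S

for tau_ps in (2.0, 3.0):
    print(tau_ps, "ps:", round(resolution_wavenumber(tau_ps * 1e-12), 1), "cm^-1")
```

A 1% linewidth equals 10 cm⁻¹ at 1000 cm⁻¹, which is why the BOMD broadening dominates only below roughly that frequency.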
At temperatures of 50 K (used in Ref. [61]) and 100 K, the results were very similar to the harmonic spectrum, other than the aforementioned BOMD broadening. This can be understood by the fact that the typical deviation from equilibrium is too small to explore anharmonic effects because $k_\mathrm{B}T \ll \hbar\omega$. On the other hand, at 500 K and 1000 K, the molecule eventually or immediately fragmented in simulation. A quick calculation shows that at 400 K the total energy in all 153 vibrational modes of Ala5 would sum to 624 kJ/mol, exceeding the literature [131] C-C bond energy of 346 kJ/mol. This means that fragmentation is permitted but not necessarily probable.

To evaluate whether the time step of 0.5 fs had shifted the simulated BOMD spectra, the effect of time step on simulated frequencies was systematically investigated. As one finds from dimensional analysis, the relative shift depends only on the product $\nu\Delta t$ of the frequency and the time step. It was therefore sufficient to simulate a few small molecules in order to find this relation.

Figure 5.9: Dimensionless BOMD frequency shift of nine small biomolecules. The BOMD frequencies were obtained by fitting Gaussian functions to the BOMD spectra of 1 K simulations. The error bar size is equal to the full width at half maximum of the corresponding peak in its BOMD spectrum. A power law is fitted to the frequency shifts.

Figure 5.9 shows the BOMD frequencies of a few small biomolecules simulated at 1 K with a rather large time step. The dimensionless frequency shift is well described by a power law:

$$ (\nu_\mathrm{D} - \nu_\mathrm{H})\Delta t = 2.95\,(\nu_\mathrm{H}\Delta t)^{3.25}, \tag{5.1} $$

where $\nu_\mathrm{H}$ is the harmonic and $\nu_\mathrm{D}$ is the BOMD frequency. Thus when Ala5 was simulated using a time step of 0.5 fs, the frequency shift of its modes (below 1900 cm⁻¹) was at most 2 cm⁻¹.
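Equation (5.1) can be evaluated directly. The sketch assumes that $\nu$ denotes the ordinary (not angular) frequency, obtained from the wavenumber by multiplying with the speed of light, which is the reading that matches the shifts quoted in the text. The function name is illustrative.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s

def bomd_shift_wavenumber(harmonic_wavenumber, time_step_s):
    """Shift nu_D - nu_H in cm^-1 predicted by the power law of Eq. (5.1)."""
    # Dimensionless product nu_H * dt, with nu_H converted from cm^-1 to Hz
    x = harmonic_wavenumber * C_CM_PER_S * time_step_s
    shift_times_dt = 2.95 * x ** 3.25
    return shift_times_dt / time_step_s / C_CM_PER_S

for dt_fs in (0.5, 1.0):
    shift = bomd_shift_wavenumber(1900.0, dt_fs * 1e-15)
    print(f"dt = {dt_fs} fs: shift at 1900 cm^-1 = {shift:.2f} cm^-1")
```

Since the shift grows with the 3.25th power of the frequency, the value at the upper end of the scanned range bounds the shift of all lower modes.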
Had the time step been 1 fs, the frequency shift would have been up to 9 cm⁻¹, a noticeable amount. In principle, Eq. (5.1) could be used to “unshift” the BOMD spectra, but this was not done.

Summary The IRMPD–VUV spectrum of Ala5, the largest molecule studied with this method, was determined in the frequency range of 340–1820 cm⁻¹. A theoretical investigation of the molecule predicted many populated conformers, and a comparison of spectra confirmed this. Specifically, the carboxyl group was very informative when inferring the structure, but also caused a loop-like rather than a helix-like structure. Future experiments could cap this functional group to improve the chances of seeing a helix.

On a side note, the BOMD method was used to predict spectra, but did not offer better results than harmonic theory, in spite of its larger computational cost. The effect of time step on BOMD frequencies was investigated and found to be systematic, implying that frequency errors are correctable.

5.3 IR–UV Ion Dip Spectroscopy of Phenylated Polyalanines

This section summarizes the findings of Ref. [4], which intended to overcome difficulties encountered in the study of Ala5. One of the conclusions of that study was that Ala5 adopted a mixture of conformers. Because that experiment was not conformer selective, it was difficult to say exactly which conformers were present. The present paper solves this by replacing one alanine residue with phenylalanine. Another problem was that the carboxyl group had a tendency to form strong H-bonds with the opposite end, and in doing so prevented a helical structure. Hence, caps have been added to both termini.

The present paper studied the two capped phenylated polyalanine peptides Ac-Ala-Ala-Phe-Ala-NH2 (AAFA) and Ac-Ala-Ala-Phe-Ala-Ala-NH2 (AAFAA).
Utilizing the added phenyl group, conformer-specific IR–UV ion dip spectroscopy was performed in the frequency range of 300–1900 cm⁻¹ using FELIX and also 3200–3600 cm⁻¹ using a table-top OPO laser.

Theoretical An extensive conformational search was carried out for both AAFA and AAFAA, the results of which are seen in Fig. 5.10. To begin with, four parallel conformer searches were executed for each molecule. The purpose of this redundancy was to estimate the risk of missing conformers in the searches. Each search consisted of the basin-hopping Tinker scan generating 25 000 conformers. All found conformers were then optimized with the semi-empirical PM7 method, which has a better correlation with DFT. The most stable conformers, meaning those with energy within 21 kJ/mol = $k_\mathrm{B} \cdot 2500$ K of the minimum, were further optimized with B3LYP/Jun-cc-pVTZ and finally evaluated with CBS-4M.

Figure 5.10: Selected conformers of AAFA and AAFAA. Their corresponding Gibbs energies are given in kJ/mol. AAFAA-18 is included because of its predicted spectrum. (Panels: AAFA-1 (0.00), AAFA-2 (23.85), AAFA-3 (26.29), AAFA-4 (26.49), AAFAA-1 (0.00), AAFAA-2 (3.36), AAFAA-3 (3.99), AAFAA-18 (23.82).)

Figure 5.11: The strongly dominant conformer AAFA-1, as a 3D structure and as a skeletal formula. The bright cyan lines show the H-bonds that stabilize its β-hairpin structure.

The results for AAFA were remarkable: within 1000 iterations, each search found the same extremely stable conformer, named AAFA-1 and seen in Fig. 5.11. This conformer has a Gibbs energy 24 kJ/mol lower than any other, because of its four strong H-bonds, a tetramer configuration known as β-hairpin. [28, 132]

On the other hand, the results for AAFAA were not as clear.
The parallel conformer searches did not produce a common most stable (lowest Gibbs energy) conformer, suggesting that 25 000 iterations were insufficient to reliably sample the conformational space of AAFAA. For this reason, a new generation of four more searches of 25 000 iterations each was performed. This new generation produced a new most stable conformer, but simultaneously failed to find the top three most stable conformers from the previous generation. It was clear that this type of search was not reliable for AAFAA, and other strategies were considered.

Another conformer search was then performed in the CREST software, which uses a sophisticated replica-exchange MD algorithm. As before, this conformer search found a new most stable conformer, but failed to find other low-energy conformers. In fact, each of the top three most stable conformers had been found only once, in three different searches. At this point, the non-reproducible nature of the searches became an interesting result in and of itself.

To better understand why the conformational searches of AAFAA were failing, another generation of four searches of 25 000 iterations each was carried out for the capped pentaalanine peptide (AAAAA). These were similarly inefficient at finding a most stable conformer, likely due to the large size of the molecule.

In a final attempt to find AAFAA conformers, phenyl groups were substituted onto the central residue of the nine most stable AAAAA conformers, and each resulting molecule was optimized using B3LYP/Jun-cc-pVTZ. This resulted in AAFAA conformers with backbone structures very similar to those of the AAAAA originals. Unfortunately, none of the conformers produced were significant. Moreover, as seen in Fig. 5.12, the addition of the phenyl group did not preserve conformer rank.

[Plot for Fig. 5.12: series AAAAA and AAFAA; x-axis: rank of initial AAAAA conformer, 1–9.]

Figure 5.12: Computed energies of AAAAA and the corresponding AAFAA conformers.
B3LYP was used to compute the electronic energy with added zero-point correction (EE+ZPE, in kJ/mol). The computation for the phenylated version of AAAAA-4 did not converge in time.

Experimental

The experimental setup was the same as for Ala5, except that the VUV laser setup was replaced by a tunable UV laser setup, covering the range of 268.3–262.4 nm = 37 280–38 120 cm−1. To begin with, a REMPI scan was performed in this interval for both molecules, revealing resonances (3 for AAFA, 1 for AAFAA) in a subset of the range, shown in Fig. 5.13.

The UV laser was then parked on each REMPI resonance while the IR laser scanned its entire range. Figure 5.14 shows the resulting three IR–UV ion dip spectra of AAFA, one for each resonance in its REMPI spectrum. The three spectra are extremely similar, indicating that all three resonances come from the same conformer. This is consistent with the prediction of a dominant conformer.

Looking back at Fig. 5.13, the leftmost peak at 𝜈0 = 37 498 cm−1 is believed to be the fundamental transition, and the other two are most likely vibrational overtones at 37 515 cm−1 = 𝜈0 + 17 cm−1 and 37 530 cm−1 = 𝜈0 + 32 cm−1. Indeed, the ground state vibrational spectrum calculated with B3LYP/Jun-cc-pVTZ contains two low frequencies: 17.0 and 32.3 cm−1.

Figure 5.13: Experimental REMPI spectra of AAFA and AAFAA. The curves show the ion yields of the molecules, averaged over multiple scans. Each of the labeled peaks in the AAFA spectrum was used for ion dip spectroscopy. Axes: UV frequency (cm−1) versus ion yield (arb.). Labeled peaks: AAFA at 37 498, 37 515, and 37 530 cm−1; AAFAA at 37 562 cm−1.

[Plot for Fig. 5.14: x-axis IR frequency (cm−1); legend: UV frequency 37 498, 37 515, and 37 530 cm−1.]

Figure 5.14: Experimental IR–UV ion dip spectra of AAFA.
Red, yellow, and green traces correspond to the UV frequencies given in the legend. Above 3000 cm−1, the spectra are vertically offset to reveal otherwise identical features. Below 800 cm−1, there is a difference in saturation, possibly because the IR intensity was not correctly recorded. Overall, the spectra are very similar and most likely originate from the same conformer.

Analysis

Figure 5.15 shows the predicted harmonic spectra compared to the ion dip experiment for both molecules.

Starting with AAFA, where the conformational search was unambiguous, the agreement between experiment and theory is quite good. In the stretching region above 3000 cm−1, four out of five predicted bands align with experiment. The erring band is an NH stretching mode at 3295 cm−1, corresponding to the N-terminal NH group. This group participates in the strongest H-bond in the molecule, possibly making its potential anharmonic and causing the harmonic prediction to fail.

Continuing with AAFA, the agreement is more difficult to assess below 2000 cm−1 because of the sheer number of features. Most major features are accounted for by the predicted harmonic spectrum of AAFA-1, and this conformer certainly fits the spectrum better than the alternatives. Overall, AAFA-1 was correctly predicted to be the dominant conformer and matches the experimental ion dip spectrum.

The dominant conformer AAFA-1 adopts a β-hairpin structure, characterized by a β-turn in the middle. Phenylated alanine tripeptides also include this β-turn, [50] but longer alanine peptides are expected to form α-helices. [133]

Looking at AAFAA, the agreement is worse. The experimental spectrum contains a band at 3510 cm−1, believed to be a free or weakly interacting NH2 stretch. The most stable conformer with this feature is AAFAA-18. Even if its high energy is overlooked, AAFAA-18 is not a great fit for the rest of the spectrum. Therefore, it is likely that the conformer searches of AAFAA failed to find the true conformer seen in experiment.

Figure 5.15: Harmonic spectra of conformers of AAFA and AAFAA as colored lines versus experimental IR–UV ion dip spectra in black. The harmonic spectra are calculated with B3LYP/Jun-cc-pVTZ and scaled by 0.980 below 2000 cm−1 and 0.955 above. Panels: AAFA-1, AAFA-2, AAFA-3, AAFA-4, AAFAA-1, AAFAA-2, AAFAA-3, AAFAA-18, each compared with experiment. Axes: IR frequency (cm−1) versus absorption (arb.).

The failure of large conformer searches to fully explore AAFAA raises questions about how many iterations are needed to explore a given molecule, or more generally about when a conformer search can be said to have converged. The present paper used a naive strategy of running multiple searches in parallel and requiring their results to be similar to each other. This is computationally inefficient because the same structures are optimized and explored multiple times. Instead, the search algorithm could be rewritten to remember how many times each conformer has been found, and to terminate only once all of the most stable conformers have been found multiple times.

Summary

The IR–UV ion dip spectra of AAFA and AAFAA were successfully recorded in the frequency ranges of 300–1900 and 3200–3600 cm−1.
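The count-based stopping rule proposed in the convergence discussion of this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the Tinker or CREST implementation: each search iteration is assumed to yield a hashable conformer label together with an energy, and the function `search_until_converged` and its parameters `top_k` and `min_hits` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def search_until_converged(
    samples: Iterable[Tuple[Hashable, float]],
    top_k: int = 3,
    min_hits: int = 2,
) -> Tuple[Optional[int], List[Hashable]]:
    """Consume (conformer_id, energy) pairs until each of the top_k
    lowest-energy conformers has been found at least min_hits times.

    Returns the iteration at which the rule was met (None if never met)
    and the current list of the top_k lowest-energy conformers.
    """
    hits: Counter = Counter()
    energy: Dict[Hashable, float] = {}
    best: List[Hashable] = []
    for iteration, (conformer_id, e) in enumerate(samples, start=1):
        hits[conformer_id] += 1          # remember how often it was found
        energy[conformer_id] = e
        best = sorted(energy, key=energy.get)[:top_k]
        if len(best) == top_k and all(hits[c] >= min_hits for c in best):
            return iteration, best
    return None, best


# Toy stream of search results (labels and energies are made up).
toy_run = [("a", 0.0), ("b", 1.0), ("a", 0.0), ("c", 2.0), ("b", 1.0), ("c", 2.0)]
stopped_at, stable = search_until_converged(toy_run, top_k=2, min_hits=2)
print(stopped_at, stable)
```

A small `min_hits` can still stop the search before a deeper minimum has been visited at all, so such a rule is a heuristic measure of convergence rather than a guarantee.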
Ironically, both molecules adopted a single dominant conformer, so the conformer-specific nature of the experiment was unnecessary. The dominant conformer of AAFA was found to be a β-hairpin structure common to multiple tetramers, and was validated on both Gibbs energy and vibrational spectrum.

The slightly larger AAFAA molecule proved a challenge to analyze. Multiple conformer searches, totalling 𝑁 > 200 000 iterations, were carried out, but the results were not conclusive. From comparison with experiment, it seems likely that the true conformer was not found. This is a sobering result about the validity of conformer searches on large molecules, which frequently employ far fewer iterations.

Corresponding AAFAA and AAAAA conformers were compared and found to have differing energy ranks, implying that the addition of a phenyl group changes which conformers are the most stable in a molecule. This is interesting because phenyl groups are often added onto molecules to enable IR–UV ion dip spectroscopy, but the conclusions about such phenylated molecules should not immediately be transferred to their original counterparts.

6 Conclusions and Outlook

This chapter briefly presents the conclusions of the experimental papers and the theoretical methods. The implications of the latter for the field are discussed. Also, the present state of the local laboratory is summarized.

Experimental

Four spectroscopy techniques were applied in the experiments of this thesis. IRMPD and IRMPD–VUV spectroscopy can be applied to arbitrary charged and neutral molecules, respectively, in contrast to REMPI and IR–UV ion dip spectroscopy, which both require a chromophore. The three IR techniques are, in tandem with theoretical predictions, sensitive to the molecular structure. In particular, positions of IR bands in the mid-IR range reveal the chemical environments of functional groups, which allows many conformers to be excluded.
Furthermore, the spectrum in the far-IR range is a fingerprint that allows a specific conformer to be confirmed.

The IRMPD study of labeled and unlabeled asparagine dimers [1] demonstrated how the vibrational modes of a molecule can be unambiguously assigned to experimental bands. By tracing how the frequency changed minutely as a function of mass labeling, it was possible to tell how much the labeled atom species participated in a given band. The study also demonstrated how the IR spectra of two similar molecules could be recorded simultaneously, eliminating laser fluctuations as a source of false differences.

The experimental investigation of pentaalanine [2] stressed a known weakness of the IRMPD–VUV method: that it is not conformer-selective. Because of this and the many populated conformers, it was not possible to determine precisely which conformers were actually seen in experiment. The vibrational modes of the unique carboxyl group gave information about its chemical environment. This was helpful for excluding conformers, but a peptide with a free carboxyl group is not a realistic model system for proteins.

Finally, the IR–UV ion dip study of capped phenylated polyalanines [4] gave mixed results. Focusing on the tetramer for the moment, the experiment and theory both found a strongly dominant conformer containing a central β-turn. Its structure is common to other tetramers, and is known as a β-hairpin. [132] This is distinct from the α-helix structure that longer polyalanines are predicted to adopt, meaning that future attempts to create gas-phase helices should use longer peptides.

Theoretical

The conformer search of AAFAA revealed that more than 200 000 iterations were not enough to sufficiently sample the conformational space. This finding is supported by two facts: that the most stable conformers were found only once across several parallel runs, and that not one of their spectra matches the experiment.
It is a troubling message, as many studies have been made with considerably fewer iterations, and may be similarly incomplete.

With the knowledge of this pathology in conformer searching, future similar searches should include a test of method convergence. The study of AAFAA used a naive test: comparing the results of parallel runs. This is computationally inefficient because it causes redundant calculations, but it does work. A better strategy would be to extend the basin-hopping method to track not only which conformers have been found, but also how many times they have been found. The latter could then be used as a measure of convergence of the search.

The harmonic predictions of spectra using B3LYP or similar functionals are generally quite good, and capture the differences between conformers, as seen in the experimental papers. However, harmonic theory does fail to describe some vibrational modes involving groups that participate in a strong H-bond, such as the protonated amino group in the proton-bound dimer of asparagine. Most notably, the stretching mode of this group was several hundred cm−1 broad, clearly not describable by a single band. Also, its umbrella inversion mode was consistently mispredicted by harmonic theory.

The anharmonic VPT2 method does generally increase the accuracy of individual mode frequencies, and does not require an arbitrary scaling factor. Unfortunately, it occasionally produces unphysically strong IR intensities, which worsens the overall agreement with experiment. Because of the unpredictable nature of this method, it could not be used to compare conformers.

The BOMD method for spectrum generation was employed for pentaalanine, where it resulted in spectra similar to harmonic ones, and was subsequently critically analyzed. One conclusion from this analysis is that requiring BOMD to accurately describe anharmonic shifts of fundamental frequencies results in multiple contradictory constraints on the temperature.
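For a sense of scale regarding such temperature constraints, the characteristic temperature ℏ𝜔/𝑘B of a vibrational mode can be evaluated directly from its wavenumber. The two wavenumbers below are illustrative choices (roughly a C=O stretch and a free NH stretch), not values taken from the analysis itself.

```latex
T_\omega = \frac{\hbar\omega}{k_\mathrm{B}}
         = \frac{h c\,\tilde{\nu}}{k_\mathrm{B}}
         \approx 1.4388\ \mathrm{K\,cm} \times \tilde{\nu},
\qquad
\tilde{\nu} = 1700\ \mathrm{cm^{-1}} \;\Rightarrow\; T_\omega \approx 2446\ \mathrm{K},
\qquad
\tilde{\nu} = 3500\ \mathrm{cm^{-1}} \;\Rightarrow\; T_\omega \approx 5036\ \mathrm{K}.
```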
Specifically, each vibrational mode should have a distinct temperature of ℏ𝜔/𝑘B, which is hot enough to make the molecule dissociate. Another negative finding is that the BOMD method is incorrect for overtones even in diatomic systems, where it incorrectly predicts the first overtone to be exactly twice the fundamental.

On a more positive note, the analysis of the BOMD method found that the frequency shifts due to the choice of time step are predictable and easily corrected, allowing accelerated simulations. In fact, the shifts are described by a universal relation dependent only on the integration scheme. Such a relation can be obtained by simulating a plain harmonic oscillator with the given integration scheme, and can then be applied in post-processing of BOMD spectra to reduce the error. The universal relations of the Verlet integrator and the integration scheme of the Gaussian ADMP method were provided in this work.

Laboratory

Part of this PhD project has been about building a chamber setup for experiments in the local GU laboratory, now known as the Giraffe. When I started out, there were design plans inspired by a setup at FELIX, but nothing had been built. Since then, the design plans have been extended, the Giraffe has been assembled, and REMPI experiments have been performed with it.

The Giraffe features a switchable source system consisting of a gas source, an oven source, and a MALDI source. Molecules produced from the selected source are seeded into a supersonic jet expansion, and form a collimated molecular beam after passing through a skimmer. Optical windows allow the molecular beam to be irradiated, and any ions produced are accelerated into a reflectron-type TOF mass spectrometer.

A REMPI experiment on toluene was performed using the gas source, with the purpose of testing the mass spectrometer. The resulting mass spectrum showed the toluene peak at 92 Da with an FWHM of 0.043 Da.
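As a quick check of these numbers, taking the mass resolving power as peak position over FWHM (the usual definition, assumed here) gives:

```latex
\frac{m}{\Delta m} = \frac{92\ \mathrm{Da}}{0.043\ \mathrm{Da}} \approx 2.1 \times 10^{3}
```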
This implies a resolution greater than 1 in 2000, sufficient for the study of peptides with fewer than 20 residues.

In May 2025, a new IR laser (LaserVision) was delivered to the laboratory; at the time of writing, it is about to be installed. When this laser is operational, the current REMPI setup will be able to perform IR–UV ion dip spectroscopy, enabling conformer-specific studies of small biomolecules. I hope that this development will be used by future students to advance the research of our group.

Acknowledgements

My five years as a PhD student have been a transformative experience filled with learning, possible only thanks to some inspiring people I have met in Gothenburg and while traveling. I will attempt to list the most prominent.

Although we only worked together for a brief time, Vasyl Yatsyna taught me much about spectroscopy experiments and supported me in designing the Giraffe. His PhD defense was the first I attended, and it provided me with valuable insights. On the theoretical side, Mathias Poline from Uppsala helped me understand conformer searching and optimization during the two papers we collaborated on.

In the local laboratory, the specific expertise of Ruslan Chulkov was invaluable for understanding and optimizing our laser systems. Di Lu was always very helpful in answering questions about and finding design solutions for vacuum chambers. Manufacturing parts for the Giraffe was made possible thanks to Jan-Åke Wiman and his workshop. Installing high-voltage electrodes and water-cooling in turbo pumps was safely and reliably managed by Mats Rostedt. Development of the switchable sources was greatly aided by Viola D’mello.

Many of the experiments were performed in labs outside of Sweden. At the ELETTRA synchrotron facility, I learned about coincidence spectroscopy data analysis from Robert Richter. My understanding of statistical physics of molecules and nanoparticles came from Klavs Hansen in the time we worked together.
At the FELIX facility, the hard work of Kas Houthuijs and later Piero Ferrari made our experiments possible. The knowledge of Jos Oomens was also very useful.

In parallel with my research, I have taught in courses at Chalmers and GU. Thanks to Thomas Wernstål I was able to obtain teaching roles in mathematics, notably a course by Dennis Eriksson, whom I returned to work with many times. Elisabeth Eriksson was always helpful in finding substitute teaching opportunities and lending course literature.

For ten years I have been active in the International Physicists’ Tournament, a physics competition to which Andreas Isacsson initially introduced me. Thanks to him, Jana Madjarova, and many others, this competition has expanded from a student activity to a supervised course.

Finally, I owe the most to Vitali Zhaunerchyk, without whom I would not have met half of the listed people. He has been instrumental in all parts of my education as a researcher, and I am deeply appreciative of all his contributions. Thank you, Vitali!

Bibliography

[1] Åke Andersson et al. “IRMPD Spectroscopy of Homo- and Heterochiral Asparagine Proton-Bound Dimers in the Gas Phase”. In: The Journal of Physical Chemistry A 125.34 (2021), pp. 7449–7456. doi: https://doi.org/10.1021/acs.jpca.1c05667.
[2] Åke Andersson et al. “Indication of 3₁₀-Helix Structure in Gas-Phase Neutral Pentaalanine”. In: The Journal of Physical Chemistry A 127.4 (2023), pp. 938–945. doi: https://doi.org/10.1021/acs.jpca.2c07863.
[3] Åke Andersson and Vitali Zhaunerchyk. “IR Spectroscopic Studies of Gas-Phase Peptides”. In: submission to publisher (2024).
[4] Åke Andersson et al. “IR-UV Ion-Dip Spectroscopy of Capped Phenylated Polyalanines in the Gas Phase”. In: manuscript (2025).
[5] Åke Andersson et al. “Structure of Proton-Bound Methionine and Tryptophan Dimers in the Gas Phase Investigated with IRMPD Spectroscopy and Quantum Chemical Calculations”.
In: The Journal of Physical Chemistry A 124.12 (2020), pp. 2408–2415. doi: https://doi.org/10.1021/acs.jpca.9b11811.
[6] Åke Andersson et al. “Single-Photon Hot-Electron Ionization of C70”. In: Physical Review A 107.1 (2023), p. 013103. doi: https://doi.org/10.1103/PhysRevA.107.013103.
[7] Åke Andersson. “Comment on “Cumulant Mapping As the Basis of Multi-Dimensional Spectrometry” by Leszek J. Frasinski, Phys. Chem. Chem. Phys., 2022, 24, 20776–20787”. In: Phys. Chem. Chem. Phys. 25.47 (2023), pp. 32723–32725. doi: https://doi.org/10.1039/D3CP02525J.
[8] Anna Brandt, Åke Andersson, and Vitali Zhaunerchyk. “Parametric Cumulant Mapping: A Multidimensional Correlation Method Intended for Experiments with Fluctuating Event Rates”. In: submission to Physical Review B (2025).
[9] Ken A. Dill et al. “The Protein Folding Problem”. In: Annual review of biophysics 37 (June 2008), pp. 289–316. doi: 10.1146/annurev.biophys.37.092707.153558.
[10] Michael J. S. Dewar et al. “Development and use of quantum mechanical molecular models. 76. AM1: a new general purpose quantum mechanical molecular model”. In: J. Am. Chem. Soc. 107.13 (June 1985), pp. 3902–3909. doi: 10.1021/ja00299a024.
[11] Gerd B. Rocha et al. “RM1: A reparameterization of AM1 for H, C, N, O, P, S, F, Cl, Br, and I”. In: J. Comput. Chem. 27.10 (2006), pp. 1101–1111. doi: 10.1002/jcc.20425.
[12] Norman L. Allinger, Young H. Yuh, and Jenn Huei Lii. “Molecular mechanics. The MM3 force field for hydrocarbons. 1”. In: J. Am. Chem. Soc. 111.23 (Nov. 1989), pp. 8551–8566. doi: 10.1021/ja00205a001.
[13] James J. P. Stewart. “Application of localized molecular orbitals to the solution of semiempirical self-consistent field equations”. In: International Journal of Quantum Chemistry 58.2 (1996), pp. 133–146. doi: 10.1002/(sici)1097-461x(1996)58:2<133::aid-qua2>3.0.co;2-z.
[14] James J. P. Stewart.
“Optimization of parameters for semiempirical methods V: Modification of NDDO approximations and application to 70 elements”. In: Journal of Molecular Modeling 13.12 (Dec. 2007), pp. 1173–1213. doi: 10.1007/s00894-007-0233-4.
[15] Jae Shick Yang et al. “All-Atom Ab Initio Folding of a Diverse Set of Proteins”. In: Structure 15.1 (Jan. 2007), pp. 53–63. doi: 10.1016/j.str.2006.11.010.
[16] James J. P. Stewart. “Application of the PM6 method to modeling proteins”. In: J. Mol. Model. 15.7 (July 2009), pp. 765–805. doi: 10.1007/s00894-008-0420-y.
[17] John Jumper et al. “Highly accurate protein structure prediction with AlphaFold”. In: Nature 596.7873 (Aug. 2021), pp. 583–589. doi: 10.1038/s41586-021-03819-2.
[18] John Jumper et al. “Applying and improving AlphaFold at CASP14”. In: Proteins: Struct., Funct., Bioinf. 89.12 (2021), pp. 1711–1721. doi: 10.1002/prot.26257.
[19] Eli Fritz McDonald et al. “Benchmarking AlphaFold2 on peptide structure prediction”. In: Structure 31.1 (Jan. 2023), 111–119.e2. doi: 10.1016/j.str.2022.11.012.
[20] Andreas Barth. “Infrared spectroscopy of proteins”. In: Biochimica et Biophysica Acta (BBA) - Bioenergetics 1767.9 (Sept. 2007), pp. 1073–1101. doi: 10.1016/j.bbabio.2007.06.004.
[21] Anouk M. Rijs and Jos Oomens. “IR Spectroscopic Techniques to Study Isolated Biomolecules”. In: Gas-Phase IR Spectroscopy and Structure of Biological Molecules (2014), pp. 1–42. doi: 10.1007/128_2014_621.
[22] Jérôme Mahé et al. “Can far-IR action spectroscopy combined with BOMD simulations be conformation selective?” In: Phys. Chem. Chem. Phys. PCCP 17.39 (Sept. 2015), pp. 25905–25914. doi: 10.1039/c5cp01518a.
[23] Sjors Bakels, Marie-Pierre Gaigeot, and Anouk M. Rijs. “Gas-Phase Infrared Spectroscopy of Neutral Peptides: Insights from the Far-IR and THz Domain”. In: Chem. Rev. 120.7 (Apr. 2020), pp. 3233–3260. doi: 10.1021/acs.chemrev.9b00547.
[24] Samantha Hume et al.
“2D-Infrared Spectroscopy of Proteins in Water: Using the Solvent Thermal Response as an Internal Standard”. In: Analytical Chemistry 92.4 (Feb. 2020), pp. 3463–3469. doi: 10.1021/acs.analchem.9b05601.
[25] Peter Hamm and Martin T. Zanni. “Ultrafast Two-Dimensional Infrared Spectroscopy of Proteins”. In: ed. by Gordon C. K. Roberts. Berlin, Heidelberg: Springer, 2013, pp. 2692–2697. doi: 10.1007/978-3-642-16712-6_130.
[26] Michael Feig and Charles L Brooks. “Recent advances in the development and application of implicit solvent models in biomolecule simulations”. In: Current Opinion in Structural Biology 14.2 (Apr. 2004), pp. 217–224. doi: 10.1016/j.sbi.2004.03.009.
[27] Jens Kleinjung and Franca Fraternali. “Design and application of implicit solvent models in biomolecular simulations”. In: Current Opinion in Structural Biology. Theory and simulation / Macromolecular machines 25 (Apr. 2014), pp. 126–134. doi: 10.1016/j.sbi.2014.04.003.
[28] Eric Gloaguen and Michel Mons. “Isolated Neutral Peptides”. In: ed. by Anouk M. Rijs and Jos Oomens. Vol. 364. Topics in Current Chemistry. Cham: Springer International Publishing, Jan. 2015, pp. 225–270. doi: 10.1007/128_2014_580.
[29] Matthias Wilm. “Principles of Electrospray Ionization”. In: Mol. Cell. Proteomics 10.7 (July 2011), p. M111.009407. doi: 10.1074/mcp.m111.009407.
[30] Klaus Dreisewerd et al. “Fundamentals of matrix-assisted laser desorption/ionization mass spectrometry with pulsed infrared lasers”. In: International Journal of Mass Spectrometry. Special Issue: In honour of Franz Hillenkamp 226.1 (Mar. 2003), pp. 189–209. doi: 10.1016/S1387-3806(02)00977-6.
[31] Markus Gerhards. “Spectroscopy of Neutral Peptides in the Gas Phase: Structure, Reactivity, Microsolvation, Molecular Recognition”. In: Principles of Mass Spectrometry Applied to Biomolecules. John Wiley & Sons, Ltd, 2006. Chap. 1, pp. 1–61. doi: 10.1002/047005042X.ch1.
[32] Holger Fricke et al.
“Investigations of the water clusters of the protected amino acid Ac-Phe-OMe by applying IR/UV double resonance spectroscopy: microsolvation of the backbone”. In: Phys. Chem. Chem. Phys. PCCP 12.14 (Mar. 2010), pp. 3511–3521. doi: 10.1039/c000424c.
[33] Yasaman Jami Alahmadi, Ameneh Gholami, and Travis D. Fridgen. “The protonated and sodiated dimers of proline studied by IRMPD spectroscopy in the N–H and O–H stretching region and computational methods”. In: Phys. Chem. Chem. Phys. 16.48 (2014), pp. 26855–26863. doi: 10.1039/C4CP03104K.
[34] Matthew F. Bush et al. “Effects of alkaline earth metal ion complexation on amino acid zwitterion stability: results from infrared action spectroscopy”. In: J. Am. Chem. Soc. 130.20 (2008), pp. 6463–6471. doi: 10.1021/ja711343q.
[35] Ruxia Feng, Hong Yin, and Xianglei Kong. “Structure of Protonated Tryptophan Dimer in the Gas Phase Investigated by IRPD Spectroscopy and Theoretical Calculations”. In: Rapid Commun. Mass Spectrom. 30 (2016), pp. 24–28. doi: 10.1002/rcm.7615.
[36] Oscar Hernandez, Béla Paizs, and Philippe Maître. “Rearrangement Chemistry of a𝑛 Ions Probed by IR Spectroscopy”. In: Int. J. Mass Spectrom. Special Issue: MS 1960 to Now 377 (Feb. 2015), pp. 172–178. doi: 10.1016/j.ijms.2014.08.008.
[37] Ronghu Wu and Terry B. McMahon. “An Investigation of Protonation Sites and Conformations of Protonated Amino Acids by IRMPD Spectroscopy”. In: ChemPhysChem 9.18 (2008), pp. 2826–2835. doi: 10.1002/cphc.200800543.
[38] Hong Yin and Xianglei Kong. “Structure of Protonated Threonine Dimers in the Gas Phase: Salt-Bridged or Charge-Solvated?” In: J. Am. Soc. Mass Spectr. 26.9 (Sept. 2015), pp. 1455–1461. doi: 10.1007/s13361-015-1194-y.
[39] Giel Berden et al. “An automatic variable laser attenuator for IRMPD spectroscopy and analysis of power-dependence in fragmentation spectra”. In: Int. J. Mass Spectrom. 443 (2019), pp. 1–8. doi: 10.1016/j.ijms.2019.05.013.
[40] Pascal Parneix, Marie Basire, and Florent Calvo. “Accurate modeling of infrared multiple photon dissociation spectra: the dynamical role of anharmonicities”. In: J. Phys. Chem. A 117.19 (2013), pp. 3954–3959. doi: 10.1021/jp402459f.
[41] Vasyl Yatsyna et al. “Infrared action spectroscopy of low-temperature neutral gas-phase molecules of arbitrary structure”. In: Phys. Rev. Lett. 117.11 (2016), p. 118101. doi: 10.1103/physrevlett.117.118101.
[42] Vasyl Yatsyna et al. “Conformational Preferences of Isolated Glycylglycine (Gly-Gly) Investigated with IRMPD-VUV Action Spectroscopy and Advanced Computational Approaches”. In: J. Phys. Chem. A 123.4 (2019), pp. 862–872. doi: 10.1021/acs.jpca.8b10881.
[43] Vasyl Yatsyna et al. “Competition between folded and extended structures of alanylalanine (Ala-Ala) in a molecular beam”. In: Phys. Chem. Chem. Phys. PCCP 21.26 (2019), pp. 14126–14132. doi: 10.1039/c9cp00140a.
[44] D. Řeha et al. “Structure and IR Spectrum of Phenylalanyl–Glycyl–Glycine Tripetide in the Gas-Phase: IR/UV Experiments, Ab Initio Quantum Chemical Calculations, and Molecular Dynamic Simulations”. In: Chemistry – A European Journal 11.23 (Nov. 2005), pp. 6803–6817. doi: 10.1002/chem.200500465.
[45] H. Fricke et al. “Secondary structure binding motifs of the jet cooled tetrapeptide model Ac–Leu–Val–Tyr(Me)–NHMe”. In: Phys. Chem. Chem. Phys. PCCP 9.32 (Aug. 2007), pp. 4592–4597. doi: 10.1039/b706519a.
[46] I Hünig and K Kleinermanns. “Conformers of the peptides glycine-tryptophan, tryptophan-glycine and tryptophan-glycine-glycine as revealed by double resonance laser spectroscopy”. In: Phys. Chem. Chem. Phys. PCCP 6.10 (May 2004), pp. 2650–2658. doi: 10.1039/b316295h.
[47] Karl N. Blodgett et al. “Conformation-Specific Spectroscopy of Asparagine-Containing Peptides: Influence of Single and Adjacent Asn Residues on Inherent Conformational Preferences”. In: J. Phys. Chem. A 122.44 (Nov. 2018), pp.
8762–8775. doi: 10.1021/acs.jpca.8b08418.
[48] Joost M. Bakker et al. “Folding Structures of Isolated Peptides as Revealed by Gas-Phase Mid-Infrared Spectroscopy”. In: ChemPhysChem 6.1 (Jan. 2005), pp. 120–128. doi: 10.1002/cphc.200400345.
[49] William H. James III et al. “Competition between Amide Stacking and Intramolecular H Bonds in γ-Peptide Derivatives: Controlling Nearest-Neighbor Preferences”. In: J. Phys. Chem. A 115.43 (Nov. 2011), pp. 11960–11970. doi: 10.1021/jp2081319.
[50] Wutharath Chin et al. “Gas Phase Formation of a 3₁₀-Helix in a Three-Residue Peptide Chain: Role of Side Chain-Backbone Interactions as Evidenced by IR−UV Double Resonance Experiments”. In: J. Am. Chem. Soc. 127.34 (Aug. 2005), pp. 11900–11901. doi: 10.1021/ja052894z.
[51] Woon Yong Sohn et al. “Local NH–𝜋 interactions involving aromatic residues of proteins: influence of backbone conformation and 𝜋𝜋* excitation on the 𝜋 H-bond strength, as revealed from studies of isolated model peptides”. In: Phys. Chem. Chem. Phys. PCCP 18.43 (2016), pp. 29969–29978.
[52] Noel M. O’Boyle et al. “Confab - Systematic generation of diverse low-energy conformers”. In: J. Cheminf. 3.1 (Mar. 2011), p. 8. doi: 10.1186/1758-2946-3-8.
[53] G. Landrum. RDKit. https://www.rdkit.org/. 2010.
[54] Joshua A. Rackers et al. “Tinker 8: Software Tools for Molecular Design”. In: Journal of chemical theory and computation 14.10 (Oct. 2018), pp. 5273–5289. doi: 10.1021/acs.jctc.8b00529.
[55] Philipp Pracht et al. “CREST—A program for the exploration of low-energy molecular chemical space”. In: J. Chem. Phys. 160.11 (Mar. 2024). doi: 10.1063/5.0197592.
[56] W. Kohn and L. J. Sham. “Self-Consistent Equations Including Exchange and Correlation Effects”. In: Physical Review 140.4A (Nov. 1965), A1133–A1138. doi: 10.1103/PhysRev.140.A1133.
[57] Peter R. Franke, John F. Stanton, and Gary E. Douberly.
“How to VPT2: Accurate and Intuitive Simulations of CH Stretching Infrared Spectra Using VPT2+K with Large Effective Hamiltonian Resonance Treatments”. In: The Journal of Physical Chemistry A 125.6 (Feb. 2021), pp. 1301–1324. doi: 10.1021/acs.jpca.0c09526.
[58] Mathew D. Halls, Julia Velkovski, and H. Bernhard Schlegel. “Harmonic frequency scaling factors for Hartree-Fock, S-VWN, B-LYP, B3-LYP, B3-PW91 and MP2 with the Sadlej pVTZ electric property basis set”. In: Theor. Chem. Acc. 105.6 (2001), pp. 413–421. doi: 10.1007/s002140000204.
[59] Ruiqin Xu et al. “Harmonic and anharmonic vibrational computations for biomolecular building blocks: Benchmarking DFT and basis sets by theoretical and experimental IR spectrum of glycine conformers”. In: Journal of Computational Chemistry 45.21 (2024), pp. 1846–1869. doi: 10.1002/jcc.27377.
[60] Marie-Pierre Gaigeot. “Theoretical spectroscopy of floppy peptides at room temperature. A DFTMD perspective: gas and aqueous phase”. In: Phys. Chem. Chem. Phys. PCCP 12.14 (2010), pp. 3336–3359. doi: 10.1039/b924048a.
[61] Sander Jaeqx et al. “Gas-Phase Peptide Structures Unraveled by Far-IR Spectroscopy: Combining IR-UV Ion-Dip Experiments with Born–Oppenheimer Molecular Dynamics Simulations”. In: Angew. Chem. - Int. Ed. 53.14 (Apr. 2014), pp. 3663–3666. doi: 10.1002/ange.201311189.
[62] O. V. Boyarkin et al. “Intramolecular energy transfer in highly vibrationally excited methanol. I. Ultrafast dynamics”. In: The Journal of Chemical Physics 107.20 (Nov. 1997), pp. 8409–8422. doi: 10.1063/1.475041.
[63] John C. Lindon, George E. Tranter, and David W. Koppenaal, eds. Encyclopedia of spectroscopy and spectrometry. Third edition. Contains Volumes 1–4. Amsterdam: Elsevier, Academic Press, 2017.
[64] Vasyl Yatsyna et al. “Aminophenol isomers unraveled by conformer-specific far-IR action spectroscopy”. In: Phys. Chem. Chem. Phys. PCCP 18.8 (2016), pp. 6275–6283. doi: 10.1039/c5cp07426f.
[65] D.
Scuderi et al. “Chiral recognition of diols by complexation with (R)-(+)- 1-phenyl-1-propanol: a R2PI approach in supersonic beam”. en. In: Physical Chemistry Chemical Physics 5.20 (Oct. 2003), pp. 4570–4575. doi: 10.1039/ B308178H. [66] I-Chung Lu et al. “Ion-to-Neutral Ratios and Thermal Proton Transfer in Matrix-Assisted Laser Desorption/Ionization”. In: Journal of the American Society for Mass Spectrometry 26.7 (July 2015), pp. 1242–1251. doi: 10 . 1007/s13361-015-1112-3. [67] Michael. Karas, Doris. Bachmann, and Franz. Hillenkamp. “Influence of the wavelength in high-irradiance ultraviolet laser desorption mass spec- trometry of organic molecules”. In: Analytical Chemistry 57.14 (Dec. 1985), pp. 2935–2939. doi: 10.1021/ac00291a042. [68] Murray V. Johnston. “Supersonic jet expansions in analytical spectroscopy”. In: TrAC Trends in Analytical Chemistry 3.2 (Feb. 1984), pp. 58–61. doi: 10.1016/0165-9936(84)87055-7. 79 Bibliography [69] John M Hayes. “Analytical spectroscopy in supersonic expansions”. In:Chem- ical Reviews 87.4 (1987), pp. 745–760. [70] RJ Bakker et al. “Short-pulse effects in a free-electron laser”. In: IEEE jour- nal of quantum electronics 30.7 (1994), pp. 1635–1644. [71] D. Oepts, A. F. G. van der Meer, and P. W. van Amersfoort. “The Free- Electron-Laser user facility FELIX”. In: Infrared Phys. Techn. Proceedings of the Sixth International Conference on Infrared Physics 36.1 (Jan. 1995), pp. 297–308. doi: 10.1016/1350-4495(94)00074-U. [72] Riet,Michel et al. “Report on the FELIX Wavelength Range Extension”. en. In: (2023). doi: 10.18429/JACOW-FEL2022-MOP33. [73] Markus Gerhards. “High energy and narrow bandwidth mid IR nanosec- ond laser system”. In: Optics Communications 241.4 (Nov. 2004), pp. 493– 497. doi: 10.1016/j.optcom.2004.07.035. [74] WR Bosenberg and Dean R Guyer. “Broadly tunable, single-frequency optical parametric frequency-conversion system”. In: JOSA B 10.9 (1993), pp. 1716–1722. [75] M. V. Hobden. 
“Phase-Matched Second-Harmonic Generation in Biaxial Crystals”. In: Journal of Applied Physics 38.11 (Oct. 1967), pp. 4365–4372. doi: 10.1063/1.1709130. [76] Andrew P. Bowman et al. “Ultra-High Mass Resolving Power, Mass Ac- curacy, and Dynamic Range MALDI Mass Spectrometry Imaging by 21-T FT-ICR MS”. In: Analytical Chemistry 92.4 (Feb. 2020), pp. 3133–3142. doi: 10.1021/acs.analchem.9b04768. [77] Noel M. O’Boyle et al. “Open Babel: An open chemical toolbox”. In: Jour- nal of Cheminformatics 3.1 (Oct. 2011), p. 33. doi: 10.1186/1758-2946- 3-33. [78] K Somani Arun, Thomas S Huang, and Steven D Blostein. “Least-Squares Fitting of Two 3-D Point Sets”. In: IEEE Trans. Pattern Anal. Mach. Intell. PAMI-9.5 (Sept. 1987), pp. 698–700. doi: 10.1109/tpami.1987.4767965. [79] W. K. Hastings. “Monte Carlo sampling methods using Markov chains and their applications”. In: Biometrika 57.1 (Apr. 1970), pp. 97–109. doi: 10. 1093/biomet/57.1.97. [80] Jiann-Horng Lin et al. “A Markov Chain Sampling Method for Conforma- tional Energy Optimization of Peptides”. In: International Journal of Engi- neering and Technical Research 7.6 (2017), p. 264981. 80 Bibliography [81] Stephen R. Wilson and Weili Cui. “Conformation Searching Using Simu- lated Annealing”. en. In: ed. by Kenneth M. Merz and Scott M. Le Grand. Boston, MA: Birkhäuser, 1994, pp. 43–70. doi: 10.1007/978- 1- 4684- 6831-1_2. [82] Andrew Smellie, Steven L. Teig, and Peter Towbin. “Poling: Promoting conformational variation”. en. In: Journal of Computational Chemistry 16.2 (1995), pp. 171–187. doi: 10.1002/jcc.540160205. [83] Christoph Grebner et al. “Efficiency of tabu-search-based conformational search algorithms”. en. In: J. Comput. Chem. 32.10 (2011), pp. 2245–2253. doi: 10.1002/jcc.21807. [84] Ernst Hairer, Christian Lubich, and Gerhard Wanner. “Geometric numer- ical integration illustrated by the Störmer–Verlet method”. In: Acta numer- ica 12 (2003), pp. 399–450. [85] Yuji Sugita and Yuko Okamoto. 
“Replica-exchange molecular dynamics method for protein folding”. In:Chem.Phys. Lett. 314.1 (Nov. 1999), pp. 141– 151. doi: 10.1016/S0009-2614(99)01123-9. [86] Ruxi Qi et al. “Replica Exchange Molecular Dynamics: A Practical Ap- plication Protocol with Solutions to Common Problems and a Peptide Ag- gregation and Self-Assembly Example”. In: Methods in molecular biology (Clifton, N.J.) 1777 (2018), pp. 101–119. doi: 10.1007/978-1-4939-7811- 3_5. [87] R. G. Woolley and B. T. Sutcliffe. “Molecular structure and the Born— Oppenheimer approximation”. In:Chemical PhysicsLetters 45.2 (Jan. 1977), pp. 393–398. doi: 10.1016/0009-2614(77)80298-4. [88] Frank Jensen. Introduction to computational chemistry. 2. ed., repr. Includes bibliographical references and index. Chichester: Wiley, 2009. 599 pp. [89] R. Ditchfield, W. J. Hehre, and J. A. Pople. “Self-Consistent Molecular- Orbital Methods. IX. An Extended Gaussian-Type Basis for Molecular- Orbital Studies of Organic Molecules”. In: The Journal of Chemical Physics 54.2 (Jan. 1971), pp. 724–728. doi: 10.1063/1.1674902. [90] Thom H. Dunning. “Gaussian basis sets for use in correlated molecular cal- culations. I. The atoms boron through neon and hydrogen”. In: The Jour- nal of Chemical Physics 90.2 (Jan. 1989), pp. 1007–1023. doi: 10.1063/1. 456153. [91] Ewa Papajak et al. “Perspectives on basis sets beautiful: seasonal plantings of diffuse basis functions”. In: Journal of chemical theory and computation 7.10 (2011), pp. 3027–3034. doi: 10.1021/ct200106a. 81 Bibliography [92] P. Hohenberg and W. Kohn. “Inhomogeneous Electron Gas”. In: Physical Review 136.3B (Nov. 1964), B864–B871. doi: 10.1103/PhysRev.136.B864. [93] A. D. Becke. “Density-functional exchange-energy approximation with cor- rect asymptotic behavior”. In:Physical ReviewA 38.6 (Sept. 1988), pp. 3098– 3100. doi: 10.1103/PhysRevA.38.3098. [94] Chengteh Lee, Weitao Yang, and Robert G. Parr. 
“Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density”. In: Physical Review B 37.2 (Jan. 1988), pp. 785–789. doi: 10.1103/ PhysRevB.37.785. [95] Axel D. Becke. “Density‐functional thermochemistry. III. The role of exact exchange”. In: The Journal of Chemical Physics 98.7 (Apr. 1993), pp. 5648– 5652. doi: 10.1063/1.464913. [96] Yan Zhao and Donald G. Truhlar. “The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent in- teractions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals”. en. In: Theoretical Chemistry Accounts 120.1 (May 2008), pp. 215–241. doi: 10.1007/s00214-007-0310-x. [97] Jeng-Da Chai and Martin Head-Gordon. “Long-range corrected hybrid density functionals with damped atom–atom dispersion corrections”. en. In: Physical Chemistry Chemical Physics 10.44 (Nov. 2008), pp. 6615–6620. doi: 10.1039/B810189B. [98] You-Sheng Lin et al. “Long-Range Corrected Hybrid Density Function- als with Improved Dispersion Corrections”. In: Journal of Chemical Theory and Computation 9.1 (Jan. 2013), pp. 263–272. doi: 10.1021/ct300715s. [99] Joseph W. Ochterski, G. A. Petersson, and J. A. Montgomery. “A complete basis set model chemistry. V. Extensions to six or more heavy atoms”. In: The Journal of Chemical Physics 104.7 (Feb. 1996), pp. 2598–2619. doi: 10. 1063/1.470985. [100] J. A. Montgomery et al. “A complete basis set model chemistry. VII. Use of the minimum population localization method”. In: The Journal of Chemical Physics 112.15 (Apr. 2000), pp. 6532–6542. doi: 10.1063/1.481224. [101] Larry A Curtiss, Paul C Redfern, and Krishnan Raghavachari. “Gaussian-4 theory using reduced order perturbation theory”. In: The Journal of Chem- ical Physics 127.12 (Sept. 2007), p. 124105. doi: 10.1063/1.2770701. 82 Bibliography [102] Stefan Grimme et al. 
“A consistent and accurateab initioparametrization of density functional dispersion correction (DFT-D) for the 94 elements H- Pu”. In: The Journal of Chemical Physics 132.15 (Apr. 2010). doi: 10.1063/ 1.3382344. [103] Stefan Grimme, Stephan Ehrlich, and Lars Goerigk. “Effect of the damp- ing function in dispersion corrected density functional theory”. In: Jour- nal of Computational Chemistry 32.7 (Mar. 2011), pp. 1456–1465. doi: 10. 1002/jcc.21759. [104] Paul K. Weiner and Peter A. Kollman. “AMBER: Assisted model building with energy refinement. A general program for modeling molecules and their interactions”. en. In: J. Comput. Chem. 2.3 (1981), pp. 287–303. doi: 10.1002/jcc.540020311. [105] Bernard R. Brooks et al. “CHARMM: A program for macromolecular en- ergy, minimization, and dynamics calculations”. en. In: J. Comput. Chem. 4.2 (1983), pp. 187–217. doi: 10.1002/jcc.540040211. [106] Ilana Y. Kanal, John A. Keith, and Geoffrey R. Hutchison. “A sobering assessment of small-molecule force field methods for low energy conformer predictions”. en. In: Int. J. Quantum Chem. 118.5 (2018), e25512. doi: 10. 1002/qua.25512. [107] James J. P. Stewart. “Optimization of parameters for semiempirical meth- ods VI: more modifications to the NDDO approximations and re-optimization of parameters”. eng. In: J. Mol. Model. 19.1 (Jan. 2013), pp. 1–32. doi: 10. 1007/s00894-012-1667-x. [108] James Stewart. MOPAC website. http://openmopac.net/PM7_and_PM6- D3H4_accuracy/AccuracyofPM7andPM6-D3H4.html. Aug. 2024. [109] Qin Yang and Julien Bloino. “An Effective and Automated Processing of Resonances in Vibrational Perturbation Theory Applied to Spectroscopy”. In: The Journal of Physical Chemistry A 126.49 (Dec. 2022), pp. 9276–9302. doi: 10.1021/acs.jpca.2c06460. [110] Julien Bloino. “Reliability and Resonances in Vibrational Perturbation The- ory”. eng. In: International Symposium on Molecular Spectroscopy, June 2023. doi: 10.15278/isms.2023.6911. [111] Julien Bloino. 
“Anharmonicity at Larger Scales: Vibrational Spectra of Chiral Organometallic Complexes”. eng. In: International Symposium on Molecular Spectroscopy, June 2023. doi: 10.15278/isms.2023.7032. 83 Bibliography [112] Marco Fusè et al. “Scaling-up VPT2: A feasible route to include anhar- monic correction on large molecules”. In: SpectrochimicaActaPartA:Molec- ular and Biomolecular Spectroscopy 311 (Apr. 2024), p. 123969. doi: 10. 1016/j.saa.2024.123969. [113] M. J. Frisch et al.Gaussian˜16 Revision C.01. Gaussian Inc. Wallingford CT. 2016. [114] Malgorzata Biczysko et al. “Harmonic and Anharmonic Vibrational Fre- quency Calculations with the Double-Hybrid B2PLYP Method: Analytic Second Derivatives and Benchmark Studies”. In: J. Chem. Theory Comput. 6.7 (July 2010), pp. 2115–2125. doi: 10.1021/ct100212p. [115] Martin Thomas et al. “Computing vibrational spectra from ab initio molec- ular dynamics”. en. In: Physical Chemistry Chemical Physics 15.18 (Apr. 2013), pp. 6608–6622. doi: 10.1039/C3CP44302G. [116] Marie-Pierre Gaigeot and Riccardo Spezia. “Theoretical methods for vi- brational spectroscopy and collision induced dissociation in the gas phase”. In:Gas-Phase IRSpectroscopy and Structure ofBiologicalMolecules. Springer, 2014, pp. 99–151. doi: 10.1007/128_2014_620. [117] R. P. Feynman. “Forces in Molecules”. In: Physical Review 56.4 (Aug. 1939), pp. 340–343. doi: 10.1103/PhysRev.56.340. [118] H Bernhard Schlegel et al. “Ab initio molecular dynamics: Propagating the density matrix with Gaussian orbitals”. In: J. Chem. Phys. 114.22 (June 2001), pp. 9758–9763. doi: 10.1063/1.1372182. [119] Srinivasan S Iyengar et al. “Ab initio molecular dynamics: Propagating the density matrix with Gaussian orbitals. II. Generalizations based on mass- weighting, idempotency, energy conservation and choice of initial condi- tions”. In: J. Chem. Phys. 115.22 (Dec. 2001), pp. 10291–10302. doi: 10 . 1063/1.1416876. [120] H. Bernhard Schlegel et al. 
“Ab initio molecular dynamics: Propagating the density matrix with Gaussian orbitals. III. Comparison with Born–Oppenheimer dynamics”. In: J. Chem. Phys. 117.19 (Oct. 2002), pp. 8694–8704. doi: 10. 1063/1.1514582. [121] Donald A. McQuarrie. Statistical mechanics. @Harper’s chemistry series. Erweiterte Neuaufl. von: McQuarrie : Statistical thermodynamics. New York [u.a.]: Harper & Row, 1976. 641 pp. [122] M. Toda. Statistical Physics II. Nonequilibrium StatisticalMechanics. Ed. by Ryogo Kubo et al. 2nd ed. Springer Series in Solid-State Sciences Ser. v.31. Berlin, Heidelberg: Springer Berlin / Heidelberg, 1998. 1296 pp. 84 Bibliography [123] Åke Andersson. “The Universal Blueshift in BOMD Simulations”. eng. In: Atoms, Molecules and Clusters in Motion conference, Apr. 2024. [124] Nguyen-Thi Van-Oanh et al. “Improving anharmonic infrared spectra us- ing semiclassically prepared molecular dynamics simulations”. In: Physical Chemistry Chemical Physics 14.7 (2012), pp. 2381–2390. doi: 10 . 1039 / c2cp23101h. [125] M. A. Suhm and F. Kollipost. “Femtisecond single-mole infrared spectroscopy of molecular clusters”. en. In: Phys. Chem. Chem. Phys. PCCP 15.26 (June 2013), pp. 10702–10721. doi: 10.1039/C3CP51515J. [126] O. N. Ulenikov et al. “Survey of the high resolution infrared spectrum of methane (12CH4 and 13CH4): Partial vibrational assignment extended towards 12 000 cm−1”. In: The Journal of Chemical Physics 141.23 (Dec. 2014), p. 234302. doi: 10.1063/1.4899263. [127] Aude Simon et al. “Vibrational spectroscopy and molecular dynamics of water monomers and dimers adsorbed on polycyclic aromatic hydrocar- bons”. en. In: Physical Chemistry Chemical Physics 14.19 (2012), pp. 6771– 6786. doi: 10.1039/C2CP40321H. [128] Aude Simon et al. “Water Clusters in an Argon Matrix: Infrared Spectra from Molecular Dynamics Simulations with a Self-Consistent Charge Den- sity Functional-Based Tight Binding/Force-Field Potential”. In: The Journal of Physical Chemistry A 119.11 (Mar. 
2015), pp. 2449–2467. doi: 10.1021/ jp508533k. [129] Erin R. Johnson et al. “Revealing noncovalent interactions”. In: J. Am. Chem. Soc. 132.18 (2010), pp. 6498–6506. doi: 10.1021/ja100936w. [130] Jongcheol Seo et al. “Side-chain effects on the structures of protonated amino acid dimers: A gas-phase infrared spectroscopy study”. In: Int. J. Mass Spectrom. 429 (June 2018), pp. 115–120. doi: 10.1016/j.ijms.2017. 06.011. [131] James E. Huheey, Ellen A. Keiter, and Richard Keiter. Inorganic Chem- istry Principles of Structure and Reactivity. Harper Collins College Publish- ers, 1993, pp. 326–330. [132] Richard J. Plowright, Eric Gloaguen, and Michel Mons. “Compact Folding of Isolated Four-Residue Neutral Peptide Chains: H-Bonding Patterns and Entropy Effects”. en. In: ChemPhysChem 12.10 (2011), pp. 1889–1899. doi: 10.1002/cphc.201001023. [133] Yanjie Wei, Walter Nadler, and Ulrich H. E. Hansmann. “On the helix-coil transition in alanine based polypeptides in gas phase”. In: J. Chem. Phys. 126.20 (May 2007), p. 204307. doi: 10.1063/1.2734967. 85