Biomarker profiling in sepsis diagnostics

Mahnaz Irani Shemirani

Department of Laboratory Medicine
Institute of Biomedicine
Sahlgrenska Academy, University of Gothenburg

Gothenburg 2024

Cover illustration: “The Bacterial Sepsis”. © Mahnaz Irani Shemirani 2024

To my mom, Iran, whose encouragement and guidance illuminated my path in academia; to my brother, Jamshid, for his unwavering moral and medical support; to my dad’s enduring spirit, Khosro, who taught me resilience during his brief time with me; and to my sister’s enduring spirit, Mahshid, who shared my dream of earning a doctoral degree.

Biomarker profiling in sepsis diagnostics
© Mahnaz Irani Shemirani 2024
mahnaz.irani.shemirani@gu.se
ISBN 978-91-8069-215-1 (PRINT)
ISBN 978-91-8069-216-8 (PDF)
Printed in Borås, Sweden 2024
Printed by Stema Specialtryck AB

ABSTRACT

Effective and timely antibiotic therapy for sepsis requires a thorough understanding of the types and molecular characteristics of bacterial strains. Therefore, we investigated diagnostic strategies to facilitate faster classification of bacteria and identification of their molecular features. We benchmarked the 1928 Diagnostics platform (1928 Diagnostics, Gothenburg, Sweden) for characterizing Staphylococcus aureus (S.
aureus) strains against an in-house bioinformatics (INH) pipeline and reference clinical laboratory methods, including MALDI-TOF MS and phenotypic antibiotic susceptibility testing. We observed high agreement between the 1928 platform and the INH pipeline in predicting laboratory results. Notably, the 1928 platform exhibited a lower rate of false negatives while showing a slightly higher rate of false positives (Paper I). Additionally, our findings revealed that clindamycin, erythromycin, and fusidic acid were effective against all methicillin-resistant S. aureus strains, and all tested strains were susceptible to vancomycin (Paper II). The challenge remains in predicting the bacterial type. Several studies have highlighted differences between the blood markers of gram-positive and gram-negative bacterial sepsis. Using machine learning algorithms and the Proximity Extension Assay (PEA), we discovered a set of 55 informative proteins, including 5 potential biomarkers, which distinguish patients with gram-positive or gram-negative bacteria from other cases, achieving AUCs of 0.66 and 0.69, respectively (Paper III). However, while the analysis of 55 proteins offered insights into classifying bacterial types, our method did not distinguish between specific bacterial strains. Employing a more comprehensive approach, whole blood microarray analysis of septic patients infected with either S. aureus or Escherichia coli revealed 25 genes with high AUC values (0.98 and 0.96, respectively) that effectively distinguished these infections from other cases. These findings were consistent across two independent datasets, with AUC values ranging from 0.72 to 0.87 (Paper IV). In conclusion, efforts to improve diagnostic strategies and understand bacterial characteristics in sepsis continue.
Platforms like 1928 Diagnostics and technologies such as the PEA show promise, with machine learning offering opportunities to tackle bacterial typing challenges. These advancements are crucial for evolving clinical practices in sepsis diagnosis and management.

Keywords: pipeline, whole genome sequencing, machine learning, biomarker, proteomics, transcriptomics, gram-negative bacteria, gram-positive bacteria

SUMMARY (SAMMANFATTNING PÅ SVENSKA)

Effective and rapid antibiotic treatment of sepsis requires a thorough understanding of the types and molecular characteristics of bacterial strains. However, conventional diagnostic methods often lack the speed needed to determine these characteristics accurately. We therefore investigated diagnostic strategies to facilitate rapid classification of bacteria and to elucidate their molecular characteristics.
We compared the commercial cloud-based 1928 platform (1928 Diagnostics, Gothenburg, Sweden) for characterizing Staphylococcus aureus (S. aureus) strains with an in-house bioinformatics (INH) pipeline and with the reference methods of the clinical laboratory: MALDI-TOF MS and phenotypic antibiotic susceptibility testing (AST). Our analysis showed high agreement between the 1928 platform and the INH pipeline in predicting MALDI-TOF MS and AST results. Notably, the 1928 platform had a lower rate of incorrectly identified resistant phenotypes, while it showed a slightly higher rate of incorrectly identified susceptible phenotypes (Study I). In addition, our findings showed that clindamycin, erythromycin, and fusidic acid were effective against all methicillin-resistant S. aureus strains, and all tested strains were susceptible to vancomycin (Study II). The challenge remains in predicting the bacterial type. Several studies have shown differences between the blood markers of gram-positive and gram-negative bacterial sepsis. Using machine learning algorithms and the Proximity Extension Assay (PEA), we discovered a panel of 55 informative proteins, including 5 potential biomarkers, that distinguish patients with gram-positive or gram-negative bacteria from other cases, achieving AUC values of 0.66 and 0.69, respectively (Study III). However, while the analysis of 55 proteins offered insights into classifying bacterial types, our method did not distinguish between specific bacterial strains. Using a more comprehensive approach, whole blood microarray analysis of septic patients infected with either S. aureus or Escherichia coli identified 25 genes with high AUC values (0.98 and 0.96, respectively) that effectively distinguished these infections from other cases. These findings were consistent across two separate, independent datasets, with AUC values ranging from 0.72 to 0.87 (Study IV).
In conclusion, efforts to improve diagnostic strategies and understand bacterial characteristics in sepsis continue. Platforms like 1928 Diagnostics and technologies such as PEA show promise, with machine learning offering opportunities to tackle bacterial typing challenges. These advances are crucial for developing clinical practice in the diagnosis and treatment of sepsis.

LIST OF PAPERS

This thesis is based on the following studies, referred to in the text by their roman numerals.

I. Shemirani, M.I., Tilevik, D., Tilevik, A., Jurcevic, S., Arnellos, D., Enroth, H., Pernestig, A.K. Benchmarking of two bioinformatic workflows for the analysis of whole-genome sequenced Staphylococcus aureus collected from patients with suspected sepsis. BMC Infect Dis 2023, 23(1), 39. DOI: 10.1186/s12879-022-07977-0

II. Irani Shemirani, M., Ljungström, L. Epidemiology and antibiotic resistance patterns of Staphylococcus aureus strains in suspected sepsis patients in Skaraborg. (Submitted)

III. Irani Shemirani, M., Pernestig, A.K., Björkman, J., Tilevik, D., von Mentzer, A., Ejdebäck, M., Ståhlberg, A. Identification of protein biomarkers to differentiate between gram-negative and gram-positive infections in adults suspected to sepsis. (Under Review)

IV. Irani Shemirani, M. Transcriptional markers classifying Escherichia coli and Staphylococcus aureus induced sepsis in adults: a data-driven approach. PLOS ONE 2024, 19(7). DOI: 10.1371/journal.pone.0305920

ADDITIONAL PAPERS

I. Irani Shemirani, M. Biomarkers approach in the diagnosis and prognosis of sepsis. Int. J. Public Health Res 2022, 12 (2).

CONTENT

ABBREVIATIONS
1 INTRODUCTION
  1.1 Definition
  1.2 Epidemiology
  1.3 Etiology
  1.4 Pathogenesis
  1.5 Diagnosis
2 AIM
3 MATERIALS AND METHODS
  3.1 Subjects
  3.2 Methods
  3.3 Statistical analysis
  3.4 Ethical consideration
4 RESULTS AND DISCUSSION
  4.1 Paper I - Benchmarking of two bioinformatics workflows
  4.2 Paper II - Epidemiology and antibiotic resistance pattern
  4.3 Paper III - Identifying a possible protein biomarker panel
  4.4 Paper IV - Transcriptomic markers
5 CONCLUSION
6 FUTURE PERSPECTIVES
ACKNOWLEDGEMENT
REFERENCES

ABBREVIATIONS

ADA             Adenosine Deaminase
AST             Antibiotic susceptibility test
AUC-ROC         Area Under the Receiver Operating Characteristic Curve
CD8A            T-Cell Surface Glycoprotein CD8 Alpha Chain
CFHR5           Complement Factor H Related 5
CM              Cardiometabolic
CNDP1           Carnosine Dipeptidase 1
CRP             C-reactive protein
CSF-1           Colony Stimulating Factor 1
CVD II          Cardiovascular II
DAMPs           Damage-associated molecular patterns
DIC             Disseminated Intravascular Coagulation
E. coli         Escherichia coli
EHR             Electronic health record
et(A,B)         Exfoliative toxins (A, B)
EUCAST          European Committee on Antimicrobial Susceptibility Testing
GDF2            Growth Differentiation Factor 2
ICD             International Classification of Diseases
ICU             Intensive Care Unit
IFN-γ           Interferon gamma
IHME            Institute for Health Metrics and Evaluation
IL              Interleukin
Inf             Inflammation
INH             In-house pipeline
IR              Immune Response
LASSO           Least Absolute Shrinkage and Selection Operator
LBP             Lipopolysaccharide binding protein
LOD             Limit of detection
LPS             Lipopolysaccharide
LR              Logistic regression
LTA             Lipoteichoic acid
MALDI-TOF MS    Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry
MBL2            Mannose Binding Lectin 2
MCODE           Molecular Complex Detection
ME              Major error
MFAP5           Microfibrillar-Associated Protein 5
MLST            Multilocus sequence typing
MNAR            Missing Not At Random
MRSA            Methicillin-resistant Staphylococcus aureus
MSE             Mean square error
NGS             Next-generation sequencing
NPX             Normalized protein expression values
PAMPs           Pathogen-associated molecular patterns
PANTHER         Protein ANalysis THrough Evolutionary Relationships
PCA             Principal Component Analysis
PCC             Pearson correlation coefficient
PCR             Polymerase chain reaction
PCT             Procalcitonin
PEA             Proximity Extension Assay
PPI             Protein-protein interaction
PRR             Pathogen recognition receptors
PVL             Panton-Valentine Leucocidin
qSOFA           Quick Sequential Organ Failure Assessment
RF              Random forest
RFE             Recursive Feature Elimination
RLRs            RIG-I-like receptors
SAA4            Serum Amyloid A4
S. argenteus    Staphylococcus argenteus
S. aureus       Staphylococcus aureus
S. epidermidis  Staphylococcus epidermidis
SIRS            Systemic Inflammatory Response Syndrome
SOFA            Sequential Organ Failure Assessment
TLRs            Toll-like receptors
TNF             Tumor Necrosis Factor
TNFRSF          TNF Receptor Superfamily Member
tSNE            t-distributed stochastic neighbor embedding
tsst1           Toxic shock syndrome toxin-1
VME             Very major error
WGS             Whole genome sequencing
WHO             World Health Organization

1 INTRODUCTION

1.1 DEFINITION

The word sepsis, σηψις, originated from the Greek word sepsin, meaning “decomposition” or “decay”. The first documented use of the word is found in Homer’s poems 2700 years ago, as a derivative of the word sepo, σηπω, meaning “I rot”. For centuries, the term was used by Hippocrates, Aristotle, Galen, and Plutarch as a clinical description of systemic inflammation (1). Van Leeuwenhoek first observed living bacteria in 1674 (2).
During the 19th and 20th centuries, sepsis was explained by the “germ theory”, in which pathogenic microorganisms invade the bloodstream and cause the onset of systemic infection symptoms (3). The germ theory prevailed at the time, even though a number of scientists, including Sir William Osler, held that the patient’s response to the infection, rather than the infection itself, was the cause of death (4). In the 20th century, deaths of patients with sepsis despite antibiotic treatment and pathogen eradication, along with experimental tests, highlighted the importance of the host’s immune response in the manifestations of sepsis.

In 1991, during a conference of the American College of Chest Physicians and the Society of Critical Care Medicine (ACCP/SCCM), the first consensus definition of sepsis (sepsis-1) was established with the aim of improving clinical diagnosis and standardizing research protocols. ACCP/SCCM introduced criteria for systemic inflammatory response syndrome (SIRS) and defined sepsis as the presence of at least two of the SIRS criteria as a result of infection (Figure 1) (5). The sepsis-1 definition also distinguished clinical stages of sepsis. Severe sepsis was defined as sepsis accompanied by organ dysfunction, hypoperfusion, or hypotension, and septic shock was clinically defined as hypotension resistant to fluid and vasopressor therapy (5).

Figure 1. Stages of sepsis according to the Sepsis-1 definition. Sepsis progresses from a systemic inflammatory response to infection (sepsis), to severe sepsis (sepsis with organ dysfunction), and finally to septic shock (severe sepsis with persistent hypotension despite fluid resuscitation).

The clinical picture of SIRS considered in this definition is nonspecific and manifests in many conditions. Moreover, inflammation is a generic response to stimuli ranging from minor trauma to autoimmune disease. Due to these
Due to the 14 15 Biomarker profiling in sepsis diagnostics Mahnaz Irani Shemirani 1 INTRODUCTION 1.1 DEFINITION The word sepsis, σηψις, originated from the Greek word sepsin with the meaning “decomposition” or “decay”. The first documented use of the word sepsis is found in Homer’s poems 2700 years ago as a derivative form of the word sepo, σηπω, meaning “I rot”. For centuries, the term has been used by Hippocrates, Aristotle, Galen, and Plutarch as a clinical description of systematic inflammation (1). Van Leeuwenhoek first observed living bacteria in 1674 (2). During the 19th and 20th centuries, sepsis was described by “Germ theory” in which pathogenic microorganisms invade the bloodstream in such a way that it causes the onset of systemic infection symptoms (3). At this time the germ theory prevailed, despite the fact that a number of scientists, including Sir William Osler, declared the patient’s body response to the infection as the cause of death rather than the infection itself (4). Death of patients with sepsis, despite antibiotic treatment and pathogen eradication, along with experimental tests, highlighted the importance of the host’s immune response to sepsis manifestations in the 20th century. In 1991, during a conference of the American College of Chest Physicians and the Society of Critical Care Medicine (ACCP/SCCM), the first consensus definition of sepsis (sepsis-1) was established with the aim of improving clinical diagnosis and standardizing research protocols. ACCP/SCCM introduced criteria for systematic inflammatory response syndrome (SIRS) and defined sepsis as the presence of at least two of the SIRS criteria as a result of infection (Figure 1) (5). In sepsis-1 definition, a degree of clinical stages for sepsis was taken into consideration. 
Severe sepsis was defined by accompanying organ dysfunction, hypoperfusion, or hypotension with sepsis, and septic shock was clinically defined by hypotension resistance to fluid and vasopressor therapy (5). Figure 1. Stages of sepsis according to Sepsis-1 definition. Sepsis progresses from a systemic inflammatory response to infection (sepsis), to severe sepsis (sepsis with organ dysfunction), and finally to septic shock (severe sepsis with persistent hypotension despite fluid resuscitation. The clinical picture of SIRS considered in this definition is nonspecific and manifests in many conditions. Conversely, inflammation is a generic response to any stimuli from minor trauma to autoimmune disease. Due to the 14 15 Biomarker profiling in sepsis diagnostics Mahnaz Irani Shemirani limitations, the list of diagnostics criteria was expanded in 2001 during the 1.2 EPIDEMIOLOGY second consensus conference (sepsis-2), incorporating organ dysfunction criteria for diagnosing of sepsis, while the definition of severe sepsis remained unchanged. This caused confusion among researchers and clinicians when distinguishing between ‘sepsis’ under the new criteria and ‘severe sepsis’ Sepsis is a critical global health issue, marked by significant morbidity and under the old criteria (5). mortality. Historically, sepsis has been linked to major epidemics, such as the Yellow Fever outbreak in Philadelphia (1793), the Ebola epidemic in West With deepened knowledge of the pathophysiology of sepsis, the need to change Africa (2013-2016), and the COVID-19 pandemic (2019-2022). The definition the definition of sepsis was recognized. 
In 2016, a new sepsis definition and diagnostic criteria for sepsis have evolved over time (see Chapter 1.1, (sepsis-3) was proposed in which sepsis was defined as a “life-threatening leading to variations in reporting and estimation across different countries, organ dysfunction caused by a dysregulated host response to infection” and which complicates accurate epidemiological assessments. sepsis shock as a “subset of sepsis in which underline circulatory, cellular and metabolic abnormalities are profound enough to substantially increase A study by the Institute for Health Metrics and Evaluation (IHME) in 2020 mortality” (6). Thereby, sepsis and severe sepsis were used interchangeably to estimated 49 million global cases of sepsis annually, with 11 million sepsis- resolve the confusion in the old definition, and organ dysfunction must be related deaths based on death certificates. This study, covering 195 countries included in the clinical diagnosis of sepsis. and 282 causes of death, reported a 37% decrease in global sepsis incidence and a 31% reduction in sepsis-related deaths from 1990 to 2017 (9). However, In Sweden, a working group comprised of representatives from the Swedish these estimates, based primarily on death certificates, may not fully capture Society of Infectious Diseases and the Swedish Society of Intensive Care sepsis or related organ dysfunction (9). Medicine published a consensus document on the definition and criteria for severe sepsis and septic shock in 2011. With few exceptions, the consensus In most studies, sepsis epidemiology is studied using administrative hospital was drafted using the definitions and criteria from Sepsis-1 (1991) and Sepsis- discharge data, identified through International Classification of Diseases 2 (2001). The key distinction was in the definition of severe sepsis. Severe (ICD) codes 9 and 10. Fleischmann-Struzek et al. 
(2020) recently found that sepsis was characterized in this definition as a proven infection with organ ICD-based estimates of hospital-treated sepsis were approximately 50% lower dysfunction (7). This definition was used to classify patients with severe sepsis than those from IHME (10). They reported a global incidence of 189 cases per until 2016 when the new definition by the third international consensus 100,000 person-years and a 26.7% mortality rate for hospital-treated sepsis, (Sepsis-3) integrated severe sepsis within the definition of sepsis. A national with ICU-treated sepsis having an incidence of 58 per 100,000 person-years consensus group was formed in the autumn of 2016 on behalf of the Swedish and a mortality rate of 41.9% (10). Despite its global standardization by the Association of Infectious Disease Physicians (SILF), the Swedish Society of World Health Organization (WHO), ICD estimates can be skewed by Acute Care (SWESEM), the Swedish Society of Anesthesiology and Intensive variations in ICD revisions and local modifications (9), as well as factors like Care (SFAI), and the Swedish Intensive Care Register (SIR), as well as a diagnosis accuracy, infection misclassification, documentation quality, and representative from the National Institute of Health and Welfare, department reimbursement incentives (11). of disease classification. This group’s mission was to examine Sepsis-3 and decide how it ought to be applied in Swedish healthcare. The consensus group The growing use of electronic health record (EHR) systems allows for the recommended that the definitions of sepsis and septic shock proposed in investigation sepsis epidemiology using clinical criteria instead of ICD data. Sepsis-3 should replace the previous Swedish definitions. It was also Rhee et al. 
(2017) analyzed EHR records of over 2.9 million adults hospitalized recommended that the new international criteria be used when diagnosing and across 409 US hospitals between 2009 and 2014, finding a 6% incidence of classifying sepsis and septic shock (8). sepsis (12). Their EHR-based approach demonstrated a sensitivity of 70% and a comparable positive predictive value of 70.4% to ICD-based approach. While EHR data showed constant sepsis prevalence and mortality rate, ICD data indicated an annual increase in prevalence (+10.3% [95% CI, 7.2% to 13.3%], p < .001) and a decrease in mortality (−7.0% [95% CI, −8.8% to 16 17 Biomarker profiling in sepsis diagnostics Mahnaz Irani Shemirani limitations, the list of diagnostics criteria was expanded in 2001 during the 1.2 EPIDEMIOLOGY second consensus conference (sepsis-2), incorporating organ dysfunction criteria for diagnosing of sepsis, while the definition of severe sepsis remained unchanged. This caused confusion among researchers and clinicians when distinguishing between ‘sepsis’ under the new criteria and ‘severe sepsis’ Sepsis is a critical global health issue, marked by significant morbidity and under the old criteria (5). mortality. Historically, sepsis has been linked to major epidemics, such as the Yellow Fever outbreak in Philadelphia (1793), the Ebola epidemic in West With deepened knowledge of the pathophysiology of sepsis, the need to change Africa (2013-2016), and the COVID-19 pandemic (2019-2022). The definition the definition of sepsis was recognized. In 2016, a new sepsis definition and diagnostic criteria for sepsis have evolved over time (see Chapter 1.1, (sepsis-3) was proposed in which sepsis was defined as a “life-threatening leading to variations in reporting and estimation across different countries, organ dysfunction caused by a dysregulated host response to infection” and which complicates accurate epidemiological assessments. 
sepsis shock as a “subset of sepsis in which underline circulatory, cellular and metabolic abnormalities are profound enough to substantially increase A study by the Institute for Health Metrics and Evaluation (IHME) in 2020 mortality” (6). Thereby, sepsis and severe sepsis were used interchangeably to estimated 49 million global cases of sepsis annually, with 11 million sepsis- resolve the confusion in the old definition, and organ dysfunction must be related deaths based on death certificates. This study, covering 195 countries included in the clinical diagnosis of sepsis. and 282 causes of death, reported a 37% decrease in global sepsis incidence and a 31% reduction in sepsis-related deaths from 1990 to 2017 (9). However, In Sweden, a working group comprised of representatives from the Swedish these estimates, based primarily on death certificates, may not fully capture Society of Infectious Diseases and the Swedish Society of Intensive Care sepsis or related organ dysfunction (9). Medicine published a consensus document on the definition and criteria for severe sepsis and septic shock in 2011. With few exceptions, the consensus In most studies, sepsis epidemiology is studied using administrative hospital was drafted using the definitions and criteria from Sepsis-1 (1991) and Sepsis- discharge data, identified through International Classification of Diseases 2 (2001). The key distinction was in the definition of severe sepsis. Severe (ICD) codes 9 and 10. Fleischmann-Struzek et al. (2020) recently found that sepsis was characterized in this definition as a proven infection with organ ICD-based estimates of hospital-treated sepsis were approximately 50% lower dysfunction (7). This definition was used to classify patients with severe sepsis than those from IHME (10). 
They reported a global incidence of 189 cases per until 2016 when the new definition by the third international consensus 100,000 person-years and a 26.7% mortality rate for hospital-treated sepsis, (Sepsis-3) integrated severe sepsis within the definition of sepsis. A national with ICU-treated sepsis having an incidence of 58 per 100,000 person-years consensus group was formed in the autumn of 2016 on behalf of the Swedish and a mortality rate of 41.9% (10). Despite its global standardization by the Association of Infectious Disease Physicians (SILF), the Swedish Society of World Health Organization (WHO), ICD estimates can be skewed by Acute Care (SWESEM), the Swedish Society of Anesthesiology and Intensive variations in ICD revisions and local modifications (9), as well as factors like Care (SFAI), and the Swedish Intensive Care Register (SIR), as well as a diagnosis accuracy, infection misclassification, documentation quality, and representative from the National Institute of Health and Welfare, department reimbursement incentives (11). of disease classification. This group’s mission was to examine Sepsis-3 and decide how it ought to be applied in Swedish healthcare. The consensus group The growing use of electronic health record (EHR) systems allows for the recommended that the definitions of sepsis and septic shock proposed in investigation sepsis epidemiology using clinical criteria instead of ICD data. Sepsis-3 should replace the previous Swedish definitions. It was also Rhee et al. (2017) analyzed EHR records of over 2.9 million adults hospitalized recommended that the new international criteria be used when diagnosing and across 409 US hospitals between 2009 and 2014, finding a 6% incidence of classifying sepsis and septic shock (8). sepsis (12). Their EHR-based approach demonstrated a sensitivity of 70% and a comparable positive predictive value of 70.4% to ICD-based approach. 
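The two validation metrics mentioned here can be made concrete with a minimal Python sketch. It computes sensitivity and positive predictive value from the cells of a 2×2 table in which a sepsis case definition is compared against a reference standard such as chart review. The function names and the counts are hypothetical and chosen only for illustration; they are not data from Rhee et al. (12).

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of true sepsis cases that the case definition detects."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of cases flagged by the definition that are true sepsis."""
    return true_pos / (true_pos + false_pos)


# Hypothetical chart-review counts, for illustration only (not study data).
true_pos, false_pos, false_neg = 140, 60, 60

sens = sensitivity(true_pos, false_neg)
ppv = positive_predictive_value(true_pos, false_pos)
print(f"sensitivity = {sens:.1%}, PPV = {ppv:.1%}")
```

With these made-up counts both metrics equal 70%. In general the two can diverge, because sensitivity depends on missed cases (false negatives) whereas positive predictive value depends on over-called cases (false positives).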
While EHR data showed constant sepsis prevalence and mortality rates, ICD data indicated an annual increase in prevalence (+10.3% [95% CI, 7.2% to 13.3%], p < .001) and a decrease in mortality (−7.0% [95% CI, −8.8% to −5.2%], p < .001). This suggests that ICD-based estimates may be affected by clinical awareness and coding practices, while EHR data offer a more objective measure of sepsis.

In Sweden, a multicenter study in 2020 reported an incidence rate of 81 ICU-treated sepsis cases per 100,000 persons and an in-hospital mortality rate of 26%, based on Sepsis-3 criteria (13). The study highlighted a significant discrepancy between clinical data and ICD discharge codes, with only one-third of sepsis patients coded for the condition upon ICU discharge (13). This discrepancy again underscores the influence of documentation and coding quality on the accuracy of ICD-based sepsis estimates.

1.3 ETIOLOGY

Humans are always faced with the threat of pathogenic microorganisms. Our survival depends on the innate and adaptive immune systems. The body’s primary defenses against infection, including the skin, enzymes, and mucus, can be compromised, allowing microorganisms to invade and potentially cause sepsis (14).

Gram-negative and gram-positive bacterial infections are the primary causes of sepsis; however, viruses, parasites, and fungi also contribute significantly to sepsis, especially among immunocompromised patients and those with other co-morbidities (15, 16). Currently, gram-negative bacteria account for 62.2% of patients with positive blood cultures and gram-positive bacteria for 46.8% of sepsis cases (15). Commonly implicated gram-negative bacteria include Haemophilus influenzae, Escherichia coli (E. coli), Salmonella spp., and Neisseria meningitidis. Among gram-positive bacteria causing sepsis, the most common contributors are Streptococcus pneumoniae and Staphylococcus aureus (S. aureus), especially methicillin-resistant Staphylococcus aureus (MRSA) (15, 16).

Pneumonia is the most common infectious disease among patients who die with sepsis as the immediate cause of death, followed by intra-abdominal infections and intravascular infections (17). However, urinary tract, skin, bone, and brain infections (such as meningitis) can also develop into sepsis. These infections are often localized and controlled by the host immune system. Sepsis often progresses when the host is unable to suppress the primary infection due to factors such as a high bacterial load, the presence of virulence factors, or defects in the immune system.

The virulence mechanisms of bacteria vary among different species and strains, significantly impacting the progression and severity of sepsis (18).
For instance, in gram-negative bacteria like E. coli, endotoxins such as lipopolysaccharides (LPS) from the bacterial cell wall can trigger intense inflammatory responses that contribute to sepsis (19, 20) (Figure 2). In contrast, gram-positive bacteria like S. aureus produce exotoxins, such as toxic shock syndrome toxin (TSST), which can also play a role in sepsis by amplifying the inflammatory response and causing tissue damage (21, 22).

Figure 2. Comparison of gram-negative and gram-positive bacterial cell walls. The cell wall of gram-negative bacteria possesses a thin peptidoglycan layer, along with an additional outer layer made up of lipopolysaccharides. In contrast, gram-positive bacteria feature a thick peptidoglycan layer. Adapted from Atanasova KR (23).

Some people are at higher risk of sepsis. Sepsis is more common among the elderly population and infants less than three months old (15). Diabetes, cardiovascular diseases, steroid treatment, organ transplantation, cancer, and chronic obstructive pulmonary disease (COPD) also increase susceptibility to bacterial infections that may develop into sepsis. Here, a compromised immune system is the main factor in the progression of sepsis (15).

1.4 PATHOGENESIS

Dysregulation of complex processes, including the innate and adaptive immune response, complement activation, the coagulation cascade, and the endothelial vascular system, contributes to the development of sepsis (24, 25). In gram-negative bacterial infection, the pathogen is recognized by the innate immune system through the interaction of pathogen recognition receptors (PRRs) with exogenous pathogen-associated molecular patterns (PAMPs) and endogenous damage-associated molecular patterns (DAMPs) (26, 27). Toll-like receptors (TLRs), C-type lectin receptors (CLRs), RIG-I-like receptors (RLRs), NOD-like receptors (NLRs), and AIM2-like receptors (ALRs) are the five types of PRRs that have been discovered thus far (26, 27).

LPS, a component of bacterial cell walls also known as an endotoxin, is the most frequent factor in gram-negative bacterial sepsis. This compound, which is unique to gram-negative bacteria, is released by bacterial lysis and consists of three components. The outer domain is known as the O-antigen, and its chain composition varies from strain to strain, resulting in different antigen effects. The middle layer is known as the “core”, and it is a less diverse oligosaccharide domain. The third layer is a conserved hydrophobic inner domain known as lipid A (or endotoxin) (28). Lipid A is a prime example of a PAMP and an immune system inducer.
Bacterial endotoxin binds to a lipopolysaccharide binding protein (LBP) resulting in activation of macrophages and initiating coagulation cascade (28, 29). Similarly, lipoteichoic acid (LTA) released in gram-positive infections affects macrophage function and results in the production of mediators. Both toxins stimulate macrophage CD14, TLR4, and TLR2 receptors, resulting in the release of cytokine mediators (28) which are necessary for immune reactions that may develop into sepsis and septic shock (Figure 3). Gram-positive bacteria with exotoxin release can occasionally cause sepsis and septic shock as well. Superantigens are the most common exotoxin produced by gram-positive bacteria. The superantigens activate T-lymphocytes and trigger the coagulation cascade, which results in the production of Interferon gamma (IFN-γ) and Interleukin-2 (IL-2). Both IFN-γ and IL-2 stimulate macrophages to produce IL-1 and Tumor Necrosis Factor alpha (TNF-α). 20 21 Biomarker profiling in sepsis diagnostics Mahnaz Irani Shemirani These mediators play a crucial role in triggering the body’s inflammatory 1.5 DIAGNOSIS response to infections (3, 28-30). Sepsis is challenging to diagnose because the symptoms and signs may overlap with those of other conditions. To accurately diagnose an underlying infection, clinicians often recommend a series of tests. The diagnosis of sepsis relies on evaluating clinical symptoms, analyzing blood biomarkers, reviewing microbiological results, and conducting imaging studies. Figure 3. Activation of coagulation cascade by Lipopolysaccharide (LPS). LPS stimulates TLR4 and CD14 receptors on macrophages either by direct binding to the receptors or, more commonly, by being transferred to the receptors via LPS-binding protein in the serum. Adopted from Raetz CR, Whitfield C (31). In practice, it is the complex interaction of pro-inflammatory and anti- inflammatory mediators that results in sepsis. 
TNF-α is the main pro- inflammatory mediator responsible for the onset of sepsis (25, 28, 32). Pro- inflammatory cytokines also stimulate neutrophils, platelets, lymphocytes, liver cells, endothelial cells, and macrophages, which can result in tissue damage, vascular dilation, and lung dysfunction (3). The activation of the complement system, which can result in the production of anaphylatoxins and a severe inflammatory response, causes other signs and symptoms of sepsis such as increased levels of C-reactive protein (CRP), inhibition of fibrinolysis, and, eventually, Disseminated Intravascular Coagulation (DIC). DIC frequently occurs as a result of infection by gram-negative bacteria and may lead to homeostasis imbalance and organ dysfunction (26, 33). Figure 4. Overview of sepsis diagnosis. Combination of clinical symptoms, blood biomarkers, microbiological findings, and imaging assessment use to identify the source of infection and assess organ involvement. 22 23 Biomarker profiling in sepsis diagnostics Mahnaz Irani Shemirani These mediators play a crucial role in triggering the body’s inflammatory 1.5 DIAGNOSIS response to infections (3, 28-30). Sepsis is challenging to diagnose because the symptoms and signs may overlap with those of other conditions. To accurately diagnose an underlying infection, clinicians often recommend a series of tests. The diagnosis of sepsis relies on evaluating clinical symptoms, analyzing blood biomarkers, reviewing microbiological results, and conducting imaging studies. Figure 3. Activation of coagulation cascade by Lipopolysaccharide (LPS). LPS stimulates TLR4 and CD14 receptors on macrophages either by direct binding to the receptors or, more commonly, by being transferred to the receptors via LPS-binding protein in the serum. Adopted from Raetz CR, Whitfield C (31). In practice, it is the complex interaction of pro-inflammatory and anti- inflammatory mediators that results in sepsis. 
TNF-α is the main pro- inflammatory mediator responsible for the onset of sepsis (25, 28, 32). Pro- inflammatory cytokines also stimulate neutrophils, platelets, lymphocytes, liver cells, endothelial cells, and macrophages, which can result in tissue damage, vascular dilation, and lung dysfunction (3). The activation of the complement system, which can result in the production of anaphylatoxins and a severe inflammatory response, causes other signs and symptoms of sepsis such as increased levels of C-reactive protein (CRP), inhibition of fibrinolysis, and, eventually, Disseminated Intravascular Coagulation (DIC). DIC frequently occurs as a result of infection by gram-negative bacteria and may lead to homeostasis imbalance and organ dysfunction (26, 33). Figure 4. Overview of sepsis diagnosis. Combination of clinical symptoms, blood biomarkers, microbiological findings, and imaging assessment use to identify the source of infection and assess organ involvement. 22 23 Biomarker profiling in sepsis diagnostics Mahnaz Irani Shemirani 1.5.1 CLINICAL DIAGNOSIS 1.5.2 BACTERIA IDENTIFICATION Physiological scoring systems, such as the SOFA score, are used to assess the Microbiological diagnosis is a complex process involving two main extent of a patient’s organ function or dysfunction (Table 1). Developed in the methodologies for detecting and identifying bacteria in patients: culture- early 1990s, the SOFA score has been implemented in ICU monitoring for dependent methods and culture-independent methods. critically ill patients and has become essential with the adoption of the new sepsis definition in 2016 for sepsis diagnosis. 
According to the Sepsis-3 Culture-dependent methods criteria, an increase in the SOFA score by 2 or more points indicates organ Blood culture is still the most common method for confirming bacterial dysfunction, and when combined with a suspected or documented infection, it infection in clinical practice, though it detects bacteremia only in about 50% defines sepsis (6). of patients who clinically suffer from sepsis (34). The positivity rate will be lowered with the administration of antibiotics before drawing a blood sample Table 1. Sequential Organ Failure Assessment (SOFA) scoring system* (35). Rapid diagnostic technology such as Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS), Indicator/Score 0 1 2 3 4 can assists by identifying pathogens directly from positive blood cultures. PaO2/FIO2,mm ≥400 <400 <300 <200 with <100 with MALDI-TOF MS can determine the type of pathogen within 2-6 hours, Hg respiratory respiratory significantly reducing the time needed to identify the appropriate antibiotic support support therapy and thereby improving patient outcomes (36). Platelets, ≥150 <150 <100 <50 <20 x103/μL Bilirubin, mg/dl <1.2 1.2-1.9 2.0-5.9 6.0-11.9 >12.0 Culture-independent methods Due to the challenges associated with blood culture methods—such as their Hypertension MAP MAP <70 Dopamine Dopamine Dopamine low sensitivity, long turnaround times, and risk of contamination—there has ≥70 mm Hg <5 or 5.1-15 or >15 or been a push towards alternative, culture-independent techniques (37). mm dobutamin epinephrine epinephrine Polymerase chain reaction (PCR) has been used to diagnose bloodstream Hg e (any ≤0.1 or >0.1 or a infections directly from blood samples (38). However, PCR lacks the dose) norepinephrin norepinephrin e ≤0.1 a e >0.1 a capability to identify bacterial pathogenicity. 
GCS score b 15 13-14 10-12 6-9 <6 Recently, nucleic acid sequencing technologies have gained prominence as Creatinine, <1.2 1.2-1.9 2.0-3.4 3.5-4.9 >5.0 they address the limitations of both traditional blood cultures and PCR mg/dL methods. Sequencing technologies, such as next-generation sequencing Urine output, <500 <200 (NGS), provide a comprehensive view of the microbial community by mL/d analyzing the entire DNA or RNA present in a sample. This allows for the *Adapted from Singer et al. (6) PaO2, partial pressure of oxygen, FIO2, the fraction of inspired oxygen; MAP, mean arterial identification of a broad range of pathogens and the detection of antibiotic pressure resistance genes with high precision. Moreover, sequencing can reveal detailed a Catecholamine doses are given as μg/kg body weight/min for at least 1 hour. genetic information about pathogens, including their virulence factors and b GCS (Glasgow Coma Scale) scores range from 3-15; a higher score indicates better resistance profiles (39-42). This advancement promises not only faster and neurological function. more accurate diagnoses but also deeper insights into the underlying mechanisms of infection. As sequencing technology continues to evolve, its integration into clinical workflows could significantly enhance the management of sepsis and other infections by providing real-time data and supporting more targeted treatment strategies. 24 25 Biomarker profiling in sepsis diagnostics Mahnaz Irani Shemirani 1.5.1 CLINICAL DIAGNOSIS 1.5.2 BACTERIA IDENTIFICATION Physiological scoring systems, such as the SOFA score, are used to assess the Microbiological diagnosis is a complex process involving two main extent of a patient’s organ function or dysfunction (Table 1). Developed in the methodologies for detecting and identifying bacteria in patients: culture- early 1990s, the SOFA score has been implemented in ICU monitoring for dependent methods and culture-independent methods. 
critically ill patients and has become essential with the adoption of the new sepsis definition in 2016 for sepsis diagnosis. According to the Sepsis-3 Culture-dependent methods criteria, an increase in the SOFA score by 2 or more points indicates organ Blood culture is still the most common method for confirming bacterial dysfunction, and when combined with a suspected or documented infection, it infection in clinical practice, though it detects bacteremia only in about 50% defines sepsis (6). of patients who clinically suffer from sepsis (34). The positivity rate will be lowered with the administration of antibiotics before drawing a blood sample Table 1. Sequential Organ Failure Assessment (SOFA) scoring system* (35). Rapid diagnostic technology such as Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS), Indicator/Score 0 1 2 3 4 can assists by identifying pathogens directly from positive blood cultures. PaO2/FIO2,mm ≥400 <400 <300 <200 with <100 with MALDI-TOF MS can determine the type of pathogen within 2-6 hours, Hg respiratory respiratory significantly reducing the time needed to identify the appropriate antibiotic support support therapy and thereby improving patient outcomes (36). Platelets, ≥150 <150 <100 <50 <20 x103/μL Bilirubin, mg/dl <1.2 1.2-1.9 2.0-5.9 6.0-11.9 >12.0 Culture-independent methods Due to the challenges associated with blood culture methods—such as their Hypertension MAP MAP <70 Dopamine Dopamine Dopamine low sensitivity, long turnaround times, and risk of contamination—there has ≥70 mm Hg <5 or 5.1-15 or >15 or been a push towards alternative, culture-independent techniques (37). mm dobutamin epinephrine epinephrine Polymerase chain reaction (PCR) has been used to diagnose bloodstream Hg e (any ≤0.1 or >0.1 or a infections directly from blood samples (38). However, PCR lacks the dose) norepinephrin norepinephrin e ≤0.1 a e >0.1 a capability to identify bacterial pathogenicity. 
GCS score b 15 13-14 10-12 6-9 <6 Recently, nucleic acid sequencing technologies have gained prominence as Creatinine, <1.2 1.2-1.9 2.0-3.4 3.5-4.9 >5.0 they address the limitations of both traditional blood cultures and PCR mg/dL methods. Sequencing technologies, such as next-generation sequencing Urine output, <500 <200 (NGS), provide a comprehensive view of the microbial community by mL/d analyzing the entire DNA or RNA present in a sample. This allows for the *Adapted from Singer et al. (6) PaO2, partial pressure of oxygen, FIO2, the fraction of inspired oxygen; MAP, mean arterial identification of a broad range of pathogens and the detection of antibiotic pressure resistance genes with high precision. Moreover, sequencing can reveal detailed a Catecholamine doses are given as μg/kg body weight/min for at least 1 hour. genetic information about pathogens, including their virulence factors and b GCS (Glasgow Coma Scale) scores range from 3-15; a higher score indicates better resistance profiles (39-42). This advancement promises not only faster and neurological function. more accurate diagnoses but also deeper insights into the underlying mechanisms of infection. As sequencing technology continues to evolve, its integration into clinical workflows could significantly enhance the management of sepsis and other infections by providing real-time data and supporting more targeted treatment strategies. 24 25 Biomarker profiling in sepsis diagnostics Mahnaz Irani Shemirani 1.5.3 BLOOD BIOMARKERS For years, researchers have examined blood levels of immune system proteins to identify sepsis. Despite numerous studies, sepsis-specific biomarkers remain elusive. 
1.5.3 BLOOD BIOMARKERS

A biomarker is defined as “a characteristic that is measured as an indicator of natural biological processes, pathogenic processes, or response to exposure or intervention” (43). Biomarkers have a wide range of potential applications, including disease diagnosis, prognosis, and treatment monitoring, as well as in research and drug development (see Table 2 for more details).

Table 2. Application of biomarkers*

Diagnostic | To identify a person who has a particular disease.
Monitoring | To examine serially the status of a disease or medical condition for signs of exposure to a medical product or environmental agent, or a biological agent’s or medical product’s effects.
Pharmacodynamic | To demonstrate how the body responds to a medication or environmental factor.
Predictive | To predict whether an individual or a group of individuals is more likely to experience a favorable or unfavorable condition.
Prognostic | To determine a patient’s risk of experiencing a clinical event, a disease recurrence, or a disease progression in relation to a certain illness or condition.
Safety | To determine the toxicity of a medical intervention as an adverse event.
Susceptibility/risk | To identify a person who does not already have a clinically obvious disease or medical condition but has the risk of getting the disease or medical condition in the future.

*Definitions adopted from Califf (43)

Sepsis leads to alterations in the expression and function of various endogenous molecules, which often reflect the level of immune activation or suppression. These changes can be monitored to observe shifts in the immune response over time. However, because many of these molecules may only temporarily reflect the immunological status, they must be interpreted with caution. While they may have potential as biomarkers, the heterogeneity in immune responses has posed significant challenges for research.

For years, researchers have examined blood levels of immune system proteins to identify sepsis. Despite numerous studies, sepsis-specific biomarkers remain elusive. Over 250 biomarkers have been proposed for sepsis diagnosis, but only a few are used in clinical practice, and they often suffer from low sensitivity and specificity (26, 44). Nevertheless, these biomarkers continue to be valuable for predicting sepsis risk, stratifying patients, evaluating treatment efficacy, and assessing prognosis (26, 45, 46). Below, some of the most researched protein biomarkers for sepsis are described.

Acute phase reactants
Peripheral blood contains acute phase reactants such as CRP and procalcitonin (PCT), both of which rise rapidly in septic patients. These markers are useful indicators of inflammation and infection and are often used in conjunction with other variables to diagnose sepsis and assess the effectiveness of treatment (26, 45-47).

Cytokines
Cytokines are signaling molecules released by various cell types, including immune cells like monocytes, macrophages, and lymphocytes, as well as endothelial cells, fibroblasts, and stromal cells (26). Among the most investigated pro-inflammatory cytokines are TNF-α and IL-6. Both are associated with organ damage and mortality, which helps in predicting patient outcomes (26, 46). TNF-α has a very short half-life of about 17 minutes, whereas IL-6 persists in the bloodstream much longer (typically 24 to 48 hours). This stability makes IL-6 more practical for assessing inflammation and predicting prognosis (26, 45, 46). IL-1 is another key pro-inflammatory cytokine that triggers the body’s inflammatory response during infections. IL-1β, the most extensively studied form of IL-1, is activated by caspase-1 (48). Elevated levels of IL-1β, reaching 1.22 pg/ml, can indicate a high risk of death within 48 hours in septic patients (49). Furthermore, Interleukin-27 (IL-27) is an immunosuppressive cytokine that is often found at elevated levels in septic patients (50, 51). IL-27 is considered a reliable biomarker for sepsis, showing high specificity and sensitivity for bacterial infections (52).
Cell-surface biomarkers
Various cell surface markers are used to assess the activation status of neutrophils, monocytes, and T-cells. CD64 is one of the most extensively studied myeloid markers, reflecting the early stages and prognosis of diseases, and it has shown considerable effectiveness in the early detection of sepsis in newborns (46). Other proteins, such as CD14 and lipopolysaccharide-binding protein (LBP), also increase during infection. LBP interacts with CD14 on the surface of monocytes and macrophages, leading to the release of a soluble form of CD14. Soluble CD14 (sCD14) is emerging as a highly promising biomarker of monocyte activation, as its levels rise earlier than those of IL-6 and PCT. Consequently, sCD14 is considered a key biomarker for the diagnosis and prognosis of sepsis (26, 46).

Other biomarkers
Currently, lactate is the most widely used biomarker for assessing organ dysfunction in sepsis. Elevated lactate levels occur in hypoxic conditions due to increased glycolysis and reduced tissue oxygenation. This elevation can also indicate impaired lactate clearance by the liver and correlates with an increased risk of mortality in patients with hospital-acquired sepsis. Serial monitoring of lactate levels can aid in predicting mortality and assist in risk classification. Many hospitals use a lactate threshold of greater than 2 mmol/L (or >18 mg/dL) in the absence of hypovolemia as a criterion to screen for septic shock (6).

In addition to lactate, emerging research is exploring other biomarkers for sepsis diagnosis and prognosis. Some studies have found that miRNA and plasma cell-free DNA can distinguish septic patients from healthy controls and predict mortality in septic patients (26, 53, 54).
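The two units in the lactate threshold quoted above are related through the molar mass of lactic acid. The value of about 90.08 g/mol used below is not given in the text and is supplied here as an assumption for the worked conversion:

```latex
\[
2\ \frac{\text{mmol}}{\text{L}} \times 90.08\ \frac{\text{mg}}{\text{mmol}}
= 180.16\ \frac{\text{mg}}{\text{L}}
= 18.0\ \frac{\text{mg}}{\text{dL}}
\]
```

This corresponds to a factor of roughly 9 mg/dL per mmol/L, which is consistent with the >18 mg/dL figure given for the 2 mmol/L threshold.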
Identifying precise biomarkers for diagnosing sepsis that can be used in routine clinical practice remains challenging. It is unlikely that a single biomarker will be sufficient for assessing a patient’s immunological status comprehensively. Therefore, given the limitations of individual markers, combining multiple biomarkers may boost both sensitivity and specificity for detecting sepsis.

In recent years, large-scale molecular analysis techniques have enabled the simultaneous study of multiple biomarkers. Advances in these techniques, such as high-throughput proteomics and transcriptomics, facilitate the comprehensive analysis of biological samples, specifically focusing on proteins and genes. Machine learning methods are increasingly used to extract key predictive biomarkers from these complex datasets, which often involve a limited number of samples but a vast array of molecules to analyze (55, 56). By applying machine learning to proteomic and transcriptomic data, researchers can uncover patterns and correlations that are not readily visible through traditional methods. Integrating robust multi-biomarker profiles with machine learning techniques can lead to more accurate differentiation of disease stages and more precise identification of specific causes of illness in patients. This approach promises to enhance diagnostic and prognostic capabilities by providing deeper insights into the underlying biological processes.

2 AIM

The overall aim of this project is to advance the understanding and diagnostic capabilities related to sepsis by investigating the molecular characteristics of pathogens and the host biological responses. Specifically, our aims are:

I. To benchmark a cloud-based diagnostic software for S. aureus Whole Genome Sequencing (WGS) data against conventional methods currently used in clinical laboratories

II. To investigate the antibiotic susceptibility pattern and epidemiology of S. aureus infection in the Skaraborg region, a western region in Sweden

III. To identify protein markers that can discriminate between bacterial infections caused by gram-positive and gram-negative bacteria

IV. To determine transcriptomic markers that distinguish between E. coli and S. aureus-induced sepsis
3 MATERIALS AND METHODS

3.1 SUBJECTS

Papers I, II, and III are part of a prospective observational study of community-onset severe sepsis and septic shock in Swedish adults conducted from September 2011 to June 2012 at Skaraborg Hospital, a secondary hospital with 640 beds, in the western region of Sweden (Skaraborg sepsis study). All patients ≥18 years admitted to the emergency department for suspicion of community-onset sepsis who gave their written consent were enrolled in the study (57). Prior to the administration of antibiotic therapy, signs and symptoms as well as clinical and laboratory data were recorded. Two senior infectious disease specialists retrospectively reviewed all medical data to assess whether the patients met Sepsis-3 criteria. Bacterial infection was verified either by identification of relevant bacteria by culture or by typical clinical presentation, such as erysipelas.

Bacterial isolates obtained from various patient culture samples were utilized for Papers I and II. A total of 272 S. aureus isolates from 212 patients had been identified by culturing and the MALDI-TOF MS (DB-4110) technique and cryopreserved at -80 °C, and were later thawed for these studies. Five isolates were irretrievable after freezing. In Paper I, 267 isolates were processed for DNA extraction and whole genome sequencing. After quality control of the raw data, the output files for three isolates were removed from the dataset. The study is based on the output files for the remaining 264 S. aureus isolates with high-quality sequence data. Genetic analysis for classification of the S. aureus isolates revealed that two of them had been misclassified and were not S. aureus. Therefore, in Paper II, we adjusted the data accordingly and examined laboratory records related to 262 S. aureus strains isolated from 212 patients aged 18 to 97 years.

In Paper III, blood samples were obtained from 291 patients with confirmed bacterial infections, including 184 with gram-negative strains and 107 with gram-positive strains, as well as from 40 healthy controls. The blood was drawn into sodium citrate tubes and centrifuged, and the plasma was stored at -80 °C until analysis using proximity extension assay technology. Fifty samples failed quality control after protein quantification, resulting in a final count of 246 patient samples and 35 samples from healthy controls.
Of the 246 patient samples, 154 were from patients with gram-negative bacteria and 92 from patients with gram-positive bacteria. The proximity extension assay technology quantified a total of 368 proteins, including nine duplicate proteins and one triplicate protein due to overlapping panels.

In Paper IV, microarray data with accession number GSE33341 was retrieved from the GEO database (www.ncbi.nlm.nih.gov/geo/) using specific keywords like ‘sepsis’, ‘S. aureus’, ‘E. coli’ and ‘array’ for further analysis and investigation. This data was part of a prospective observational study sponsored by the NIH, conducted between December 2005 and July 2010. The NIH study aimed to advance the development of innovative diagnostic tests for severe sepsis and community-acquired pneumonia, involving four medical centers in the USA (ClinicalTrials.gov NCT00258869) (58).

The study included adult patients admitted to the emergency department with sepsis. The subjects in this report had confirmed cases of monomicrobial bloodstream infection caused by either E. coli (n=19, age range 25–91) or S. aureus (n=32, age range 24–91). Additionally, there were 43 uninfected controls (age range 21–59). Whole blood samples were collected either on the day of the hospital presentation or within 24 hours before the initiation of study treatment. The microarray technique was employed for gene analysis in the study, covering a total of 22,277 genes.
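The sample counts reported in this section for Papers I to III can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only numbers stated above; the variable and function names are illustrative and not taken from the study's own scripts.

```python
# Isolate flow for Papers I and II, as reported in Section 3.1.
ISOLATE_STEPS = [
    ("cryopreserved S. aureus isolates", 272),
    ("irretrievable after freezing", -5),
    ("removed after sequencing quality control", -3),
    ("reclassified as non-S. aureus", -2),
]


def running_totals(steps):
    """Return the cumulative count after each step."""
    totals, current = [], 0
    for _label, change in steps:
        current += change
        totals.append(current)
    return totals


isolate_totals = running_totals(ISOLATE_STEPS)  # sequenced, analysed in Paper I, Paper II

# Plasma samples for Paper III, before and after quality control.
patients_enrolled = {"gram_negative": 184, "gram_positive": 107}
patients_after_qc = {"gram_negative": 154, "gram_positive": 92}
controls_enrolled, controls_after_qc = 40, 35

failed_qc = (sum(patients_enrolled.values()) + controls_enrolled) - (
    sum(patients_after_qc.values()) + controls_after_qc
)
```

The running totals reproduce the 267 sequenced isolates, the 264 isolates analysed in Paper I, and the 262 isolates in Paper II, and the difference between enrolled and retained plasma samples equals the fifty samples reported as failing quality control.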
3.2 METHODS

Reference microbial species identification (Paper I)
Microbiological culturing and species determination for the paper were performed via MALDI-TOF MS, as previously described by Ljungström et al. (59), using a Microflex LT mass spectrometer (Bruker Daltonics, Leipzig, Germany) and BioTyper software v2.0 with default parameter settings. The cut-off value for species identification was set above a score of 2.0. The study utilized the Bruker microorganism database MBT Compass Library DB-4110 (Bruker Daltonics, Germany), released in April 2011.

Reference antibiotic susceptibility (Paper I, Paper II)
Antibiotic susceptibility testing (AST) was performed by disk diffusion on Mueller Hinton media at the clinical laboratory Unilabs at Skaraborg Hospital, Skövde, Sweden, according to European Committee on Antimicrobial Susceptibility Testing (EUCAST) guidelines (www.eucast.org). AST findings obtained from identified S. aureus are referred to as phenotypic AST in Paper I. AST results showing isoxazolyl penicillin and cefoxitin resistance were followed by mecA detection by PCR to confirm the isolate as a methicillin-resistant S. aureus (MRSA). The phenotypic AST results described in Paper I are restricted to the antibiotics available on the 1928 platform. In Paper II, AST was limited to ciprofloxacin, clindamycin, erythromycin, isoxazolyl penicillin, penicillin G, penicillin V, piperacillin, fusidic acid, and vancomycin.

Whole genome sequencing (Paper I)
Strains of S. aureus were cultured on typical blood agar, and DNA was extracted using the automated MagNA Pure 96 instrument and the DNA and Viral NA Small Volume kit with the Pathogen Universal 200 protocol (Roche Diagnostics, Switzerland) at Unilabs, Skövde. The DNA concentration was determined using the Qubit dsDNA HS assay kit on Qubit 3.0 (Thermo Fisher Scientific, USA) and a NanoDrop spectrophotometer (Thermo Fisher Scientific, USA). Library preparation was performed according to Illumina’s guideline for Nextera XT, and the DNA was sequenced on the Illumina HiSeq 2500 platform using the high-throughput protocol for bacterial genomes at SciLifeLab, Solna, Sweden.
Bioinformatics analysis (Paper I)
A manual in-house pipeline (INH) was set up for Illumina paired-end (PE) read libraries, consisting of steps for quality control, trimming, assembly, and annotation of the contigs in FASTA format. The PE sequenced reads were assessed for initial quality control using FastQC (v.0.11.8) (60). Trimmomatic (v.0.36) (61) was employed to remove adapter sequences and filter out low-quality reads. The tool used a sliding window approach with a window size of four and a quality threshold of Q20. The first 12 bases were trimmed using the HEADCROP argument to correct for nucleotide bias. Reads longer than 30 bases after these trimming steps were retained for subsequent analysis. De novo genome assembly was done using SPAdes (v.3.13.1) (62). Assembly quality control was performed by QUAST (v.5.0.2) (63). The criteria for a good assembly included a large N50, a low number of contigs, sufficiently many contigs longer than 10,000 bp, and a genome size close to 2.8 Mbp, which is the size of the S. aureus genome (GenBank accession number NC_007795.1). For assemblies that did not meet these criteria, the median coverage was also calculated using R (v.3.5) (64).
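To make the trimming settings and the N50 criterion described above concrete, the following Python sketch re-implements the ideas in simplified form. It is not the code of Trimmomatic or QUAST: the function names are hypothetical, the sliding-window rule is reduced to cutting at the first window whose mean quality falls below the threshold, and applying the head crop before the window scan is an assumption, since the order of the steps is not stated.

```python
from statistics import mean


def sliding_window_trim(quals, window=4, min_avg_q=20):
    """Return how many leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose
    mean Phred quality is below min_avg_q (a simplification of the idea).
    """
    for start in range(max(len(quals) - window + 1, 0)):
        if mean(quals[start:start + window]) < min_avg_q:
            return start
    return len(quals)


def trim_read(seq, quals, headcrop=12, window=4, min_avg_q=20, min_len=30):
    """Crop the first bases, quality-trim, and keep reads longer than min_len."""
    seq, quals = seq[headcrop:], quals[headcrop:]
    keep = sliding_window_trim(quals, window, min_avg_q)
    seq, quals = seq[:keep], quals[:keep]
    return (seq, quals) if len(seq) > min_len else None


def n50(contig_lengths):
    """Length of the contig at which the sorted cumulative sum reaches half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy read: 60 bases, good quality (Q30) followed by a poor-quality tail (Q10).
toy = trim_read("A" * 60, [30] * 50 + [10] * 10)
toy_n50 = n50([80, 70, 50, 40, 30, 20])
```

In the toy read, the head crop removes 12 bases and the window scan removes the low-quality tail, so the read is kept because it remains longer than 30 bases. A larger N50 means that half of the assembled bases lie in long contigs, which is why it is listed among the criteria for a good assembly.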
Isolates identified as non-S. aureus were further verified through taxonomic analysis based on average nucleotide identity using the JSpeciesWS tool (v.3.4.0) (68) and were excluded from subsequent analysis. Acquired antimicrobial resistance genes in the total sequenced S. aureus isolates were identified using ResFinder 3.1 (69). The analysis utilized the default settings, which included all antimicrobial databases, with a threshold of 90% for identity and 60% for minimum length. MLST 2.0 (70) was employed for multi locus sequence typing of S. aureus genomes. The S. aureus sequences were aligned against seven MLST loci: arcC, aroE, glpF, gmk, pta, tpi, and yqiL. Virulence factors were predicted using VirulenceFinder 2.0 (71). S. aureus was selected as the species with a default setting of a 90% threshold for ID and a minimum length of 60% (Figure 5).

Figure 5. Workflow of the in-house pipeline. The in-house pipeline diagram indicating manual workflow of A. sequencing PE FASTQ files, B. quality control and trimming, C. assembly and evaluation, D. annotation of the assembled contigs in FASTA format using Multi Locus Sequence Typing tool (MLST), virulence genes identification tool (VirulenceFinder), resistance gene detection tool (ResFinder), species prediction tool using K-mer algorithm (KmerFinder), species prediction tool using the 16S ribosomal DNA sequence (SpeciesFinder), and JSpecies Web Server for pairwise genome comparison of prokaryotic species.

The S. aureus WGS data were also analyzed by a commercial cloud-based platform, 1928 (1928 Diagnostics, Gothenburg, Sweden). Raw PE fastq.gz files were uploaded and processed through the S. aureus pipeline. The platform employed a sequencing depth greater than 30x as an initial quality control measure. The pipeline performed species identification, assessed antibiotic susceptibility, and analyzed virulence genes using a K-mer-based assembly method. According to a discussion with the platform maintainer (1928 Diagnostics, Sweden), the S. aureus pipeline was not upgraded during the access period of June and July 2019.

Comparative evaluation of the bioinformatic workflows (Paper I)

Species identification results from MALDI-TOF MS and phenotypic AST results were compared with the genotypic results obtained from the INH and 1928 workflows. The level of agreement between these results was evaluated. For the AST comparison, very major errors (VMEs) and major errors (MEs) were assessed: a VME was defined as a phenotype showing resistance with a genotypic prediction of susceptibility (false negative), while an ME was defined as a phenotype showing susceptibility with a genotypic prediction of resistance (false positive) (72). The virulence genes and sequence types obtained from the two bioinformatics workflows were compared with each other.
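The VME and ME definitions used in the comparative evaluation can be written down as a short sketch. The function names and the ten phenotype/genotype pairs below are hypothetical and are not taken from the study data; the code only restates the definitions (phenotype R with genotype S is a VME, phenotype S with genotype R is an ME) and derives a percentage of agreement from them.

```python
from collections import Counter

VALID_CALLS = {"R", "S"}  # R = resistant, S = susceptible


def classify_call(phenotype, genotype):
    """Classify one isolate/antibiotic pair as agreement, VME, or ME."""
    if phenotype not in VALID_CALLS or genotype not in VALID_CALLS:
        raise ValueError("calls must be 'R' or 'S'")
    if phenotype == genotype:
        return "agreement"
    # Phenotype resistant but predicted susceptible: false negative (VME).
    # Phenotype susceptible but predicted resistant: false positive (ME).
    return "VME" if phenotype == "R" else "ME"


def summarize_calls(pairs):
    """Count agreements, VMEs, and MEs over (phenotype, genotype) pairs."""
    counts = Counter(classify_call(p, g) for p, g in pairs)
    total = sum(counts.values())
    return {
        "n": total,
        "agreement": counts["agreement"],
        "VME": counts["VME"],
        "ME": counts["ME"],
        "percent_agreement": 100 * counts["agreement"] / total if total else float("nan"),
    }


# Hypothetical example: 10 tests, one VME and one ME.
toy_pairs = [("S", "S")] * 6 + [("R", "R")] * 2 + [("R", "S"), ("S", "R")]
print(summarize_calls(toy_pairs))
# {'n': 10, 'agreement': 8, 'VME': 1, 'ME': 1, 'percent_agreement': 80.0}
```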
Proteomic quantification (Paper III)

Protein biomarkers were quantified using Proximity Extension Assay (PEA) technology with four Olink panels at TATAA Biocenter, Gothenburg, Sweden. The PEA works by amplifying two complementary DNA molecules that bind to each other when their attached antibodies bind to the same target protein in close proximity. The readout consists of quantification cycle values obtained by real-time PCR, which are then converted to normalized protein expression (NPX) values analyzed in log2 scale units (73) (Figure 6).
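Because NPX values are on a log2 scale, a difference of one NPX unit corresponds to a doubling on the linear scale. The helper below is a minimal sketch of that relationship only; the function name and the example values are hypothetical, and the conversion from real-time PCR readout to NPX itself is not reproduced here.

```python
def fold_change_from_npx(npx_a, npx_b):
    """Linear-scale ratio implied by the difference between two log2-scale NPX values."""
    return 2.0 ** (npx_a - npx_b)


print(fold_change_from_npx(7.5, 6.5))  # 2.0
print(fold_change_from_npx(5.0, 7.0))  # 0.25
```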
Four Olink panels, Cardiometabolic (CM) (v.3603), Cardiovascular II (CVD II) (v.5004), Immune Response (IR) (v.3202), and Inflammation (Inf) (v.3021) (Olink Biosciences, Uppsala, Sweden), were used for quantifying proteins in patients with sepsis and healthy control individuals.

Figure 6. Simplified overview of Proximity Extension Assay (PEA) technology. Antigen-specific antibodies with individual DNA tags bind to a target protein. The DNA tags are complementary to each other for the same protein and will serve as a template in real-time PCR detection.

Missing value handling using GSimp (Paper III)

The limit of detection (LOD) for each PEA assay is established at three standard deviations above the background signal. We removed protein values below LOD based on the manufacturer’s instruction and defined the missing values as Missing Not At Random (MNAR). A set of MNAR percentages was defined to create ten datasets with <5% to <80% of expression values below LOD in each group. GSimp (74), implemented in R (v.4.1.1) (64) and accessible from GitHub (https://github.com/WandeRum/GSimp), was used with standard settings to impute missing values based on the group. After imputation, the mean expression values of the proteins measured in duplicate or triplicate were calculated and used in place of the individual values.
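The LOD rule and the below-LOD filtering described above can be sketched as follows. This is an illustration under stated assumptions, not the manufacturer's procedure or the GSimp imputation: the background values are hypothetical, the use of the sample standard deviation is an assumption, and the helper names (`limit_of_detection`, `mask_below_lod`, `missing_fraction`, `keep_protein`) are invented for this example.

```python
from statistics import mean, stdev


def limit_of_detection(background, n_sd=3.0):
    """LOD as background mean plus n_sd standard deviations (sample SD assumed)."""
    return mean(background) + n_sd * stdev(background)


def mask_below_lod(values, lod):
    """Replace values below the LOD with None so they are treated as missing (MNAR)."""
    return [v if v >= lod else None for v in values]


def missing_fraction(masked):
    """Fraction of measurements that fell below the LOD."""
    return sum(v is None for v in masked) / len(masked)


def keep_protein(masked, max_missing):
    """Keep a protein for a dataset only if its missing fraction is below the cutoff."""
    return missing_fraction(masked) < max_missing


# Hypothetical numbers for one protein in one group.
lod = limit_of_detection([1.0, 2.0, 3.0])         # mean 2.0, SD 1.0
masked = mask_below_lod([4.2, 5.6, 6.2, 3.0], lod)
print(masked)                                     # [None, 5.6, 6.2, None]
print(keep_protein(masked, max_missing=0.80))     # True  (50% missing, below the 80% cutoff)
print(keep_protein(masked, max_missing=0.05))     # False (50% missing, not below the 5% cutoff)
```

Varying `max_missing` over the ten cutoffs listed in the text would yield ten nested protein sets, from the strictest (<5%) to the most permissive (<80%).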
Comparative feature selection (Paper III)

The ten datasets, each containing varying degrees of imputed missing data (5%, 10%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, and 80%), were subjected to feature selection methods, and the classifier’s prediction accuracy was measured. To assess the algorithms’ predictive performance, each dataset was split randomly into training and test datasets, with 80% and 20% of the samples assigned to each group, respectively. Three widely used supervised algorithms were evaluated to identify the most effective method for selecting predictive proteins in our dataset.

1) Random forest (RF) (75) generates multiple classification trees from data samples and selects the class with the highest number of votes as the output for a classification problem. We created RF models with 100 decision trees, each built on bootstrap samples from the training set, using Gini impurity as the quality measure for decision-making. To address the imbalance in sample numbers across groups, we implemented cost-sensitive learning. The number of features randomly selected at each decision point was explored separately in each dataset, ranging from 1 to 20. We performed ten-fold cross-validation with three repetitions on the entire dataset to determine the optimal number of features, calculating the accuracy score and standard deviations (std). The feature number with the highest accuracy and lowest std was selected as the most optimal. The RF model was then used to predict the test set, and we evaluated its performance by generating a classification report that included precision, recall (sensitivity), f1-score, and accuracy.

2) The Least Absolute Shrinkage and Selection Operator (LASSO) is a regularization technique that shrinks the coefficients of redundant or irrelevant predictors to zero (76). We generated the LASSO model on the training set using a tuned lambda value of 0.02. This lambda value was determined automatically by generating the LassoCV model on the dataset, tuning the lambda hyperparameter value within the range of 0 to 1 with a step size of 0.01. Ten-fold cross-validation was employed on the entire dataset to identify the optimal lambda parameter, with the process repeated three times. The LASSO model was then used to predict the test set, and its predictive power was assessed by calculating the mean square error (MSE).
3) Recursive Feature Elimination (RFE) is a wrapper algorithm that utilizes various machine learning algorithms at its core (77), enabling the exploration of algorithms suited to the data’s structure. Initially, five algorithms were examined: logistic regression (LR), perceptron, gradient boosting, decision tree, and support vector machine, as the base core for the RFE algorithm. After conducting ten-fold cross-validation on the entire dataset, repeated three times, the accuracy of each core algorithm was determined. Based on the results, logistic regression (RFE-LR) was selected as it provided the highest accuracy among the core algorithms. The RFE-LR model was then generated with default parameters on the training set of each dataset. The model’s predictive performance was assessed by predicting the test set and evaluating the classification report, including precision, recall, f1-score, and accuracy.

The workflow of the machine learning approach is illustrated in Figure 1 of Paper III. We utilized accuracy as a parameter to assess the efficacy of each algorithm in the classification of gram-positive and gram-negative bacterial infection patients. The selected method, Lasso, was applied to our top-ranked dataset for the selection of predictive proteins.

Protein-protein network interaction (Paper III, IV)

The predictive proteins selected by the Lasso algorithm were used to build a protein-protein interaction (PPI) network based on the STRING database (78). The interactive relationship between the predictive proteins was then examined using the Cytoscape software (79). To find clusters in the network, the Molecular Complex Detection (MCODE) plug-in (80) of Cytoscape was used, which clusters proteins by highly interconnected areas.

GO term and pathway enrichment analyses (Paper III, IV)

To better explore the biological significance of predictive proteins (Paper III) and genes (Paper IV), first, a list of corresponding genes for the predictive proteins was obtained via the GeneCards resource (www.genecards.org) (81, 82). Then functional and pathway enrichment analyses were performed using PANTHER (Protein ANalysis THrough Evolutionary Relationships) (83) and Reactome (84) resources in the Gene Ontology (GO) (85-87) interface. An FDR p-value <0.05 was considered a significant enrichment.

Microarray preprocessing (Paper IV)

Affymetrix microarrays were normalized using the Robust Multichip Average (RMA) technique (88). All transcripts detected in at least one sample were included, without any prior screening for differential expression. Subsequently, the probe IDs were converted to unique official gene symbols, and the dataset was transformed, organizing the 94 samples in rows and 22,277 genes in columns.

Feature selection strategy (Paper IV)

The microarray dataset was divided into two subsets, with one containing 80% of the samples for training and the other 20% for testing. The Lasso feature selection method was applied with the configuration described previously, using a tuned lambda value of 0.07. The predictive power of the model was evaluated using MSE.

Sample balance (Paper IV)

To address the imbalance in sample sizes based on age and gender, we stratified the data into subgroups and performed upsampling. We first equalized the sample sizes in each age subgroup and then balanced the genders within each bacterial group. Finally, we matched the number of samples between the bacterial and healthy control groups. This resulted in a total of 228 samples and 22,277 genes, allowing us to train and test the predictive genes with equal sample sizes for each group.

External validation (Paper IV)

To ensure the generalizability of our results, we validated our predictive gene set using two additional datasets: GSE13015 (89) and GSE65088 (90). GSE13015 included samples of E. coli and S. aureus from sepsis patients and healthy controls, while GSE65088 consisted exclusively of samples of E. coli and S. aureus cultured in healthy donors’ blood. Both datasets were processed consistently. We matched our predictive genes with the datasets. Subsequently, we used GSE13015 to evaluate gene performance in distinguishing E. coli, S. aureus, and healthy controls, and GSE65088 to differentiate E. coli from S. aureus samples.
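The upsampling step described under Sample balance can be sketched as random oversampling within subgroups. The text does not specify the resampling procedure, so everything below is an assumption: sampling with replacement, the largest subgroup as the target size, the fixed seed, and the field names of the toy records.

```python
import random
from collections import defaultdict


def upsample_to_equal_size(samples, key, seed=0):
    """Resample each subgroup with replacement up to the size of the largest subgroup."""
    groups = defaultdict(list)
    for sample in samples:
        groups[key(sample)].append(sample)
    if not groups:
        return []
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)  # keep every original sample
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical records: five samples in one group, two in the other.
toy = [{"id": i, "group": "E. coli"} for i in range(5)]
toy += [{"id": i, "group": "S. aureus"} for i in range(5, 7)]
balanced = upsample_to_equal_size(toy, key=lambda s: s["group"])
print(len(balanced))  # 10
```

Applying the same function successively with different keys (for example an age subgroup, then gender within each bacterial group, then bacterial versus control) would mirror the order of balancing steps given in the text.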
3.3 STATISTICAL ANALYSIS

Statistical analysis was performed using R (v.3.5) (64), and Jupyter Notebook (v.6.0.3) (91) within the Anaconda (v.2-2.4.0) (92) environment, utilizing the Scikit-Learn library (93). In Paper I, the percentage of agreement between conventional microbiological routines and bioinformatics workflows was calculated. This measure was also applied to assess the agreement between the two bioinformatic methods. The Agresti-Coull approach was employed to compute 95% confidence intervals for these agreement percentages.

In Paper II, descriptive statistics were calculated by mean, standard deviation, range, frequencies and percentages. An unpaired t-test was used to compare the average age between male and female groups, with a significance level set at p < 0.05. Statistical analysis was conducted using IBM SPSS version 25 (IBM Corp, USA).

In Paper III, bioinformatics analysis was conducted in three phases. Initially, a t-test was used to compare patient groups and calculate p-values, which were then adjusted for false discovery rate using the Benjamini and Hochberg method with a significance threshold of p-adj < 0.05. Next, an unsupervised learning approach, involving Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE), was employed to explore the grouping patterns without prior knowledge of the sample groups. Finally, the Lasso method was applied to select predictive proteins. The performance of these proteins was assessed by calculating the area under the receiver operating characteristic curve (AUC-ROC), and the linear relationship among predictive proteins was evaluated using the Pearson correlation coefficient (PCC).
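The Benjamini and Hochberg adjustment named above can be written in a few lines. The sketch below follows the standard step-up procedure (each sorted p-value is multiplied by m/rank and monotonicity is enforced from the largest p-value downwards); the function name and the four p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the running minimum
    # so that adjusted values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


toy_p = [0.01, 0.04, 0.03, 0.005]
print([round(q, 3) for q in benjamini_hochberg(toy_p)])  # [0.02, 0.04, 0.04, 0.02]
```

With the threshold stated in the text, a protein would be called significant when its adjusted value is below 0.05.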
In Paper IV, a two-step analysis was used to identify genes distinguishing between E. coli and S. aureus-induced sepsis. Initially, PCA was employed to visualize sample relationships and gain insights into the data structure. Subsequently, Lasso regression was applied for feature selection to identify the most predictive genes. The performance of these selected genes was assessed using logistic regression (94) and metrics including AUC, precision, recall, F1-score, and accuracy. The Mann-Whitney U test was utilized to evaluate differences in gene expression between E. coli and S. aureus-induced sepsis where applicable. The linear relationship between predictive genes was assessed using PCC, and the predictive performance of these genes on external datasets was evaluated using the AUC-ROC curve.

3.4 ETHICAL CONSIDERATION

For Papers I-III, the study was approved by the Regional Ethical Review Board of Gothenburg (376-11). Adult patients suspected of sepsis who were admitted to the emergency department between 2011 and 2012 provided written informed consent. They were fully informed about the study’s purpose, their obligations, their right to withdraw from the study at any time, and the study’s adherence to ethical standards. Plasma and whole blood samples were collected and stored in a biobank. Patients had the opportunity to contact Dr. Lars Ljungström for clarification of any information and were informed they could access the study results through published articles. In Paper I, which focused on bacterial strains, patient consent was deemed unnecessary according to national regulations (2003:460). In Paper II, reviewing the patients’ electronic health records did not require individual consent at the time. For Paper IV, an online dataset with prior ethical approval was utilized, eliminating the need for additional ethical clearance.
4 RESULTS AND DISCUSSION

4.1 PAPER I - BENCHMARKING OF TWO BIOINFORMATICS WORKFLOWS

NGS technologies, particularly bacterial WGS, offer the potential to transform clinical microbiology by providing rapid and precise identification of bacterial sepsis, antibiotic resistance genes, and virulence genes (95). However, the clinical adoption of WGS has been slow due to the lack of appropriate platforms (96). Recently, an automated bioinformatics platform, 1928 Diagnostics (Gothenburg, Sweden), designed for WGS analysis, has been employed in the study of Staphylococcus argenteus (S. argenteus), MRSA, and Klebsiella spp. (97-100).

In this study, our objective was to assess the performance of the 1928 Diagnostics platform in identifying S. aureus isolates from sepsis patients, detecting resistance genes and virulence factors, and performing multi locus sequence typing. We developed an INH pipeline (discussed in chapter 3.2) and compared the results of the 1928 platform and the INH pipeline with those from MALDI-TOF MS and phenotypic AST, as well as with each other.

During the development of our INH pipeline, we assessed several tools for predicting S. aureus species, including CGE SpeciesFinder (65), CGE KmerFinder (66), and JSpeciesWS (68). CGE SpeciesFinder has previously been reported to have low accuracy for species identification (65). In our study, it also showed limitations in accurately identifying S. aureus, failing to identify 61 isolates and showing a lower agreement (76.5%, 202/264) with MALDI-TOF MS, compared to CGE KmerFinder and JSpeciesWS, which demonstrated a high level of agreement (99.2%, 262/264) with the same method (Table 3).

Using 1928 for species identification, nine FASTQ files initially failed quality control because the platform requires a sequencing depth/coverage exceeding 30X. After adjusting parameters to accept a range of 11-29X, the 1928 platform showed high agreement (99.2%, 262/264) with MALDI-TOF MS (95% CI: 97.1-99.9) (Table 3). This indicates the platform’s effectiveness in identifying S. aureus even with reduced depth/coverage. Post-adjustment, it identified two discrepant isolates: SA 310 as Staphylococcus epidermidis (S. epidermidis) and SA 1413 as a non-staphylococcal species. CGE SpeciesFinder, CGE KmerFinder, and JSpeciesWS likewise identified SA 310 as S. epidermidis, while CGE KmerFinder and JSpeciesWS identified SA 1413 as Staphylococcus argenteus (S. argenteus).
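The confidence intervals attached to the agreement percentages are stated in the statistical analysis section to be Agresti-Coull intervals. A minimal sketch of that interval is given below, applied to the 262/264 agreement quoted above. The function name is an assumption, as is clipping the bounds to the range 0 to 1; the result is close to the reported interval, and small differences in the last digit can arise from rounding conventions or software defaults.

```python
from math import sqrt
from statistics import NormalDist


def agresti_coull_interval(successes, n, confidence=0.95):
    """Agresti-Coull confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n_adj = n + z ** 2                                   # adjusted sample size
    p_adj = (successes + z ** 2 / 2) / n_adj             # adjusted proportion
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clipping to [0, 1] is a choice made in this sketch.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


low, high = agresti_coull_interval(262, 264)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```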
S. argenteus, a novel species identified in 2015 using the Illumina HiSeq platform (101), presents challenges in evaluating its clinical significance due to limited research. However, some studies have noted that its frequency, morbidity, and mortality rates are similar to those of S. aureus (102, 103), with reports of methicillin-resistant isolates (101). The first S. argenteus case in Sweden was documented in 2018 (97), just after we completed our WGS data analysis. Since April 2018, both the regional laboratory's database and the 1928 platform have been revised accordingly.

Table 3. Genetically predicted species identification by the bioinformatic workflows for the 264 isolates identified as S. aureus by MALDI-TOF MS.

| Bioinformatic tool | S. aureus (n [%]) | Other species (n [%]) | No prediction (n [%]) | Agreement against JSpeciesWS, % (95% CI) |
|---|---|---|---|---|
| JSpeciesWS | 262 [99.2] | 2 [0.8] * | 0 [0] | - |
| SpeciesFinder | 202 [76.5] | 1 [0.4] ** | 61 [23.1] *** | 76.5 (71.0-81.2) |
| KmerFinder | 262 [99.2] | 2 [0.8] * | 0 [0] | 100.0 (98.2-100.0) |
| 1928 | 262 [99.2] | 2 [0.8] **** | 0 [0] | 99.6 (97.6-100.0) |

*S. epidermidis and S. argenteus. **S. epidermidis. ***Results lower than 98% ID match were excluded. ****S. epidermidis and unknown.

We excluded the two samples identified as S. epidermidis and S. argenteus from further analysis, since the study focused on S. aureus. Similarly, FASTQ files that failed the 1928 platform's quality control were excluded to ensure that only high-quality data were used and to improve the accuracy and reliability of the results. Consequently, 255 S. aureus isolates were included in the further benchmarking of AST, virulence gene, and sequence type characterization.

Bioinformatics methods for WGS analysis have been demonstrated to be as sensitive and specific as routine antimicrobial susceptibility testing methods (72, 104, 105). We also observed an agreement of 98.3% (989/1006, 95% CI: 97.3-99.0) between the combined predicted genotypic antibiotic susceptibility from both bioinformatic workflows (1928 and INH) and phenotypic AST (Table 4). While both methods have their advantages and limitations, the high degree of agreement between genotypic and phenotypic AST in this study suggests that genotypic AST may be a useful tool for testing antibiotic susceptibility in S. aureus, particularly in cases where phenotypic AST is not feasible. The overall agreement of each workflow (1928 and INH) with the reference method was also assessed. The 1928 platform demonstrated a high overall agreement of 99.0% (996/1006, 95% CI: 98.1-99.5) (Table 4), while the INH workflow showed a slightly lower but still strong overall agreement of 98.4% (990/1006, 95% CI: 97.4-99.0) (Table 4). Across all genotypic AST findings, the total agreement was 99.2% (998/1006, 95% CI: 98.4-99.6) (Table 4).

Table 4. Evaluation of predicted genotypic AST using the 1928 platform, the in-house pipeline, and phenotypic AST for 255 S. aureus isolates.

| Phenotypic AST (n) | 1928 / INH: R/R | 1928 / INH: S/S | 1928 / INH: R/S | 1928 / INH: S/R | Discordant across methods (n [%]) |
|---|---|---|---|---|---|
| S (981) | 1 | 979 | 1 | 0 | 2 [0.2] |
| R (25) | 10 | 8 | 7 | 0 | 15 [60.0] |
| In total n* = 1006 | 11 | 987 | 8 | 0 | n** = 17 [1.7] |

*Total number of cases tested with the EUCAST test. **Number of discordant results involving both bioinformatics workflows. S, Susceptible; R, Resistant. Bolded items indicate 100% agreement.

The strong agreement between the 1928 platform, INH, and AST is largely attributed to the alignment between susceptible cases in phenotypic AST and the two bioinformatics workflows. Among the phenotypically susceptible cases, discordance was minimal at just 0.2% (2/981) (Table 4). In contrast, the greatest discordance among the three methods was found for antibiotic resistance, with a discordance rate of 60.0% (15/25; 95% CI: 40.7-76.5) (Table 4). This observation was further supported by the lower combined ME rate of 0.1% (1/1006) compared to the combined VME rate of 0.8% (8/1006) (Paper I, Table 5). This finding is consistent with the study by Mason et al. (2018), which observed higher agreement for susceptibility than for resistance using the three bioinformatic methods Genefinder, Mykrobe, and Typewriter (105).

Mason et al. (105) also reported VME for ciprofloxacin and fusidic acid, where phenotypic AST indicated resistance but genotypic AST suggested susceptibility. Similarly, our study identified 26 discrepancies between phenotypic AST and the two bioinformatics methods, including VME for fusidic acid and ciprofloxacin. The 1928 platform exhibited a VME rate of 1.4% (1/70) for ciprofloxacin, whereas the INH pipeline had a higher rate of 5.7% (4/70). For fusidic acid, the 1928 platform had a VME rate of 1.9% (4/206), while the INH showed a rate of 3.4% (7/206) (Paper I, Table 5). Consistent with our findings for the 1928 platform, Gordon et al. (2014) (72) reported a VME rate of 1.4% for ciprofloxacin but a lower rate of 0.6% for fusidic acid using bioinformatics workflows such as BLASTn and tBLASTn.
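The error rates discussed here follow from pairing each phenotypic call with the corresponding genotypic prediction. The sketch below assumes that a VME is a phenotypically resistant isolate predicted susceptible (as described in the text) and that an ME is the reverse, a phenotypically susceptible isolate predicted resistant (the usual convention, assumed here). Rates are divided by the total number of tests, mirroring how the combined rates are quoted (for example 8/1006); other conventions divide by the number of resistant or susceptible isolates instead. The function name and the toy data are hypothetical.

```python
from collections import Counter


def error_rates(pairs):
    """Summarise (phenotypic, genotypic) call pairs, each call 'S' or 'R'.

    VME: phenotype 'R' but genotype 'S'. ME: phenotype 'S' but genotype 'R'.
    All rates use the total number of tests as the denominator (an
    assumption that mirrors the combined rates quoted in the text).
    """
    counts = Counter(pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no test results supplied")
    agree = counts[("S", "S")] + counts[("R", "R")]
    return {
        "total": total,
        "agreement": agree / total,
        "vme_rate": counts[("R", "S")] / total,
        "me_rate": counts[("S", "R")] / total,
    }


# Hypothetical toy data, not taken from the study.
toy = [("S", "S")] * 7 + [("R", "R"), ("R", "S"), ("S", "R")]
print(error_rates(toy))
```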
The difference in fusidic acid VMEs could stem from the use of distinct algorithms or, as noted by Gordon et al., from the fact that their study incorporated low-quality contigs in the analysis of fusidic acid to enhance prediction accuracy (72).

The lower VME rate of the 1928 platform compared to INH was not limited to ciprofloxacin and fusidic acid. Upon further comparison of the discrepancies between each bioinformatics workflow and phenotypic AST, the 1928 platform exhibited a lower combined VME rate (0.8%, 8/1006) than INH (1.5%, 15/1006). However, the 1928 platform had a slightly higher combined ME rate (0.2%) than INH (0.1%) (Paper I, Table 5). Both methods demonstrated high accuracy in analyzing antimicrobial susceptibility, with the 1928 platform showing a comparative advantage in minimizing VME.

Identifying virulence factors in bacterial infections, including S. aureus, is important as these factors influence the infection's severity and outcome. Knowing the virulence factors in a specific S. aureus strain helps clinicians make more informed decisions when selecting antibiotics and treatment (106). During the study, the clinical lab did not perform reference methods for virulence factor identification. Therefore, we benchmarked the two bioinformatics workflows against each other, focusing on the genes included in the 1928 platform.

The 1928 platform was designed to identify critical virulence genes in S. aureus infections, including etA and etB, which produce the exfoliative toxins responsible for staphylococcal scalded skin syndrome (107); tsst1, associated with toxic shock syndrome toxin-1 and severe symptoms in toxic shock syndrome cases (108); and the Panton-Valentine Leucocidin (PVL) exotoxin, encoded by the lukF-PVL and lukS-PVL genes, which contributes significantly to infection severity and adverse outcomes in invasive diseases (109). The comparison between the 1928 platform and INH revealed a high level of agreement in predicting these specific virulence genes in S. aureus strains, with an overall agreement of 99.4% (1267/1275, 95% CI: 98.7-99.7) (Paper I, Table 7). However, the 1928 platform uniquely identified more isolates harboring the virulence genes etA (n=2), etB (n=2), and tsst1 (n=4) than INH (Paper I, Table 7). This highlights potential differences in the sensitivity or specificity of the two bioinformatics workflows for these virulence genes.

MLST is a widely adopted method for bacterial typing, crucial for investigating outbreaks caused by various pathogens (110-113). We evaluated the 1928 platform and INH in predicting MLST types to assess their consistency. In our study, out of 255 isolates, 236 (92.5%, CI: 88.6-95.2) displayed consistent MLST types between the 1928 platform and the INH (Paper I, Table 8). Both methods agreed in 97.9% (231/236, 95% CI: 95.0-99.2) of cases in predicting the ST of S. aureus, except for 19 isolates where neither platform could determine the ST. Comparisons of different MLST software in NGS analysis have shown differing performance in determining the ST (114). We also noted discrepancies between the two methods: three STs were detected by the 1928 platform but not by the INH, and two STs were detected by the INH but not by the 1928 platform. These discrepancies may be due to differences in the algorithms used by the platforms, variations in database references, or the quality of the sequencing data.

One of the main requirements for the adoption of WGS in infection control and public health is speed, as timely identification of infectious agents can be critical in the diagnosis and treatment of patients. Using the same computational system, the estimated time required to analyze one bacterial isolate with two paired-end FASTQ files using the INH workflow was 5-6 hours. In contrast, the 1928 platform completed the same analysis in just 15-30 minutes.

Another crucial factor is the reliability of results from bioinformatics workflows. Both workflows showed high agreement with clinical diagnoses for S. aureus. However, the INH workflow requires formal bioinformatics support, which may make it more complex to implement in clinical settings than the 1928 platform, where users simply upload FASTQ files. On the other hand, the 1928 platform is limited to its built-in analyses, whereas the INH pipeline offers the advantage of expansion with additional analyses from CGE.
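The MLST comparison between the two workflows amounts to sorting each isolate into one of a few outcome classes. A minimal sketch follows, assuming each workflow's output has been reduced to a mapping from isolate ID to sequence type, with `None` marking an undetermined ST. The function name, the isolate IDs, and the ST values are hypothetical and are not taken from the study.

```python
def compare_st_calls(calls_a, calls_b):
    """Tally concordance between two workflows' sequence-type calls.

    calls_a, calls_b: dicts mapping isolate ID -> ST (int), or None when
    the workflow could not determine an ST. Missing keys count as None.
    """
    summary = {"concordant": 0, "discordant": 0,
               "only_a": 0, "only_b": 0, "neither": 0}
    for isolate in sorted(set(calls_a) | set(calls_b)):
        a = calls_a.get(isolate)
        b = calls_b.get(isolate)
        if a is None and b is None:
            summary["neither"] += 1
        elif b is None:
            summary["only_a"] += 1
        elif a is None:
            summary["only_b"] += 1
        elif a == b:
            summary["concordant"] += 1
        else:
            summary["discordant"] += 1
    return summary


# Hypothetical toy calls for five isolates.
workflow_a = {"iso1": 5, "iso2": 8, "iso3": 30, "iso4": None, "iso5": 45}
workflow_b = {"iso1": 5, "iso2": 8, "iso3": None, "iso4": None, "iso5": 15}
print(compare_st_calls(workflow_a, workflow_b))
```

Keeping "only one workflow determined an ST" separate from "both determined different STs" makes it easier to tell typing failures apart from genuine disagreements.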
4.2 PAPER II - EPIDEMIOLOGY AND ANTIBIOTIC RESISTANCE PATTERN

A major concern associated with S. aureus infections is the emergence of antibiotic resistance. MRSA is a particular concern as it resists many commonly used antibiotics, complicating treatment and leading to increased morbidity, mortality, and healthcare costs (115).

Our aim in this study was to identify the epidemiology and resistance patterns of S. aureus strains isolated in the Skaraborg sepsis study. We explored laboratory records of 262 strains obtained from 212 patients (aged 18 to 97).

Our study identified 1.1% (3/262) of S. aureus strains as MRSA, aligning with the 2021 Swedres-Svarm report from the Public Health Agency of Sweden and the National Veterinary Institute, which recorded an MRSA prevalence of 1.1% in 2013 (116). Although MRSA prevalence in the Skaraborg region was relatively low, multidrug resistance to four or more antibiotics (MDR) was found in 3.4% of the strains, and resistance to one to three antibiotics (ODR) was observed in 76.7% of the strains (Paper II, Figure 1). This suggests that while MRSA was not a major concern locally, other forms of antibiotic-resistant S. aureus could pose a public health risk.

The treatment of S. aureus infections is primarily guided by the bacterial strain's antibiotic resistance profile and the severity of the infection. According to the Swedres-Svarm report, MRSA strains were resistant to clindamycin and erythromycin (116); however, our study found that the MRSA isolates were susceptible to clindamycin, erythromycin, and fusidic acid (Paper II, Table 3). The discrepancy highlights variations in resistance patterns reported across different regions (117-119). Furthermore, our study found that vancomycin, a potent antibiotic commonly used to treat MRSA infections, was effective against all isolates (Paper II, Table 3). This result was consistent with the Swedres-Svarm report, which also indicated no vancomycin resistance among S. aureus isolates (116).

Our findings demonstrated high resistance (>80%) of S. aureus strains to several commonly used antibiotics, including penicillin V (MRSA, MDR, and ODR strains), penicillin G and piperacillin (both affecting MRSA and MDR strains), and isoxazolyl penicillin (MRSA strains). These results are consistent with other studies documenting high penicillin resistance rates among S. aureus strains across different geographical locations (120, 121).

There is no clear consensus on the correlation between gender and S. aureus infection, with varying prevalence rates reported between males and females (122). Some studies have suggested that males may be at a higher risk of community-acquired S. aureus infections (123, 124), which was also the case in our study. Similarly, there were differences between genders in terms of ODR isolates, but no significant differences were observed in terms of MRSA and MDR infections (Paper II, Table 1).

Age has been associated with both the incidence and severity of S. aureus infection, with older adults having a higher rate than younger individuals. For example, Skogberg et al. (2012) reported that S. aureus bloodstream infections increased significantly in those aged 65 and older (125), and a study in Denmark similarly found higher infection rates in individuals aged 80 and above (124). Our study also identified the highest percentage of S. aureus strains in those over 70 years old (Paper II, Figure 2). Interestingly, the average age of infection was higher in females (74 years) than in males (69 years, p=0.03) (Paper II, Table 1), which may be related to menopause and its impact on immune function, though further research is needed to explore this.

S. aureus is commonly found in the nasal passages and on the skin of healthy individuals (126, 127). In our study, nasal carriage of S. aureus strains was also prevalent, with many isolates obtained from upper respiratory tract specimens (Paper II, Table 2), reinforcing the importance of nasal carriage as a reservoir for these bacteria.

In summary, the region had a low prevalence of MRSA, with most strains resistant to one to three antibiotics, underscoring the ongoing challenge of antibiotic resistance in S. aureus infections. The study also highlighted that age and gender are significant factors, with higher prevalence in males and individuals over 70 years old.
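The resistance categories used in this section are defined by the number of antibiotics a strain resists: one to three for ODR and four or more for MDR. A minimal sketch of that grouping is shown below. It assumes each strain has already been reduced to a count of antibiotics with a resistant result; the label "none" for fully susceptible strains and the function names are introduced here for illustration, and MRSA status, which depends on methicillin resistance specifically, is not modelled.

```python
from collections import Counter


def resistance_category(n_resistant):
    """Map a count of resisted antibiotics to the categories in the text.

    ODR: resistant to one to three antibiotics.
    MDR: resistant to four or more antibiotics.
    'none' is a label assumed here for strains with no resistance.
    """
    if n_resistant < 0:
        raise ValueError("count of resisted antibiotics cannot be negative")
    if n_resistant == 0:
        return "none"
    if n_resistant <= 3:
        return "ODR"
    return "MDR"


def category_shares(counts_per_strain):
    """Return the share of strains in each category as fractions of the total."""
    tally = Counter(resistance_category(n) for n in counts_per_strain)
    total = sum(tally.values())
    return {category: n / total for category, n in tally.items()}


# Hypothetical resistance counts for eight strains, not taken from the study.
print(category_shares([0, 1, 1, 2, 3, 3, 4, 6]))
```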
According to the Swedres-Svarm report, MRSA strains were resistant to clindamycin and erythromycin (116). In contrast, the MRSA isolates in our study were susceptible to clindamycin, erythromycin, and fusidic acid (Paper II, Table 3). The discrepancy highlights variations in resistance patterns reported across different regions (117-119). Furthermore, our study found that vancomycin, a potent antibiotic commonly used to treat MRSA infections, was effective against all isolates (Paper II, Table 3). This result was consistent with the Swedres-Svarm report, which also indicated no vancomycin resistance among S. aureus isolates (116).

Our findings demonstrated high resistance (>80%) of S. aureus strains to several commonly used antibiotics, including penicillin V (MRSA, MDR, and ODR strains), penicillin G, and piperacillin (both affecting MRSA and MDR strains), as well as isoxazolyl penicillin (MRSA strains). These results are consistent with other studies documenting high penicillin resistance rates among S. aureus strains across different geographical locations (120, 121).

In summary, the region had a low prevalence of MRSA, with most strains resistant to one to three antibiotics, underscoring the ongoing challenge of antibiotic resistance in S. aureus infections. The study also highlighted that age and gender are significant factors, with higher prevalence in males and individuals over 70 years old.

4.3 PAPER III - IDENTIFYING A POSSIBLE PROTEIN BIOMARKER PANEL

Over recent decades, distinct inflammatory biomarker patterns for gram-positive and gram-negative bacterial sepsis have been proposed (128-131), presenting promising opportunities for faster diagnosis and targeted treatments. These advancements have the potential to significantly reduce the time required to initiate appropriate therapies and to improve patient outcomes, in contrast to conventional blood cultures, which are time-consuming and may delay treatment (132).

This study aimed to assess whether a panel of protein blood biomarkers could effectively differentiate between gram-positive and gram-negative bacteria directly from blood samples using PEA technology. We selected four Olink panels (cardiometabolic, immune response, inflammation, and cardiovascular II) based on the involvement of sepsis in multiple physiological processes (24, 25).

PEA technology is highly sensitive and capable of multiplexed protein detection (133), but it often yields a considerable number of missing values because measurements fall below the limit of detection (LOD). Therefore, it is crucial to handle these missing values appropriately to ensure accurate outcomes in subsequent statistical analyses and machine learning algorithms. In this study, we aimed to determine the tolerable proportion of missing values per protein and to identify the most effective algorithm for our data. To manage missing data, we used the GSimp imputation method, which estimates missing values based on the distribution of detected values above the LOD (134). We created ten imputed datasets with varying levels of missing data and compared the performance of three widely used feature selection algorithms in identifying biological markers for discriminating patients with sepsis: RF (135-139), Lasso (140-142), and RFE (143-146).

Our study found that proteins with less than 40% missing values provided the most accurate predictions, resulting in 285 relevant proteins for analysis. In contrast, both stringent filtration (<5%) and excessively permissive inclusion (<80%) significantly impaired prediction accuracy. This observation aligns with the manufacturer's recommended range of 25-50% missing values (147). These findings offer valuable insights into balancing the retention of essential information with maintaining prediction accuracy.

Furthermore, the assessment of accuracy across the datasets with 80% and 5% missing values revealed comparable performance between the two, even though the 80% dataset relied heavily on imputed data (Paper III, Figure 2). This result highlights the effectiveness of our imputation method, GSimp, in addressing missing values, a common challenge in proteomics research.
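The per-protein missing-value filter discussed above can be illustrated with a minimal sketch. It assumes that measurements below the LOD are stored as None; the function names, the protein names, and the toy values are hypothetical and are not taken from Paper III. Imputation (GSimp in the thesis) would be applied afterwards and is not shown.

```python
from typing import Dict, List, Optional


def missing_fraction(values: List[Optional[float]]) -> float:
    """Fraction of samples with no usable measurement (None = below LOD)."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def filter_proteins(data: Dict[str, List[Optional[float]]],
                    max_missing: float = 0.40) -> Dict[str, List[Optional[float]]]:
    """Keep proteins whose missing fraction is strictly below max_missing."""
    return {name: vals for name, vals in data.items()
            if missing_fraction(vals) < max_missing}


# Hypothetical toy data: three proteins measured in five samples.
toy = {
    "PROT_A": [1.2, 0.9, 1.1, 1.4, 1.0],     # 0% missing
    "PROT_B": [0.5, None, 0.7, 0.6, 0.4],    # 20% missing
    "PROT_C": [None, None, 0.3, None, 0.2],  # 60% missing
}

kept = filter_proteins(toy, max_missing=0.40)
print(sorted(kept))  # prints ['PROT_A', 'PROT_B']
```

Lowering max_missing to 0.05 in this toy example keeps only PROT_A, which mirrors the trade-off described above: a stricter threshold discards more proteins, while a more permissive one retains proteins that depend heavily on imputed values.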
Our results showed that for the dataset with the 40% missing-value threshold, encompassing 285 proteins, Lasso achieved the highest classification accuracy at 71.2%, outperforming RF (63.0%) and RFE-LR (67.0%) (Paper III, Figure 2). Through Lasso regression, we identified 55 proteins as the most predictive biomarkers for distinguishing between the three groups: patients with gram-positive bacterial infections, patients with gram-negative bacterial infections, and healthy controls. Evaluation of the 55 selected proteins showed perfect classification performance, with an AUC of 1.0, for distinguishing bacterial infections from healthy controls. However, the model exhibited only moderate performance for gram-positive infections (AUC: 0.66, 95% CI: 0.549-0.771) and gram-negative infections (AUC: 0.69, 95% CI: 0.586-0.794) (Paper III, Figure 4), indicating that the 55 proteins may not be sufficient for accurate differentiation between these bacterial types. The low AUC values may be due to several factors. A possible overlap in protein profiles between gram-positive and gram-negative infections, due to inherent similarities, makes it challenging to distinguish between these groups. Moreover, variability within patient groups, including different bacterial types, may have affected the results. Finally, the complexity of sepsis and the limited range of proteins analyzed suggest that broader proteomic approaches such as mass spectrometry might improve classification accuracy.

The selection of biomarkers can depend on the specifics of the study and the biological context. Zhang et al. (2017) (148) employed an RFE-LR model to assess 49 blood biomarkers, including leukocytes and cytokines such as IFN-γ, in patients with peritonitis. Their model achieved an AUC of 0.993 for distinguishing gram-negative infections with eight biomarkers and 0.711 for gram-positive infections with five biomarkers. Notably, IFN-γ was among the 55 proteins identified in our study. Zhang's research aimed to differentiate between streptococcal and staphylococcal bacteria and gram-negative strains, indicating the potential for species-specific biomarker profiles. However, the limited number of samples per bacterial species in our study restricted our ability to identify markers unique to each species. Additionally, another study using ELISA and analyzing eight cytokines found elevated levels of IFN-γ, TNF-α, IL-1ra, and IL-10 in gram-negative infections (130); these cytokines were also identified in our research, assisting in the differentiation between gram-positive and gram-negative infections. Variations in the findings may be attributed to differences in methodologies, the number of biomarkers analyzed, and study designs.
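To illustrate why Lasso can act as a feature selector, as in the 55-protein selection discussed above, the following minimal sketch solves an L1-penalized least-squares problem by cyclic coordinate descent. It is an illustration under stated assumptions, not the pipeline used in Paper III: the response here is continuous rather than a class label, the inputs are assumed to be centred, and the feature names, data, and penalty value are hypothetical.

```python
from typing import List


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float],
                             lam: float, n_sweeps: int = 100) -> List[float]:
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the response y are already centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # sum of squares of feature j
            for i in range(n):
                # Fit from all features except j (partial residual).
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return beta


# Hypothetical toy data: the response depends strongly on the first feature
# and only weakly on the second.
features = ["PROT_X", "PROT_Y"]
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]

beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [name for name, b in zip(features, beta) if b != 0.0]
print(selected)  # prints ['PROT_X']
```

The L1 penalty sets the coefficient of the weak feature to exactly zero, so the features with non-zero coefficients form the selected panel. Increasing the penalty shrinks the panel further, which is why the penalty is normally tuned by cross-validation.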
In our study, we aimed to assess the selection of predictive proteins made by the Lasso algorithm in more detail. We evaluated the distribution of these proteins using skewness and kurtosis as measures of symmetry and peakedness, respectively. According to commonly used criteria, a distribution is considered approximately normal when skewness lies between -2 and +2 and kurtosis between -7 and +7 (149, 150). In patients with gram-positive bacterial infections, most proteins fell within these ranges, except for TNF, TNFRSF13B, and ADA, which deviated in skewness (Paper III, Figure 6A) and kurtosis (Paper III, Figure 6B). In patients with gram-negative bacterial infections, most proteins also remained within the normal skewness and kurtosis ranges, except for MFAP5, TNFRSF13B, and CD8A (Paper III, Figures 6A and 6B). We further investigated the impact of the five exception proteins (MFAP5, TNF, TNFRSF13B, CD8A, and ADA) on the performance of the predictive set. The absence of these proteins led to a slight decrease in sensitivity and specificity for both gram-positive (AUC=0.61, 95% CI: 0.536-0.684) and gram-negative bacterial septic patients (AUC=0.66, 95% CI: 0.588-0.732) (Paper III, Figure 6C). However, this decrease was not statistically significant, and univariate analysis showed that the levels of these proteins were not markedly different between the two groups. Violin plots also indicated similar expression patterns, though fluctuations suggested a possible subpopulation effect in infection classification (Paper III, Figure 6D).

Correlation analysis of the 55 predictive proteins using pairwise PCC in patients with gram-positive and gram-negative infections revealed a range of correlations (Paper III, Figure 7). Most proteins exhibited weak to moderate positive correlations (coefficients ranging from 0 to 0.5), suggesting shared expression patterns and potentially similar roles in bacterial response. This was further supported by network analysis using STRING and Cytoscape, which identified a core group of proteins (such as TNF, IL6, IL-1ra, IL10, IFN-γ, CD8A, CCL19, SELL, CCL17, CCL25, IL7R, CSF-1, and LEP) with extensive interactions and a significant PPI enrichment p-value (<1.0e-16), indicating a common expression pattern. Additionally, gene ontology analysis with the PANTHER database revealed that while 22 proteins did not map to specific biological processes, the remaining proteins were enriched in categories such as cellular processes, biological regulation, response to stimuli, signaling, immune system processes, and metabolic processes. Notably, 10 proteins showed significant enrichment in interspecies interaction.

Conversely, some proteins exhibited weak negative correlations, with PCC values between 0 and -0.5. Notably, SAA4 (n=34), CFHR5 (n=32), CNDP1 (n=32), GDF2 (n=29), and MBL2 (n=27) were the five proteins with the highest number of negative correlations. These negative correlations between the predictive proteins for gram-positive and gram-negative bacterial infections suggest that fluctuations in their levels may reflect bacterial type.

Our literature review revealed that serum amyloid A-4 protein (SAA4) is a significant acute phase reactant and a potential biomarker for sepsis prognosis (151). Complement factor H related 5 (CFHR5) plays a role in complement regulation and, alongside other proteins, serves as a biosignature for tuberculosis infections (152). Carnosine dipeptidase 1 (CNDP1) may predict mortality in S. aureus infections (153). Low serum levels of mannose-binding lectin 2 (MBL2) are associated with higher mortality in severe pneumococcal (Streptococcus pneumoniae) infections (154-157). Growth differentiation factor 2 (GDF2), also known as bone morphogenetic protein 9 (BMP9), is crucial for bone and cartilage development and angiogenesis, though it is not currently recognized as a sepsis biomarker (158-160). These findings suggest that these five proteins could be potential biomarkers for bacterial type identification in patients with suspected sepsis, though further research is needed to fully understand their clinical relevance.
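The tally of negative correlations per protein reported above (for example, n=34 for SAA4) can be sketched as follows. This is an illustrative assumption about how such a count may be obtained from pairwise Pearson correlation coefficients; the function names, protein labels, and values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def negative_partner_counts(data: Dict[str, List[float]]) -> Dict[str, int]:
    """For each protein, count the other proteins it correlates negatively with."""
    counts = {name: 0 for name in data}
    for a, b in combinations(data, 2):
        if pearson(data[a], data[b]) < 0:
            counts[a] += 1
            counts[b] += 1
    return counts


# Hypothetical toy data: P2 falls while P1 and P3 rise across four samples.
toy = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [4.0, 3.0, 2.0, 1.0],
    "P3": [1.0, 2.0, 3.0, 5.0],
}
counts = negative_partner_counts(toy)
print(counts)  # prints {'P1': 1, 'P2': 2, 'P3': 1}
```

Ranking proteins by this count gives the kind of "top five" list discussed above. A threshold on the magnitude of the coefficient could be added if very weak correlations should be ignored.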
4.4 PAPER IV - TRANSCRIPTOMIC MARKERS

In our previous study, we suggested that the host response to gram-negative and gram-positive bacterial infections in adults suspected of sepsis might be species-specific. However, the limited sample sizes for individual species restricted our ability to explore this aspect comprehensively. We also proposed that analyzing whole blood biomarkers, rather than focusing solely on specific biomarkers, could provide a more accurate model for understanding sepsis. Additionally, concentrating exclusively on patients with confirmed sepsis, rather than those merely suspected of having the condition, helps to eliminate the confounding effects associated with a mixed population of sepsis and non-sepsis patients.

Building on these observations and suggestions, this study aimed to identify biomarkers that could differentiate between E. coli-induced sepsis, S. aureus-induced sepsis, and healthy individuals by examining the transcriptional response in adults. We utilized gene expression profiles acquired through microarray technology and applied the Lasso regression model for analysis.

Our study found that 25 predictive genes from a pool of 22,277 genes effectively distinguished E. coli-induced sepsis, S. aureus-induced sepsis, and healthy controls. The model achieved a predictive accuracy of 80% with an MSE of 0.20. The evaluation of the performance of the 25 genes using LR and AUC yielded an AUC of 0.96 for distinguishing E. coli-induced sepsis, an AUC of 0.98 for S. aureus-induced sepsis, and a perfect AUC of 1.0 for differentiating healthy controls from the other cases (Paper IV, Figure 1A). These findings align with those of Ahn et al. (58), who reported high AUC values for discriminating sepsis from healthy controls, with AUCs of 0.92 for E. coli-induced sepsis and 0.9898 for S. aureus-induced sepsis.

Further, unsupervised clustering analysis using a PCA plot revealed distinct separation between the healthy control group and the infection-induced groups, though there was partial overlap between the E. coli- and S. aureus-induced sepsis groups, indicating similarities in gene expression or variability within each group. The PCA also suggested the presence of subpopulations within these groups (Paper IV, Figure 1B). Additional assessment of the 25 predictive genes using skewness and kurtosis metrics (with normal ranges defined as ±2 for skewness and ±7 for kurtosis; Hair et al., 2010; Byrne, 2010) alongside the Mann-Whitney U test identified three differentially expressed genes (EIF1AY, APOBEC3B, and GUSBP3) in the E. coli-induced sepsis group (Paper IV, Figure 2A & B). Filtering out these three genes made the subpopulations within each group less visible, with the PCA suggesting that EIF1AY significantly influenced this change (Paper IV, Figure 2D). This gene appears to be expressed in the female group with E. coli-induced sepsis, suggesting a potential association between the gene's expression and gender-specific responses to E. coli-induced sepsis, which warrants further investigation.

In our study, we aimed to perform a two-group classification analysis to distinguish between E. coli-induced sepsis and S. aureus-induced sepsis using the 25 predictive genes. The initial analysis yielded an AUC of 0.75, indicating an acceptable ability to distinguish between these two sepsis groups. However, after excluding the three differentially expressed genes, the AUC improved to 0.89, substantially enhancing the model's discriminatory power. This improved AUC is comparable to the previously reported AUC of 0.8503 achieved with the Bayesian sparse factor classifier (58). Additionally, hierarchical heatmap clustering analysis revealed two distinct expression patterns associated with E. coli-induced and S. aureus-induced sepsis, highlighting a clear separation between the responses to these pathogens. Complementary PCC analysis showed weak inter-gene correlations, with most values falling within the low positive (0 to 0.25) and low negative (-0.25 to 0) ranges, which aligns with the high predictive performance of the gene set and suggests its effectiveness in distinguishing between the two types of sepsis.

Imbalanced data can lead to biased algorithm performance, favoring the majority class and compromising the accuracy and generalizability of the model (161). Our analysis identified an underrepresentation of older males and an uneven distribution across groups, including E. coli-induced sepsis, S. aureus-induced sepsis, and healthy controls. To address this issue, we implemented an upsampling technique and conducted a stability analysis. By applying a multi-stage upsampling strategy, we increased the sample size from 94 to 151 and then to 228 while maintaining 22,277 genes. This approach allowed us to train and test our predictive genes using age-gender balanced samples and equal sample sizes across all groups. As a result, model performance improved significantly, with PCA plots showing enhanced separation among the groups and a perfect AUC of 1 across all groups. These findings demonstrate the model's robustness across balanced datasets. Interestingly, after gender balancing, the initial subpopulation identified by the PCA plot became less pronounced, providing valuable insights into the gender-related effects on susceptibility to or development of sepsis.
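The idea behind upsampling an imbalanced dataset can be shown with a minimal sketch of random oversampling with replacement. It is not the multi-stage strategy used in Paper IV, whose details are not given here: the function name, the seed, and the toy group sizes are hypothetical, and a group label could equally be a combination such as (condition, sex, age band) when balancing by age and gender.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def upsample_to_balance(samples: Sequence, labels: Sequence[Hashable],
                        seed: int = 0) -> Tuple[List, List]:
    """Randomly duplicate samples of smaller groups until all groups are equal in size.

    Every original sample is kept; extra samples are drawn with replacement
    from within the same group.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample, group in zip(samples, labels):
        by_group[group].append(sample)

    target = max(len(members) for members in by_group.values())
    out_samples, out_labels = [], []
    for group, members in by_group.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for sample in members + extra:
            out_samples.append(sample)
            out_labels.append(group)
    return out_samples, out_labels


# Hypothetical toy cohort: ten sample IDs in three unequal groups.
sample_ids = list(range(10))
groups = ["ecoli"] * 5 + ["saureus"] * 3 + ["healthy"] * 2

balanced_ids, balanced_groups = upsample_to_balance(sample_ids, groups)
print({g: balanced_groups.count(g) for g in ("ecoli", "saureus", "healthy")})
# prints {'ecoli': 5, 'saureus': 5, 'healthy': 5}
```

Because duplicated samples carry no new information, resampling is usually applied to the training portion only, so that copies of the same sample do not appear in both the training and the test data.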
The first, GSE13015, comprised whole partial overlap in the PCA plot, while distinct pathways may reflect unique blood samples from adults with sepsis induced by E. coli or S. aureus, as well aspects or variations of each infection-induced sepsis group. Our findings as healthy controls. The model demonstrated robust performance, achieving support the idea that the body’s response to sepsis involves both a general AUC values of 0.79 for E. coli, 0.72 for S. aureus, and 0.87 for healthy immune response and a specific response to each bacterial group, as suggested controls, affirming its reliability in predicting sepsis-related conditions. In the by other researchers (162). In summary, exploring blood transcriptional second dataset, GSE65088, which represents a pre-sepsis stage (bacteremia) markers can aid in distinguishing between patients with E. coli and S. aureus- and was used to validate the model’s predictive ability to differentiate between induced sepsis. Nevertheless, to determine clinical significance, it is essential E. coli and S. aureus infections, the model achieved an AUC of 0.62. Despite to conduct studies with larger cohorts and utilize more sophisticated differences in gene availability (23 in the first dataset and 24 in the second), algorithms. sample sizes (smaller in the validation datasets compared to the training set), and dataset homogeneity, the model’s performance remained impressive. In 2019, Chen et al. (162) aimed to investigate transcriptional biomarkers in E. coli-, and S. aureus-induced sepsis patients in a cross-sectional study. Nevertheless, their analysis specifically targeted the nine genes that consistently appeared in all datasets. Particularly, within this subset, LILRA5 and TNFAIP6 were of significant attention due to their inclusion in the list of 25 genes identified in our study. 
LILRA5 is recognized as a leukocyte immunoglobulin-like receptor and assumes an important role in modulating immune responses by engaging with various ligands. It has been proposed to possess functions in recognizing both viral and bacterial pathogens and in leading the inflammatory response, as supported by previous research (163- 165). Similarly, TNFAIP6, denoted as Tumor Necrosis Factor Alpha-Inducible Protein 6, has exhibited its involvement in inflammation and immunity. Studies have demonstrated that this protein, by affecting the production of pro- inflammatory cytokines and chemokines, exerts regulatory control over immune cell activities, including macrophages (166). To further explore gene associations, we conducted PPI and pathway analysis. The PPI network analysis using STRING identified a sparse network with 22 nodes and 6 edges but highlighted significant interactions among 8 proteins (IFIT1, IFI27, GBP1, FCGR1A, EIF1AY, DDX3Y, HIST1H1T (H1-6), and HIST1H1AD (H2AC7)) supported by a PPI enrichment p-value (< 0.0187) (Paper IV, Figure 5A), suggesting functional or physical connections among these proteins. Furthermore, the MCODE plug-in of Cytoscape clustered densely connected proteins, revealing a cluster containing IFI27, IFIT1, GBP1, and FCGR1A (Paper IV, Figure 5B), with functional associations also observed between EIF1AY and DDX3Y, as well as HIST1H1T and HIST1H1AD. However, other proteins did not exhibit associations, suggesting distinct pathways triggered by genes related to each bacterium. The pathway analysis revealed both shared and unique pathways, with shared pathways of 56 57 Biomarker profiling in sepsis diagnostics Mahnaz Irani Shemirani We further evaluated the 25-gene model on two independent datasets to assess immune system responses and cytokine signaling possibly explaining the its generalizability and reliability. 
5 CONCLUSION

Our research presents comprehensive findings across several key areas of sepsis diagnostics and treatment, each with implications for clinical practice.

Paper I demonstrates the potential of genotypic AST as a reliable tool for identifying antibiotic resistance. Despite this, the discrepancies observed emphasize the need for careful validation and interpretation of bioinformatics results, particularly for critical antibiotics. Our study underscores the necessity of standardizing WGS workflows to ensure consistent and reliable results.

Paper II emphasizes the prevalence of S. aureus and antibiotic-resistant strains among elderly individuals (>70 years), revealing a gender effect. The study highlights swab samples as a major reservoir for S. aureus and nasal carriage as a notable risk factor for infections. The persistence of resistance to certain antibiotics emphasizes the need for ongoing surveillance and adaptation of treatment protocols.

Paper III identifies potential predictive proteins that could differentiate between gram-positive and gram-negative infections, with five candidate biomarkers emerging from the fifty-five proteins studied. This research suggests that linear approaches can be valuable for mining complex biomedical datasets. However, the study’s limitations, including a small sample size and a focus on specific protein panels, indicate the need for further validation and exploration of additional biomarkers.

Paper IV supports the dual-level response model of sepsis, comprising a general immune response and a more specific reaction to different bacterial groups. The study also suggests that gender may influence the etiology of sepsis, potentially affecting the type of bacteria responsible for the infection. The application of machine learning techniques in identifying predictive genes demonstrates their potential for advancing sepsis diagnostics, with a thorough evaluation of genes and metrics needed to refine the models further.

Overall, these studies collectively advance our understanding of sepsis diagnostics and treatment, highlighting the importance of continued research and methodological refinement to improve clinical outcomes.

6 FUTURE PERSPECTIVES

Based on our findings, we plan to pursue several key areas of research. We aim to further validate the reliability of genotypic antibiotic susceptibility testing for identifying resistance in other types of bacteria causing sepsis by analyzing whole genome sequences. Additionally, we propose developing standardized WGS workflows to ensure consistent and reliable results in clinical diagnostics. Achieving this will involve collaborating with researchers and clinicians to establish best practices and guidelines.

Future studies will focus on identifying risk factors and potential transmission sources for antibiotic-resistant S. aureus strains across various settings, including healthcare facilities, community environments, and livestock. We observed that females contract S. aureus strains at a higher average age compared to males, prompting further investigation into potential underlying factors such as hormonal changes or immune responses. Evaluating the effectiveness of current infection prevention and control measures in healthcare and community settings will also be a priority, including practices like hand hygiene and environmental cleaning.

In proteome analysis, we will employ techniques such as mass spectrometry or explore additional protein panels to identify more biomarkers for distinguishing bacterial sepsis. The five candidate biomarkers identified, including a newly discovered putative biomarker, could be further validated through experimental studies to confirm their diagnostic potential and understand their role in differentiating between gram-positive and gram-negative bacterial infections.

We will also expand our research to assess the applicability of our 25-gene model in real-world clinical settings and plan to use the same approach to test for other types of bacteria that induce sepsis. In parallel, we will investigate the dual-level immune response to sepsis, examining both the general immune response and the specific reactions to different bacterial groups. Lastly, we plan to explore the role of gender in sepsis etiology, focusing on how gender interacts with bacterial pathogens responsible for sepsis infections.
ACKNOWLEDGEMENT

I would like to sincerely thank everyone who helped and supported me. A lot of people have contributed to the work in this thesis.

First and foremost, I want to thank myself, both my body and soul, for the patience, resilience, and strength to endure the journey of this thesis. I am deeply grateful for the perseverance through the long hours, countless revisions, and the challenges that arose along the way. I also owe an apology to my body and soul for the strain and injuries caused throughout this process. Your endurance through the exhaustion and stress is something I truly appreciate, and I promise to take better care of you in the future.

I am profoundly grateful to my supervisor, Astrid von Mentzer (University of Gothenburg), and my co-supervisors Anders Ståhlberg (University of Gothenburg), Ka-Wei Tang (University of Gothenburg), and Mikael Ejdebäk (University of Skövde). Special thanks to my co-supervisor Erik Kristiansson (Chalmers University) for his exceptional guidance and unwavering support throughout this journey. Without his dedication and commitment to my academic growth, this PhD would not have progressed as it did. I am deeply grateful for his contributions to my work and for helping me overcome challenges along the way.

I would like to extend my gratitude to Andreas Tilevik (University of Skövde) for his technical support, which was crucial in developing the workflow for the WGS paper. I also wish to thank Diana Tilevik (University of Skövde), Anna-Karin Pernestig (University of Skövde), and Helena Enroth (previously at Unilab and University of Skövde) for their work in designing the WGS and Proteomic projects.

To my colleagues and staff at the Division of Biology and Bioinformatics, University of Skövde; the Institute of Biomedicine, University of Gothenburg; and the Umeå Plant Science Centre, especially John Baxter, Anne Uv, Erik Lekholm, Peter Kindgren and Nicolas Delhomme, thank you for your support.

I am also grateful to 1928 Diagnostics AB, TATAA Biocenter AB, Olink Proteomics AB, and SciLifeLab for providing technologies for the WGS and Proteomics projects.

I would like to express my deepest gratitude to my family, especially my mom, Iran. You have been not only my mother but also my friend and my guiding light. I am grateful for your endless patience with my complaints and bad temper. I’m truly sorry for burdening you with my stress, and for the anxiety I caused you when my health struggles took over. Mom, if it weren’t for your encouragement, I would have given up long ago.

Last but not least, I want to thank my dear friends (Atousa, Azar, Beny, Daniel, Elnaz, Javad, Marjan, Mikael, Roghi, and Shahrum) for your unwavering moral support throughout my journey. We laughed and cried together, sharing both the highs and the lows. Your encouragement and understanding have been truly invaluable.

To everyone else who has supported me along the way, even if your name isn’t mentioned here, please know that I am deeply grateful. If I’ve missed anyone, I sincerely apologize; your presence in my life has meant so much.

Images in the thesis have been created with BioRender.com
REFERENCES

1. Geroulanos S, Douka ET. Historical perspective of the word "sepsis". Intensive Care Med. 2006;32(12):2077.
2. Singh S, Evans TW. Organ dysfunction during sepsis. Intensive Care Med. 2006;32(3):349-60.
3. Gyawali B, Ramakrishna K, Dhamoon AS. Sepsis: The evolution in definition, pathophysiology, and management. SAGE Open Med. 2019;7:2050312119835043.
4. Yipp BG, Winston BW. Sepsis without SIRS is still sepsis. Ann Transl Med. 2015;3(19):294.
5. Gul F, Arslantas MK, Cinel I, Kumar A. Changing Definitions of Sepsis. Turk J Anaesthesiol Reanim. 2017;45(3):129-38.
6. Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315(8):801-10.
7. Ljungström L, Steinum O, Brink M, Gårdlund B, Martner J, Sjölin J. Diagnostik och diagnoskodning av svår sepsis och septisk chock. ICD10 bör kompletteras med tilläggskoder. Läkartidningen. 2011.
8. Andersson M, Brink M, Cronqvist J, Furebring M, Gille-Johnson P, Ljungström L, et al. Sepsis och septisk chock, tidig identifiering och initial handläggning. Svenska Infektionsläkarföreningen; 2018.
9. Rudd KE, Johnson SC, Agesa KM, Shackelford KA, Tsoi D, Kievlan DR, et al. Global, regional, and national sepsis incidence and mortality, 1990–2017: analysis for the Global Burden of Disease Study. The Lancet. 2020;395(10219):200-11.
10. Fleischmann-Struzek C, Mellhammar L, Rose N, Cassini A, Rudd KE, Schlattmann P, et al. Incidence and mortality of hospital- and ICU-treated sepsis: results from an updated and expanded systematic review and meta-analysis. Intensive Care Medicine. 2020;46(8):1552-62.
11. Seree-aphinan C, Vichitkunakorn P, Navakanitworakul R, Khwannimit B. Distinguishing Sepsis From Infection by Neutrophil Dysfunction: A Promising Role of CXCR2 Surface Level. Frontiers in Immunology. 2020;11.
12. Rhee C, Dantes R, Epstein L, Murphy D, Seymour C, Iwashyna T, et al. Incidence and Trends of Sepsis in US Hospitals Using Clinical vs Claims Data, 2009-2014. JAMA. 2017;318(13):1241-9.
13. Lengquist M, Lundberg OHM, Spångfors M, Annborn M, Levin H, Friberg H, et al. Sepsis is underreported in Swedish intensive care units: A retrospective observational multicentre study. Acta Anaesthesiologica Scandinavica. 2020;64(8):1167-76.
14. Janeway CJ, Travers P, Walport M. The front line of host defense. Immunobiology: The Immune System in Health and Disease. 5th ed. New York: Garland Science; 2001.
15. Dolin HH, Papadimos TJ, Chen X, Pan ZK. Characterization of Pathogenic Sepsis Etiologies and Patient Profiles: A Novel Approach to Triage and Treatment. Microbiol Insights. 2019;12:1178636118825081.
16. World Health Organization. Improving the prevention, diagnosis and clinical management of sepsis. 2017 April 13. Report No.: A70/13.
17. Rhee C, Jones TM, Hamad Y, Pande A, Varon J, O’Brien C, et al. Prevalence, Underlying Causes, and Preventability of Sepsis-Associated Mortality in US Acute Care Hospitals. JAMA Network Open. 2019;2(2).
18. Webb SA, Kahler CM. Bench-to-bedside review: Bacterial virulence and subversion of host defences. Crit Care. 2008;12(6):234.
19. Gabarin RS, Li M, Zimmel PA, Marshall JC, Li Y, Zhang H. Intracellular and Extracellular Lipopolysaccharide Signaling in Sepsis: Avenues for Novel Therapeutic Strategies. J Innate Immun. 2021;13(6):323-32.
20. Wang M, Feng J, Zhou D, Wang J. Bacterial lipopolysaccharide-induced endothelial activation and dysfunction: a new predictive and therapeutic paradigm for sepsis. Eur J Med Res. 2023;28(1):339.
21. Dinges M, Orwin P, Schlievert P. Exotoxins of Staphylococcus aureus. Clin Microbiol Rev. 2000;13(1):16-34.
22. Thomas D, Dauwalder O, Brun V, Badiou C, Ferry T, Etienne J, et al. Staphylococcus aureus superantigens elicit redundant and extensive human Vbeta patterns. Infect Immun. 2009;77(5):2043-50.
23. Atanasova KR. Interactions between porcine respiratory coronavirus and bacterial cell wall toxins in the lungs of pigs: Ghent University; 2010.
24. Chang JC. Sepsis and septic shock: endothelial molecular pathogenesis associated with vascular microthrombotic disease. Thromb J. 2019;17:10.
25. Nedeva C, Menassa J, Puthalakath H. Sepsis: Inflammation Is a Necessary Evil. Frontiers in Cell and Developmental Biology. 2019;7(108).
26. Huang M, Cai S, Su J. The Pathogenesis of Sepsis and Potential Therapeutic Targets. Int J Mol Sci. 2019;20(21).
27. Spapen HD, Jacobs R, Honoré PM. Sepsis-induced multi-organ dysfunction syndrome: a mechanistic approach. Journal of Emergency and Critical Care Medicine. 2017;1(10):27.
28. Sagy M, Al-Qaqaa Y, Kim P. Definitions and pathophysiology of sepsis. Curr Probl Pediatr Adolesc Health Care. 2013;43(10):260-3.
29. Chousterman BG, Swirski FK, Weber GF. Cytokine storm and sepsis disease pathogenesis. Seminars in Immunopathology. 2017;39(5):517-28.
30. Ramachandran G. Gram-positive and gram-negative bacterial toxins in sepsis: a brief review. Virulence. 2014;5(1):213-8.
31. Raetz CR, Whitfield C. Lipopolysaccharide endotoxins. Annu Rev Biochem. 2002;71:635-700.
32. Rello J, Valenzuela-Sanchez F, Ruiz-Rodriguez M, Moyano S. Sepsis: A Review of Advances in Management. Adv Ther. 2017;34(11):2393-411.
33. Pop-Began V, Păunescu V, Grigorean V, Pop-Began D, Popescu C. Molecular Mechanism in the Pathogenesis of Sepsis. J Med Life. 2014(2):4.
34. Yang L, Lin Y, Wang J, Song J, Wei B, Zhang X, et al. Comparison of Clinical Characteristics and Outcomes Between Positive and Negative Blood Culture Septic Patients: A Retrospective Cohort Study. Infect Drug Resist. 2021;14:4191-205.
35. Previsdomini M, Gini M, Cerutti B, Dolina M, Perren A. Predictors of positive blood cultures in critically ill patients: a retrospective evaluation. Croat Med J. 2012;53(1):30-9.
36. Luethy PM, Johnson JK. The Use of Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS) for the Identification of Pathogens Causing Sepsis. J Appl Lab Med. 2019;3(4):675-85.
37. Peters RPH, van Agtmael MA, Danner SA, Savelkoul PHM, Vandenbroucke-Grauls CMJE. New developments in the diagnosis of bloodstream infections. The Lancet Infectious Diseases. 2004;4(12):751-60.
38. Opota O, Jaton K, Greub G. Microbial diagnosis of bloodstream infection: towards molecular diagnosis directly from blood. Clin Microbiol Infect. 2015;21(4):323-31.
39. Lee T, Pang S, Stegger M, Sahibzada S, Abraham S, Daley D, et al. A three-year whole genome sequencing perspective of Enterococcus faecium sepsis in Australia. PLoS One. 2020;15(2):e0228781.
40. Taxt AM, Avershina E, Frye SA, Naseer U, Ahmad R. Rapid identification of pathogens, antibiotic resistance genes and plasmids in blood cultures by nanopore sequencing. Sci Rep. 2020;10(1):7622.
41. Shaidullina E, Shelenkov A, Yanushevich Y, Mikhaylova Y, Shagin D, Alexandrova I, et al. Antimicrobial Resistance and Genomic Characterization of OXA-48- and CTX-M-15-Co-Producing Hypervirulent Klebsiella pneumoniae ST23 Recovered from Nosocomial Outbreak. Antibiotics (Basel). 2020;9(12).
42. Rumore J, Tschetter L, Kearney A, Kandar R, McCormick R, Walker M, et al. Evaluation of whole-genome sequencing for outbreak detection of Verotoxigenic Escherichia coli O157:H7 from the Canadian perspective. BMC Genomics. 2018;19(1):870.
43. Califf RM. Biomarker definitions and their applications. Exp Biol Med (Maywood). 2018;243(3):213-21.
44. Pierrakos C, Velissaris D, Bisdorff M, Marshall JC, Vincent JL. Biomarkers of sepsis: time for a reappraisal. Crit Care. 2020;24(1):287.
45. Bloos F, Reinhart K. Rapid diagnosis of sepsis. Virulence. 2014;5(1):154-60.
46. Faix JD. Biomarkers of sepsis. Crit Rev Clin Lab Sci. 2013;50(1):23-36.
47. Irani-Shemirani M. Biomarkers Approach in the Diagnosis and Prognosis of Sepsis. International Journal of Public Health Research. 2022;12:1617-24.
48. Lopez-Castejon G, Brough D. Understanding the mechanism of IL-1beta secretion. Cytokine Growth Factor Rev. 2011;22(4):189-95.
49. Bozza FA, Salluh JI, Japiassu AM, Soares M, Assis EF, Gomes RN, et al. Cytokine profiles as markers of disease severity in sepsis: a multiplex analysis. Crit Care. 2007;11(2):R49.
50. Morrow KN, Coopersmith CM, Ford ML. IL-17, IL-27, and IL-33: A Novel Axis Linked to Immunological Dysfunction During Sepsis. Front Immunol. 2019;10:1982.
51. Cao J, Xu F, Lin S, Song Z, Zhang L, Luo P, et al. IL-27 controls sepsis-induced impairment of lung antibacterial host defence. Thorax. 2014;69(10):926-37.
52. Wang Y, Zhao J, Yao Y, Zhao D, Liu S. Interleukin-27 as a Diagnostic Biomarker for Patients with Sepsis: A Meta-Analysis. Biomed Res Int. 2021;2021:5516940.
53. Wang JF, Yu ML, Yu G, Bian JJ, Deng XM, Wan XJ, et al. Serum miR-146a and miR-223 as potential new biomarkers for sepsis. Biochem Biophys Res Commun. 2010;394(1):184-8.
54. Shen X, Zhang J, Huang Y, Tong J, Zhang L, Zhang Z, et al. Accuracy of circulating microRNAs in diagnosis of sepsis: a systematic review and meta-analysis. J Intensive Care. 2020;8(1):84.
55. Tagini F, Greub G. Bacterial genome sequencing in clinical microbiology: a pathogen-oriented review. Eur J Clin Microbiol Infect Dis. 2017;36(11):2007-20.
56. Zheng L, Lin F, Zhu C, Liu G, Wu X, Wu Z, et al. Machine Learning Algorithms Identify Pathogen-Specific Biomarkers of Clinical and Metabolomic Characteristics in Septic Patients with Bacterial Infections. Biomed Res Int. 2020;2020:6950576.
57. Ljungström L. Community onset sepsis in Sweden: a population based study. Gothenburg, Sweden: Sahlgrenska Academy at University of Gothenburg; 2017.
58. Ahn SH, Tsalik EL, Cyr DD, Zhang Y, van Velkinburgh JC, Langley RJ, et al. Gene expression-based classifiers identify Staphylococcus aureus infection in mice and humans. PLoS One. 2013;8(1):e48979.
59. Ljungstrom L, Enroth H, Claesson BE, Ovemyr I, Karlsson J, Froberg B, et al. Clinical evaluation of commercial nucleic acid amplification tests in patients with suspected sepsis. BMC Infect Dis. 2015;15:199.
60. Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010.