KU Leuven
Computational Systems Biology; Laboratory of Gene Technology
Hannelore Longin is a PhD student in the Computational Systems Biology group and the Laboratory of Gene Technology at KU Leuven. In her research, she leverages the latest advances in structural bioinformatics to improve our understanding of post-translational modifications in the virus-bacteria interplay. As part of this investigation, she has set up a large-scale effort to reannotate proteins of unknown function from Pseudomonas-infecting phages, which she’ll be presenting at the Phage Protein Meeting.
Affiliations: (1) Computational Systems Biology, KU Leuven (Belgium); (2) Laboratory of Gene Technology, KU Leuven (Belgium); (3) Adelaide Medical School, the University of Adelaide (Australia); (4) Flinders Accelerator for Microbiome Exploration, Flinders University (Australia)
Phage proteins are an inexhaustible source of inspiration for biotechnological and clinical applications. However, many more could be hiding in plain sight. Indeed, up to 70% of predicted phage proteins are annotated as proteins of unknown function. Despite significant interest in unravelling these proteins’ functions, phage proteins are absent from recent large-scale structure-based efforts (such as the AlphaFold database). Here, we investigate the efficacy of structure-based protein annotation for Pseudomonas-infecting phages. Briefly, from 887 Pseudomonas-infecting phages, we collected every protein that is annotated as ‘hypothetical/phage protein’ in NCBI and is at least 100 amino acids long. These 38,025 proteins (31% of all proteins) were then clustered into 10,453 groups of homologs. Protein structures were predicted with ColabFold, and structural similarity to the PDB and the AlphaFold database was assessed with FoldSeek. Of all proteins, 59% displayed significant similarity to at least one structure in these databases. We benchmarked various post-processing strategies for extracting function from these FoldSeek hits (different information resources, hit selection methods, and structure-based clustering of hits). The resulting annotations were then compared with those of the state-of-the-art phage annotation tools Pharokka and Phold. On average, up to 42% of the phage proteins of unknown function could be annotated using structure-based methods, depending on the post-processing strategies applied. While caution is warranted when transferring protein annotations based on similarity, these methods can significantly speed up research into new antimicrobials and biotechnological applications inspired by nature’s finest bioengineers: phages.
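The selection step described in the abstract (keep proteins annotated as hypothetical or phage protein and at least 100 amino acids long) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Protein` record, the exact label strings in `UNKNOWN_LABELS`, and the toy sequences are assumptions made for illustration only; only the 100-residue threshold comes from the abstract.

```python
from dataclasses import dataclass

# Assumed label strings; the exact NCBI product names matched in the study are not given.
UNKNOWN_LABELS = {"hypothetical protein", "phage protein"}
MIN_LENGTH = 100  # minimum length in amino acids, as stated in the abstract


@dataclass
class Protein:
    protein_id: str
    product: str   # product annotation as found in the NCBI record
    sequence: str  # amino-acid sequence


def is_unknown_function(protein: Protein) -> bool:
    """True if the product annotation matches one of the assumed 'unknown' labels."""
    return protein.product.strip().lower() in UNKNOWN_LABELS


def select_candidates(proteins):
    """Keep proteins of unknown function that meet the length threshold."""
    return [
        p for p in proteins
        if is_unknown_function(p) and len(p.sequence) >= MIN_LENGTH
    ]


# Toy records (hypothetical data, not taken from the study).
examples = [
    Protein("p1", "hypothetical protein", "M" * 150),
    Protein("p2", "Hypothetical protein", "M" * 60),    # too short
    Protein("p3", "major capsid protein", "M" * 300),   # already annotated
    Protein("p4", "phage protein", "M" * 100),          # exactly at the threshold
]
selected = select_candidates(examples)
print([p.protein_id for p in selected])
```

In a real workflow the selected sequences would then go on to homolog clustering, structure prediction, and structural search, which are outside the scope of this sketch.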