Literature Watch
A Deep Learning-Enabled Workflow to Estimate Real-World Progression-Free Survival in Patients With Metastatic Breast Cancer: Study Using Deidentified Electronic Health Records
JMIR Cancer. 2025 May 15;11:e64697. doi: 10.2196/64697.
ABSTRACT
BACKGROUND: Progression-free survival (PFS) is a crucial endpoint in cancer drug research. Clinician-confirmed cancer progression documented in unstructured text (ie, clinical notes), namely real-world PFS (rwPFS), serves as a reasonable surrogate for ascertaining progression endpoints in real-world data. Response Evaluation Criteria in Solid Tumors (RECIST), which rely on serial imaging evaluations, are traditionally used in clinical trials but are impractical when working with real-world data. Manual abstraction of clinical progression from unstructured notes remains the gold standard; however, it is resource-intensive and time-consuming. Natural language processing (NLP), a subdomain of machine learning, has shown promise in recent years in accelerating the extraction of tumor progression from real-world data.
OBJECTIVES: We aim to configure a pretrained, general-purpose health care NLP framework to transform free-text clinical notes and radiology reports into structured progression events for studying rwPFS in metastatic breast cancer (mBC) cohorts.
METHODS: This study developed and validated a novel semiautomated workflow to estimate rwPFS in patients with mBC using deidentified electronic health record data from the Nference nSights platform. The developed workflow was validated in a cohort of 316 patients with hormone receptor-positive, human epidermal growth factor receptor-2 (HER-2)-negative mBC, who were started on palbociclib and letrozole combination therapy between January 2015 and December 2021. Ground-truth datasets were curated to evaluate the workflow's performance at both the sentence and patient levels. NLP-captured progression or a change in therapy line were considered outcome events, while death, loss to follow-up, and end of the study period were considered censoring events for rwPFS computation. Peak reduction and cumulative decline in Patient Health Questionnaire-8 (PHQ-8) scores were analyzed in the progressed and nonprogressed patient subgroups.
RESULTS: The configured clinical NLP engine achieved a sentence-level progression capture accuracy of 98.2%. At the patient level, initial progression was captured within ±30 days with 88% accuracy. The median rwPFS for the study cohort (N=316) was 20 (95% CI 18-25) months. In a validation subset (n=100), rwPFS determined by manual curation was 25 (95% CI 15-35) months, closely aligning with the computational workflow's 22 (95% CI 15-35) months. A subanalysis revealed rwPFS estimates of 30 (95% CI 24-39) months from radiology reports and 23 (95% CI 19-28) months from clinical notes, highlighting the importance of integrating multiple note sources. External validation also demonstrated high accuracy (92.5% sentence level; 90.2% patient level). Sensitivity analysis revealed stable rwPFS estimates across varying levels of missing source data and event definitions. Peak reduction in PHQ-8 scores during the study period highlighted significant associations between patient-reported outcomes and disease progression.
CONCLUSIONS: This workflow enables rapid and reliable determination of rwPFS in patients with mBC receiving combination therapy. Further validation across more diverse external datasets and other cancer types is needed to ensure broader applicability and generalizability.
PMID:40372953 | DOI:10.2196/64697
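The rwPFS computation described in METHODS (progression or therapy-line change as events; death, loss to follow-up, and end of study as censoring) maps onto a standard Kaplan-Meier estimate. The sketch below is a minimal illustration, not the authors' workflow: it assumes per-patient follow-up in months and a boolean event flag, the function names are hypothetical, and no confidence intervals are computed.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival) steps of the Kaplan-Meier estimator.

    times: follow-up in months from therapy start to event or censoring
    events: True if progression or a therapy-line change was observed,
            False if censored (death, loss to follow-up, end of study),
            following the event definitions given in the abstract
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j, d = i, 0
        # Count events among all patients leaving the risk set at time t.
        while j < len(data) and data[j][0] == t:
            d += data[j][1]
            j += 1
        if d:
            survival *= 1.0 - d / n_at_risk
            steps.append((t, survival))
        n_at_risk -= j - i  # events and censored patients both leave the risk set
        i = j
    return steps


def median_survival(steps: List[Tuple[float, float]]) -> Optional[float]:
    """Earliest time at which estimated survival falls to 0.5 or below."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up
```

In practice a survival library such as lifelines (`KaplanMeierFitter`) would be used so that confidence intervals for the median are reported alongside the point estimate.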
Automated Microbubble Discrimination in Ultrasound Localization Microscopy by Vision Transformer
IEEE Trans Ultrason Ferroelectr Freq Control. 2025 May 15;PP. doi: 10.1109/TUFFC.2025.3570496. Online ahead of print.
ABSTRACT
Ultrasound localization microscopy (ULM) has revolutionized microvascular imaging by breaking the acoustic diffraction limit. However, different ULM workflows depend heavily on distinct prior knowledge, such as the impulse response and empirical selection of parameters (e.g., the number of microbubbles (MBs) per frame M), or on consistency between training and test datasets in deep learning (DL)-based studies. Here, we propose a general ULM pipeline that reduces reliance on such priors. Our approach leverages a DL model that simultaneously distills microbubble signals and reduces speckle in every frame without estimating the impulse response or M. Our method features an efficient channel attention vision transformer (ViT) and a progressive learning strategy, enabling it to learn global information through training on progressively increasing patch sizes. Ample synthetic data were generated using the k-Wave toolbox to simulate various MB patterns, thus overcoming the scarcity of labeled data. The ViT output was further processed by a standard radial symmetry method for sub-pixel localization. Our method performed well on model-unseen public datasets: one in silico dataset with ground truth and four in vivo datasets of mouse tumor, rat brain, rat brain bolus, and rat kidney. Our pipeline outperformed conventional ULM, achieving higher positive predictive values (precision in DL, 0.88-0.41 vs. 0.83-0.16) and improved accuracy (root-mean-square errors: 0.25-0.14 λ vs. 0.31-0.13 λ) across a range of signal-to-noise ratios from 60 dB to 10 dB. Our model detected more vessels in diverse in vivo datasets while achieving resolutions comparable to the standard method. The proposed ViT-based model, seamlessly integrated with state-of-the-art downstream ULM steps, improved overall ULM performance without these priors.
PMID:40372868 | DOI:10.1109/TUFFC.2025.3570496
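The precision (PPV) and root-mean-square error figures above depend on how detected microbubbles are paired with ground-truth positions. The sketch below shows one common convention, greedy nearest-neighbour matching within a distance tolerance; it is an assumption for illustration (the abstract does not state the matching rule or tolerance), and optimal assignment (e.g. the Hungarian algorithm) is a frequent alternative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def match_localizations(detected: List[Point], truth: List[Point],
                        tol: float) -> Tuple[float, float]:
    """Greedily pair detections with ground truth and return (precision, rmse).

    A pair is a true positive only if its distance is <= tol (same units as the
    coordinates, e.g. wavelengths). The tolerance value is a hypothetical choice.
    """
    # All candidate pairs within tolerance, closest first.
    pairs = sorted(
        (math.dist(d, g), i, j)
        for i, d in enumerate(detected)
        for j, g in enumerate(truth)
        if math.dist(d, g) <= tol
    )
    used_d, used_g, errors = set(), set(), []
    for dist, i, j in pairs:
        # Each detection and each ground-truth point may be used once.
        if i not in used_d and j not in used_g:
            used_d.add(i)
            used_g.add(j)
            errors.append(dist)
    tp = len(errors)
    precision = tp / len(detected) if detected else 0.0  # PPV = TP / (TP + FP)
    rmse = math.sqrt(sum(e * e for e in errors) / tp) if tp else float("nan")
    return precision, rmse
```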
Toward Ultralow-Power Neuromorphic Speech Enhancement With Spiking-FullSubNet
IEEE Trans Neural Netw Learn Syst. 2025 May 15;PP. doi: 10.1109/TNNLS.2025.3566021. Online ahead of print.
ABSTRACT
Speech enhancement (SE) is critical for improving speech intelligibility and quality in various audio devices. In recent years, deep learning-based methods have significantly improved SE performance, but they often come with a high computational cost, which is prohibitive for many edge devices, such as headsets and hearing aids. This work proposes an ultralow-power SE system based on the brain-inspired spiking neural network (SNN), called Spiking-FullSubNet. Spiking-FullSubNet follows a full-band and subband fusion approach to effectively capture both global and local spectral information. To improve the efficiency of computationally expensive subband modeling, we introduce a frequency partitioning method inspired by the sensitivity profile of the human peripheral auditory system. Furthermore, we introduce a novel spiking neuron model that can dynamically control input information integration and forgetting, enhancing the multiscale temporal processing capability of the SNN, which is critical for speech denoising. Experiments conducted on the recent Intel Neuromorphic Deep Noise Suppression (N-DNS) Challenge dataset show that Spiking-FullSubNet surpasses state-of-the-art (SOTA) methods by large margins in terms of both speech quality and energy efficiency metrics. Notably, our system won first place in the Intel N-DNS Challenge (algorithmic track), opening up many opportunities for ultralow-power SE at the edge. Our source code and model checkpoints are publicly available at github.com/haoxiangsnr/spiking-fullsubnet.
PMID:40372867 | DOI:10.1109/TNNLS.2025.3566021
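The "integration and forgetting" controlled by the proposed neuron can be illustrated with a standard discrete-time leaky integrate-and-fire (LIF) neuron. This is a minimal sketch of the baseline LIF model, not the paper's novel neuron: here the decay (forgetting) and gain (integration) are fixed constants, whereas the abstract describes controlling them dynamically; the parameter values are hypothetical.

```python
from typing import List


def lif_neuron(inputs: List[float], decay: float = 0.9, gain: float = 1.0,
               threshold: float = 1.0) -> List[int]:
    """Simulate a discrete-time leaky integrate-and-fire neuron.

    decay: fraction of membrane potential kept each step (forgetting)
    gain: scale applied to the incoming current (integration)
    threshold: potential at which the neuron spikes and resets to 0
    These are fixed constants here; in the paper they are input-dependent.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = decay * v + gain * x  # leak the old state, then integrate new input
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes
```

Because spikes are binary, computation reduces to sparse accumulate operations, which is the source of the energy savings on neuromorphic hardware; training such neurons typically relies on surrogate gradients for the non-differentiable threshold.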
2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction
IEEE Trans Med Imaging. 2025 May 15;PP. doi: 10.1109/TMI.2025.3570342. Online ahead of print.
ABSTRACT
Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably exposes patients and healthcare providers to radiation. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often result in PET images with high noise and bias. Thus, it is desirable to develop 3D methods to translate non-attenuation-corrected low-dose PET (NAC-LDPET) into attenuation-corrected standard-dose PET (AC-SDPET). Recently, diffusion models have emerged as a new state-of-the-art deep learning method for image-to-image translation, outperforming traditional CNN-based methods. However, because of their high computational cost and memory burden, they have largely been limited to 2D applications. To address these challenges, we developed a novel 2.5D Multi-view Averaging Diffusion Model (MADM) for 3D image-to-image translation, applied to NAC-LDPET to AC-SDPET translation. Specifically, MADM employs separate diffusion models for the axial, coronal, and sagittal views, whose outputs are averaged at each sampling step to ensure 3D generation quality across multiple views. To accelerate the 3D sampling process, we also propose a strategy that uses a CNN-based 3D generation as a prior for the diffusion model. Our experimental results on human patient studies suggest that MADM can generate high-quality 3D translation images, outperforming previous CNN-based and diffusion-based baseline methods. The code is available at https://github.com/tianqic/MADM.
PMID:40372846 | DOI:10.1109/TMI.2025.3570342
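The core of the multi-view averaging step, running a 2D model slice-by-slice along each of the three anatomical axes and averaging the three 3D results, can be sketched with NumPy. This is an illustration under stated assumptions, not the MADM sampler: the three callables are hypothetical stand-ins for the view-specific diffusion denoisers, and the mapping of axial/coronal/sagittal to array axes 0/1/2 depends on how the volume is stored.

```python
from typing import Callable

import numpy as np

Denoise2D = Callable[[np.ndarray], np.ndarray]


def apply_slicewise(volume: np.ndarray, fn: Denoise2D, axis: int) -> np.ndarray:
    """Apply a 2D function to every slice of `volume` taken along `axis`."""
    moved = np.moveaxis(volume, axis, 0)          # bring slicing axis to front
    out = np.stack([fn(s) for s in moved], axis=0)
    return np.moveaxis(out, 0, axis)              # restore original layout


def multi_view_average_step(volume: np.ndarray, axial: Denoise2D,
                            coronal: Denoise2D, sagittal: Denoise2D) -> np.ndarray:
    """One sampling step: run each view's 2D model, then average the 3D outputs.

    Assumes axial, coronal, and sagittal slices lie along axes 0, 1, and 2.
    """
    views = [
        apply_slicewise(volume, axial, axis=0),
        apply_slicewise(volume, coronal, axis=1),
        apply_slicewise(volume, sagittal, axis=2),
    ]
    return np.mean(views, axis=0)
```

Averaging at every step, rather than once at the end, lets each view's model correct inconsistencies introduced by the others before the next denoising iteration.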
From North Asia to South America: Tracing the longest human migration through genomic sequencing
Science. 2025 May 15;388(6748):eadk5081. doi: 10.1126/science.adk5081. Epub 2025 May 15.
ABSTRACT
Genome sequencing of 1537 individuals from 139 ethnic groups reveals the genetic characteristics of understudied populations in North Asia and South America. Our analysis demonstrates that West Siberian ancestry, represented by the Kets and Nenets, contributed to the genetic ancestry of most Siberian populations. West Beringians, including the Koryaks, Inuit, and Luoravetlans, exhibit genetic adaptation to Arctic climate, including medically relevant variants. In South America, early migrants split into four groups (Amazonians, Andeans, Chaco Amerindians, and Patagonians) ~13,900 years ago. Their longest migration led to population decline, whereas settlement in South America's diverse environments caused instant spatial isolation, reducing genetic and immunogenic diversity. These findings highlight how population history and environmental pressures shaped the genetic architecture of human populations across North Asia and South America.
PMID:40373127 | DOI:10.1126/science.adk5081
A nuclear house divided
Science. 2025 May 15;388(6748):703-704. doi: 10.1126/science.adx8689. Epub 2025 May 15.
ABSTRACT
Certain fungal plant pathogens maintain varying chromosome distributions across multiple nuclei.
PMID:40373126 | DOI:10.1126/science.adx8689
A wave of Thetis cells imparts tolerance to food antigens early in life
Science. 2025 May 15:eadp0535. doi: 10.1126/science.adp0535. Online ahead of print.
ABSTRACT
Within the intestine, peripherally induced regulatory T (pTreg) cells play an essential role in suppressing inflammatory responses to food proteins. However, the identity of the antigen-presenting cells (APCs) that instruct food-specific pTreg cells is poorly understood. Here, we found that a subset of Thetis cells, TC IV, is required for food-specific pTreg cell differentiation. TC IV were almost exclusively present within mesenteric lymph nodes, suggesting that the presence of TC IV underlies the phenomenon of oral tolerance. A wave of TC IV differentiation in the peri-weaning period was associated with a window of opportunity for enhanced pTreg generation in response to food antigens. Our findings indicate that TC IV may represent a therapeutic target for the treatment of food-associated allergic and inflammatory diseases.
PMID:40373113 | DOI:10.1126/science.adp0535
Chromosome-length genome assembly of the critically endangered Mountain bongo (Tragelaphus eurycerus isaaci): A resource for conservation and comparative genomics
G3 (Bethesda). 2025 May 15:jkaf109. doi: 10.1093/g3journal/jkaf109. Online ahead of print.
ABSTRACT
The Mountain bongo (Tragelaphus eurycerus isaaci), a critically endangered tragelaphine antelope native to the montane forests of Kenya, faces significant threats from habitat loss and hunting. Although the Mountain bongo is a flagship species in Kenya, the majority are found in small, isolated populations totaling fewer than 100 animals, making it a species of high conservation concern. In this report, we present a chromosome-length draft genome assembly for the Mountain bongo, generated using a combination of linked-read and proximity ligation (Hi-C) sequencing techniques. The assembly resulted in a 2.96 Gb genome with a contig N50 of 79.5 kb and a scaffold N50 of 192 Mb. Assembly completeness was 95.1% based on 12,234 Benchmarking Universal Single-Copy Orthologs (BUSCO), and annotation identified 29,820 protein-coding genes, of which 27,761 were functionally annotated, with a repeat content of 47.31%. Synteny analysis against the domestic cattle (Bos taurus) genome assembly revealed numerous chromosomal rearrangements between the two species. Our analysis also provided insights into the evolutionary and demographic history of the Mountain bongo, offering valuable information for conservation management. We also assembled and annotated the mitochondrial genome, which differed by <1% from that of the Lowland bongo subspecies, T. e. eurycerus. By integrating genomic data with traditional conservation methods, this reference lays the foundation to evaluate and preserve the genetic diversity of both in-situ and ex-situ populations of the Mountain bongo amid growing environmental pressures.
PMID:40373072 | DOI:10.1093/g3journal/jkaf109
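The contig N50 (79.5 kb) and scaffold N50 (192 Mb) reported above are the same statistic computed over contigs and over scaffolds, respectively: the length L such that sequences of length L or longer contain at least half of all assembled bases. A minimal sketch of the definition (the function name is illustrative):

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input
```

The large gap between the two values reflects Hi-C scaffolding, which orders and joins many short contigs into chromosome-length scaffolds without changing the contigs themselves.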
Diagnostic, prognostic, and immunological roles of FUT8 in lung adenocarcinoma and lung squamous cell carcinoma
PLoS One. 2025 May 15;20(5):e0321756. doi: 10.1371/journal.pone.0321756. eCollection 2025.
ABSTRACT
Lung cancer remains the leading malignancy worldwide in terms of incidence and mortality, posing a significant threat to human health. Because distant metastases are typically already present at the time of initial diagnosis, leading to a poor 5-year survival rate, it is crucial to identify markers for diagnosis, prognosis, and therapeutic efficacy monitoring. Abnormal glycosylation is a hallmark of cancer cells, characterized by the disruption of core fucosylation, which is predominantly driven by the enzyme fucosyltransferase 8 (FUT8). Evidence indicates that FUT8 is a pivotal enzyme in cancer onset and progression, influencing cellular glycosylation pathways. Using bioinformatics approaches, we investigated FUT8 in lung cancer to gain a more systematic and comprehensive understanding of its role in the disease's pathogenesis. In this study, we analyzed the differential expression of FUT8 in lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). We observed upregulation of FUT8 in both LUAD and LUSC, associated with unfavorable prognosis, and higher diagnostic utility in LUAD. GO/KEGG analysis revealed a primary association between LUAD and the spliceosome. Immunologically, FUT8 expression was significantly associated with immune cell infiltration and immune checkpoint activity, with a notable positive correlation with M2 macrophage infiltration. Our analysis indicates that FUT8 may serve as a potential biomarker for lung cancer diagnosis and prognosis and could represent a therapeutic target for LUAD and LUSC immunotherapy.
PMID:40373023 | DOI:10.1371/journal.pone.0321756
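The abstract reports higher diagnostic utility for FUT8 in LUAD but does not name the metric. Assuming, as is common in such bioinformatics analyses, that diagnostic utility is summarized by the area under the ROC curve, the AUC for a single gene's expression can be computed directly from its Mann-Whitney interpretation. The function and inputs are illustrative only.

```python
from typing import List


def roc_auc(tumor: List[float], normal: List[float]) -> float:
    """AUC = P(expression in a random tumor sample > a random normal sample),
    counting ties as one half (Mann-Whitney U formulation)."""
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))
```

An AUC of 0.5 means expression does not separate tumor from normal tissue, and 1.0 means perfect separation; the quadratic pairwise loop is fine for cohort-sized inputs, while a rank-based formula scales better.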
Identification, validation, and characterization of approved and investigational drugs interfering with the SARS-CoV-2 endoribonuclease Nsp15
Protein Sci. 2025 Jun;34(6):e70156. doi: 10.1002/pro.70156.
ABSTRACT
Since the emergence of SARS-CoV-2 at the end of 2019, the virus has caused significant global health and economic disruptions. Despite the rapid development of vaccines and some approved treatments such as remdesivir and Paxlovid, effective antiviral pharmacological treatments for COVID-19 patients remain limited. This study explores Nsp15, a 3'-uridylate-specific RNA endonuclease that plays a critical role in immune evasion by helping the virus escape innate immune sensors. We conducted a comprehensive drug repurposing screen and identified 44 compounds that showed more than 55% inhibition of Nsp15 activity in a real-time fluorescence assay. A validation pipeline was employed to exclude nonspecific interactions, and dose-response assays confirmed 29 compounds with an IC50 below 10 μM. Structural studies, including molecular docking and x-ray crystallography, revealed key interactions of identified inhibitors, such as TAS-103 and YM-155, with the Nsp15 active site and other critical regions. Our findings show that the identified compounds, particularly those retaining potency under different assay conditions, could serve as promising hits for developing Nsp15 inhibitors. Additionally, the study emphasizes the potential of combination therapies targeting multiple viral processes to enhance treatment efficacy and reduce the risk of drug resistance. This research contributes to ongoing efforts to develop effective antiviral therapies for SARS-CoV-2 and possibly other coronaviruses.
PMID:40371758 | DOI:10.1002/pro.70156
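The screen above uses a percent-inhibition cutoff (>55%) and then an IC50 threshold (<10 μM) from dose-response assays. The sketch below shows both quantities under simplifying assumptions: inhibition is normalized against hypothetical no-inhibitor and no-enzyme controls, and IC50 is estimated by linear interpolation in log10 concentration. Published analyses usually fit a four-parameter logistic curve instead; the interpolation here is a simpler stand-in, not the study's method.

```python
import math
from typing import List, Optional


def percent_inhibition(signal: float, uninhibited: float, background: float) -> float:
    """Inhibition (%) of a fluorescence readout relative to a no-inhibitor
    control (uninhibited) and a no-enzyme control (background)."""
    return 100.0 * (1.0 - (signal - background) / (uninhibited - background))


def ic50_interpolated(conc_um: List[float], inhibition: List[float]) -> Optional[float]:
    """Estimate IC50 (same units as conc_um) by interpolating in log10(concentration).

    Assumes concentrations are sorted ascending; returns None if the
    inhibition values never cross 50% between two tested concentrations.
    """
    points = list(zip(conc_um, inhibition))
    for (c1, i1), (c2, i2) in zip(points, points[1:]):
        if i1 < 50.0 <= i2:
            frac = (50.0 - i1) / (i2 - i1)
            log_ic50 = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_ic50
    return None
```

Interpolating on a log scale matters because dose-response curves are roughly sigmoidal in log concentration; interpolating linearly in raw concentration would bias the estimate upward.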
In-silico discovery of type-2 diabetes-causing host key genes that are associated with the complexity of monkeypox and repurposing common drugs
Brief Bioinform. 2025 May 1;26(3):bbaf215. doi: 10.1093/bib/bbaf215.
ABSTRACT
Monkeypox (Mpox) has emerged as a major global health threat after COVID-19, and its treatment becomes complicated in patients with type-2 diabetes (T2D). This may be due to the influence of common host key genes (cHKGs) that drive both diseases. It is therefore necessary to explore these shared cHKGs to reveal common pathogenetic mechanisms and to identify candidate drugs as common treatments without adverse side effects. This study aimed to address these issues. First, three Mpox and six T2D transcriptomics datasets were analyzed, identifying 52 common host differentially expressed genes (cHDEGs) that separate both T2D and Mpox patients from control samples. Next, protein-protein interaction network analysis identified six top-ranked cHDEGs (HSP90AA1, B2M, IGF1R, ALDH1A1, ASS1, and HADHA) as T2D-causing cHKGs associated with the complexity of Mpox. Shared pathogenetic processes between T2D and Mpox were then revealed by cHKG-set enrichment analysis (biological processes, molecular functions, cellular components, and Kyoto Encyclopedia of Genes and Genomes pathways) and by regulatory network analysis with transcription factors and microRNAs. Finally, cHKG-guided molecular docking identified three top-ranked drug molecules (tecovirimat, vindoline, and brincidofovir) as candidate repurposable therapeutic agents for both Mpox and T2D. Absorption, distribution, metabolism, excretion, and toxicity (ADMET) and drug-likeness analyses of these molecules indicated good pharmacokinetic properties. The 100-ns molecular dynamics simulations (root mean square deviation, root mean square fluctuation, and molecular mechanics generalized Born surface area) of the three top-ranked complexes, ASS1-tecovirimat, ALDH1A1-vindoline, and B2M-brincidofovir, indicated good pharmacodynamic properties. Therefore, the results presented in this article may be useful resources for the diagnosis and treatment of Mpox patients who also have T2D.
PMID:40370100 | DOI:10.1093/bib/bbaf215
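The root mean square deviation used above to assess complex stability over the 100-ns trajectories measures how far atoms in each frame sit from a reference structure. A minimal sketch of the formula, assuming each frame has already been superimposed onto the reference (MD tools such as GROMACS or MDAnalysis perform this fitting step before computing RMSD); the function name is illustrative.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def rmsd(frame: List[Coord], reference: List[Coord]) -> float:
    """Root-mean-square deviation between two equally sized coordinate sets.

    Assumes the frame is already superimposed on the reference; without
    that fitting step, overall rotation/translation would inflate the value.
    """
    if len(frame) != len(reference) or not frame:
        raise ValueError("coordinate sets must be non-empty and equally sized")
    sq = sum(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
        for a, b in zip(frame, reference)
    )
    return math.sqrt(sq / len(frame))
```

A trajectory whose RMSD plateaus at a low value is usually read as a stably bound complex; RMSF applies a similar calculation per atom, over time, around the average structure.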
Trends in Cystic Fibrosis-Related Diabetes Epidemiology Between 2003 and 2018 From the U.S. Cystic Fibrosis Foundation Patient Registry
Diabetes Care. 2025 May 15:dc250044. doi: 10.2337/dc25-0044. Online ahead of print.
ABSTRACT
OBJECTIVE: A number of disease-modifying therapies have been introduced for people with cystic fibrosis (CF) over the past two decades. The cumulative effects of this changing landscape on CF-related diabetes (CFRD) are unclear. We examined trends in CFRD epidemiology over time using data from the U.S. Cystic Fibrosis Foundation Patient Registry (CFFPR).
RESEARCH DESIGN AND METHODS: CFFPR data from 2003 to 2018 were queried to determine annual screening, incidence, and prevalence rates of CFRD. Individuals with incident CFRD were compared with individuals without CFRD. Survival analyses were performed to estimate the cumulative hazard of CFRD given predictors of interest over the 15 years of study. Data were also grouped into three time periods (2003-2008, 2009-2013, and 2014-2018) to investigate whether the hazard of developing CFRD varied over time.
RESULTS: CFRD screening rates increased from 2003 to 2018, particularly in 10- to 18-year-olds. Although screening rates increased in adults, overall rates remained low. In 10- to 18-year-olds, the incidence of CFRD was stable over time, while incident cases in adults steadily decreased, approaching incidence rates in adolescents. Despite this, the prevalence of CFRD gradually increased in adults, likely reflecting increased longevity. Age, female sex, Black race, severe mutation class, liver disease, poorer lung function, pancreatic insufficiency, enteric feeds, and low and high BMI were all risk factors associated with CFRD.
CONCLUSIONS: Findings support the need for tailored CFRD screening algorithms and more subspecialists to care for a growing population of adults with CF and CF-associated comorbidities.
PMID:40372381 | DOI:10.2337/dc25-0044
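The survival analyses above estimate the cumulative hazard of CFRD. The Nelson-Aalen estimator is the standard nonparametric form of that quantity, adding d_i / n_i (events over patients still at risk) at each event time. This sketch is illustrative only: the abstract also models predictors of interest, which requires a regression model (for example Cox proportional hazards) that is not shown here, and the time scale (years of follow-up) is an assumption.

```python
from typing import List, Tuple


def nelson_aalen(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, cumulative hazard) steps, with H(t) = sum of d_i / n_i
    over event times t_i <= t.

    times: follow-up (e.g. years) until CFRD diagnosis or censoring
    events: True if CFRD was diagnosed at that time, False if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    hazard = 0.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j, d = i, 0
        # Count diagnoses among everyone leaving the risk set at time t.
        while j < len(data) and data[j][0] == t:
            d += data[j][1]
            j += 1
        if d:
            hazard += d / n_at_risk
            steps.append((t, hazard))
        n_at_risk -= j - i
        i = j
    return steps
```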
Ivacaftor affects the susceptibility of standard-of-care drugs used to treat Mycobacterium abscessus lung disease
Antimicrob Agents Chemother. 2025 May 15:e0003025. doi: 10.1128/aac.00030-25. Online ahead of print.
ABSTRACT
Patients with cystic fibrosis have dysfunctional cystic fibrosis transmembrane conductance regulator (CFTR), which predisposes them to infection with nontuberculous mycobacteria (NTM), including Mycobacterium abscessus (MAB). We found that one of the CFTR modulators, ivacaftor, kills MAB in a concentration-dependent manner, with killing efficacy comparable to that of amikacin and imipenem, drugs in guideline-based regimens. Against clinical isolates of MAB, amikacin at 1/4× MIC combined with ivacaftor reduced MAB by 2.67 log10 CFU/mL.
PMID:40372084 | DOI:10.1128/aac.00030-25
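A kill of 2.67 log10 CFU/mL is a difference of logarithms, so it corresponds to a multiplicative drop in viable bacteria of 10^2.67, roughly 468-fold (about 99.8% of CFUs killed). A small sketch of the arithmetic; the function names are illustrative.

```python
import math


def log10_kill(cfu_start: float, cfu_end: float) -> float:
    """Log10 reduction in viable bacteria (CFU/mL) between two time points."""
    return math.log10(cfu_start) - math.log10(cfu_end)


def fraction_killed(log_reduction: float) -> float:
    """Fraction of the starting CFUs eliminated for a given log10 reduction."""
    return 1.0 - 10 ** (-log_reduction)
```

Working in log10 units is the convention in time-kill studies because bacterial counts span many orders of magnitude; a reduction of 3 log10 CFU/mL (99.9%) is a common threshold for calling a regimen bactericidal.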
Nasal rinsing with probiotics in rhinosinusitis - analysis of symptoms and safety assessment
Otolaryngol Pol. 2025 Mar 19;79(3):1-8. doi: 10.5604/01.3001.0055.0503.
ABSTRACT
Introduction: In the pathophysiology of chronic upper respiratory tract inflammation, an important role is attributed to disturbances in the diversity and function of the patient's microbiome, with a decreased abundance of commensal bacteria and an increase in pathogenic bacteria. In recent years, there has been growing scientific interest in the role of probiotics, administered both locally and orally, in the management of various diseases, particularly inflammatory conditions such as chronic rhinosinusitis.
Aim: To assess nasal rinsing with probiotics in patients with rhinitis and rhinosinusitis (primary and secondary).
Material and methods: A total of 51 patients (31 women and 20 men) were included in the study: 24 patients with granulomatosis with polyangiitis (GPA) receiving immunosuppressive therapy (12 women and 12 men) and 27 patients (19 women and 8 men) with rhinitis (chronic rhinosinusitis with nasal polyps, chronic rhinosinusitis without nasal polyps, atrophic rhinitis with nasal septum perforation, and allergic rhinitis). Exclusion criteria were cystic fibrosis, primary ciliary dyskinesia, pregnancy, severe lung, heart, or kidney disease, use of oral probiotics, use of intranasal probiotics in the last 6 months, sinus surgery in the last 6 months, lack of consent to participate in the study, and antibiotic therapy in the last 2 months. Patients were scheduled to undergo nasal rinsing with a probiotic solution, and the following parameters were assessed before and after the procedure: SNOT-22 scores and the severity of nasal lesions on the Lund-Kennedy scale. In the rhinitis group, the ENS-6 questionnaire was also administered, and the following symptoms were rated on a visual analogue scale (VAS): nasal discharge, nasal obstruction, facial pain, impaired sense of smell, nasal irritation, nasal itching, and severity of crusting.
Results: Nasal rinsing with a probiotic solution was well tolerated and caused no adverse effects. SNOT-22 scores decreased in both groups (p = 0.002 in GPA; not significant in rhinitis/rhinosinusitis). On the Lund-Kennedy scale, the reduction in the intensity of changes was statistically significant in both groups. In addition, patients with primary rhinitis or rhinosinusitis experienced a reduction in nasal mucosa irritation and intranasal crusting (p < 0.05).
Conclusions: Probiotic nasal rinsing appears to have a beneficial effect on the condition of the nasal mucosa in patients with both primary and secondary (GPA-related) rhinosinusitis and is generally well tolerated.
PMID:40371957 | DOI:10.5604/01.3001.0055.0503
Abridging the 22-Item Sino-Nasal Outcome Test (SNOT-22) in People With Cystic Fibrosis: Limiting Survey Burden
Int Forum Allergy Rhinol. 2025 May 15:e23591. doi: 10.1002/alr.23591. Online ahead of print.
NO ABSTRACT
PMID:40371719 | DOI:10.1002/alr.23591
Evidence of secondary Notch signaling within the rat small intestine
Development. 2025 May 15:dev.204277. doi: 10.1242/dev.204277. Online ahead of print.
ABSTRACT
The small intestine is well known for its nutrient-absorbing enterocytes; yet equally critical for homeostasis is a diverse set of secretory cells, all presumed to originate from the same intestinal stem cell. Despite their major roles in intestinal function and health, understanding how the full spectrum of secretory cell types arises remains a longstanding challenge, largely due to their comparative rarity. Here, we investigate the specification of a rare population of small intestinal epithelial cells found in rats and humans but not mice: CFTR High Expressers (CHEs). Using pseudotime trajectory analysis of single-cell RNA-seq data from rat jejunum, we provide evidence that CHEs are specified along the secretory lineage and appear to employ a second wave of Notch-based signaling to distinguish themselves from other secretory cells. We validate the transcription factors directing these cells from crypt progenitors and demonstrate that Notch signaling is necessary to induce CHE fate in vivo and in vitro. Our findings suggest that Notch reactivation along the secretory lineage specifies CHEs, which may help regulate luminal pH and have direct relevance in cystic fibrosis pathophysiology.
PMID:40371707 | DOI:10.1242/dev.204277
HVSeeker: a deep-learning-based method for identification of host and viral DNA sequences
Gigascience. 2025 Jan 6;14:giaf037. doi: 10.1093/gigascience/giaf037.
ABSTRACT
BACKGROUND: Bacteriophages are among the most abundant organisms on Earth, significantly impacting ecosystems and human society. Identifying viral sequences, especially novel ones, in mixed metagenomes is a critical first step in analyzing the viral components of host samples and underpins many downstream tasks. However, this identification is challenging because viruses evolve rapidly. The process typically involves two steps: distinguishing viral sequences from host sequences and determining whether they come from novel viral genomes. Traditional metagenomic techniques that rely on sequence similarity to known entities often fall short, especially for short or novel genomes. Meanwhile, deep learning has demonstrated its efficacy across various domains, including bioinformatics.
RESULTS: We have developed HVSeeker, a deep learning-based host/virus seeker method, to distinguish between bacterial and phage sequences. HVSeeker consists of two separate models: one analyzing DNA sequences and the other focusing on proteins. In addition to the robust architecture of HVSeeker, three distinct preprocessing methods were introduced to enhance the learning process: padding, contig assembly, and sliding window. The method has shown promising results on sequences of various lengths, ranging from 200 to 1,500 base pairs. Tested on both the NCBI and IMGVR databases, HVSeeker outperformed several methods from the literature, such as Seeker, Rnn-VirSeeker, DeepVirFinder, and PPR-Meta. Moreover, HVSeeker showed better performance than other methods on benchmark datasets, establishing its effectiveness in identifying unknown phage genomes.
CONCLUSIONS: These results demonstrate the strength of HVSeeker's design, which encompasses both the preprocessing methods and the model architecture. The advancements provided by HVSeeker are significant for identifying viral genomes and developing new therapeutic approaches, such as phage therapy. Therefore, HVSeeker serves as an essential tool in prokaryotic and phage taxonomy, offering a crucial first step toward analyzing the host-viral components of samples by identifying host and viral sequences in mixed metagenomes.
PMID:40372723 | DOI:10.1093/gigascience/giaf037
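Two of the preprocessing methods named above, padding and sliding window, turn variable-length DNA reads into fixed-length inputs for a neural network. The sketch below illustrates the general idea together with one-hot encoding; it is not HVSeeker's code, the window and step sizes are hypothetical, padding with "N" is an assumed convention, and contig assembly is not shown.

```python
from typing import List

BASES = "ACGT"


def pad(seq: str, length: int, fill: str = "N") -> str:
    """Right-pad (or truncate) a sequence to exactly `length` characters."""
    return seq[:length].ljust(length, fill)


def sliding_windows(seq: str, window: int, step: int) -> List[str]:
    """Cut a sequence into fixed-size windows; a short tail is padded to `window`."""
    if len(seq) <= window:
        return [pad(seq, window)]
    windows = [seq[i:i + window] for i in range(0, len(seq) - window + 1, step)]
    # Cover any bases left over after the last full window.
    last_start = (len(seq) - window) // step * step
    if last_start + window < len(seq):
        windows.append(pad(seq[last_start + step:], window))
    return windows


def one_hot(seq: str) -> List[List[int]]:
    """Encode A/C/G/T as one-hot rows; any other symbol (e.g. padding 'N') is all zeros."""
    return [[1 if b == base else 0 for base in BASES] for b in seq.upper()]
```

Padding keeps each read intact as a single input, while sliding windows yield several overlapping inputs per read whose predictions can then be aggregated; which is preferable depends on read length relative to the model's input size.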
Application of deep learning with fractal images to sparse-view CT
Int J Comput Assist Radiol Surg. 2025 May 15. doi: 10.1007/s11548-025-03378-1. Online ahead of print.
ABSTRACT
PURPOSE: Deep learning has been widely used in research on sparse-view computed tomography (CT) image reconstruction. While sufficient training data can lead to high accuracy, collecting medical images is often challenging due to legal or ethical concerns, making it necessary to develop methods that perform well with limited data. To address this issue, we explored the use of nonmedical images for pre-training. Specifically, we investigated whether pre-training with fractal images could improve the quality of sparse-view CT images while reducing the number of medical images required.
METHODS: Fractal images generated by an iterated function system (IFS) were used as the nonmedical images, and medical images were obtained from the CHAOS dataset. Sinograms were generated using 36 projections (sparse view), and images were reconstructed by filtered back-projection (FBP). FBPConvNet and WNet (first module: learning fractal images, second module: testing medical images, and third module: learning output) were used as networks, and the effectiveness of pre-training was investigated for each. The quality of the reconstructed images was evaluated using two indices: structural similarity (SSIM) and peak signal-to-noise ratio (PSNR).
RESULTS: The network parameters pre-trained with fractal images showed reduced artifacts compared to the network trained exclusively with medical images, resulting in improved SSIM. WNet outperformed FBPConvNet in terms of PSNR. Pre-training WNet with fractal images produced the best image quality, and the number of medical images required for main-training was reduced from 5000 to 1000 (80% reduction).
CONCLUSION: Using fractal images for network training can reduce the number of medical images required for artifact reduction in sparse-view CT. Therefore, fractal images can improve accuracy even with a limited amount of training data in deep learning.
PMID:40372595 | DOI:10.1007/s11548-025-03378-1
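Fractal images from an iterated function system can be rendered with the "chaos game": repeatedly apply a randomly chosen affine map and record the visited points. The sketch below assumes NumPy and uses the standard Sierpinski-triangle maps as a hypothetical example; the abstract does not give the study's IFS parameters, and pre-training datasets of this kind typically sample many random IFS parameter sets instead of one fixed system.

```python
import numpy as np


def render_ifs(maps: np.ndarray, probs: np.ndarray, size: int = 64,
               n_points: int = 20000, seed: int = 0) -> np.ndarray:
    """Render an iterated function system as a binary image via the chaos game.

    maps: array of shape (k, 6); row (a, b, c, d, e, f) sends
          (x, y) to (a*x + b*y + e, c*x + d*y + f)
    probs: selection probability for each map (must sum to 1)
    """
    rng = np.random.default_rng(seed)
    choices = rng.choice(len(maps), size=n_points, p=probs)
    pts = np.empty((n_points, 2))
    x, y = 0.0, 0.0
    for n, k in enumerate(choices):
        a, b, c, d, e, f = maps[k]
        x, y = a * x + b * y + e, c * x + d * y + f
        pts[n] = (x, y)
    pts = pts[100:]  # discard burn-in iterations before reaching the attractor
    # Normalise coordinates into pixel indices.
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    idx = ((pts - mins) / (maxs - mins + 1e-12) * (size - 1)).astype(int)
    image = np.zeros((size, size), dtype=np.uint8)
    image[idx[:, 1], idx[:, 0]] = 1  # rows = y, columns = x
    return image


# Hypothetical example: the three contraction maps of a Sierpinski triangle.
SIERPINSKI = np.array([
    [0.5, 0.0, 0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5, 0.25, 0.5],
])
```

Because such images are generated procedurally, an effectively unlimited pre-training set can be produced without any legal or ethical constraints, which is the motivation stated in PURPOSE.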
DeepAllo: Allosteric Site Prediction using Protein Language Model (pLM) with Multitask Learning
Bioinformatics. 2025 May 15:btaf294. doi: 10.1093/bioinformatics/btaf294. Online ahead of print.
ABSTRACT
MOTIVATION: Allostery, the process by which binding at one site perturbs a distant site, has become a key focus in drug development because of its substantial impact on protein function. Identifying allosteric pockets (sites) is challenging, and several techniques have been developed to predict them, including machine learning (ML) methods that utilize both static and pocket features.
RESULTS: Our work, DeepAllo, is the first study to combine a fine-tuned protein language model (pLM) with FPocket features, and it improves allosteric site prediction performance over previous studies. The pLM was fine-tuned on the Allosteric Dataset (ASD) in a multitask learning (MTL) setting and then used as a feature extractor to train XGBoost and AutoML models. The best model predicts allosteric pockets with an 89.66% F1 score and places 90.5% of allosteric pockets in the top 3 positions, outperforming previous results. A case study on proteins with known allosteric pockets supports the approach. Moreover, we sought to explain the pLM by visualizing its attention across allosteric and non-allosteric residues.
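The two reported metrics can be sketched as follows. The abstract does not define exactly how the top-3 figure is counted, so this sketch assumes it is the fraction of allosteric pockets that rank among the three highest-scoring candidate pockets of their own protein. The input format (a mapping from a protein ID to `(score, is_allosteric)` pairs) and the function names are hypothetical. This is not DeepAllo's code; the linked repository holds the authors' implementation.

```python
def f1_score(y_true, y_pred):
    """F1 = 2PR / (P + R) for binary labels (1 = allosteric pocket)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_k_rate(proteins, k=3):
    """Fraction of allosteric pockets ranked within the top k of their protein.

    `proteins` maps a protein ID to a list of (score, is_allosteric) pairs,
    one per candidate pocket (hypothetical input format).
    """
    hits = total = 0
    for pockets in proteins.values():
        # Rank this protein's candidate pockets by predicted score, highest first.
        ranked = sorted(pockets, key=lambda p: p[0], reverse=True)
        for rank, (_, is_allo) in enumerate(ranked, start=1):
            if is_allo:
                total += 1
                if rank <= k:
                    hits += 1
    return hits / total if total else 0.0


print(f1_score([1, 0, 1, 0, 1], [1, 0, 0, 1, 1]))
proteins = {
    "P1": [(0.9, True), (0.4, False), (0.2, False)],
    "P2": [(0.8, False), (0.7, False), (0.6, False), (0.5, True)],
}
print(top_k_rate(proteins, k=3))
```

In the toy data, P1's allosteric pocket ranks first (a hit) and P2's ranks fourth (a miss), giving a top-3 rate of 0.5.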
AVAILABILITY: The source code is available on GitHub (https://github.com/MoaazK/deepallo) and archived on Zenodo (DOI: 10.5281/zenodo.15255379). The trained model is hosted on Hugging Face (DOI: 10.57967/hf/5198). The dataset used for training and evaluation is archived on Zenodo (DOI: 10.5281/zenodo.15255437).
SUPPLEMENTARY INFORMATION: Supplementary data, including the full list of proteins used in the study with their PDB IDs, a t-SNE analysis of pocket features, a confusion matrix breakdown, and an interpretation of borderline classification cases, are available as supplementary material accompanying this article.
PMID:40372465 | DOI:10.1093/bioinformatics/btaf294
Video-estimated peak jump power using deep learning is associated with sarcopenia and low physical performance in adults
Osteoporos Int. 2025 May 15. doi: 10.1007/s00198-025-07515-z. Online ahead of print.
ABSTRACT
Video-estimated peak jump power (vJP) using deep learning showed strong agreement with ground truth jump power (gJP). vJP was associated with sarcopenia, age, and muscle parameters in adults, providing proof of concept that markerless monitoring of peak jump power could be feasible in daily life.
OBJECTIVES: Low peak countermovement jump power measured by a ground force plate (GFP) is associated with sarcopenia, impaired physical function, and elevated fracture risk in older adults. However, GFP is currently available mainly in research settings, limiting its clinical applicability. Video-based estimation of peak jump power could extend jump power measurement beyond research settings into clinical practice.
METHODS: Data were collected prospectively at the osteoporosis clinic of Severance Hospital, Korea, between March and August 2022. Participants performed three jump attempts on the GFP (ground truth, gJP) while being video-recorded, along with measurements of handgrip strength (HGS), the 5-time chair rise test (CRT), and appendicular lean mass (ALM). Open-source deep learning pose estimation and machine learning algorithms were trained on an 80% training set to estimate peak jump power from video (vJP). Sarcopenia was defined according to the Korean Working Group for Sarcopenia 2023 definition.
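The abstract does not say whether the 80/20 split was made by patient or by jump attempt. Because each participant contributed several attempts, one common way to keep a person's jumps from appearing in both sets is to split at the patient level. The sketch below assumes records are dicts with a hypothetical `"patient_id"` key. It is an illustration of patient-level splitting, not the study's procedure.

```python
import random


def split_by_patient(records, test_frac=0.2, seed=0):
    """Split jump records so each patient appears in only one set.

    `records` is a list of dicts with a hypothetical "patient_id" key.
    """
    patients = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(patients)
    n_test = round(len(patients) * test_frac)
    test_ids = set(patients[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test


# 10 hypothetical patients with 3 attempts each.
records = [{"patient_id": p, "attempt": a} for p in range(10) for a in range(3)]
train, test = split_by_patient(records)
print(len(train), len(test))
```

Here the 80/20 ratio applies to patients, so the jump-level proportion can drift slightly when patients contribute different numbers of valid attempts.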
RESULTS: A total of 658 jump motion recordings from 220 patients (mean age 62 years; 77% women; sarcopenia 19%) were analyzed. In the test set (20% hold-out), the average difference between predicted and actual jump power was 0.27 W/kg (95% limits of agreement -5.01 to +5.54 W/kg; correlation coefficient 0.93). vJP detected gJP-defined low jump power with 81.8% sensitivity and 91.3% specificity. Like gJP, vJP declined steeply with age and showed modest to strong correlations with HGS and CRT. Eight landmarks (both shoulders, hips, knees, and ears) contributed most to vJP estimation. Similar to gJP, vJP was associated with the presence of sarcopenia (unadjusted and adjusted, -3.95 and -2.30 W/kg), HGS (-3.69 and -1.96 W/kg per 1 SD decrement), and CRT performance (-2.79 and -1.87 W/kg per 1 SD decrement in log-CRT).
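The agreement and classification figures above follow standard definitions. The 95% (Bland-Altman) limits of agreement are the mean paired difference (bias) ± 1.96 standard deviations of the differences. Sensitivity and specificity come from a 2×2 table against the gJP-defined reference. A minimal sketch with made-up inputs: the power cutoff that defines "low jump power" is not given in the abstract, so the sketch takes already-binarized calls.

```python
import statistics


def bland_altman(pred, truth):
    """Bias (mean difference) and 95% limits of agreement, bias +/- 1.96 SD."""
    diffs = [p - t for p, t in zip(pred, truth)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def sens_spec(pred_low, true_low):
    """Sensitivity and specificity of a binary 'low jump power' call."""
    pairs = list(zip(pred_low, true_low))
    tp = sum(1 for p, t in pairs if p and t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical vJP vs gJP values in W/kg.
bias, lower, upper = bland_altman([30.0, 25.0, 20.0, 35.0], [29.0, 26.0, 19.5, 34.0])
print(bias, round(lower, 2), round(upper, 2))
print(sens_spec([True, True, False, False, True], [True, False, False, True, True]))
```

As a consistency check on the reported values, the midpoint of -5.01 and +5.54 W/kg is about 0.27 W/kg, matching the stated average difference.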
CONCLUSION: vJP was associated with sarcopenia, age, and muscle parameters in adults, with good agreement with ground truth jump power.
PMID:40372459 | DOI:10.1007/s00198-025-07515-z
