Deep learning
EchoSegDiff: a diffusion-based model for left ventricular segmentation in echocardiography
Med Biol Eng Comput. 2024 Dec 14. doi: 10.1007/s11517-024-03255-0. Online ahead of print.
ABSTRACT
Echocardiography is a primary tool for cardiac diagnosis. Accurate delineation of the left ventricle is a prerequisite for echocardiography-based clinical decision-making. In this work, we propose an echocardiographic left ventricular segmentation method based on the diffusion probabilistic model, named EchoSegDiff. EchoSegDiff adopts an encoder-decoder structure in the reverse diffusion process. A diffusion encoder residual block (DEResblock) based on the atrous pyramid squeeze attention (APSA) block serves as the main module of the encoder, so that EchoSegDiff can capture multiscale features effectively. A novel feature fusion module (FFM) is further proposed, which adaptively fuses the features from the encoder and decoder to reduce the semantic gap between them. The proposed EchoSegDiff is validated on two publicly available echocardiography datasets, where it outperforms other state-of-the-art networks in left ventricular segmentation. The segmentation accuracy on the two datasets reached 93.69% and 89.95%, respectively, demonstrating the strong potential of EchoSegDiff for left ventricular segmentation in echocardiography.
PMID:39672990 | DOI:10.1007/s11517-024-03255-0
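The abstract's fusion idea can be made concrete with a small sketch. Below is a minimal PyTorch module that adaptively blends encoder and decoder features with a learned channel-wise gate; the gating design and channel counts are illustrative assumptions, not the paper's exact FFM.

```python
import torch
import torch.nn as nn

class FeatureFusionModule(nn.Module):
    """Adaptive encoder-decoder fusion: a hypothetical stand-in for the paper's FFM."""
    def __init__(self, channels):
        super().__init__()
        # Derive per-channel blend weights from the concatenated feature maps.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, enc_feat, dec_feat):
        g = self.gate(torch.cat([enc_feat, dec_feat], dim=1))  # weights in [0, 1]
        # The weighted blend lets the network decide, channel by channel,
        # which stream to trust, narrowing the encoder-decoder semantic gap.
        return g * enc_feat + (1 - g) * dec_feat

fused = FeatureFusionModule(64)(torch.rand(1, 64, 32, 32), torch.rand(1, 64, 32, 32))
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```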
Advance drought prediction through rainfall forecasting with hybrid deep learning model
Sci Rep. 2024 Dec 13;14(1):30459. doi: 10.1038/s41598-024-80099-6.
ABSTRACT
Drought is a natural disaster that can affect large areas over long periods. Damage caused by drought can be reduced only through accurate prediction. In this context, we propose a hybrid stacked model for rainfall prediction, which is crucial for effective drought forecasting and management. In the first layer of the stacked model, a bidirectional LSTM extracts features; in the second layer, an LSTM makes the predictions. The model captures complex temporal dependencies by processing multivariate time series data in both forward and backward directions using the bidirectional LSTM layers. Trained with the mean squared error loss and the Adam optimizer, the model demonstrates improved forecasting accuracy, offering significant potential for proactive drought management.
PMID:39672936 | DOI:10.1038/s41598-024-80099-6
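The two-layer stack described above maps directly onto a few lines of Keras. The sketch below assumes illustrative shapes (30 time steps, 5 meteorological variables) and layer widths; only the BiLSTM-then-LSTM ordering, the MSE loss, and the Adam optimizer come from the abstract.

```python
from tensorflow import keras
from tensorflow.keras import layers

timesteps, n_features = 30, 5  # hypothetical window length and variable count

model = keras.Sequential([
    layers.Input(shape=(timesteps, n_features)),
    # Layer 1: bidirectional LSTM extracts forward and backward temporal features.
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    # Layer 2: plain LSTM consumes those features and drives the prediction.
    layers.LSTM(32),
    layers.Dense(1),  # next-period rainfall estimate
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```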
Author Correction: AutoTransOP: translating omics signatures without orthologue requirements using deep learning
NPJ Syst Biol Appl. 2024 Dec 13;10(1):148. doi: 10.1038/s41540-024-00456-z.
NO ABSTRACT
PMID:39672816 | DOI:10.1038/s41540-024-00456-z
Artificial intelligence improves mammography-based breast cancer risk prediction
Trends Cancer. 2024 Nov 5:S2405-8033(24)00226-7. doi: 10.1016/j.trecan.2024.10.007. Online ahead of print.
ABSTRACT
Artificial intelligence (AI) is enabling us to delve deeply into the information inherent in a mammogram and identify novel features associated with high risk of a future breast cancer diagnosis. Here, we discuss how AI is improving mammographic density-associated risk prediction and shaping the future of screening and risk-reducing strategies.
PMID:39672755 | DOI:10.1016/j.trecan.2024.10.007
Artificial intelligence in emergency neuroradiology: Current applications and perspectives
Diagn Interv Imaging. 2024 Dec 12:S2211-5684(24)00257-2. doi: 10.1016/j.diii.2024.11.002. Online ahead of print.
ABSTRACT
Emergency neuroradiology provides rapid diagnostic decision-making and guidance for management for a wide range of acute conditions involving the brain, head and neck, and spine. This narrative review aims at providing an up-to-date discussion about the state of the art of applications of artificial intelligence in emergency neuroradiology, which have substantially expanded in depth and scope in the past few years. A detailed analysis of machine learning and deep learning algorithms in several tasks related to acute ischemic stroke involving various imaging modalities, including a description of existing commercial products, is provided. The applications of artificial intelligence in acute intracranial hemorrhage and other vascular pathologies such as intracranial aneurysm and arteriovenous malformation are discussed. Other areas of emergency neuroradiology including infection, fracture, cord compression, and pediatric imaging are further discussed in turn. Based on these discussions, this article offers insight into practical considerations regarding the applications of artificial intelligence in emergency neuroradiology, calling for more development driven by clinical needs, attention to pediatric neuroimaging, and analysis of real-world performance.
PMID:39672753 | DOI:10.1016/j.diii.2024.11.002
A Novel Approach in Cancer Diagnosis: Integrating Holography Microscopic Medical Imaging and Deep Learning Techniques - Challenges and Future Trends
Biomed Phys Eng Express. 2024 Dec 13. doi: 10.1088/2057-1976/ad9eb7. Online ahead of print.
ABSTRACT
Medical imaging is pivotal in early disease diagnosis, providing essential insights that enable timely and accurate detection of health anomalies. Traditional imaging techniques, such as magnetic resonance imaging (MRI), computed tomography (CT), ultrasound, and positron emission tomography (PET), offer vital insights into three-dimensional structures but frequently fall short of delivering a comprehensive and detailed anatomical analysis, as they capture only amplitude details. Three-dimensional holographic microscopic medical imaging offers a promising alternative by capturing both the amplitude (brightness) and phase (structural) details of biological structures. In this study, we investigate the novel collaborative potential of deep learning (DL) and holographic microscopic phase imaging for cancer diagnosis. The study comprehensively examines the existing literature, analyzes advancements, identifies research gaps, and proposes future research directions for cancer diagnosis through the integrated quantitative phase imaging (QPI) and DL methodology. This approach addresses a critical limitation of traditional imaging by capturing detailed structural information, paving the way for more accurate diagnostics. The proposed pipeline comprises tissue sample collection, holographic image scanning, pre-processing (including handling of imbalanced datasets), and training on annotated datasets using DL architectures such as U-Net and Vision Transformers (ViTs). Furthermore, sophisticated DL concepts, such as the incorporation of explainable AI (XAI) techniques, are suggested for comprehensive disease diagnosis and identification. The study thoroughly investigates the advantages of integrating holographic imaging and DL for precise cancer diagnosis, and presents meticulous insights by identifying the challenges associated with this integration methodology.
PMID:39671712 | DOI:10.1088/2057-1976/ad9eb7
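As a concrete illustration of the ViT half of the proposed pipeline, the sketch below fine-tunes a stock torchvision Vision Transformer on single-channel phase maps by replicating the channel; the two-class head, input shapes, and channel trick are assumptions, since the review proposes the pipeline conceptually rather than as code.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

model = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)
model.heads.head = nn.Linear(model.heads.head.in_features, 2)  # e.g., benign vs. malignant

phase = torch.rand(4, 1, 224, 224)   # stand-in quantitative phase images
x = phase.repeat(1, 3, 1, 1)         # replicate to the 3 channels the ViT expects
logits = model(x)
print(logits.shape)                  # torch.Size([4, 2])
```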
Real-World and Clinical Trial Validation of a Deep Learning Radiomic Biomarker for PD-(L)1 Immune Checkpoint Inhibitor Response in Advanced Non-Small Cell Lung Cancer
JCO Clin Cancer Inform. 2024 Dec;8:e2400133. doi: 10.1200/CCI.24.00133. Epub 2024 Dec 13.
ABSTRACT
PURPOSE: This study developed and validated a novel deep learning radiomic biomarker to estimate response to immune checkpoint inhibitor (ICI) therapy in advanced non-small cell lung cancer (NSCLC) using real-world data (RWD) and clinical trial data.
MATERIALS AND METHODS: Retrospective RWD of 1,829 patients with advanced NSCLC treated with PD-(L)1 ICIs were collected from 10 academic and community institutions in the United States and Europe. The RWD included data sets for discovery (Data Set A-Discovery, n = 1,173) and independent test (Data Set B, n = 458). A radiomic pipeline, containing a deep learning feature extractor and a survival model, generated the computed tomography (CT) response score (CTRS) applied to the pretreatment routine CT/positron emission tomography (PET)-CT scan. An enhanced CTRS (eCTRS) also incorporated age, sex, treatment line, and lesion annotations. Performance was evaluated against progression-free survival (PFS) and overall survival (OS). Biomarker generalizability was further evaluated using a secondary analysis of a prospective clinical trial (ClinicalTrials.gov identifier: NCT02573259) evaluating the PD-1 inhibitor sasanlimab in second or later line of treatment (Data Set C, n = 54).
RESULTS: In RWD Test Data Set B, the CTRS identified patients with a high probability of response to ICI with a PFS hazard ratio (HR) of 0.46 (95% CI, 0.26 to 0.82) and an OS HR of 0.50 (95% CI, 0.28 to 0.92) in the first-line ICI monotherapy cohort, after adjustment for baseline covariates including the PD-L1 tumor proportion score. In Clinical Trial Data Set C, the CTRS demonstrated an adjusted PFS HR of 1.03 (95% CI, 0.43 to 2.47) and an OS HR of 0.33 (95% CI, 0.14 to 0.91). The CTRS and eCTRS outperformed traditional imaging biomarkers of lesion size in PFS and OS for RWD Test Data Set B and in OS for the Clinical Trial Data Set.
CONCLUSION: The study developed and validated a deep learning radiomic biomarker using pretreatment routine CT/PET-CT scans to identify ICI benefit in advanced NSCLC.
PMID:39671539 | DOI:10.1200/CCI.24.00133
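The CTRS pipeline pairs a deep feature extractor with a survival model. The paper's extractor and model are not public here, so the sketch below stands in a pretrained ResNet-18 trunk and a lifelines Cox fit on synthetic follow-up data to show the overall shape of such a pipeline.

```python
import numpy as np
import pandas as pd
import torch
from torchvision.models import resnet18, ResNet18_Weights
from lifelines import CoxPHFitter

# Stand-in extractor: the paper's feature extractor is proprietary, so a
# pretrained ResNet-18 trunk produces per-scan features here instead.
trunk = torch.nn.Sequential(*list(resnet18(weights=ResNet18_Weights.DEFAULT).children())[:-1])
trunk.eval()

scans = torch.rand(30, 3, 224, 224)                  # fake pretreatment CT slices
with torch.no_grad():
    feats = trunk(scans).flatten(1).numpy()[:, :5]   # keep 5 features for the demo

rng = np.random.default_rng(0)
df = pd.DataFrame(feats, columns=[f"f{i}" for i in range(5)])
df["pfs_months"] = rng.exponential(10.0, size=30)    # synthetic follow-up times
df["event"] = rng.binomial(1, 0.7, size=30)          # 1 = progression observed

cph = CoxPHFitter(penalizer=0.1)                     # ridge-penalized Cox fit
cph.fit(df, duration_col="pfs_months", event_col="event")
print(cph.hazard_ratios_)                            # per-feature hazard ratios
```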
Advanced vision transformers and open-set learning for robust mosquito classification: A novel approach to entomological studies
PLoS Comput Biol. 2024 Dec 13;20(12):e1012654. doi: 10.1371/journal.pcbi.1012654. eCollection 2024 Dec.
ABSTRACT
Mosquito-related diseases pose a significant threat to global public health, necessitating efficient and accurate mosquito classification for effective surveillance and control. This work presents an innovative approach to mosquito classification that leverages state-of-the-art vision transformers and open-set learning techniques. A novel framework is introduced that integrates transformer-based deep learning models with comprehensive data augmentation and preprocessing methods, enabling robust and precise identification of ten mosquito species. The Swin Transformer model achieves the best performance for traditional closed-set learning, with 99.60% accuracy and a 0.996 F1 score. The lightweight MobileViT technique attains an almost equivalent accuracy of 98.90% with significantly fewer parameters and lower model complexity. The adaptability and generalizability of the applied deep learning models in a static environment were then tested by presenting, during the inference stage, new classes of data samples that had not been included in the training set. The proposed framework's ability to handle unseen classes, such as insects similar to mosquitoes and even humans, through open-set learning with the OpenMax technique and the Weibull distribution further enhances its practical applicability. Notably, the traditional CNN model Xception outperforms the latest transformers, with higher accuracy and F1 score, in the open-set setting. The study's findings highlight the transformative potential of advanced deep-learning architectures in entomology, providing a strong groundwork for future research and development in mosquito surveillance and vector control. The implications of this work extend beyond mosquito classification, offering valuable insights for broader ecological and environmental monitoring applications.
PMID:39671336 | DOI:10.1371/journal.pcbi.1012654
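The OpenMax rejection step reduces to fitting a Weibull distribution to the tail of distances between correctly classified activations and a class's mean activation vector. The sketch below shows that step on synthetic activations; full OpenMax also recalibrates every class logit, which is omitted here, and all sizes are illustrative.

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)
train_av = rng.normal(5.0, 1.0, size=(500, 8))  # activation vectors of one known class
mav = train_av.mean(axis=0)                     # mean activation vector (MAV)

# Fit a Weibull to the largest distances between correct samples and the MAV.
dists = np.linalg.norm(train_av - mav, axis=1)
tail = np.sort(dists)[-20:]                     # tail size 20 is a common choice
shape, loc, scale = weibull_min.fit(tail, floc=0)

def unknown_probability(sample):
    """CDF of the tail Weibull ~ probability the sample lies beyond the known class."""
    d = np.linalg.norm(sample - mav)
    return weibull_min.cdf(d, shape, loc=loc, scale=scale)

print(unknown_probability(mav + 0.1))   # near the class centre -> ~0
print(unknown_probability(mav + 10.0))  # far away (unseen class) -> ~1
```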
Deep learning-based prediction of tumor aggressiveness in RCC using multiparametric MRI: a pilot study
Int Urol Nephrol. 2024 Dec 13. doi: 10.1007/s11255-024-04300-5. Online ahead of print.
ABSTRACT
OBJECTIVE: To investigate the value of multiparametric magnetic resonance imaging (MRI) as a non-invasive method to predict the aggressiveness of renal cell carcinoma (RCC) by developing a convolutional neural network (CNN) model and fusing it with clinical characteristics.
METHODS: Multiparametric abdominal MRI was performed on 47 pathologically confirmed RCC patients between 2019 and 2023. Preoperative MRI was performed on all patients to assess their clinical characteristics. The CNN model was developed and validated to assess the predictive value of b value images, combined b value images, apparent diffusion coefficient (ADC), intravoxel incoherent motion (IVIM), diffusion kurtosis imaging (DKI), and their parametric maps for RCC aggressiveness. The least absolute shrinkage and selection operator (LASSO) regression was used to identify clinical features highly correlated with RCC aggressiveness. These clinical features were combined with selected b values to develop a fusion model. All models were evaluated using receiver operating characteristic (ROC) curve analysis.
RESULTS: A total of 47 patients (mean age, 56.17 ± 1.70 years; 37 men, 10 women) were evaluated. LASSO regression identified renal sinus/perirenal fat invasion, tumor stage, and tumor size as the most significant clinical features. The combined b-value images with b = 0 and 1000 achieved an area under the curve (AUC) of 0.642 (95% CI: 0.623-0.661), and those with b = 0, 100, and 1000 achieved an AUC of 0.657 (95% CI: 0.647-0.667). The fusion model combining clinical features with the b = 0 and 1000 images yielded the highest performance, with an AUC of 0.861 (95% CI: 0.667-0.992), demonstrating superior predictive accuracy compared with the other models.
CONCLUSION: Deep learning using a CNN fusion model, integrating multiple b value images and clinical features, could effectively promote the preoperative prediction of tumor aggressiveness in RCC patients.
PMID:39671158 | DOI:10.1007/s11255-024-04300-5
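A fusion model of this kind typically concatenates CNN image features with the clinical vector before a shared classification head. The sketch below assumes a toy CNN, a two-channel b = 0/1000 input stack, and the three LASSO-selected clinical features; none of the layer sizes come from the paper.

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Sketch of a CNN/clinical fusion classifier (architecture details assumed)."""
    def __init__(self, n_clinical=3):
        super().__init__()
        self.cnn = nn.Sequential(                   # tiny CNN over the b-value stack
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32 + n_clinical, 2)   # low vs. high aggressiveness

    def forward(self, b_images, clinical):
        feats = self.cnn(b_images)
        return self.head(torch.cat([feats, clinical], dim=1))

# b = 0 and b = 1000 images stacked as channels; 3 clinical features per patient.
net = FusionNet()
logits = net(torch.rand(4, 2, 128, 128), torch.rand(4, 3))
print(logits.shape)  # torch.Size([4, 2])
```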
Deep learning for segmentation of colorectal carcinomas on endoscopic ultrasound
Tech Coloproctol. 2024 Dec 13;29(1):20. doi: 10.1007/s10151-024-03056-5.
ABSTRACT
BACKGROUND: Bowel-preserving local resection of early rectal cancer is less successful if the tumor infiltrates the muscularis propria as opposed to submucosal infiltration only. Magnetic resonance imaging currently lacks the spatial resolution to provide a reliable estimation of the infiltration depth. Endoscopic ultrasound (EUS) has better resolution, but its interpretation is investigator dependent. We hypothesize that automated image segmentation of EUS could be a way to standardize EUS interpretation.
METHODS: EUS media and outcome data were collected prospectively. Based on 373 expert manual segmentations, a convolutional neural network was developed to perform segmentation of the submucosa, muscularis propria, and tumors. The mean surface distance (MSD), maximal distance between segmentations (Hausdorff distance; HDD), and overlap (Dice similarity index; DSI) were calculated.
RESULTS: The median MSD and HDD values were 3.2 and 17.7 pixels for the tumor, 3.4 and 24.7 pixels for the submucosa, and 2.6 and 20.0 pixels for the muscularis propria, respectively. The median DSI values for the tumor, submucosa, and muscularis propria were 0.82, 0.57, and 0.59, respectively. These values reflect good agreement between manual and deep learning segmentation.
CONCLUSIONS: This study found encouraging results of using automated analysis of EUS images of early rectal cancer, supporting further exploration in clinical practice.
PMID:39671056 | DOI:10.1007/s10151-024-03056-5
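The reported DSI and HDD metrics are straightforward to compute from binary masks. A minimal NumPy/SciPy sketch follows, with toy masks and a full-mask rather than boundary-only Hausdorff distance for brevity.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(a, b):
    """Dice similarity index between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2 * inter / (a.sum() + b.sum())

def hausdorff(a, b):
    """Symmetric Hausdorff distance (in pixels) between two masks' point sets."""
    pa, pb = np.argwhere(a), np.argwhere(b)
    return max(directed_hausdorff(pa, pb)[0], directed_hausdorff(pb, pa)[0])

# Toy masks standing in for manual vs. predicted muscularis propria segmentations.
m = np.zeros((64, 64), bool); m[20:40, 20:40] = True
p = np.zeros((64, 64), bool); p[22:42, 21:41] = True
print(f"DSI = {dice(m, p):.2f}, HDD = {hausdorff(m, p):.1f} px")
```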
Development of Periapical Index Score Classification System in Periapical Radiographs Using Deep Learning
J Imaging Inform Med. 2024 Dec 13. doi: 10.1007/s10278-024-01360-y. Online ahead of print.
ABSTRACT
The periapical index (PAI) scoring system is the most popular index for evaluating apical periodontitis (AP) on radiographs. It provides an ordinal scale of 1 to 5, ranging from healthy to severe AP. Scoring PAI is a time-consuming process that requires experienced dentists; thus, deep learning has been applied to hasten the process. However, most models fail to score the early stage of AP (score 2) accurately, since it shares very similar characteristics with its adjacent scores. In this study, we developed and compared two binary classification methods for PAI scores: a normality classification method and a health-disease classification method. The normality classification method labeled PAI score 1 as Normal and the remaining scores as Abnormal, while the health-disease method labeled PAI scores 1 and 2 as Healthy and the remaining scores as Diseased. A total of 2266 periapical root areas (PRAs) from 520 periapical radiographs (PAs) were selected and scored by experts. GoogLeNet, AlexNet, and ResNet convolutional neural networks (CNNs) were used in this study. The trained models' performances were evaluated and compared. The models using the normality classification method achieved a highest accuracy of 75.00%, while the health-disease method models performed better, with a highest accuracy of 83.33%. In conclusion, CNN models performed better in classification when PAI scores 1 and 2 were grouped into the same class, supporting health-disease PAI scoring in clinical practice.
PMID:39671050 | DOI:10.1007/s10278-024-01360-y
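The two binary groupings compared in the study amount to a simple relabeling of the ordinal PAI scale, as in this sketch:

```python
# The two binary groupings of the 1-5 PAI scale described in the abstract.
def normality_label(pai):        # score 1 -> Normal, scores 2-5 -> Abnormal
    return "Normal" if pai == 1 else "Abnormal"

def health_disease_label(pai):   # scores 1-2 -> Healthy, scores 3-5 -> Diseased
    return "Healthy" if pai <= 2 else "Diseased"

for score in range(1, 6):
    print(score, normality_label(score), health_disease_label(score))
```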
Automated identification of Chagas disease vectors using AlexNet pre-trained convolutional neural networks
Med Vet Entomol. 2024 Dec 13. doi: 10.1111/mve.12780. Online ahead of print.
ABSTRACT
The 158 bug species that make up the subfamily Triatominae are potential vectors of Trypanosoma cruzi, the etiological agent of Chagas disease. Despite recent progress in developing picture-based automated systems for identifying triatomines, an extensive and diverse image database is required for a broadly useful automated application for identifying these vectors. We evaluated the performance of a deep-learning network (AlexNet) for identifying triatomine species from a database of dorsal images of adult insects. We used a sample of 6397 photographs of triatomines belonging to seven genera and 65 species from 27 countries. AlexNet had an accuracy of ~0.93 (95% confidence interval [CI], 0.91-0.94) for identifying triatomine species from pictures of varying resolution. The highest species-specific accuracy was observed for 21 species in the genera Rhodnius and Panstrongylus. AlexNet performance improved to ~0.95 (95% CI, 0.93-0.96) when only the species with the highest vectorial capacity were considered. These results show that AlexNet, when trained on a large, diverse, and well-structured picture set, exhibits excellent performance for identifying triatomine species. This study contributes to the development of an automated Chagas disease vector identification system.
PMID:39670626 | DOI:10.1111/mve.12780
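Adapting AlexNet to a 65-way species classifier is a standard transfer-learning step; here is a minimal torchvision sketch (the training loop and the layer-freezing policy are assumptions, only the architecture and class count come from the abstract).

```python
import torch.nn as nn
from torchvision.models import alexnet, AlexNet_Weights

# Start from ImageNet-pretrained AlexNet and swap the 1000-way head
# for a 65-way triatomine species classifier.
model = alexnet(weights=AlexNet_Weights.DEFAULT)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 65)

# Optionally freeze the convolutional trunk and fine-tune only the classifier.
for p in model.features.parameters():
    p.requires_grad = False
```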
A Dynamic Model for Early Prediction of Alzheimer's Disease by Leveraging Graph Convolutional Networks and Tensor Algebra
Pac Symp Biocomput. 2025;30:675-689.
ABSTRACT
Alzheimer's disease (AD) is a neurocognitive disorder that deteriorates memory and impairs cognitive functions. Mild cognitive impairment (MCI) is generally considered an intermediate phase between normal cognitive aging and more severe conditions such as AD. Although not all individuals with MCI will develop AD, they are at increased risk of doing so. Diagnosing AD once strong symptoms are already present is of limited value, as AD leads to irreversible cognitive decline and brain damage. Thus, it is crucial to develop methods for the early prediction of AD in individuals with MCI. Recurrent neural network (RNN)-based methods have been used effectively to predict the progression from MCI to AD by analyzing electronic health records (EHR). However, despite their widespread use, existing RNN-based tools may introduce increased model complexity and often struggle to capture long-term dependencies. In this study, we introduce a novel dynamic deep learning model for the early prediction of AD (DyEPAD), which predicts MCI subjects' progression to AD from EHR data. In the first phase of DyEPAD, embeddings for each time step, or visit, are captured through graph convolutional networks (GCNs) and aggregation functions. In the final phase, DyEPAD employs tensor-algebraic operations for frequency-domain analysis of these embeddings, capturing the full scope of evolutionary patterns across all time steps. Our experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) and National Alzheimer's Coordinating Center (NACC) datasets demonstrate that our proposed model outperforms or is on par with state-of-the-art and baseline methods.
PMID:39670404
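The abstract's two phases, per-visit GCN embeddings followed by frequency-domain analysis, can be caricatured in a few lines. The sketch below uses a simplified one-layer graph convolution and an FFT across visit embeddings as a stand-in for DyEPAD's tensor-algebraic operations; all sizes and the adjacency construction are hypothetical.

```python
import torch
import torch.nn as nn

class VisitGCNLayer(nn.Module):
    """One graph-convolution step over a per-visit code graph (simplified A_hat @ X @ W)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w = nn.Linear(d_in, d_out)

    def forward(self, adj, x):   # adj: (n, n) normalized adjacency, x: (n, d_in)
        return torch.relu(self.w(adj @ x))

# Hypothetical sizes: 6 visits, 12 medical codes per visit graph, 16-d features.
visits, codes, d = 6, 12, 16
gcn = VisitGCNLayer(d, d)
embeds = []
for _ in range(visits):
    adj = torch.softmax(torch.rand(codes, codes), dim=1)  # stand-in adjacency
    x = torch.rand(codes, d)                              # stand-in code features
    embeds.append(gcn(adj, x).mean(dim=0))                # aggregate codes -> visit embedding

seq = torch.stack(embeds)                 # (visits, d) patient trajectory
spectrum = torch.fft.rfft(seq, dim=0)     # frequency-domain view across visits
print(spectrum.shape)                     # torch.Size([4, 16])
```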
Enhancing Privacy-Preserving Cancer Classification with Convolutional Neural Networks
Pac Symp Biocomput. 2025;30:565-579.
ABSTRACT
Precision medicine significantly enhances patients' prognoses by offering personalized treatments. For metastatic cancer in particular, incorporating the primary tumor location into the diagnostic process greatly improves survival rates. However, traditional methods rely on human expertise, requiring substantial time and financial resources. To address this challenge, machine learning (ML) and deep learning (DL) have proven particularly effective. Yet their application to medical data, especially genomic data, must account for privacy, owing to the highly sensitive nature of the data. In this paper, we propose OGHE, a convolutional neural network-based approach for privacy-preserving cancer classification designed to exploit spatial patterns in genomic data while maintaining confidentiality by means of homomorphic encryption (HE). This encryption scheme allows processing directly on encrypted data, guaranteeing confidentiality throughout the entire computation. OGHE is designed specifically for privacy-preserving applications, taking HE limitations into account from the outset and introducing an efficient packing mechanism to minimize the computational overhead HE introduces. Additionally, OGHE relies on a novel feature selection method, VarScout, designed to extract the most significant features through clustering and occurrence analysis while preserving inherent spatial patterns. Coupled with VarScout, OGHE has been compared with existing privacy-preserving solutions for encrypted cancer classification on the iDash 2020 dataset, demonstrating its effectiveness in providing accurate privacy-preserving cancer classification and reducing latency thanks to our packing mechanism. The code is released to the scientific community.
PMID:39670396
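OGHE's packing mechanism and VarScout cannot be reproduced from the abstract, but the underlying HE principle, evaluating a linear layer directly on a CKKS ciphertext, can be sketched with the TenSEAL library; the encryption parameters and the toy model below are illustrative only.

```python
import tenseal as ts

# CKKS context (parameter choices are illustrative, not the paper's).
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()

weights = [0.5, -1.2, 0.3, 2.0]     # plaintext model weights (server side)
sample = [0.1, 0.4, -0.2, 0.9]      # patient's genomic features (client side)

enc = ts.ckks_vector(ctx, sample)   # encrypt before sending to the server
enc_score = enc.dot(weights)        # linear layer evaluated on the ciphertext
print(enc_score.decrypt())          # client decrypts the (approximate) score
```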
Investigating the Differential Impact of Psychosocial Factors by Patient Characteristics and Demographics on Veteran Suicide Risk Through Machine Learning Extraction of Cross-Modal Interactions
Pac Symp Biocomput. 2025;30:167-184.
ABSTRACT
Accurate prediction of suicide risk is crucial for identifying patients with an elevated risk burden and helping ensure these patients receive targeted care. The US Department of Veterans Affairs' suicide prediction model primarily leverages structured electronic health record (EHR) data. This approach largely overlooks unstructured EHR, a data format that could be utilized to enhance predictive accuracy. This study aims to enhance the predictive accuracy of suicide risk models by developing a model that incorporates both structured EHR predictors and semantic NLP-derived variables from unstructured EHR. XGBoost models were fit to predict suicide risk; the interactions identified by the models were extracted using SHAP, validated using logistic regression models, and added to a ridge regression model, which was subsequently compared with a ridge regression approach without interactions. By introducing a selection parameter, α, to balance the influence of structured (α=1) and unstructured (α=0) data, we found that intermediate α values achieved optimal performance across various risk strata, improved the performance of the ridge regression approach, and uncovered significant cross-modal interactions between psychosocial constructs and patient characteristics. These interactions highlight how psychosocial risk factors are influenced by individual patient contexts, potentially informing improved risk prediction methods and personalized interventions. Our findings underscore the importance of incorporating nuanced narrative data into predictive models and set the stage for future research that will expand the use of advanced machine learning techniques, including deep learning, to further refine suicide risk prediction methods.
PMID:39670369
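The extract-validate-augment loop described above can be sketched end to end on synthetic data: fit XGBoost, pull pairwise SHAP interaction values, and feed the strongest cross term to a ridge model. Everything below (data, sizes, hyperparameters) is a stand-in, since the real EHR data is restricted.

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.linear_model import RidgeClassifier

# Synthetic stand-in for structured + NLP-derived predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 6))
y = (X[:, 0] * X[:, 1] + X[:, 2] > 0).astype(int)   # planted cross-modal interaction

model = xgb.XGBClassifier(n_estimators=100, max_depth=3).fit(X, y)
inter = shap.TreeExplainer(model).shap_interaction_values(X)  # (n, 6, 6) attributions
strength = np.abs(inter).mean(axis=0)
off_diag = strength - np.diag(np.diag(strength))    # ignore main effects
i, j = np.unravel_index(np.argmax(off_diag), strength.shape)
print(f"strongest interaction: features {i} x {j}")

# Add the discovered pairwise term to a ridge model alongside the raw predictors.
X_aug = np.column_stack([X, X[:, i] * X[:, j]])
print(RidgeClassifier().fit(X_aug, y).score(X_aug, y))
```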
Session Introduction: AI and Machine Learning in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface
Pac Symp Biocomput. 2025;30:33-39.
ABSTRACT
Artificial intelligence (AI) technologies are increasingly capable of processing complex and multilayered datasets. Innovations in generative AI and deep learning have notably enhanced the extraction of insights from unstructured text, images, and structured data alike. These breakthroughs in AI technology have spurred a wave of research in the medical field, leading to the creation of a variety of tools aimed at improving clinical decision-making, patient monitoring, image analysis, and emergency response systems. However, thorough research is essential to fully understand the broader impact and potential consequences of deploying AI within the healthcare sector.
PMID:39670359
A dataset of shallow soil moisture for alfalfa in the Ningxia irrigation area of the Yellow River
Front Plant Sci. 2024 Nov 28;15:1472930. doi: 10.3389/fpls.2024.1472930. eCollection 2024.
NO ABSTRACT
PMID:39670272 | PMC:PMC11634607 | DOI:10.3389/fpls.2024.1472930
Explainable light-weight deep learning pipeline for improved drought stress identification
Front Plant Sci. 2024 Nov 28;15:1476130. doi: 10.3389/fpls.2024.1476130. eCollection 2024.
ABSTRACT
INTRODUCTION: Early identification of drought stress in crops is vital for implementing effective mitigation measures and reducing yield loss. Non-invasive imaging techniques hold immense potential by capturing subtle physiological changes in plants under water deficit. Sensor-based imaging data serves as a rich source of information for machine learning and deep learning algorithms, facilitating further analysis that aims to identify drought stress. While these approaches yield favorable results, real-time field applications require algorithms specifically designed for the complexities of natural agricultural conditions.
METHODS: Our work proposes a novel deep learning framework for classifying drought stress in potato crops captured by unmanned aerial vehicles (UAVs) in natural settings. The novelty lies in the synergistic combination of a pre-trained network with carefully designed custom layers. This architecture leverages the pre-trained network's feature extraction capabilities, while the custom layers enable targeted dimensionality reduction and enhanced regularization, ultimately leading to improved performance. A key innovation of our work is the integration of gradient-based visualization inspired by Gradient-weighted Class Activation Mapping (Grad-CAM), an explainability technique. This visualization approach sheds light on the internal workings of the deep learning model, often regarded as a "black box". By revealing the model's focus areas within the images, it enhances interpretability and fosters trust in the model's decision-making process.
RESULTS AND DISCUSSION: Our proposed framework achieves superior performance, particularly with the DenseNet121 pre-trained network, reaching a precision of 97% to identify the stressed class with an overall accuracy of 91%. Comparative analysis of existing state-of-the-art object detection algorithms reveals the superiority of our approach in achieving higher precision and accuracy. Thus, our explainable deep learning framework offers a powerful approach to drought stress identification with high accuracy and actionable insights.
PMID:39670267 | PMC:PMC11635298 | DOI:10.3389/fpls.2024.1476130
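A minimal Grad-CAM over the DenseNet-121 backbone named in the results looks roughly like the following; the hook placement and the stand-in input are assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F
from torchvision.models import densenet121, DenseNet121_Weights

model = densenet121(weights=DenseNet121_Weights.DEFAULT).eval()
acts, grads = {}, {}
layer = model.features[-1]  # final feature map before the classifier
layer.register_forward_hook(lambda m, inp, out: acts.update(a=out))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in UAV crop
score = model(x)[0].max()                           # top-class logit
score.backward()

w = grads["g"].mean(dim=(2, 3), keepdim=True)       # pooled gradients per channel
cam = F.relu((w * acts["a"]).sum(dim=1))            # weighted activations -> heatmap
cam = F.interpolate(cam[None], size=(224, 224), mode="bilinear")[0, 0]
print(cam.shape)  # heatmap of regions driving the stress prediction
```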
A method of identification and localization of tea buds based on lightweight improved YOLOV5
Front Plant Sci. 2024 Nov 28;15:1488185. doi: 10.3389/fpls.2024.1488185. eCollection 2024.
ABSTRACT
The low degree of intelligence and standardization in tea bud picking, together with laborious and time-consuming manual harvesting, poses significant challenges to the sustainable development of the high-quality tea industry, so there is an urgent need to investigate the critical technologies of intelligent tea-picking robots. Model complexity demands substantial hardware computing resources, which limits the deployment of tea bud detection models on tea-picking robots. In this study, we therefore propose the YOLOV5M-SBSD lightweight tea bud detection model to address these issues. A Fuding white tea bud image dataset was established by collecting Fuding white tea images. The lightweight network ShuffleNetV2 was used to replace the YOLOV5 backbone; the up-sampling algorithm of YOLOV5 was optimized with the CARAFE modular structure, which enlarges the receptive field of the network while remaining lightweight; BiFPN was used to achieve more efficient multi-scale feature fusion; and the parameter-free attention module SimAM was introduced to enhance the model's feature extraction ability without adding extra computation. The improved model, denoted YOLOV5M-SBSD, was compared and analyzed against other mainstream target detection models and then evaluated on the tea bud dataset. The experimental results show that the recognition accuracy for tea buds is 88.7%, the recall rate is 86.9%, and the average accuracy is 93.1%; compared with the original YOLOV5M algorithm, the recognition accuracy is 0.5% higher and the average accuracy 0.2% higher, while model size is reduced by 82.89% and Params and GFLOPs are reduced by 83.7% and 85.6%, respectively. The improved algorithm achieves higher detection accuracy while reducing computation and parameter counts. It also reduces dependence on hardware, provides a reference for deploying tea bud detection models in the natural environment of the tea garden, and has both theoretical and practical significance for the identification and localization performed by intelligent tea bud-picking robots.
PMID:39670263 | PMC:PMC11634601 | DOI:10.3389/fpls.2024.1488185
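Of the four modifications, SimAM is the easiest to show in full, since it is parameter-free and defined by a closed-form energy function; this sketch follows the published SimAM formulation (the λ value is its common default).

```python
import torch

def simam(x, e_lambda=1e-4):
    """Parameter-free SimAM attention: weight each activation by an energy-based saliency."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)  # squared deviation per position
    v = d.sum(dim=(2, 3), keepdim=True) / n            # per-channel variance estimate
    e_inv = d / (4 * (v + e_lambda)) + 0.5             # inverse energy of each neuron
    return x * torch.sigmoid(e_inv)                    # no learnable parameters added

feat = torch.rand(2, 32, 40, 40)      # e.g., a YOLOv5 feature map
print(simam(feat).shape)              # torch.Size([2, 32, 40, 40])
```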
Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision
Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2024 Jun;2024:11269-11281. doi: 10.1109/cvpr52733.2024.01071. Epub 2024 Sep 16.
ABSTRACT
Humans effortlessly interpret images by parsing them into part-whole hierarchies; deep learning models excel at learning multi-level feature spaces, but they often lack explicit coding of part-whole relations, a prominent property of medical imaging. To overcome this limitation, we introduce Adam-v2, a new self-supervised learning framework that extends Adam [79] by explicitly incorporating part-whole hierarchies into its learning objectives through three key branches: (1) localizability, acquiring discriminative representations to distinguish different anatomical patterns; (2) composability, learning each anatomical structure in a parts-to-whole manner; and (3) decomposability, comprehending each anatomical structure in a whole-to-parts manner. Experimental results across 10 tasks, compared against 11 baselines in zero-shot, few-shot transfer, and full fine-tuning settings, showcase Adam-v2's superior performance over large-scale medical models and existing SSL methods across diverse downstream tasks. The higher generality and robustness of Adam-v2's representations originate from its explicit construction of hierarchies for distinct anatomical structures from unlabeled medical images. Adam-v2 preserves a semantic balance of anatomical diversity and harmony in its embedding, yielding representations that are both generic and semantically meaningful, a balance overlooked in existing SSL methods. All code and pretrained models are available at GitHub.com/JLiangLab/Eden.
PMID:39670210 | PMC:PMC11636527 | DOI:10.1109/cvpr52733.2024.01071