Deep learning
Auto-segmentation of neck nodal metastases using self-distilled masked image transformer on longitudinal MR images
BJR Artif Intell. 2024 Mar 4;1(1):ubae004. doi: 10.1093/bjrai/ubae004. eCollection 2024 Jan.
ABSTRACT
OBJECTIVES: Auto-segmentation promises greater speed and lower inter-reader variability than manual segmentation in radiation oncology clinical practice. This study aims to implement and evaluate the accuracy of the auto-segmentation algorithm "Self-distilled Masked Image Transformer (SMIT)" for neck nodal metastases on longitudinal T2-weighted (T2w) MR images in oropharyngeal squamous cell carcinoma (OPSCC) patients.
METHODS: This prospective clinical trial study included 123 human papillomavirus-positive (HPV+) OPSCC patients who received concurrent chemoradiotherapy. T2w MR images were acquired on a 3 T scanner at pre-treatment (Tx, week 0) and intra-Tx weeks 1-3. Manual delineations of metastatic neck nodes from the 123 OPSCC patients were used for SMIT auto-segmentation, and total tumor volumes were calculated. Standard statistical analyses compared contour volumes from SMIT vs manual segmentation (Wilcoxon signed-rank test [WSRT]), and Spearman's rank correlation coefficients (ρ) were computed. Segmentation accuracy was evaluated on the test data set using the Dice similarity coefficient (DSC). P-values <0.05 were considered significant.
RESULTS: There was no significant difference between manual and SMIT-delineated tumor volumes at pre-Tx (8.68 ± 7.15 vs 8.38 ± 7.01 cm3, P = 0.26 [WSRT]), and the Bland-Altman method established limits of agreement of -1.71 to 2.31 cm3, with a mean difference of 0.30 cm3. SMIT and manually delineated tumor volume estimates were highly correlated (ρ = 0.84-0.96, P < 0.001). The mean DSC values were 0.86, 0.85, 0.77, and 0.79 at pre-Tx and intra-Tx weeks 1-3, respectively.
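A minimal sketch (not the authors' code) of the evaluation statistics used above — Dice similarity coefficient, Wilcoxon signed-rank test, Spearman's ρ, and Bland-Altman limits of agreement — on synthetic paired volume measurements:

```python
import numpy as np
from scipy import stats

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

m1 = np.zeros((4, 4)); m1[1:3, 1:3] = 1          # toy masks
m2 = np.zeros((4, 4)); m2[1:3, 1:4] = 1
print(f"DSC = {dice(m1, m2):.2f}")

rng = np.random.default_rng(0)
manual = rng.uniform(1.0, 20.0, size=30)         # hypothetical manual volumes (cm^3)
auto = manual + rng.normal(0.3, 1.0, size=30)    # hypothetical SMIT volumes (cm^3)

diff = auto - manual                             # Bland-Altman: mean difference and LoA
loa = (diff.mean() - 1.96 * diff.std(ddof=1),
       diff.mean() + 1.96 * diff.std(ddof=1))
_, p_wsrt = stats.wilcoxon(manual, auto)         # paired Wilcoxon signed-rank test
rho, _ = stats.spearmanr(manual, auto)           # Spearman's rank correlation
print(f"LoA {loa[0]:.2f} to {loa[1]:.2f} cm^3, WSRT P = {p_wsrt:.3f}, rho = {rho:.2f}")
```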
CONCLUSIONS: The SMIT algorithm provides sufficient segmentation accuracy for oncological applications in HPV+ OPSCC.
ADVANCES IN KNOWLEDGE: First evaluation of auto-segmentation with SMIT using longitudinal T2w MRI in HPV+ OPSCC.
PMID:38476956 | PMC:PMC10928808 | DOI:10.1093/bjrai/ubae004
High-Throughput Spike Detection in Greenhouse Cultivated Grain Crops with Attention Mechanisms-Based Deep Learning Models
Plant Phenomics. 2024 Mar 11;6:0155. doi: 10.34133/plantphenomics.0155. eCollection 2024.
ABSTRACT
Detection of spikes is the first important step toward image-based quantitative assessment of crop yield. However, spikes of grain plants occupy only a tiny fraction of the image area and often emerge amid a mass of plant leaves that exhibit colors similar to spike regions. Consequently, accurate detection of grain spikes is, in general, a non-trivial task even for advanced, state-of-the-art deep neural networks (DNNs). To improve pattern detection in spikes, we propose architectural changes to Faster-RCNN (FRCNN) by reducing feature extraction layers and introducing a global attention module. The performance of our extended FRCNN-A vs. conventional FRCNN was compared on images of different European wheat cultivars, including "difficult" bushy phenotypes, from 2 different phenotyping facilities and optical setups. Our experimental results show that the introduced architectural adaptations in FRCNN-A helped to improve spike detection accuracy in inner regions. The mean average precision (mAP) of FRCNN and FRCNN-A on inner spikes is 76.0% and 81.0%, respectively, while the state-of-the-art detection DNN, the Swin Transformer, reaches an mAP of 83.0%. As a lightweight network, FRCNN-A is faster than FRCNN and the Swin Transformer on both baseline and augmented training datasets. On the FastGAN-augmented dataset, FRCNN achieved an mAP of 84.24%, FRCNN-A attained an mAP of 85.0%, and the Swin Transformer achieved an mAP of 89.45%. The increase in mAP of the DNNs on the augmented datasets is proportional to the number of IPK original and augmented images. Overall, this study indicates a superior performance of attention mechanisms-based deep learning models in detecting small and subtle features of grain spikes.
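The paper's exact attention module is not reproduced here; the following is a hedged sketch of a generic global (squeeze-and-excitation-style) attention block of the kind that can be inserted into a Faster-RCNN backbone:

```python
# Hedged sketch: a generic global-attention block for a detection backbone
# (not the authors' exact FRCNN-A module).
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    """Squeeze-and-excitation-style global context attention over channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global spatial context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # reweight feature maps globally

feat = torch.randn(2, 256, 64, 64)                    # backbone feature map
print(GlobalAttention(256)(feat).shape)               # torch.Size([2, 256, 64, 64])
```

Reweighting feature maps with pooled global context is one common way to make small, sparsely distributed objects such as spikes stand out against leaf clutter.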
PMID:38476818 | PMC:PMC10927539 | DOI:10.34133/plantphenomics.0155
Machine learning in action: Revolutionizing intracranial hematoma detection and patient transport decision-making
J Neurosci Rural Pract. 2024 Jan-Mar;15(1):62-68. doi: 10.25259/JNRP_93_2023. Epub 2023 Dec 16.
ABSTRACT
OBJECTIVES: Traumatic intracranial hematomas represent a critical clinical situation in which early detection and management are of utmost importance. Machine learning has recently been used in the detection of neuroradiological findings; hence, it can be used in the detection of intracranial hematomas and subsequently initiate a management cascade of patient transfer, diagnostics, admission, and emergency intervention. Here, we aim to develop a diagnostic tool based on artificial intelligence to detect hematomas instantaneously and automatically start a cascade of actions that support the management protocol, depending on the early diagnosis.
MATERIALS AND METHODS: The study was designed as a staged model: a first stage of initializing and training the machine with provisional evaluation of its accuracy, a second stage of supervised use in a tertiary care hospital, and a third stage of generalization to primary and secondary care hospitals. Two datasets were used: CQ500, a public dataset, and our own dataset, collected retrospectively from our tertiary hospital.
RESULTS: A mean Dice score of 0.83 was achieved on the validation set of CQ500. Moreover, detection of intracranial hemorrhage was successful in 94% of cases for the CQ500 test set and 93% for our local institutional cases. Poor detection was present in only 6-7% of the total test set. Moderate false-positive results were encountered in 18% of the total test set, and major false positives reached 5%.
CONCLUSION: The proposed approach for the early detection of acute intracranial hematomas provides a reliable outset for generating an automatically initiated management cascade in high-flow hospitals.
PMID:38476429 | PMC:PMC10927054 | DOI:10.25259/JNRP_93_2023
Deep learning-based radiomics model from pretreatment ADC to predict biochemical recurrence in advanced prostate cancer
Front Oncol. 2024 Feb 27;14:1342104. doi: 10.3389/fonc.2024.1342104. eCollection 2024.
ABSTRACT
PURPOSE: To develop a deep learning-based radiomics model for predicting biochemical recurrence (BCR) of advanced prostate cancer (PCa) based on pretreatment apparent diffusion coefficient (ADC) maps.
METHODS: Data were collected retrospectively from 131 patients diagnosed with advanced PCa, randomly divided into training (n = 93) and test (n = 38) datasets. Pre-treatment ADC images were segmented using a pre-trained artificial intelligence (AI) model to identify suspicious PCa areas. Three models were constructed: a clinical model, a conventional radiomics model, and a deep-radiomics model. The receiver operating characteristic (ROC) curve, precision-recall (PR) curve, and decision curve analysis (DCA) were used to assess predictive performance in the test dataset. The net reclassification index (NRI) and integrated discrimination improvement (IDI) were employed to quantify the performance enhancement of the deep-radiomics model relative to the other two models.
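A minimal sketch of the continuous NRI and IDI used to compare the models, computed on hypothetical predicted probabilities (the standard definitions; the paper's exact computation may differ):

```python
import numpy as np

def nri_idi(y, p_old, p_new):
    """Continuous NRI and IDI for comparing two risk models."""
    y = np.asarray(y, dtype=bool)
    up, down = p_new > p_old, p_new < p_old
    nri = (up[y].mean() - down[y].mean()) + (down[~y].mean() - up[~y].mean())
    idi = (p_new[y].mean() - p_old[y].mean()) - (p_new[~y].mean() - p_old[~y].mean())
    return nri, idi

rng = np.random.default_rng(1)
y = rng.random(100) < 0.4                                  # hypothetical BCR labels
p_clin = np.clip(y * 0.5 + rng.random(100) * 0.5, 0, 1)    # "clinical model" risks
p_deep = np.clip(y * 0.6 + rng.random(100) * 0.4, 0, 1)    # "deep-radiomics" risks
print("NRI = %.3f, IDI = %.3f" % nri_idi(y, p_clin, p_deep))
```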
RESULTS: The deep-radiomics model exhibited a significantly higher area under the ROC curve (AUC) than the other two models (P = 0.033 and 0.026), as well as a higher area under the PR curve (AUC differences of 0.420 and 0.432). The DCA curve demonstrated superior performance for the deep-radiomics model across all risk thresholds compared with the other two. Taking the clinical model as reference, the NRI and IDI for the deep-radiomics model were 0.508 and 0.679, both statistically significant. Compared with the conventional radiomics model, the NRI and IDI for the deep-radiomics model were 0.149 and 0.164, neither statistically significant.
CONCLUSION: The deep-radiomics model exhibits promising potential in predicting BCR in advanced PCa, compared to both the clinical model and the conventional radiomics model.
PMID:38476369 | PMC:PMC10928490 | DOI:10.3389/fonc.2024.1342104
Deep learning based ultrasound analysis facilitates precise distinction between parotid pleomorphic adenoma and Warthin tumor
Front Oncol. 2024 Feb 27;14:1337631. doi: 10.3389/fonc.2024.1337631. eCollection 2024.
ABSTRACT
BACKGROUND: Pleomorphic adenoma (PA) often shows benign-appearing imaging features similar to those of Warthin tumor (WT), yet it is a potentially malignant tumor with a high recurrence rate. Worse, distinguishing PA from WT on fine-needle aspiration cytology (FNAC) is difficult for inexperienced pathologists. This study employed deep learning (DL) technology, which effectively utilizes ultrasound images, to provide a reliable approach for discriminating PA from WT.
METHODS: A total of 488 patients with surgically confirmed tumors, including 266 with PA and 222 with WT, were enrolled in this study. Two experienced ultrasound physicians independently evaluated all images to differentiate between PA and WT. The diagnostic performance of preoperative FNAC was also evaluated. For the DL study, all ultrasound images were randomly divided into training (70%), validation (20%), and test (10%) sets. Furthermore, ultrasound images that could not be diagnosed by FNAC were also randomly allocated to training (60%), validation (20%), and test (20%) sets. Five DL models were developed to classify ultrasound images as PA or WT, and their robustness was assessed using five-fold cross-validation. The Gradient-weighted Class Activation Mapping (Grad-CAM) technique was employed to visualize the regions of interest of the DL models.
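A minimal Grad-CAM sketch on a ResNet backbone, illustrating how the region of interest is visualized (a generic implementation on random input, not the authors' code):

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None)         # a trained binary head would replace model.fc
model.eval()
acts, grads = {}, {}

target = model.layer4                  # last convolutional stage
target.register_forward_hook(lambda m, i, o: acts.update(v=o))
target.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224, requires_grad=True)   # stand-in ultrasound image
score = model(x)[0].max()              # score of the predicted class
score.backward()

w = grads["v"].mean(dim=(2, 3), keepdim=True)         # channel importance weights
cam = F.relu((w * acts["v"]).sum(dim=1))              # weighted activation map
cam = F.interpolate(cam[None], size=(224, 224), mode="bilinear")[0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
print(cam.shape)                       # torch.Size([1, 224, 224])
```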
RESULTS: In the Grad-CAM analysis, the DL models accurately identified the mass as the region of interest. The areas under the receiver operating characteristic curve (AUROC) for the two ultrasound physicians were 0.351 and 0.598, and FNAC achieved an AUROC of only 0.721. For the DL models, the AUROC for discriminating between PA and WT in the test set ranged from 0.828 to 0.908. ResNet50 demonstrated the optimal performance, with an AUROC of 0.908, an accuracy of 0.833, a sensitivity of 0.736, and a specificity of 0.904. In the test set of cases for which FNAC failed to provide a diagnosis, DenseNet121 demonstrated the optimal performance, with an AUROC of 0.897, an accuracy of 0.806, a sensitivity of 0.789, and a specificity of 0.824.
CONCLUSION: For discriminating PA from WT, DL models are superior to ultrasound physicians and FNAC, thereby facilitating surgeons in making informed decisions regarding the most appropriate surgical approach.
PMID:38476360 | PMC:PMC10927830 | DOI:10.3389/fonc.2024.1337631
Evaluation of Methods for Detection and Semantic Segmentation of the Anterior Capsulotomy in Cataract Surgery Video
Clin Ophthalmol. 2024 Mar 5;18:647-657. doi: 10.2147/OPTH.S453073. eCollection 2024.
ABSTRACT
BACKGROUND: The capsulorhexis is one of the most important and challenging maneuvers in cataract surgery. Automated analysis of the anterior capsulotomy could aid surgical training through the provision of objective feedback and guidance to trainees.
PURPOSE: To develop and evaluate a deep learning-based system for the automated identification and semantic segmentation of the anterior capsulotomy in cataract surgery video.
METHODS: In this study, we established the BigCat-Capsulotomy dataset, comprising 1556 video frames extracted from 190 recorded cataract surgery videos, for developing and validating the capsulotomy recognition system. The proposed system involves three primary stages: video preprocessing, capsulotomy video frame classification, and capsulotomy segmentation. To thoroughly evaluate its efficacy, we examined the performance of eight deep learning-based classification models and eleven segmentation models, assessing both accuracy and time consumption. Furthermore, we investigated the factors influencing system performance by deploying the system across various surgical phases.
RESULTS: The ResNet-152 model employed in the classification step of the proposed capsulotomy recognition system attained strong performance, with an overall Dice coefficient of 92.21%. Similarly, the UNet model with the DenseNet-169 backbone emerged as the most effective segmentation model among those investigated, achieving an overall Dice coefficient of 92.12%. Moreover, the time consumption of the system was low, at 103.37 milliseconds per frame, facilitating its application in real-time scenarios. Phase-wise analysis indicated that the phacoemulsification phase (nuclear disassembly) was the most challenging to segment (Dice coefficient of 86.02%).
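A hedged sketch of the classification-then-segmentation pipeline named above, approximated with a torchvision ResNet-152 frame classifier and a DenseNet-169-backbone UNet from segmentation_models_pytorch (library choice and threshold are assumptions, not the authors' code):

```python
import torch
import segmentation_models_pytorch as smp
from torchvision.models import resnet152

classifier = resnet152(weights=None)
classifier.fc = torch.nn.Linear(classifier.fc.in_features, 2)  # capsulotomy frame: yes/no

segmenter = smp.Unet(
    encoder_name="densenet169",   # DenseNet-169 backbone, as in the best model above
    encoder_weights=None,
    in_channels=3,
    classes=1,                    # binary capsulotomy mask
)

frame = torch.randn(1, 3, 512, 512)           # preprocessed video frame (stage 1)
if classifier(frame).softmax(1)[0, 1] > 0.5:  # stage 2: capsulotomy frame?
    mask = segmenter(frame).sigmoid()         # stage 3: segment the capsulotomy
    print(mask.shape)                         # torch.Size([1, 1, 512, 512])
```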
CONCLUSION: The experimental results showed that the proposed system is highly effective in intraoperative capsulotomy recognition during cataract surgery and demonstrates both high accuracy and real-time capabilities. This system holds significant potential for applications in surgical performance analysis, education, and intraoperative guidance systems.
PMID:38476358 | PMC:PMC10929120 | DOI:10.2147/OPTH.S453073
SR-TTS: a rhyme-based end-to-end speech synthesis system
Front Neurorobot. 2024 Feb 27;18:1322312. doi: 10.3389/fnbot.2024.1322312. eCollection 2024.
ABSTRACT
Deep learning has significantly advanced text-to-speech (TTS) systems. These neural network-based systems have enhanced speech synthesis quality and are increasingly vital in applications such as human-computer interaction. However, conventional TTS models still face challenges, as the synthesized speech often lacks naturalness and expressiveness; additionally, slow inference speed, reflecting low efficiency, contributes to reduced voice quality. This paper introduces SynthRhythm-TTS (SR-TTS), an optimized Transformer-based architecture designed to enhance synthesized speech. SR-TTS not only improves phonological quality and naturalness but also accelerates the speech generation process, thereby increasing inference efficiency. SR-TTS comprises an encoder, a rhythm coordinator, and a decoder. In particular, a pre-duration predictor within the rhythm coordinator and a self-attention-based feature predictor work together to enhance the naturalness and articulatory accuracy of speech. In addition, the introduction of causal convolution enhances the consistency of the time series. The cross-linguistic capability of SR-TTS is validated by training it on both English and Chinese corpora. Human evaluation shows that SR-TTS outperforms existing techniques in terms of speech quality and naturalness of expression. This technology is particularly suitable for applications that require high-quality natural speech, such as intelligent assistants, speech-synthesized podcasts, and human-computer interaction.
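The causal convolution mentioned above can be sketched as a Conv1d with left-only padding, so each output frame depends only on current and past frames (a generic module, not SR-TTS's implementation):

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """Conv1d that only looks at current and past frames (left padding)."""
    def __init__(self, channels: int, kernel_size: int, dilation: int = 1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))          # pad on the left only
        return self.conv(x)

seq = torch.randn(1, 80, 200)                        # e.g., 80-band mel-style features
print(CausalConv1d(80, kernel_size=3)(seq).shape)    # torch.Size([1, 80, 200])
```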
PMID:38476267 | PMC:PMC10927791 | DOI:10.3389/fnbot.2024.1322312
Deep-learning approach to stratified reconstructions of tissue absorption and scattering in time-domain spatial frequency domain imaging
J Biomed Opt. 2024 Mar;29(3):036002. doi: 10.1117/1.JBO.29.3.036002. Epub 2024 Mar 12.
ABSTRACT
SIGNIFICANCE: Conventional optical properties (OPs) reconstruction in spatial frequency domain (SFD) imaging, such as the lookup table (LUT) method, causes OP aliasing and yields only average OPs without depth resolution. Integrating SFD imaging with time-resolved (TR) measurements enriches the spatio-temporal information, enabling improved reconstruction of the absorption (μa) and reduced scattering (μs') coefficients at various depths.
AIM: To achieve stratified reconstruction of OPs and the separation of μa from μs' using a deep learning workflow based on the temporal and spatial information provided by the time-domain SFD imaging technique, while enhancing reconstruction accuracy.
APPROACH: Two data processing methods are employed for OPs reconstruction with TR-SFD imaging: one uses the full TR data, and the other uses featured data extracted from the full TR data (E, the continuous-wave component, and ⟨t⟩, the mean time of flight). We compared their performance using a series of simulation and phantom validations.
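A minimal sketch of the two named features, assuming a sampled temporal point-spread function per pixel: E is the time integral of the TR curve (its continuous-wave component) and ⟨t⟩ is the intensity-weighted mean time of flight:

```python
import numpy as np

t = np.linspace(0, 5e-9, 256)                    # time axis (s)
dt = t[1] - t[0]
tpsf = np.exp(-((t - 1.0e-9) / 4.0e-10) ** 2)    # hypothetical TR curve for one pixel
E = tpsf.sum() * dt                              # continuous-wave component (integral)
mean_t = (t * tpsf).sum() * dt / E               # intensity-weighted mean time of flight
print(f"E = {E:.3e}, <t> = {mean_t:.3e} s")
```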
RESULTS: Compared to the LUT approach, utilizing the full TR, E, and ⟨t⟩ datasets yields high-resolution OPs reconstruction results. Among the three datasets employed, full TR demonstrates the best accuracy.
CONCLUSIONS: Utilizing the data obtained from SFD and TR measurement techniques allows for achieving high-resolution separation reconstruction of μa and μs' at different depths within 5 mm.
PMID:38476220 | PMC:PMC10929733 | DOI:10.1117/1.JBO.29.3.036002
Deep Learning Image Reconstruction for Transcatheter Aortic Valve Implantation Planning: Image Quality, Diagnostic Performance, Contrast Volume, and Radiation Dose Assessment
Acad Radiol. 2024 Mar 11:S1076-6332(24)00096-5. doi: 10.1016/j.acra.2024.02.026. Online ahead of print.
ABSTRACT
RATIONALE AND OBJECTIVES: To assess image quality, contrast volume and radiation dose reduction potential, and diagnostic performance with the use of high-strength deep learning image reconstruction (DLIR-H) in transcatheter aortic valve implantation (TAVI) planning CT.
METHODS: We prospectively enrolled 128 patients referred for TAVI planning CT. Patients were randomly divided into two groups: a DLIR-H group (n = 64) and a conventional group (n = 64). The DLIR-H group was scanned with a tube voltage of 80 kVp and a body weight-dependent contrast injection rate of 28 mgI/kg/s, with images reconstructed using DLIR-H; the conventional group was scanned with 100 kVp and a contrast injection rate of 40 mgI/kg/s, with images reconstructed using adaptive statistical iterative reconstruction-V at 50% (ASIR-V 50%). Radiation dose, contrast volume, contrast injection rate, and image quality were compared between the two groups. The diagnostic performance of TAVI planning CT for coronary stenosis in 115 patients was calculated using invasive coronary angiography as the gold standard.
RESULTS: The DLIR-H group showed significantly reduced radiation dose (4.94 ± 0.39 mSv vs. 7.93 ± 1.20 mSv, p < 0.001), contrast dose (45.28 ± 5.38 mL vs. 63.26 ± 9.88 mL, p < 0.001), and contrast injection rate (3.1 ± 0.31 mL/s vs. 4.9 ± 0.2 mL/s, p < 0.001) compared to the conventional group. Images in the DLIR-H group had significantly higher SNR and CNR (all p < 0.001). On a per-patient basis, TAVI planning CT in the DLIR-H group provided 100% sensitivity, 92.1% specificity, 100% negative predictive value (NPV), and 84.2% positive predictive value for the detection of >50% stenosis. In the conventional group, the corresponding results were 94.7%, 95.3%, 97.6%, and 90.0%, respectively.
CONCLUSION: DLIR-H in TAVI planning CT provides improved image quality with reduced radiation and contrast doses, and enables satisfactory diagnostic performance for coronary artery stenosis.
PMID:38472024 | DOI:10.1016/j.acra.2024.02.026
Fully Automated Identification of Lymph Node Metastases and Lymphovascular Invasion in Endometrial Cancer From Multi-Parametric MRI by Deep Learning
J Magn Reson Imaging. 2024 Mar 12. doi: 10.1002/jmri.29344. Online ahead of print.
ABSTRACT
BACKGROUND: Early and accurate identification of lymph node metastasis (LNM) and lymphovascular space invasion (LVSI) in endometrial cancer (EC) patients is important for treatment design but difficult on multi-parametric MRI (mpMRI) images.
PURPOSE: To develop a deep learning (DL) model to simultaneously identify LNM and LVSI of EC from mpMRI images.
STUDY TYPE: Retrospective.
POPULATION: Six hundred twenty-one patients with histologically proven EC from two institutions, including 111 LNM-positive and 168 LVSI-positive, divided into training, internal, and external test cohorts of 398, 169, and 54 patients, respectively.
FIELD STRENGTH/SEQUENCE: T2-weighted imaging (T2WI), contrast-enhanced T1WI (CE-T1WI), and diffusion-weighted imaging (DWI) were scanned with turbo spin-echo, gradient-echo, and two-dimensional echo-planar sequences, using either a 1.5 T or 3 T system.
ASSESSMENT: EC lesions were manually delineated on T2WI by two radiologists and used to train an nnU-Net model for automatic segmentation. A multi-task DL model was developed to simultaneously identify LNM- and LVSI-positive status, using the segmented EC lesion regions and the T2WI, CE-T1WI, and DWI images as inputs. The performance of the model for LNM-positive diagnosis was compared with that of three radiologists in the external test cohort.
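A hedged sketch of a shared-encoder, two-head multi-task classifier of the kind described (a simplified 2D stand-in; the paper's network, inputs, and losses differ):

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                  # shared mpMRI feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lnm_head = nn.Linear(64, 1)               # LNM-positive logit
        self.lvsi_head = nn.Linear(64, 1)              # LVSI-positive logit

    def forward(self, x):
        z = self.encoder(x)
        return self.lnm_head(z), self.lvsi_head(z)

# Assumption for illustration: T2WI / CE-T1WI / DWI stacked as input channels.
x = torch.randn(4, 3, 128, 128)
lnm_logit, lvsi_logit = MultiTaskNet()(x)
bce = nn.functional.binary_cross_entropy_with_logits
loss = (bce(lnm_logit, torch.rand(4, 1).round())       # joint two-task objective
        + bce(lvsi_logit, torch.rand(4, 1).round()))
print(loss.item())
```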
STATISTICAL TESTS: Dice similarity coefficient (DSC) was used to evaluate segmentation results. Receiver Operating Characteristic (ROC) analysis was used to assess the performance of LNM and LVSI status identification. P value <0.05 was considered significant.
RESULTS: The EC lesion segmentation model achieved mean DSC values of 0.700 ± 0.25 and 0.693 ± 0.21 in the internal and external test cohorts, respectively. For LNM-positive/LVSI-positive identification, the proposed model achieved AUC values of 0.895/0.848, 0.806/0.795, and 0.804/0.728 in the training, internal, and external test cohorts, respectively, better than those of the three radiologists (AUC = 0.770/0.648/0.674).
DATA CONCLUSION: The proposed model has potential to help clinicians to identify LNM and LVSI status of EC patients and improve treatment planning.
EVIDENCE LEVEL: 3
TECHNICAL EFFICACY: Stage 2.
PMID:38471960 | DOI:10.1002/jmri.29344
An accessible deep learning tool for voxel-wise classification of brain malignancies from perfusion MRI
Cell Rep Med. 2024 Mar 5:101464. doi: 10.1016/j.xcrm.2024.101464. Online ahead of print.
ABSTRACT
Noninvasive differential diagnosis of brain tumors is currently based on the assessment of magnetic resonance imaging (MRI) coupled with dynamic susceptibility contrast (DSC). However, a definitive diagnosis often requires neurosurgical interventions that compromise patients' quality of life. We apply deep learning to DSC images from histology-confirmed patients with glioblastoma, metastasis, or lymphoma. The convolutional neural network trained on ∼50,000 voxels from 40 patients provides intratumor probability maps that yield clinical-grade diagnosis. Performance is tested in 400 additional cases and an external validation cohort of 128 patients. The tool reaches a three-way accuracy of 0.78, superior to the conventional MRI metrics of cerebral blood volume (0.55) and percentage of signal recovery (0.59), showing high value as a diagnostic support tool. Our open-access software, Diagnosis In Susceptibility Contrast Enhancing Regions for Neuro-oncology (DISCERN), demonstrates its potential in aiding medical decisions for brain tumor diagnosis using standard-of-care MRI.
PMID:38471504 | DOI:10.1016/j.xcrm.2024.101464
Synthetic PET from CT improves diagnosis and prognosis for lung cancer: Proof of concept
Cell Rep Med. 2024 Mar 4:101463. doi: 10.1016/j.xcrm.2024.101463. Online ahead of print.
ABSTRACT
[18F]Fluorodeoxyglucose positron emission tomography (FDG-PET) and computed tomography (CT) are indispensable components in modern medicine. Although PET can provide additional diagnostic value, it is costly and not universally accessible, particularly in low-income countries. To bridge this gap, we have developed a conditional generative adversarial network pipeline that can produce FDG-PET from diagnostic CT scans based on multi-center multi-modal lung cancer datasets (n = 1,478). Synthetic PET images are validated across imaging, biological, and clinical aspects. Radiologists confirm comparable imaging quality and tumor contrast between synthetic and actual PET scans. Radiogenomics analysis further proves that the dysregulated cancer hallmark pathways of synthetic PET are consistent with actual PET. We also demonstrate the clinical values of synthetic PET in improving lung cancer diagnosis, staging, risk prediction, and prognosis. Taken together, this proof-of-concept study testifies to the feasibility of applying deep learning to obtain high-fidelity PET translated from CT.
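A hedged sketch of a pix2pix-style conditional-GAN generator objective for CT-to-PET translation — an adversarial term plus an L1 fidelity term (a generic formulation with toy networks; the paper's pipeline is more elaborate):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, padding=1))           # toy CT -> PET generator
D = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                  nn.Linear(16, 1))                          # conditional discriminator

ct, pet = torch.randn(2, 1, 128, 128), torch.randn(2, 1, 128, 128)
fake_pet = G(ct)
adv = nn.functional.binary_cross_entropy_with_logits(
    D(torch.cat([ct, fake_pet], 1)), torch.ones(2, 1))      # fool the discriminator
l1 = nn.functional.l1_loss(fake_pet, pet)                   # stay close to real PET
g_loss = adv + 100.0 * l1                                   # pix2pix-style weighting
print(g_loss.item())
```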
PMID:38471502 | DOI:10.1016/j.xcrm.2024.101463
Drop the shortcuts: image augmentation improves fairness and decreases AI detection of race and other demographics from medical images
EBioMedicine. 2024 Mar 11;102:105047. doi: 10.1016/j.ebiom.2024.105047. Online ahead of print.
ABSTRACT
BACKGROUND: It has been shown that AI models can learn race from medical images, leading to algorithmic bias. Our aim in this study was to enhance the fairness of medical image models by eliminating bias related to race, age, and sex. We hypothesise that models may be learning demographics via shortcut learning and combat this using image augmentation.
METHODS: This study included 44,953 patients who identified as Asian, Black, or White (mean age, 60.68 years ±18.21; 23,499 women), for a total of 194,359 chest X-rays (CXRs), from the MIMIC-CXR database. For external validation, CheXpert images from 45,095 patients (mean age, 63.10 years ±18.14; 20,437 women), totalling 134,300 CXRs, were used. We also collected 1195 3D brain magnetic resonance imaging (MRI) scans from the ADNI database, from 273 participants (mean age, 76.97 years ±14.22; 142 women). DL models were trained on either non-augmented or augmented images and assessed using disparity metrics. The features learned by the models were analysed using task transfer experiments and model visualisation techniques.
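A minimal sketch of a group error-rate disparity metric, here taken as the gap between the worst and best group error rates (one common definition; the paper's exact metric may differ):

```python
import numpy as np

def error_rate_disparity(y_true, y_pred, groups):
    """Per-group error rates and the worst-minus-best gap."""
    rates = {g: np.mean(y_true[groups == g] != y_pred[groups == g])
             for g in np.unique(groups)}
    return max(rates.values()) - min(rates.values()), rates

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, 1000)                               # hypothetical labels
y_pred = np.where(rng.random(1000) < 0.85, y_true, 1 - y_true)  # ~15% errors
groups = rng.choice(["Asian", "Black", "White"], 1000)
gap, rates = error_rate_disparity(y_true, y_pred, groups)
print(rates, f"disparity = {gap:.3f}")
```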
FINDINGS: In the detection of radiological findings, training a model using augmented CXR images reduced disparities in error rate among racial groups (-5.45%), age groups (-13.94%), and sex (-22.22%). For AD detection, the model trained with augmented MRI images showed 53.11% and 31.01% reductions in error-rate disparity among age and sex groups, respectively. Image augmentation led to a reduction in the models' ability to identify demographic attributes and resulted in the model trained for clinical purposes incorporating fewer demographic features.
INTERPRETATION: The model trained using the augmented images was less likely to be influenced by demographic information in detecting image labels. These results demonstrate that the proposed augmentation scheme could enhance the fairness of interpretations by DL models when dealing with data from patients with different demographic backgrounds.
FUNDING: National Science and Technology Council (Taiwan), National Institutes of Health.
PMID:38471396 | DOI:10.1016/j.ebiom.2024.105047
Recent developments in denoising medical images using deep learning: An overview of models, techniques, and challenges
Micron. 2024 Mar 2;180:103615. doi: 10.1016/j.micron.2024.103615. Online ahead of print.
ABSTRACT
Medical imaging plays a critical role in diagnosing and treating various medical conditions. However, interpreting medical images can be challenging even for expert clinicians, as they are often degraded by noise and artifacts that can hinder the accurate identification and analysis of diseases, leading to severe consequences such as patient misdiagnosis or mortality. Various types of noise, including Gaussian, Rician, and salt-and-pepper noise, can corrupt the area of interest, limiting the precision and accuracy of algorithms. Denoising algorithms have shown potential for improving the quality of medical images by removing noise and other artifacts that obscure essential information. Deep learning has emerged as a powerful tool for image analysis and has demonstrated promising results in denoising different medical images such as MRIs, CT scans, and PET scans. This review paper provides a comprehensive overview of state-of-the-art deep learning algorithms used for denoising medical images. A total of 120 relevant papers were reviewed, and after screening with specific inclusion and exclusion criteria, 104 papers were selected for analysis. This study aims to provide a thorough understanding for researchers in the field of intelligent denoising by presenting an extensive survey of current techniques and highlighting significant challenges that remain to be addressed. The findings of this review are expected to contribute to the development of intelligent models that enable timely and accurate diagnoses of medical disorders. It was found that 40% of the researchers used models based on deep convolutional neural networks to denoise the images, followed by encoder-decoder architectures (18%) and other artificial intelligence-based techniques (15%), such as deep image prior. Generative adversarial networks were used by 12%, transformer-based approaches by 13%, and multilayer perceptrons by 2% of the researchers. Moreover, Gaussian noise was present in 35% of the images, followed by speckle noise (16%), Poisson noise (14%), artifacts (10%), Rician noise (7%), salt-and-pepper noise (6%), impulse noise (3%), and other types of noise (9%). While progress in developing novel models for denoising medical images is evident, significant work remains to be done in creating standardized denoising models that perform well across a wide spectrum of medical images. Overall, this review highlights the importance of denoising medical images and provides a comprehensive understanding of the current state-of-the-art deep learning algorithms in this field.
PMID:38471391 | DOI:10.1016/j.micron.2024.103615
A novel approach for intelligent diagnosis and grading of diabetic retinopathy
Comput Biol Med. 2024 Mar 6;172:108246. doi: 10.1016/j.compbiomed.2024.108246. Online ahead of print.
ABSTRACT
Diabetic retinopathy (DR) is a severe ocular complication of diabetes that can lead to vision damage and even blindness. Currently, traditional deep convolutional neural networks (CNNs) used for DR grading tasks face two primary challenges: (1) insensitivity to minority classes due to imbalanced data distribution, and (2) neglect of the relationship between the left and right eyes, since the fundus image of only one eye is used for training without differentiating between them. To tackle these challenges, we propose the DRGCNN (DR Grading CNN) model. To address imbalanced data distribution, our model adopts a more balanced strategy by allocating an equal number of channels to feature maps representing the various DR categories. Furthermore, we introduce a CAM-EfficientNetV2-M encoder dedicated to encoding input retinal fundus images for feature vector generation. Our encoder has 52.88 M parameters, fewer than RegNet_y_16gf (80.57 M) and EfficientNetB7 (63.79 M), yet it achieves a higher kappa value. Additionally, in order to take advantage of the binocular relationship, we input fundus retinal images from both eyes of the patient into the network for feature fusion during the training phase. We achieved a kappa value of 86.62% on the EyePACS dataset and 86.16% on the Messidor-2 dataset. Experimental results on these representative diabetic retinopathy datasets demonstrate the exceptional performance of our DRGCNN model, establishing it as a highly competitive intelligent classification model in the field of DR. The code is available for use at https://github.com/Fat-Hai/DRGCNN.
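A hedged sketch of the binocular feature fusion described above — encode both eyes' fundus images and concatenate the feature vectors before grading (toy encoder; the paper uses a CAM-EfficientNetV2-M):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(                       # toy stand-in for the fundus encoder
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
classifier = nn.Linear(32 * 2, 5)              # 5 DR grades from fused left+right features

left = torch.randn(2, 3, 224, 224)             # left-eye fundus images
right = torch.randn(2, 3, 224, 224)            # right-eye fundus images
fused = torch.cat([encoder(left), encoder(right)], dim=1)   # binocular feature fusion
print(classifier(fused).shape)                 # torch.Size([2, 5])
```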
PMID:38471350 | DOI:10.1016/j.compbiomed.2024.108246
TCMSSD: A comprehensive database focused on syndrome standardization
Phytomedicine. 2024 Feb 27;128:155486. doi: 10.1016/j.phymed.2024.155486. Online ahead of print.
ABSTRACT
BACKGROUND: Quantitative and standardized research on syndrome differentiation has always been at the forefront of modernizing Traditional Chinese Medicine (TCM) theory. However, the majority of existing databases primarily concentrate on the network pharmacology of herbal prescriptions, and few databases are specifically dedicated to TCM syndrome differentiation.
PURPOSE: In response to this gap, we have developed the Traditional Chinese Medical Syndrome Standardization Database (TCMSSD, http://tcmssd.ratcm.cn).
METHODS: TCMSSD is a comprehensive database that gathers data from various sources, including TCM literature such as TCM Syndrome Studies (Zhong Yi Zheng Hou Xue) and TCM Internal Medicine (Zhong Yi Nei Ke Xue), as well as public databases such as TCMID and ETCM. In our study, we employ a deep learning approach to construct the knowledge graph and utilize the BM25 algorithm for syndrome prediction.
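A minimal sketch of BM25 ranking for syndrome prediction: score each syndrome's symptom "document" against a patient's symptom query (toy corpus and vocabulary; the database's actual term weighting may differ):

```python
import math

def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """Standard Okapi BM25 score of one document against a query."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    score = 0.0
    for term in query:
        tf = doc.count(term)                               # term frequency in doc
        df = sum(term in d for d in corpus)                # document frequency
        idf = math.log((len(corpus) - df + 0.5) / (df + 0.5) + 1)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [["fatigue", "pale-tongue", "weak-pulse"],        # hypothetical syndrome A
          ["fever", "thirst", "rapid-pulse"]]              # hypothetical syndrome B
query = ["fatigue", "weak-pulse"]                          # patient's symptoms
print([bm25_score(query, d, corpus) for d in corpus])      # first doc scores higher
```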
RESULTS: The TCMSSD integrates the essence of TCM with the modern medical system, providing a comprehensive collection of information related to TCM. It includes 624 syndromes, 133,518 prescriptions, 8,073 diseases (including 1,843 TCM-specific diseases), 8,259 Chinese herbal medicines, 43,413 ingredients, 17,602 targets, and 8,182 drugs. By analyzing input data and comparing it with the patterns and characteristics recorded in the database, the syndrome prediction tool generates predictions based on established correlations and patterns.
CONCLUSION: The TCMSSD fills the gap in existing databases by providing a comprehensive resource for quantitative and standardized research on TCM syndrome differentiation and lays the foundation for research on the biological basis of syndromes.
PMID:38471316 | DOI:10.1016/j.phymed.2024.155486
STCGRU: A hybrid model based on CNN and BiGRU for mild cognitive impairment diagnosis
Comput Methods Programs Biomed. 2024 Mar 8;248:108123. doi: 10.1016/j.cmpb.2024.108123. Online ahead of print.
ABSTRACT
BACKGROUND AND OBJECTIVE: Early diagnosis of mild cognitive impairment (MCI) is one of the essential measures to prevent its further development into Alzheimer's disease (AD). In this paper, we propose a hybrid deep learning model for early diagnosis of MCI, called spatio-temporal convolutional gated recurrent unit network (STCGRU).
METHODS: The STCGRU comprises three bespoke convolutional neural network (CNN) modules and a bi-directional gated recurrent unit (BiGRU) module, which together effectively extract the spatial and temporal features of EEG and obtain excellent diagnostic results. We use a publicly available EEG dataset that has not undergone pre-processing to verify the robustness and accuracy of the model. Ablation experiments on STCGRU are conducted to showcase the performance improvement contributed by each module.
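A hedged sketch of a CNN-plus-BiGRU pipeline of the STCGRU kind — convolution over EEG channels for spatial features, then a bidirectional GRU over time (simplified to one CNN module; the paper uses three bespoke modules):

```python
import torch
import torch.nn as nn

class CnnBiGru(nn.Module):
    def __init__(self, eeg_channels=19, hidden=64, classes=2):
        super().__init__()
        self.spatial = nn.Conv1d(eeg_channels, 32, kernel_size=7, padding=3)
        self.gru = nn.GRU(32, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, classes)    # MCI vs. healthy control

    def forward(self, x):                   # x: (batch, eeg_channels, time)
        z = torch.relu(self.spatial(x))     # spatial features across electrodes
        z = z.transpose(1, 2)               # -> (batch, time, features)
        _, h = self.gru(z)                  # temporal features, both directions
        h = torch.cat([h[0], h[1]], dim=1)  # concatenate forward/backward states
        return self.head(h)

eeg = torch.randn(8, 19, 512)               # hypothetical raw EEG epochs
print(CnnBiGru()(eeg).shape)                 # torch.Size([8, 2])
```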
RESULTS: Compared with other state-of-the-art approaches using the same publicly available EEG dataset, the results show that STCGRU is more suitable for early diagnosis of MCI. After 10-fold cross-validation, the average classification accuracy of the hybrid model reached 99.95%, while the average kappa value reached 0.9989.
CONCLUSIONS: The experimental results show that the hybrid model proposed in this paper can directly extract compelling spatio-temporal features from the raw EEG data for classification. The STCGRU allows for accurate diagnosis of patients with MCI and has a high practical value.
PMID:38471292 | DOI:10.1016/j.cmpb.2024.108123
Tolerant Self-Distillation for image classification
Neural Netw. 2024 Feb 28;174:106215. doi: 10.1016/j.neunet.2024.106215. Online ahead of print.
ABSTRACT
Deep neural networks tend to suffer from overfitting when the training data are insufficient. In this paper, we introduce two metrics derived from the intra-class distributions of correctly predicted and incorrectly predicted samples to provide a new perspective on the overfitting issue. Based on these metrics, we propose a knowledge distillation approach, named Tolerant Self-Distillation (TSD), that alleviates overfitting without pretraining a teacher model in advance. It introduces an online-updated memory that selectively stores the class predictions of samples from past iterations, making it possible to distill knowledge across iterations. Specifically, the class predictions stored in the memory bank serve as soft labels for supervising samples from the same class in the current iteration in a reverse way: correctly predicted samples are supervised with the incorrect predictions, while incorrectly predicted samples are supervised with the correct predictions. Consequently, the premature convergence caused by over-confident samples is mitigated, which helps the model converge to a better local optimum. Extensive experimental results on several image classification benchmarks, including small-scale, large-scale, and fine-grained datasets, demonstrate the superiority of the proposed TSD.
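A schematic sketch of the TSD idea under stated assumptions: per-class memory banks of past correct and incorrect predictions supply soft labels in the reverse direction (illustrative only; the paper's memory update and loss details differ):

```python
import torch
import torch.nn.functional as F

num_classes = 10
# Per-class memories of past predictions, initialized uniformly (assumption).
mem_correct = torch.full((num_classes, num_classes), 1.0 / num_classes)
mem_incorrect = torch.full((num_classes, num_classes), 1.0 / num_classes)

def tsd_targets(probs, labels):
    """Pick soft labels from the *opposite* memory bank (reverse supervision)."""
    pred = probs.argmax(1)
    correct = pred == labels
    return torch.where(correct[:, None],
                       mem_incorrect[labels],   # correct samples <- incorrect memory
                       mem_correct[labels])     # incorrect samples <- correct memory

probs = torch.softmax(torch.randn(4, num_classes), dim=1)
labels = torch.randint(0, num_classes, (4,))
soft = tsd_targets(probs, labels)
kd_loss = F.kl_div(probs.log(), soft, reduction="batchmean")  # distillation term
# (A real implementation would also write current predictions into the banks.)
print(kd_loss.item())
```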
PMID:38471261 | DOI:10.1016/j.neunet.2024.106215
Predicting Obstructive Sleep Apnea Based on Computed Tomography Scan Using Deep Learning Models
Am J Respir Crit Care Med. 2024 Mar 12. doi: 10.1164/rccm.202304-0767OC. Online ahead of print.
ABSTRACT
RATIONALE: The incidence of clinically undiagnosed obstructive sleep apnea (OSA) is high among the general population due to limited access to polysomnography. Computed tomography (CT) of craniofacial regions obtained for other purposes can be beneficial in predicting OSA and its severity.
OBJECTIVES: To predict OSA and its severity based on paranasal CT using a 3-dimensional deep learning algorithm.
METHODS: One internal dataset (n=798) and two external datasets (n=135 and n=85) were used in this study. From the internal dataset, 92 normal, 159 mild, 201 moderate, and 346 severe OSA participants were enrolled to derive the deep learning model. A multimodal deep learning model was built by connecting a 3-dimensional convolutional neural network (CNN)-based part treating unstructured data (CT images) with a multi-layer perceptron (MLP)-based part treating structured data (age, sex, and body mass index) to predict OSA and its severity.
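A hedged sketch of such a multimodal network — a 3D CNN branch for the CT volume joined to an MLP branch for age, sex, and BMI (a simplified stand-in for AirwayNet-MM-H; shapes are assumptions):

```python
import torch
import torch.nn as nn

class MultimodalOSA(nn.Module):
    def __init__(self, classes=4):
        super().__init__()
        self.cnn = nn.Sequential(                  # unstructured branch: CT volume
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.mlp = nn.Sequential(nn.Linear(3, 16), nn.ReLU())  # age, sex, BMI
        self.head = nn.Linear(32 + 16, classes)    # normal/mild/moderate/severe

    def forward(self, ct, tab):
        return self.head(torch.cat([self.cnn(ct), self.mlp(tab)], dim=1))

ct = torch.randn(2, 1, 64, 64, 64)                 # paranasal CT patch (assumption)
tab = torch.randn(2, 3)                            # structured clinical features
print(MultimodalOSA()(ct, tab).shape)              # torch.Size([2, 4])
```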
MEASUREMENTS AND MAIN RESULTS: In the four-class classification for predicting the severity of OSA, the AirwayNet-MM-H model (the multimodal model with the airway-highlighting preprocessing algorithm) showed an average accuracy of 87.6% (95% confidence interval [CI] 86.8-88.6) in the internal dataset and 84.0% (95% CI 83.0-85.1) and 86.3% (95% CI 85.3-87.3) in the two external datasets, respectively. In the two-class classification for predicting significant OSA (moderate to severe OSA), the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, and F1 score were 0.910 (95% CI 0.899-0.922), 91.0% (95% CI 90.1-91.9), 89.9% (95% CI 88.8-90.9), 93.5% (95% CI 92.7-94.3), and 93.2% (95% CI 92.5-93.9), respectively, in the internal dataset. Furthermore, the diagnostic performance of the AirwayNet-MM-H model outperformed that of the other six state-of-the-art deep learning models in terms of accuracy for both four- and two-class classifications and AUROC for two-class classification (p<0.001).
CONCLUSIONS: A novel deep learning approach, combining a multimodal model with an airway-highlighting preprocessing algorithm for CT images obtained for other purposes, can provide precise outcomes for OSA diagnosis.
PMID:38471111 | DOI:10.1164/rccm.202304-0767OC
The training process of many deep networks explores the same low-dimensional manifold
Proc Natl Acad Sci U S A. 2024 Mar 19;121(12):e2310002121. doi: 10.1073/pnas.2310002121. Epub 2024 Mar 12.
ABSTRACT
We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures and sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations, lie on the same manifold in the prediction space. We study the details of this manifold and find that networks with different architectures follow distinguishable trajectories, but other factors have a minimal influence; larger networks train along a similar manifold as smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.
PMID:38470929 | DOI:10.1073/pnas.2310002121