Deep learning
Robust Fine-Grained Visual Recognition with Neighbor-Attention Label Correction
IEEE Trans Image Process. 2024 Mar 28;PP. doi: 10.1109/TIP.2024.3378461. Online ahead of print.
ABSTRACT
Existing deep learning methods for fine-grained visual recognition often rely on large-scale, well-annotated training data. Obtaining fine-grained annotations in the wild typically requires concentration and expertise, such as fine category annotation for species recognition, instance annotation for person re-identification (re-id), and dense annotation for segmentation, which inevitably leads to label noise. This paper aims to tackle label noise in deep model training for fine-grained visual recognition. We propose a Neighbor-Attention Label Correction (NALC) model to correct labels during the training stage. NALC samples a training batch and a validation batch from the training set. It then leverages a meta-learning framework to correct labels in the training batch based on the validation batch. To enhance optimization efficiency, we introduce a novel nested optimization algorithm for the meta-learning framework. The proposed training procedure consistently improves label accuracy in the training batch, consequently enhancing the learned image representation. Experimental results demonstrate that our method significantly increases label accuracy from 70% to over 98% and outperforms recent approaches by up to 13.4% in mean Average Precision (mAP) on various fine-grained image retrieval (FGIR) tasks, including instance retrieval on CUB200 and person re-id on Market1501. We also demonstrate the efficacy of NALC on noisy semantic segmentation datasets generated from Cityscapes, where it achieves a significant 7.8% improvement in mIoU score. NALC also exhibits robustness to different types of noise, including simulated noise such as Asymmetric, Pair-Flip, and Pattern noise, as well as practical noisy labels generated by tracklets and clustering.
PMID:38546993 | DOI:10.1109/TIP.2024.3378461
eVAE: Evolutionary Variational Autoencoder
IEEE Trans Neural Netw Learn Syst. 2024 Mar 28;PP. doi: 10.1109/TNNLS.2024.3359275. Online ahead of print.
ABSTRACT
Variational autoencoders (VAEs) are challenged by the imbalance between representation inference and task fitting caused by surrogate loss. To address this issue, existing methods adjust their balance by directly tuning their coefficients. However, these methods suffer from a tradeoff uncertainty, i.e., nondynamic regulation over iterations and inflexible hyperparameters for learning tasks. Accordingly, we make the first attempt to introduce an evolutionary VAE (eVAE), building on the variational information bottleneck (VIB) theory and integrative evolutionary neural learning. eVAE integrates a variational genetic algorithm (VGA) into VAE with variational evolutionary operators, including variational mutation (V-mutation), crossover, and evolution. Its training mechanism synergistically and dynamically addresses and updates the learning tradeoff uncertainty in the evidence lower bound (ELBO) without additional constraints and hyperparameter tuning. Furthermore, eVAE presents an evolutionary paradigm to tune critical factors of VAEs and addresses the premature convergence and random search problem in integrating evolutionary optimization into deep learning. Experiments show that eVAE addresses the KL-vanishing problem for text generation with low reconstruction loss, generates all the disentangled factors with sharp images, and improves image generation quality. eVAE achieves better disentanglement, generation performance, and generation-inference balance than its competitors. Code available at: https://github.com/amasawa/eVAE.
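As a hedged illustration of the evolutionary idea (not the authors' exact variational genetic algorithm), the Python sketch below evolves an ELBO tradeoff coefficient with mutation, crossover, and selection; the fitness function is a hypothetical stand-in for a VAE validation run.

```python
# Illustrative sketch (not the authors' VGA): evolving the ELBO tradeoff
# coefficient beta via mutation and crossover. fitness() is a hypothetical
# surrogate for training/evaluating a VAE with a given beta and returning
# a validation score (lower = better).
import random

def fitness(beta):
    # Stand-in for one VAE validation run; 0.7 is an arbitrary "optimum".
    return (beta - 0.7) ** 2 + random.gauss(0, 0.01)

def v_mutation(beta, sigma=0.1):
    # Mutation: perturb beta with Gaussian noise, keep it positive.
    return max(1e-3, beta + random.gauss(0, sigma))

def crossover(b1, b2):
    # Blend two parent coefficients.
    w = random.random()
    return w * b1 + (1 - w) * b2

population = [random.uniform(0.1, 2.0) for _ in range(8)]
for generation in range(20):
    scored = sorted(population, key=fitness)
    parents = scored[:4]                      # selection: keep the best half
    children = [v_mutation(crossover(random.choice(parents),
                                     random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children           # evolution step

print(f"best beta ~ {min(population, key=fitness):.3f}")
```

In the paper's setting the coefficient would be updated dynamically during VAE training rather than by repeated full runs; the loop above only conveys the operator structure.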
PMID:38546992 | DOI:10.1109/TNNLS.2024.3359275
A prior-information-based automatic segmentation method for the clinical target volume in adaptive radiotherapy of cervical cancer
J Appl Clin Med Phys. 2024 Mar 28:e14350. doi: 10.1002/acm2.14350. Online ahead of print.
ABSTRACT
OBJECTIVE: Adaptive planning to accommodate anatomic changes during treatment often requires repeated segmentation. In this study, prior patient-specific data were integrated into a registration-guided multi-channel multi-path (Rg-MCMP) segmentation framework to improve the accuracy of repeated clinical target volume (CTV) segmentation.
METHODS: This study was based on CT image datasets for a total of 90 cervical cancer patients who received two courses of radiotherapy. A total of 15 patients were selected randomly as the test set. In the Rg-MCMP segmentation framework, the first-course CT images (CT1) were registered to the second-course CT images (CT2) to yield aligned CT images (aCT1), and the CTV in the first course (CTV1) was propagated to yield aligned CTV contours (aCTV1). Then, aCT1, aCTV1, and CT2 were combined as the inputs to a 3D U-Net consisting of a channel-based multi-path feature extraction network. The performance of the Rg-MCMP segmentation framework was evaluated and compared with the single-channel single-path (SCSP) model, the standalone registration methods, and the registration-guided multi-channel single-path (Rg-MCSP) model. The Dice similarity coefficient (DSC), 95% Hausdorff distance (HD95), and average surface distance (ASD) were used as the metrics.
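A minimal sketch of the multi-channel input assembly described above, with hypothetical volume dimensions; the registration step itself is omitted.

```python
# Sketch (assumed shapes) of combining aCT1, aCTV1, and CT2 as channels
# for a 3D segmentation network, as described in the methods.
import torch

D, H, W = 64, 128, 128                          # hypothetical volume size
aCT1  = torch.randn(D, H, W)                    # registered first-course CT
aCTV1 = torch.randint(0, 2, (D, H, W)).float()  # propagated CTV mask
CT2   = torch.randn(D, H, W)                    # second-course CT

# Channel-wise stack: (batch, channels, depth, height, width)
x = torch.stack([aCT1, aCTV1, CT2], dim=0).unsqueeze(0)
print(x.shape)  # torch.Size([1, 3, 64, 128, 128]) -> input to a 3D U-Net
```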
RESULTS: The average DSC of the CTV for the deformable image registration (DIR)-based Rg-MCMP model was 0.892, greater than that of standalone DIR (0.856), the SCSP model (0.837), and the DIR-based Rg-MCSP model (0.877), corresponding to improvements of 4.2%, 6.6%, and 1.7%, respectively. Similarly, the rigid-body (RB) registration-based Rg-MCMP model yielded an average DSC of 0.875, exceeding standalone RB registration (0.787), the SCSP model (0.837), and the RB-based Rg-MCSP model (0.848), corresponding to improvements of 11.2%, 4.5%, and 3.2%, respectively. These improvements in DSC were statistically significant (p < 0.05).
CONCLUSION: The proposed Rg-MCMP framework achieved excellent accuracy in CTV segmentation as part of the adaptive radiotherapy workflow.
PMID:38546277 | DOI:10.1002/acm2.14350
Optical Coherence Tomography versus Optic Disc Photo Assessment in Glaucoma Screening
J Glaucoma. 2024 Mar 28. doi: 10.1097/IJG.0000000000002392. Online ahead of print.
ABSTRACT
PURPOSE: This review article examines the strengths and limitations of optical coherence tomography (OCT) and optic disc photography in glaucoma screening.
METHODS/RESULTS: A comprehensive literature review was conducted, focusing on the accuracy, feasibility, cost-effectiveness, and technological advancements in OCT and optic disc photography for glaucoma screening. OCT is highly accurate and reproducible but faces limitations due to its cost and less portable nature, making widespread screening challenging. On the other hand, optic disc photos are more accessible and cost-effective but are hindered by subjective interpretation and inconsistent grading reliability. A critical challenge in glaucoma screening is achieving a high positive predictive value, particularly given the low prevalence of the disease, which can lead to a significant number of false positives. The advent of artificial intelligence and deep learning models shows potential in improving the diagnostic accuracy of optic disc photos by automating the detection of glaucomatous optic neuropathy (GON) and reducing subjectivity. However, the effectiveness of these AI models hinges on the quality of training data. Using subjective gradings as training data will carry the limitations of human assessment into the AI system, leading to potential inaccuracies. Conversely, training AI models using objective data from OCT, such as retinal nerve fiber layer thickness, may offer a promising direction.
CONCLUSION: Both OCT and optic disc photography present valuable but distinct capabilities for glaucoma screening. An approach integrating AI technology might be key in optimizing these methods for effective, large-scale screening programs.
PMID:38546240 | DOI:10.1097/IJG.0000000000002392
Artificial intelligence- and computer-assisted navigation for shoulder surgery
J Orthop Surg (Hong Kong). 2024 Jan-Apr;32(1):10225536241243166. doi: 10.1177/10225536241243166.
ABSTRACT
Background: Over the last few decades, shoulder surgery has undergone rapid advancements, with ongoing exploration and the development of innovative technological approaches. In the coming years, technologies such as robot-assisted surgery, virtual reality, artificial intelligence, patient-specific instrumentation, and various innovative perioperative and preoperative planning tools will continue to fuel a revolution in the medical field, thereby pushing it toward new frontiers and unprecedented advancements. In relation to this, shoulder surgery will experience significant breakthroughs. Main body: Recent advancements and technological innovations in the field were comprehensively analyzed. We aimed to provide a detailed overview of the current landscape, emphasizing the roles of these technologies. Computer-assisted surgery utilizing robotic- or image-guided technologies is widely adopted in various orthopedic specialties. The most advanced components of computer-assisted surgery are navigation and robotic systems, with functions and applications that are continuously expanding. Surgical navigation requires a visual system that presents real-time positional data on surgical instruments or implants in relation to the target bone, displayed on a computer monitor. There are three primary categories of surgical planning that utilize navigation systems. The first category involves volumetric images, such as ultrasound echograms, computed tomography, and magnetic resonance images. The second is based on intraoperative fluoroscopic images, and the third incorporates kinetic information about joints or morphometric data about the target bones acquired intraoperatively. Conclusion: The rapid integration of artificial intelligence and deep learning into the medical domain has a significant and transformative influence. Numerous studies utilizing deep learning-based diagnostics in orthopedics have demonstrated remarkable achievements and performance.
PMID:38546214 | DOI:10.1177/10225536241243166
Prediction and Diagnosis of Breast Cancer Using Machine and Modern Deep Learning Models
Asian Pac J Cancer Prev. 2024 Mar 1;25(3):1077-1085. doi: 10.31557/APJCP.2024.25.3.1077.
ABSTRACT
Background & Objective: Carcinoma of the breast is one of the major causes of death in women, especially in developing countries. Timely prediction, detection, diagnosis, and efficient therapies have become critical to reducing death rates. The increased use of artificial intelligence, machine learning, and deep learning techniques has created more accurate and trustworthy models for predicting and detecting breast cancer. This study aims to examine the effectiveness of several machine learning and modern deep learning models for the prediction and diagnosis of breast cancer.
METHODS: This research compares traditional machine learning classification methods to innovative techniques that use deep learning models. Established classification models such as k-Nearest Neighbors (kNN), Gradient Boosting, Support Vector Machine (SVM), Neural Network, CN2 rule inducer, Naive Bayes, Stochastic Gradient Descent (SGD), and Tree, as well as deep learning models such as Neural Decision Forest and Multilayer Perceptron (MLP), were used. The investigation, carried out using the Orange and Python tools, evaluated their diagnostic effectiveness in breast cancer detection. The evaluation used UCI's publicly accessible Wisconsin Diagnostic Data Set, enabling transparency and accessibility in the study approach.
RESULT: The mean radius ranges from 6.981 to 28.110, while the mean texture ranges from 9.71 to 39.28 across malignant and benign cases. The Gradient Boosting and CN2 rule inducer classifiers outperform SVM in accuracy and sensitivity, whereas SVM has the lowest accuracy and sensitivity at 88%. The CN2 rule inducer classifier achieves the greatest ROC curve score for the benign and malignant breast cancer datasets, with an AUC of 0.98. The MLP clearly distinguishes the positive and negative classes, with a higher AUC-ROC of 0.9959, an accuracy of 96.49%, a precision of 96.57%, a recall of 96.49%, and an F1-score of 96.50%.
CONCLUSION: Among the most commonly used classifier models, the CN2 rule inducer and Gradient Boosting performed better than the other models. However, the MLP from deep learning produced the greatest overall performance.
PMID:38546090 | DOI:10.31557/APJCP.2024.25.3.1077
Prediction of CRISPR/Cas9 off-target activity using multi-scale convolutional neural network
Sheng Wu Gong Cheng Xue Bao. 2024 Mar 25;40(3):858-876. doi: 10.13345/j.cjb.230382.
ABSTRACT
Clustered regularly interspaced short palindromic repeat/CRISPR-associated protein 9 (CRISPR/Cas9) is a new generation of gene editing technology, which relies on a single guide RNA (sgRNA) to identify specific gene sites and guide the Cas9 nuclease to edit specific locations in the genome. However, the off-target effect of this technology hampers its development. In recent years, several deep learning models have been developed for predicting CRISPR/Cas9 off-target activity, contributing to more efficient and safe gene editing and gene therapy. However, the prediction accuracy remains to be improved. In this paper, we propose a multi-scale convolutional neural network-based method, designated CnnCRISPR, for CRISPR/Cas9 off-target prediction. First, we used the one-hot encoding method to encode the sgRNA-DNA sequence pair, followed by a bitwise OR operation on the two binary matrices. Second, the encoded sequence was fed into the Inception-based network for training and evaluation. Third, the well-trained model was applied to evaluate the off-target situation of the sgRNA-DNA sequence pair. Experiments on public datasets showed that CnnCRISPR outperforms existing deep learning-based methods, providing an effective and feasible approach to addressing the off-target problem.
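The encoding step lends itself to a short sketch. The following is an illustrative reconstruction, not the paper's released code; the A/C/G/T column order and the 20-nt example sequences are assumptions.

```python
# Illustrative sgRNA-DNA pair encoding as described: one-hot encode each
# sequence, then take the element-wise OR of the two binary matrices.
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed column order

def one_hot(seq):
    m = np.zeros((len(seq), 4), dtype=np.uint8)
    for i, b in enumerate(seq):
        m[i, BASES[b]] = 1
    return m

sgRNA  = "GAGTCCGAGCAGAAGAAGAA"   # hypothetical 20-nt guide
target = "GAGTCCTAGCAGAAGAAGAA"   # hypothetical off-target site (1 mismatch)

encoded = np.bitwise_or(one_hot(sgRNA), one_hot(target))
# Matched positions keep a single hot bit; mismatched positions light up
# two bits, which is the signal the downstream CNN can exploit.
print(encoded.shape)  # (20, 4), fed to the Inception-style network
```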
PMID:38545983 | DOI:10.13345/j.cjb.230382
Leveraging deep neural networks to uncover unprecedented levels of precision in the diagnosis of hair and scalp disorders
Skin Res Technol. 2024 Apr;30(4):e13660. doi: 10.1111/srt.13660.
ABSTRACT
BACKGROUND: Hair and scalp disorders present a significant challenge in dermatology due to their clinical diversity and overlapping symptoms, often leading to misdiagnoses. Traditional diagnostic methods rely heavily on clinical expertise and are limited by subjectivity and accessibility, necessitating more advanced and accessible diagnostic tools. Artificial intelligence (AI) and deep learning offer a promising solution for more accurate and efficient diagnosis.
METHODS: The research employs a modified Xception model incorporating ReLU activation, dense layers, global average pooling, regularization and dropout layers. This deep learning approach is evaluated against existing models like VGG19, Inception, ResNet, and DenseNet for its efficacy in accurately diagnosing various hair and scalp disorders.
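A sketch of this kind of modified Xception head in Keras, with hypothetical layer sizes, class count, and fine-tuning strategy (none of which are specified in the abstract):

```python
# Minimal sketch (assumed sizes) of a modified Xception classifier head:
# global average pooling, a regularized dense layer with ReLU, dropout,
# and a softmax output on top of a pretrained backbone.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

NUM_CLASSES = 10  # assumed number of hair/scalp disorder categories

base = tf.keras.applications.Xception(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
base.trainable = False  # freezing the backbone is an assumption

x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(256, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4))(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = tf.keras.Model(base.input, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```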
RESULTS: The model achieved a 92% accuracy rate, significantly outperforming the comparative models, with accuracies ranging from 50% to 80%. Explainable AI techniques like Gradient-weighted Class Activation Mapping (Grad-CAM) and Saliency Map provided deeper insights into the model's decision-making process.
CONCLUSION: This study emphasizes the potential of AI in dermatology, particularly in accurately diagnosing hair and scalp disorders. The superior accuracy and interpretability of the model represent a significant advancement in dermatological diagnostics, promising more reliable and accessible diagnostic methods.
PMID:38545843 | DOI:10.1111/srt.13660
Construction and validation of artificial intelligence pathomics models for predicting pathological staging in colorectal cancer: Using multimodal data and clinical variables
Cancer Med. 2024 Apr;13(7):e6947. doi: 10.1002/cam4.6947.
ABSTRACT
OBJECTIVE: This retrospective observational study aims to develop and validate artificial intelligence (AI) pathomics models based on pathological Hematoxylin-Eosin (HE) slides and pathological immunohistochemistry (Ki67) slides for predicting the pathological staging of colorectal cancer. The goal is to enable AI-assisted accurate pathological staging, supporting healthcare professionals in making efficient and precise staging assessments.
METHODS: This study included a total of 267 colorectal cancer patients (training cohort: n = 213; testing cohort: n = 54). Logistic regression algorithms were used to construct the models. The HE image features were used to build the HE model, the Ki67 image features were used for the Ki67 model, and the combined model included features from both the HE and Ki67 images, as well as tumor markers (CEA, CA724, CA125, and CA242). The predictive results of the HE model, Ki67 model, and tumor markers were visualized through a nomogram. The models were evaluated using ROC curve analysis, and their clinical value was estimated using decision curve analysis (DCA).
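A minimal sketch of the combined logistic model on simulated data; the 130/130 split of the 260 deep features and all array contents are hypothetical.

```python
# Sketch (simulated data) of the combined model: HE deep features, Ki67
# deep features, and four tumor markers concatenated as predictors of
# pathological stage (I-II vs. III).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 267
he_feats   = rng.normal(size=(n, 130))   # assumed split of the 260 features
ki67_feats = rng.normal(size=(n, 130))
markers    = rng.normal(size=(n, 4))     # CEA, CA724, CA125, CA242
y          = rng.integers(0, 2, size=n)  # 0 = Stage I-II, 1 = Stage III

X = np.hstack([he_feats, ki67_feats, markers])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=54, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("test AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```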
RESULTS: A total of 260 deep learning features were extracted from the HE or Ki67 images. The AUCs of the HE model and the Ki67 model were 0.885 and 0.890 in the training cohort and 0.703 and 0.767 in the testing cohort, respectively. The combined model and nomogram had AUC values of 0.907 and 0.926 in the training cohort and 0.814 and 0.817 in the testing cohort, respectively. In the clinical DCA, the net benefit of the Ki67 model was superior to that of the HE model. The combined model and nomogram showed significantly higher net benefits compared to the individual HE or Ki67 models.
CONCLUSION: The combined model and nomogram, which integrate pathomics multi-modal data and clinical-pathological variables, demonstrated superior performance in distinguishing between Stage I-II and Stage III colorectal cancer. This provides valuable support for clinical decision-making and may improve treatment strategies and patient prognosis. Furthermore, the use of immunohistochemistry (Ki67) slides for pathomics modeling outperformed HE slides alone, offering new insights for future pathomics research.
PMID:38545828 | DOI:10.1002/cam4.6947
Inference of Developmental Hierarchy and Functional States of Exhausted T Cells from Epigenetic Profiles with Deep Learning
J Chem Inf Model. 2024 Mar 28. doi: 10.1021/acs.jcim.4c00261. Online ahead of print.
ABSTRACT
Exhausted T cells are a key component of immune cells that play a crucial role in the immune response against cancer and influence the efficacy of immunotherapy. Accurate assessment and measurement of T-cell exhaustion (TEX) are critical for understanding the heterogeneity of TEX in the tumor microenvironment (TME) and tailoring individualized immunotherapeutic strategies. In this study, we introduced DeepEpiTEX, a novel computational framework based on deep neural networks, for inferring the developmental hierarchy and functional states of exhausted T cells in the TME from epigenetic profiles. DeepEpiTEX was trained using various modalities of epigenetic data, including DNA methylation data, microRNA expression data, and long non-coding RNA expression data from 30 bulk solid cancer types in the TCGA pan-cancer cohort, and identified five optimal TEX subsets with significant survival differences across the majority of cancer types. The performance of DeepEpiTEX was further evaluated and validated in external multi-center and multi-type cancer cohorts, consistently demonstrating its generalizability and applicability in different experimental settings. In addition, we discovered the potential relationship between TEX subsets identified by DeepEpiTEX and the response to immune checkpoint blockade therapy, indicating that individuals with immune-favorable TEX subsets may experience the greatest benefits. In conclusion, our study sheds light on the role of epigenetic regulation in TEX and provides a powerful and promising tool for categorizing TEX in different disease settings.
PMID:38545680 | DOI:10.1021/acs.jcim.4c00261
Predicting skin cancer risk from facial images with an explainable artificial intelligence (XAI) based approach: a proof-of-concept study
EClinicalMedicine. 2024 Mar 19;71:102550. doi: 10.1016/j.eclinm.2024.102550. eCollection 2024 May.
ABSTRACT
BACKGROUND: Efficient identification of individuals at high risk of skin cancer is crucial for implementing personalized screening strategies and subsequent care. While Artificial Intelligence holds promising potential for predictive analysis using image data, its application for skin cancer risk prediction utilizing facial images remains unexplored. We present a neural network-based explainable artificial intelligence (XAI) approach for skin cancer risk prediction based on 2D facial images and compare its efficacy to 18 established skin cancer risk factors using data from the Rotterdam Study.
METHODS: The study employed data from the Rotterdam population-based study, in which skin cancer risk factors, 2D facial images, and the occurrence of skin cancer were collected from 2010 to 2018. We conducted a deep-learning survival analysis based on 2D facial images using our developed XAI approach. We subsequently compared these results with a survival analysis based on skin cancer risk factors using Cox proportional hazards regression.
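For the baseline arm, a minimal Cox sketch using the lifelines library on simulated data; the covariate names are stand-ins for the 18 risk factors, which are not enumerated here.

```python
# Sketch (simulated data, hypothetical covariates) of the risk-factor
# baseline: Cox proportional hazards fit scored by concordance index.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 2810
df = pd.DataFrame({
    "age": rng.normal(68.5, 9.3, n),
    "sun_exposure": rng.normal(0, 1, n),      # hypothetical risk factor
    "skin_type": rng.integers(1, 7, n),       # hypothetical risk factor
    "followup_years": rng.uniform(0.5, 8.0, n),
    "skin_cancer": rng.integers(0, 2, n),     # event indicator
})

cph = CoxPHFitter()
cph.fit(df, duration_col="followup_years", event_col="skin_cancer")
print("c-index:", cph.concordance_index_)
```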
FINDINGS: Among the 2810 participants (mean age = 68.5 ± 9.3 years, average follow-up = 5.0 years), 228 participants were diagnosed with skin cancer after photo acquisition. Our XAI approach achieved superior predictive accuracy based on 2D facial images (c-index = 0.72, 95% CI: 0.70-0.74), outperforming the known risk factors (c-index = 0.59, 95% CI: 0.57-0.61).
INTERPRETATION: This proof-of-concept study underscores the high potential of harnessing facial images and a tailored XAI approach as an easily accessible alternative over known risk factors for identifying individuals at high risk of skin cancer.
FUNDING: The Rotterdam Study is funded through unrestricted research grants from Erasmus Medical Center and Erasmus University, Rotterdam, Netherlands Organization for the Health Research and Development (ZonMw), the Research Institute for Diseases in the Elderly (RIDE), the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the European Commission (DG XII), and the Municipality of Rotterdam. G.V. Roshchupkin is supported by the ZonMw Veni grant (Veni, 549 1936320).
PMID:38545426 | PMC:PMC10965465 | DOI:10.1016/j.eclinm.2024.102550
PEMFC model identification using a squeezenet developed by modified transient search optimization algorithm
Heliyon. 2024 Mar 9;10(6):e27555. doi: 10.1016/j.heliyon.2024.e27555. eCollection 2024 Mar 30.
ABSTRACT
Proton Exchange Membrane Fuel Cells (PEMFCs) are promising sources of clean and renewable energy, but their performance and efficiency depend on accurate modeling and identification of their system parameters. However, existing methods for PEMFC modeling suffer from drawbacks such as slow convergence, high computational cost, and low accuracy. To address these challenges, this work proposes an enhanced approach that combines a modified version of the SqueezeNet model, a deep learning architecture that reduces the number of parameters and computations, with a new optimization algorithm called the Modified Transient Search Optimization (MTSO) algorithm, which improves the exploration and exploitation abilities of the search process. The proposed approach is applied to model the output voltage of the PEMFC under different operating conditions, and the results are compared with empirical data and two other state-of-the-art methods: the Gated Recurrent Unit with Improved Manta Ray Foraging Optimization (GRU/IMRFO) and the Grey Neural Network Model integrated with Particle Swarm Optimization (GNNM/PSO). The comparison shows that the proposed approach achieves the lowest Sum of Squared Errors (SSE) and the highest accuracy, demonstrating its superiority and effectiveness in PEMFC modeling. The proposed approach can facilitate the optimal design, control, and monitoring of PEMFC systems in various applications.
PMID:38545225 | PMC:PMC10965477 | DOI:10.1016/j.heliyon.2024.e27555
Can public opinions improve the effect of financial early warning? -- An empirical study on the new energy industry
Heliyon. 2024 Mar 3;10(6):e26169. doi: 10.1016/j.heliyon.2024.e26169. eCollection 2024 Mar 30.
ABSTRACT
Public opinion significantly affects investor decision-making and stock prices, which ultimately has an impact on the long-term development of the new energy industry. This paper aims to delve into the impact of public opinion on the efficacy of financial risk early warning and to establish an enhanced financial risk early warning model for new energy listed companies. To achieve this, we collect the financial data and public evaluation texts of 185 new energy listed companies, converting the texts into sentiment indicators that are combined with financial indicators to build a financial risk early warning model for new energy listed companies. The contributions of this paper are as follows: (1) The experimental validation demonstrates that the combination of 7 deep learning models and the Bagging algorithm substantially improves the accuracy of the sentiment analysis model, achieving an accuracy of 84.09%. (2) The accuracy of the financial early warning models is generally enhanced after adding sentiment indicators, among which the accuracy of the BP neural network model reached 95.78%. (3) Through clustering analysis, the evaluation models can productively divide the warning intervals, thereby bolstering the interpretability and applicability of the early warning results. Therefore, we suggest that public opinion be taken into consideration when establishing a financial early warning system. Aside from improving the early warning effect, it can also be used as a separate indicator for daily monitoring.
PMID:38545220 | PMC:PMC10965472 | DOI:10.1016/j.heliyon.2024.e26169
Sports activity (SA) recognition based on error correcting output codes (ECOC) and convolutional neural network (CNN)
Heliyon. 2024 Mar 19;10(6):e28258. doi: 10.1016/j.heliyon.2024.e28258. eCollection 2024 Mar 30.
ABSTRACT
The increasing use of motion sensors is causing major changes in the process of monitoring people's activities. One of the main applications of these sensors is the detection of sports activities; for example, they can be used to monitor the condition of athletes or analyze the quality of sports training. Although existing sensor-based activity recognition systems can recognize basic activities such as walking, running, or sitting, they do not perform well in recognizing different types of sports activities. This article introduces a new model based on machine learning (ML) techniques to more accurately distinguish between sports and everyday activities. In the proposed method, the data necessary to detect the type of activity are collected through two sensors attached to a person's foot: an accelerometer and a gyroscope. For this purpose, the input signals are first preprocessed, and then the short-time Fourier transform (STFT) is used to describe the characteristics of each signal. In the next step, each STFT matrix is used as input to a convolutional neural network (CNN). This CNN describes the various motion characteristics of the sensor in the form of vectors. Finally, a classification model based on error-correcting output codes (ECOC) is used to classify the extracted features and detect the type of SA. The performance of the proposed SA recognition method is evaluated using the DSADS database, and the results are compared with previous methods. Based on the results, the proposed method can recognize sports activities with an accuracy of 99.71%. Furthermore, the precision and recall of the proposed method are 99.72% and 99.71%, respectively, both better than the compared methods.
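The STFT front end can be sketched in a few lines; the segment length and window parameters below are assumptions (DSADS recordings are sampled at 25 Hz).

```python
# Sketch of the described front end: short-time Fourier transform of one
# accelerometer axis, yielding a time-frequency matrix for the CNN.
import numpy as np
from scipy.signal import stft

fs = 25                                   # DSADS sampling rate, Hz
t = np.arange(0, 5, 1 / fs)               # one 5-second segment
# Synthetic stand-in for a real accelerometer trace: 2 Hz motion + noise.
accel_x = np.sin(2 * np.pi * 2.0 * t) + 0.1 * np.random.randn(t.size)

f, seg_t, Z = stft(accel_x, fs=fs, nperseg=32, noverlap=24)
spectrogram = np.abs(Z)                   # magnitude STFT matrix
print(spectrogram.shape)                  # (freq bins, time frames) -> CNN input
```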
PMID:38545217 | PMC:PMC10965824 | DOI:10.1016/j.heliyon.2024.e28258
Synthesis of virtual monoenergetic images from kilovoltage peak images using wavelet loss enhanced CycleGAN for improving radiomics features reproducibility
Quant Imaging Med Surg. 2024 Mar 15;14(3):2370-2390. doi: 10.21037/qims-23-922. Epub 2024 Mar 7.
ABSTRACT
BACKGROUND: Dual-energy computed tomography (CT) can provide a range of image information beyond conventional CT through virtual monoenergetic images (VMIs). The purpose of this study was to investigate the impact of material decomposition in detector-based spectral CT on radiomics features and effectiveness of using deep learning-based image synthesis to improve the reproducibility of radiomics features.
METHODS: In this paper, spectral CT image data from 45 esophageal cancer patients were collected retrospectively for investigation. First, we computed the correlation coefficients of radiomics features between conventional kilovoltage peak (kVp) CT images and VMIs. Then, a wavelet loss-enhanced CycleGAN (WLL-CycleGAN) with paired loss terms was developed to synthesize virtual monoenergetic CT images from the corresponding conventional single-energy CT (SECT) images to improve radiomics reproducibility. Finally, radiomic features in 6 different categories, including gray-level co-occurrence matrix (GLCM), gray-level difference matrix (GLDM), gray-level run-length matrix (GLRLM), gray-level size-zone matrix (GLSZM), neighborhood gray-tone difference matrix (NGTDM), and wavelet, were extracted from the gross tumor volumes of the conventional single-energy CT images, the synthetic virtual monoenergetic CT images, and the virtual monoenergetic CT images. Errors in the VMI and the synthetic VMI (sVMI) were compared to assess whether the proposed deep learning method improved radiomic feature accuracy.
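As a hedged sketch of a wavelet loss term (our simplification; the paper's wavelet basis, decomposition levels, and weighting are not specified in the abstract):

```python
# Minimal sketch of a wavelet loss: one-level Haar decomposition of
# predicted and target images, subbands compared with L1. This would be
# added to the usual CycleGAN objective.
import torch
import torch.nn.functional as F

def haar_decompose(x):
    # x: (batch, 1, H, W) with even H, W. Returns LL, LH, HL, HH subbands.
    a = x[:, :, 0::2, 0::2]
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

def wavelet_loss(pred, target):
    return sum(F.l1_loss(p, t)
               for p, t in zip(haar_decompose(pred), haar_decompose(target)))

pred, target = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
print(wavelet_loss(pred, target).item())
```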
RESULTS: Material decomposition of dual-layer dual-energy CT (DECT) can substantially influence the reproducibility of the radiomic features, and the degree of impact is feature dependent. The average reduction of radiomics errors for 15 patients in testing sets was 96.9% for first-order, 12.1% for GLCM, 12.9% for GLDM, 15.7% for GLRLM, 50.3% for GLSZM, 53.4% for NGTDM, and 6% for wavelet features.
CONCLUSIONS: The work revealed that material decomposition has a significant effect on the radiomic feature values. The deep learning-based method reduced the influence of material decomposition in VMIs and might improve the robustness and reproducibility of radiomic features in esophageal cancer. Quantitative results demonstrated that our proposed wavelet loss-enhanced paired CycleGAN outperforms the original CycleGAN.
PMID:38545083 | PMC:PMC10963845 | DOI:10.21037/qims-23-922
Comparison of deep-learning and radiomics-based machine-learning methods for the identification of chronic obstructive pulmonary disease on low-dose computed tomography images
Quant Imaging Med Surg. 2024 Mar 15;14(3):2485-2498. doi: 10.21037/qims-23-1307. Epub 2024 Mar 5.
ABSTRACT
BACKGROUND: Radiomics and artificial intelligence approaches have been developed to predict chronic obstructive pulmonary disease (COPD), but it is still unclear which approach has the best performance. Therefore, we established five prediction models that employed deep-learning (DL) and radiomics-based machine-learning (ML) approaches to identify COPD on low-dose computed tomography (LDCT) images and compared the relative performance of the different models to find the best model for identifying COPD.
METHODS: This retrospective analysis included 1,024 subjects (169 COPD patients and 855 control subjects) who underwent LDCT scans from August 2018 to July 2021. Five prediction models, including models that employed computed tomography (CT)-based radiomics features, chest CT images, quantitative lung density parameters, and demographic and clinical characteristics, were established to identify COPD by DL or ML approaches. Model 1 used CT-based radiomics features by the ML method. Model 2 used a combination of CT-based radiomics features, lung density parameters, and demographic and clinical characteristics by the ML method. Model 3 used CT images only by the DL method. Model 4 used a combination of CT images, lung density parameters, and demographic and clinical characteristics by the DL method. Model 5 used a combination of CT images, CT-based radiomics features, lung density parameters, and demographic and clinical characteristics by the DL method. The accuracy, sensitivity, specificity, negative predictive values (NPVs), positive predictive values (PPVs), and areas under the receiver operating characteristic curve (AUCs) of the five prediction models were compared to examine their performance. The DeLong test was used to compare the AUCs of the different models.
RESULTS: In total, 107 radiomics features were extracted from each subject's CT images, 17 lung density parameters were acquired by quantitative measurement, and 18 selected demographic and clinical characteristics were recorded in this study. Model 2 had the highest AUC [0.73, 95% confidence interval (CI): 0.64-0.82], while model 3 had the lowest AUC (0.65, 95% CI: 0.55-0.75) in the test set. Model 2 also had the highest sensitivity (0.84), the highest accuracy (0.81), and the highest NPV (0.36). In the test set, based on the AUC results, Model 2 significantly outperformed Model 1 (P=0.03).
CONCLUSIONS: The results showed that models employing CT-based radiomics features combined with lung density parameters and demographic and clinical characteristics via ML methods identified COPD better than chest CT image-based DL methods. ML methods are more suitable and beneficial for COPD identification.
PMID:38545077 | PMC:PMC10963838 | DOI:10.21037/qims-23-1307
Automated anatomical landmark detection on 3D facial images using U-NET-based deep learning algorithm
Quant Imaging Med Surg. 2024 Mar 15;14(3):2466-2474. doi: 10.21037/qims-22-1108. Epub 2024 Mar 4.
ABSTRACT
BACKGROUND: Facial anthropometry based on 3-dimensional (3D) imaging technology, or 3D photogrammetry, has gained increasing popularity among surgeons. It outperforms direct measurement and 2-dimensional (2D) photogrammetry because of its many advantages. However, a main limitation of 3D photogrammetry is the time-consuming process of manual landmark localization. To address this problem, this study developed a U-NET-based deep learning algorithm to enable automated and accurate anatomical landmark detection on 3D facial models.
METHODS: The main structure of the algorithm stacked 2 U-NETs. In each U-NET block, we used 3×3 convolution kernels and the rectified linear unit (ReLU) activation function. A total of 200 3D images of healthy cases, acromegaly patients, and localized scleroderma patients were captured with a Vectra H1 handheld 3D camera and used as input for algorithm training. The algorithm was tested on the detection of 20 landmarks on 3D images. The percentage of correct key points (PCK) and normalized mean error (NME) were used to evaluate facial landmark detection accuracy.
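The two metrics follow their common definitions, sketched below; treating NME as an unnormalized mean error in millimetres matches the reported units but is an assumption about the study's exact normalization.

```python
# Sketch of the two evaluation metrics: NME as the mean Euclidean landmark
# error (here in mm), and PCK as the fraction of landmarks within a
# threshold (2 mm in the study).
import numpy as np

def nme(pred, gt):
    # pred, gt: (num_landmarks, 3) arrays of 3D coordinates in mm
    return np.linalg.norm(pred - gt, axis=1).mean()

def pck(pred, gt, threshold_mm=2.0):
    dists = np.linalg.norm(pred - gt, axis=1)
    return (dists <= threshold_mm).mean()

gt = np.random.rand(20, 3) * 100          # 20 hypothetical 3D landmarks
pred = gt + np.random.normal(0, 1.0, gt.shape)
print(f"NME = {nme(pred, gt):.2f} mm, PCK@2mm = {pck(pred, gt):.0%}")
```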
RESULTS: Among healthy cases, the average NME was 1.4 mm. The PCK reached 90% when the threshold was set to the clinically acceptable limit of 2 mm. The average NME was 2.8 and 2.2 mm among acromegaly patients and localized scleroderma patients, respectively.
CONCLUSIONS: This study developed a deep learning algorithm for automated facial landmark detection on 3D images. The algorithm was innovatively validated in 3 different groups of participants. It achieved accurate landmark detection and improved the efficiency of 3D image analysis.
PMID:38545057 | PMC:PMC10963813 | DOI:10.21037/qims-22-1108
Domain generalization across tumor types, laboratories, and species - Insights from the 2022 edition of the Mitosis Domain Generalization Challenge
Med Image Anal. 2024 Mar 22;94:103155. doi: 10.1016/j.media.2024.103155. Online ahead of print.
ABSTRACT
Recognition of mitotic figures in histologic tumor specimens is highly relevant to patient outcome assessment. This task is challenging for algorithms and human experts alike, with deterioration of algorithmic performance under shifts in image representations. Considerable covariate shifts occur when assessment is performed on different tumor types, when images are acquired using different digitization devices, or when specimens are produced in different laboratories. This observation motivated the inception of the 2022 challenge on MItosis Domain Generalization (MIDOG 2022). The challenge provided annotated histologic tumor images from six different domains and evaluated the algorithmic approaches for mitotic figure detection provided by nine challenge participants on ten independent domains. Ground truth for mitotic figure detection was established in two ways: a three-expert majority vote and an independent, immunohistochemistry-assisted set of labels. This work presents an overview of the challenge tasks, the algorithmic strategies employed by the participants, and potential factors contributing to their success. With an F1 score of 0.764 for the top-performing team, we conclude that domain generalization across various tumor domains is possible with today's deep learning-based recognition pipelines. However, we also found that domain characteristics not present in the training set (feline as a new species, spindle cell shape as a new morphology, and a new scanner) led to small but significant decreases in performance. When assessed against the immunohistochemistry-assisted reference standard, all methods resulted in reduced recall scores, with only minor changes in the order of participants in the ranking.
PMID:38537415 | DOI:10.1016/j.media.2024.103155
Predicting treatment plan approval probability for high-dose-rate brachytherapy of cervical cancer using adversarial deep learning
Phys Med Biol. 2024 Mar 27. doi: 10.1088/1361-6560/ad3880. Online ahead of print.
ABSTRACT
Predicting the probability of a plan being approved by the physician is important for automatic treatment planning. Motivated by the mathematical foundation of deep learning, which allows a deep neural network to represent functions accurately and flexibly, we developed a deep-learning framework that learns the probability of plan approval for cervical cancer high-dose-rate brachytherapy (HDRBT).
Approach: The system consisted of a dose prediction network (DPN) and a plan-approval probability network (PPN). The DPN predicts the OAR D2cc and CTV D90% of the current fraction from the patient's current anatomy and the prescription dose of HDRBT. The PPN outputs the probability of a given plan being acceptable to the physician based on the patient's anatomy and the total dose combining the HDRBT and external beam radiotherapy sessions. Training of the networks was achieved by first training them separately for a good initialization, and then jointly via an adversarial process. We collected approved treatment plans of 248 treatment fractions from 63 patients. Among them, 216 plans from 54 patients were employed in a four-fold cross-validation study, and the remaining 32 plans from the other 9 patients were reserved for independent testing.
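A schematic PyTorch sketch of the joint adversarial stage described, with placeholder networks, loss weights, and data (the real DPN/PPN architectures and objectives are not given in the abstract):

```python
# Schematic sketch (our simplification) of joint adversarial training:
# the PPN learns to score approved plans high and predicted plans low,
# while the DPN learns to match approved doses and to satisfy the PPN.
import torch
import torch.nn as nn

dpn = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))  # dose pred
ppn = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))   # approval prob
opt_d = torch.optim.Adam(dpn.parameters(), lr=1e-3)
opt_p = torch.optim.Adam(ppn.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

anatomy = torch.randn(8, 16)        # placeholder anatomy features
approved_dose = torch.randn(8, 4)   # placeholder approved-plan dose metrics

for step in range(100):
    # PPN update: approved plans -> label 1, DPN-predicted plans -> label 0
    fake_dose = dpn(anatomy).detach()
    loss_p = bce(ppn(approved_dose), torch.ones(8, 1)) + \
             bce(ppn(fake_dose), torch.zeros(8, 1))
    opt_p.zero_grad(); loss_p.backward(); opt_p.step()

    # DPN update: match approved doses while pushing PPN toward "approvable"
    pred = dpn(anatomy)
    loss_d = nn.functional.mse_loss(pred, approved_dose) + \
             0.1 * bce(ppn(pred), torch.ones(8, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
```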
Main results: The DPN predicted the equivalent dose in 2 Gy fractions for the bladder, rectum, and sigmoid D2cc and the CTV D90% with relative errors of 11.51±6.92%, 8.23±5.75%, 7.12±6.00%, and 10.16±10.42%, respectively. In a task that differentiates clinically approved plans from disapproved plans generated by perturbing doses in ground-truth approved plans by 20%, the PPN achieved an accuracy, sensitivity, specificity, and area under the curve of 0.70, 0.74, 0.65, and 0.74, respectively.
Significance: We demonstrated the feasibility of developing a novel deep-learning framework that predicts a probability of plan approval for HDRBT of cervical cancer, which is an essential component in automatic treatment planning.
PMID:38537309 | DOI:10.1088/1361-6560/ad3880
Technical and functional design considerations for a real-world interpretable AI solution for NIR perfusion analysis (including cancer)
Eur J Surg Oncol. 2024 Mar 18:108273. doi: 10.1016/j.ejso.2024.108273. Online ahead of print.
ABSTRACT
Near infrared (NIR) analysis of tissue perfusion via indocyanine green fluorescence assessment is performed clinically during surgery for a range of indications. Its usefulness can potentially be further enhanced through the application of interpretable artificial intelligence (AI) methods to improve dynamic interpretation accuracy in these indications and also to open new applications. While its main use currently is perfusion assessment as a tissue health check prior to performing an anastomosis, there is increasing interest in using fluorophores for cancer detection during surgical interventions, with most research based on the paradigm of static imaging of fluorophore uptake hours after preoperative dosing. Although some image boosting and relative estimation of fluorescence signals is already built into commercial NIR systems, fuller implementation of AI methods can enable actionable predictions, especially when applied during the dynamic, early inflow-outflow phase that occurs seconds to minutes after ICG (or indeed other fluorophore) administration. Research has already shown that such methods can, in principle, accurately differentiate cancer from benign tissue in the operating theatre in real time based on their differential signalling, and they could be useful for tissue perfusion classification more generally. This can be achieved through the generation of fluorescence intensity curves from an intra-operative NIR video stream. These curves are processed to adjust for image disturbances, and curve features known to be influential in tissue characterisation are extracted. Existing machine learning based classifiers can then use these features to classify the tissue in question according to prior training sets. This interpretable methodology enables accurate classification algorithms to be built with modest training sets in comparison to those required for deep learning modelling, in addition to achieving compliance with medical device regulations. Integration of the multiple algorithms required to achieve this classification into a desktop application or medical device could make this method accessible and useful to (as well as usable by) surgeons without prior training in computer technology. This document details some technical and functional design considerations underlying such a novel recommender system to advance the foundational concept and methodology as software as a medical device for in situ cancer characterisation, with relevance more broadly also to other tissue perfusion applications.
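A minimal sketch of such a pipeline on synthetic data; the specific curve features and the random-forest classifier are illustrative assumptions, not the system described.

```python
# Sketch (synthetic data, assumed feature set): reduce an NIR video region
# to a fluorescence intensity curve, extract a few inflow-phase features,
# and classify with a conventional ML model trained on prior labeled cases.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def curve_features(intensity, fps=30.0):
    t = np.arange(intensity.size) / fps
    peak = intensity.argmax()
    upslope = (intensity[peak] - intensity[0]) / max(t[peak], 1e-6)
    return np.array([
        t[peak],                      # time to peak
        intensity.max(),              # peak intensity
        upslope,                      # inflow slope
        np.trapz(intensity, t),       # area under the curve
    ])

rng = np.random.default_rng(2)
# Hypothetical training set: 40 curves, label 1 = cancer, 0 = benign,
# differing only in overall dynamics for this toy example.
curves = [rng.random(300).cumsum() / (i % 2 + 1) for i in range(40)]
X = np.stack([curve_features(c) for c in curves])
y = np.array([i % 2 for i in range(40)])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
new_curve = rng.random(300).cumsum()
print("predicted class:", clf.predict(curve_features(new_curve)[None])[0])
```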
PMID:38538505 | DOI:10.1016/j.ejso.2024.108273