Deep learning
Sustainable Sea of Internet of Things: Wind Energy Harvesting System for Unmanned Surface Vehicles
ACS Appl Mater Interfaces. 2024 May 20. doi: 10.1021/acsami.4c05142. Online ahead of print.
ABSTRACT
Harvesting wind energy from the environment and integrating it with the internet of things and artificial intelligence to enable intelligent ocean environment monitoring is an effective approach. However, several challenges limit the performance of wind energy harvesters, such as large start-up torque and a narrow operational wind speed range. To address these issues, this paper proposes a wind energy harvesting system with a self-regulation strategy based on piezoelectric and electromagnetic effects to achieve state monitoring for unmanned surface vehicles (USVs). The proposed energy harvesting system comprises eight rotation units with centrifugal adaptation and four piezoelectric units with a magnetic coupling mechanism, which together reduce the start-up torque and expand the wind speed range. The dynamic model of the energy harvester with the centrifugal effect is explored, and the corresponding structural parameters are analyzed. The simulation and experimental results show that it can obtain a maximum average power of 23.25 mW at a wind speed of 8 m/s. Furthermore, three different magnet configurations are investigated, and the optimal configuration effectively decreases the resistance torque by 91.25% compared with the traditional mode. A prototype is manufactured, and the test result shows that it can charge a 2200 μF supercapacitor to 6.2 V within 120 s, which indicates great potential for self-powering low-power sensors. Finally, a deep learning algorithm is applied to detect the stability of the operation, and the average accuracy reached 95.33%, which validates the feasibility of state monitoring for USVs.
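As a quick plausibility check of the reported charging figures, the energy stored in a capacitor follows E = ½CV², so the mean charging power implied by the abstract's numbers can be computed directly (assuming the supercapacitor starts from 0 V and ignoring conversion losses):

```python
# Back-of-envelope check of the reported charging figures (assumes the
# 2200 uF supercapacitor starts from 0 V and charging is lossless).
C = 2200e-6   # capacitance in farads
V = 6.2       # final voltage in volts
t = 120.0     # charging time in seconds

energy_J = 0.5 * C * V**2            # E = 1/2 C V^2
avg_charge_power_mW = energy_J / t * 1e3

print(f"stored energy: {energy_J * 1e3:.2f} mJ")
print(f"mean charging power: {avg_charge_power_mW:.3f} mW")
```

The implied mean charging power (roughly 0.35 mW) is much lower than the 23.25 mW maximum average harvested power reported at 8 m/s; the abstract does not detail where the difference goes.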
PMID:38768307 | DOI:10.1021/acsami.4c05142
Versatile multiple object tracking in sparse 2D/3D videos via deformable image registration
PLoS Comput Biol. 2024 May 20;20(5):e1012075. doi: 10.1371/journal.pcbi.1012075. Online ahead of print.
ABSTRACT
Tracking body parts in behaving animals, extracting fluorescence signals from cells embedded in deforming tissue, and analyzing cell migration patterns during development all require tracking objects with partially correlated motion. As dataset sizes increase, manual tracking of objects becomes prohibitively inefficient and slow, necessitating automated and semi-automated computational tools. Unfortunately, existing methods for multiple object tracking (MOT) are either developed for specific datasets and hence do not generalize well to other datasets, or require large amounts of training data that are not readily available. This is further exacerbated when tracking fluorescent sources in moving and deforming tissues, where the lack of unique features and sparsely populated images create a challenging environment, especially for modern deep learning techniques. By leveraging technology recently developed for spatial transformer networks, we propose ZephIR, an image registration framework for semi-supervised MOT in 2D and 3D videos. ZephIR can generalize to a wide range of biological systems by incorporating adjustable parameters that encode spatial (sparsity, texture, rigidity) and temporal priors of a given data class. We demonstrate the accuracy and versatility of our approach in a variety of applications, including tracking the body parts of a behaving mouse and neurons in the brain of a freely moving C. elegans. We provide an open-source package along with a web-based graphical user interface that allows users to provide small numbers of annotations to interactively improve tracking results.
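Registration-based tracking of the kind ZephIR performs can be illustrated with a toy patch matcher: given a keypoint annotated in one frame, search a local window in the next frame for the best-matching patch. This is a deliberately simplified stand-in (exhaustive search and sum of squared differences instead of ZephIR's differentiable multi-scale registration):

```python
import numpy as np

def track_patch(frame_a, frame_b, pos, patch=5, search=3):
    """Follow one keypoint from frame_a to frame_b by exhaustively
    matching a small patch within a local search window (a toy stand-in
    for gradient-based image registration)."""
    r = patch // 2
    y, x = pos
    ref = frame_a[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    best, best_pos = -np.inf, pos
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            cand = frame_b[yy - r:yy + r + 1, xx - r:xx + r + 1].astype(float)
            score = -np.sum((cand - ref) ** 2)   # negative SSD similarity
            if score > best:
                best, best_pos = score, (yy, xx)
    return best_pos

# toy example: a bright blob moves 2 px to the right between frames
a = np.zeros((32, 32)); a[10:13, 10:13] = 1.0
b = np.zeros((32, 32)); b[10:13, 12:15] = 1.0
print(track_patch(a, b, (11, 11)))   # -> (11, 13)
```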
PMID:38768230 | DOI:10.1371/journal.pcbi.1012075
A multimodal Transformer Network for protein-small molecule interactions enhances predictions of kinase inhibition and enzyme-substrate relationships
PLoS Comput Biol. 2024 May 20;20(5):e1012100. doi: 10.1371/journal.pcbi.1012100. Online ahead of print.
ABSTRACT
The activities of most enzymes and drugs depend on interactions between proteins and small molecules. Accurate prediction of these interactions could greatly accelerate pharmaceutical and biotechnological research. Current machine learning models designed for this task have a limited ability to generalize beyond the proteins used for training. This limitation is likely due to a lack of information exchange between the protein and the small molecule during the generation of the required numerical representations. Here, we introduce ProSmith, a machine learning framework that employs a multimodal Transformer Network to simultaneously process protein amino acid sequences and small molecule strings in the same input. This approach facilitates the exchange of all relevant information between the two molecule types during the computation of their numerical representations, allowing the model to account for their structural and functional interactions. Our final model combines gradient boosting predictions based on the resulting multimodal Transformer Network with independent predictions based on separate deep learning representations of the proteins and small molecules. The resulting predictions outperform recently published state-of-the-art models for predicting protein-small molecule interactions across three diverse tasks: predicting kinase inhibitions; inferring potential substrates for enzymes; and predicting Michaelis constants KM. The Python code provided can be used to easily implement and improve machine learning predictions involving arbitrary protein-small molecule interactions.
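The core input idea, placing both molecule types in one token sequence so attention can flow between them, can be sketched with a toy tokenizer. The vocabulary, special tokens, and `joint_tokens` helper below are illustrative assumptions, not ProSmith's actual preprocessing:

```python
# Hypothetical joint tokenizer: protein residues and SMILES characters map
# into one shared vocabulary, separated by special tokens, so a single
# Transformer can attend across both modalities. Names are illustrative.
AA = "ACDEFGHIKLMNPQRSTVWY"
SMILES_CHARS = "CNOPSFclnos()=#123456789[]+-@Hh"

def joint_tokens(protein_seq, smiles):
    vocab = {tok: i for i, tok in enumerate(
        ["[CLS]", "[SEP]"]
        + [f"aa:{a}" for a in AA]
        + [f"sm:{c}" for c in SMILES_CHARS])}
    toks = (["[CLS]"] + [f"aa:{a}" for a in protein_seq]
            + ["[SEP]"] + [f"sm:{c}" for c in smiles] + ["[SEP]"])
    return [vocab[t] for t in toks]

# toy inputs: a 3-residue peptide fragment and ethanol's SMILES string
ids = joint_tokens("MKV", "CCO")
print(ids)   # -> [0, 12, 10, 19, 1, 22, 22, 24, 1]
```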
PMID:38768223 | DOI:10.1371/journal.pcbi.1012100
Deep learning and digital pathology powers prediction of HCC development in steatotic liver disease
Hepatology. 2024 May 20. doi: 10.1097/HEP.0000000000000904. Online ahead of print.
ABSTRACT
BACKGROUND AND AIMS: Identifying patients with steatotic liver disease who are at a high risk of developing HCC remains challenging. We present a deep learning (DL) model to predict HCC development using hematoxylin and eosin-stained whole-slide images of biopsy-proven steatotic liver disease.
APPROACH AND RESULTS: We included 639 patients who did not develop HCC for ≥7 years after biopsy (non-HCC class) and 46 patients who developed HCC <7 years after biopsy (HCC class). Paired cases of the HCC and non-HCC classes matched by biopsy date and institution were used for training, and the remaining nonpaired cases were used for validation. The DL model was trained using deep convolutional neural networks with 28,000 image tiles cropped from whole-slide images of the paired cases, with an accuracy of 81.0% and an AUC of 0.80 for predicting HCC development. Validation using the nonpaired cases also demonstrated a good accuracy of 82.3% and an AUC of 0.84. These results were comparable to the predictive ability of logistic regression model using fibrosis stage. Notably, the DL model also detected the cases of HCC development in patients with mild fibrosis. The saliency maps generated by the DL model highlighted various pathological features associated with HCC development, including nuclear atypia, hepatocytes with a high nuclear-cytoplasmic ratio, immune cell infiltration, fibrosis, and a lack of large fat droplets.
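Tile-based training of the kind described (28,000 tiles cropped from whole-slide images) starts from a simple preprocessing step: cut the slide into fixed-size tiles and keep those containing tissue. A minimal numpy sketch, with an assumed intensity-based tissue filter:

```python
import numpy as np

def tile_slide(slide, tile=64, tissue_thresh=0.05):
    """Crop non-overlapping tiles from a whole-slide array and keep those
    with enough non-background pixels (a toy version of whole-slide tile
    extraction; the threshold is an assumption, not from the paper)."""
    h, w = slide.shape[:2]
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            t = slide[y:y + tile, x:x + tile]
            if (t > 0).mean() >= tissue_thresh:   # crude tissue filter
                tiles.append(t)
    return tiles

slide = np.zeros((128, 128))
slide[0:64, 0:64] = 1.0           # one quadrant contains "tissue"
print(len(tile_slide(slide)))     # -> 1
```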
CONCLUSIONS: The ability of the DL model to capture subtle pathological features beyond fibrosis suggests its potential for identifying early signs of hepatocarcinogenesis in patients with steatotic liver disease.
PMID:38768142 | DOI:10.1097/HEP.0000000000000904
A confounder controlled machine learning approach: Group analysis and classification of schizophrenia and Alzheimer's disease using resting-state functional network connectivity
PLoS One. 2024 May 20;19(5):e0293053. doi: 10.1371/journal.pone.0293053. eCollection 2024.
ABSTRACT
Resting-state functional magnetic resonance imaging (rs-fMRI) has increasingly been used to study both Alzheimer's disease (AD) and schizophrenia (SZ). While most rs-fMRI studies conducted in AD and SZ compare patients to healthy controls, it is also of interest to directly compare AD and SZ patients with each other to identify potential biomarkers shared between the disorders. However, comparing patient groups collected in different studies can be challenging due to potential confounds, such as differences in the patients' ages, scan protocols, etc. In this study, we compared and contrasted resting-state functional network connectivity (rs-FNC) of 162 patients with AD and late mild cognitive impairment (LMCI), 181 schizophrenia patients, and 315 cognitively normal (CN) subjects. We used confounder-controlled rs-FNC and applied machine learning algorithms (including support vector machine, logistic regression, random forest, and k-nearest neighbor) and deep learning models (i.e., fully-connected neural networks) to classify subjects in binary and three-class categories according to their diagnosis labels (e.g., AD, SZ, and CN). Our statistical analysis revealed that FNC between the following network pairs is stronger in AD compared to SZ: subcortical-cerebellum, subcortical-cognitive control, cognitive control-cerebellum, and visual-sensory motor networks. On the other hand, FNC is stronger in SZ than AD for the following network pairs: subcortical-visual, subcortical-auditory, subcortical-sensory motor, cerebellum-visual, sensory motor-cognitive control, and within-cerebellum networks. Furthermore, we observed that while AD and SZ each have unique FNC abnormalities, they also share some common functional abnormalities that may be due to similar neurobiological mechanisms or genetic factors contributing to the disorders' development.
Moreover, we achieved an accuracy of 85% in classifying subjects into AD and SZ, with the default mode, visual, and subcortical networks contributing most to the classification, and an accuracy of 68% in classifying subjects into AD, SZ, and CN, with the subcortical domain providing the most contributing features to the three-way classification. Finally, our findings indicated that, for all classification tasks except AD vs. SZ, males are more predictable than females.
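The confounder-control step, regressing nuisance variables such as age out of the connectivity features before classification, can be sketched as follows; the synthetic data and single confounder are illustrative, not the study's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy FNC features in which age (a confounder) drives a spurious group gap.
n = 200
group = rng.integers(0, 2, n)                 # toy diagnosis labels
age = 50 + 15 * group + rng.normal(0, 5, n)   # confounder differs by group
fnc = 0.1 * age[:, None] + rng.normal(0, 1, (n, 4))   # confounded features

# Confounder control: regress age out of each feature, keep the residuals.
X = np.column_stack([np.ones(n), age])
beta, *_ = np.linalg.lstsq(X, fnc, rcond=None)
fnc_resid = fnc - X @ beta

# After correction, the features no longer correlate with age.
r = np.corrcoef(age, fnc_resid[:, 0])[0, 1]
print(abs(r) < 1e-6)   # -> True
```

The residual features would then feed the SVM, logistic regression, random forest, k-NN, or neural network classifiers the abstract lists.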
PMID:38768123 | DOI:10.1371/journal.pone.0293053
Res2Net-based multi-scale and multi-attention model for traffic scene image classification
PLoS One. 2024 May 20;19(5):e0300017. doi: 10.1371/journal.pone.0300017. eCollection 2024.
ABSTRACT
With the increasing application of traffic scene image classification in intelligent transportation systems, there is a growing demand for improved accuracy and robustness in this classification task. However, due to weather conditions, time of day, lighting variations, and annotation costs, traditional deep learning methods still have limitations in extracting complex traffic scene features and achieving high recognition accuracy. Previous classification methods for traffic scene images fell short in multi-scale feature extraction and in combining frequency-domain, spatial, and channel attention. To address these issues, this paper proposes a multi-scale and multi-attention model based on Res2Net. Our proposed framework introduces an Adaptive Feature Refinement Pyramid Module (AFRPM) to enhance multi-scale feature extraction, thus improving the accuracy of traffic scene image classification. Additionally, we integrate frequency-domain and spatial-channel attention mechanisms to improve recognition of complex backgrounds, objects of different scales, and local details in traffic scene images. We evaluate the model on the traffic scene classification task using the Traffic-Net dataset. The experimental results demonstrate that our model achieves an accuracy of 96.88% on this dataset, an improvement of approximately 2% over the baseline Res2Net network. Furthermore, we validate the effectiveness of the proposed modules through ablation experiments.
PMID:38768119 | DOI:10.1371/journal.pone.0300017
FPNC Net: A hydrogenation catalyst image recognition algorithm based on deep learning
PLoS One. 2024 May 20;19(5):e0300924. doi: 10.1371/journal.pone.0300924. eCollection 2024.
ABSTRACT
The identification of hydrogenation catalyst information has long been one of the most important tasks in the chemical industry. To aid researchers in efficiently screening high-performance catalyst carriers and to tackle this pressing challenge, a solution for the intelligent recognition of hydrogenation catalyst images is needed. To address the low recognition accuracy caused by adhesion and stacking of hydrogenation catalysts, an image recognition algorithm for hydrogenation catalysts based on FPNC Net is proposed in this paper. In the present study, a ResNet50 backbone network was used to extract features, and spatially separable convolution kernels were used to extract multi-scale features of the catalyst fringes. In addition, to effectively segment the adhesive regions of stripes, a Feature Pyramid Network (FPN) was added to the backbone for deep and shallow feature fusion, and an attention module that adaptively adjusts weights was introduced to highlight the target features of the catalyst. The experimental results showed that the FPNC Net model achieved an accuracy of 94.2% and an AP improvement of 19.37% compared with the original CenterNet model. The improved model demonstrates a significant enhancement in detection accuracy, indicating a high capability for detecting hydrogenation catalyst targets.
PMID:38768105 | DOI:10.1371/journal.pone.0300924
Classifying Routine Clinical Electroencephalograms with Multivariate Iterative Filtering and Convolutional Neural Networks
IEEE Trans Neural Syst Rehabil Eng. 2024 May 20;PP. doi: 10.1109/TNSRE.2024.3403198. Online ahead of print.
ABSTRACT
Electroencephalography (EEG) is a non-invasive tool widely used in basic and clinical neuroscience to explore neural states in various populations, and classifying these EEG recordings is a fundamental challenge. While machine learning shows promising results in classifying long multivariate time series, optimal prediction models and feature extraction methods for EEG classification remain elusive. Our study addressed the problem of EEG classification under the framework of brain age prediction, applying a deep learning model to EEG time series. We hypothesized that decomposing EEG signals into pseudostationary waveforms would yield more accurate age predictions than using raw or canonically frequency-filtered EEG. Specifically, we employed multivariate intrinsic mode functions (MIMFs), an empirical mode decomposition (EMD) variant based on multivariate iterative filtering (MIF), with a convolutional neural network (CNN) model. Testing a large dataset of routine clinical EEG scans (n = 6540) from patients aged 1 to 103 years, we found that an ad-hoc CNN model without fine-tuning could reasonably predict age from EEGs. Crucially, MIMF decomposition significantly improved performance compared to canonical brain rhythms (from delta to lower gamma oscillations). Our approach achieved a mean absolute error (MAE) of 13.76 ± 0.33 and a correlation coefficient of 0.64 ± 0.01 in brain age prediction over the entire lifespan. Our findings indicate that CNN models applied to EEGs, preserving their original temporal structure, remain a promising framework for EEG classification, and that time-frequency decompositions such as MIF can enhance CNN performance on this task.
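A crude stand-in for decomposing EEG into narrowband components, here by zeroing FFT bins outside each band rather than by multivariate iterative filtering, illustrates the kind of input the CNN receives. The band edges follow canonical EEG rhythms:

```python
import numpy as np

def band_components(x, fs, bands):
    """Split a 1-D signal into narrowband components by zeroing FFT bins
    outside each band (a crude stand-in for the MIF/MIMF decomposition
    the abstract describes; real MIF is data-adaptive, not fixed-band)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    comps = []
    for lo, hi in bands:
        Xb = np.where((freqs >= lo) & (freqs < hi), X, 0)
        comps.append(np.fft.irfft(Xb, len(x)))
    return comps

fs = 128
t = np.arange(fs * 4) / fs
x = np.sin(2 * np.pi * 6 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)  # theta + beta
theta, beta = band_components(x, fs, [(4, 8), (13, 30)])
print(theta.shape, beta.shape)   # -> (512,) (512,)
```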
PMID:38768007 | DOI:10.1109/TNSRE.2024.3403198
Segmentation Guided Crossing Dual Decoding Generative Adversarial Network for Synthesizing Contrast-Enhanced Computed Tomography Images
IEEE J Biomed Health Inform. 2024 May 20;PP. doi: 10.1109/JBHI.2024.3403199. Online ahead of print.
ABSTRACT
Although contrast-enhanced computed tomography (CE-CT) images significantly improve the accuracy of diagnosing focal liver lesions (FLLs), the administration of contrast agents imposes a considerable physical burden on patients. The utilization of generative models to synthesize CE-CT images from non-contrasted CT images offers a promising solution. However, existing image synthesis models tend to overlook the importance of critical regions, inevitably reducing their effectiveness in downstream tasks. To overcome this challenge, we propose an innovative CE-CT image synthesis model called the Segmentation Guided Crossing Dual Decoding Generative Adversarial Network (SGCDD-GAN). Specifically, the SGCDD-GAN involves a crossing dual decoding generator comprising an attention decoder and an improved transformation decoder. The attention decoder is designed to highlight critical regions within the abdominal cavity, while the improved transformation decoder is responsible for synthesizing CE-CT images. These two decoders are interconnected using a crossing technique to enhance each other's capabilities. Furthermore, we employ a multi-task learning strategy to guide the generator to focus more on the lesion area. To evaluate the performance of the proposed SGCDD-GAN, we test it on an in-house CE-CT dataset. In both CE-CT image synthesis tasks, synthesizing ART images and synthesizing PV images, the proposed SGCDD-GAN demonstrates superior performance across the entire image and liver region in SSIM, PSNR, MSE, and PCC scores. Furthermore, CE-CT images synthesized by our SGCDD-GAN achieve remarkable accuracy rates of 82.68%, 94.11%, and 94.11% in a deep learning-based FLLs classification task, along with a pilot assessment conducted by two radiologists.
PMID:38768004 | DOI:10.1109/JBHI.2024.3403199
Frequency Domain Deep Learning With Non-Invasive Features for Intraoperative Hypotension Prediction
IEEE J Biomed Health Inform. 2024 May 20;PP. doi: 10.1109/JBHI.2024.3403109. Online ahead of print.
ABSTRACT
BACKGROUND: Intraoperative hypotension can lead to postoperative organ dysfunction. Previous studies primarily used invasive arterial pressure as the key biosignal for the detection of hypotension. However, these studies had limitations in incorporating different biosignal modalities and utilizing the periodic nature of biosignals. To address these limitations, we utilized frequency-domain information, which provides key insights that time-domain analysis cannot provide, as revealed by recent advances in deep learning. With the frequency-domain information, we propose a deep-learning approach that integrates multiple biosignal modalities.
METHODS: We used the discrete Fourier transform to extract frequency information from biosignal data, which we then combined with the original time-domain data as input to our deep learning model. To improve the interpretability of our results, we incorporated recent interpretability modules for deep learning models into our analysis.
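The input construction described, concatenating a segment's time-domain samples with its DFT magnitude spectrum, can be sketched minimally (single channel and no normalization, both of which a real pipeline would extend):

```python
import numpy as np

def time_freq_input(segment):
    """Concatenate a biosignal segment with its DFT magnitude spectrum
    (a minimal sketch of the frequency-domain input idea; multi-modality
    and per-channel normalisation are omitted here)."""
    spec = np.abs(np.fft.rfft(segment))
    return np.concatenate([segment, spec])

seg = np.sin(2 * np.pi * 2 * np.arange(100) / 100)   # toy waveform, 2 cycles
x = time_freq_input(seg)
print(x.shape)   # -> (151,)  i.e. 100 time samples + 51 frequency bins
```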
RESULTS: We constructed 75,994 segments from the data of 3,226 patients to predict hypotension during surgery. Our proposed frequency-domain deep-learning model outperformed conventional approaches that rely solely on time-domain information. Notably, our model achieved a greater increase in AUROC performance than the time-domain deep learning models when trained on non-invasive biosignal data only (AUROC 0.898 [95% CI: 0.885-0.91] vs. 0.853 [95% CI: 0.839-0.867]). Further analysis revealed that the 1.5-3.0 Hz frequency band played an important role in predicting hypotension events.
CONCLUSION: Utilizing the frequency domain not only demonstrated high performance on invasive data but also showed significant performance improvement when applied to non-invasive data alone. Our proposed framework offers clinicians a novel perspective for predicting intraoperative hypotension.
PMID:38768003 | DOI:10.1109/JBHI.2024.3403109
Enhanced crop health monitoring: attention convolutional stacked recurrent networks and binary Kepler search for early detection of paddy crop issues
Environ Monit Assess. 2024 May 20;196(6):561. doi: 10.1007/s10661-024-12504-6.
ABSTRACT
The diseases that affect plants cannot be easily avoided due to rapid and substantial changes in the environment and climate. Paddy crops in particular are affected by several conditions, including pests and nutritional deficiencies, so detecting affected crops at an early stage is important for productivity. To detect and classify problems in this domain, deep learning approaches are utilized. In this paper, a novel attention convolutional stacked recurrent based binary Kepler search (ACSR-BKS) algorithm is used to detect diseases, nutritional deficiencies, and pest patterns at an early stage through three main pipeline stages: data augmentation, data pre-processing, and classification. After data collection, the images are augmented via zooming, rotation, horizontal flipping, height and width shifting, and rescaling. To obtain the best classification results, the model parameters are tuned using the binary Kepler search algorithm. The results reveal that the proposed ACSR-BKS algorithm reaches an accuracy of 98.2% in detecting diseases, and the obtained results are compared with other existing approaches. The findings also suggest that paddy yield can be improved by applying the proposed disease-detection methods.
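The augmentation step listed in the abstract (flipping, rotation, and so on) can be sketched for a single image; zooming, shifting, and rescaling are omitted here for brevity:

```python
import numpy as np

def augment(img):
    """Return simple augmented views of one image: the original, a
    horizontal flip, and the three 90-degree rotations (a partial
    stand-in for the fuller zoom/shift/rescale pipeline described)."""
    views = [img, np.fliplr(img)]
    views += [np.rot90(img, k) for k in (1, 2, 3)]
    return views

leaf = np.arange(16).reshape(4, 4)   # stand-in for a paddy-leaf image
print(len(augment(leaf)))            # -> 5
```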
PMID:38767686 | DOI:10.1007/s10661-024-12504-6
Evaluation of High-Dimensional Data Classification for Skin Malignancy Detection Using DL-Based Techniques
Cancer Invest. 2024 May 20:1-25. doi: 10.1080/07357907.2024.2345184. Online ahead of print.
ABSTRACT
Skin cancer can be detected through visual screening and skin analysis based on biopsy and the pathological state of the human body. The survival rate of skin cancer patients is low, and millions of people are diagnosed annually. Through several comparative analyses, the skin malignancy classification is evaluated. Using Isomap with a vision transformer, we analyze the high-dimensional images after dimensionality reduction. Skin cancer can present with severe cases and life-threatening symptoms, and overall performance evaluation and classification tend to improve accuracy on the high-dimensional skin lesion dataset. In deep learning methodologies, the distinct phases of skin malignancy classification are assessed by accuracy, specificity, F1-score, recall, and sensitivity. Isomap, a nonlinear dimensionality reduction technique, preserves the data's underlying nonlinear relationships. This is essential for the categorization of skin malignancies, as the features that separate malignant from benign skin lesions may not be linearly separable. Isomap decreases the data's complexity while maintaining its essential characteristics, making the findings simpler to analyze and explain. High-dimensional skin lesion datasets are evaluated and classified more effectively when processed with Isomap and the vision transformer.
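Isomap as used here is available off the shelf; a minimal sketch on a synthetic nonlinear manifold (the skin-lesion features themselves are not public, so scikit-learn's `make_swiss_roll` stands in):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Nonlinear dimensionality reduction with Isomap, as the abstract describes,
# demonstrated on a synthetic manifold rather than lesion data.
X, _ = make_swiss_roll(n_samples=500, random_state=0)   # 3-D curled sheet
emb = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(emb.shape)   # -> (500, 2)
```

Unlike PCA, Isomap approximates geodesic distances along the manifold, which is why it can "unroll" such nonlinearly embedded data before a downstream classifier sees it.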
PMID:38767503 | DOI:10.1080/07357907.2024.2345184
Early identification of stroke through deep learning with multi-modal human speech and movement data
Neural Regen Res. 2025 Jan 1;20(1):234-241. doi: 10.4103/1673-5374.393103. Epub 2024 Jan 8.
ABSTRACT
Early identification and treatment of stroke can greatly improve patient outcomes and quality of life. Although clinical tests such as the Cincinnati Pre-hospital Stroke Scale (CPSS) and the Face Arm Speech Test (FAST) are commonly used for stroke screening, accurate administration depends on specialized training. In this study, we proposed a novel multimodal deep learning approach, based on the FAST, for assessing suspected stroke patients exhibiting symptoms such as limb weakness, facial paresis, and speech disorders in acute settings. We collected a dataset comprising videos and audio recordings of emergency room patients performing designated limb movements, facial expressions, and speech tests based on the FAST. We compared the constructed deep learning model, which was designed to process multi-modal datasets, with six prior models that achieved good action classification performance: I3D, SlowFast, X3D, TPN, TimeSformer, and MViT. We found that the findings of our deep learning model had higher clinical value compared with the other approaches. Moreover, the multi-modal model outperformed its single-module variants, highlighting the benefit of utilizing multiple types of patient data, such as action videos and speech audio. These results indicate that a multi-modal deep learning model combined with the FAST could greatly improve the accuracy and sensitivity of early stroke identification, providing a practical and powerful tool for assessing stroke patients in an emergency clinical setting.
PMID:38767488 | DOI:10.4103/1673-5374.393103
Authentication with a one-dimensional CNN model using EEG-based brain-computer interface
Comput Methods Biomech Biomed Engin. 2024 May 20:1-12. doi: 10.1080/10255842.2024.2355490. Online ahead of print.
ABSTRACT
Brain-computer interface (BCI) technology uses electroencephalogram (EEG) signals to create a direct interaction between the human body and its surroundings. Motor imagery (MI) classification using EEG signals is an important application that can help a rehabilitating or motor-impaired stroke patient perform certain tasks. Robust classification of these signals is an important step toward making the use of EEG more practical in many applications and less dependent on trained professionals. Deep learning methods have produced impressive results in BCI in recent years, especially with the availability of large EEG data sets. However, dealing with EEG-MI signals is difficult because noise and other signal sources can interfere with the brain's electrical activity, and limited generalization ability makes EEG classifiers hard to improve. To address these issues, this paper presents a methodology based on one-dimensional convolutional neural networks (1-D CNN) for recognizing four MI classes: right hand, left hand, feet, and a sedentary task. The proposed model is lightweight, with few parameters, and achieves an accuracy of 91.75%. The four output classes are then exploited in an innovative way: the classification output can serve as a password-like code, enabling people with disabilities who cannot enter a secret code manually to authenticate themselves. The same idea offers healthy users a unique authentication system that is more secure and less vulnerable to theft.
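The core operation of a 1-D CNN over multichannel EEG, a kernel sliding along the time axis across all channels at once, can be sketched in numpy; the layer sizes below are illustrative, not the paper's architecture:

```python
import numpy as np

def conv1d(x, kernels, stride=1):
    """One 1-D convolution layer over multichannel EEG, followed by ReLU
    (a numpy sketch of the core building block; a full 1-D CNN stacks
    several such layers with pooling and a classification head)."""
    c_out, c_in, k = kernels.shape
    assert x.shape[0] == c_in
    n = (x.shape[1] - k) // stride + 1
    out = np.zeros((c_out, n))
    for o in range(c_out):
        for i in range(n):
            out[o, i] = np.sum(kernels[o] * x[:, i * stride:i * stride + k])
    return np.maximum(out, 0)   # ReLU activation

rng = np.random.default_rng(0)
eeg = rng.normal(size=(8, 250))                   # 8 channels, 1 s at 250 Hz
feat = conv1d(eeg, rng.normal(size=(16, 8, 7)), stride=2)
print(feat.shape)   # -> (16, 122)
```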
PMID:38767327 | DOI:10.1080/10255842.2024.2355490
Ultra-flat bands at large twist angles in group-V twisted bilayer materials
J Chem Phys. 2024 May 21;160(19):194710. doi: 10.1063/5.0197757.
ABSTRACT
Flat bands in 2D twisted materials are key to the realization of correlation-related exotic phenomena. However, flat bands have typically been achieved in large systems with very small twist angles, which enormously increases computational and experimental complexity. In this work, we propose group-V twisted bilayer materials, including P, As, and Sb in the β phase, with large twist angles. The band structures of twisted bilayers containing up to 2524 atoms were investigated with the deep learning method DeepH, which significantly reduces the computational time. Our results show that the bandgap and the flat bandwidth of twisted bilayer β-P, β-As, and β-Sb decrease gradually with decreasing twist angle, and an ultra-flat band with bandwidth approaching 0 eV is achieved. Interestingly, we found that a twist angle of 9.43° is sufficient to achieve band flatness for β-As comparable to that of twisted bilayer graphene at the magic angle of 1.08°. Moreover, we find that the bandgap decreases with decreasing interlayer distance while the flat band is preserved, suggesting interlayer distance as an effective route to tune the bandgap of flat-band systems. Our research provides a feasible platform for exploring physical phenomena related to flat bands in twisted layered 2D materials.
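The computational cost of small twist angles follows from the moiré superlattice period L = a / (2 sin(θ/2)): a smaller angle means a much larger supercell. A quick numeric comparison of the two angles mentioned, using an illustrative lattice constant (not a value from the paper):

```python
import math

# Moire superlattice period L = a / (2 sin(theta/2)); smaller twist angles
# give much larger supercells, hence higher simulation cost.
a = 3.3  # monolayer lattice constant in angstrom (illustrative assumption)

def moire_period(theta_deg):
    return a / (2 * math.sin(math.radians(theta_deg) / 2))

# linear supercell scale factor between the magic angle and 9.43 degrees
ratio = moire_period(1.08) / moire_period(9.43)
print(round(ratio, 1))
```

With these numbers the 1.08° supercell period is roughly 8-9 times that of the 9.43° one, so the atom count grows by nearly two orders of magnitude, illustrating why achieving flatness at large angles matters.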
PMID:38767261 | DOI:10.1063/5.0197757
Transfer learning for anatomical structure segmentation in otorhinolaryngology microsurgery
Int J Med Robot. 2024 Jun;20(3):e2634. doi: 10.1002/rcs.2634.
ABSTRACT
BACKGROUND: Reducing the annotation burden is an active and meaningful area of artificial intelligence (AI) research.
METHODS: Multiple datasets for the segmentation of two landmarks were constructed based on 41 257 labelled images and 6 different microsurgical scenarios. These datasets were trained using the multi-stage transfer learning (TL) methodology.
RESULTS: Multi-stage TL enhanced segmentation performance over the baseline (mIOU 0.6892 vs. 0.8869). Moreover, convolutional neural networks (CNNs) maintained robust performance (mIOU 0.8917 vs. 0.8603) even when the training dataset was reduced from 90% (30 078 images) to 10% (3342 images) of its size. When the weights from one surgical scenario were applied directly to recognize the same target in images of other scenarios, without retraining, CNNs still achieved an mIOU of 0.6190 ± 0.0789.
CONCLUSIONS: Model performance can be improved with TL in datasets with reduced size and increased complexity. It is feasible for data-based domain adaptation among different microsurgical fields.
PMID:38767083 | DOI:10.1002/rcs.2634
Distinguishing between aldosterone-producing adenomas and non-functional adrenocortical adenomas using the YOLOv5 network
Acta Radiol. 2024 May 20:2841851241251446. doi: 10.1177/02841851241251446. Online ahead of print.
ABSTRACT
BACKGROUND: You Only Look Once version 5 (YOLOv5), a one-stage deep-learning (DL) algorithm for object detection and classification, offers high speed and accuracy for identifying targets.
PURPOSE: To investigate the feasibility of using the YOLOv5 algorithm to non-invasively distinguish between aldosterone-producing adenomas (APAs) and non-functional adrenocortical adenomas (NF-ACAs) on computed tomography (CT) images.
MATERIAL AND METHODS: A total of 235 patients who were diagnosed with ACAs between January 2011 and July 2022 were included in this study. Of these, 215 patients (81 [37.7%] with APAs and 134 [62.3%] with NF-ACAs) were randomly divided into either the training set or the validation set at a ratio of 9:1. The remaining 20 patients, including 8 (40.0%) with APAs and 12 (60.0%) with NF-ACAs, formed the testing set. Five submodels (YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x) of YOLOv5 were trained and evaluated on these datasets.
RESULTS: In the testing set, the mAP_0.5 value for YOLOv5x (0.988) was higher than the values for YOLOv5n (0.969), YOLOv5s (0.965), YOLOv5m (0.974), and YOLOv5l (0.983). The mAP_0.5:0.95 value for YOLOv5x (0.711) was also higher than the values for YOLOv5n (0.587), YOLOv5s (0.674), YOLOv5m (0.671), and YOLOv5l (0.698) in the testing set. The inference speed of YOLOv5n was 2.4 ms in the testing set, which was the fastest among the five submodels.
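The mAP_0.5 and mAP_0.5:0.95 metrics quoted here count a detection as correct when its intersection-over-union (IoU) with the ground-truth box exceeds a threshold (0.5, or a sweep from 0.5 to 0.95). The IoU itself is straightforward:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes, the overlap
    measure underlying the mAP_0.5 and mAP_0.5:0.95 scores."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# two 10x10 boxes overlapping by half their width share 1/3 of their union
print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))   # -> 0.333
```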
CONCLUSION: The YOLOv5 algorithm can accurately and efficiently distinguish between APAs and NF-ACAs on CT images, with YOLOv5x showing the best identification performance.
PMID:38767055 | DOI:10.1177/02841851241251446
Super-resolution based Nodule Localization in Thyroid Ultrasound Images through Deep Learning
Curr Med Imaging. 2024;20(1):e15734056269264. doi: 10.2174/0115734056269264240408080443.
ABSTRACT
BACKGROUND: Currently, it is difficult to solve the ill-posed inverse problem of restoring a high-resolution image from a single low-resolution image. Nature photography captures a wide variety of objects and textures, each with its own characteristics, most notably in the high-frequency components, and these qualities can be distinguished by examining the images.
OBJECTIVE: The goal is to develop an automated approach to identify thyroid nodules on ultrasound images. The aim of this research is to accurately differentiate thyroid nodules using Deep Learning Technique and to evaluate the effectiveness of different localization techniques.
METHODS: The method reconstructs a super-resolution image from a single input based on segmentation and classification. The poor-quality ultrasound image is divided into several segments, and the most applicable class is chosen for each one. Pairs of high- and low-resolution images belonging to the same class are then used to reconstruct the high-resolution version of each segment. A deep learning classifier trained with the Adam optimizer is used to identify carcinoid tumors within thyroid nodules. Measures such as localization accuracy, sensitivity, specificity, Dice loss, ROC, and area under the curve (AUC) are used to evaluate the effectiveness of the techniques.
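The segment-and-match idea described above can be sketched as a nearest-exemplar lookup: each low-resolution segment is matched against low-resolution exemplars and replaced by the paired high-resolution patch. The function name, distance measure, and toy data below are illustrative assumptions, not the paper's implementation.

```python
def reconstruct(segments, exemplar_pairs):
    """Example-based super-resolution sketch: for each low-resolution
    segment, find the closest low-resolution exemplar (by squared
    Euclidean distance) and substitute its paired high-resolution patch."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    out = []
    for seg in segments:
        lo, hi = min(exemplar_pairs, key=lambda pair: dist(seg, pair[0]))
        out.append(hi)
    return out
```

In practice the exemplar pairs would be drawn from the class assigned to each segment, so the search runs only over same-class patches rather than the whole dictionary.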
RESULTS: The results of the proposed method are superior, both statistically and qualitatively, to those of other state-of-the-art methods. The developed automated approach shows promising results in accurately identifying thyroid nodules on ultrasound images.
CONCLUSION: The research demonstrates an automated approach to identifying thyroid nodules in ultrasound images using single-image super-resolution reconstruction and deep learning. The results indicate that the proposed method surpasses state-of-the-art techniques in both accuracy and quality. This research contributes to the advancement of medical imaging and holds the potential to improve the diagnosis and treatment of thyroid nodules.
PMID:38766836 | DOI:10.2174/0115734056269264240408080443
Utilizing Immunoglobulin IgG4 Immunohistochemistry for Risk Stratification in Patients with Papillary Thyroid Carcinoma Associated with Hashimoto Thyroiditis
Endocrinol Metab (Seoul). 2024 May 20. doi: 10.3803/EnM.2024.1923. Online ahead of print.
ABSTRACT
BACKGROUND: Hashimoto thyroiditis (HT) is suspected to correlate with papillary thyroid carcinoma (PTC) development. While some HT cases exhibit histologic features of immunoglobulin G4 (IgG4)-related disease, the relationship of HT with PTC progression remains unestablished.
METHODS: This cross-sectional study included 426 adult patients with PTC (≥1 cm) undergoing thyroidectomy at an academic thyroid center. HT was identified based on its typical histologic features. IgG4 and IgG immunohistochemistry were performed. Whole-slide images of immunostained slides were digitized. Positive plasma cells per 2 mm2 were counted using QuPath and a pre-trained deep learning model. The primary outcome was tumor structural recurrence post-surgery.
RESULTS: Among the 426 PTC patients, 79 were diagnosed with HT. With a 40% IgG4-positive/IgG plasma cell ratio as the threshold for diagnosing IgG4-related disease, a cutoff value of >150 IgG4-positive plasma cells per 2 mm2 was established. According to this criterion, 53% (43/79) of HT patients were classified as IgG4-related. The IgG4-related HT subgroup presented a more advanced cancer stage than the IgG4-non-related HT group (P=0.038). The median observation period was 109 months (range, 6 to 142). Initial assessment revealed 43 recurrence cases. Recurrence-free survival periods showed significant (P=0.023) differences, with patients with IgG4-non-related HT showing the longest period, followed by patients without HT and those with IgG4-related HT.
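The count-based criterion above reduces to a simple threshold check per case. A minimal sketch, assuming the >150 cells per 2 mm2 cutoff is applied directly as the classification rule (the function name and labels are illustrative; the ratio is returned only for reference, since the abstract uses the 40% ratio to derive the count cutoff rather than as a second check):

```python
def classify_ht(igg4_per_2mm2, igg_per_2mm2):
    """Label an HT case by the abstract's cutoff: more than 150
    IgG4-positive plasma cells per 2 mm^2 -> IgG4-related.
    Also returns the IgG4/IgG ratio for reference."""
    ratio = igg4_per_2mm2 / igg_per_2mm2 if igg_per_2mm2 else 0.0
    label = "IgG4-related" if igg4_per_2mm2 > 150 else "IgG4-non-related"
    return label, ratio
```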
CONCLUSION: This study effectively stratified recurrence risk in PTC patients based on HT status and IgG4-related subtypes. These findings may contribute to better-informed treatment decisions and patient care strategies.
PMID:38766717 | DOI:10.3803/EnM.2024.1923
Automatic assessment of bowel preparation by an artificial intelligence model and its clinical applicability
J Gastroenterol Hepatol. 2024 May 20. doi: 10.1111/jgh.16618. Online ahead of print.
ABSTRACT
BACKGROUND AND AIM: Reliable bowel preparation assessment is important in colonoscopy. However, current scoring systems are limited by laborious and time-consuming tasks and interobserver variability. We aimed to develop an artificial intelligence (AI) model to assess bowel cleanliness and evaluate its clinical applicability.
METHODS: A still image-driven AI model to assess the Boston Bowel Preparation Scale (BBPS) was developed and validated using 2361 colonoscopy images. For evaluating real-world applicability, the model was validated using 113 10-s colonoscopy video clips and 30 full colonoscopy videos to identify "adequate (BBPS 2-3)" or "inadequate (BBPS 0-1)" preparation. The model was tested with an external dataset of 29 colonoscopy videos. The clinical applicability of the model was evaluated using 225 consecutive colonoscopies. Inter-rater variability was analyzed between the AI model and endoscopists.
RESULTS: The AI model achieved an accuracy of 94.0% and an area under the receiver operating characteristic curve of 0.939 with the still images. Model testing with an external dataset showed an accuracy of 95.3%, an area under the receiver operating characteristic curve of 0.976, and a sensitivity of 100% for the detection of inadequate preparations. The clinical applicability study showed an overall agreement rate of 85.3% between endoscopists and the AI model, with Fleiss' kappa of 0.686. The agreement rate was lower for the right colon compared with the transverse and left colon, with Fleiss' kappa of 0.563, 0.575, and 0.789, respectively.
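The Fleiss' kappa values reported above measure chance-corrected agreement among multiple raters. A minimal self-contained sketch of the statistic, assuming a subjects-by-categories count matrix (the function and variable names are illustrative, not from the study):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories count matrix.

    counts[i][j] = number of raters who assigned subject i to category j.
    Every row must sum to the same number of raters n.
    """
    N = len(counts)        # number of subjects
    n = sum(counts[0])     # raters per subject
    k = len(counts[0])     # number of categories

    # Per-subject observed agreement P_i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N

    # Chance agreement P_e from the marginal category proportions
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)

    return (P_bar - P_e) / (1 - P_e)
```

On the conventional Landis-Koch scale, the overall kappa of 0.686 reported above falls in the "substantial agreement" band (0.61-0.80), while the right-colon value of 0.563 is only "moderate".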
CONCLUSIONS: The AI model demonstrated accurate bowel preparation assessment and substantial agreement with endoscopists. Further refinement of the AI model is warranted for effective monitoring of qualified colonoscopy in large-scale screening programs.
PMID:38766682 | DOI:10.1111/jgh.16618