Deep learning
A hierarchical deep learning approach for diagnosing impacted canine-induced root resorption via cone-beam computed tomography
BMC Oral Health. 2024 Aug 23;24(1):982. doi: 10.1186/s12903-024-04718-4.
ABSTRACT
OBJECTIVES: Canine-induced root resorption (CIRR) is caused by impacted canines, and cone-beam computed tomography (CBCT) images have been shown to be more accurate in diagnosing CIRR than panoramic and periapical radiographs, with reported AUCs of 0.95, 0.49, and 0.57, respectively. The aim of this study was to use deep learning to automate the diagnosis of CIRR in maxillary incisors from CBCT images.
METHODS: A total of 50 CBCT images and 176 incisors were selected for the present study. The maxillary incisors were manually segmented and labeled from the CBCT images by two independent radiologists as either healthy or affected by root resorption induced by the impacted canines. We used five different strategies for training the model: (A) classification using 3D ResNet50 (baseline); (B) classification of the segmented masks produced by a 3D U-Net pretrained on 3D MNIST; (C) training a 3D U-Net for the segmentation task and using its outputs for classification; (D) pretraining a 3D U-Net on the segmentation task and transferring the whole model to classification; and (E) pretraining a 3D U-Net on the segmentation task and fine-tuning only the model encoder for classification. The segmentation models were evaluated using the mean intersection over union (mIoU) and the Dice similarity coefficient (DSC). The classification models were evaluated in terms of classification accuracy, precision, recall, and F1 score.
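As a rough illustration of the evaluation metrics named above (not the authors' code), a minimal PyTorch sketch of IoU and DSC for binary 3D masks; mIoU is simply this IoU averaged over images or classes:

```python
# Minimal sketch: IoU and Dice (DSC) for binary 3D segmentation masks.
import torch

def iou_and_dice(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """pred, target: binary {0,1} tensors of shape (D, H, W)."""
    pred, target = pred.bool(), target.bool()
    inter = (pred & target).sum().float()
    union = (pred | target).sum().float()
    iou = (inter + eps) / (union + eps)          # mIoU = mean of this over cases
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    return iou.item(), dice.item()

# Example: perfect overlap gives IoU = Dice = 1.0
mask = torch.ones(8, 8, 8, dtype=torch.uint8)
print(iou_and_dice(mask, mask))
```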
RESULTS: The segmentation model achieved an mIoU of 0.641 and a DSC of 0.901, indicating good performance in segmenting tooth structures from the CBCT images. For the main classification task of detecting CIRR, Model C (classification of the segmented masks using 3D ResNet) and Model E (pretraining on segmentation followed by fine-tuning for classification) performed best, both achieving 82% classification accuracy and F1 scores of 0.62 on the test set. These results demonstrate the effectiveness of the proposed hierarchical, data-efficient deep learning approaches in improving the accuracy of automated CIRR diagnosis from limited CBCT data compared with the 3D ResNet baseline.
CONCLUSION: The proposed approaches are effective at improving the accuracy of classification tasks and are helpful when the diagnosis is based on the volume and boundaries of an object. While the study demonstrated promising results, future studies with larger sample sizes are required to validate the effectiveness of the proposed method in enhancing medical image classification tasks.
PMID:39180070 | DOI:10.1186/s12903-024-04718-4
MSH-DTI: multi-graph convolution with self-supervised embedding and heterogeneous aggregation for drug-target interaction prediction
BMC Bioinformatics. 2024 Aug 23;25(1):275. doi: 10.1186/s12859-024-05904-5.
ABSTRACT
BACKGROUND: The rise of network pharmacology has led to the widespread use of network-based computational methods in predicting drug-target interactions (DTI). However, existing DTI prediction models typically rely on a limited amount of data to extract drug and target features, potentially affecting the comprehensiveness and robustness of those features. In addition, although multiple networks are used for DTI prediction, the integration of heterogeneous information often involves simplistic aggregation and attention mechanisms, which may impose certain limitations.
RESULTS: MSH-DTI, a deep learning model for predicting drug-target interactions, is proposed in this paper. The model uses self-supervised learning to obtain drug and target structure features. A Heterogeneous Interaction-enhanced Feature Fusion Module is designed for multi-graph construction, and graph convolutional networks are used to extract node features. With the help of an attention mechanism, the model focuses on the important parts of the different features for prediction. Experimental results show that the AUROC and AUPR of MSH-DTI are 0.9620 and 0.9605, respectively, outperforming other models on the DTINet dataset.
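For readers unfamiliar with attention-based fusion of heterogeneous features, a minimal sketch of the general idea follows; the module name, dimensions, and weighting scheme are illustrative assumptions, not MSH-DTI's implementation:

```python
# Hypothetical sketch: attention-weighted fusion of several feature "views"
# of a node (e.g., features derived from different graphs).
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # scores each view

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (num_views, batch, dim)
        weights = torch.softmax(self.score(views), dim=0)  # (views, batch, 1)
        return (weights * views).sum(dim=0)                # (batch, dim)

fusion = AttentiveFusion(dim=128)
views = torch.randn(3, 32, 128)   # 3 graph-derived views, batch of 32 nodes
fused = fusion(views)
print(fused.shape)                # torch.Size([32, 128])
```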
CONCLUSION: The proposed MSH-DTI is a helpful tool for discovering drug-target interactions, as further validated through case studies predicting new DTIs.
PMID:39179993 | DOI:10.1186/s12859-024-05904-5
Leaf rolling detection in maize under complex environments using an improved deep learning method
Plant Mol Biol. 2024 Aug 23;114(5):92. doi: 10.1007/s11103-024-01491-4.
ABSTRACT
Leaf rolling is a common adaptive response that plants have evolved to counteract the detrimental effects of various environmental stresses. Insight into the mechanisms underlying leaf rolling offers researchers a unique opportunity to enhance stress tolerance in crops that exhibit leaf rolling, such as maize. To achieve a more profound understanding of leaf rolling, it is imperative to ascertain the occurrence and extent of this phenotype. While traditional manual leaf rolling detection is slow and laborious, research into high-throughput detection methods within our investigation scope remains limited. In this study, we present an approach for detecting leaf rolling in maize based on the YOLOv8 model. Our method, LRD-YOLO, integrates two significant improvements: a Convolutional Block Attention Module (CBAM) to augment feature extraction capabilities, and Deformable ConvNets v2 to enhance adaptability to changes in target shape and scale. In experiments on a dataset encompassing severe occlusion, variations in leaf scale and shape, and complex background scenarios, our approach achieves an impressive mean average precision of 81.6%, surpassing current state-of-the-art methods. Furthermore, the LRD-YOLO model requires only 8.0 G floating-point operations and 3.48 M parameters. We have proposed an innovative method for leaf rolling detection in maize, and experimental outcomes showcase the efficacy of LRD-YOLO in precisely detecting leaf rolling in complex scenarios while maintaining real-time inference speed.
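The CBAM referenced above is a published, general-purpose attention module; a compact PyTorch sketch of its standard channel-plus-spatial design (illustrative, not the LRD-YOLO code):

```python
# CBAM sketch: channel attention via a shared MLP over pooled descriptors,
# then spatial attention via a conv over stacked channel-wise maps.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # channel attention: shared MLP over avg- and max-pooled descriptors
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # spatial attention: conv over stacked channel-wise avg/max maps
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

feat = torch.randn(2, 64, 32, 32)
print(CBAM(64)(feat).shape)  # torch.Size([2, 64, 32, 32])
```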
PMID:39179745 | DOI:10.1007/s11103-024-01491-4
A novel image semantic communication method via dynamic decision generation network and generative adversarial network
Sci Rep. 2024 Aug 23;14(1):19636. doi: 10.1038/s41598-024-70619-9.
ABSTRACT
Effectively compressing transmitted images and reducing the distortion of reconstructed images are key challenges in image semantic communication. This paper proposes a novel image semantic communication model that integrates a dynamic decision generation network and a generative adversarial network to address these challenges as efficiently as possible. At the transmitter, features are extracted and selected based on the channel's signal-to-noise ratio (SNR) using semantic encoding and a dynamic decision generation network. This semantic approach can effectively compress transmitted images, thereby reducing communication traffic. At the receiver, the generator/decoder collaborates with the discriminator network, enhancing image reconstruction quality through adversarial and perceptual losses. Experimental results on the CIFAR-10 dataset demonstrate that our scheme achieves a peak signal-to-noise ratio (PSNR) of 26 dB, a structural similarity of 0.9, and a compression ratio (CR) of 81.5% in an AWGN channel with an SNR of 3 dB. Similarly, in the Rayleigh fading channel, the PSNR is 23 dB, the structural similarity is 0.8, and the CR is 80.5%. The learned perceptual image patch similarity in both channels is below 0.008. These experiments thoroughly demonstrate that the proposed semantic communication scheme is a superior deep learning-based joint source-channel coding method, offering a high CR and low distortion of reconstructed images.
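For orientation, the reported PSNR and CR can be computed with the standard definitions sketched below; this is generic code with toy data, and reading CR as the fraction of bits saved is an assumption, not taken from the paper:

```python
# Standard PSNR between original and reconstruction, plus a simple
# bits-saved reading of compression ratio (CR).
import numpy as np

def psnr(orig: np.ndarray, recon: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((orig.astype(np.float64) - recon.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def compression_ratio(raw_bits: int, sent_bits: int) -> float:
    return 1 - sent_bits / raw_bits   # e.g., 0.815 -> 81.5% of bits saved

orig = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
noisy = np.clip(orig + np.random.normal(0, 5, orig.shape), 0, 255)
print(round(psnr(orig, noisy), 1), compression_ratio(24576, 4546))
```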
PMID:39179724 | DOI:10.1038/s41598-024-70619-9
Accuracy of vestibular schwannoma segmentation using deep learning models - a systematic review & meta-analysis
Neuroradiology. 2024 Aug 24. doi: 10.1007/s00234-024-03449-1. Online ahead of print.
ABSTRACT
Vestibular Schwannoma (VS) is a rare tumor with varied incidence rates, predominantly affecting the 60-69 age group. In the era of artificial intelligence (AI), deep learning (DL) algorithms show promise in automating diagnosis. However, a knowledge gap exists in the automated segmentation of VS using DL. To address this gap, this meta-analysis aims to provide insights into the current state of DL algorithms applied to MR images of VS.
METHODOLOGY: Following 2020 PRISMA guidelines, a search across four databases was conducted. Inclusion criteria focused on articles using DL for VS MR image segmentation. The primary metric was the Dice score, supplemented by relative volume error (RVE) and average symmetric surface distance (ASSD).
RESULTS: The search process identified 752 articles, leading to 11 studies for meta-analysis. A QUADAS-2 analysis revealed varying biases. The overall Dice score across 56 models was 0.89 (CI: 0.88-0.90), with high heterogeneity (I² = 95.9%). Subgroup analyses based on DL architecture, MRI inputs, and testing set sizes revealed performance variations. 2.5D DL networks demonstrated comparable efficacy to 3D networks. Imaging input analyses highlighted the superiority of contrast-enhanced T1-weighted imaging and mixed MRI inputs.
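Pooled estimates and I² of this kind typically come from a random-effects model; a minimal DerSimonian-Laird sketch with invented inputs (not the review's data):

```python
# DerSimonian-Laird random-effects pooling of per-model Dice scores.
import numpy as np

def dl_pool(effects: np.ndarray, variances: np.ndarray):
    w = 1 / variances
    fixed = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - fixed) ** 2)          # Cochran's Q
    k = len(effects)
    tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_re = 1 / (variances + tau2)                   # random-effects weights
    pooled = np.sum(w_re * effects) / np.sum(w_re)
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, i2

dice = np.array([0.87, 0.91, 0.89, 0.92])           # toy per-study effects
var = np.array([0.0004, 0.0002, 0.0003, 0.0002])    # toy variances
print(dl_pool(dice, var))                           # pooled Dice, I^2 (%)
```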
DISCUSSION: This study fills a gap in the systematic review of automated VS segmentation using DL techniques. Despite promising results, limitations include publication bias and high heterogeneity. Future research should focus on standardized designs, larger testing sets, and addressing biases for more reliable results. DL shows promising efficacy in VS diagnosis; however, further validation and standardization are needed.
CONCLUSION: In conclusion, this meta-analysis provides a comprehensive review of the current landscape of automated VS segmentation using DL. The high Dice score indicates promising segmentation agreement, yet challenges such as bias and heterogeneity must be addressed in future research.
PMID:39179652 | DOI:10.1007/s00234-024-03449-1
Evaluation of perceived urgency from single-trial EEG data elicited by upper-body vibration feedback using deep learning
Sci Rep. 2024 Aug 23;14(1):19604. doi: 10.1038/s41598-024-70508-1.
ABSTRACT
Notification systems that convey urgency without adding cognitive burden are crucial in human-computer interaction. Haptic feedback systems, particularly those utilizing vibration feedback, have emerged as a compelling solution, capable of providing desirable levels of urgency depending on the application. High-risk applications require an evaluation of the urgency level elicited during critical notifications. Traditional evaluations of perceived urgency rely on subjective self-reporting and performance metrics, which, while useful, are not real-time and can distract from the task at hand. In contrast, EEG technology offers a direct, non-intrusive method of assessing the user's cognitive state. Leveraging deep learning, this study introduces a novel approach to evaluating perceived urgency from single-trial EEG data induced by vibration stimuli on the upper body, utilizing our newly collected urgency-via-vibration dataset. The proposed model combines a 2D convolutional neural network with a temporal convolutional network to capture spatial and temporal EEG features, outperforming several established EEG models. It achieves an average classification accuracy of 83% through leave-one-subject-out cross-validation across three urgency classes (not urgent, urgent, and very urgent) from a single trial of EEG data. Furthermore, explainability analysis showed that the prefrontal brain region, followed by the central brain region, is the most influential in predicting the urgency level. A follow-up neural statistical analysis revealed an increase in event-related synchronization (ERS) in the theta frequency band (4-7 Hz) with increasing urgency, which is associated with high arousal and attention in the neuroscience literature. A limitation of this study is that the proposed model's performance was tested only on the urgency-via-vibration dataset, which may affect the generalizability of the findings.
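A hedged sketch of the general 2D-CNN-plus-temporal-convolution pattern described; layer sizes, electrode count, and trial length are invented, and this is not the authors' architecture:

```python
# 2D conv stage over the (electrode x time) plane, then dilated temporal
# convs over the remaining time axis, ending in a 3-class urgency head.
import torch
import torch.nn as nn

class CnnTcn(nn.Module):
    def __init__(self, n_electrodes=32, n_classes=3, feat=16):
        super().__init__()
        # input: (batch, 1, electrodes, time); kernel collapses electrodes
        self.cnn = nn.Sequential(
            nn.Conv2d(1, feat, kernel_size=(n_electrodes, 1)),
            nn.BatchNorm2d(feat), nn.ELU())
        self.tcn = nn.Sequential(
            nn.Conv1d(feat, feat, 3, padding=2, dilation=2), nn.ELU(),
            nn.Conv1d(feat, feat, 3, padding=4, dilation=4), nn.ELU())
        self.head = nn.Linear(feat, n_classes)

    def forward(self, x):
        z = self.cnn(x).squeeze(2)      # (batch, feat, time)
        z = self.tcn(z).mean(dim=2)     # global average over time
        return self.head(z)             # logits: not/urgent/very urgent

eeg = torch.randn(8, 1, 32, 256)        # a batch of single-trial EEG
print(CnnTcn()(eeg).shape)               # torch.Size([8, 3])
```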
PMID:39179642 | DOI:10.1038/s41598-024-70508-1
The use of artificial neural networks in studying the progression of glaucoma
Sci Rep. 2024 Aug 23;14(1):19597. doi: 10.1038/s41598-024-70748-1.
ABSTRACT
In ophthalmology, artificial intelligence methods show great promise due to their potential to enhance clinical observations with predictive capabilities and to support physicians in diagnosing and treating patients. This paper focuses on modelling glaucoma evolution because the disease requires early diagnosis, individualized treatment, and lifelong monitoring. Glaucoma is a chronic, progressive, irreversible, multifactorial optic neuropathy that primarily affects elderly individuals. It is important to emphasize that the processed data are taken from medical records, unlike other studies in the literature that rely on image acquisition and processing. Although more challenging to handle, this approach has the advantage of including a wide range of parameters in large numbers, which can highlight their potential influence. Artificial neural networks are used to study glaucoma progression, designed through successive trials toward near-optimal configurations using the NeuroSolutions and PyTorch frameworks. Furthermore, different problems are formulated to demonstrate the influence of various structural and functional parameters on glaucoma progression. Optimal neural networks were obtained with a program written in Python on top of the PyTorch deep learning framework. For various tasks, very small training and validation errors, under 5%, were obtained. It has been demonstrated that very good results can be achieved, making the models credible and potentially useful for medical practice.
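As a generic illustration of the kind of setup described (a PyTorch network on tabular medical-record parameters), a small MLP might look like the following; the feature count and target are invented for the example:

```python
# Toy MLP on tabular structural/functional parameters predicting a
# glaucoma progression target; sizes and target are hypothetical.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(12, 32), nn.ReLU(),   # 12 record-derived parameters
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1))               # e.g., a progression index

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(64, 12)             # a batch of patient records
y = torch.randn(64, 1)              # the progression target
for _ in range(100):                # minimal training loop
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print(f"training loss: {loss.item():.4f}")
```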
PMID:39179625 | DOI:10.1038/s41598-024-70748-1
Sexually dimorphic computational histopathological signatures prognostic of overall survival in high-grade gliomas via deep learning
Sci Adv. 2024 Aug 23;10(34):eadi0302. doi: 10.1126/sciadv.adi0302. Epub 2024 Aug 23.
ABSTRACT
High-grade glioma (HGG) is an aggressive brain tumor. Sex is an important factor that differentially affects survival outcomes in HGG. We used an end-to-end deep learning approach on hematoxylin and eosin (H&E) scans to (i) identify sex-specific histopathological attributes of the tumor microenvironment (TME) and (ii) create sex-specific risk profiles to prognosticate overall survival. Surgically resected H&E-stained tissue slides were analyzed in a two-stage approach using ResNet18 deep learning models, first to segment the viable tumor regions and second to build sex-specific prognostic models for prediction of overall survival. Our mResNet-Cox model yielded C-indices of 0.696, 0.736, 0.731, and 0.729 for the female cohort and 0.729, 0.738, 0.724, and 0.696 for the male cohort across the training cohort and three independent validation cohorts, respectively. End-to-end deep learning approaches using routine H&E-stained slides, trained separately on male and female patients with HGG, may allow for identifying sex-specific histopathological attributes of the TME associated with survival and, ultimately, building patient-centric prognostic risk assessment models.
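The C-index reported for such survival models is typically Harrell's concordance index; a from-scratch sketch of its standard definition under right-censoring, with toy data rather than the study's:

```python
# Harrell's C-index: fraction of comparable patient pairs in which the
# predicted risk ordering matches the observed survival ordering.
import numpy as np

def c_index(time: np.ndarray, event: np.ndarray, risk: np.ndarray) -> float:
    """event=1 if death observed; higher `risk` should mean shorter survival."""
    num, den = 0.0, 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # comparable pair: i has the earlier *observed* event
            if event[i] == 1 and time[i] < time[j]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den

t = np.array([5.0, 8.0, 3.0, 10.0])
e = np.array([1, 1, 0, 1])
r = np.array([0.9, 0.4, 0.7, 0.1])
print(c_index(t, e, r))  # 1.0: risk ordering matches survival perfectly
```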
PMID:39178259 | DOI:10.1126/sciadv.adi0302
aiSEGcell: User-friendly deep learning-based segmentation of nuclei in transmitted light images
PLoS Comput Biol. 2024 Aug 23;20(8):e1012361. doi: 10.1371/journal.pcbi.1012361. eCollection 2024 Aug.
ABSTRACT
Segmentation is required to quantify cellular structures in microscopic images, which typically requires their fluorescent labeling. Convolutional neural networks (CNNs) can detect these structures in transmitted light images alone. This eliminates the need for transgenic or dye-based fluorescent labeling, frees up imaging channels, reduces phototoxicity, and speeds up imaging. However, this approach currently requires optimized experimental conditions and computational specialists. Here, we introduce "aiSEGcell", a user-friendly CNN-based software tool to segment nuclei and cells in bright field images. We extensively evaluated it for nucleus segmentation in different primary cell types in 2D cultures, across different imaging modalities, in hand-curated published and novel imaging data sets. We provide this curated ground-truth data comprising 1.1 million nuclei in 20,000 images. aiSEGcell accurately segments nuclei from even challenging bright field images, very similar to manual segmentation. It retains biologically relevant information, e.g., for demanding quantification of noisy biosensors reporting signaling pathway activity dynamics. aiSEGcell is readily adaptable to new use cases, with only 32 images required for retraining. It is accessible through both a command line and a napari graphical user interface, is agnostic to computational environments, and does not require expert coding experience from its users.
PMID:39178193 | DOI:10.1371/journal.pcbi.1012361
VT-3DCapsNet: Visual tempos 3D-Capsule network for video-based facial expression recognition
PLoS One. 2024 Aug 23;19(8):e0307446. doi: 10.1371/journal.pone.0307446. eCollection 2024.
ABSTRACT
Facial expression recognition (FER) is a hot topic in computer vision, especially as deep learning-based methods gain traction in this field. However, traditional convolutional neural networks (CNNs) ignore the relative positional relationships of key facial features (mouth, eyebrows, eyes, etc.) under the changes facial expressions undergo in real-world environments, such as rotation, displacement, or partial occlusion. In addition, most works in the literature do not take visual tempos into account when recognizing facial expressions that possess high similarity. To address these issues, we propose a visual tempos 3D-CapsNet framework (VT-3DCapsNet). First, we propose a 3D-CapsNet model for emotion recognition, in which we introduce an improved 3D-ResNet architecture integrated with an AU-perceived attention module to enhance the feature representation ability of the capsule network, expressing deeper hierarchical spatiotemporal features and extracting latent information (position, size, orientation) in key facial areas. Furthermore, we propose a temporal pyramid network (TPN)-based expression recognition module (TPN-ERM), which can learn high-level facial motion features from video frames to model differences in visual tempos, further improving the recognition accuracy of 3D-CapsNet. Extensive experiments are conducted on the extended Cohn-Kanade (CK+) database and the Acted Facial Expressions in the Wild (AFEW) database. The results demonstrate competitive performance of our approach compared with other state-of-the-art methods.
PMID:39178187 | DOI:10.1371/journal.pone.0307446
Classification of Multi-Parametric Body MRI Series Using Deep Learning
IEEE J Biomed Health Inform. 2024 Aug 23;PP. doi: 10.1109/JBHI.2024.3448373. Online ahead of print.
ABSTRACT
Multi-parametric magnetic resonance imaging (mpMRI) exams have various series types acquired with different imaging protocols. The DICOM headers of these series often contain incorrect information due to the sheer diversity of protocols and occasional technologist errors. To address this, we present a deep learning-based classification model to classify 8 different body mpMRI series types so that radiologists can read the exams efficiently. Using mpMRI data from various institutions, multiple deep learning-based classifiers from the ResNet, EfficientNet, and DenseNet families are trained to classify the 8 MRI series types, and their performance is compared. The best-performing classifier is then identified, and its classification capability under different training data quantities is studied. The model is also evaluated on out-of-training-distribution datasets. Moreover, the model is trained using mpMRI exams obtained from different scanners under two training strategies, and its performance is tested. Experimental results show that the DenseNet-121 model achieves the highest F1-score and accuracy (0.966 and 0.972) over the other classification models (p < 0.05). The model shows greater than 0.95 accuracy when trained with over 729 studies, and its performance improves as the training data quantity grows. On the external DLDS and CPTAC-UCEC datasets, the model yields accuracies of 0.872 and 0.810, respectively. These results indicate that on both internal and external datasets, the DenseNet-121 model attains high accuracy for the task of classifying 8 body MRI series types.
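Adapting torchvision's DenseNet-121 to an 8-way series classifier can be sketched as follows; the single-channel input stem and slice size are assumptions, not the paper's preprocessing:

```python
# DenseNet-121 with a 1-channel input stem and an 8-class head.
import torch
import torch.nn as nn
from torchvision import models

model = models.densenet121(weights=None)                 # or pretrained weights
model.features.conv0 = nn.Conv2d(1, 64, kernel_size=7,   # grayscale MRI input
                                 stride=2, padding=3, bias=False)
model.classifier = nn.Linear(model.classifier.in_features, 8)  # 8 series types

slices = torch.randn(4, 1, 224, 224)   # a batch of MRI slices
logits = model(slices)
print(logits.shape)                     # torch.Size([4, 8])
```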
PMID:39178097 | DOI:10.1109/JBHI.2024.3448373
Advancing Bioactivity Prediction through Molecular Docking and Self-Attention
IEEE J Biomed Health Inform. 2024 Aug 23;PP. doi: 10.1109/JBHI.2024.3448455. Online ahead of print.
ABSTRACT
Bioactivity refers to the ability of a substance to induce biological effects within living systems, often describing the influence of molecules, drugs, or chemicals on organisms. In drug discovery, predicting bioactivity streamlines early-stage candidate screening by swiftly identifying potentially active molecules. Popular deep learning methods for bioactivity prediction primarily model the ligand structure-bioactivity relationship under the premise of the Quantitative Structure-Activity Relationship (QSAR). However, bioactivity is determined by multiple factors, including not only the ligand structure but also drug-target interactions, signaling pathways, reaction environments, pharmacokinetic properties, and species differences. Our study is the first to integrate drug-target interactions into bioactivity prediction, using protein-ligand complex data from molecular docking. We devise a Drug-Target Interaction Graph Neural Network (DTIGN), infusing interatomic forces into intermolecular graphs. DTIGN employs multi-head self-attention to identify native-like binding pockets and poses within molecular docking results. To validate the fidelity of the self-attention mechanism, we gather ground-truth data from crystal structure databases. Subsequently, we employ these limited native structures to refine bioactivity prediction via semi-supervised learning. For this study, we establish a unique benchmark dataset for evaluating bioactivity prediction models in the context of protein-ligand complexes, showcasing the superior performance of our method (an average improvement of 27.03%) through comparison with 9 leading deep learning-based bioactivity prediction methods.
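A minimal sketch of multi-head self-attention over candidate docking-pose embeddings, using PyTorch's built-in module; the dimensions and pose count are invented, and DTIGN's actual graph construction is more involved:

```python
# Self-attention across pose embeddings; the attention weights indicate
# which poses the model attends to, analogous to picking native-like poses.
import torch
import torch.nn as nn

dim, poses = 64, 10
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)

pose_emb = torch.randn(2, poses, dim)             # (batch, poses, features)
out, weights = attn(pose_emb, pose_emb, pose_emb)  # self-attention
print(out.shape, weights.shape)  # (2, 10, 64) and (2, 10, 10)
```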
PMID:39178096 | DOI:10.1109/JBHI.2024.3448455
Low-light phase retrieval with implicit generative priors
IEEE Trans Image Process. 2024 Aug 23;PP. doi: 10.1109/TIP.2024.3445739. Online ahead of print.
ABSTRACT
Phase retrieval (PR) is fundamentally important in scientific imaging and is crucial for nanoscale techniques like coherent diffractive imaging (CDI). Low radiation dose imaging is essential for applications involving radiation-sensitive samples. However, most PR methods struggle in low-dose scenarios due to high shot noise. Recent advancements in optical data acquisition setups, such as in-situ CDI, have shown promise for low-dose imaging, but they rely on a time series of measurements, making them unsuitable for single-image applications. Similarly, data-driven phase retrieval techniques are not easily adaptable to data-scarce situations. Zero-shot deep learning methods based on pre-trained and implicit generative priors have been effective in various imaging tasks but have shown limited success in PR. In this work, we propose low-dose deep image prior (LoDIP), which combines in-situ CDI with the power of implicit generative priors to address single-image low-dose phase retrieval. Quantitative evaluations demonstrate LoDIP's superior performance in this task and its applicability to real experimental scenarios.
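The implicit-generative-prior idea underlying such zero-shot methods can be sketched as fitting an untrained CNN so that the Fourier magnitude of its output matches the measurements; this is a generic deep-image-prior sketch, not LoDIP's in-situ formulation:

```python
# Deep image prior for phase retrieval: optimize network weights so the
# output's Fourier magnitudes match the (magnitude-only) measurement.
import torch
import torch.nn as nn

net = nn.Sequential(                       # untrained CNN as implicit prior
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

target = torch.rand(1, 1, 64, 64)             # stand-in ground-truth image
measured = torch.abs(torch.fft.fft2(target))  # magnitude-only measurement

z = torch.randn(1, 1, 64, 64)                 # fixed random input
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(200):
    opt.zero_grad()
    est = net(z)
    loss = torch.mean((torch.abs(torch.fft.fft2(est)) - measured) ** 2)
    loss.backward()
    opt.step()
print(f"magnitude misfit after fitting: {loss.item():.4f}")
```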
PMID:39178091 | DOI:10.1109/TIP.2024.3445739
A Semantic-Consistent Few-Shot Modulation Recognition Framework for IoT Applications
IEEE Trans Neural Netw Learn Syst. 2024 Aug 23;PP. doi: 10.1109/TNNLS.2024.3441597. Online ahead of print.
ABSTRACT
The rapid growth of the Internet of Things (IoT) has led to the widespread adoption of IoT networks in numerous digital applications. To counter physical threats in these systems, automatic modulation classification (AMC) has emerged as an effective approach for identifying the modulation format of signals in noisy environments. However, identifying those threats can be particularly challenging due to the scarcity of labeled data, a common issue in various IoT applications such as anomaly detection for unmanned aerial vehicles (UAVs) and intrusion detection in IoT networks. Few-shot learning (FSL) offers a promising solution by enabling models to grasp the concepts of new classes using only a limited number of labeled samples. Yet prevalent FSL techniques are primarily tailored to tasks in the computer vision domain and are not suitable for the wireless signal domain. Instead of designing a new FSL model, this work suggests a novel approach that enhances wireless signals so they can be processed more efficiently by existing state-of-the-art (SOTA) FSL models. We present the semantic-consistent signal pretransformation (ScSP), a parameterized transformation architecture that ensures signals with identical semantics exhibit similar representations. ScSP is designed to integrate seamlessly with various SOTA FSL models for signal modulation recognition and supports commonly used deep learning backbones. Our evaluation indicates that ScSP boosts the performance of numerous SOTA FSL models while preserving flexibility.
PMID:39178083 | DOI:10.1109/TNNLS.2024.3441597
Public Behavior and Emotion Correlation Mining Driven by Aspect From News Corpus
IEEE Trans Neural Netw Learn Syst. 2024 Aug 23;PP. doi: 10.1109/TNNLS.2024.3441011. Online ahead of print.
ABSTRACT
Emotion motivates behavior. Investigating the correlation between behavior and emotion, an often overlooked perspective, plays a significant role in uncovering the underlying motives behind behaviors and the intrinsic cause-effects of social events. This article proposes a methodology for mining the correlation between public behavior and emotion using daily news data. Initially, aspect-emotion-reaction (A-E-R) triplets are extracted and generalized, encompassing both explicit and implicit patterns. Then, a knowledge representation model based on hypothetical context (KRHC) with a self-reflection mechanism is proposed to uncover implicit relationships between emotion and behavior through attention mechanisms. By combining rule-based methods for explicit relationships with deep learning for implicit ones, an understanding of emotion-behavior patterns is achieved. In this study, behaviors are divided into three categories (prosocial, antisocial, and normal) with ten secondary types, and seven categories of emotions are adopted. The proposed deep learning model KRHC is validated on the A-E-R dataset and the public KINSHIP dataset. The experimental results show, for example, that when the emotions "fear", "sad", and "surprise" appear, they most probably drive the behavior "panic". These findings could provide insights for both human-computer interaction and public safety management applications.
PMID:39178079 | DOI:10.1109/TNNLS.2024.3441011
Frequency-Aware Divide-and-Conquer for Efficient Real Noise Removal
IEEE Trans Neural Netw Learn Syst. 2024 Aug 23;PP. doi: 10.1109/TNNLS.2024.3439591. Online ahead of print.
ABSTRACT
Deep-learning-based approaches have achieved remarkable progress in complex real-scenario denoising, yet their accuracy-efficiency tradeoff is still understudied, which is particularly critical for mobile devices. As real noise is unevenly distributed relative to the underlying signals across frequency bands, we introduce a frequency-aware divide-and-conquer strategy to develop a frequency-aware denoising network (FADN). FADN is materialized by stacking frequency-aware denoising blocks (FADBs), in which a denoised image is progressively predicted by a series of frequency-aware noise dividing and conquering operations. For noise dividing, FADBs decompose the noisy and clean image pairs into low- and high-frequency representations via a wavelet transform (WT) followed by an invertible network, and they recover the final denoised image by integrating the denoised information from the different frequency bands. For noise conquering, the separated low-frequency representation of the noisy image is kept as clean as possible under the supervision of the clean counterpart, while the high-frequency representation, combined with the estimated residual from the successive FADB, is purified under the corresponding accompanying supervision for residual compensation. Since FADN denoises progressively and pertinently by frequency band, the accuracy-efficiency tradeoff can be controlled as required by the number of FADBs. Experimental results on the SIDD, DND, and NAM datasets show that our FADN outperforms state-of-the-art methods by improving the peak signal-to-noise ratio (PSNR) while decreasing the model parameters. The code is released at https://github.com/NekoDaiSiki/FADN.
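A toy version of frequency-aware divide-and-conquer using a classical wavelet transform (PyWavelets) rather than FADN's learned networks; the threshold value and treatment of each band are illustrative choices:

```python
# Split an image into low/high-frequency wavelet bands, treat each band
# differently (keep the low band, soft-threshold the high bands), recombine.
import numpy as np
import pywt

img = np.random.rand(128, 128)
noisy = img + 0.1 * np.random.randn(128, 128)

cA, (cH, cV, cD) = pywt.dwt2(noisy, "haar")   # low band + 3 high bands

# "conquer" each band separately: noise dominates the high bands
thresh = 0.1
cH, cV, cD = (pywt.threshold(c, thresh, mode="soft") for c in (cH, cV, cD))

denoised = pywt.idwt2((cA, (cH, cV, cD)), "haar")
print(denoised.shape)  # (128, 128)
```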
PMID:39178076 | DOI:10.1109/TNNLS.2024.3439591
Automated Interpretation of Lung Sounds by Deep Learning in Children With Asthma: Scoping Review and Strengths, Weaknesses, Opportunities, and Threats Analysis
J Med Internet Res. 2024 Aug 23;26:e53662. doi: 10.2196/53662.
ABSTRACT
BACKGROUND: The interpretation of lung sounds plays a crucial role in the appropriate diagnosis and management of pediatric asthma. Applying artificial intelligence (AI) to this task has the potential to better standardize assessment and may even improve its predictive potential.
OBJECTIVE: This study aims to objectively review the literature on AI-assisted lung auscultation for pediatric asthma and provide a balanced assessment of its strengths, weaknesses, opportunities, and threats.
METHODS: A scoping review on AI-assisted lung sound analysis in children with asthma was conducted across 4 major scientific databases (PubMed, MEDLINE Ovid, Embase, and Web of Science), supplemented by a gray literature search on Google Scholar, to identify relevant studies published from January 1, 2000, until May 23, 2023. The search strategy incorporated a combination of keywords related to AI, pulmonary auscultation, children, and asthma. The quality of eligible studies was assessed using the ChAMAI (Checklist for the Assessment of Medical Artificial Intelligence).
RESULTS: The academic literature search identified 7 of 82 (9%) studies for inclusion, while 11 of 250 (4.4%) studies from the gray literature search were considered but ultimately not included in the review and quality assessment. All included studies had poor to medium ChAMAI scores, mostly due to the absence of external validation. Identified strengths were the improved predictive accuracy of AI enabling prompt and early diagnosis, personalized management strategies, and remote monitoring capabilities. Weaknesses were the heterogeneity between studies and the lack of standardization in data collection and interpretation. Opportunities were the potential of coordinated surveillance, growing data sets, and new ways of collaboratively learning from distributed data. Threats were both generic to the field of medical AI (loss of interpretability) and specific to the use case, as clinicians might lose the skill of auscultation.
CONCLUSIONS: To achieve the opportunities of automated lung auscultation, there is a need to address weaknesses and threats with large-scale coordinated data collection in globally representative populations and leveraging new approaches to collaborative learning.
PMID:39178033 | DOI:10.2196/53662
Longitudinal Changes in Choroidal Vascularity in Myopic and Non-Myopic Children
Transl Vis Sci Technol. 2024 Aug 1;13(8):38. doi: 10.1167/tvst.13.8.38.
ABSTRACT
PURPOSE: The purpose of this study was to evaluate longitudinal changes in choroidal vascular characteristics in childhood, and their relationship with eye growth and refractive error.
METHODS: High-resolution optical coherence tomography (OCT) scans, collected over an 18-month period as part of the Role of Outdoor Activity in Myopia (ROAM) study, were analyzed in 101 children (41 myopic, 60 non-myopic; age 10-15 years). OCT images were automatically analyzed and binarized using a deep learning software tool. The output was then used to compute changes in the macular choroidal vascularity index (CVI) and in choroidal luminal and stromal thickness over 18 months. Associations of these variables with refractive error and axial length were analyzed.
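The CVI referred to here is conventionally defined as the ratio of luminal (vascular) area to total choroidal area; assuming the study follows this standard definition:

```latex
\mathrm{CVI} = \frac{A_{\mathrm{luminal}}}{A_{\mathrm{luminal}} + A_{\mathrm{stromal}}}
```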
RESULTS: CVI decreased significantly, whereas luminal and stromal thickness increased significantly over 18 months (all P < 0.001). The magnitude of change was approximately double in stromal tissue compared with luminal tissue (luminal β = 2.6 µm/year, 95% confidence interval [CI] = -1.0 to 4.1 µm/year; stromal β = 5.2 µm/year, 95% CI = 4.0 to 6.5 µm/year). A significant interaction between baseline axial length and change in CVI over time (P = 0.047) was observed, with a greater CVI reduction in children with shorter axial lengths. Significant associations were observed between the changes in CVI, luminal thickness, and stromal thickness and the change in axial length over time (all P < 0.05).
CONCLUSIONS: Faster axial eye growth was associated with smaller reductions in CVI and a smaller increase in choroidal luminal and stromal thickness. The changes in choroidal vascularity, particularly in the stromal component, may thus be a marker for eye growth.
TRANSLATIONAL RELEVANCE: This knowledge of the longitudinal changes in choroidal vascularity in childhood and their relationship with eye growth may assist clinicians in the future to better predict eye growth and myopia progression in childhood.
PMID:39177994 | DOI:10.1167/tvst.13.8.38
Automatic Determination of Endothelial Cell Density From Donor Cornea Endothelial Cell Images
Transl Vis Sci Technol. 2024 Aug 1;13(8):40. doi: 10.1167/tvst.13.8.40.
ABSTRACT
PURPOSE: To determine endothelial cell density (ECD) from real-world donor cornea endothelial cell (EC) images using a self-supervised deep learning segmentation model.
METHODS: Two eye banks (Eversight, VisionGift) provided 15,138 single, unique EC images from 8169 donors, along with donor demographics, tissue characteristics, and ECD. This dataset was utilized for self-supervised training and deep learning inference. The Cornea Image Analysis Reading Center (CIARC) provided a second dataset of 174 donor EC images, selected on the basis of image and tissue quality, which were used to train a supervised deep learning cell-border segmentation model. Evaluation between manual and automated determination of ECD was restricted to the 1939 test EC images with at least 100 cells counted by both methods.
RESULTS: The ECD measurements from both methods were in excellent agreement, with an rc of 0.77 (95% confidence interval [CI], 0.75-0.79; P < 0.001) and a bias of 123 cells/mm² (95% CI, 114-131; P < 0.001); 81% of the automated ECD values were within 10% of the manual ECD values. When the analysis was further restricted to the cropped image, the rc was 0.88 (95% CI, 0.87-0.89; P < 0.001), the bias was 46 cells/mm² (95% CI, 39-53; P < 0.001), and 93% of the automated ECD values were within 10% of the manual ECD values.
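Assuming rc denotes Lin's concordance correlation coefficient (usual in this context, though not stated here), the agreement statistics can be sketched as follows with toy numbers:

```python
# Lin's concordance correlation coefficient and mean bias between manual
# and automated ECD counts; standard formulas, invented data.
import numpy as np

def lins_ccc(x: np.ndarray, y: np.ndarray) -> float:
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return 2 * cov / (vx + vy + (mx - my) ** 2)

manual = np.array([2500., 2700., 2600., 2900., 2400.])
auto = np.array([2580., 2820., 2700., 3010., 2500.])
print(round(lins_ccc(manual, auto), 3),
      "bias:", (auto - manual).mean(), "cells/mm^2")
```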
CONCLUSIONS: Deep learning analysis provides accurate ECDs of donor images, potentially reducing analysis time and training requirements.
TRANSLATIONAL RELEVANCE: The approach of this study, a robust methodology for automatically evaluating donor cornea EC images, could expand the quantitative determination of endothelial health beyond ECD.
PMID:39177992 | DOI:10.1167/tvst.13.8.40
mm3DSNet: multi-scale and multi-feedforward self-attention 3D segmentation network for CT scans of hepatobiliary ducts
Med Biol Eng Comput. 2024 Aug 23. doi: 10.1007/s11517-024-03183-z. Online ahead of print.
ABSTRACT
Image segmentation is a key step in the 3D reconstruction of the hepatobiliary duct tree, which is significant for preoperative planning. In this paper, a novel 3D U-Net variant is designed for the segmentation of hepatobiliary ducts from abdominal CT scans, composed of a 3D encoder-decoder and a 3D multi-feedforward self-attention module (MFSAM). To extract sufficient semantic and spatial features with high inference speed, the 3D ConvNeXt block is designed as a 3D extension of the 2D ConvNeXt. To improve semantic feature extraction, the MFSAM transfers semantic and spatial features at different scales from the encoder to the decoder. Also, to balance the losses for the voxels and the edges of the hepatobiliary ducts, a boundary-aware overlap cross-entropy loss is proposed by combining the cross-entropy loss, the Dice loss, and the boundary loss. Experimental results indicate that the proposed method is superior, in terms of CT segmentation of hepatobiliary ducts, to several existing deep networks as well as to a radiologist without rich experience, achieving a Dice score of 76.54% and a Hausdorff distance (HD) of 6.56.
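A hedged sketch of a combined loss in the spirit described (cross-entropy plus soft Dice plus a distance-map-weighted boundary term); the weights and the exact boundary formulation are assumptions, not the paper's:

```python
# Combined segmentation loss: CE + soft Dice + a Kervadec-style boundary
# term that weights foreground probabilities by a signed distance map.
import torch
import torch.nn.functional as F

def combined_loss(logits, target, dist_map, w=(1.0, 1.0, 0.5)):
    """logits: (B,2,D,H,W); target: (B,D,H,W) in {0,1};
    dist_map: signed distance to the GT boundary, same shape as target."""
    ce = F.cross_entropy(logits, target)
    prob = torch.softmax(logits, dim=1)[:, 1]          # foreground probability
    inter = (prob * target).sum()
    dice = 1 - (2 * inter + 1) / (prob.sum() + target.sum() + 1)
    boundary = (prob * dist_map).mean()                # boundary term
    return w[0] * ce + w[1] * dice + w[2] * boundary

logits = torch.randn(1, 2, 8, 16, 16, requires_grad=True)
target = torch.randint(0, 2, (1, 8, 16, 16))
dist = torch.randn(1, 8, 16, 16)                       # precomputed offline
print(combined_loss(logits, target, dist))
```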
PMID:39177918 | DOI:10.1007/s11517-024-03183-z