Deep learning
Predicting osteoporosis from kidney-ureter-bladder radiographs utilizing deep convolutional neural networks
Bone. 2024 Apr 25:117107. doi: 10.1016/j.bone.2024.117107. Online ahead of print.
ABSTRACT
Osteoporosis is a common condition that can lead to fractures, mobility issues, and death. Although dual-energy X-ray absorptiometry (DXA) is the gold standard for diagnosing osteoporosis, it is expensive and not widely available. In contrast, kidney-ureter-bladder (KUB) radiographs are inexpensive and frequently ordered in clinical practice, making them a potential screening tool for osteoporosis. In this study, we explored the possibility of predicting bone mineral density (BMD) and classifying high-risk patient groups using KUB radiographs. We proposed DeepDXA-KUB, a deep learning model that predicts the BMD values of the left hip and lumbar vertebrae from an input KUB image. The datasets were obtained from Taiwanese medical centers between 2006 and 2019 and comprise 8913 pairs of KUB radiographs and DXA examinations performed within 6 months of each other. The images were randomly divided into training and validation sets in a 4:1 ratio. To evaluate the model's performance, we computed a confusion matrix and evaluated the sensitivity, specificity, accuracy, precision, positive predictive value, negative predictive value, F1 score, and area under the receiver operating characteristic curve (AUROC). Moderate correlations were observed between the predicted and DXA-measured BMD values, with a correlation coefficient of 0.858 for the lumbar vertebrae and 0.87 for the left hip. The model demonstrated an osteoporosis detection accuracy, sensitivity, and specificity of 84.7%, 81.6%, and 86.6% for the lumbar vertebrae and 84.2%, 91.2%, and 81% for the left hip, respectively. The AUROC was 0.939 for the lumbar vertebrae and 0.947 for the left hip, indicating satisfactory performance in osteoporosis screening. The present study is the first to develop a deep learning model based on KUB radiographs to predict lumbar spine and femoral BMD. Our model demonstrated a promising correlation between the predicted and DXA-measured BMD in both the lumbar vertebrae and the hip, showing great potential for the opportunistic screening of osteoporosis.
PMID:38677502 | DOI:10.1016/j.bone.2024.117107
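As an aside on the evaluation described above: the reported confusion-matrix metrics and AUROC can be computed with scikit-learn along these lines. This is a minimal illustrative sketch, not the authors' code; the labels, scores, and the 0.5 decision threshold are assumptions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical stand-ins for DXA-derived labels (1 = osteoporotic)
# and DeepDXA-KUB output scores.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.6, 0.8, 0.9, 0.3, 0.4, 0.2, 0.7])
y_pred = (y_score >= 0.5).astype(int)  # assumed decision threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)              # identical to the PPV
npv = tn / (tn + fn)
f1 = 2 * precision * sensitivity / (precision + sensitivity)
auroc = roc_auc_score(y_true, y_score)  # threshold-independent

print(f"sens={sensitivity:.3f} spec={specificity:.3f} "
      f"acc={accuracy:.3f} AUROC={auroc:.3f}")
```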
Deep Learning Model for Grading and Localization of Lumbar Disc Herniation on Magnetic Resonance Imaging
J Magn Reson Imaging. 2024 Apr 27. doi: 10.1002/jmri.29403. Online ahead of print.
ABSTRACT
BACKGROUND: Methods for grading and localization of lumbar disc herniation (LDH) on MRI are complex, time-consuming, and subjective. Utilizing deep learning (DL) models as assistance would mitigate such complexities.
PURPOSE: To develop an interpretable DL model capable of grading and localizing LDH.
STUDY TYPE: Retrospective.
SUBJECTS: 1496 patients (M/F: 783/713) were evaluated and randomly divided into training (70%), validation (10%), and test (20%) sets.
FIELD STRENGTH/SEQUENCE: 1.5T MRI for axial T2-weighted sequences (spin echo).
ASSESSMENT: The training set was annotated by three spinal surgeons using the Michigan State University classification to train the DL model. The test set was annotated by a spinal surgery expert (providing the ground-truth labels) and by two spinal surgeons (for comparison with the trained model). An external test set was employed to evaluate the generalizability of the DL model.
STATISTICAL TESTS: Calculated intersection over union (IoU) for detection consistency, utilized Gwet's AC1 to assess interobserver agreement, and evaluated model performance based on sensitivity and specificity, with statistical significance set at P < 0.05.
RESULTS: The DL model achieved high detection consistency in both the internal test dataset (grading: mean IoU 0.84, recall 99.6%; localization: IoU 0.82, recall 99.5%) and external test dataset (grading: 0.72, 98.0%; localization: 0.71, 97.6%). For internal testing, the DL model (grading: 0.81; localization: 0.76), Rater 1 (0.88; 0.82), and Rater 2 (0.86; 0.83) demonstrated results highly consistent with the ground truth labels. The overall sensitivity of the DL model was 87.0% for grading and 84.0% for localization, while the specificity was 95.5% and 94.4%. For external testing, the DL model showed an appreciable decrease in consistency (grading: 0.69; localization: 0.66), sensitivity (77.2%; 76.7%), and specificity (92.3%; 91.8%).
DATA CONCLUSION: The classification capabilities of the DL model closely resemble those of spinal surgeons. For future improvement, enriching the diversity of cases could enhance the model's generalization.
TECHNICAL EFFICACY: Stage 2.
PMID:38676436 | DOI:10.1002/jmri.29403
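The detection-consistency measure used above, intersection over union, has a compact closed form for axis-aligned boxes. A minimal reference implementation follows; the (x1, y1, x2, y2) corner format is an assumption, not necessarily the paper's.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# e.g. a predicted vs. annotated disc-level box
print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # ~0.143
```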
Prototype Learning for Medical Time Series Classification via Human-Machine Collaboration
Sensors (Basel). 2024 Apr 22;24(8):2655. doi: 10.3390/s24082655.
ABSTRACT
Deep neural networks must address the dual challenge of delivering high-accuracy predictions and providing user-friendly explanations. While deep models are widely used in time series modeling, deciphering the core principles that govern their outputs remains a significant challenge. This matters for building trusted models and enabling domain-expert validation, so that users and experts can apply them confidently in high-risk decision-making contexts (e.g., decision-support systems in healthcare). In this work, we put forward a deep prototype learning model that supports interpretable and manipulable modeling and classification of medical time series (i.e., ECG signals). Specifically, we first optimize the representation of single-heartbeat data using a bidirectional long short-term memory network with an attention mechanism, and then construct prototypes during the training phase. The final classification outcomes (i.e., normal sinus rhythm, atrial fibrillation, and other rhythm) are determined by comparing the input with the learned prototypes. Moreover, the proposed model provides a human-machine collaboration mechanism that allows domain experts to refine the prototypes by integrating their expertise, further enhancing the model's performance. Unlike the human-in-the-loop paradigm, where humans primarily act as supervisors or correctors who intervene when required, our approach treats both parties as partners, enabling more fluid and integrated interactions. Our experiments show that on the binary task of distinguishing normal sinus rhythm from atrial fibrillation, the proposed model performs marginally below certain established baselines, such as convolutional neural networks (CNNs) and bidirectional long short-term memory with attention mechanisms (Bi-LSTMAttns), but clearly surpasses other contemporary state-of-the-art prototype baseline models. On the three-class task (normal sinus rhythm, atrial fibrillation, and other rhythm), it significantly outperforms these prototype baselines, reaching a prediction accuracy of 0.8414 with macro precision, recall, and F1-score of 0.8449, 0.8224, and 0.8235, respectively, thus achieving both high classification accuracy and good interpretability.
PMID:38676273 | DOI:10.3390/s24082655
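The core prototype mechanism, classifying by comparison against learned prototypes, can be sketched as below. The encoder output, prototype count, and squared-Euclidean distance are assumptions used for illustration; the paper's model builds its embeddings with a Bi-LSTM and attention.

```python
import torch

def prototype_classify(embedding, prototypes, proto_labels):
    """Assign the label of the nearest prototype.

    embedding:    (d,) encoded heartbeat representation
    prototypes:   (k, d) learned prototype vectors
    proto_labels: (k,) class index of each prototype
    """
    dists = torch.sum((prototypes - embedding) ** 2, dim=1)
    return proto_labels[torch.argmin(dists)], dists

# Toy setup: one prototype each for {normal, AF, other rhythm}
protos = torch.tensor([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = torch.tensor([0, 1, 2])
pred, dists = prototype_classify(torch.tensor([0.9, 0.1]), protos, labels)
print(pred.item(), dists)  # nearest prototype wins -> class 1
```

Because the prototypes are explicit tensors, the human-machine collaboration the authors describe amounts to letting domain experts inspect and edit rows of `protos` directly.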
COVID-19 Hierarchical Classification Using a Deep Learning Multi-Modal
Sensors (Basel). 2024 Apr 20;24(8):2641. doi: 10.3390/s24082641.
ABSTRACT
Coronavirus disease 2019 (COVID-19), originating in China, has rapidly spread worldwide. Physicians must examine infected patients and make timely decisions to isolate them. However, completing these processes is difficult due to limited time and availability of expert radiologists, as well as limitations of the reverse-transcription polymerase chain reaction (RT-PCR) method. Deep learning, a sophisticated machine learning technique, leverages radiological imaging modalities for disease diagnosis and image classification tasks. Previous research on COVID-19 classification has encountered several limitations, including binary classification methods, single-feature modalities, small public datasets, and reliance on CT diagnostic processes. Additionally, studies have often used a flat structure, disregarding the hierarchical structure of pneumonia classification. This study aims to overcome these limitations by identifying pneumonia caused by COVID-19, distinguishing it from other types of pneumonia and healthy lungs using chest X-ray (CXR) images and related tabular medical data, and demonstrating the value of incorporating tabular medical data in achieving more accurate diagnoses. ResNet-based and VGG-based pre-trained convolutional neural network (CNN) models were employed to extract features, which were then combined using early fusion for the classification of eight distinct classes. We leveraged the hierarchical structure of pneumonia classification within our approach to achieve improved classification outcomes. Since imbalanced datasets are common in this field, several variants of generative adversarial networks (GANs) were used to generate synthetic data. The proposed approach, tested on our private dataset of 4523 patients, achieved a macro-average F1-score of 95.9% and an F1-score of 87.5% for COVID-19 identification using a ResNet-based structure. In conclusion, we were able to create an accurate multi-modal deep learning model that diagnoses COVID-19 and differentiates it from other kinds of pneumonia and normal lungs, which will enhance the radiological diagnostic process.
PMID:38676257 | DOI:10.3390/s24082641
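The early-fusion step, concatenating CNN image features with tabular features before an eight-class head, might look roughly like this in PyTorch. A sketch only: the ResNet-18 backbone, layer sizes, and tabular dimensionality are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class EarlyFusionNet(nn.Module):
    """Fuse CNN image features with tabular medical data, then classify."""
    def __init__(self, n_tabular, n_classes=8):
        super().__init__()
        backbone = resnet18(weights=None)  # stand-in feature extractor
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.head = nn.Sequential(
            nn.Linear(512 + n_tabular, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, image, tabular):
        feats = self.cnn(image).flatten(1)          # (B, 512)
        fused = torch.cat([feats, tabular], dim=1)  # early fusion
        return self.head(fused)

model = EarlyFusionNet(n_tabular=10)
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 10))
print(logits.shape)  # torch.Size([2, 8])
```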
Emotion Classification Based on Pulsatile Images Extracted from Short Facial Videos via Deep Learning
Sensors (Basel). 2024 Apr 19;24(8):2620. doi: 10.3390/s24082620.
ABSTRACT
Most human emotion recognition methods largely depend on classifying stereotypical facial expressions that represent emotions. However, such facial expressions do not necessarily correspond to actual emotional states and may instead reflect communicative intentions. In other cases, emotions are hidden, cannot be expressed, or may have lower arousal manifested by less pronounced facial expressions, as may occur during passive video viewing. This study improves an emotion classification approach developed in a previous study that classifies emotions remotely from short facial video data, without relying on stereotypical facial expressions or contact-based methods. In this approach, we aim to remotely sense transdermal cardiovascular spatiotemporal facial patterns associated with different emotional states and analyze these data via machine learning. In this paper, we propose several improvements: better remote heart-rate estimation via a preliminary skin segmentation, improved detection of heartbeat peaks and troughs, and better emotion classification accuracy obtained by employing an appropriate deep learning classifier on input from an RGB camera alone. We used the dataset obtained in the previous study, which contains facial videos of 110 participants who passively viewed 150 short videos eliciting five emotion types: amusement, disgust, fear, sexual arousal, and no emotion, while three cameras with different wavelength sensitivities (visible spectrum, near-infrared, and longwave infrared) recorded them simultaneously. From the short facial videos, we extracted unique high-resolution spatiotemporal, physiologically affected features and examined them as input features with different deep learning approaches. An EfficientNet-B0 model was able to classify participants' emotional states with an overall average accuracy of 47.36% using a single input spatiotemporal feature map obtained from a regular RGB camera.
PMID:38676235 | DOI:10.3390/s24082620
Soundscape Characterization Using Autoencoders and Unsupervised Learning
Sensors (Basel). 2024 Apr 18;24(8):2597. doi: 10.3390/s24082597.
ABSTRACT
Passive acoustic monitoring (PAM) through acoustic recorder units (ARUs) shows promise in detecting early landscape changes linked to functional and structural patterns, including species richness, acoustic diversity, community interactions, and human-induced threats. However, current approaches primarily rely on supervised methods, which require prior knowledge of the collected datasets. This reliance poses challenges due to the large volumes of ARU data. In this work, we propose an unsupervised framework using autoencoders to extract soundscape features. We applied this framework to a dataset from Colombian landscapes captured by 31 AudioMoth recorders. Our method generates clusters based on autoencoder features and represents cluster information with prototype spectrograms, using the centroid features and the decoder part of the neural network. Our analysis provides valuable insights into the distribution and temporal patterns of various sound compositions within the study area. By utilizing autoencoders, we identify significant soundscape patterns characterized by recurring and intense sound types across multiple frequency ranges. This comprehensive understanding of the study area's soundscape allows us to pinpoint crucial sound sources and gain deeper insights into its acoustic environment. Our results encourage further exploration of unsupervised algorithms in soundscape analysis as a promising alternative path for understanding and monitoring environmental changes.
PMID:38676214 | DOI:10.3390/s24082597
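The framework above reduces to three steps: encode spectrogram patches, cluster the latent codes, and decode each cluster centroid into a prototype spectrogram. A toy sketch under assumed shapes and architecture (the authors' network is not reproduced here):

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class SpecAE(nn.Module):
    """Toy autoencoder over flattened spectrogram patches."""
    def __init__(self, n_in=4096, n_latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, 256), nn.ReLU(),
                                 nn.Linear(256, n_latent))
        self.dec = nn.Sequential(nn.Linear(n_latent, 256), nn.ReLU(),
                                 nn.Linear(256, n_in))

    def forward(self, x):
        return self.dec(self.enc(x))

ae = SpecAE()
specs = torch.randn(500, 4096)  # stand-in for real spectrogram data
# ... train `ae` on reconstruction loss, then:
with torch.no_grad():
    z = ae.enc(specs).numpy()
km = KMeans(n_clusters=8, n_init=10).fit(z)  # cluster the latent space
centroids = torch.tensor(km.cluster_centers_, dtype=torch.float32)
with torch.no_grad():
    prototypes = ae.dec(centroids)  # decoded prototype spectrograms
print(prototypes.shape)             # torch.Size([8, 4096])
```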
Learning-Based Hierarchical Decision-Making Framework for Automatic Driving in Incompletely Connected Traffic Scenarios
Sensors (Basel). 2024 Apr 18;24(8):2592. doi: 10.3390/s24082592.
ABSTRACT
The decision-making algorithm serves as a fundamental component for advancing the level of autonomous driving. End-to-end decision-making algorithms can process raw data directly but suffer from considerable uncertainty, while other learning-based decision-making algorithms rely heavily on ideal state information and are entirely unsuitable for autonomous driving tasks in real-world scenarios with incomplete global information. Addressing this research gap, this paper proposes a stable hierarchical decision-making framework with images as the input. The first component of the framework is a model-based data encoder that converts the input image data into a fixed, universal data format. Next is a state machine based on a time-series graph convolutional network (GCN), which classifies the current driving state. Finally, according to the state classification, the corresponding rule-based algorithm is selected for action generation. Through verification, the algorithm demonstrates the ability to perform autonomous driving tasks in different traffic scenarios without relying on global network information. Comparative experiments further confirm the effectiveness of the hierarchical framework, the model-based image data encoder, and the time-series GCN.
PMID:38676210 | DOI:10.3390/s24082592
Real-Time 3D Tracking of Multi-Particle in the Wide-Field Illumination Based on Deep Learning
Sensors (Basel). 2024 Apr 18;24(8):2583. doi: 10.3390/s24082583.
ABSTRACT
In diverse areas of research, such as holographic optical tweezer mechanical measurements, colloidal particle motion state examinations, cell tracking, and drug delivery, the localization and analysis of particle motion are of paramount importance. Algorithms ranging from conventional numerical methods to advanced deep learning networks have brought substantial progress to particle orientation analysis. However, the need for datasets has hindered the application of deep learning in particle tracking. In this work, we present an effective methodology for generating synthetic datasets for this domain that retains robustness and precision when applied to real-world 3D particle-tracking data. We developed a 3D real-time particle positioning network based on the CenterNet network. In experiments, our network achieved a horizontal positioning error of 0.0478 μm and a z-axis positioning error of 0.1990 μm. It can track particles of diverse dimensions near the focal plane in real time with high precision. In addition, we have made all datasets generated during this investigation accessible.
PMID:38676200 | DOI:10.3390/s24082583
Enhancing Pure Inertial Navigation Accuracy through a Redundant High-Precision Accelerometer-Based Method Utilizing Neural Networks
Sensors (Basel). 2024 Apr 17;24(8):2566. doi: 10.3390/s24082566.
ABSTRACT
The pure inertial navigation system, crucial for autonomous navigation in GPS-denied environments, faces challenges of error accumulation over time, impacting its effectiveness for prolonged missions. Traditional methods to enhance accuracy have focused on improving instrumentation and algorithms but face limitations due to complexity and costs. This study introduces a novel device-level redundant inertial navigation framework using high-precision accelerometers combined with a neural network-based method to refine navigation accuracy. Experimental validation confirms that this integration significantly boosts navigational precision, outperforming conventional system-level redundancy approaches. The proposed method utilizes the advanced capabilities of high-precision accelerometers and deep learning to achieve superior predictive accuracy and error reduction. This research paves the way for the future integration of cutting-edge technologies like high-precision optomechanical and atom interferometer accelerometers, offering new directions for advanced inertial navigation systems and enhancing their application scope in challenging environments.
PMID:38676182 | DOI:10.3390/s24082566
Time-Frequency Aliased Signal Identification Based on Multimodal Feature Fusion
Sensors (Basel). 2024 Apr 16;24(8):2558. doi: 10.3390/s24082558.
ABSTRACT
The identification of multi-source signals with time-frequency aliasing is a complex problem in wideband signal reception. The traditional separate-then-identify approach fails under underdetermined conditions, where the separation error becomes significant when the degree of time-frequency aliasing is high. Single-modality recognition methods avoid prior separation, but single-modality features carry less signal information, making it challenging to identify time-frequency aliased signals accurately. To solve these problems, this article proposes a time-frequency aliased signal recognition method based on multimodal fusion (TRMM). The method uses a U-Net network to extract pixel-by-pixel features of the time-frequency and wave-frequency images and then performs weighted fusion. The multimodal feature scores are used as the classification basis to recognize the time-frequency aliased signals. At an SNR of 0 dB, the recognition rate of the four-signal aliasing model reaches more than 97.3%.
PMID:38676175 | DOI:10.3390/s24082558
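The weighted-fusion step, combining per-class scores from the time-frequency and wave-frequency branches, can be illustrated as follows; the weights and three-class shapes are assumptions, not values from the paper.

```python
import numpy as np

def fuse_scores(tf_scores, wf_scores, w_tf=0.6, w_wf=0.4):
    """Weighted fusion of per-class scores from two modalities."""
    fused = w_tf * np.asarray(tf_scores) + w_wf * np.asarray(wf_scores)
    return fused / fused.sum()  # renormalize to a distribution

tf = [0.70, 0.20, 0.10]  # time-frequency branch favors class 0
wf = [0.40, 0.50, 0.10]  # wave-frequency branch is less certain
print(fuse_scores(tf, wf))  # fused scores used as the classification basis
```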
Optimisation and Calibration of Bayesian Neural Network for Probabilistic Prediction of Biogas Performance in an Anaerobic Lagoon
Sensors (Basel). 2024 Apr 15;24(8):2537. doi: 10.3390/s24082537.
ABSTRACT
This study aims to enhance diagnostic capabilities for optimising the performance of the anaerobic sewage treatment lagoon at Melbourne Water's Western Treatment Plant (WTP) through a novel machine learning (ML)-based monitoring strategy. This strategy employs ML to make accurate probabilistic predictions of biogas performance by leveraging diverse real-life operational and inspection sensor and other measurement data for asset management, decision making, and structural health monitoring (SHM). The paper commences with data analysis and preprocessing of complex irregular datasets to facilitate efficient learning in an artificial neural network. Subsequently, a Bayesian mixture density neural network model incorporating an attention-based mechanism in bidirectional long short-term memory (BiLSTM) was developed. This probabilistic approach uses a distribution output layer based on a Gaussian mixture model and the Monte Carlo (MC) dropout technique to estimate data and model uncertainties, respectively. Furthermore, systematic hyperparameter optimisation revealed that the optimised model achieved a negative log-likelihood (NLL) of 0.074, significantly outperforming other configurations: approximately 9 times more accurate than the average model performance (NLL = 0.753) and 22 times more accurate than the worst-performing model (NLL = 1.677). Key factors influencing the model's accuracy, such as the input window size and the number of hidden units in the BiLSTM layer, were identified, while the number of neurons in the fully connected layer was found to have no significant impact on accuracy. Moreover, model calibration using the expected calibration error was performed to correct the model's predictive uncertainty. The findings suggest that the inherent data significantly contribute to the overall uncertainty of the model, highlighting the need for more high-quality data to enhance learning. This study lays the groundwork for applying ML in transforming high-value assets into intelligent structures and has broader implications for ML in asset management, SHM applications, and renewable energy sectors.
PMID:38676155 | DOI:10.3390/s24082537
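One ingredient of the model above, Monte Carlo dropout for model (epistemic) uncertainty, can be sketched in a few lines: keep dropout active at inference and aggregate repeated stochastic forward passes. The tiny network and sample count here are assumptions; the full model also has a Gaussian-mixture output layer for data uncertainty, which this sketch omits.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                    nn.Dropout(p=0.2), nn.Linear(64, 1))

def mc_dropout_predict(model, x, n_samples=100):
    """Estimate model uncertainty by sampling with dropout enabled."""
    model.train()  # keep dropout stochastic at inference time
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)  # mean, epistemic spread

x = torch.randn(4, 16)  # stand-in for windowed sensor features
mean, std = mc_dropout_predict(net, x)
print(mean.squeeze(), std.squeeze())
```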
Deep learning for low-data drug discovery: Hurdles and opportunities
Curr Opin Struct Biol. 2024 Apr 25;86:102818. doi: 10.1016/j.sbi.2024.102818. Online ahead of print.
ABSTRACT
Deep learning is becoming increasingly relevant in drug discovery, from de novo design to protein structure prediction and synthesis planning. However, it is often challenged by the small data regimes typical of certain drug discovery tasks. In such scenarios, deep learning approaches-which are notoriously 'data-hungry'-might fail to live up to their promise. Developing novel approaches to leverage the power of deep learning in low-data scenarios is sparking great attention, and future developments are expected to propel the field further. This mini-review provides an overview of recent low-data-learning approaches in drug discovery, analyzing their hurdles and advantages. Finally, we venture to provide a forecast of future research directions in low-data learning for drug discovery.
PMID:38669740 | DOI:10.1016/j.sbi.2024.102818
Brain tumor detection using proper orthogonal decomposition integrated with deep learning networks
Comput Methods Programs Biomed. 2024 Apr 15;250:108167. doi: 10.1016/j.cmpb.2024.108167. Online ahead of print.
ABSTRACT
BACKGROUND AND OBJECTIVE: The central organ of the human nervous system is the brain, which receives and sends stimuli to the various parts of the body to engage in daily activities. Uncontrolled growth of brain cells can result in tumors that affect the normal functions of healthy brain cells. An automatic, reliable technique for detecting tumors is imperative to assist medical practitioners in the timely diagnosis of patients. Although machine learning models are in use, when minimal data are available for training, low-order models integrated with machine learning are a tool for reliable detection.
METHODS: In this study, we compare a low-order model, proper orthogonal decomposition (POD), coupled with a convolutional neural network (CNN) on 2D images from magnetic resonance imaging (MRI) scans to effectively identify brain tumors. The explainability of the coupled POD-CNN prediction output, as well as that of the state-of-the-art pre-trained transfer learning models MobileNetV2, Inception-v3, ResNet101, and VGG-19, was explored.
RESULTS: The CNN predicted tumors with an accuracy of 99.21%, whereas POD-CNN reached an accuracy of 95.88% in about one-third of the computational time. Explainable AI with SHAP showed that MobileNetV2 better identifies tumor boundaries.
CONCLUSIONS: Integration of POD with a CNN is carried out for the first time to detect brain tumors from minimal MRI scan data. This study facilitates low-order model approaches in machine learning to improve the accuracy and performance of tumor detection.
PMID:38669717 | DOI:10.1016/j.cmpb.2024.108167
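POD is, in essence, a truncated SVD of the snapshot matrix, which is why the coupled model runs so much faster: the downstream network sees a handful of modal coefficients instead of full images. A minimal sketch of one common way to couple POD with a classifier, under assumed shapes and rank (not the authors' code):

```python
import numpy as np

def pod_basis(snapshots, rank):
    """POD via thin SVD; rows of `snapshots` are flattened MRI slices."""
    X = snapshots - snapshots.mean(axis=0)  # center the snapshot matrix
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:rank]                        # (rank, n_pixels) POD modes

rng = np.random.default_rng(0)
images = rng.standard_normal((200, 64 * 64))  # stand-in for MRI slices
modes = pod_basis(images, rank=20)
coeffs = (images - images.mean(axis=0)) @ modes.T  # low-order features
print(coeffs.shape)  # (200, 20) -> compact input for a downstream CNN
```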
Dense Sample Deep Learning
Neural Comput. 2024 Apr 17:1-17. doi: 10.1162/neco_a_01666. Online ahead of print.
ABSTRACT
Deep learning (DL), a variant of the neural network algorithms originally proposed in the 1980s (Rumelhart et al., 1986), has made surprising progress in artificial intelligence (AI), ranging from language translation, protein folding (Jumper et al, 2021), autonomous cars, and, more recently, human-like language models (chatbots). All that seemed intractable until very recently. Despite the growing use of DL networks, little is understood about the learning mechanisms and representations that make these networks effective across such a diverse range of applications. Part of the answer must be the huge scale of the architecture and, of course, the large scale of the data, since not much has changed since 1986. But the nature of deep learned representations remains largely unknown. Unfortunately, training sets with millions or billions of tokens have unknown combinatorics, and networks with millions or billions of hidden units can't easily be visualized and their mechanisms can't be easily revealed. In this letter, we explore these challenges with a large (1.24 million weights; VGG) DL in a novel high-density sample task (five unique tokens with more than 500 exemplars per token), which allows us to more carefully follow the emergence of category structure and feature construction. We use various visualization methods for following the emergence of the classification and the development of the coupling of feature detectors and structures that provide a type of graphical bootstrapping. From these results, we harvest some basic observations of the learning dynamics of DL and propose a new theory of complex feature construction based on our results.
PMID:38669696 | DOI:10.1162/neco_a_01666
Feature shared multi-decoder network using complementary learning for Photon counting CT ring artifact suppression
J Xray Sci Technol. 2024 Apr 25. doi: 10.3233/XST-230396. Online ahead of print.
ABSTRACT
BACKGROUND: Photon-counting computed tomography (Photon counting CT) utilizes photon-counting detectors to precisely count incident photons and measure their energy. Compared with traditional energy-integrating detectors, these detectors provide better image contrast and material differentiation. However, Photon counting CT tends to show more noticeable ring artifacts than conventional spiral CT, owing to limited photon counts and detector response variations.
OBJECTIVE: To comprehensively address this issue, we propose a novel feature shared multi-decoder network (FSMDN) that utilizes complementary learning to suppress ring artifacts in Photon counting CT images.
METHODS: Specifically, we employ a feature-sharing encoder to extract context and ring-artifact features, facilitating effective feature sharing. The shared features are then processed independently, in parallel, by separate decoders dedicated to the context and ring-artifact channels. Through complementary learning, this approach achieves superior artifact suppression while preserving tissue details.
RESULTS: We conducted extensive experiments on Photon counting CT images with ring artifacts at three intensity levels. Both qualitative and quantitative results demonstrate that our network model corrects ring artifacts at different levels exceptionally well while exhibiting superior stability and robustness compared with the comparison methods.
CONCLUSIONS: In this paper, we have introduced a novel deep learning network designed to mitigate ring artifacts in Photon counting CT images. The results illustrate the viability and efficacy of our proposed network model as a new deep learning-based method for suppressing ring artifacts.
PMID:38669511 | DOI:10.3233/XST-230396
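Structurally, the proposed network boils down to one shared encoder feeding two parallel decoders. A drastically simplified sketch follows; channel counts and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class FSMDNSketch(nn.Module):
    """Shared encoder with parallel context and ring-artifact decoders."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        def make_decoder():
            return nn.Sequential(
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 3, padding=1),
            )
        self.context_dec = make_decoder()   # artifact-free image channel
        self.artifact_dec = make_decoder()  # ring-artifact channel

    def forward(self, x):
        f = self.encoder(x)  # features shared by both decoders
        return self.context_dec(f), self.artifact_dec(f)

model = FSMDNSketch()
ctx, ring = model(torch.randn(1, 1, 128, 128))
# complementary learning would constrain ctx + ring to re-compose the input
print(ctx.shape, ring.shape)
```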
Deep learning-based anatomical position recognition for gastroscopic examination
Technol Health Care. 2024 Apr 18. doi: 10.3233/THC-248004. Online ahead of print.
ABSTRACT
BACKGROUND: Gastroscopic examination is a preferred method for the detection of upper gastrointestinal lesions. However, it places high demands on doctors, especially the strict requirements on the position and number of archived images. These requirements are challenging for the education and training of junior doctors.
OBJECTIVE: The purpose of this study is to use deep learning to develop automatic position recognition technology for gastroscopic examination.
METHODS: A total of 17182 gastroscopic images in eight anatomical position categories were collected. The convolutional neural network model MogaNet was used to identify all the anatomical positions of the stomach for gastroscopic examination. The performance of four models was evaluated by sensitivity, precision, and F1 score.
RESULTS: The average sensitivity of the proposed method is 0.963, which is 0.074, 0.066, and 0.065 higher than ResNet, GoogleNet, and SqueezeNet, respectively. The average precision of the proposed method is 0.964, which is 0.072, 0.067, and 0.068 higher than ResNet, GoogleNet, and SqueezeNet, respectively. The average F1-score of the proposed method is 0.964, which is 0.074, 0.067, and 0.067 higher than ResNet, GoogleNet, and SqueezeNet, respectively. The results of the t-test show that the proposed method is significantly different from the other methods (p < 0.05).
CONCLUSION: The proposed method exhibits the best performance for anatomical position recognition and can help junior doctors quickly meet the requirements for completeness of gastroscopic examination and for the number and position of archived images.
PMID:38669495 | DOI:10.3233/THC-248004
Clinical VMAT machine parameter optimization for localized prostate cancer using deep reinforcement learning
Med Phys. 2024 Apr 26. doi: 10.1002/mp.17100. Online ahead of print.
ABSTRACT
BACKGROUND: Volumetric modulated arc therapy (VMAT) machine parameter optimization (MPO) remains computationally expensive and sensitive to input dose objectives, creating challenges for manual and automatic planning. Reinforcement learning (RL) involves machine learning through extensive trial and error and has demonstrated performance exceeding that of humans and existing algorithms in several domains.
PURPOSE: To develop and evaluate an RL approach for VMAT MPO for localized prostate cancer to rapidly and automatically generate deliverable VMAT plans for a clinical linear accelerator (linac) and compare resultant dosimetry to clinical plans.
METHODS: We extended our previous RL approach to enable VMAT MPO of a 3D beam model for a clinical linac through a policy network. It accepts an input state describing the current control point and predicts continuous machine parameters for the next control point, which are used to update the input state, repeating until plan termination. RL training was conducted to minimize a dose-based cost function for prescription of 60 Gy in 20 fractions using CT scans and contours from 136 retrospective localized prostate cancer patients, 20 of which had existing plans used to initialize training. Data augmentation was employed to mitigate over-fitting, and parameter exploration was achieved using Gaussian perturbations. Following training, RL VMAT was applied to an independent cohort of 15 patients, and the resultant dosimetry was compared to clinical plans. We also combined the RL approach with our clinical treatment planning system (TPS) to automate final plan refinement, and creating the potential for manual review and edits as required for clinical use.
RESULTS: RL training was conducted for 5000 iterations, producing 40 000 plans during exploration. The mean ± SD execution time to produce deliverable VMAT plans in the test cohort was 3.3 ± 0.5 s; these plans were automatically refined in the TPS, taking an additional 77.4 ± 5.8 s. When normalized to provide equivalent target coverage, the RL+TPS plans provided a similar mean ± SD overall maximum dose of 63.2 ± 0.6 Gy and a lower mean rectum dose of 17.4 ± 7.4 Gy, compared to 63.9 ± 1.5 Gy (p = 0.061) and 21.0 ± 6.0 Gy (p = 0.024) for the clinical plans.
CONCLUSIONS: An approach for VMAT MPO using RL for a clinical linac model was developed and applied to automatically generate deliverable plans for localized prostate cancer patients, and when combined with the clinical TPS shows potential to rapidly generate high-quality plans. The RL VMAT approach shows promise to discover advanced linac control policies through trial-and-error, and algorithm limitations and future directions are identified and discussed.
PMID:38669457 | DOI:10.1002/mp.17100
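The generation loop described in METHODS, in which a policy network maps the current control-point state to the machine parameters of the next control point until the plan terminates, has roughly this skeleton. Everything here (state contents, action dimensionality, fixed arc length) is an assumption for illustration; the actual 3D beam model and dose-based cost are far more involved.

```python
import torch
import torch.nn as nn

# toy policy: current control-point state -> bounded parameter update
policy = nn.Sequential(nn.Linear(32, 128), nn.ReLU(),
                       nn.Linear(128, 4), nn.Tanh())

def roll_out_plan(policy, init_state, n_control_points=178):
    """Generate a VMAT plan one control point at a time."""
    state, plan = init_state, []
    for _ in range(n_control_points):  # fixed-length arc as a stand-in
        action = policy(state)         # e.g. MLC/jaw/dose-rate/gantry deltas
        plan.append(action)
        # assumed transition: fold the chosen parameters into the state
        state = torch.cat([state[..., 4:], action], dim=-1)
    return torch.stack(plan)

plan = roll_out_plan(policy, torch.randn(32))
print(plan.shape)  # (178, 4) machine-parameter trajectory
```

During training, Gaussian perturbations of the actions would provide the exploration the abstract mentions, with the dose-based cost of each completed plan steering the policy updates.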
Real-time 3D tracking of swimming microbes using digital holographic microscopy and deep learning
PLoS One. 2024 Apr 26;19(4):e0301182. doi: 10.1371/journal.pone.0301182. eCollection 2024.
ABSTRACT
The three-dimensional swimming tracks of motile microorganisms can be used to identify their species, which holds promise for the rapid identification of bacterial pathogens. The tracks also provide detailed information on the cells' responses to external stimuli such as chemical gradients and physical objects. Digital holographic microscopy (DHM) is a well-established, but computationally intensive method for obtaining three-dimensional cell tracks from video microscopy data. We demonstrate that a common neural network (NN) accelerates the analysis of holographic data by an order of magnitude, enabling its use on single-board computers and in real time. We establish a heuristic relationship between the distance of a cell from the focal plane and the size of the bounding box assigned to it by the NN, allowing us to rapidly localise cells in three dimensions as they swim. This technique opens the possibility of providing real-time feedback in experiments, for example by monitoring and adapting the supply of nutrients to a microbial bioreactor in response to changes in the swimming phenotype of microbes, or for rapid identification of bacterial pathogens in drinking water or clinical samples.
PMID:38669245 | DOI:10.1371/journal.pone.0301182
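The heuristic the authors exploit, that defocused cells image larger so the NN bounding-box size encodes distance from the focal plane, can be calibrated with a simple fit. The quadratic form and the calibration numbers below are invented for illustration.

```python
import numpy as np

# assumed calibration: box size (px) measured at known z offsets (um)
z_known = np.array([-40, -20, -10, 0, 10, 20, 40], dtype=float)
box_size = np.array([58, 34, 22, 16, 23, 35, 60], dtype=float)

# defocus blur is roughly symmetric about focus, so fit size vs. |z|
coeffs = np.polyfit(np.abs(z_known), box_size, deg=2)

def z_magnitude_from_box(size_px):
    """Numerically invert the fit to estimate |z| for a detected cell."""
    zs = np.linspace(0, 50, 501)
    return zs[np.argmin(np.abs(np.polyval(coeffs, zs) - size_px))]

print(z_magnitude_from_box(30.0))  # estimated |z| in micrometres
```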
Understanding Double Descent Using VC-Theoretical Framework
IEEE Trans Neural Netw Learn Syst. 2024 Apr 26;PP. doi: 10.1109/TNNLS.2024.3388873. Online ahead of print.
ABSTRACT
In spite of many successful applications of deep learning (DL) networks, theoretical understanding of their generalization capabilities and limitations remains limited. We present an analysis of the generalization performance of DL networks for classification under the VC-theoretical framework. In particular, we analyze the so-called "double descent" phenomenon, whereby large overparameterized networks can generalize well even when they perfectly memorize all available training data. This appears to contradict the conventional statistical view that optimal model complexity should reflect an optimal balance between underfitting and overfitting, i.e., the bias-variance trade-off. We present a VC-theoretical explanation of the double descent phenomenon in the classification setting. Our theoretical explanation is supported by empirical modeling of double descent curves, using analytic VC bounds, for several learning methods, such as support vector machine (SVM), least squares (LS), and multilayer perceptron classifiers. The proposed VC-theoretical approach enables better understanding of overparameterized estimators during the second descent.
PMID:38669171 | DOI:10.1109/TNNLS.2024.3388873
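For orientation, one classical bound of the kind analytic VC analyses build on (not necessarily the paper's specific bound) states that, with probability at least 1 - η, a classifier f from a hypothesis class of VC dimension h trained on n samples satisfies

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f) \;+\;
\sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

Double descent concerns the regime in which the empirical term is driven to zero while capacity keeps growing, which is precisely where the naive bias-variance picture appears to break down.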
Classification of the quality of canine and feline ventrodorsal and dorsoventral thoracic radiographs through machine learning
Vet Radiol Ultrasound. 2024 Apr 26. doi: 10.1111/vru.13373. Online ahead of print.
ABSTRACT
Thoracic radiographs are an essential diagnostic tool in companion animal medicine and are frequently used as part of routine workups in patients presenting for coughing, respiratory distress, and cardiovascular diseases, and for staging of neoplasia. Quality control is a critical aspect of radiology practice in preventing misdiagnosis and ensuring consistent, accurate, and reliable diagnostic imaging. Implementing an effective quality control procedure in radiology can impact patient outcomes, facilitate clinical decision-making, and decrease healthcare costs. In this study, a machine learning-based quality classification model is proposed for canine and feline thoracic radiographs captured in both ventrodorsal and dorsoventral positions. The quality classification problem was divided into collimation, positioning, and exposure, and an automatic classification method based on deep learning and machine learning was proposed for each. We utilized a dataset of 899 radiographs of dogs and cats. Evaluations using fivefold cross-validation resulted in an F1 score of 91.33 (95% CI: 88.37-94.29) and an AUC of 91.10 (95% CI: 88.16-94.03). The results indicate that the proposed automatic quality classification has the potential to be implemented in radiology clinics to improve radiograph quality and reduce the number of nondiagnostic images.
PMID:38668682 | DOI:10.1111/vru.13373