Deep learning
Improving the efficiency and accuracy of CMR with AI - review of evidence and proposition of a roadmap to clinical translation
J Cardiovasc Magn Reson. 2024 Jun 21:101051. doi: 10.1016/j.jocmr.2024.101051. Online ahead of print.
ABSTRACT
Cardiovascular magnetic resonance (CMR) is an important imaging modality for the assessment of heart disease; however, limitations of CMR include long exam times and high complexity compared to other cardiac imaging modalities. Recently, advancements in artificial intelligence (AI) technology have shown great potential to address many CMR limitations. While the developments are remarkable, translation of AI-based methods into real-world CMR clinical practice remains at a nascent stage, and much work lies ahead to realize the full potential of AI for CMR. Herein we review recent cutting-edge and representative examples demonstrating how AI can advance CMR in areas such as exam planning, accelerated image reconstruction, post-processing, quality control, classification, and diagnosis. These advances can be applied to speed up and simplify essentially every application, including cine, strain, late gadolinium enhancement, parametric mapping, 3D whole heart, flow, perfusion, and others. AI is a unique technology based on training models using data. Beyond reviewing the literature, this paper discusses important AI-specific issues in the context of CMR, including (1) properties and characteristics of datasets for training and validation, (2) previously published guidelines for reporting CMR AI research, (3) considerations around clinical deployment, (4) responsibilities of clinicians and the need for multi-disciplinary teams in the development and deployment of AI in CMR, (5) industry considerations, and (6) regulatory perspectives. Understanding and consideration of all these factors will contribute to the effective and ethical deployment of AI to improve clinical CMR.
PMID:38909656 | DOI:10.1016/j.jocmr.2024.101051
Floating on groundwater: Insight of multi-source remote sensing for Qaidam basin
J Environ Manage. 2024 Jun 22;365:121513. doi: 10.1016/j.jenvman.2024.121513. Online ahead of print.
ABSTRACT
Situated in the north of the Qinghai-Tibet Plateau, the Qaidam Basin experiences limited precipitation and significant evaporation. Despite these conditions, it hosts one of the densest concentrations of lakes in China. The formation of these lakes is controversial: it is unclear whether the lake water primarily originates from local precipitation or from external water sources. To address this issue, this paper explores the recharge sources of lakes in the Qaidam Basin and the circulation patterns of groundwater from a remote sensing perspective. Based on deep learning networks, we optimized the soft object regions of the Object-Contextual Representations Network (OCRNet) and proposed the Remote Sensing Adaptive-Improved OCRNet (RSA-IOCRNet). Compared with seven other networks, RSA-IOCRNet obtained better experimental results and was used to construct area time series for 16 major lakes in the Qaidam Basin. Combined with multi-source data, the comprehensive analysis indicates no significant correlation between climatic factors and lake changes, but a clear correlation between lake and groundwater changes in the eastern Qaidam Basin, consistent with the results of the field survey. Deep-circulating groundwater recharges numerous Qaidam lakes, such as Gasikule Lake and Xiaochaidan Lake, through upwelling along fault zones. Groundwater in the Qaidam Basin is more depleted in hydrogen and oxygen isotopes than surface water in the basin, but similar to some river water in the endorheic Tibetan Plateau. This indicates that Tibetan seepage water, estimated at approximately 540 billion m³/a, is transported through the Qaidam Basin via deep circulation. It then rises through fracture zones to recharge the groundwater and lakes within this basin, a mechanism that may extend to various arid and semi-arid regions such as Taitema Lake. This work provides a new perspective on the impact of deep groundwater on lakes and water circulation in these areas.
PMID:38909574 | DOI:10.1016/j.jenvman.2024.121513
An improved method for diagnosis of Parkinson's disease using deep learning models enhanced with metaheuristic algorithm
BMC Med Imaging. 2024 Jun 24;24(1):156. doi: 10.1186/s12880-024-01335-z.
ABSTRACT
Parkinson's disease (PD) is challenging for clinicians to diagnose accurately in the early stages. Quantitative measures of brain health can be obtained safely and non-invasively using medical imaging techniques like magnetic resonance imaging (MRI) and single photon emission computed tomography (SPECT). Accurate diagnosis of PD requires powerful machine learning and deep learning models together with effective medical imaging tools for assessing neurological health. This study proposes four deep learning models and a hybrid model for the early detection of PD. Two standard datasets are chosen for the simulation study. To further improve the performance of the models, grey wolf optimization (GWO) is used to automatically fine-tune their hyperparameters. The GWO-VGG16, GWO-DenseNet, GWO-DenseNet + LSTM, GWO-InceptionV3 and GWO-VGG16 + InceptionV3 models are applied to the T1,T2-weighted and SPECT DaTscan datasets. All the models performed well, obtaining accuracies near or above 99%. The hybrid model (GWO-VGG16 + InceptionV3) achieved the highest accuracy (99.94%) and AUC (99.99%) on the T1,T2-weighted dataset, and 100% accuracy with a 99.92% AUC on the SPECT DaTscan dataset.
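As an illustration of the optimization scheme above, here is a minimal Python sketch of a grey wolf optimization loop for hyperparameter tuning; the objective function, population size, and search ranges are illustrative assumptions, not the paper's settings:

    import numpy as np

    def gwo(objective, bounds, n_wolves=8, n_iters=20, seed=0):
        # Minimize `objective` over box-constrained hyperparameters.
        rng = np.random.default_rng(seed)
        lo = np.array([b[0] for b in bounds], dtype=float)
        hi = np.array([b[1] for b in bounds], dtype=float)
        wolves = rng.uniform(lo, hi, size=(n_wolves, len(bounds)))
        fitness = np.array([objective(w) for w in wolves])
        for t in range(n_iters):
            leaders = wolves[np.argsort(fitness)[:3]]  # alpha, beta, delta
            a = 2.0 - 2.0 * t / n_iters                # decays from 2 to 0
            for i in range(n_wolves):
                candidate = np.zeros_like(lo)
                for leader in leaders:
                    r1, r2 = rng.random(lo.size), rng.random(lo.size)
                    A, C = 2.0 * a * r1 - a, 2.0 * r2
                    candidate += leader - A * np.abs(C * leader - wolves[i])
                wolves[i] = np.clip(candidate / 3.0, lo, hi)
                fitness[i] = objective(wolves[i])
        return wolves[np.argmin(fitness)]

    # e.g., tune learning rate and dropout for a CNN, where a user-supplied
    # validation_error(params) trains briefly and returns 1 - accuracy:
    # best_lr, best_dropout = gwo(validation_error, [(1e-5, 1e-2), (0.1, 0.5)])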
PMID:38910241 | DOI:10.1186/s12880-024-01335-z
Estimating mandibular growth stage based on cervical vertebral maturation in lateral cephalometric radiographs using artificial intelligence
Prog Orthod. 2024 Jun 24;25(1):28. doi: 10.1186/s40510-024-00527-1.
ABSTRACT
INTRODUCTION: Determining the right time for orthodontic treatment is one of the most important factors affecting the treatment plan and its outcome. The aim of this study is to estimate the mandibular growth stage based on cervical vertebral maturation (CVM) in lateral cephalometric radiographs using artificial intelligence. Unlike previous studies, which use conventional CVM stage naming, our proposed method directly correlates cervical vertebrae with mandibular growth slope.
METHODS AND MATERIALS: To conduct this study, records archived in the American Association of Orthodontics Foundation (AAOF) growth centers were first assessed, and after applying the inclusion and exclusion criteria, a total of 200 people (108 women and 92 men) were included in the study. Then, mandibular length was measured on the serial lateral cephalometric radiographs taken from each patient. The corresponding growth graphs were labeled according to mandibular growth rate in three stages: before the pubertal growth peak (pre-pubertal), during the pubertal growth peak (pubertal), and after the pubertal growth peak (post-pubertal). A total of 663 images were selected for evaluation using artificial intelligence. These images were evaluated with different deep learning-based artificial intelligence models using the diagnostic measures of sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV). We also employed weighted kappa statistics.
RESULTS: In diagnosing the pre-pubertal stage, the convolutional neural network (CNN) designed for this study had higher sensitivity and NPV (0.84 and 0.91, respectively) than the ResNet-18 model. The ResNet-18 model performed better on the other diagnostic measures of the pre-pubertal stage and on all measures in the pubertal and post-pubertal stages. The highest overall diagnostic accuracy, 87.5%, was also obtained with the ResNet-18 model, compared to 81% for the designed CNN.
CONCLUSION: The artificial intelligence models trained in this study can receive images of the cervical vertebrae and predict mandibular growth status by classifying it into one of three groups: before the growth spurt (pre-pubertal), during the growth spurt (pubertal), and after the growth spurt (post-pubertal). The networks' accuracy was highest in the post-pubertal stage.
PMID:38910180 | DOI:10.1186/s40510-024-00527-1
Image-based facial emotion recognition using convolutional neural network on emognition dataset
Sci Rep. 2024 Jun 23;14(1):14429. doi: 10.1038/s41598-024-65276-x.
ABSTRACT
Detecting emotions from facial images is difficult because facial expressions can vary significantly. Previous research on using deep learning models to classify emotions from facial images has been carried out on various datasets that contain a limited range of expressions. This study expands the use of deep learning for facial emotion recognition (FER) based on the Emognition dataset, which includes ten target emotions: amusement, awe, enthusiasm, liking, surprise, anger, disgust, fear, sadness, and neutral. A series of data preprocessing steps was carried out to convert the video data into images and augment the data. This study proposes Convolutional Neural Network (CNN) models built through two approaches: transfer learning (fine-tuning) with the pre-trained Inception-V3 and MobileNet-V2 models, and building from scratch using the Taguchi method to find a robust combination of hyperparameter settings. The proposed model demonstrated favorable performance over a series of experiments, with an accuracy and average F1-score of 96% and 0.95, respectively, on the test data.
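A hypothetical Keras sketch of the transfer-learning branch described above: a frozen, ImageNet-pretrained Inception-V3 backbone with a new head for the ten Emognition emotion classes. The input size, dropout rate, and head layers are assumptions, not the paper's exact configuration:

    import tensorflow as tf

    base = tf.keras.applications.InceptionV3(
        include_top=False, weights="imagenet", input_shape=(299, 299, 3))
    base.trainable = False  # freeze backbone; optionally unfreeze top blocks later

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(10, activation="softmax"),  # ten target emotions
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])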
PMID:38910179 | DOI:10.1038/s41598-024-65276-x
Utility of Thin-slice Fat-suppressed Single-shot T2-weighted MR Imaging with Deep Learning Image Reconstruction as a Protocol for Evaluating the Pancreas
Magn Reson Med Sci. 2024 Jun 21. doi: 10.2463/mrms.mp.2024-0017. Online ahead of print.
ABSTRACT
PURPOSE: To compare the utility of thin-slice fat-suppressed single-shot T2-weighted imaging (T2WI) with deep learning image reconstruction (DLIR) and conventional fast spin-echo T2WI with DLIR in a pancreatic MR protocol.
METHODS: This retrospective study included 42 patients (mean age, 70.2 years) with pancreatic cancer who underwent gadoxetic acid-enhanced MRI. Three fat-suppressed T2WI sequences were acquired for each patient: conventional fast spin-echo with 6-mm slice thickness (FSE 6 mm) and single-shot fast spin-echo with 6-mm and 3-mm thickness (SSFSE 6 mm and SSFSE 3 mm). For quantitative analysis, the SNRs of the upper abdominal organs were compared between images with and without DLIR. The pancreas-to-lesion contrast on DLIR images was also calculated. For qualitative analysis, two abdominal radiologists independently scored the image quality of FSE 6 mm, SSFSE 6 mm, and SSFSE 3 mm with DLIR on a 5-point scale.
RESULTS: SNRs were significantly higher with DLIR than without DLIR for all three T2-weighted sequences in all patients (P < 0.001). The pancreas-to-lesion contrast of SSFSE 3 mm was higher than that of FSE 6 mm (P < 0.001) and tended to be higher than that of SSFSE 6 mm (P = 0.07). SSFSE 3 mm received the highest image-quality scores for pancreas edge sharpness, pancreatic duct clarity, and overall image quality, followed by SSFSE 6 mm and FSE 6 mm (P < 0.0001).
CONCLUSION: SSFSE 3 mm with DLIR improved the SNR of the pancreas, pancreas-to-lesion contrast, and image quality more effectively than SSFSE 6 mm and FSE 6 mm. Thin-slice fat-suppressed single-shot T2WI with DLIR can be easily incorporated into a pancreatic MR protocol.
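A common SNR definition consistent with the quantitative analysis above, sketched in Python; the study's exact ROI protocol may differ:

    import numpy as np

    def snr(organ_roi, background_roi):
        # mean organ signal over the standard deviation of background noise
        return float(np.mean(organ_roi) / np.std(background_roi))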
PMID:38910138 | DOI:10.2463/mrms.mp.2024-0017
Computer Vision-Radiomics & Pathognomics
Otolaryngol Clin North Am. 2024 Jun 22:S0030-6665(24)00072-0. doi: 10.1016/j.otc.2024.05.003. Online ahead of print.
ABSTRACT
The role of computer vision in extracting radiographic (radiomics) and histopathologic (pathognomics) features is an extension of the molecular biomarkers that have been foundational to our understanding across the spectrum of head and neck disorders. Within head and neck cancers especially, machine learning and deep learning applications have yielded advances in the characterization of tumor features, nodal features, and various outcomes. This review surveys the landscape of radiomic and pathognomic applications to inform future work addressing current gaps. Novel methodologies will be needed to integrate multidimensional data inputs for examining disease features, comprehensively guiding prognosis, and ultimately informing clinical management.
PMID:38910065 | DOI:10.1016/j.otc.2024.05.003
Deep learning-based detection of lumbar spinal canal stenosis using convolutional neural networks
Spine J. 2024 Jun 21:S1529-9430(24)00299-7. doi: 10.1016/j.spinee.2024.06.009. Online ahead of print.
ABSTRACT
BACKGROUND CONTEXT: Lumbar spinal canal stenosis (LSCS) is the most common spinal degenerative disorder in elderly people, and patients are usually first seen by primary care physicians or orthopedic surgeons who are not spine surgery specialists. Magnetic resonance imaging (MRI) is useful in the diagnosis of LSCS, but the equipment is often unavailable and the images can be difficult to read. LSCS patients with progressive neurologic deficits recover poorly if surgical treatment is delayed, so early diagnosis and determination of appropriate surgical indications are crucial in the treatment of LSCS. Convolutional neural networks (CNNs), a type of deep learning, offer significant advantages for image recognition and classification, and work well with radiographs, which can be easily taken at any facility.
PURPOSE: Our purpose was to develop an algorithm to diagnose the presence or absence of LSCS requiring surgery from plain radiographs using CNNs.
STUDY DESIGN: Retrospective analysis of consecutive, nonrandomized series of patients at a single institution.
PATIENT SAMPLE: Data of 150 patients who underwent surgery for LSCS, including degenerative spondylolisthesis, at a single institution from January 2022 to August 2022, were collected. Additionally, 25 patients who underwent surgery at two other hospitals were included for extra external validation.
OUTCOME MEASURES: In annotation 1, the area under the curve (AUC) computed from the receiver operating characteristic (ROC) curve, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, positive likelihood ratio (PLR), and negative likelihood ratio (NLR) were calculated. In annotation 2, correlation coefficients were used.
METHODS: Four intervertebral levels (L1/2 to L4/5) were extracted as regions of interest from lateral plain lumbar spine radiographs, yielding 600 images in total. Based on the date of surgery, 500 images derived from the first 125 cases were used for internal validation, and 100 images from the subsequent 25 cases were used for external validation. Additionally, 100 images from the other hospitals were used for extra external validation. In annotation 1, binary classification of operative and nonoperative levels was used; in annotation 2, the spinal canal area measured on axial MRI was used as the output label. For internal validation, the 500 images were divided into five datasets on a per-patient basis and five-fold cross-validation was performed. The five trained models were then applied to assess prediction performance on the external validation set. Grad-CAM was used to visualize the areas with the highest features extracted by the CNNs.
RESULTS: In internal validation, the AUC and accuracy for annotation 1 ranged from 0.85 to 0.89 and from 79% to 83%, respectively, and the correlation coefficients for annotation 2 ranged from 0.53 to 0.64 (all P < 0.01). In external validation, the AUC and accuracy for annotation 1 were 0.90 and 82%, respectively, and the correlation coefficient for annotation 2 was 0.69, using the five trained CNN models. In the extra external validation, the AUC and accuracy for annotation 1 were 0.89 and 84%, respectively, and the correlation coefficient for annotation 2 was 0.56. Grad-CAM showed high feature density in the intervertebral joints and posterior intervertebral discs.
CONCLUSIONS: This technology automatically detects LSCS from plain lumbar spine radiographs, making it possible for facilities without MRI, or for non-specialists, to diagnose LSCS, and potentially eliminating delays in the diagnosis and treatment of cases that require early intervention.
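A minimal scikit-learn sketch of the per-patient five-fold split described in METHODS, which keeps all four intervertebral-level images of a patient in the same fold; variable names and the random labels are illustrative stand-ins:

    import numpy as np
    from sklearn.model_selection import GroupKFold

    n_patients, levels = 125, 4                              # internal set
    patient_ids = np.repeat(np.arange(n_patients), levels)   # 500 images
    images = np.arange(n_patients * levels)                  # stand-in data
    labels = np.random.randint(0, 2, size=images.size)       # operative or not

    for fold, (tr, va) in enumerate(
            GroupKFold(n_splits=5).split(images, labels, groups=patient_ids)):
        # no patient appears in both the training and validation folds
        assert set(patient_ids[tr]).isdisjoint(patient_ids[va])
        # train a CNN on images[tr], validate on images[va] ...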
PMID:38909909 | DOI:10.1016/j.spinee.2024.06.009
Artificial Intelligence in Dermatopathology: updates, strengths, and challenges
Clin Dermatol. 2024 Jun 21:S0738-081X(24)00094-4. doi: 10.1016/j.clindermatol.2024.06.010. Online ahead of print.
ABSTRACT
Artificial Intelligence (AI) has evolved to become a significant force in various domains, including medicine. We explore the role of AI in pathology, with a specific focus on dermatopathology and neoplastic dermatopathology. AI, encompassing Machine Learning (ML) and Deep Learning (DL), has demonstrated its potential in tasks ranging from diagnostic applications on Whole Slide Imaging (WSI) to predictive and prognostic functions in skin pathology. In dermatopathology, studies have assessed AI's ability to identify skin lesions, classify melanomas, and improve diagnostic accuracy. Results indicate that AI, particularly Convolutional Neural Networks (CNNs), can outperform human pathologists in terms of sensitivity and specificity. Moreover, AI aids in predicting disease outcomes, identifying aggressive tumors, and differentiating between various skin conditions. Neoplastic dermatopathology showcases AI's prowess in classifying melanocytic lesions, discriminating between melanomas and nevi, and aiding dermatopathologists in making accurate diagnoses. Studies emphasize the reproducibility and diagnostic aid that AI provides, especially in challenging cases. In inflammatory and lymphoproliferative dermatopathology, limited research exists, but studies show attempts to use AI to differentiate conditions such as mycosis fungoides and eczema. While some results are promising, further exploration is needed in these areas. We highlight the extraordinary interest AI has garnered in the scientific community and its potential to assist clinicians and pathologists. Despite the advancements, we have stressed the importance of collaboration between medical professionals, computer scientists, bioinformaticians, and engineers to harness AI's benefits while acknowledging its limitations and risks. The integration of AI into dermatopathology holds great promise, positioning it as a valuable tool rather than a replacement for human expertise.
PMID:38909860 | DOI:10.1016/j.clindermatol.2024.06.010
Innovative approaches for accurate ozone prediction and health risk analysis in South Korea: The combined effectiveness of deep learning and AirQ
Sci Total Environ. 2024 Jun 21:174158. doi: 10.1016/j.scitotenv.2024.174158. Online ahead of print.
ABSTRACT
Short-term exposure to ground-level ozone (O3) poses significant health risks, particularly respiratory and cardiovascular diseases, and mortality. This study addresses the pressing need for accurate O3 forecasting to mitigate these risks, focusing on South Korea. We introduce Deep Bias Correction (Deep-BC), a novel framework leveraging Convolutional Neural Networks (CNNs), to refine hourly O3 forecasts from the Community Multiscale Air Quality (CMAQ) model. Our approach involves training Deep-BC using data from 2016 to 2019, including CMAQ's 72-hour O3 forecasts, 31 meteorological variables from the Weather Research and Forecasting (WRF) model, and previous days' station measurements of 6 air pollutants. Deep-BC significantly outperforms CMAQ in 2021, reducing biases in O3 forecasts. Furthermore, we utilize Deep-BC's daily maximum 8-hour average O3 (MDA8 O3) forecasts as input for the AirQ+ model to assess O3's potential impact on mortality across seven major regions of South Korea: Seoul, Busan, Daegu, Incheon, Daejeon, Ulsan, and Sejong. Short-term O3 exposure is associated with 0.40% to 0.48% of natural-cause and respiratory deaths and 0.67% to 0.81% of cardiovascular deaths. Gender-specific analysis reveals higher mortality rates among men, particularly from respiratory causes. Our findings underscore the critical need for region-specific interventions to address air pollution's detrimental effects on public health in South Korea. By providing improved O3 predictions and quantifying its impact on mortality, this research offers valuable insights for formulating targeted strategies to mitigate air pollution's adverse effects. Moreover, we highlight the urgency of proactive measures in health policies, emphasizing the significance of accurate forecasting and effective interventions to safeguard public health from the deleterious effects of air pollution.
PMID:38909816 | DOI:10.1016/j.scitotenv.2024.174158
Deep learning based decoding of single local field potential events
Neuroimage. 2024 Jun 21:120696. doi: 10.1016/j.neuroimage.2024.120696. Online ahead of print.
ABSTRACT
How is information processed in the cerebral cortex? In most cases, recorded brain activity is averaged over many (stimulus) repetitions, which erases the fine structure of the neural signal. However, the brain is obviously a single-trial processor. Here, we demonstrate that an unsupervised machine learning approach can be used to extract meaningful information from electrophysiological recordings on a single-trial basis. We use an autoencoder network to reduce the dimensions of single local field potential (LFP) events and create interpretable clusters of different neural activity patterns. Strikingly, certain LFP shapes correspond to latency differences between recording channels; hence, LFP shapes can be used to determine the direction of information flow in the cerebral cortex. Furthermore, after clustering, we decoded the cluster centroids to reverse-engineer the underlying prototypical LFP event shapes. To evaluate our approach, we applied it to both extracellular neural recordings in rodents and intracranial EEG recordings in humans. Finally, we find that single-channel LFP event shapes during spontaneous activity sample from the realm of possible stimulus-evoked event shapes, a finding that so far had only been demonstrated for multi-channel population coding.
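A compact PyTorch sketch in the spirit of the pipeline above: an autoencoder compresses single LFP event waveforms to a low-dimensional code, the codes are clustered, and decoding the cluster centroids yields prototypical event shapes. The waveform length, latent dimension, and cluster count are assumptions:

    import torch
    import torch.nn as nn
    from sklearn.cluster import KMeans

    class LFPAutoencoder(nn.Module):
        def __init__(self, n_samples=256, latent=8):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(n_samples, 64), nn.ReLU(), nn.Linear(64, latent))
            self.decoder = nn.Sequential(
                nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, n_samples))
        def forward(self, x):
            return self.decoder(self.encoder(x))

    events = torch.randn(1000, 256)          # stand-in for single LFP events
    model = LFPAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(200):                     # unsupervised reconstruction
        loss = nn.functional.mse_loss(model(events), events)
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        codes = model.encoder(events).numpy()
        km = KMeans(n_clusters=10, n_init=10).fit(codes)
        prototypes = model.decoder(           # reverse-engineered event shapes
            torch.tensor(km.cluster_centers_, dtype=torch.float32))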
PMID:38909761 | DOI:10.1016/j.neuroimage.2024.120696
In silico discovery and anti-tumor bioactivities validation of an algal lectin from Kappaphycus alvarezii genome
Int J Biol Macromol. 2024 Jun 21:133311. doi: 10.1016/j.ijbiomac.2024.133311. Online ahead of print.
ABSTRACT
Lectins are proteins that bind specifically and reversibly to carbohydrates, and some of them have significant anti-tumor activities. Compared to lectins from land plants, algal lectins have been studied far less, despite the high biodiversity of algae. Canonical strategies based on chromatographic feature-oriented screening cannot satisfy the requirements of algal lectin discovery. In this study, prospecting for novel OAAH-family lectins was conducted across 358 genomes of red algae and cyanobacteria. Then 35 candidate lectins and 1843 of their simulated mutated forms were virtually screened based on binding specificities, predicted by a deep learning model, to characteristic carbohydrates on cancer cells. A new lectin, named Siye, was discovered in the Kappaphycus alvarezii genome and further verified on different cancer cells. Without causing agglutination of erythrocytes, Siye showed significant cytotoxicity toward four human cancer cell lines (IC50 values ranging from 0.11 to 3.95 μg/ml): breast adenocarcinoma HCC1937, lung carcinoma A549, liver cancer HepG2, and promyelocytic leukemia HL60. The cytotoxicity was induced within 24 h by promoting apoptosis through regulation of the caspase and p53 pathways. This study demonstrates the feasibility and efficiency of genome mining guided by evolutionary theory and artificial intelligence in the discovery of algal lectins.
PMID:38909728 | DOI:10.1016/j.ijbiomac.2024.133311
NNBGWO-BRCA marker: Neural Network and binary grey wolf optimization based Breast cancer biomarker discovery framework using multi-omics dataset
Comput Methods Programs Biomed. 2024 Jun 18;254:108291. doi: 10.1016/j.cmpb.2024.108291. Online ahead of print.
ABSTRACT
BACKGROUND AND OBJECTIVE: Breast cancer is a multifaceted condition characterized by diverse features and a substantial mortality rate, underscoring the imperative for timely detection and intervention. The utilization of multi-omics data to identify biomarkers and classify subtypes in breast cancer has gained significant traction in recent years, and this part-to-whole research paradigm is likely to become a major trend in future life science research. Deep learning can integrate and analyze multi-omics data to predict cancer subtypes, which can further drive targeted therapies. However, few studies have leveraged the nature of deep learning for feature selection. Therefore, this paper proposes a Neural Network and Binary Grey Wolf Optimization based BReast CAncer bioMarker (NNBGWO-BRCAMarker) discovery framework that uses multi-omics data to obtain a series of biomarkers for precise classification of breast cancer subtypes.
METHODS: NNBGWO-BRCAMarker consists of two phases: in the first phase, relevant genes are selected using the weights obtained from a trained feedforward neural network; in the second phase, the binary grey wolf optimization algorithm is leveraged to further screen the selected genes, resulting in a set of potential breast cancer biomarkers.
RESULTS: The SVM classifier with RBF kernel achieved a classification accuracy of 0.9242 ± 0.03 when trained using the 80 biomarkers identified by NNBGWO-BRCAMarker, as evidenced by the experimental results. We conducted a comprehensive gene set analysis, prognostic analysis, and druggability analysis, unveiling 25 druggable genes, 16 enriched pathways strongly linked to specific subtypes of breast cancer, and 8 genes linked to prognostic outcomes.
CONCLUSIONS: The proposed framework successfully identified 80 biomarkers from the multi-omics data, enabling accurate classification of breast cancer subtypes. This discovery may offer novel insights for clinicians to pursue in further studies.
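A hypothetical sketch of phase one described in METHODS, ranking genes by the magnitude of first-layer weights in a trained feedforward network; the feature count, subtype count, and cutoff are assumptions, and phase two's binary GWO would then screen the retained genes:

    import numpy as np
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(5000, 128), nn.ReLU(), nn.Linear(128, 5))
    # ... train `net` on multi-omics features (5000 genes) vs. 5 subtypes ...
    w = net[0].weight.detach().numpy()               # shape (128, 5000)
    gene_scores = np.abs(w).sum(axis=0)              # importance per input gene
    keep = np.argsort(gene_scores)[::-1][:500]       # candidates for binary GWO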
PMID:38909399 | DOI:10.1016/j.cmpb.2024.108291
Monitoring and warning for ammonia nitrogen pollution of urban river based on neural network algorithms
Anal Sci. 2024 Jun 23. doi: 10.1007/s44211-024-00622-7. Online ahead of print.
ABSTRACT
Ammonia nitrogen (AN) pollution occurs frequently in urban rivers as industrialization continues to accelerate. Monitoring AN pollution levels and tracing its complex sources often require large-scale testing, which is time-consuming and costly. Owing to the lack of reliable data samples, few studies have investigated the feasibility of using data-driven models to predict AN concentrations that fluctuate strongly and change non-stationarily. In this study, four deep-learning models based on neural network algorithms, including the artificial neural network (ANN), recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU), were employed to predict AN concentration from easily monitored indicators such as pH, dissolved oxygen, and conductivity in a real AN-polluted river. The results showed that the GRU model achieved the best prediction performance, with a mean absolute error (MAE) of 0.349 and a coefficient of determination (R2) of 0.792. Furthermore, data preprocessing with the VMD technique improved the prediction accuracy of the GRU model, raising the R2 value to 0.822. The prediction model effectively detected and warned against abnormal AN pollution (> 2 mg/L), with a recall of 93.6% and a precision of 72.4%. This data-driven method enables reliable monitoring of AN concentration with high-frequency fluctuations and has potential applications for urban river pollution management.
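A minimal Keras sketch of a GRU regressor mapping windows of easily monitored indicators (pH, dissolved oxygen, conductivity, ...) to AN concentration; the window length, layer width, and feature count are assumptions, not the study's configuration:

    import tensorflow as tf

    window, n_features = 24, 6
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window, n_features)),
        tf.keras.layers.GRU(64),
        tf.keras.layers.Dense(1),            # predicted AN concentration (mg/L)
    ])
    model.compile(optimizer="adam", loss="mae")  # MAE matches the reported metric
    # model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100)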
PMID:38909351 | DOI:10.1007/s44211-024-00622-7
Enhancing fall risk assessment: instrumenting vision with deep learning during walks
J Neuroeng Rehabil. 2024 Jun 22;21(1):106. doi: 10.1186/s12984-024-01400-2.
ABSTRACT
BACKGROUND: Falls are common in a range of clinical cohorts, where routine risk assessment often comprises subjective visual observation only. Typically, observational assessment involves evaluation of an individual's gait during scripted walking protocols within a lab to identify deficits that potentially increase fall risk, but subtle deficits may not be (readily) observable. Therefore, objective approaches (e.g., inertial measurement units, IMUs) are useful for quantifying high-resolution gait characteristics, enabling more informed fall risk assessment by capturing subtle deficits. However, IMU-based gait instrumentation alone is limited, failing to consider participant behaviour and details of the environment (e.g., obstacles). Video-based eye-tracking glasses may provide additional insight into fall risk, clarifying how people traverse environments based on head and eye movements. Recording head and eye movements can reveal how the allocation of visual attention to environmental stimuli influences successful navigation around obstacles. Yet, manual review of video data to evaluate head and eye movements is time-consuming and subjective. An automated approach is needed, but none currently exists. This paper proposes a deep learning-based object detection algorithm (VARFA) to instrument vision and video data during walks, complementing instrumented gait.
METHOD: The approach automatically labels video data captured in a gait lab to assess visual attention and details of the environment. The proposed algorithm uses a YoloV8 model trained on a novel lab-based dataset.
RESULTS: VARFA achieved excellent evaluation metrics (0.93 mAP50), identifying and localizing static objects (e.g., obstacles in the walking path) with an average accuracy of 93%. Similarly, a U-NET based track/path segmentation model achieved good metrics (IoU 0.82), suggesting that the predicted tracks (i.e., walking paths) align closely with the actual track, with an overlap of 82%. Notably, both models achieved these metrics while processing at real-time speeds, demonstrating efficiency and effectiveness for pragmatic applications.
CONCLUSION: The instrumented approach improves the efficiency and accuracy of fall risk assessment by evaluating the allocation of visual attention (i.e., information about when and where a person is attending) during navigation, improving the breadth of instrumentation in this area. VARFA could be used to instrument vision and thereby better inform fall risk assessment, providing behaviour and context data to complement instrumented (e.g., IMU) data during gait tasks. This may have notable (e.g., personalized) rehabilitation implications across a wide range of clinical cohorts where poor gait and increased fall risk are common.
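A brief sketch of training and running a YOLOv8 detector with the ultralytics package, as a VARFA-style pipeline might; the dataset config `lab_walks.yaml` and the video path are hypothetical:

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")                      # pretrained checkpoint
    model.train(data="lab_walks.yaml", epochs=100, imgsz=640)
    for r in model("walk_video.mp4", stream=True):  # frame-by-frame inference
        obstacle_boxes = r.boxes.xyxy               # detected objects per frame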
PMID:38909239 | DOI:10.1186/s12984-024-01400-2
Saliency-driven explainable deep learning in medical imaging: bridging visual explainability and statistical quantitative analysis
BioData Min. 2024 Jun 22;17(1):18. doi: 10.1186/s13040-024-00370-4.
ABSTRACT
Deep learning shows great promise for medical image analysis but often lacks explainability, hindering its adoption in healthcare. Attribution techniques that explain model reasoning can potentially increase trust in deep learning among clinical stakeholders. In the literature, much of the research on attribution in medical imaging focuses on visual inspection rather than statistical quantitative analysis. In this paper, we propose an image-based saliency framework to enhance the explainability of deep learning models in medical image analysis. We use adaptive path-based gradient integration, gradient-free techniques, and class activation mapping along with its derivatives to attribute predictions from brain tumor MRI and COVID-19 chest X-ray datasets made by recent deep convolutional neural network models. The proposed framework integrates qualitative and statistical quantitative assessments, employing Accuracy Information Curves (AICs) and Softmax Information Curves (SICs) to measure how effectively saliency methods retain critical image information and how well they correlate with model predictions. Visual inspections indicate that methods such as ScoreCAM, XRAI, GradCAM, and GradCAM++ consistently produce focused and clinically interpretable attribution maps. These methods highlighted possible biomarkers, exposed model biases, and offered insights into the links between input features and predictions, demonstrating their ability to elucidate model reasoning on these datasets. Empirical evaluations reveal that ScoreCAM and XRAI are particularly effective in retaining relevant image regions, as reflected in their higher AUC values. However, SICs highlight variability, with instances of random saliency masks outperforming established methods, emphasizing the need to combine visual and empirical metrics for a comprehensive evaluation. The results underscore the importance of selecting appropriate saliency methods for specific medical imaging tasks and suggest that combining qualitative and quantitative approaches can enhance the transparency, trustworthiness, and clinical adoption of deep learning models in healthcare. This study advances model explainability to increase trust in deep learning among healthcare stakeholders by revealing the rationale behind predictions. Future research should refine empirical metrics for stability and reliability, include more diverse imaging modalities, and focus on improving model explainability to support clinical decision-making.
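A minimal PyTorch sketch of Grad-CAM, one of the attribution methods evaluated above: gradients of the target class score are pooled over the last convolutional feature maps and used to weight those maps. The backbone and layer choice are illustrative, not the study's models:

    import torch
    import torch.nn.functional as F
    from torchvision.models import resnet50

    model = resnet50(weights="IMAGENET1K_V1").eval()
    store = {}

    def fwd_hook(module, inputs, output):
        store["feats"] = output                           # feature maps
        output.register_hook(lambda g: store.update(grads=g))  # their grads

    model.layer4[-1].register_forward_hook(fwd_hook)

    x = torch.randn(1, 3, 224, 224)          # stand-in for an input image
    model(x)[0].max().backward()             # gradient of the top class score

    w = store["grads"].mean(dim=(2, 3), keepdim=True)     # pooled gradients
    cam = F.relu((w * store["feats"]).sum(dim=1))         # weighted feature maps
    cam = F.interpolate(cam[None], size=x.shape[2:], mode="bilinear")[0]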
PMID:38909228 | DOI:10.1186/s13040-024-00370-4
Enhancing automated vehicle identification by integrating YOLO v8 and OCR techniques for high-precision license plate detection and recognition
Sci Rep. 2024 Jun 22;14(1):14389. doi: 10.1038/s41598-024-65272-1.
ABSTRACT
Vehicle identification systems are vital components that enable many aspects of contemporary life, such as safety, trade, transit, and law enforcement. They improve community and individual well-being by strengthening vehicle management, security, and transparency. These tasks entail locating and extracting license plates from images or video frames using computer vision and machine learning techniques, followed by recognizing the letters or digits on the plates. This paper proposes a new license plate detection and recognition method based on the deep learning YOLO v8 model, image processing techniques, and OCR for text recognition. The first step was dataset creation, gathering 270 images from the internet. Afterward, the dataset was annotated with CVAT (Computer Vision Annotation Tool), an open-source software platform designed to ease the annotation and labeling of images and videos for computer vision tasks. Subsequently, the newly released YOLO version, YOLO v8, was employed to detect the number-plate area in the input image. After extracting the plate, the k-means clustering algorithm, thresholding techniques, and the morphological opening operation were used to enhance the image and make the characters on the license plate clearer before applying OCR. The next step is using OCR to extract the characters. Eventually, a text file containing only the characters, reflecting the vehicle's country of registration, is generated. To evaluate the proposed approach, several metrics were employed, namely precision, recall, F1-score, and CLA. In addition, a comparison of the proposed method with existing techniques in the literature is given. The suggested method obtained convincing results in both detection and recognition, achieving an accuracy of 99% in detection and 98% in character recognition.
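An illustrative OpenCV + Tesseract sketch of the plate post-processing chain described above (thresholding, morphological opening, then OCR); the k-means color-quantization step is omitted, and the file name and Tesseract page-segmentation mode are assumptions:

    import cv2
    import pytesseract

    plate = cv2.imread("plate_crop.png", cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(plate, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)  # remove specks
    text = pytesseract.image_to_string(opened, config="--psm 7")  # one text line
    print(text.strip())                       # recognized plate characters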
PMID:38909147 | DOI:10.1038/s41598-024-65272-1
Three-dimensional atrous inception module for crowd behavior classification
Sci Rep. 2024 Jun 22;14(1):14390. doi: 10.1038/s41598-024-65003-6.
ABSTRACT
Recent advances in deep learning have led to a surge in computer vision research, including the recognition and classification of human behavior in video data. However, most studies have focused on recognizing individual behaviors, whereas recognizing crowd behavior remains a complex problem because of the large number of interactions and similar behaviors among individuals or crowds in video surveillance systems. To solve this problem, we propose a three-dimensional atrous inception module (3D-AIM) network, which is a crowd behavior classification model that uses atrous convolution to explore interactions between individuals or crowds. The 3D-AIM network is a 3D convolutional neural network that can use receptive fields of various sizes to effectively identify specific features that determine crowd behavior. To further improve the accuracy of the 3D-AIM network, we introduced a new loss function called the separation loss function. This loss function focuses the 3D-AIM network more on the features that distinguish one type of crowd behavior from another, thereby enabling a more precise classification. Finally, we demonstrate that the proposed model outperforms existing human behavior classification models in terms of accurately classifying crowd behaviors. These results suggest that the 3D-AIM network with a separation loss function can be valuable for understanding complex crowd behavior in video surveillance systems.
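A hypothetical PyTorch sketch of a 3D atrous inception-style module: parallel Conv3d branches with different dilation rates provide receptive fields of several sizes and are concatenated along the channel axis; the exact branch layout of the 3D-AIM may differ:

    import torch
    import torch.nn as nn

    class AtrousInception3D(nn.Module):
        def __init__(self, in_ch, branch_ch=32, dilations=(1, 2, 4)):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Sequential(
                    nn.Conv3d(in_ch, branch_ch, kernel_size=3,
                              padding=d, dilation=d),   # padding=d keeps size
                    nn.BatchNorm3d(branch_ch), nn.ReLU(inplace=True))
                for d in dilations)
        def forward(self, x):
            return torch.cat([b(x) for b in self.branches], dim=1)

    clip = torch.randn(2, 3, 16, 112, 112)   # batch, channels, frames, H, W
    out = AtrousInception3D(3)(clip)         # -> (2, 96, 16, 112, 112)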
PMID:38909074 | DOI:10.1038/s41598-024-65003-6
A Military Audio Dataset for Situational Awareness and Surveillance
Sci Data. 2024 Jun 22;11(1):668. doi: 10.1038/s41597-024-03511-w.
ABSTRACT
Audio classification related to military activities is a challenging task due to high levels of background noise and the lack of suitable, publicly available datasets. To bridge this gap, this paper constructs and introduces a new military audio dataset, named MAD, which is suitable for training and evaluating audio classification systems. The proposed MAD dataset is extracted from various military videos and contains 8,075 sound samples from 7 classes, corresponding to approximately 12 hours of audio, and exhibits distinctive characteristics not present in the academic datasets typically used for machine learning research. We present a comprehensive description of the dataset, including its acoustic statistics and examples. We further conduct a comprehensive sound classification study of various deep learning algorithms on the MAD dataset, and we release the source code to make it easy to build these systems. The presented dataset will be a valuable resource for evaluating existing algorithms and for advancing research in acoustic-based hazardous situation surveillance systems.
PMID:38909048 | DOI:10.1038/s41597-024-03511-w
Assessment of Deep Learning-Based Triage Application for Acute Ischemic Stroke on Brain MRI in the ER
Acad Radiol. 2024 Jun 21:S1076-6332(24)00282-4. doi: 10.1016/j.acra.2024.04.046. Online ahead of print.
ABSTRACT
RATIONALE AND OBJECTIVES: To assess a deep learning application (DLA) for acute ischemic stroke (AIS) detection on brain magnetic resonance imaging (MRI) in the emergency room (ER) and the effect of T2-weighted imaging (T2WI) on its performance.
MATERIALS AND METHODS: We retrospectively analyzed brain MRIs taken through the ER from March to October 2021 that included diffusion-weighted imaging (DWI) and fluid-attenuated inversion recovery (FLAIR) sequences. MRIs were processed by the DLA, and sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUROC) were evaluated, with three neuroradiologists establishing the gold standard for detection performance. In addition, we examined the impact of axial T2WI, when available, on the accuracy and processing time of the DLA.
RESULTS: The study included 947 individuals (mean age ± standard deviation, 64 years ± 16; 461 men, 486 women), with 239 (25%) positive for AIS. The overall performance of the DLA was as follows: sensitivity, 90%; specificity, 89%; accuracy, 89%; and AUROC, 0.95. The average processing time was 24 s. In the subgroup with T2WI, adding T2WI did not significantly change the MRI assessments but did lengthen processing time (35 s without T2WI vs. 48 s with T2WI, p < 0.001).
CONCLUSION: The DLA successfully identified AIS in the ER setting with an average processing time of 24 s. The absence of a performance gain with axial T2WI suggests that the DLA can diagnose AIS with axial DWI and FLAIR sequences alone, potentially shortening exam duration in the ER.
PMID:38908922 | DOI:10.1016/j.acra.2024.04.046