Deep learning
GNN-DDAS: Drug discovery for identifying anti-schistosome small molecules based on graph neural network
J Comput Chem. 2024 Aug 27. doi: 10.1002/jcc.27490. Online ahead of print.
ABSTRACT
Schistosomiasis is a tropical disease that poses a significant risk to hundreds of millions of people, yet often goes unnoticed. While praziquantel, a widely used anti-schistosome drug, has a low cost and a high cure rate, it has several drawbacks. These include ineffectiveness against schistosome larvae, reduced efficacy in young children, and emerging drug resistance. Discovering new and active anti-schistosome small molecules is therefore critical, but this process presents the challenge of low accuracy in computer-aided methods. To address this issue, we proposed GNN-DDAS, a novel deep learning framework based on graph neural networks (GNN), designed for drug discovery to identify active anti-schistosome (DDAS) small molecules. Initially, a multi-layer perceptron was used to derive sequence features from various representations of small molecule SMILES. Next, GNN was employed to extract structural features from molecular graphs. Finally, the extracted sequence and structural features were concatenated and fed into a fully connected network to predict active anti-schistosome small molecules. Experimental results showed that GNN-DDAS exhibited superior performance compared to the benchmark methods on both benchmark and real-world application datasets. Additionally, the use of the GNNExplainer model allowed us to analyze the key substructure features of small molecules, providing insight into the effectiveness of GNN-DDAS. Overall, GNN-DDAS provides a promising solution for discovering new and active anti-schistosome small molecules.
PMID:39189298 | DOI:10.1002/jcc.27490
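The fusion the abstract describes (MLP-derived sequence features concatenated with GNN-derived structural features, then a fully connected classifier) can be sketched in a few lines of NumPy. All layer sizes, the single round of mean-neighbour message passing, and the mean-pool readout below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def mlp_features(x, w1, w2):
    # Two-layer perceptron over a fingerprint vector derived from the SMILES string.
    return relu(w2 @ relu(w1 @ x))

def gnn_features(node_feats, adj, w):
    # One round of mean-neighbour message passing, then mean-pooling over atoms.
    deg = adj.sum(axis=1, keepdims=True)
    h = relu((adj @ node_feats) / np.maximum(deg, 1) @ w)
    return h.mean(axis=0)

# Hypothetical sizes: a 64-dim fingerprint and a 5-atom chain molecule with 8-dim atom features.
fp = rng.random(64)
atoms = rng.random((5, 8))
adj = np.eye(5) + np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)  # chain + self-loops

seq = mlp_features(fp, rng.random((32, 64)), rng.random((16, 32)))
struct = gnn_features(atoms, adj, rng.random((8, 16)))

fused = np.concatenate([seq, struct])   # 16 + 16 = 32-dim joint representation
logit = rng.random(32) @ fused          # final fully connected layer (single output)
prob = 1.0 / (1.0 + np.exp(-logit))     # probability of the molecule being active
print(fused.shape)
```

The key design point mirrored here is that the two feature branches are learned independently and only combined at the final classification stage.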
NLP-based ergonomics MSD risk root cause analysis and risk controls recommendation
Ergonomics. 2024 Aug 27:1-13. doi: 10.1080/00140139.2024.2394510. Online ahead of print.
ABSTRACT
An ergonomics assessment of the physical risk factors in the workplace is instrumental in predicting and preventing musculoskeletal disorders (MSDs). Using Artificial Intelligence (AI) has become increasingly popular for ergonomics assessments because of the time savings and improved accuracy. However, most of the effort in this area starts and ends with producing risk scores, without providing guidance to reduce the risk. This paper proposes a holistic job improvement process that performs automatic root cause analysis and control recommendations for reducing MSD risk. We apply deep learning-based Natural Language Processing (NLP) techniques such as Part of Speech (PoS) tagging and dependency parsing on textual descriptions of the physical actions performed in the job (e.g. pushing) along with the object (e.g. cart) being acted upon. The action-object inferences provide the entry point to an expert-based Machine Learning (ML) system that automatically identifies the targeted work-related causes (e.g. cart movement forces are too high, due to caster size too small) of the identified MSD risk (e.g. excessive shoulder forces). The proposed framework utilises the root causes identified to recommend control strategies (e.g. provide larger diameter casters, minimum diameter 8" or 203 mm) most likely to mitigate risk, resulting in a more efficient and effective job improvement process.
PMID:39189206 | DOI:10.1080/00140139.2024.2394510
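The action-object inference step can be illustrated with a toy rule: pair each verb with the nearest following noun. The real system uses deep learning-based PoS tagging and dependency parsing, so both the rule and the pre-tagged input below are simplifying assumptions:

```python
def action_object_pairs(tagged):
    """Pair each verb with the nearest following noun, a stand-in for
    dependency-parsed (action, object) extraction."""
    pairs = []
    for i, (tok, pos) in enumerate(tagged):
        if pos == "VERB":
            for tok2, pos2 in tagged[i + 1:]:
                if pos2 == "NOUN":
                    pairs.append((tok.lower(), tok2.lower()))
                    break
    return pairs

# Hypothetical job description, already tagged:
desc = [("Worker", "NOUN"), ("pushes", "VERB"), ("the", "DET"), ("cart", "NOUN"),
        ("and", "CCONJ"), ("lifts", "VERB"), ("boxes", "NOUN")]
print(action_object_pairs(desc))  # [('pushes', 'cart'), ('lifts', 'boxes')]
```

Each (action, object) pair then serves as the entry point into the expert rule base that maps it to candidate root causes and controls.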
Enhancing Melanoma Diagnosis with Advanced Deep Learning Models Focusing on Vision Transformer, Swin Transformer, and ConvNeXt
Dermatopathology (Basel). 2024 Aug 15;11(3):239-252. doi: 10.3390/dermatopathology11030026.
ABSTRACT
Skin tumors, especially melanoma, which is highly aggressive and progresses quickly to other sites, are a problem in many parts of the world. Detecting melanoma at its initial stages remains the most effective way to save lives. This study explores the application of advanced deep learning models for classifying benign and malignant melanoma using dermoscopic images. The aim of the study is to enhance the accuracy and efficiency of melanoma diagnosis with the ConvNeXt, Vision Transformer (ViT) Base-16, and Swin Transformer V2 Small (Swin V2 S) deep learning models. The ConvNeXt model, which integrates principles of both convolutional neural networks and transformers, demonstrated superior performance, with balanced precision and recall metrics. The dataset, sourced from Kaggle, comprises 13,900 uniformly sized images, preprocessed to standardize the inputs for the models. Experimental results revealed that ConvNeXt achieved the highest diagnostic accuracy among the tested models: 91.5%, with balanced precision and recall rates of 90.45% and 92.8% for benign cases, and 92.61% and 90.2% for malignant cases, respectively. The F1-scores for ConvNeXt were 91.61% for benign cases and 91.39% for malignant cases. This research points to the potential of hybrid deep learning architectures in medical image analysis, particularly for early melanoma detection.
PMID:39189182 | DOI:10.3390/dermatopathology11030026
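The reported F1-scores follow directly from the stated precision and recall via the harmonic mean, which a two-line check confirms:

```python
def f1(precision, recall):
    # F1 is the harmonic mean of precision and recall (here in percent).
    return 2 * precision * recall / (precision + recall)

benign_f1 = f1(90.45, 92.8)       # benign: P = 90.45%, R = 92.8%
malignant_f1 = f1(92.61, 90.2)    # malignant: P = 92.61%, R = 90.2%
print(round(benign_f1, 2), round(malignant_f1, 2))  # 91.61 91.39, matching the abstract
```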
Deep learning-enhanced R-loop prediction provides mechanistic implications for repeat expansion diseases
iScience. 2024 Jul 25;27(8):110584. doi: 10.1016/j.isci.2024.110584. eCollection 2024 Aug 16.
ABSTRACT
R-loops play diverse functional roles, but conflicting genomic localizations of R-loops have emerged from experimental approaches, posing significant challenges for R-loop research. The development and application of an accurate computational tool for studying human R-loops remains an unmet need. Here, we introduce DeepER, a deep learning-enhanced R-loop prediction tool. DeepER showcases outstanding performance compared to existing tools, facilitating accurate genome-wide annotation of R-loops and a deeper understanding of the position- and context-dependent effects of nucleotide composition on R-loop formation. DeepER also unveils a strong association between certain tandem repeats and R-loop formation, opening a new avenue for understanding the mechanisms underlying some repeat expansion diseases. To facilitate broader utilization, we have developed a user-friendly web server as an integral component of R-loopBase. We anticipate that DeepER will find extensive applications in the field of R-loop research.
PMID:39188986 | PMC:PMC11345597 | DOI:10.1016/j.isci.2024.110584
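The nucleotide-composition effects mentioned above can be illustrated with a classic hand-crafted signal, sliding-window GC skew, which is elevated in many R-loop-forming regions. This toy feature is only an illustration and is not DeepER's actual input representation:

```python
def gc_skew(seq, window=10):
    """Sliding-window (G - C) / (G + C) skew, one simple nucleotide-composition
    signal associated with R-loop-forming regions."""
    skews = []
    for i in range(0, len(seq) - window + 1):
        w = seq[i:i + window]
        g, c = w.count("G"), w.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

# Synthetic sequence: G-rich region, AT-only spacer, then C-rich region.
seq = "GGGGGAGGGG" + "ATATATATAT" + "CCCCTCCCCC"
skews = gc_skew(seq)
print(max(skews), min(skews))  # 1.0 -1.0
```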
Shomikoron: Dataset to discover equations from Bangla Mathematical text
Data Brief. 2024 Jul 17;55:110742. doi: 10.1016/j.dib.2024.110742. eCollection 2024 Aug.
ABSTRACT
Equation Recognition is a mathematical task of identifying equations, which has significance in developing different mathematical systems. In this paper, we introduce a novel Bangla mathematical equation dataset comprising 3430 observations aimed at advancing mathematical Equation Recognition in the Bangla language. To the best of our knowledge, no such dataset exists that was developed to recognize equations from the text. Each entry in the dataset includes a mathematical statement and the corresponding equation. This resource can significantly support research in mathematical Equation Recognition, including the identification of common mathematical operations (such as addition, subtraction, multiplication, division, and roots) and numerical values. With minor adjustments, researchers can also explore combinations of these findings. The dataset is raw and conveniently structured in CSV format, with two columns: "Text" and "Equation," facilitating easy handling for various deep learning and machine learning tasks.
PMID:39188909 | PMC:PMC11345577 | DOI:10.1016/j.dib.2024.110742
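Given the two-column CSV layout the authors describe ("Text" and "Equation"), loading the dataset needs only the standard library. The two rows below are invented English stand-ins for real Bangla entries:

```python
import csv
import io

# A two-row stand-in for the real Shomikoron CSV (columns "Text" and "Equation").
sample = io.StringIO('Text,Equation\n"sum of x and y","x + y"\n"square root of n","sqrt(n)"\n')

rows = list(csv.DictReader(sample))
statements = [r["Text"] for r in rows]
equations = [r["Equation"] for r in rows]
print(len(rows), equations)  # 2 ['x + y', 'sqrt(n)']
```

In practice one would pass the real file path to `open(...)` instead of the in-memory sample.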
A non-enhanced CT-based deep learning diagnostic system for COVID-19 infection at high risk among lung cancer patients
Front Med (Lausanne). 2024 Aug 12;11:1444708. doi: 10.3389/fmed.2024.1444708. eCollection 2024.
ABSTRACT
BACKGROUND: Pneumonia and lung cancer have a mutually reinforcing relationship. Lung cancer patients are prone to contracting COVID-19, with poorer prognoses. Additionally, COVID-19 infection can impact anticancer treatments for lung cancer patients. Developing an early diagnostic system for COVID-19 pneumonia can help improve the prognosis of lung cancer patients with COVID-19 infection.
METHOD: This study proposes a neural network for COVID-19 diagnosis based on non-enhanced CT scans, consisting of two 3D convolutional neural networks (CNN) connected in series to form two diagnostic modules. The first diagnostic module classifies COVID-19 pneumonia patients from other pneumonia patients, while the second diagnostic module distinguishes severe COVID-19 patients from ordinary COVID-19 patients. We also analyzed the correlation between the deep learning features of the two diagnostic modules and various laboratory parameters, including KL-6.
RESULT: The first diagnostic module achieved an accuracy of 0.9669 on the training set and 0.8884 on the test set, while the second diagnostic module achieved an accuracy of 0.9722 on the training set and 0.9184 on the test set. Strong correlation was observed between the deep learning parameters of the second diagnostic module and KL-6.
CONCLUSION: Our neural network can differentiate between COVID-19 pneumonia and other pneumonias on CT images, while also distinguishing between ordinary COVID-19 patients and those with white lung (severe disease). Patients with white lung in COVID-19 have greater alveolar damage compared to ordinary COVID-19 patients, and our deep learning features can serve as an imaging biomarker.
PMID:39188873 | PMC:PMC11345710 | DOI:10.3389/fmed.2024.1444708
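At inference time, the serial two-module design reduces to a simple cascade: the second classifier only runs on cases the first flags as COVID-19. The threshold and the probabilities below are illustrative assumptions:

```python
def cascade_diagnosis(p_covid, p_severe, thresh=0.5):
    """Serial two-module logic: module 1 separates COVID-19 pneumonia from
    other pneumonias; module 2 runs only on COVID-19-positive cases to flag
    severe (white-lung) disease."""
    if p_covid < thresh:
        return "other pneumonia"
    return "severe COVID-19" if p_severe >= thresh else "ordinary COVID-19"

print(cascade_diagnosis(0.3, 0.9))  # other pneumonia
print(cascade_diagnosis(0.8, 0.2))  # ordinary COVID-19
print(cascade_diagnosis(0.8, 0.9))  # severe COVID-19
```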
Corrigendum: Music-evoked emotions classification using vision transformer in EEG signals
Front Psychol. 2024 Aug 12;15:1466709. doi: 10.3389/fpsyg.2024.1466709. eCollection 2024.
ABSTRACT
[This corrects the article DOI: 10.3389/fpsyg.2024.1275142.].
PMID:39188869 | PMC:PMC11346311 | DOI:10.3389/fpsyg.2024.1466709
Application of artificial intelligence in the diagnosis and treatment of urinary tumors
Front Oncol. 2024 Aug 12;14:1440626. doi: 10.3389/fonc.2024.1440626. eCollection 2024.
ABSTRACT
Diagnosing and treating urological tumors, a task that relies on auxiliary data such as medical imaging while incorporating individual patient characteristics into treatment selection, has long been a key challenge in clinical medicine. Traditionally, clinicians relied on extensive experience for decision-making, but recent artificial intelligence (AI) advancements offer new solutions. Machine learning (ML) and deep learning (DL), notably convolutional neural networks (CNNs) in medical image recognition, enable precise tumor diagnosis and treatment. These technologies analyze complex medical image patterns, improving accuracy and efficiency. AI systems, by learning from vast datasets, reveal hidden features, offering reliable diagnostics and personalized treatment plans. Early detection is crucial for tumors like renal cell carcinoma (RCC), bladder cancer (BC), and prostate cancer (PCa). AI, coupled with data analysis, improves early detection and reduces misdiagnosis rates, enhancing treatment precision. AI's application in urological tumors is a research focus, promising a vital role in urological surgery with improved patient outcomes. This paper examines ML and DL in urological tumors, and AI's role in clinical decisions, providing insights for future AI applications in urological surgery.
PMID:39188685 | PMC:PMC11345192 | DOI:10.3389/fonc.2024.1440626
Osteoporotic vertebral compression fracture (OVCF) detection using artificial neural networks model based on the AO spine-DGOU osteoporotic fracture classification system
N Am Spine Soc J. 2024 Jul 4;19:100515. doi: 10.1016/j.xnsj.2024.100515. eCollection 2024 Sep.
ABSTRACT
BACKGROUND: Osteoporotic Vertebral Compression Fracture (OVCF) substantially reduces a person's health-related quality of life. Computed Tomography (CT) scanning is currently the standard for diagnosis of OVCF. The aim of this paper was to evaluate the OVCF detection potential of artificial neural networks (ANN).
METHODS: Models of artificial intelligence based on deep learning hold promise for quickly and automatically identifying and visualizing OVCF. This study investigated the detection, classification, and grading of OVCF using deep artificial neural networks (ANN). Annotation techniques were used to segregate the sagittal images of 1,050 OVCF CT images from patients with symptomatic low back pain into 934 CT images for a training dataset (89%) and 116 CT images for a test dataset (11%). A radiologist tagged, cleaned, and annotated the training dataset. Disc deterioration was assessed in all lumbar discs using the AO Spine-DGOU Osteoporotic Fracture Classification System. The detection and grading of OVCF were trained using the deep learning ANN model. The outcomes of the ANN model training were confirmed by testing an automatic model on dataset grading.
RESULTS: The sagittal lumbar CT training dataset included 5,010 OVCF from OF1, 1,942 from OF2, 522 from OF3, 336 from OF4, and none from OF5. With an overall accuracy of 96.04%, the deep ANN model was able to identify and categorize lumbar OVCF.
CONCLUSIONS: The ANN model offers a rapid and effective way to classify lumbar OVCF by automatically and consistently evaluating routine CT scans using AO Spine-DGOU osteoporotic fracture classification system.
PMID:39188670 | PMC:PMC11345903 | DOI:10.1016/j.xnsj.2024.100515
Harnessing the power of machine learning for crop improvement and sustainable production
Front Plant Sci. 2024 Aug 12;15:1417912. doi: 10.3389/fpls.2024.1417912. eCollection 2024.
ABSTRACT
Crop improvement and production domains encounter large amounts of expanding data with multi-layer complexity that forces researchers to use machine-learning approaches to establish predictive and informative models to understand the sophisticated mechanisms underlying these processes. All machine-learning approaches aim to fit models to target data; nevertheless, it should be noted that a wide range of specialized methods might initially appear confusing. The principal objective of this study is to offer researchers an explicit introduction to some of the essential machine-learning approaches and their applications, comprising the most modern and utilized methods that have gained widespread adoption in crop improvement or similar domains. This article explicitly explains how different machine-learning methods could be applied for given agricultural data, highlights newly emerging techniques for machine-learning users, and lays out technical strategies for agri/crop research practitioners and researchers.
PMID:39188546 | PMC:PMC11346375 | DOI:10.3389/fpls.2024.1417912
Recognition of Forward Head Posture Through 3D Human Pose Estimation With a Graph Convolutional Network: Development and Feasibility Study
JMIR Form Res. 2024 Aug 26;8:e55476. doi: 10.2196/55476.
ABSTRACT
BACKGROUND: Prolonged improper posture can lead to forward head posture (FHP), causing headaches, impaired respiratory function, and fatigue. This is especially relevant in sedentary scenarios, where individuals often maintain static postures for extended periods-a significant part of daily life for many. The development of a system capable of detecting FHP is crucial, as it would not only alert users to correct their posture but also serve the broader goal of contributing to public health by preventing the progression of chronic injuries associated with this condition. However, despite significant advancements in estimating human poses from standard 2D images, most computational pose models do not include measurements of the craniovertebral angle, which involves the C7 vertebra, crucial for diagnosing FHP.
OBJECTIVE: Accurate diagnosis of FHP typically requires dedicated devices, such as clinical postural assessments or specialized imaging equipment, but their use is impractical for continuous, real-time monitoring in everyday settings. Therefore, it is necessary to develop an accessible, efficient method for regular posture assessment that can be easily integrated into daily activities, provide real-time feedback, and promote corrective action.
METHODS: The system sequentially estimates 2D and 3D human anatomical key points from a provided 2D image, using the Detectron2D and VideoPose3D algorithms, respectively. It then uses a graph convolutional network (GCN), explicitly crafted to analyze the spatial configuration and alignment of the upper body's anatomical key points in 3D space. This GCN aims to implicitly learn the intricate relationship between the estimated 3D key points and the correct posture, specifically to identify FHP.
RESULTS: The test accuracy was 78.27% when inputs included all joints corresponding to the upper body key points. The GCN model demonstrated slightly superior balanced performance across classes with an F1-score (macro) of 77.54%, compared to the baseline feedforward neural network (FFNN) model's 75.88%. Specifically, the GCN model showed a more balanced precision and recall between the classes, suggesting its potential for better generalization in FHP detection across diverse postures. Meanwhile, the baseline FFNN model demonstrated higher precision for FHP cases but at the cost of lower recall, indicating that while it was more accurate in confirming FHP when detected, it missed a significant number of actual FHP instances. This assertion is further substantiated by the examination of the latent feature space using t-distributed stochastic neighbor embedding, where the GCN model presented an isotropic distribution, unlike the FFNN model, which showed an anisotropic distribution.
CONCLUSIONS: Based on 2D image input using 3D human pose estimation joint inputs, it was found that it is possible to learn FHP-related features using the proposed GCN-based network to develop a posture correction system. We conclude the paper by addressing the limitations of our current system and proposing potential avenues for future work in this area.
PMID:39186772 | DOI:10.2196/55476
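The craniovertebral angle the authors highlight (involving the C7 vertebra) is a simple geometric quantity once key points are available: the angle between the horizontal and the line from C7 to the tragus of the ear, with smaller values indicating more pronounced forward head posture. The key points below are invented sagittal-plane coordinates, and any clinical cutoff (often cited near 50 degrees) is outside this sketch:

```python
import numpy as np

def craniovertebral_angle(c7, tragus):
    """Angle (degrees) between the horizontal and the C7-to-tragus line.
    Points are (x, y) sagittal-plane coordinates with y pointing up."""
    v = np.asarray(tragus, float) - np.asarray(c7, float)
    return np.degrees(np.arctan2(v[1], v[0]))

# Hypothetical key points from a pose estimator (units arbitrary):
print(round(craniovertebral_angle((0, 0), (1, 1)), 1))    # 45.0 (pronounced FHP)
print(round(craniovertebral_angle((0, 0), (0.3, 1)), 1))  # 73.3 (closer to upright)
```

The GCN in the paper learns this kind of geometric relationship implicitly from the full set of upper-body 3D key points rather than computing the angle explicitly.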
Non-coplanar CBCT image reconstruction using a generative adversarial network for non-coplanar radiotherapy
J Appl Clin Med Phys. 2024 Aug 26:e14487. doi: 10.1002/acm2.14487. Online ahead of print.
ABSTRACT
PURPOSE: To develop a non-coplanar cone-beam computed tomography (CBCT) image reconstruction method using projections within a limited angle range for non-coplanar radiotherapy.
METHODS: A generative adversarial network (GAN) was utilized to reconstruct non-coplanar CBCT images. Data from 40 patients with brain tumors and two head phantoms were used in this study. In the training stage, the generator of the GAN used coplanar CBCT and non-coplanar projections as the input, and an encoder with a dual-branch structure was utilized to extract features from the coplanar CBCT and non-coplanar projections separately. Non-coplanar CBCT images were then reconstructed using a decoder by combining the extracted features. To improve the reconstruction accuracy of the image details, the generator was adversarially trained using a patch-based convolutional neural network as the discriminator. A newly designed joint loss was used to improve the global structure consistency rather than the conventional GAN loss. The proposed model was evaluated using data from eight patients and two phantoms at four couch angles (±45°, ±90°) that are most commonly used for brain non-coplanar radiotherapy in our department. The reconstruction accuracy was evaluated by calculating the root mean square error (RMSE) and an overall registration error ε, computed by integrating the rigid transformation parameters.
RESULTS: In both the patient and phantom data studies, the qualitative and quantitative results indicated that the ±45° couch angle models performed better than the ±90° couch angle models, with statistically significant differences. In the patient data study, the mean RMSE and ε values at couch angles of 45°, -45°, 90°, and -90° were 58.5 HU and 0.42 mm, 56.8 HU and 0.41 mm, 73.6 HU and 0.48 mm, and 65.3 HU and 0.46 mm, respectively. In the phantom data study, the mean RMSE and ε values at couch angles of 45°, -45°, 90°, and -90° were 91.2 HU and 0.46 mm, 95.0 HU and 0.45 mm, 114.6 HU and 0.58 mm, and 102.9 HU and 0.52 mm, respectively.
CONCLUSIONS: The results show that the reconstructed non-coplanar CBCT images can potentially enable intra-treatment three-dimensional position verification for non-coplanar radiotherapy.
PMID:39186746 | DOI:10.1002/acm2.14487
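The overall registration error ε combines rigid transformation parameters into one scalar, but the abstract does not give the exact integration. The sketch below makes one plausible assumption: the Euclidean norm of the translations combined with the arc displacement the rotations induce at a nominal head radius:

```python
import numpy as np

def registration_error(translation_mm, rotation_deg, radius_mm=50.0):
    """One plausible scalar error: translation norm combined with the
    displacement the rotations induce at a nominal radius. The paper's
    exact formula for integrating the parameters is not reproduced here."""
    t = np.linalg.norm(translation_mm)
    r = np.linalg.norm(np.radians(rotation_deg)) * radius_mm
    return float(np.hypot(t, r))

# Hypothetical registration offsets (mm translations, degree rotations):
eps = registration_error([0.2, 0.1, 0.3], [0.1, 0.2, 0.1])
print(round(eps, 2))
```

Values on the order of a few tenths of a millimetre are consistent with the ε values reported in the results.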
Vessel trajectory classification via transfer learning with Deep Convolutional Neural Networks
PLoS One. 2024 Aug 26;19(8):e0308934. doi: 10.1371/journal.pone.0308934. eCollection 2024.
ABSTRACT
The classification of vessel trajectories using Automatic Identification System (AIS) data is crucial for ensuring maritime safety and the efficient navigation of ships. The advent of deep learning has brought about more effective classification methods utilizing Convolutional Neural Networks (CNN). However, existing CNN-based approaches primarily focus on either sailing or loitering movement patterns and struggle to capture valuable features and subtle differences between these patterns from input images. In response to these limitations, we first introduce a novel framework, Dense121-VMC, based on Deep Convolutional Neural Networks (DCNN) with transfer learning for simultaneous extraction and classification of both sailing and loitering trajectories. Our approach performs efficiently in extracting significant features from input images and in identifying subtle differences in each vessel's trajectory. Additionally, transfer learning effectively reduces data requirements and addresses the issue of overfitting. Through extended experiments, we demonstrate the effectiveness of the proposed Dense121-VMC framework, making notable contributions to vessel trajectory classification.
PMID:39186723 | DOI:10.1371/journal.pone.0308934
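Turning AIS tracks into the image inputs a DCNN consumes can be sketched as a simple rasterization. The grid size, the min-max normalization, and the synthetic sailing and loitering tracks below are all assumptions for illustration:

```python
import numpy as np

def rasterize_trajectory(lons, lats, size=32):
    """Render an AIS track as a binary image, the kind of input a
    DenseNet-style classifier consumes."""
    img = np.zeros((size, size))
    lons, lats = np.asarray(lons, float), np.asarray(lats, float)
    cols = np.clip(((lons - lons.min()) / (np.ptp(lons) + 1e-9) * (size - 1)).astype(int), 0, size - 1)
    rows = np.clip(((lats - lats.min()) / (np.ptp(lats) + 1e-9) * (size - 1)).astype(int), 0, size - 1)
    img[rows, cols] = 1.0
    return img

# A straight "sailing" leg vs. a tight "loitering" loop (synthetic coordinates):
t = np.linspace(0, 2 * np.pi, 200)
sailing = rasterize_trajectory(t, 0.5 * t)
loitering = rasterize_trajectory(np.cos(t), np.sin(t))
print(sailing.sum() < loitering.sum())  # the loop lights up more pixels than the line
```

The two movement patterns produce visibly different images, which is exactly the signal the CNN exploits.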
Optimized ensemble deep learning for predictive analysis of student achievement
PLoS One. 2024 Aug 26;19(8):e0309141. doi: 10.1371/journal.pone.0309141. eCollection 2024.
ABSTRACT
Education is essential for individuals to lead fulfilling lives and attain greatness by enhancing their value. It improves self-assurance and enables individuals to navigate the complexities of modern society effectively. Despite the obstacles it faces, education continues to develop. The objective of numerous pedagogical approaches is to enhance academic performance. The development of technology, especially artificial intelligence, has caused a significant change in learning, making instructional materials easily accessible anytime and anywhere. Higher education institutions are adding technology to conventional teaching strategies to improve learning. This work presents an innovative approach to student performance prediction in educational settings. The strategy combines a DistilBERT with LSTM (DBTM) hybrid approach with the Spotted Hyena Optimizer (SHO) for parameter tuning. The model significantly improved on earlier models in accuracy, log loss, and execution time. The challenges presented by the increasing volume of data in graduate and postgraduate programs are effectively addressed by the proposed method. It produces exceptional performance metrics, including a 15-25% decrease in processing time through optimization, 98.7% accuracy, and 0.03% log loss. This work additionally demonstrates the effectiveness of DBTM-SHO in administering extensive datasets and makes an important contribution to educational data mining. It provides a robust foundation for organizations facing the challenges of evaluating student achievement in the era of vast data.
PMID:39186491 | DOI:10.1371/journal.pone.0309141
Assessment of deep learning image reconstruction (DLIR) on image quality in pediatric cardiac CT datasets
PLoS One. 2024 Aug 26;19(8):e0300090. doi: 10.1371/journal.pone.0300090. eCollection 2024.
ABSTRACT
BACKGROUND: To evaluate the quantitative and qualitative image quality using deep learning image reconstruction (DLIR) of pediatric cardiac computed tomography (CT) compared with conventional image reconstruction methods.
METHODS: Between January 2020 and December 2022, 109 pediatric cardiac CT scans were included in this study. The CT scans were reconstructed using an adaptive statistical iterative reconstruction-V (ASiR-V) with a blending factor of 80% and three levels of DLIR with TrueFidelity (low-, medium-, and high-strength settings). Quantitative image quality was measured using signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR). The edge rise distance (ERD) and the angle between 25% and 75% of the line density profile were measured to evaluate sharpness. Qualitative image quality was assessed using visual grading analysis scores.
RESULTS: A gradual improvement in the SNR and CNR was noted across the strength levels of the DLIR in sequence from low to high. Compared to ASiR-V, high-level DLIR showed significantly improved SNR and CNR (P<0.05). The ERD decreased, with a corresponding increase in the angle, as the level of DLIR increased.
CONCLUSION: High-level DLIR showed improved SNR and CNR compared to ASiR-V, with better sharpness on pediatric cardiac CT scans.
PMID:39186484 | DOI:10.1371/journal.pone.0300090
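The quantitative metrics here are standard: SNR is a region's mean divided by its standard deviation, and CNR divides the mean difference between two regions by a noise estimate (a pooled estimate below, one common convention). The synthetic HU values simply mimic the effect of DLIR-style noise reduction:

```python
import numpy as np

def snr(roi):
    # Signal-to-noise ratio of a region of interest: mean / standard deviation.
    return roi.mean() / roi.std()

def cnr(roi_a, roi_b):
    # Contrast-to-noise ratio with a pooled noise estimate from both regions.
    noise = np.sqrt((roi_a.var() + roi_b.var()) / 2)
    return abs(roi_a.mean() - roi_b.mean()) / noise

# Synthetic HU samples: contrast-filled chamber vs. muscle, then a lower-noise
# version of the chamber standing in for a DLIR reconstruction.
rng = np.random.default_rng(1)
chamber = 400 + 20 * rng.standard_normal(1000)
muscle = 100 + 20 * rng.standard_normal(1000)
denoised_chamber = 400 + 10 * rng.standard_normal(1000)

print(snr(denoised_chamber) > snr(chamber))        # less noise, higher SNR
print(cnr(denoised_chamber, muscle) > cnr(chamber, muscle))
```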
Generative Adversarial Network with Robust Discriminator Through Multi-Task Learning for Low-Dose CT Denoising
IEEE Trans Med Imaging. 2024 Aug 26;PP. doi: 10.1109/TMI.2024.3449647. Online ahead of print.
ABSTRACT
Reducing the dose of radiation in computed tomography (CT) is vital to decreasing secondary cancer risk. However, the use of low-dose CT (LDCT) images is accompanied by increased noise that can negatively impact diagnoses. Although numerous deep learning algorithms have been developed for LDCT denoising, several challenges persist, including the visual incongruence experienced by radiologists, unsatisfactory performances across various metrics, and insufficient exploration of the networks' robustness in other CT domains. To address such issues, this study proposes three novel contributions. First, we propose a generative adversarial network (GAN) with a robust discriminator through multi-task learning that simultaneously performs three vision tasks: restoration, image-level, and pixel-level decisions. The more tasks the discriminator performs, the better the denoising performance of the generator, meaning that multi-task learning enables the discriminator to provide more meaningful feedback to the generator. Second, two regulatory mechanisms, restoration consistency (RC) and non-difference suppression (NDS), are introduced to improve the discriminator's representation capabilities. These mechanisms eliminate irrelevant regions and compare the discriminator's results from the input and restoration, thus facilitating effective GAN training. Lastly, we incorporate residual fast Fourier transforms with convolution (Res-FFT-Conv) blocks into the generator to utilize both frequency and spatial representations. This approach provides mixed receptive fields by using spatial (or local), spectral (or global), and residual connections. Our model was evaluated using various pixel- and feature-space metrics in two denoising tasks. Additionally, we conducted visual scoring with radiologists. The results indicate superior performance in both quantitative and qualitative measures compared to state-of-the-art denoising techniques.
PMID:39186436 | DOI:10.1109/TMI.2024.3449647
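The three discriminator tasks can be combined into one training objective as sketched below. The equal weighting and these particular loss forms (L1 for restoration, binary cross-entropy for the image- and pixel-level decisions) are assumptions, since the abstract does not give the exact formulation:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    # Binary cross-entropy with clipping for numerical stability.
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def discriminator_loss(restored, clean, img_pred, img_label, pix_pred, pix_label):
    """Sum of the three task losses the abstract lists: restoration,
    image-level real/fake decision, and pixel-level real/fake map."""
    l_restore = float(np.mean(np.abs(restored - clean)))
    l_image = bce(img_pred, img_label)
    l_pixel = bce(pix_pred, pix_label)
    return l_restore + l_image + l_pixel

# Toy tensors standing in for a clean patch, a noisy restoration, and decisions:
rng = np.random.default_rng(2)
clean = rng.random((8, 8))
loss = discriminator_loss(clean + 0.05 * rng.standard_normal((8, 8)), clean,
                          np.array([0.9]), np.array([1.0]),
                          rng.random((8, 8)), np.ones((8, 8)))
print(loss > 0)
```

Per the abstract's claim, the extra tasks give the discriminator richer supervision, which in turn yields more informative gradients for the generator.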
Synthesizing Real-Time Ultrasound Images of Muscle Based on Biomechanical Simulation and Conditional Diffusion Network
IEEE Trans Ultrason Ferroelectr Freq Control. 2024 Aug 26;PP. doi: 10.1109/TUFFC.2024.3445434. Online ahead of print.
ABSTRACT
Quantitative muscle function analysis based on ultrasound imaging has been used for various applications, particularly with the recent development of deep learning methods. The nature of speckle noise in ultrasound images poses challenges to accurate and reliable data annotation for supervised learning algorithms. To obtain a large and reliable dataset without manual scanning and labelling, we proposed a synthesizing pipeline that provides synthetic ultrasound datasets of muscle movement with an accurate ground truth, allowing augmenting, training, and evaluating models for different tasks. Our pipeline contained biomechanical simulation using the finite element method, an algorithm for reconstructing sparse fascicles, and a diffusion network for ultrasound image generation. With the adjustment of a few parameters, the proposed pipeline can generate a large dataset of real-time ultrasound images with diversity in morphology and pattern. With 3,030 ultrasound images generated, we qualitatively and quantitatively verified that the synthetic images closely matched the in-vivo images. In addition, we applied the synthetic dataset to different tasks of muscle analysis. Compared to a model trained on an unaugmented dataset, the model trained on the synthetic one had better cross-dataset performance, which demonstrates the feasibility of the synthesizing pipeline for augmenting model training and avoiding over-fitting. The results of the regression task show its potential under conditions where the amount of data or accurate labels are limited. The proposed synthesizing pipeline can be used not only for muscle-related studies but also for other similar studies and model development where sequential images are needed for training.
PMID:39186423 | DOI:10.1109/TUFFC.2024.3445434
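The speckle the authors must reproduce can be illustrated with the textbook first-order approximation, multiplicative Rayleigh-distributed noise. The real pipeline generates images with a conditional diffusion network, so this is only a crude stand-in:

```python
import numpy as np

def add_speckle(image, rng, sigma=0.5):
    """Multiplicative Rayleigh speckle, a common first-order model of
    ultrasound noise (not the paper's diffusion-based generator)."""
    noise = rng.rayleigh(scale=sigma, size=image.shape)
    return image * noise

# Uniform synthetic "tissue" patch, before and after speckle:
rng = np.random.default_rng(3)
tissue = np.full((64, 64), 0.5)
speckled = add_speckle(tissue, rng)
print(speckled.shape, speckled.std() > tissue.std())
```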
Gaseous Object Detection
IEEE Trans Pattern Anal Mach Intell. 2024 Aug 26;PP. doi: 10.1109/TPAMI.2024.3449994. Online ahead of print.
ABSTRACT
Object detection, a fundamental and challenging problem in computer vision, has experienced rapid development due to the effectiveness of deep learning. The current objects to be detected are mostly rigid solid substances with apparent and distinct visual characteristics. In this paper, we take on a scarcely explored task named Gaseous Object Detection (GOD), which is undertaken to explore whether object detection techniques can be extended from solid substances to gaseous substances. Nevertheless, gases exhibit significantly different visual characteristics: 1) saliency deficiency, 2) arbitrary and ever-changing shapes, and 3) lack of distinct boundaries. To facilitate the study of this challenging task, we construct a GOD-Video dataset comprising 600 videos (141,017 frames) that cover various attributes with multiple types of gases. A comprehensive benchmark is established based on this dataset, allowing for a rigorous evaluation of frame-level and video-level detectors. Deduced from the Gaussian dispersion model, the physics-inspired Voxel Shift Field (VSF) is designed to model geometric irregularities and ever-changing shapes in potential 3D space. By integrating VSF into Faster RCNN, the VSF RCNN serves as a simple but strong baseline for gaseous object detection. Our work aims to attract further research into this valuable albeit challenging area.
PMID:39186417 | DOI:10.1109/TPAMI.2024.3449994
Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation
IEEE Trans Image Process. 2024 Aug 26;PP. doi: 10.1109/TIP.2024.3445737. Online ahead of print.
ABSTRACT
Deep learning-based image compression has made great progress recently. However, some leading schemes use a serial context-adaptive entropy model to improve the rate-distortion (R-D) performance, which is very slow. In addition, the complexities of the encoding and decoding networks are quite high and not suitable for many practical applications. In this paper, we propose four techniques to balance the trade-off between complexity and performance. We first introduce the deformable residual module to remove more redundancies in the input image, thereby enhancing compression performance. Second, we design an improved checkerboard context model with two separate distribution parameter estimation networks and different probability models, which enables parallel decoding without sacrificing the performance compared to the sequential context-adaptive model. Third, we develop a three-pass knowledge distillation scheme to retrain the decoder and entropy coding, and reduce the complexity of the core decoder network, which transfers both the final and intermediate results of the teacher network to the student network to improve its performance. Fourth, we introduce L1 regularization to make the numerical values of the latent representation more sparse, and we only encode non-zero channels in the encoding and decoding process to reduce the bit rate. This also reduces the encoding and decoding time. Experiments show that compared to the state-of-the-art learned image coding scheme, our method can be about 20 times faster in encoding and 70-90 times faster in decoding, and our R-D performance is also 2.3% higher. Our method achieves better rate-distortion performance than classical image codecs including H.266/VVC-intra (4:4:4) and some recent learned methods, as measured by both PSNR and MS-SSIM metrics on the Kodak and Tecnick-40 datasets.
PMID:39186412 | DOI:10.1109/TIP.2024.3445737
An Innovative Deep Learning Approach to Spinal Fracture Detection in CT Images
Ann Ital Chir. 2024;95(4):657-668. doi: 10.62713/aic.3498.
ABSTRACT
AIM: Spinal fractures, particularly vertebral compression fractures, pose a significant challenge in medical imaging due to their small-scale nature and blurred boundaries in Computed Tomography (CT) scans. However, advanced deep learning models, such as the integration of the You Only Look Once (YOLO) V7 model with Efficient Layer Aggregation Networks (ELAN) and Max-Pooling Convolution (MPConv) architectures, can substantially reduce the loss of small-scale information during computational processing, thus improving detection accuracy. The purpose of this study is to develop an innovative deep learning approach for detecting spinal fractures, particularly vertebral compression fractures, in CT images.
METHODS: We proposed a novel method to precisely identify spinal injury using the YOLO V7 model as a classifier. This model was enhanced by integrating ELAN and MPConv architectures, which were influenced by the Receptive Field Learning and Aggregation (RFLA) small object recognition framework. Standard normalization techniques were utilized to preprocess the CT images. The YOLO V7 model, integrated with ELAN and MPConv architectures, was trained using a dataset containing annotated spinal fractures. Additionally, to mitigate boundary ambiguities in compression fractures, a Theoretical Receptive Field (TRF) based on a Gaussian distribution and an Effective Receptive Field (ERF) were used to better capture multi-scale features. Furthermore, the Wasserstein distance was employed to optimize the model's learning process. A total of 240 CT images from patients diagnosed with spinal fractures were included in this study, sourced from Ningbo No.2 Hospital, ensuring a robust dataset for training the deep learning model.
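The abstract does not spell out how the Wasserstein distance enters the training objective. In the RFLA line of work that the method cites, bounding boxes are modeled as 2D Gaussians and compared with the 2-Wasserstein distance, which varies smoothly even when small boxes barely overlap (unlike IoU, which drops to zero). A sketch under that assumption, with function names and the normalizing constant `c` chosen by us for illustration:

```python
import math

def wasserstein2_boxes(box_a, box_b):
    """2-Wasserstein distance between boxes modeled as 2D Gaussians.

    Each box is (cx, cy, w, h) and is treated as an axis-aligned Gaussian
    N([cx, cy], diag((w/2)^2, (h/2)^2)). For such Gaussians the squared
    W2 distance has the closed form
    ||[cx1, cy1, w1/2, h1/2] - [cx2, cy2, w2/2, h2/2]||^2.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    d2 = ((ax - bx) ** 2 + (ay - by) ** 2
          + (aw / 2 - bw / 2) ** 2 + (ah / 2 - bh / 2) ** 2)
    return math.sqrt(d2)

def normalized_wasserstein(box_a, box_b, c=12.0):
    # Map the distance to a (0, 1] similarity; c is a dataset-dependent constant.
    return math.exp(-wasserstein2_boxes(box_a, box_b) / c)

# Identical boxes score 1.0; a shifted tiny box still gets a smooth, nonzero score
# rather than the hard zero that IoU would assign once overlap is lost.
print(normalized_wasserstein((10, 10, 4, 4), (10, 10, 4, 4)))  # 1.0
```

A similarity of this form can replace IoU in label assignment or be turned into a loss (e.g. 1 minus the similarity), giving small, blurry-edged targets such as compression fractures a usable training signal.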
RESULTS: Our method demonstrated superior performance over conventional object detection networks such as YOLO V7 and YOLO V3. Specifically, on a dataset of 200 pathological images and 40 normal spinal images, our method achieved a 3% increase in accuracy over YOLO V7.
CONCLUSIONS: The proposed method offers an innovative and more effective approach for identifying vertebral compression fractures in CT scans. These promising findings suggest the method's potential for practical clinical applications, highlighting the significance of deep learning in enhancing patient care and treatment in medical imaging. Future research should incorporate cross-validation and independent validation and test sets to assess the model's robustness and generalizability. Additionally, exploring other deep learning models and methods could further enhance detection accuracy and reliability, contributing to the development of more effective diagnostic tools in medical imaging.
PMID:39186337 | DOI:10.62713/aic.3498