Deep learning
Reinforcement learning-based method for type B aortic dissection localization
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2024 Oct 25;41(5):878-885. doi: 10.7507/1001-5515.202309047.
ABSTRACT
In the segmentation of aortic dissection, issues such as low contrast between the dissection and surrounding organs and vessels, large variation in dissection morphology, and high background noise arise. To address these issues, this paper proposed a reinforcement learning-based method for type B aortic dissection localization. Within a two-stage segmentation model, deep reinforcement learning was used to perform the first-stage aortic dissection localization task, ensuring the integrity of the localization target; in the second stage, the coarse segmentation results from the first stage were used as input to obtain refined segmentation results. To improve the recall of the first-stage segmentation results and include the segmentation target more completely in the localization results, a reinforcement learning reward function based on the direction of recall changes was designed. Additionally, the localization window was separated from the field-of-view window to reduce loss of the segmentation target. Unet, TransUnet, SwinUnet, and MT-Unet were selected as benchmark segmentation models. Experiments verified that most metrics of the proposed two-stage segmentation process outperformed the benchmark results; specifically, the Dice index improved by 1.34%, 0.89%, 27.66%, and 7.37% for the respective models. In conclusion, incorporating the proposed type B aortic dissection localization method into the segmentation process improves overall segmentation accuracy compared with the benchmark models, with particularly large gains for models with poorer baseline segmentation performance.
PMID:39462654 | DOI:10.7507/1001-5515.202309047
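For readers who want a concrete picture of the recall-direction reward described above, the following minimal Python sketch (not taken from the paper; the function names, window representation, and reward magnitudes are illustrative assumptions) rewards a localization-window move whenever it increases the fraction of the dissection mask that the window covers, and penalizes it otherwise:

    import numpy as np

    def window_recall(box, target_mask):
        # Fraction of target (dissection) pixels covered by the localization window.
        x0, y0, x1, y1 = box
        covered = target_mask[y0:y1, x0:x1].sum()
        total = target_mask.sum()
        return covered / total if total > 0 else 0.0

    def step_reward(prev_box, new_box, target_mask, bonus=1.0, penalty=-1.0):
        # Reward depends only on the direction of the recall change after a window move.
        delta = window_recall(new_box, target_mask) - window_recall(prev_box, target_mask)
        if delta > 0:
            return bonus      # the window now covers more of the dissection
        if delta < 0:
            return penalty    # coverage dropped, so discourage the move
        return 0.0            # coverage unchanged

    # Toy check: enlarging the window to cover more of a synthetic target yields +1.
    target = np.zeros((128, 128), dtype=np.uint8)
    target[40:80, 50:90] = 1
    print(step_reward((30, 30, 70, 70), (35, 35, 95, 95), target))  # 1.0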
Deep learning-based whole-brain B1+-mapping at 7T
Magn Reson Med. 2024 Oct 27. doi: 10.1002/mrm.30359. Online ahead of print.
ABSTRACT
PURPOSE: This study investigates the feasibility of using complex-valued neural networks (NNs) to estimate quantitative transmit magnetic RF field (B1+) maps from multi-slice localizer scans with different slice orientations in the human head at 7T, aiming to accelerate subject-specific B1+ calibration using parallel transmission (pTx).
METHODS: Datasets containing channel-wise B1+ maps and corresponding multi-slice localizers were acquired in axial, sagittal, and coronal orientation in 15 healthy subjects using an eight-channel pTx transceiver head coil. Training included five-fold cross-validation for four network configurations: NN_cx^tra used transversal data, NN_cx^sag sagittal data, NN_cx^cor coronal data, and NN_cx^all was trained on all slice orientations. The resulting maps were compared to B1+ reference scans using different quality metrics. The proposed network was applied in vivo at 7T in two unseen test subjects using dynamic kT-point pulses.
RESULTS: Predicted B1+ maps demonstrated high similarity with measured B1+ maps across multiple orientations. Considering brain tissue, the estimation matched the reference with a mean relative magnitude error of (2.70 ± 2.86)% and a mean absolute phase difference of (6.70 ± 1.99)° for transversal (NN_cx^tra), (1.82 ± 0.69)% and (4.25 ± 1.62)° for sagittal (NN_cx^sag), and (1.33 ± 0.27)% and (2.66 ± 0.60)° for coronal slices (NN_cx^cor). NN_cx^all, trained on all orientations, enabled robust prediction of B1+ maps across different orientations. Achieving homogeneous excitation over the whole brain in an in-vivo application demonstrated the approach's feasibility.
CONCLUSION: This study demonstrates the feasibility of using complex-valued NNs to estimate multi-slice B1+ maps in different slice orientations from localizer scans in the human brain at 7T.
PMID:39462473 | DOI:10.1002/mrm.30359
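The abstract does not give the network architecture in detail, but the core ingredient it names, a complex-valued NN, is commonly built from real-valued convolutions applied to the real and imaginary parts. A minimal PyTorch sketch of such a complex convolution layer is shown below (an illustrative assumption, not the authors' implementation; the channel counts and the 8-channel input are only meant to mirror the eight-channel pTx coil):

    import torch
    import torch.nn as nn

    class ComplexConv2d(nn.Module):
        # (W_r + i*W_i) * (x_r + i*x_i) = (W_r*x_r - W_i*x_i) + i*(W_r*x_i + W_i*x_r)
        def __init__(self, in_ch, out_ch, k=3, padding=1):
            super().__init__()
            self.conv_r = nn.Conv2d(in_ch, out_ch, k, padding=padding)
            self.conv_i = nn.Conv2d(in_ch, out_ch, k, padding=padding)

        def forward(self, x_r, x_i):
            real = self.conv_r(x_r) - self.conv_i(x_i)
            imag = self.conv_r(x_i) + self.conv_i(x_r)
            return real, imag

    # Toy forward pass: 8 complex-valued input channels on a 64 x 64 slice.
    layer = ComplexConv2d(8, 16)
    xr, xi = torch.randn(1, 8, 64, 64), torch.randn(1, 8, 64, 64)
    yr, yi = layer(xr, xi)
    print(yr.shape, yi.shape)  # torch.Size([1, 16, 64, 64]) twice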
Identification of lineage-specific cis-trans regulatory networks related to kiwifruit ripening initiation
Plant J. 2024 Oct 27. doi: 10.1111/tpj.17093. Online ahead of print.
ABSTRACT
Previous research on the ripening process of many fruit crop varieties typically involved analyses of the conserved genetic factors among species. However, even for seemingly identical ripening processes, the associated gene expression networks often evolved independently, as reflected by the diversity in the interactions between transcription factors (TFs) and the targeted cis-regulatory elements (CREs). In this study, explainable deep learning (DL) frameworks were used to predict expression patterns on the basis of CREs in promoter sequences. We initially screened potential lineage-specific CRE-TF interactions influencing the kiwifruit ripening process, which is triggered by ethylene, similar to the corresponding processes in other climacteric fruit crops. Some novel regulatory relationships affecting ethylene-induced fruit ripening were identified. Specifically, ABI5-like bZIP, G2-like, and MYB81-like TFs were revealed as trans-factors modulating the expression of representative ethylene signaling/biosynthesis-related genes (e.g., ACS1, ERT2, and ERF143). Transient reporter assays and DNA affinity purification sequencing (DAP-Seq) analyses validated these CRE-TF interactions and their regulatory relationships. A comparative analysis with co-expression networking suggested that this DL-based screening can identify regulatory networks independently of co-expression patterns. Our results highlight the utility of an explainable DL approach for identifying novel CRE-TF interactions. These findings imply that fruit crop species may have evolved lineage-specific fruit ripening-related cis-trans regulatory networks.
PMID:39462454 | DOI:10.1111/tpj.17093
WiTUnet: A U-shaped architecture integrating CNN and Transformer for improved feature alignment and local information fusion
Sci Rep. 2024 Oct 26;14(1):25525. doi: 10.1038/s41598-024-76886-w.
ABSTRACT
Low-dose computed tomography (LDCT) has emerged as the preferred technology for diagnostic medical imaging due to the potential health risks associated with X-ray radiation and conventional computed tomography (CT) techniques. While LDCT utilizes a lower radiation dose compared to standard CT, it results in increased image noise, which can impair the accuracy of diagnoses. To mitigate this issue, advanced deep learning-based LDCT denoising algorithms have been developed. These primarily utilize Convolutional Neural Networks (CNNs) or Transformer Networks and often employ the Unet architecture, which enhances image detail by integrating feature maps from the encoder and decoder via skip connections. However, existing methods focus excessively on the optimization of the encoder and decoder structures while overlooking potential enhancements to the Unet architecture itself. This oversight can be problematic due to significant differences in feature map characteristics between the encoder and decoder, where simple fusion strategies may hinder effective image reconstruction. In this paper, we introduce WiTUnet, a novel LDCT image denoising method that utilizes nested, dense skip pathways in place of traditional skip connections to improve feature integration. Additionally, to address the high computational demands of conventional Transformers on large images, WiTUnet incorporates a windowed Transformer structure that processes images in smaller, non-overlapping segments, significantly reducing computational load. Moreover, our approach includes a Local Image Perception Enhancement (LiPe) module within both the encoder and decoder to replace the standard multi-layer perceptron (MLP) in Transformers, thereby improving the capture and representation of local image features. Through extensive experimental comparisons, WiTUnet has demonstrated superior performance over existing methods in critical metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Root Mean Square Error (RMSE), significantly enhancing noise removal and image quality. The code is available on GitHub at https://github.com/woldier/WiTUNet.
PMID:39462127 | DOI:10.1038/s41598-024-76886-w
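As a concrete illustration of the windowed Transformer idea mentioned above (processing the image in smaller, non-overlapping segments), here is a short PyTorch sketch of the standard window partition/reverse operations; it is a generic example under the assumption of Swin-style square windows, not code from the WiTUnet repository:

    import torch

    def window_partition(x, ws):
        # Split (B, H, W, C) feature maps into non-overlapping ws x ws windows.
        B, H, W, C = x.shape
        x = x.reshape(B, H // ws, ws, W // ws, ws, C)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

    def window_reverse(windows, ws, H, W):
        # Merge the windows back into (B, H, W, C) feature maps.
        B = windows.shape[0] // ((H // ws) * (W // ws))
        x = windows.reshape(B, H // ws, W // ws, ws, ws, -1)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

    x = torch.randn(2, 64, 64, 32)           # B, H, W, C
    w = window_partition(x, ws=8)            # (128, 64, 32): attention runs per window
    assert torch.allclose(window_reverse(w, 8, 64, 64), x)

Self-attention computed within each 64-token window costs far less than attention over the full 4,096 tokens of a 64 x 64 map, which is the computational saving the abstract refers to.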
Graph masked self-distillation learning for prediction of mutation impact on protein-protein interactions
Commun Biol. 2024 Oct 26;7(1):1400. doi: 10.1038/s42003-024-07066-9.
ABSTRACT
Assessing mutation impact on the binding affinity change (ΔΔG) of protein-protein interactions (PPIs) plays a crucial role in unraveling structural-functional intricacies of proteins and developing innovative protein designs. In this study, we present a deep learning framework, PIANO, for improved prediction of ΔΔG in PPIs. The PIANO framework leverages a graph masked self-distillation scheme for protein structural geometric representation pre-training, which effectively captures the structural context representations surrounding mutation sites, and makes predictions using a multi-branch network consisting of multiple encoders for amino acids, atoms, and protein sequences. Extensive experiments demonstrated its superior prediction performance and the capability of the pre-trained encoder to capture meaningful representations. Compared to previous methods, PIANO can be widely applied to both holo complex structures and apo monomer structures. Moreover, we illustrated the practical applicability of PIANO in highlighting pathogenic mutations and crucial proteins, and distinguishing de novo mutations in disease cases and controls in PPI systems. Overall, PIANO offers a powerful deep learning tool, which may provide valuable insights into the study of drug design, therapeutic intervention, and protein engineering.
PMID:39462102 | DOI:10.1038/s42003-024-07066-9
A hybrid container throughput forecasting approach using bi-directional hinterland data of port
Sci Rep. 2024 Oct 26;14(1):25502. doi: 10.1038/s41598-024-77376-9.
ABSTRACT
Accurate forecasting of port container throughput plays a crucial role in optimising port operations, resource allocation, supply chain management, etc. However, existing studies only focus on the impact of port hinterland economic development on container throughput, ignoring the impact of port foreland. This study proposed a container throughput forecasting model based on deep learning, which considers the impact of port hinterland and foreland on container throughput. Real-world experimental results showed that the proposed model with multiple data sources outperformed other forecasting methods, achieving significantly higher accuracy. The implications of this study are significant for port authorities, logistics companies, and policymakers.
PMID:39462082 | DOI:10.1038/s41598-024-77376-9
End-to-end multiple object tracking in high-resolution optical sensors of drones with transformer models
Sci Rep. 2024 Oct 26;14(1):25543. doi: 10.1038/s41598-024-75934-9.
ABSTRACT
Drone aerial imaging has become increasingly important across numerous fields as drone optical sensor technology continues to advance. One critical challenge in this domain is achieving both accurate and efficient multi-object tracking. Traditional deep learning methods often separate object identification from tracking, leading to increased complexity and potential performance degradation. Conventional approaches rely heavily on manual feature engineering and intricate algorithms, which can further limit efficiency. To overcome these limitations, we propose a novel Transformer-based end-to-end multi-object tracking framework. This innovative method leverages self-attention mechanisms to capture complex inter-object relationships, seamlessly integrating object detection and tracking into a unified process. By utilizing end-to-end training, our approach simplifies the tracking pipeline, leading to significant performance improvements. A key innovation in our system is the introduction of a trajectory detection label matching technique. This technique assigns labels based on a comprehensive assessment of object appearance, spatial characteristics, and Gaussian features, ensuring more precise and logical label assignments. Additionally, we incorporate cross-frame self-attention mechanisms to extract long-term object properties, providing robust information for stable and consistent tracking. We further enhance tracking performance through a newly developed self-characteristics module, which extracts semantic features from trajectory information across both current and previous frames. This module ensures that the long-term interaction modules maintain semantic consistency, allowing for more accurate and continuous tracking over time. The refined data and stored trajectories are then used as input for subsequent frame processing, creating a feedback loop that sustains tracking accuracy. Extensive experiments conducted on the VisDrone and UAVDT datasets demonstrate the superior performance of our approach in drone-based multi-object tracking.
PMID:39461992 | DOI:10.1038/s41598-024-75934-9
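To make the trajectory-detection label matching step more tangible, the sketch below assigns detections to existing trajectories by combining an appearance cost (cosine distance between embeddings) with a normalized spatial cost and solving the assignment with the Hungarian algorithm. This is a generic baseline for illustration only; the paper's matcher additionally uses Gaussian features and its own cost design:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_detections_to_tracks(det_emb, det_xy, trk_emb, trk_xy,
                                   w_app=0.5, w_spa=0.5, max_cost=0.8):
        # Appearance cost: cosine distance between detection and track embeddings.
        def cosine_dist(a, b):
            a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
            b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
            return 1.0 - a @ b.T
        app = cosine_dist(det_emb, trk_emb)
        # Spatial cost: center distance, normalized to [0, 1].
        spa = np.linalg.norm(det_xy[:, None, :] - trk_xy[None, :, :], axis=-1)
        spa = spa / (spa.max() + 1e-8)
        cost = w_app * app + w_spa * spa
        rows, cols = linear_sum_assignment(cost)      # Hungarian assignment
        return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]

    # Toy example: 3 detections, 2 active trajectories.
    rng = np.random.default_rng(0)
    print(match_detections_to_tracks(rng.random((3, 16)), rng.random((3, 2)) * 100,
                                     rng.random((2, 16)), rng.random((2, 2)) * 100))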
Detecting command injection attacks in web applications based on novel deep learning methods
Sci Rep. 2024 Oct 26;14(1):25487. doi: 10.1038/s41598-024-74350-3.
ABSTRACT
Web command injection attacks pose significant security threats to web applications, leading to potential server information leakage or severe server disruption. Traditional detection methods struggle with the increasing complexity and obfuscation of these attacks, resulting in poor identification of malicious code, complicated feature extraction processes, and low detection efficiency. To address these challenges, a novel detection model, the Convolutional Channel-BiLSTM Attention (CCBA) model, is proposed, leveraging deep learning techniques to enhance the identification of web command injection attacks. The model utilizes dual CNN convolutional channels for comprehensive feature extraction and employs a BiLSTM network for bidirectional recognition of temporal features. An attention mechanism is also incorporated to assign weights to critical features, improving the model's detection performance. Experimental results demonstrate that the CCBA model achieves 99.3% accuracy and 98.2% recall on a real-world dataset. To validate the robustness and generalization of the model, tests were conducted on two widely recognized public cybersecurity datasets, consistently achieving over 98% accuracy. Compared to existing methods, the proposed model offers a more effective solution for identifying web command injection attacks.
PMID:39461962 | DOI:10.1038/s41598-024-74350-3
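The architecture described above (two parallel convolutional channels, a BiLSTM, and an attention layer that weights critical features) maps naturally onto a small PyTorch module. The sketch below is an illustrative reconstruction under assumed hyperparameters (vocabulary size, kernel widths, hidden sizes), not the authors' released code:

    import torch
    import torch.nn as nn

    class CCBA(nn.Module):
        def __init__(self, vocab=256, emb=64, hidden=64):
            super().__init__()
            self.emb = nn.Embedding(vocab, emb)
            # Dual convolutional channels with different receptive fields.
            self.conv3 = nn.Conv1d(emb, 64, kernel_size=3, padding=1)
            self.conv5 = nn.Conv1d(emb, 64, kernel_size=5, padding=2)
            # Bidirectional recognition of temporal features.
            self.bilstm = nn.LSTM(128, hidden, batch_first=True, bidirectional=True)
            # Additive attention that weights time steps before classification.
            self.attn = nn.Linear(2 * hidden, 1)
            self.fc = nn.Linear(2 * hidden, 2)

        def forward(self, tokens):                       # (B, L) integer tokens
            x = self.emb(tokens).transpose(1, 2)         # (B, emb, L)
            f = torch.cat([torch.relu(self.conv3(x)),
                           torch.relu(self.conv5(x))], dim=1).transpose(1, 2)
            h, _ = self.bilstm(f)                        # (B, L, 2*hidden)
            w = torch.softmax(self.attn(h), dim=1)       # attention weights over time
            return self.fc((w * h).sum(dim=1))           # (B, 2) logits

    logits = CCBA()(torch.randint(0, 256, (4, 100)))
    print(logits.shape)  # torch.Size([4, 2])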
Advancing EEG prediction with deep learning and uncertainty estimation
Brain Inform. 2024 Oct 26;11(1):27. doi: 10.1186/s40708-024-00239-6.
ABSTRACT
Deep Learning (DL) has the potential to enhance patient outcomes in healthcare by implementing proficient systems for disease detection and diagnosis. However, their complexity and lack of interpretability impede widespread adoption in critical high-stakes predictions in healthcare. Incorporating uncertainty estimations in DL systems can increase trustworthiness, providing valuable insights into the model's confidence and improving the explanation of predictions. Additionally, introducing explainability measures, recognized and embraced by healthcare experts, can help address this challenge. In this study, we investigate DL models' ability to predict sex directly from electroencephalography (EEG) data. While sex prediction has limited direct clinical application, its binary nature makes it a valuable benchmark for optimizing deep learning techniques in EEG data analysis. Furthermore, we explore the use of DL ensembles to improve performance over single models and as an approach to increase interpretability and performance through uncertainty estimation. Lastly, we use a data-driven approach to evaluate the relationship between frequency bands and sex prediction, offering insights into their relative importance. InceptionNetwork, a single DL model, achieved 90.7% accuracy and an AUC of 0.947, and the best-performing ensemble, combining variations of InceptionNetwork and EEGNet, achieved 91.1% accuracy in predicting sex from EEG data using five-fold cross-validation. Uncertainty estimation through deep ensembles led to increased prediction performance, and the models were able to classify sex in all frequency bands, indicating sex-specific features across all bands.
PMID:39461914 | DOI:10.1186/s40708-024-00239-6
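As a pointer to how the ensemble-based uncertainty estimation mentioned above typically works, the following NumPy sketch averages the class probabilities of several independently trained models and reports the predictive entropy of the averaged distribution as an uncertainty score; it is a generic deep-ensemble recipe, not the study's exact procedure:

    import numpy as np

    def ensemble_predict(prob_list):
        # prob_list: one (N, C) array of softmax outputs per ensemble member.
        probs = np.mean(np.stack(prob_list, axis=0), axis=0)        # averaged probabilities
        entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)    # predictive uncertainty
        return probs.argmax(axis=1), entropy

    # Toy example: 3 ensemble members, 4 EEG segments, 2 classes.
    rng = np.random.default_rng(0)
    members = [rng.dirichlet([1, 1], size=4) for _ in range(3)]
    labels, uncertainty = ensemble_predict(members)
    print(labels, uncertainty)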
Patient-specific deep learning tracking framework for real-time 2D target localization in MRI-guided radiotherapy
Int J Radiat Oncol Biol Phys. 2024 Oct 24:S0360-3016(24)03508-9. doi: 10.1016/j.ijrobp.2024.10.021. Online ahead of print.
ABSTRACT
PURPOSE: We propose a tumor tracking framework for 2D cine MRI based on a pair of deep learning (DL) models relying on patient-specific (PS) training.
METHODS AND MATERIALS: The chosen DL models are: 1) an image registration transformer and 2) an auto-segmentation convolutional neural network (CNN). We collected over 1,400,000 cine MRI frames from 219 patients treated on a 0.35 T MRI-linac, plus 7,500 frames from an additional 35 patients that were manually labelled and subdivided into fine-tuning, validation, and testing sets. The transformer was first trained on the unlabeled data (without segmentations). We then continued training (with segmentations) either on the fine-tuning set or, for the PS models, on eight randomly selected frames from the first 5 s of each patient's cine MRI. The PS auto-segmentation CNN was trained from scratch with the same eight frames for each patient, without pre-training. Furthermore, we implemented B-spline image registration as a conventional model, as well as different baselines. Output segmentations of all models were compared on the testing set using the Dice similarity coefficient (DSC), the 50% and 95% Hausdorff distance (HD50%/HD95%), and the root-mean-square error of the target centroid in the superior-inferior direction (RMSE_SI).
RESULTS: The PS transformer and CNN significantly outperformed all other models, achieving a median (inter-quartile range) DSC of 0.92 (0.03)/0.90 (0.04), HD50% of 1.0 (0.1)/1.0 (0.4) mm, HD95% of 3.1 (1.9)/3.8 (2.0) mm, and RMSE_SI of 0.7 (0.4)/0.9 (1.0) mm on the testing set. Their inference time was about 36/8 ms per frame, and PS fine-tuning required 3 min for labelling and 8/4 min for training. The transformer was better than the CNN in 9/12 patients, the CNN was better in 1/12 patients, and the two PS models achieved the same performance on the remaining 2/12 testing patients.
CONCLUSION: For targets in the thorax, abdomen and pelvis, we found two PS DL models to provide accurate real-time target localization during MRI-guided radiotherapy.
PMID:39461599 | DOI:10.1016/j.ijrobp.2024.10.021
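For reference, the two headline metrics can be computed from binary masks in a few lines. The sketch below is illustrative only: it assumes image rows correspond to the superior-inferior axis and computes a per-frame centroid error rather than the RMSE over a full cine sequence:

    import numpy as np

    def dice(pred, gt):
        # Dice similarity coefficient between two binary masks.
        inter = np.logical_and(pred, gt).sum()
        return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

    def centroid_si_error(pred, gt, pixel_spacing_mm=1.0):
        # Superior-inferior centroid error in mm (rows taken as the SI axis).
        si_pred = np.argwhere(pred)[:, 0].mean()
        si_gt = np.argwhere(gt)[:, 0].mean()
        return abs(si_pred - si_gt) * pixel_spacing_mm

    pred = np.zeros((64, 64), bool); pred[20:40, 20:40] = True
    gt = np.zeros((64, 64), bool);   gt[22:42, 20:40] = True
    print(dice(pred, gt), centroid_si_error(pred, gt))   # ~0.9  2.0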
Detection of carotid plaques on panoramic radiographs using deep learning
J Dent. 2024 Oct 24:105432. doi: 10.1016/j.jdent.2024.105432. Online ahead of print.
ABSTRACT
OBJECTIVES: Panoramic radiographs (PRs) can reveal an incidental finding of atherosclerosis, or carotid artery calcification (CAC), in 3-15% of examined patients. However, limited training in identification of such calcifications among dental professionals results in missed diagnoses. This study aimed to detect CAC on PRs using an artificial intelligence (AI) model based on a vision transformer.
METHODS: 6,404 PRs were obtained from one hospital and screened for the presence of CAC based on electronic medical records. CAC was manually annotated with bounding boxes by an oral radiologist and reviewed and revised by three experienced clinicians to achieve consensus. An AI approach based on Faster R-CNN and Swin Transformer was trained and evaluated based on 185 PRs with CAC and 185 PRs without CAC. Reported and replicated diagnostic performances of published AI approaches based on convolutional neural networks (CNNs) were used for comparison. Quantitative evaluation of the performance of the models included precision, F1-score, recall, area-under-the-curve (AUC), and average precision (AP).
RESULTS: The proposed method based on Faster R-CNN and Swin Transformer achieved a precision of 0.895, recall of 0.881, F1-score of 0.888, AUC of 0.950, and AP of 0.942, surpassing models based on a CNN.
CONCLUSIONS: The detection performance of this newly developed and validated model was improved compared to previously reported models.
CLINICAL SIGNIFICANCE: Integrating AI models into dental imaging to assist dental professionals in the detection of CAC on PRs has the potential to significantly enhance the early detection of carotid artery atherosclerosis and its clinical management.
PMID:39461583 | DOI:10.1016/j.jdent.2024.105432
Deep learning revealed the distribution and evolution patterns for invertible promoters across bacterial lineages
Nucleic Acids Res. 2024 Oct 26:gkae966. doi: 10.1093/nar/gkae966. Online ahead of print.
ABSTRACT
Invertible promoters (invertons) are crucial regulatory elements in bacteria, facilitating gene expression changes under stress. Despite their importance, their prevalence and the range of regulated gene functions are largely unknown. We introduced DeepInverton, a deep learning model that identifies invertons across a broad phylogenetic spectrum without using sequencing reads. By analyzing 68,733 bacterial genomes and 9,382 metagenomes, we have uncovered over 200,000 nonredundant invertons and have also highlighted their abundance in pathogens. Additionally, we identified a post-Cambrian Explosion increase of invertons, paralleling species diversification. Furthermore, we revealed that invertons regulate diverse functions, including antimicrobial resistance and biofilm formation, underscoring their role in environmental adaptation. Notably, the majority of inverton identifications by DeepInverton have been confirmed by in vitro experiments. The comprehensive inverton profiles have deepened our understanding of invertons at pan-genome and pan-metagenome scales, enabling a broad spectrum of applications in microbial ecology and synthetic biology.
PMID:39460615 | DOI:10.1093/nar/gkae966
The differences in essential facial areas for impressions between humans and deep learning models: An eye-tracking and explainable AI approach
Br J Psychol. 2024 Oct 25. doi: 10.1111/bjop.12744. Online ahead of print.
ABSTRACT
This study explored the facial impressions of attractiveness, dominance and sexual dimorphism using experimental and computational methods. In Study 1, we generated face images with manipulated morphological features using geometric morphometrics. In Study 2, we conducted eye tracking and impression evaluation experiments using these images to examine how facial features influence impression evaluations and explored differences based on the sex of the face images and participants. In Study 3, we employed deep learning methods, specifically using gradient-weighted class activation mapping (Grad-CAM), an explainable artificial intelligence (AI) technique, to extract important features for each impression using the face images and impression evaluation results from Studies 1 and 2. The findings revealed that eye-tracking and deep learning use different features as cues. In the eye-tracking experiments, attention was focused on features such as the eyes, nose and mouth, whereas the deep learning analysis highlighted broader features, including eyebrows and superciliary arches. The computational approach using explainable AI suggests that the determinants of facial impressions can be extracted independently of visual attention.
PMID:39460393 | DOI:10.1111/bjop.12744
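Since Grad-CAM is the explainable-AI technique named above, a compact, generic PyTorch implementation may help readers unfamiliar with it. This sketch uses forward/backward hooks on a stock ResNet-18 purely for illustration; it is not the study's model or preprocessing:

    import torch
    import torch.nn.functional as F
    from torchvision import models

    def grad_cam(model, target_layer, image, class_idx=None):
        # Weight the target layer's activations by the spatially averaged
        # gradients of the chosen class score (Grad-CAM).
        acts, grads = {}, {}
        h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
        h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
        logits = model(image)
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, class_idx].backward()
        h1.remove(); h2.remove()
        weights = grads["g"].mean(dim=(2, 3), keepdim=True)         # GAP of gradients
        cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

    model = models.resnet18(weights=None).eval()
    heatmap = grad_cam(model, model.layer4[-1], torch.randn(1, 3, 224, 224))
    print(heatmap.shape)  # torch.Size([1, 1, 224, 224])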
Blind Recognition of Frame Synchronization Based on Deep Learning
Sensors (Basel). 2024 Oct 21;24(20):6767. doi: 10.3390/s24206767.
ABSTRACT
In this paper, a deep-learning-based frame synchronization blind recognition algorithm is proposed to improve detection performance in non-cooperative communication systems. Current methods face challenges in accurately detecting frames under high bit error rates (BER). Our approach begins with flat-top interpolation of the binary data, converting it into a series of grayscale images and enabling the application of image processing techniques. By incorporating a scaling factor, we generate RGB images. Based on the matching radius, frame length, and frame synchronization code, RGB images with distinct stripe features are classified as positive samples for each category, while the remaining images are classified as negative samples. Finally, the neural network is trained on these sets to classify test data effectively. Simulation results demonstrate that the proposed algorithm achieves a 100% frame-recognition probability when the BER is below 0.2. Even at a BER of 0.25, the recognition probability remains above 90%, an improvement of more than 60% over traditional algorithms. This work addresses the shortcomings of existing methods under high-error conditions, and the idea of converting sequences into RGB images also provides a reliable solution for frame synchronization in challenging communication environments.
PMID:39460248 | DOI:10.3390/s24206767
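To illustrate the sequence-to-image step described above, the sketch below applies flat-top (zero-order hold) interpolation to a bit stream and reshapes it by a candidate frame length, so that a periodic synchronization word appears as vertical stripes; the RGB construction with a scaling factor is not reproduced here, and all names and parameters are illustrative assumptions:

    import numpy as np

    def bits_to_gray_image(bits, frame_len, repeat=4):
        # Flat-top interpolation: hold each bit for `repeat` samples, scale to 0/255,
        # then stack one candidate frame per image row.
        x = np.repeat(np.asarray(bits, dtype=np.uint8), repeat) * 255
        rows = len(x) // (frame_len * repeat)
        return x[: rows * frame_len * repeat].reshape(rows, frame_len * repeat)

    rng = np.random.default_rng(0)
    sync = np.array([1, 1, 1, 0, 1, 0, 0, 1])                       # toy sync word
    frames = [np.concatenate([sync, rng.integers(0, 2, 56)]) for _ in range(40)]
    img = bits_to_gray_image(np.concatenate(frames), frame_len=64)
    print(img.shape)  # (40, 256): the sync columns form vertical stripes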
Spatial Resolution Enhancement Framework Using Convolutional Attention-Based Token Mixer
Sensors (Basel). 2024 Oct 21;24(20):6754. doi: 10.3390/s24206754.
ABSTRACT
Spatial resolution enhancement in remote sensing data aims to increase the level of detail and accuracy in images captured by satellite sensors. We proposed a novel spatial resolution enhancement framework using a convolutional attention-based token mixer. This approach leveraged spatial context and semantic information to improve the spatial resolution of images. The method used a multi-head convolutional attention block and sub-pixel convolution to extract spatial and spectral information, and fused them using the same technique. The multi-head convolutional attention block effectively exploits local information in both the spatial and spectral dimensions. The method was tested on two data types: a visual-thermal dataset and a visual-hyperspectral dataset. Our method was also compared with state-of-the-art methods, including traditional and deep learning methods. The experimental results showed that the method was effective and outperformed state-of-the-art methods in overall, spatial, and spectral accuracy.
PMID:39460237 | DOI:10.3390/s24206754
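The sub-pixel convolution mentioned above is a standard upsampling block: a convolution expands the channel count by r^2, and a pixel shuffle rearranges those channels into an r-times larger spatial grid. A minimal PyTorch sketch (generic, with illustrative channel counts, not the paper's full attention-based token mixer) follows:

    import torch
    import torch.nn as nn

    class SubPixelUpsample(nn.Module):
        def __init__(self, in_ch, out_ch, r=2):
            super().__init__()
            # Expand channels by r*r, then rearrange them into space.
            self.conv = nn.Conv2d(in_ch, out_ch * r * r, kernel_size=3, padding=1)
            self.shuffle = nn.PixelShuffle(r)

        def forward(self, x):
            return self.shuffle(self.conv(x))

    x = torch.randn(1, 64, 32, 32)            # low-resolution feature map
    y = SubPixelUpsample(64, 1, r=4)(x)
    print(y.shape)                            # torch.Size([1, 1, 128, 128])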
Research on Road Internal Disease Identification Algorithm Based on Attention Fusion Mechanisms
Sensors (Basel). 2024 Oct 21;24(20):6757. doi: 10.3390/s24206757.
ABSTRACT
Internal disease in asphalt pavement is a crucial indicator of pavement health and serves as a vital basis for maintenance and rehabilitation decisions, being closely tied to the optimization and allocation of funds by highway maintenance management departments. Accurate and rapid identification of internal pavement diseases is essential for improving overall pavement quality. This study aimed to identify internal pavement diseases using deep learning algorithms, thereby improving the efficiency of this assessment. In this work, a multi-view recognition model based on deep learning is proposed, with attention fusion mechanisms embedded both between channels and between views. A comparison of the training and recognition results of different neural networks showed that the attention-fusion-based multi-view recognition model performs best in identifying internal pavement diseases.
PMID:39460235 | DOI:10.3390/s24206757
Concatenated CNN-Based Pneumonia Detection Using a Fuzzy-Enhanced Dataset
Sensors (Basel). 2024 Oct 21;24(20):6750. doi: 10.3390/s24206750.
ABSTRACT
Pneumonia is a form of acute respiratory infection affecting the lungs, and the symptoms of viral and bacterial pneumonia are similar. Rapid diagnosis of the disease is difficult, since polymerase chain reaction-based methods, which are the most reliable, take several hours to produce results while also demanding strict adherence to the analysis protocol and a high level of professionalism from the personnel. This study proposed a Concatenated CNN (CCNN) model for pneumonia detection combined with a fuzzy logic-based image enhancement method. The enhancement process is based on a new fuzzification refinement algorithm that significantly improves image quality and feature extraction for the CCNN model. Four datasets (the original images and images enhanced using fuzzy entropy, standard deviation, and histogram equalization) were used to train the algorithm. The enhanced datasets significantly improved the CCNN's performance, with the fuzzy entropy-enhanced dataset producing the best results. The proposed CCNN attained remarkable classification metrics, including 98.9% accuracy, 99.3% precision, 99.8% F1-score, and 99.6% recall. Experimental comparisons showed that the fuzzy logic-based enhancement works significantly better than traditional image enhancement methods, resulting in higher diagnostic precision. This study demonstrates how effectively deep learning models and sophisticated image enhancement techniques work together to analyze medical images.
PMID:39460230 | DOI:10.3390/s24206750
Vision-Based Real-Time Bolt Loosening Detection by Identifying Anti-Loosening Lines
Sensors (Basel). 2024 Oct 20;24(20):6747. doi: 10.3390/s24206747.
ABSTRACT
Bolt loosening detection is crucial for ensuring the safe operation of equipment. This paper presents a vision-based real-time detection method that identifies bolt loosening by recognizing anti-loosening line markers at bolt connections. The method employs the YOLOv10-S deep learning model for high-precision, real-time bolt detection, followed by a two-step Fast-SCNN image segmentation technique. This approach effectively isolates the bolt and nut regions, enabling accurate extraction of the anti-loosening line markers. Key intersection points are calculated using ellipse and line fitting techniques, and the loosening angle is determined through spatial projection transformation. The experimental results demonstrate that, for high-resolution images of 2048 × 1024 pixels, the proposed method achieves an average angle detection error of 1.145° with a detection speed of 32 FPS. Compared to traditional methods and other vision-based approaches, this method offers non-contact measurement, real-time detection capabilities, reduced detection error, and general adaptability to various bolt types and configurations, indicating significant application potential.
PMID:39460227 | DOI:10.3390/s24206747
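As an illustration of the line-fitting and angle step described above, the sketch below fits a line to marker points on the nut and on the adjacent surface with OpenCV and returns the angle between them; the ellipse fitting, keypoint extraction, and spatial projection transformation from the paper are not reproduced, and all names are illustrative:

    import numpy as np
    import cv2

    def marker_angle(nut_marker_pts, surface_marker_pts):
        # Fit a 2-D line to each point set and return the angle between them (degrees).
        def direction(pts):
            line = cv2.fitLine(np.float32(pts), cv2.DIST_L2, 0, 0.01, 0.01)
            return np.array([line[0].item(), line[1].item()])
        d1, d2 = direction(nut_marker_pts), direction(surface_marker_pts)
        cos = abs(float(np.dot(d1, d2)))                 # ignore line orientation
        return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

    # Toy example: the nut-side marker rotated about 15 degrees relative to the surface marker.
    t = np.linspace(0, 20, 30)
    surface = np.stack([t, 0.0 * t], axis=1)
    theta = np.radians(15)
    nut = np.stack([t * np.cos(theta), t * np.sin(theta)], axis=1)
    print(round(marker_angle(nut, surface), 2))          # ~15.0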
Vehicle Localization Method in Complex SAR Images Based on Feature Reconstruction and Aggregation
Sensors (Basel). 2024 Oct 20;24(20):6746. doi: 10.3390/s24206746.
ABSTRACT
Due to the small size of vehicle targets, complex background environments, and the discrete scattering characteristics of high-resolution synthetic aperture radar (SAR) images, existing deep learning networks face challenges in extracting high-quality vehicle features from SAR images, which impacts vehicle localization accuracy. To address this issue, this paper proposes a vehicle localization method for SAR images based on feature reconstruction and aggregation with rotated boxes. Specifically, our method first employs a backbone network that integrates the space-channel reconfiguration module (SCRM), which contains spatial and channel attention mechanisms specifically designed for SAR images, to extract features. The network then applies a progressive cross-fusion mechanism (PCFM) that effectively combines multi-view features from different feature layers, enhancing the information content of the feature maps and improving feature representation quality. Finally, these features, which have a large receptive field and rich contextual information, are fed into a rotated-box vehicle detection head, which effectively reduces false alarms and missed detections. Experiments on a complex-scene SAR image vehicle dataset demonstrate that the proposed method significantly improves vehicle localization accuracy and achieves state-of-the-art performance, confirming its superiority and effectiveness.
PMID:39460226 | DOI:10.3390/s24206746
Wearable Biosensor Smart Glasses Based on Augmented Reality and Eye Tracking
Sensors (Basel). 2024 Oct 20;24(20):6740. doi: 10.3390/s24206740.
ABSTRACT
With the rapid development of wearable biosensor technology, the combination of head-mounted displays and augmented reality (AR) technology has shown great potential for health monitoring and biomedical diagnosis applications. However, further optimizing its performance and improving data interaction accuracy remain crucial issues that must be addressed. In this study, we develop smart glasses based on augmented reality and eye tracking technology. Through real-time information interaction with a server, the smart glasses achieve accurate scene perception and analysis of the user's intention, and combine this with mixed-reality display technology to provide dynamic, real-time intelligent interaction services. A multi-level hardware architecture and an optimized data processing pipeline are adopted to enhance the system's real-time accuracy. Meanwhile, combining deep learning methods with a geometric model significantly improves the system's ability to perceive user behavior and environmental information in complex environments. The experimental results show that when the distance between the subject and the display is 1 m, the eye tracking accuracy of the smart glasses reaches 1.0°, with an error of no more than ±0.1°. This study demonstrates that the effective integration of AR and eye tracking technology dramatically improves the functional performance of smart glasses in multiple scenarios. Future research will further optimize the smart glasses' algorithms and hardware performance, enhance their application potential in daily health monitoring and medical diagnosis, and provide more possibilities for the innovative development of wearable devices in medical and health management.
PMID:39460220 | DOI:10.3390/s24206740