Deep learning

Accurate predictions on small data with a tabular foundation model

Wed, 2025-01-08 06:00

Nature. 2025 Jan;637(8045):319-326. doi: 10.1038/s41586-024-08328-6. Epub 2025 Jan 8.

ABSTRACT

Tabular data, spreadsheets organized in rows and columns, are ubiquitous across scientific fields, from biomedicine to particle physics to economics and climate science [1,2]. The fundamental prediction task of filling in missing values of a label column based on the rest of the columns is essential for applications as diverse as biomedical risk models, drug discovery and materials science. Although deep learning has revolutionized learning from raw data and led to numerous high-profile success stories [3-5], gradient-boosted decision trees [6-9] have dominated tabular data for the past 20 years. Here we present the Tabular Prior-data Fitted Network (TabPFN), a tabular foundation model that outperforms all previous methods on datasets with up to 10,000 samples by a wide margin, using substantially less training time. In 2.8 s, TabPFN outperforms an ensemble of the strongest baselines tuned for 4 h in a classification setting. As a generative transformer-based foundation model, this model also allows fine-tuning, data generation, density estimation and learning reusable embeddings. TabPFN is a learning algorithm that is itself learned across millions of synthetic datasets, demonstrating the power of this approach for algorithm development. By improving modelling abilities across diverse fields, TabPFN has the potential to accelerate scientific discovery and enhance important decision-making in various domains.
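
For readers who want to try the model, a minimal sketch of the scikit-learn-style interface exposed by the authors' tabpfn Python package (the dataset, device choice, and split are illustrative assumptions, not the paper's benchmark setup):

```python
# A hedged quick-start sketch using the published `tabpfn` package,
# which follows a scikit-learn-style fit/predict interface.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)  # small tabular dataset (<10,000 samples)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single forward pass through the pretrained transformer acts as "training";
# no gradient-based fitting on the user's data is required.
clf = TabPFNClassifier(device="cpu")
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]
print(f"ROC AUC: {roc_auc_score(y_test, proba):.3f}")
```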

PMID:39780007 | DOI:10.1038/s41586-024-08328-6

Categories: Literature Watch

Computational microscopy with coherent diffractive imaging and ptychography

Wed, 2025-01-08 06:00

Nature. 2025 Jan;637(8045):281-295. doi: 10.1038/s41586-024-08278-z. Epub 2025 Jan 8.

ABSTRACT

Microscopy and crystallography are two essential experimental methodologies for advancing modern science. They complement one another, with microscopy typically relying on lenses to image the local structures of samples, and crystallography using diffraction to determine the global atomic structure of crystals. Over the past two decades, computational microscopy, encompassing coherent diffractive imaging (CDI) and ptychography, has advanced rapidly, unifying microscopy and crystallography to overcome their limitations. Here, I review the innovative developments in CDI and ptychography, which achieve exceptional imaging capabilities across nine orders of magnitude in length scales, from resolving atomic structures in materials at sub-ångstrom resolution to quantitative phase imaging of centimetre-sized tissues, using the same principle and similar computational algorithms. These methods have been applied to determine the 3D atomic structures of crystal defects and amorphous materials, visualize oxygen vacancies in high-temperature superconductors and capture ultrafast dynamics. They have also been used for nanoscale imaging of magnetic, quantum and energy materials, nanomaterials, integrated circuits and biological specimens. By harnessing fourth-generation synchrotron radiation, X-ray-free electron lasers, high-harmonic generation, electron microscopes, optical microscopes, cutting-edge detectors and deep learning, CDI and ptychography are poised to make even greater contributions to multidisciplinary sciences in the years to come.

PMID:39780004 | DOI:10.1038/s41586-024-08278-z

Categories: Literature Watch

Attention-based deep learning for accurate cell image analysis

Wed, 2025-01-08 06:00

Sci Rep. 2025 Jan 8;15(1):1265. doi: 10.1038/s41598-025-85608-9.

ABSTRACT

High-content analysis (HCA) holds enormous potential for drug discovery and research, but widely used methods can be cumbersome and yield inaccurate results. Noisy and redundant signals in cell images impede accurate deep learning-based image analysis. To address these issues, we introduce X-Profiler, a novel HCA method that combines cellular experiments, image processing, and deep learning modeling. X-Profiler couples a convolutional neural network with a Transformer to encode high-content images, effectively filtering out noisy signals and precisely characterizing cell phenotypes. In comparative tests on drug-induced cardiotoxicity, mitochondrial toxicity classification, and compound classification, X-Profiler outperformed both DeepProfiler and CellProfiler, two highly recognized and representative methods in this field. Our results demonstrate the utility and versatility of X-Profiler, and we anticipate its wide application in HCA for advancing drug development and disease research.
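
As an illustration of the CNN-plus-Transformer encoding idea, a hedged PyTorch sketch; the module sizes, layer counts, and pooling are assumptions, not X-Profiler's actual architecture:

```python
# Minimal sketch: a CNN stem followed by Transformer layers over spatial tokens.
import torch
import torch.nn as nn

class CNNTransformerEncoder(nn.Module):
    def __init__(self, in_channels=3, embed_dim=128, num_heads=4, num_layers=2):
        super().__init__()
        # CNN stem extracts local morphology features from cell images.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        # Self-attention lets spatial tokens attend to each other, which can
        # down-weight noisy or redundant regions of the image.
        self.transformer = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):
        feats = self.cnn(x)                        # (B, C, H', W')
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H'*W', C)
        encoded = self.transformer(tokens)
        return encoded.mean(dim=1)                 # pooled phenotype embedding

emb = CNNTransformerEncoder()(torch.randn(2, 3, 64, 64))
print(emb.shape)  # torch.Size([2, 128])
```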

PMID:39779905 | DOI:10.1038/s41598-025-85608-9

Categories: Literature Watch

A hybrid machine learning approach for the personalized prognostication of aggressive skin cancers

Wed, 2025-01-08 06:00

NPJ Digit Med. 2025 Jan 8;8(1):15. doi: 10.1038/s41746-024-01329-9.

ABSTRACT

Accurate prognostication guides optimal clinical management in skin cancer. Merkel cell carcinoma (MCC) is the most aggressive form of skin cancer; it often presents at an advanced stage and is associated with poor survival rates. There are no personalized prognostic tools in use for MCC. We employed explainability analysis to reveal new insights into mortality risk factors for this highly aggressive cancer. We then combined deep learning feature selection with a modified XGBoost framework to develop a web-based prognostic tool for MCC termed 'DeepMerkel'. DeepMerkel can make accurate, personalized, time-dependent survival predictions for MCC from readily available clinical information. It demonstrated generalizability through high predictive performance in an international clinical cohort, outperforming current population-based prognostic staging systems. MCC and DeepMerkel provide an exemplar model of personalized machine learning prognostic tools in aggressive skin cancers.
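
The general recipe, gradient boosting with a survival objective, can be sketched with stock XGBoost; DeepMerkel's modified framework goes beyond this, and the data below are synthetic stand-ins:

```python
# Hedged sketch of a gradient-boosted Cox survival model with xgboost.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))           # clinical features (synthetic)
time = rng.exponential(24, size=200)    # follow-up time in months
event = rng.integers(0, 2, size=200)    # 1 = death observed, 0 = censored

# xgboost's Cox objective encodes censoring via the label sign:
# positive label = observed event time, negative label = censored time.
y = np.where(event == 1, time, -time)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"objective": "survival:cox", "eta": 0.1},
                    dtrain, num_boost_round=50)
risk = booster.predict(dtrain)          # higher score = higher predicted hazard
print(risk[:5])
```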

PMID:39779875 | DOI:10.1038/s41746-024-01329-9

Categories: Literature Watch

A hybrid CNN model for classification of motor tasks obtained from hybrid BCI system

Wed, 2025-01-08 06:00

Sci Rep. 2025 Jan 8;15(1):1360. doi: 10.1038/s41598-024-84883-2.

ABSTRACT

Hybrid brain-computer interfaces (BCIs) have shown improved performance, especially in classifying multi-class data. Two non-invasive BCI modalities, electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), are combined to achieve improved classification. Classifying contralateral and ipsilateral motor movements is particularly challenging among mental-activity signals. The current work focuses on the performance of deep learning methods, namely convolutional neural networks (CNNs) and bidirectional long short-term memory (Bi-LSTM), in classifying a four-class motor execution task (right hand, left hand, right arm and left arm) taken from the CORE dataset. Model performance was evaluated using metrics such as accuracy, F1-score, precision, recall, AUC and the ROC curve. The CNN and hybrid CNN models achieved accuracies of 98.3% and 99%, respectively.
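
A hedged sketch of a hybrid CNN + Bi-LSTM classifier for four motor classes; the input shape, channel count, and layer sizes are assumptions, not the paper's exact architecture:

```python
# Minimal sketch: temporal 1-D convolutions feeding a bidirectional LSTM.
import torch
import torch.nn as nn

class HybridCNNBiLSTM(nn.Module):
    def __init__(self, n_channels=30, n_classes=4):
        super().__init__()
        # 1-D convolutions learn local temporal filters per channel group.
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # The bidirectional LSTM models longer-range temporal context.
        self.lstm = nn.LSTM(32, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):               # x: (B, channels, time)
        feats = self.conv(x)            # (B, 32, time/2)
        out, _ = self.lstm(feats.transpose(1, 2))
        return self.head(out[:, -1])    # logits for the 4 motor classes

logits = HybridCNNBiLSTM()(torch.randn(8, 30, 256))
print(logits.shape)  # torch.Size([8, 4])
```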

PMID:39779796 | DOI:10.1038/s41598-024-84883-2

Categories: Literature Watch

Nf-Root: A Best-Practice Pipeline for Deep-Learning-Based Analysis of Apoplastic pH in Microscopy Images of Developmental Zones in Plant Root Tissue

Wed, 2025-01-08 06:00

Quant Plant Biol. 2024 Dec 23;5:e12. doi: 10.1017/qpb.2024.11. eCollection 2024.

ABSTRACT

Hormonal mechanisms associated with cell elongation play a vital role in the development and growth of plants. Here, we report Nextflow-root (nf-root), a novel best-practice pipeline for deep-learning-based analysis of fluorescence microscopy images of plant root tissue from A. thaliana. This bioinformatics pipeline performs automatic identification of developmental zones in root tissue images and also measures apoplastic pH, which is useful for modeling hormone signaling and cell-physiological responses. We show that this nf-core standard-based pipeline successfully automates tissue zone segmentation and is both high-throughput and highly reproducible. In short, a deep-learning module deploys deterministically trained convolutional neural network models and augments the segmentation predictions with measures of prediction uncertainty and model interpretability, aiming to facilitate result interpretation and verification by experienced plant biologists. We observed a high statistical similarity between the manually generated results and the output of nf-root.

PMID:39777028 | PMC:PMC11706687 | DOI:10.1017/qpb.2024.11

Categories: Literature Watch

Assessment of human emotional reactions to visual stimuli "deep-dreamed" by artificial neural networks

Wed, 2025-01-08 06:00

Front Psychol. 2024 Dec 24;15:1509392. doi: 10.3389/fpsyg.2024.1509392. eCollection 2024.

ABSTRACT

INTRODUCTION: While the fact that visual stimuli synthesized by Artificial Neural Networks (ANN) may evoke emotional reactions is documented, the precise mechanisms that connect the strength and type of such reactions with the ways in which ANNs are used to synthesize visual stimuli are yet to be discovered. Understanding these mechanisms allows for designing methods that synthesize images attenuating or enhancing selected emotional states, which may provide unobtrusive and widely applicable treatment of mental dysfunctions and disorders.

METHODS: The Convolutional Neural Network (CNN), a type of ANN used in computer vision tasks that models the way humans solve visual tasks, was applied to synthesize ("dream" or "hallucinate") images with no semantic content by maximizing the activations of neurons in precisely selected layers of the CNN. The evoked emotions of 150 human subjects observing these images were self-reported on a two-dimensional scale (arousal and valence) utilizing self-assessment manikin (SAM) figures. Correlations between arousal and valence values and image visual properties (e.g., color, brightness, clutter feature congestion, and clutter sub-band entropy), as well as the position of the CNN layers stimulated to obtain a given image, were calculated.
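
The activation-maximization ("deep dream") procedure can be sketched as gradient ascent on the input image. This is a generic illustration; the network, layer index, step size, and iteration count are assumptions, not the study's exact setup:

```python
# Minimal sketch: synthesize an image that maximizes activations of one CNN layer.
import torch
from torchvision import models

cnn = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
for p in cnn.parameters():
    p.requires_grad_(False)   # only the image is optimized, not the weights
target_layer = 10             # index of the layer whose activations are maximized

img = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)
for _ in range(100):
    opt.zero_grad()
    x = img
    for i, layer in enumerate(cnn):
        x = layer(x)
        if i == target_layer:
            break
    loss = -x.norm()          # ascend on activation magnitude
    loss.backward()
    opt.step()
    img.data.clamp_(0, 1)     # keep pixels in a displayable range
```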

RESULTS: Synthesized images that maximized the activations of some of the CNN layers led to significantly higher or lower arousal and valence levels compared with the average subject's reactions. Multiple linear regression analysis found that a small set of selected global visual features of the images (hue, feature congestion, and sub-band entropy) are significant predictors of the measured arousal; however, no statistically significant dependencies were found between the global visual features and the measured valence.

CONCLUSION: This study demonstrates that the specific method of synthesizing images by maximizing small and precisely selected parts of the CNN used in this work may lead to the synthesis of visual stimuli that enhance or attenuate emotional reactions. This method paves the way for developing tools that provide non-invasive stimulation to support wellbeing (managing stress, enhancing mood) and to assist patients with certain mental conditions by complementing traditional methods of therapeutic intervention.

PMID:39776961 | PMC:PMC11703666 | DOI:10.3389/fpsyg.2024.1509392

Categories: Literature Watch

Decorrelative network architecture for robust electrocardiogram classification

Wed, 2025-01-08 06:00

Patterns (N Y). 2024 Dec 9;5(12):101116. doi: 10.1016/j.patter.2024.101116. eCollection 2024 Dec 13.

ABSTRACT

To achieve adequate trust in patient-critical medical tasks, artificial intelligence must be able to recognize instances where it cannot operate confidently. Ensemble methods are deployed to estimate uncertainty, but models in an ensemble often share the same vulnerabilities to adversarial attacks. We propose an ensemble approach based on feature decorrelation and Fourier partitioning for teaching networks diverse features, reducing the chance of perturbation-based fooling. We test our approach against white-box attacks in single- and multi-channel electrocardiogram classification and adapt adversarial training and DVERGE into an ensemble framework for comparison. Our results indicate that the combination of decorrelation and Fourier partitioning maintains performance on unperturbed data while demonstrating superior uncertainty estimation on projected gradient descent and smooth adversarial attacks of various magnitudes. Furthermore, our approach does not require expensive optimization with adversarial samples during training. These methods can be applied to other tasks for more robust models.
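
One ingredient named above, a feature-decorrelation penalty between ensemble members, might be sketched as follows; the loss form is an assumption for illustration, not the paper's exact objective:

```python
# Hedged sketch: penalize cross-correlation between two members' feature batches.
import torch

def decorrelation_penalty(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    """Mean squared cross-correlation between two feature batches of shape (B, D)."""
    f1 = (f1 - f1.mean(0)) / (f1.std(0) + 1e-8)   # standardize per dimension
    f2 = (f2 - f2.mean(0)) / (f2.std(0) + 1e-8)
    cross_corr = (f1.T @ f2) / f1.shape[0]        # (D, D) correlation matrix
    return cross_corr.pow(2).mean()

penalty = decorrelation_penalty(torch.randn(32, 64), torch.randn(32, 64))
# In training one would combine it with the task losses, e.g.:
# total_loss = task_loss_1 + task_loss_2 + lambda_ * penalty
print(penalty)
```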

PMID:39776851 | PMC:PMC11701855 | DOI:10.1016/j.patter.2024.101116

Categories: Literature Watch

Deep Learning for Discrimination of Early Spinal Tuberculosis from Acute Osteoporotic Vertebral Fracture on CT

Wed, 2025-01-08 06:00

Infect Drug Resist. 2025 Jan 3;18:31-42. doi: 10.2147/IDR.S482584. eCollection 2025.

ABSTRACT

BACKGROUND: Early differentiation between spinal tuberculosis (STB) and acute osteoporotic vertebral compression fracture (OVCF) is crucial for determining the appropriate clinical management and treatment pathway, thereby significantly impacting patient outcomes.

OBJECTIVE: To evaluate the efficacy of deep learning (DL) models using reconstructed sagittal CT images in the differentiation of early STB from acute OVCF, with the aim of enhancing diagnostic precision, reducing reliance on MRI and biopsies, and minimizing the risks of misdiagnosis.

METHODS: Data were collected from 373 patients, with 302 patients recruited from a university-affiliated hospital serving as the training and internal validation sets, and an additional 71 patients from another university-affiliated hospital serving as the external validation set. MVITV2, EfficientNet-B5, ResNet101, and ResNet50 were used as the backbone networks for DL model development, training, and validation. Model evaluation was based on accuracy, precision, sensitivity, F1 score, and area under the curve (AUC). The performance of the DL models was compared with the diagnostic accuracy of two spine surgeons who performed a blinded review.

RESULTS: The MVITV2 model outperformed other architectures in the internal validation set, achieving accuracy of 98.98%, precision of 100%, sensitivity of 97.97%, F1 score of 98.98%, and AUC of 0.997. The performance of the DL models notably exceeded that of the spine surgeons, who achieved accuracy rates of 77.38% and 93.56%. The external validation confirmed the models' robustness and generalizability.

CONCLUSION: The DL models significantly improved the differentiation between STB and OVCF, surpassing experienced spine surgeons in diagnostic accuracy. These models offer a promising alternative to traditional imaging and invasive procedures, potentially promoting early and accurate diagnosis, reducing healthcare costs, and improving patient outcomes. The findings underscore the potential of artificial intelligence for revolutionizing spinal disease diagnostics, and have substantial clinical implications.

PMID:39776757 | PMC:PMC11706012 | DOI:10.2147/IDR.S482584

Categories: Literature Watch

Adaptive Treatment of Metastatic Prostate Cancer Using Generative Artificial Intelligence

Wed, 2025-01-08 06:00

Clin Med Insights Oncol. 2025 Jan 6;19:11795549241311408. doi: 10.1177/11795549241311408. eCollection 2025.

ABSTRACT

Despite the expanding therapeutic options available to cancer patients, therapeutic resistance, disease recurrence, and metastasis persist as hallmark challenges in the treatment of cancer. The rise to prominence of generative artificial intelligence (GenAI) in many realms of human activities is compelling the consideration of its capabilities as a potential lever to advance the development of effective cancer treatments. This article presents a hypothetical case study on the application of generative pre-trained transformers (GPTs) to the treatment of metastatic prostate cancer (mPC). The case explores the design of GPT-supported adaptive intermittent therapy for mPC. Testosterone and prostate-specific antigen (PSA) are assumed to be repeatedly monitored while treatment may involve a combination of androgen deprivation therapy (ADT), androgen receptor-signalling inhibitors (ARSI), chemotherapy, and radiotherapy. The analysis covers various questions relevant to the configuration, training, and inferencing of GPTs for the case of mPC treatment, with particular attention to risk mitigation regarding the hallucination problem and its implications for the clinical integration of GenAI technologies. The case study provides elements of an actionable pathway to the realization of GenAI-assisted adaptive treatment of metastatic prostate cancer. As such, the study is expected to help facilitate the design of clinical trials of GenAI-supported cancer treatments.

PMID:39776668 | PMC:PMC11701910 | DOI:10.1177/11795549241311408

Categories: Literature Watch

Predicting the risk of type 2 diabetes mellitus (T2DM) emergence in 5 years using mammography images: a comparison study between radiomics and deep learning algorithm

Wed, 2025-01-08 06:00

J Med Imaging (Bellingham). 2025 Jan;12(1):014501. doi: 10.1117/1.JMI.12.1.014501. Epub 2025 Jan 6.

ABSTRACT

PURPOSE: The prevalence of type 2 diabetes mellitus (T2DM) has been steadily increasing over the years. We aim to predict the occurrence of T2DM within 5 years from mammography images using two different methods and compare their performance.

APPROACH: We examined 312 samples, including 110 positive cases (developed T2DM after 5 years) and 202 negative cases (did not develop T2DM), using two different methods. In the first, radiomics-based approach, we utilized radiomics features and machine learning (ML) algorithms. The entire breast region was chosen as the region of interest for extracting radiomics features. A binary breast image was then created, from which we extracted 668 features and analyzed them using various ML algorithms. In the second method, a complex convolutional neural network (CNN) with a modified ResNet architecture and various kernel sizes was applied to raw mammography images for the prediction task. A nested, stratified five-fold cross-validation was done for both approaches to compute accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). Hyperparameter tuning was also done to enhance the models' performance and reliability.
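
The nested, stratified five-fold protocol described for the radiomics arm can be sketched with scikit-learn and LightGBM; the feature matrix, labels, and the single tuned hyperparameter below are placeholders, not the study's data or grid:

```python
# Hedged sketch: nested stratified 5-fold CV with a light gradient boosting model.
import numpy as np
from sklearn.model_selection import StratifiedKFold, GridSearchCV, cross_val_score
from lightgbm import LGBMClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(312, 668))                      # stand-in radiomics features
y = np.r_[np.ones(110), np.zeros(202)].astype(int)   # 110 positive, 202 negative

inner = StratifiedKFold(5, shuffle=True, random_state=0)
outer = StratifiedKFold(5, shuffle=True, random_state=1)
# Inner loop tunes hyperparameters; outer loop estimates generalization.
search = GridSearchCV(LGBMClassifier(), {"num_leaves": [15, 31]},
                      cv=inner, scoring="roc_auc")
scores = cross_val_score(search, X, y, cv=outer, scoring="roc_auc")
print(f"nested-CV AUROC: {scores.mean():.2f} +/- {scores.std():.2f}")
```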

RESULTS: The radiomics approach's light gradient boosting model gave 68.9% accuracy, 30.7% sensitivity, 89.5% specificity, and 0.63 AUROC. The CNN method achieved an AUROC of 0.58 over 20 epochs.

CONCLUSION: Radiomics outperformed CNN by 0.05 in terms of AUROC. This may be due to the more straightforward interpretability and clinical relevance of predefined radiomics features compared with the complex, abstract features learned by CNNs.

PMID:39776665 | PMC:PMC11702674 | DOI:10.1117/1.JMI.12.1.014501

Categories: Literature Watch

Deep-blur: Blind identification and deblurring with convolutional neural networks

Wed, 2025-01-08 06:00

Biol Imaging. 2024 Nov 15;4:e13. doi: 10.1017/S2633903X24000096. eCollection 2024.

ABSTRACT

We propose a neural network architecture and a training procedure to estimate blurring operators and deblur images from a single degraded image. Our key assumption is that the forward operators can be parameterized by a low-dimensional vector. The models we consider include a description of the point spread function with Zernike polynomials in the pupil plane or product-convolution expansions, which incorporate space-varying operators. Numerical experiments show that the proposed method can accurately and robustly recover the blur parameters even for large noise levels. For a convolution model, the average signal-to-noise ratio of the recovered point spread function ranges from 13 dB in the noiseless regime to 8 dB in the high-noise regime. In comparison, the tested alternatives yield negative values. This operator estimate can then be used as an input for an unrolled neural network to deblur the image. Quantitative experiments on synthetic data demonstrate that this method outperforms other commonly used methods both perceptually and in terms of SSIM. The algorithm can process a 512 × 512 image in under a second on a consumer graphics card and does not require any human interaction once the operator parameterization has been set up.
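
The key assumption, a blur operator parameterized by a low-dimensional vector, can be illustrated with a toy one-parameter Gaussian PSF recovered by gradient descent; the paper's Zernike and product-convolution parameterizations are far richer than this sketch:

```python
# Toy sketch: recover a single blur parameter (Gaussian width) from a blurred image.
import torch
import torch.nn.functional as F

def gaussian_psf(sigma, size=15):
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, size, size)

sharp = torch.rand(1, 1, 64, 64)
blurred = F.conv2d(sharp, gaussian_psf(torch.tensor(2.0)), padding=7)

# Gradient descent on the data misfit over the one-dimensional parameter.
sigma = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.Adam([sigma], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = F.mse_loss(F.conv2d(sharp, gaussian_psf(sigma), padding=7), blurred)
    loss.backward()
    opt.step()
print(float(sigma))  # converges toward the true width, ~2.0
```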

PMID:39776610 | PMC:PMC11704139 | DOI:10.1017/S2633903X24000096

Categories: Literature Watch

Deep-learning-based image compression for microscopy images: An empirical study

Wed, 2025-01-08 06:00

Biol Imaging. 2024 Dec 20;4:e16. doi: 10.1017/S2633903X24000151. eCollection 2024.

ABSTRACT

With the fast development of modern microscopes and bioimaging techniques, an unprecedentedly large amount of imaging data is being generated, stored, analyzed, and shared through networks. The size of these data poses great challenges for current data infrastructure. One common way to reduce data size is image compression. This study analyzes multiple classic and deep-learning-based image compression methods and presents an empirical study of their impact on downstream deep-learning-based image processing models. We used deep-learning-based label-free prediction models (i.e., predicting fluorescent images from bright-field images) as an example downstream task for comparing and analyzing the impact of image compression. Different compression techniques are compared in terms of compression ratio, image similarity, and, most importantly, the prediction accuracy of label-free models on original and compressed images. We found that artificial intelligence (AI)-based compression techniques largely outperform the classic ones with minimal influence on downstream 2D label-free tasks. We hope this study sheds light on the potential of deep-learning-based image compression and raises awareness of the potential impact of image compression on downstream deep-learning models for analysis.
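
The two evaluation quantities compared above, compression ratio and image similarity, can be computed as in this sketch for a classic JPEG baseline; the random stand-in image and quality setting are assumptions:

```python
# Hedged sketch: compression ratio and SSIM for a JPEG-compressed grayscale image.
import io
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

img = (np.random.rand(256, 256) * 255).astype(np.uint8)  # stand-in micrograph

buf = io.BytesIO()
Image.fromarray(img).save(buf, format="JPEG", quality=30)
ratio = img.nbytes / buf.getbuffer().nbytes               # compression ratio
decoded = np.asarray(Image.open(io.BytesIO(buf.getvalue())))

ssim = structural_similarity(img, decoded)                # image similarity
print(f"compression ratio: {ratio:.1f}x, SSIM: {ssim:.3f}")
```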

PMID:39776609 | PMC:PMC11704128 | DOI:10.1017/S2633903X24000151

Categories: Literature Watch

Bridging the gap: Integrating cutting-edge techniques into biological imaging with deepImageJ

Wed, 2025-01-08 06:00

Biol Imaging. 2024 Nov 22;4:e14. doi: 10.1017/S2633903X24000114. eCollection 2024.

ABSTRACT

This manuscript showcases the latest advancements in deepImageJ, a pivotal Fiji/ImageJ plugin for bioimage analysis in life sciences. The plugin, known for its user-friendly interface, facilitates the application of diverse pre-trained convolutional neural networks to custom data. The manuscript demonstrates several deepImageJ capabilities, particularly in deploying complex pipelines, three-dimensional (3D) image analysis, and processing large images. A key development is the integration of the Java Deep Learning Library, expanding deepImageJ's compatibility with various deep learning (DL) frameworks, including TensorFlow, PyTorch, and ONNX. This allows for running multiple engines within a single Fiji/ImageJ instance, streamlining complex bioimage analysis workflows. The manuscript details three case studies to demonstrate these capabilities. The first case study explores integrated image-to-image translation followed by nuclei segmentation. The second case study focuses on 3D nuclei segmentation. The third case study showcases large image volume segmentation and compatibility with the BioImage Model Zoo. These use cases underscore deepImageJ's versatility and power to make advanced DL more accessible and efficient for bioimage analysis. The new developments within deepImageJ seek to provide a more flexible and enriched user-friendly framework to enable next-generation image processing in the life sciences.

PMID:39776608 | PMC:PMC11704127 | DOI:10.1017/S2633903X24000114

Categories: Literature Watch

ProxiMO: Proximal Multi-operator Networks for Quantitative Susceptibility Mapping

Wed, 2025-01-08 06:00

Mach Learn Clin Neuroimaging (2024). 2025;15266:13-23. doi: 10.1007/978-3-031-78761-4_2. Epub 2024 Dec 6.

ABSTRACT

Quantitative Susceptibility Mapping (QSM) is a technique that derives tissue magnetic susceptibility distributions from phase measurements obtained through Magnetic Resonance (MR) imaging. Doing so involves solving an ill-posed dipole inversion problem, however, so time-consuming and cumbersome data acquisition from several distinct head orientations becomes necessary to obtain an accurate solution. Most recent (supervised) deep learning methods for single-phase QSM require training data obtained via multiple orientations. In this work, we present an alternative unsupervised learning approach that can efficiently train on single-orientation measurement data alone, named ProxiMO (Proximal Multi-Operator), combining Learned Proximal Convolutional Neural Networks (LP-CNN) with multi-operator imaging (MOI). This integration enables LP-CNN training for QSM on single-phase data without ground truth reconstructions. We further introduce a semi-supervised variant, which boosts the reconstruction performance compared with traditional supervised training. Extensive experiments on multicenter datasets illustrate the advantage of unsupervised training and the superiority of the proposed approach for QSM reconstruction. Code is available at https://github.com/shmuelor/ProxiMO.
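
For context, the forward dipole model underlying QSM can be sketched in a few lines: the k-space kernel D(k) = 1/3 - kz^2/|k|^2 vanishes on a conical surface, which is what makes the inversion ill-posed. Grid size and units below are arbitrary assumptions:

```python
# Sketch of the QSM forward model: field = IFFT( D(k) * FFT(susceptibility) ).
import numpy as np

def dipole_kernel(shape):
    kx, ky, kz = np.meshgrid(*[np.fft.fftfreq(n) for n in shape], indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    with np.errstate(divide="ignore", invalid="ignore"):
        d = 1.0 / 3.0 - kz**2 / k2
    d[k2 == 0] = 0.0          # zero out the undefined DC component
    return d

chi = np.random.rand(32, 32, 32)   # toy susceptibility map
field = np.fft.ifftn(dipole_kernel(chi.shape) * np.fft.fftn(chi)).real
# Dipole *inversion* (recovering chi from field) is ill-posed because D(k)
# vanishes on a cone -- the problem ProxiMO's learned networks address.
print(field.shape)
```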

PMID:39776602 | PMC:PMC11705005 | DOI:10.1007/978-3-031-78761-4_2

Categories: Literature Watch

Integrating Interpretability in Machine Learning and Deep Neural Networks: A Novel Approach to Feature Importance and Outlier Detection in COVID-19 Symptomatology and Vaccine Efficacy

Wed, 2025-01-08 06:00

Viruses. 2024 Nov 29;16(12):1864. doi: 10.3390/v16121864.

ABSTRACT

In this study, we introduce a novel approach that integrates interpretability techniques from both traditional machine learning (ML) and deep neural networks (DNN) to quantify feature importance using global and local interpretation methods. Our method bridges the gap between interpretable ML models and powerful deep learning (DL) architectures, providing comprehensive insights into the key drivers behind model predictions, especially in detecting outliers within medical data. We applied this method to analyze COVID-19 pandemic data from 2020, yielding intriguing insights. We used a dataset consisting of individuals who were tested for COVID-19 during the early stages of the pandemic in 2020. The dataset included self-reported symptoms and test results from a wide demographic, and our goal was to identify the most important symptoms that could help predict COVID-19 infection accurately. By applying interpretability techniques to both machine learning and deep learning models, we aimed to improve understanding of symptomatology and enhance early detection of COVID-19 cases. Notably, even though less than 1% of our cohort reported having a sore throat, this symptom emerged as a significant indicator of active COVID-19 infection, appearing 7 out of 9 times in the top four most important features across all methodologies. This suggests its potential as an early symptom marker. Studies have shown that individuals reporting sore throat may have a compromised immune system, where antibody generation is not functioning correctly. This aligns with our data, which indicates that 5% of patients with sore throats required hospitalization. Our analysis also revealed a concerning trend of diminished immune response post-COVID infection, increasing the likelihood of severe cases requiring hospitalization. This finding underscores the importance of monitoring patients post-recovery for potential complications and tailoring medical interventions accordingly. Our study also raises critical questions about the efficacy of COVID-19 vaccines in individuals presenting with sore throat as a symptom. The results suggest that booster shots might be necessary for this population to ensure adequate immunity, given the observed immune response patterns. The proposed method not only enhances our understanding of COVID-19 symptomatology but also demonstrates its broader utility in medical outlier detection. This research contributes valuable insights to ongoing efforts in creating interpretable models for COVID-19 management and vaccine optimization strategies. By leveraging feature importance and interpretability, these models empower physicians, healthcare workers, and researchers to understand complex relationships within medical data, facilitating more informed decision-making for patient care and public health initiatives.
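
The kind of global and local feature-importance analysis described above can be sketched with SHAP on a tree model over binary symptom indicators; the data, feature names, and model choice here are synthetic stand-ins, not the study's pipeline:

```python
# Hedged sketch: local SHAP values per sample, plus a global importance summary.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
symptoms = ["sore_throat", "cough", "fever", "headache"]
X = pd.DataFrame(rng.integers(0, 2, size=(500, 4)), columns=symptoms)
base = X["sore_throat"].astype(bool) | X["fever"].astype(bool)
y = (base & (rng.random(500) > 0.3)).astype(int)   # synthetic test outcome

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)
sv = explainer.shap_values(X)                      # local importances
# Depending on the shap version, the output is a per-class list or a 3-D array.
sv = sv[1] if isinstance(sv, list) else sv[..., 1]

# Global importance = mean absolute SHAP value per feature (positive class).
global_imp = np.abs(sv).mean(axis=0)
print(dict(zip(symptoms, global_imp.round(3))))
```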

PMID:39772174 | DOI:10.3390/v16121864

Categories: Literature Watch

FP-YOLOv8: Surface Defect Detection Algorithm for Brake Pipe Ends Based on Improved YOLOv8n

Wed, 2025-01-08 06:00

Sensors (Basel). 2024 Dec 23;24(24):8220. doi: 10.3390/s24248220.

ABSTRACT

To address the limitations of existing deep learning-based algorithms in detecting surface defects on brake pipe ends, a novel lightweight detection algorithm, FP-YOLOv8, is proposed. This algorithm is developed on the YOLOv8n framework with the aim of improving accuracy while keeping the model lightweight. First, the C2f_GhostV2 module is designed to replace the original C2f module, reducing the model's parameter count while improving feature representation. It incorporates the decoupled fully connected (DFC) attention mechanism, which minimizes information loss during long-range feature transmission by capturing pixel information separately along the horizontal and vertical axes via convolution. Second, the Dynamic ATSS label allocation strategy is applied, which dynamically adjusts label assignments by integrating anchor IoUs and predicted IoUs, effectively reducing the misclassification of high-quality prediction samples as negative samples and thus improving the detection accuracy of the model. Lastly, an asymmetric small-target detection head, FADH, is proposed, which utilizes depthwise separable convolution for the classification and regression tasks, enabling more precise capture of detailed information across scales and improving the detection of small-target defects. The experimental results show that FP-YOLOv8 achieves a mAP50 of 89.5% and an F1-score of 87% on the brake pipe end surface defect dataset, improvements of 3.3% and 6.0%, respectively, over the YOLOv8n algorithm. Meanwhile, it reduces model parameters and computational costs by 14.3% and 21.0%. Additionally, compared with the baseline model, the AP50 values for crack, scratch, and flash defects rise by 5.5%, 5.6%, and 2.3%, respectively. These results validate the efficacy of FP-YOLOv8 in enhancing defect detection accuracy, reducing missed-detection rates, and decreasing model parameter counts and computational demands, thus meeting the requirements of online defect detection for brake pipe end surfaces.
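
The DFC attention idea, gathering long-range context with cheap horizontal and vertical strip convolutions instead of full attention, might be sketched as follows; the kernel sizes and layout are assumptions in the spirit of GhostNetV2, not FP-YOLOv8's exact module:

```python
# Hedged sketch: axial (decoupled) attention via depth-wise strip convolutions.
import torch
import torch.nn as nn

class DFCAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            # depth-wise strip convolutions: vertical then horizontal axis
            nn.Conv2d(channels, channels, (5, 1), padding=(2, 0), groups=channels),
            nn.Conv2d(channels, channels, (1, 5), padding=(0, 2), groups=channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)   # reweight features by the axial attention map

out = DFCAttention(16)(torch.randn(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 16, 32, 32])
```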

PMID:39771953 | DOI:10.3390/s24248220

Categories: Literature Watch

Fusion of Visible and Infrared Aerial Images from Uncalibrated Sensors Using Wavelet Decomposition and Deep Learning

Wed, 2025-01-08 06:00

Sensors (Basel). 2024 Dec 23;24(24):8217. doi: 10.3390/s24248217.

ABSTRACT

Multi-modal systems extract information about the environment using specialized sensors that are optimized based on the wavelength of the phenomenology and material interactions. To maximize the entropy, complementary systems operating in regions of non-overlapping wavelengths are optimal. VIS-IR (Visible-Infrared) systems have been at the forefront of multi-modal fusion research and are used extensively to represent information in all-day, all-weather applications. Prior to image fusion, the image pairs have to be properly registered and mapped to a common resolution palette. However, due to differences in the device physics of image capture, information from VIS-IR sensors cannot be directly correlated, which is a major bottleneck for this area of research. In the absence of camera metadata, image registration is performed manually, which is not practical for large datasets. Most of the work published in this area assumes calibrated sensors and the availability of camera metadata providing registered image pairs, which limits the generalization capability of these systems. In this work, we propose a novel end-to-end pipeline termed DeepFusion for image registration and fusion. First, we design a recursive crop-and-scale wavelet spectral decomposition (WSD) algorithm for automatically extracting the patch of visible data representing the thermal information. After data extraction, both images are registered to a common resolution palette and forwarded to the DNN for image fusion. The fusion performance of the proposed pipeline is compared and quantified with state-of-the-art classical and DNN architectures on open-source and custom datasets, demonstrating the efficacy of the pipeline. Furthermore, we also propose a novel keypoint-based metric for quantifying the quality of fused output.
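
Wavelet-domain fusion of an already-registered VIS/IR pair can be sketched with PyWavelets using common default fusion rules (average the low-frequency approximations, keep the stronger detail coefficients); these rules are generic illustrations, not DeepFusion's:

```python
# Sketch: single-level 2-D wavelet fusion of registered visible/infrared images.
import numpy as np
import pywt

vis = np.random.rand(128, 128)   # registered visible image (grayscale stand-in)
ir = np.random.rand(128, 128)    # registered infrared image (stand-in)

cA_v, (cH_v, cV_v, cD_v) = pywt.dwt2(vis, "db2")
cA_i, (cH_i, cV_i, cD_i) = pywt.dwt2(ir, "db2")

fuse = lambda a, b: np.where(np.abs(a) >= np.abs(b), a, b)  # max-abs detail rule
fused = pywt.idwt2(((cA_v + cA_i) / 2,
                    (fuse(cH_v, cH_i), fuse(cV_v, cV_i), fuse(cD_v, cD_i))),
                   "db2")
print(fused.shape)  # (128, 128)
```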

PMID:39771950 | DOI:10.3390/s24248217

Categories: Literature Watch

A Scene Knowledge Integrating Network for Transmission Line Multi-Fitting Detection

Wed, 2025-01-08 06:00

Sensors (Basel). 2024 Dec 23;24(24):8207. doi: 10.3390/s24248207.

ABSTRACT

To address the severe occlusion problem and the tiny-scale object problem in the multi-fitting detection task, the Scene Knowledge Integrating Network (SKIN), comprising a scene filter module (SFM) and a scene structure information module (SSIM), is proposed. First, the particularity of the scene in the multi-fitting detection task is analyzed: the aggregation of fittings is defined as the scene, following professional knowledge of the power field and the way operators identify fittings. Scene knowledge therefore includes global context information, fine-grained visual information about the fittings, and scene structure information. The scene filter module is designed to learn the global context information and the fine-grained visual information, while the scene structure module is designed to learn the scene structure information. Finally, the scene semantic features are used as the carrier to integrate the three categories of information into relative scene features, which can assist in recognizing occluded and tiny-scale fittings after feature mining and feature integration. The experiments show that the proposed network can effectively improve performance on the multi-fitting detection task compared with Faster R-CNN and other state-of-the-art models. In particular, the detection performance for occluded and tiny-scale fittings is significantly improved.

PMID:39771941 | DOI:10.3390/s24248207

Categories: Literature Watch

A Systematic Review on the Advancements in Remote Sensing and Proximity Tools for Grapevine Disease Detection

Wed, 2025-01-08 06:00

Sensors (Basel). 2024 Dec 21;24(24):8172. doi: 10.3390/s24248172.

ABSTRACT

Grapevines (Vitis vinifera L.) are one of the most economically relevant crops worldwide, yet they are highly vulnerable to various diseases, causing substantial economic losses for winegrowers. This systematic review evaluates the application of remote sensing and proximal tools for vineyard disease detection, addressing current capabilities, gaps, and future directions in sensor-based field monitoring of grapevine diseases. The review covers 104 studies published between 2008 and October 2024, identified through searches in Scopus and Web of Science, conducted on 25 January 2024, and updated on 10 October 2024. The included studies focused exclusively on the sensor-based detection of grapevine diseases, while excluded studies were not related to grapevine diseases, did not use remote or proximal sensing, or were not conducted in field conditions. The most studied diseases include downy mildew, powdery mildew, Flavescence dorée, esca complex, rots, and viral diseases. The main sensors identified for disease detection are RGB, multispectral, hyperspectral sensors, and field spectroscopy. A trend identified in recent published research is the integration of artificial intelligence techniques, such as machine learning and deep learning, to improve disease detection accuracy. The results demonstrate progress in sensor-based disease monitoring, with most studies concentrating on specific diseases, sensor platforms, or methodological improvements. Future research should focus on standardizing methodologies, integrating multi-sensor data, and validating approaches across diverse vineyard contexts to improve commercial applicability and sustainability, addressing both economic and environmental challenges.

PMID:39771913 | DOI:10.3390/s24248172

Categories: Literature Watch
