Deep learning
Combining Pre- and Post-Demosaicking Noise Removal for RAW Video
IEEE Trans Image Process. 2025 Jan 15;PP. doi: 10.1109/TIP.2025.3527886. Online ahead of print.
ABSTRACT
Denoising is one of the fundamental steps of the processing pipeline that converts data captured by a camera sensor into a display-ready image or video. It is generally performed early in the pipeline, usually before demosaicking, although studies swapping their order or even conducting them jointly have been proposed. With the advent of deep learning, the quality of denoising algorithms has steadily increased. Even so, modern neural networks still have a hard time adapting to new noise levels and scenes, which is indispensable for real-world applications. With these in mind, we propose a self-similarity-based denoising scheme that weights both a pre- and a post-demosaicking denoiser for Bayer-patterned CFA video data. We show that a balance between the two leads to better image quality, and we empirically find that higher noise levels benefit from a higher influence of the pre-demosaicking denoiser. We also integrate temporal trajectory prefiltering steps before each denoiser, which further improve texture reconstruction. The proposed method only requires an estimation of the noise model at the sensor, accurately adapts to any noise level, and is competitive with the state of the art, making it suitable for real-world videography.
PMID:40031011 | DOI:10.1109/TIP.2025.3527886
Torsion Graph Neural Networks
IEEE Trans Pattern Anal Mach Intell. 2025 Jan 13;PP. doi: 10.1109/TPAMI.2025.3528449. Online ahead of print.
ABSTRACT
Geometric deep learning (GDL) models have demonstrated great potential for the analysis of non-Euclidean data. They are developed to incorporate the geometric and topological information of non-Euclidean data into end-to-end deep learning architectures. Motivated by the recent success of discrete Ricci curvature in graph neural networks (GNNs), we propose TorGNN, an analytic Torsion enhanced Graph Neural Network model. The essential idea is to characterize graph local structures with an analytic-torsion-based weight formula. Mathematically, analytic torsion is a topological invariant that can distinguish spaces which are homotopy equivalent but not homeomorphic. In our TorGNN, for each edge, a corresponding local simplicial complex is identified, then the analytic torsion (for this local simplicial complex) is calculated and further used as a weight (for this edge) in the message-passing process. Our TorGNN model is validated on link prediction tasks from sixteen different types of networks and node classification tasks from four types of networks. It has been found that our TorGNN can achieve superior performance on both tasks and outperform various state-of-the-art models. This demonstrates that analytic torsion is a highly efficient topological invariant in the characterization of graph structures and can significantly boost the performance of GNNs.
PMID:40030998 | DOI:10.1109/TPAMI.2025.3528449
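The TorGNN abstract above describes using a per-edge scalar as a weight during message passing. The following is a minimal sketch of that general idea only: one round of weighted-mean neighbor aggregation on an undirected graph. The function name, the data layout, and the example weights are assumptions made for illustration; the weights here are arbitrary placeholders and are not computed from analytic torsion, and this is not the authors' implementation.

```python
def weighted_message_passing(features, edges, edge_weight):
    """One round of weighted-mean neighbor aggregation on an undirected graph.

    features    : list of per-node feature vectors (lists of floats)
    edges       : list of (u, v) node-index pairs
    edge_weight : dict mapping (u, v) to a positive scalar weight
                  (placeholder values; NOT an analytic torsion computation)
    """
    n = len(features)
    dim = len(features[0])
    agg = [[0.0] * dim for _ in range(n)]
    total = [0.0] * n
    for (u, v) in edges:
        w = edge_weight[(u, v)]
        # Each undirected edge sends a message in both directions.
        for src, dst in ((u, v), (v, u)):
            total[dst] += w
            for k in range(dim):
                agg[dst][k] += w * features[src][k]
    # Nodes without neighbors keep their own features.
    return [
        [a / total[i] for a in agg[i]] if total[i] > 0 else list(features[i])
        for i in range(n)
    ]


# Path graph 0 - 1 - 2 with hypothetical edge weights.
feats = [[1.0], [3.0], [5.0]]
edge_list = [(0, 1), (1, 2)]
weights = {(0, 1): 1.0, (1, 2): 3.0}
updated = weighted_message_passing(feats, edge_list, weights)
```

In this toy example the middle node is pulled toward node 2 because the edge (1, 2) carries the larger weight, which is the sense in which an edge-level structural score can steer aggregation.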
Latent Weight Quantization for Integerized Training of Deep Neural Networks
IEEE Trans Pattern Anal Mach Intell. 2025 Jan 9;PP. doi: 10.1109/TPAMI.2025.3527498. Online ahead of print.
ABSTRACT
Existing methods for integerized training speed up deep learning by using low-bitwidth integerized weights, activations, gradients, and optimizer buffers. However, they overlook the issue of full-precision latent weights, which consume excessive memory to accumulate gradient-based updates for optimizing the integerized weights. In this paper, we propose the first latent weight quantization schema for general integerized training, which minimizes quantization perturbation to the training process via residual quantization with an optimized dual quantizer. We leverage residual quantization to eliminate the correlation between the latent weight and the integerized weight, thereby suppressing quantization noise. We further propose a dual quantizer with an optimal nonuniform codebook to avoid frozen weights and to ensure a training trajectory that is statistically unbiased relative to full-precision latent weights. The codebook is optimized to minimize the disturbance on weight updates under importance guidance and is realized with a three-segment polyline approximation for hardware-friendly implementation. Extensive experiments show that the proposed schema allows integerized training with latent weights as low as 4 bits for various architectures, including ResNets, MobileNetV2, and Transformers, and yields negligible performance loss in image classification and text generation. Furthermore, we successfully fine-tune Large Language Models with up to 13 billion parameters on a single GPU using the proposed schema.
PMID:40030978 | DOI:10.1109/TPAMI.2025.3527498
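The latent weight quantization abstract above mentions residual quantization. As a rough illustration of the generic technique, the sketch below stores a weight as a coarse quantized value plus a low-bit quantized residual. Everything here is an assumption for illustration: the function names, the step sizes, and above all the uniform quantizer, which stands in for the nonuniform dual quantizer the abstract describes and is not the authors' scheme.

```python
def quantize(x, step, bits):
    """Uniform symmetric quantizer: round to the nearest step, clip to the signed range."""
    qmax = 2 ** (bits - 1) - 1
    code = max(-qmax - 1, min(qmax, round(x / step)))
    return code * step


def residual_quantize(w, coarse_step, coarse_bits, fine_bits):
    """Represent w as a coarse quantized value plus a quantized residual."""
    coarse = quantize(w, coarse_step, coarse_bits)
    residual = w - coarse
    # Without clipping, the residual lies within half a coarse step,
    # so a finer grid over that interval is enough to refine it.
    fine_step = coarse_step / (2 ** fine_bits)
    fine = quantize(residual, fine_step, fine_bits)
    return coarse, fine


# Hypothetical values: a weight of 0.3 on a coarse grid of 0.25.
w = 0.3
coarse, fine = residual_quantize(w, coarse_step=0.25, coarse_bits=4, fine_bits=4)
reconstructed = coarse + fine
```

The reconstruction error of the two-part representation is bounded by the fine step rather than the coarse step, which is why a low-bit residual can stand in for a full-precision copy in this simplified setting.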
Learning-Based Modeling and Predictive Control for Unknown Nonlinear System With Stability Guarantees
IEEE Trans Neural Netw Learn Syst. 2025 Jan 10;PP. doi: 10.1109/TNNLS.2024.3525264. Online ahead of print.
ABSTRACT
This work focuses on the safety of learning-based control for unknown nonlinear systems, considering the stability of the learned dynamics and the modeling mismatch between the learned dynamics and the true ones. A learning-based scheme imposing a stability constraint is proposed for the modeling and stable control of unknown nonlinear systems. Specifically, a linear representation of the unknown nonlinear dynamics is established using Koopman theory. Then, a deep learning approach is utilized to approximate the embedding functions of the Koopman operator for the unknown system. For safe use of the proposed scheme in real-world applications, a stability constraint on the learned dynamics and a Lipschitz constraint on the embedding functions are considered in order to learn a stable model for prediction and control. Moreover, a robust predictive control scheme is adopted to eliminate the effect of the modeling mismatch between the learned dynamics and the true ones, such that stabilization of the unknown nonlinear system is achieved. Finally, the effectiveness of the proposed scheme is demonstrated on a tethered space robot (TSR) with unknown nonlinear dynamics.
PMID:40030974 | DOI:10.1109/TNNLS.2024.3525264
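The Koopman-based abstract above refers to a linear representation of nonlinear dynamics, a stability constraint on the learned dynamics, and a Lipschitz constraint on the embedding functions. One common way to write such a setup is sketched below. The symbols (\phi for the embedding, A, B, C for the lifted linear model, \rho for the spectral radius, L for the Lipschitz constant) and the discrete-time form are assumptions chosen for illustration and may differ from the paper's formulation.

```latex
% Lifted state and linear dynamics in the embedding space (assumed notation)
z_k = \phi(x_k), \qquad z_{k+1} = A z_k + B u_k, \qquad \hat{x}_k = C z_k
% A typical discrete-time stability condition on the learned dynamics
\rho(A) < 1
% Lipschitz condition on the learned embedding
\|\phi(x) - \phi(y)\| \le L \, \|x - y\| \quad \text{for all } x, y
```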
Irregular Artificial Vision Optimization Strategies Based on Transformer Saliency Detection
IEEE J Biomed Health Inform. 2025 Jan 10;PP. doi: 10.1109/JBHI.2024.3524642. Online ahead of print.
ABSTRACT
To improve the performance of object recognition under artificial prosthetic vision, this study proposes a two-stage method. The first stage is to extract the saliency and edge Mask of the object (SMP, EMP). Then, the irregular visual information of the object is processed using Irregularity Correction (IC). We design eye-hand coordination tasks and simulate artificial vision with retinal prostheses to validate strategy effectiveness, and select direct pixelation (DP) as a control group. Each subject retained a phosphene map in the same stochastic pattern in all his/her trials. The real-time experimental results showed that the deep saliency-based optimization strategies improved the subjects' task performance in terms of head movement, recognition accuracy, response time, and search counts for successful small-object recognition. Under the SMP strategy, the subjects showed smaller average head movement (76.53 deg ± 20.75 deg), higher average object recognition accuracy (91.18% ± 2.52%), shorter task completion time (35.71 s ± 8.66 s), and a better successful search count for the small target objects (1.35 ± 0.33). When SMP was integrated with IC, the subjects' average performance further improved to 63.39 deg ± 15.38 deg, 94.22% ± 3.94%, 25.76 s ± 6.24 s, and 1.05 ± 0.30, respectively, which also significantly outperformed the DP condition. These results indicate that with deep-learning-based saliency detection and IC processing, subjects could shorten the search process and discern the target objects more reliably. This work could inform future prosthetic devices that incorporate artificial intelligence techniques.
PMID:40030970 | DOI:10.1109/JBHI.2024.3524642
Physiological Information Preserving Video Compression for rPPG
IEEE J Biomed Health Inform. 2025 Jan 7;PP. doi: 10.1109/JBHI.2025.3526837. Online ahead of print.
ABSTRACT
Remote photoplethysmography (rPPG) has recently attracted much attention due to its non-contact measurement convenience and great potential in health care and computer vision applications. Early rPPG studies were mostly developed on self-collected uncompressed video data, which limited their application in scenarios that require long-distance real-time video transmission, and also hindered the generation of large-scale publicly available benchmark datasets. In recent years, with the popularization of high-definition video and the rise of telemedicine, the pressure of storage and real-time video transmission under limited bandwidth have made the compression of rPPG video inevitable. However, video compression can adversely affect rPPG measurements. This is due to the fact that conventional video compression algorithms are not specifically proposed to preserve physiological signals. Based on this, we propose a video compression scheme specifically designed for rPPG application. The proposed approach consists of three main strategies: 1) facial ROI-based computational resource reallocation; 2) rPPG signal preserving bit resource reallocation; and 3) temporal domain up- and down-sampling coding. UBFC-rPPG, ECG-Fitness, and a self-collected dataset are used to evaluate the performance of the proposed method. The results demonstrate that the proposed method can preserve almost all physiological information after compressing the original video to 1/60 of its original size. The proposed method is expected to promote the development of telemedicine and deep learning techniques relying on large-scale datasets in the field of rPPG measurement.
PMID:40030966 | DOI:10.1109/JBHI.2025.3526837
P2TC: A Lightweight Pyramid Pooling Transformer-CNN Network for Accurate 3D Whole Heart Segmentation
IEEE J Biomed Health Inform. 2025 Jan 7;PP. doi: 10.1109/JBHI.2025.3526727. Online ahead of print.
ABSTRACT
Cardiovascular disease is a leading global cause of death, requiring accurate heart segmentation for diagnosis and surgical planning. Deep learning methods have been demonstrated to achieve superior performance in cardiac structure segmentation. However, there are still limitations in 3D whole heart segmentation, such as inadequate spatial context modeling, difficulty in capturing long-distance dependencies, high computational complexity, and limited representation of local high-level semantic information. To tackle the above problems, we propose a lightweight Pyramid Pooling Transformer-CNN (P2TC) network for accurate 3D whole heart segmentation. The proposed architecture comprises a dual encoder-decoder structure with a 3D pyramid pooling Transformer for multi-scale information fusion and a lightweight large-kernel Convolutional Neural Network (CNN) for local feature extraction. The decoder has two branches for precise segmentation and contextual residual handling. The first branch is used to generate segmentation masks for pixel-level classification based on the features extracted by the encoder to achieve accurate segmentation of cardiac structures. The second branch highlights contextual residuals across slices, enabling the network to better handle variations and boundaries. Extensive experimental results on the Multi-Modality Whole Heart Segmentation (MM-WHS) 2017 challenge dataset demonstrate that P2TC outperforms the most advanced methods, achieving Dice scores of 92.6% and 88.1% in the Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) modalities, respectively, which surpasses the baseline model by 1.5% and 1.7% and represents state-of-the-art segmentation results. Our code will be released via https://github.com/Countdown229/P2TC.
PMID:40030965 | DOI:10.1109/JBHI.2025.3526727
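The P2TC abstract above reports segmentation quality as Dice scores. For readers unfamiliar with the metric, here is a minimal sketch of the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), on flattened binary masks. The function name and the toy masks are assumptions for illustration; this is not taken from the P2TC code, and the convention for two empty masks is a choice made here.

```python
def dice_score(pred, target):
    """Dice coefficient between two equal-length binary masks (flattened)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention assumed here: two empty masks agree perfectly.
    if size_sum == 0:
        return 1.0
    return 2.0 * intersection / size_sum


# Toy masks: one overlapping voxel out of two foreground voxels each.
example = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground voxels.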
Non-invasive Detection of Adenoid Hypertrophy Using Deep Learning Based on Heart-Lung Sounds
IEEE J Biomed Health Inform. 2025 Jan 10;PP. doi: 10.1109/JBHI.2025.3527403. Online ahead of print.
ABSTRACT
Adenoid hypertrophy is one of the most common upper respiratory tract disorders during childhood, leading to a range of symptoms such as nasal congestion, mouth breathing and obstructive sleep apnea. Current diagnostic methods, including computerized tomography scans and nasal endoscopy, are invasive or involve ionizing radiation, rendering them unsuitable for long-term assessments. To address these clinical challenges, this paper proposes a novel deep learning approach for the non-invasive detection of adenoid hypertrophy using heart-lung sounds. Firstly, we established a heart-lung sound database with corresponding labels indicating adenoid size. Subsequently, we employed three different deep learning tasks to explore the association between heart-lung sounds and adenoid size: binary classification to distinguish between normal and abnormal cases, four-grade classification to assess the severity of adenoid hypertrophy, and regression models to predict the actual size of the adenoids. The experimental results demonstrate that the deep learning models can effectively predict the condition of adenoid hypertrophy based on heart-lung sounds. In resource-constrained clinical environments, the proposed methods for automatic detection of adenoid hypertrophy provide a simple and non-invasive approach, which can reduce healthcare costs and facilitate remote self-screening.
PMID:40030964 | DOI:10.1109/JBHI.2025.3527403
DiffuSeg: Domain-driven Diffusion for Medical Image Segmentation
IEEE J Biomed Health Inform. 2025 Jan 7;PP. doi: 10.1109/JBHI.2025.3526806. Online ahead of print.
ABSTRACT
In recent years, the deployment of supervised machine learning techniques for segmentation tasks has significantly increased. Nonetheless, the annotation process for extensive datasets remains costly, labor-intensive, and error-prone. While acquiring sufficiently large datasets to train deep learning models is feasible, these datasets often experience a distribution shift relative to the actual test data. This problem is particularly critical in the domain of medical imaging, where it adversely affects the efficacy of automatic segmentation models. In this work, we introduce DiffuSeg, a novel conditional diffusion model developed for medical image data, that exploits any labels to synthesize new images in the target domain. This allows a number of new research directions, including the segmentation task that motivates this work. Our method only requires label maps from any existing datasets and unlabelled images from the target domain for image diffusion. To learn the target domain knowledge, a feature factorization variational autoencoder is proposed to provide conditional information for the diffusion model. Consequently, the segmentation network can be trained with the given labels and the synthetic images, thus avoiding human annotations. Initially, we apply our method to the MNIST dataset and subsequently adapt it for use with medical image segmentation datasets, such as retinal fundus images for vessel segmentation and MRI images for heart segmentation. Our approach exhibits significant improvements over relevant baselines in both image generation and segmentation accuracy, especially in scenarios where annotations for the target dataset are unavailable during training. An open-source implementation of our approach can be released after reviewing.
PMID:40030962 | DOI:10.1109/JBHI.2025.3526806
Deep Learning-Based Diagnostic Model for Parkinson's Disease Using Handwritten Spiral and Wave Images
Curr Med Sci. 2025 Mar 3. doi: 10.1007/s11596-025-00017-3. Online ahead of print.
ABSTRACT
OBJECTIVE: To develop and validate a deep neural network (DNN) model for diagnosing Parkinson's Disease (PD) using handwritten spiral and wave images, and to compare its performance with various machine learning (ML) and deep learning (DL) models.
METHODS: The study utilized a dataset of 204 images (102 spiral and 102 wave) from PD patients and healthy subjects. The images were preprocessed using the Histogram of Oriented Gradients (HOG) descriptor and augmented to increase dataset diversity. The DNN model was designed with an input layer, three convolutional layers, two max-pooling layers, two dropout layers, and two dense layers. The model was trained and evaluated using metrics such as accuracy, sensitivity, specificity, and loss. The DNN model was compared with nine ML models (random forest, logistic regression, AdaBoost, k-nearest neighbor, gradient boost, naïve Bayes, support vector machine, decision tree) and two DL models (convolutional neural network, DenseNet-201).
RESULTS: The DNN model outperformed all other models in diagnosing PD from handwritten spiral and wave images. On spiral images, the DNN model improved accuracy by 41.24% over naïve Bayes, 31.24% over decision tree, and 27.9% over support vector machine. On wave images, the DNN model improved accuracy by 40% over naïve Bayes, 36.67% over decision tree, and 30% over support vector machine. The DNN model demonstrated significant improvements in sensitivity and specificity compared to other models.
CONCLUSIONS: The DNN model significantly improves the accuracy of PD diagnosis using handwritten spiral and wave images, outperforming several ML and DL models. This approach offers a promising diagnostic tool for early PD detection and provides a foundation for future work to incorporate additional features and enhance detection accuracy.
PMID:40029495 | DOI:10.1007/s11596-025-00017-3
Graphene-based FETs for advanced biocatalytic profiling: investigating heme peroxidase activity with machine learning insights
Mikrochim Acta. 2025 Mar 3;192(3):199. doi: 10.1007/s00604-025-06955-y.
ABSTRACT
Graphene-based field-effect transistors (GFETs) are rapidly gaining recognition as powerful tools for biochemical analysis due to their exceptional sensitivity and specificity. In this study, we utilize a GFET system to explore the peroxidase-based biocatalytic behavior of horseradish peroxidase (HRP) and the heme molecule, the latter serving as the core component responsible for HRP's enzymatic activity. Our primary objective is to evaluate the effectiveness of GFETs in analyzing the peroxidase activity of these compounds. We highlight the superior sensitivity of graphene-based FETs in detecting subtle variations in enzyme activity, which is critical for accurate biochemical analysis. Using the transconductance measurement system of GFETs, we investigate the mechanisms of enzymatic reactions, focusing on suicide inactivation in HRP and heme bleaching under two distinct scenarios. In the first scenario, we investigate the inactivation of HRP in the presence of hydrogen peroxide and ascorbic acid as cosubstrate. In the second scenario, we explore the bleaching of the heme molecule under conditions of hydrogen peroxide exposure, without the addition of any cosubstrate. Our findings demonstrate that this advanced technique enables precise monitoring and comprehensive analysis of these enzymatic processes. Additionally, we employed a machine learning algorithm based on a multilayer perceptron deep learning architecture to detect the enzyme parameters under various chemical and environmental conditions. Integrating machine learning and probabilistic methods significantly enhances the accuracy of enzyme behavior predictions.
PMID:40029395 | DOI:10.1007/s00604-025-06955-y
Leveraging Digital Perceptual Technologies for Analysis of Human Biomechanical Processes: A Contactless Approach for Workload Assessment
IISE Trans Occup Ergon Hum Factors. 2025 Mar 3:1-14. doi: 10.1080/24725838.2025.2469076. Online ahead of print.
ABSTRACT
OCCUPATIONAL APPLICATION: We present a computer vision framework that is intended to help enhance workplace safety and productivity across diverse occupational domains by monitoring worker movements and identifying ergonomic risks. By analyzing movement patterns and biomechanics, use of this framework could promote safe and healthy working conditions, helping to prevent injuries and mitigate workplace accidents. Additionally, application of the framework could aid in assessing assistive technologies that support workers with various conditions in completing tasks safely and efficiently, thereby helping to boost productivity.
PMID:40028793 | DOI:10.1080/24725838.2025.2469076
RESNET-50 with ontological visual features based medicinal plants classification
Network. 2025 Mar 3:1-37. doi: 10.1080/0954898X.2024.2447878. Online ahead of print.
ABSTRACT
The proper study and administration of biodiversity relies heavily on accurate plant species identification. To determine a plant's species by manual identification, experts use a series of keys based on measurements of various plant features. The manual procedure, however, is tedious and lengthy. Recent advancements in technology have prompted the need for more effective approaches to satisfy species identification standards, such as the creation of digital image processing and template tools. Despite the many current studies on the topic, there are significant obstacles to fully automating the recognition of plant species. In this work, leaf classification was performed using the ontological relationship between the leaf features and their classes. This relationship was identified using two swarm intelligence techniques, the particle swarm and cuckoo search algorithms. These features were then used to train a traditional machine learning algorithm, the general regression neural network (GRNN). To increase the effectiveness of the ontology, the machine learning results were combined with those of the deep learning approach RESNET50 using association rules. The proposed ontology model produced an identification accuracy of 98.8% for the GRNN model, 99% for the RESNET model, and 99.9% for the combined model on 15 types of medicinal leaf sets.
PMID:40028706 | DOI:10.1080/0954898X.2024.2447878
A GPR-based framework for assessing corrosivity of concrete structures using frequency domain approach
Heliyon. 2025 Feb 11;11(4):e42641. doi: 10.1016/j.heliyon.2025.e42641. eCollection 2025 Feb 28.
ABSTRACT
Ground-penetrating radar (GPR) is a prominent non-destructive testing (NDT) method for corrosivity evaluation in concrete structures. Most GPR interpretation methods rely solely on the absolute values of rebar reflection intensity, making them vulnerable to misinterpretation of the effects of complex factors. This study introduces a more comprehensive GPR data interpretation method, encompassing analysis in time and time-frequency domains. The developed method constitutes efficient GPR data collection and pre-processing, deep learning rebar recognition, and frequency domain analysis using the Short-Time Fourier Transform (STFT). The center frequency of rebar responses was normalized and depth-corrected to standardize the analysis method. The GPR condition mapping thresholds were optimized and validated using ground truth conditions from hammer tapping and reinforcement exposure of reinforced concrete walls. The method demonstrated superior performance compared to the traditional amplitude-based approach in detecting and quantifying the extent of corrosion-induced deterioration, with an average accuracy of 0.80 for active corrosion and 0.84 for active-corrosion with corrosion-induced delamination.
PMID:40028599 | PMC:PMC11872417 | DOI:10.1016/j.heliyon.2025.e42641
DeepRaman: Implementing surface-enhanced Raman scattering together with cutting-edge machine learning for the differentiation and classification of bacterial endotoxins
Heliyon. 2025 Feb 8;11(4):e42550. doi: 10.1016/j.heliyon.2025.e42550. eCollection 2025 Feb 28.
ABSTRACT
To classify raw SERS Raman spectra from biological materials, we propose DeepRaman, a new architecture inspired by the Progressive Fourier Transform and integrated with the scalogram transformation approach. Unlike standard machine learning approaches such as PCA, LDA, SVM, RF, and GBM, DeepRaman functions independently, requiring no human interaction, and can be applied to much smaller datasets than traditional CNNs. On data from 14 bacterial endotoxins and on a public dataset, DeepRaman achieved an accuracy of 99 percent. This provides exact endotoxin classification and has tremendous potential for accelerated medical diagnostics and treatment decision-making in cases of pathogenic infections.
BACKGROUND: Bacterial endotoxin, a lipopolysaccharide (LPS) exuded by bacteria during their growth and infection process, serves as a valuable biomarker for bacterial identification. It is a vital component of the outer membrane layer in Gram-negative bacteria. By employing silver nanorod-based array substrates, surface-enhanced Raman scattering (SERS) spectra were obtained for two separate datasets: eleven endotoxins produced by bacteria, each with an average detection quantity of 8.75 pg per measurement, and three controls (chitin, lipoteichoic acid (LTA), and bacterial peptidoglycan (PGN)), chosen because their structures differ greatly from that of LPS.
OBJECTIVE: This study utilized various classical machine learning techniques, such as support vector machines, k-nearest neighbors, and random forests, in conjunction with a modified deep learning approach called DeepRaman. These algorithms were employed to distinguish and categorize bacterial endotoxins, following appropriate spectral pre-processing, which involved novel filtering techniques and advanced feature extraction methods.
RESULT: Most traditional machine learning algorithms achieved distinction accuracies of over 99 percent, whereas DeepRaman demonstrated an exceptional accuracy of 100 percent. This method offers precise endotoxin classification and holds significant potential for expedited medical diagnoses and therapeutic decision-making in cases of pathogenic infections.
CONCLUSION: We present the effectiveness of DeepRaman, an innovative architecture inspired by the Progressive Fourier Transform and integrated with the scalogram transformation method, in classifying raw SERS Raman spectral data from biological specimens with unparalleled accuracy relative to conventional machine learning algorithms. Notably, this Convolutional Neural Network (CNN) operates autonomously, requiring no human intervention, and can be applied with substantially smaller datasets than traditional CNNs. Furthermore, it exhibits remarkable proficiency in managing challenging baseline scenarios that often lead to failures in other techniques, thereby promoting the broader clinical adoption of Raman spectroscopy.
PMID:40028585 | PMC:PMC11870271 | DOI:10.1016/j.heliyon.2025.e42550
Framework for smartphone-based grape detection and vineyard management using UAV-trained AI
Heliyon. 2025 Feb 6;11(4):e42525. doi: 10.1016/j.heliyon.2025.e42525. eCollection 2025 Feb 28.
ABSTRACT
Viticulture benefits significantly from rapid grape bunch identification and counting, enhancing yield and quality. Recent technological and machine learning advancements, particularly in deep learning, have provided the tools necessary to create more efficient, automated processes that significantly reduce the time and effort required for these tasks. On one hand, drone, or Unmanned Aerial Vehicle (UAV), imagery combined with deep learning algorithms has revolutionised agriculture by automating plant health classification, disease identification, and fruit detection. However, these advancements often remain inaccessible to farmers due to their reliance on specialized hardware like ground robots or UAVs. On the other hand, most farmers have access to smartphones. This article proposes a novel approach combining UAVs and smartphone technologies. An AI-based framework is introduced, integrating a 5-stage AI pipeline combining object detection and pixel-level segmentation algorithms to automatically detect grape bunches in smartphone images of a commercial vineyard with vertical trellis training. By leveraging UAV-captured data for training, the proposed model not only accelerates the detection process but also enhances the accuracy and adaptability of grape bunch detection across different devices, surpassing the efficiency of traditional and purely UAV-based methods. To this end, using a dataset of UAV videos recorded during early growth stages in July (BBCH77-BBCH79), the X-Decoder segments vegetation in the front of the frames from their background and surroundings. X-Decoder is particularly advantageous because it can be seamlessly integrated into the AI pipeline without requiring changes to how data is captured, making it more versatile than other methods. Then, YOLO is trained using the videos and further applied to images taken by farmers with common smartphones (Xiaomi Poco X3 Pro and iPhone X). In addition, a web app was developed to connect the system with mobile technology easily. The proposed approach achieved a precision of 0.92 and a recall of 0.735, with an F1 score of 0.82 and an Average Precision (AP) of 0.802 under different operating conditions, indicating high accuracy and reliability in detecting grape bunches. In addition, the AI-detected grape bunch counts were compared with the actual ground truth, achieving an R² value as high as 0.84, showing the robustness of the system. This study highlights the potential of using smartphone imaging and web applications together, making an effort to integrate these models into a real platform for farmers, offering a practical, affordable, accessible, and scalable solution. While smartphone-based image collection for model training is labour-intensive and costly, incorporating UAV data accelerates the process, facilitating the creation of models that generalise across diverse data sources and platforms. This blend of UAV efficiency and smartphone precision significantly cuts vineyard monitoring time and effort.
PMID:40028582 | PMC:PMC11869025 | DOI:10.1016/j.heliyon.2025.e42525
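The grape detection abstract above reports a precision of 0.92, a recall of 0.735, and an F1 score of 0.82. As a quick consistency check, the sketch below computes F1 as the harmonic mean of precision and recall, which is the standard definition. The function name is an assumption for illustration; only the two input values come from the abstract.

```python
def f1_from_precision_recall(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Values reported in the abstract.
f1 = f1_from_precision_recall(0.92, 0.735)
f1_rounded = round(f1, 2)
```

Rounded to two decimals, the result matches the F1 score stated in the abstract.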
Evaluating pedestrian crossing safety: Implementing and evaluating a convolutional neural network model trained on paired aerial and subjective perspective images
Heliyon. 2025 Feb 3;11(4):e42428. doi: 10.1016/j.heliyon.2025.e42428. eCollection 2025 Feb 28.
ABSTRACT
With pedestrian crossings implicated in a significant proportion of vehicle-pedestrian accidents and the French government's initiatives to improve pedestrian safety, there is a pressing need for efficient, large-scale evaluation of pedestrian crossings. This study proposes the deployment of advanced deep learning neural networks to automate the assessment of pedestrian crossings and roundabouts, leveraging aerial and street-level imagery sourced from Google Maps and Google Street View. Utilizing ConvNextV2, ResNet50, and ResNext50 models, we conducted a comprehensive analysis of pedestrian crossings across various urban and rural settings in France, focusing on nine identified risk factors. Our methodology incorporates Mask R-CNN for precise segmentation and detection of zebra crossings and roundabouts, overcoming traditional data annotation challenges and extending coverage to underrepresented areas. The analysis reveals that the ConvNextV2 model, in particular, demonstrates superior performance across most tasks, despite challenges such as data imbalance and the complex nature of variables like visibility and parking proximity. The findings highlight the potential of convolutional neural networks in improving pedestrian safety by enabling scalable and objective evaluations of crossings. The study underscores the necessity for continued dataset augmentation and methodological advancements to tackle identified challenges. Our research contributes to the broader field of road safety by demonstrating the feasibility and effectiveness of automated, image-based pedestrian crossing audits, paving the way for more informed and effective safety interventions.
PMID:40028551 | PMC:PMC11872108 | DOI:10.1016/j.heliyon.2025.e42428
A fully automated machine-learning-based workflow for radiation treatment planning in prostate cancer
Clin Transl Radiat Oncol. 2025 Feb 11;52:100933. doi: 10.1016/j.ctro.2025.100933. eCollection 2025 May.
ABSTRACT
INTRODUCTION: The integration of artificial intelligence into radiotherapy planning for prostate cancer has demonstrated promise in enhancing efficiency and consistency. In this study, we assess the clinical feasibility of a fully automated machine learning (ML)-based "one-click" workflow that combines ML-based segmentation and treatment planning. The proposed workflow was designed to create a clinically acceptable radiotherapy plan within the inter-observer variation of conventional plans.
METHODS: We evaluated the fully automated workflow on five low-risk prostate cancer patients treated with external beam radiotherapy and compared the results with conventional optimized and inverse planned radiotherapy plans based on the contours of six different experienced radiation oncologists. Both qualitative and quantitative metrics were analyzed. Additionally, we evaluated the dose distribution of the ML-based and conventional radiation treatment plans on the different segmentations (manual vs. manual and manual vs. automated).
RESULTS: The automatic deep-learning segmentation of the target volume agreed closely with the expert contours in terms of the Dice similarity and Hausdorff indices. However, the deep-learning-based CTVs had a significantly smaller volume than the expert CTVs (47.1 cm³ vs. 62.6 cm³). The fully automated ML-based plans provided clinically acceptable dose coverage within the range of inter-observer variability observed in the manual plans. Because of the smaller CTV segmentation, the dose coverage of the CTV and PTV (expert contours) was significantly lower than that of the manual plans.
CONCLUSION: Our study indicates that the tested fully automated ML-based workflow is clinically feasible and yields results comparable to conventional radiation treatment plans. This represents a promising step towards efficient and standardized prostate cancer treatment. Nevertheless, in the evaluated cohort, auto-segmentation was associated with smaller target volumes than manual contours, highlighting the need to improve segmentation models and to test automation in radiation therapy prospectively.
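The two agreement measures named in the results can be illustrated with a minimal sketch. It assumes each contour is given as a set of voxel (or pixel) coordinates; the function names and the toy contours are hypothetical and are not taken from the study, whose actual implementation is not described in the abstract.

```python
from math import dist


def dice_coefficient(a, b):
    """Dice similarity of two coordinate sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty contours are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two non-empty coordinate sets."""
    a, b = list(a), list(b)
    if not a or not b:
        raise ValueError("both contours must contain at least one point")

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Toy example (hypothetical data): the automated contour is one column
# narrower than the expert contour.
expert = {(x, y) for x in range(4) for y in range(4)}  # 16 points
auto = {(x, y) for x in range(3) for y in range(4)}    # 12 points

print(round(dice_coefficient(expert, auto), 3))  # 0.857
print(hausdorff_distance(expert, auto))          # 1.0
```

The toy case also shows why a high overlap score can coexist with a systematically smaller volume: the automated contour lies entirely inside the expert contour, so Dice stays high while a quarter of the expert volume is missed. The brute-force Hausdorff loop is quadratic in the number of points and is meant only for illustration.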
PMID:40028424 | PMC:PMC11871478 | DOI:10.1016/j.ctro.2025.100933
Generative Deep Learning-Based Efficient Design of Organic Molecules with Tailored Properties
ACS Cent Sci. 2024 Aug 30;11(2):219-227. doi: 10.1021/acscentsci.4c00656. eCollection 2025 Feb 26.
ABSTRACT
Innovative approaches to design molecules with tailored properties are required in various research areas. Deep learning methods can accelerate the discovery of new materials by leveraging molecular structure-property relationships. In this study, we successfully developed a generative deep learning (Gen-DL) model that was trained on a large experimental database (DBexp) including 71,424 molecule/solvent pairs and was able to design molecules with target properties in various solvents. The Gen-DL model can generate molecules with specified optical properties, such as electronic absorption/emission peak position and bandwidth, extinction coefficient, photoluminescence (PL) quantum yield, and PL lifetime. The Gen-DL model was shown to leverage the essential design principles of conjugation effects, Stokes shifts, and solvent effects when it generated molecules with target optical properties. Additionally, the Gen-DL model was demonstrated to generate practically useful molecules developed for real-world applications. Accordingly, the Gen-DL model can be a promising tool for the discovery and design of novel molecules with tailored properties in various research areas, such as organic photovoltaics (OPVs), organic light-emitting diodes (OLEDs), organic photodiodes (OPDs), bioimaging dyes, and so on.
PMID:40028364 | PMC:PMC11869130 | DOI:10.1021/acscentsci.4c00656
Detection of canine external ear canal lesions using artificial intelligence
Vet Dermatol. 2025 Mar 3. doi: 10.1111/vde.13332. Online ahead of print.
ABSTRACT
BACKGROUND: Early and accurate diagnosis of otitis externa is crucial for correct management yet can often be challenging. Artificial intelligence (AI) is a valuable diagnostic tool in human medicine. Currently, no such tool is available in veterinary dermatology/otology.
OBJECTIVES: As a proof-of-concept, we developed and evaluated a novel YOLOv5 object detection model for identifying healthy ear canals, otitis or masses in the canine ear canal.
ANIMALS: Digital images of ear canals from dogs with healthy ears, otitis and masses in the ear canal were used.
MATERIALS AND METHODS: Four variants of the YOLOv5 model were trained, each using a different training dataset. The performance of each variant was quantified using F1/confidence curves, mean average precision (mAP50), precision (P), recall (R) and average precision (AP).
RESULTS: All four variants were capable of detecting and classifying the ear canal. However, training datasets with many duplicates (A and C) inflated performance metrics as a consequence of data leakage, potentially compromising their effectiveness on unseen images. Additionally, variants trained on larger datasets (without duplicates) achieved better performance metrics than variants trained on smaller datasets (without duplicates).
CONCLUSIONS AND CLINICAL RELEVANCE: This novel AI object detection model has the potential for application in the field of veterinary dermatology. An external validation study is needed prior to clinical deployment.
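As a rough illustration of two ideas in this abstract, the sketch below computes precision, recall and F1 from detection counts, and removes byte-identical images before a train/test split so that the same image cannot land on both sides. The function names, file names and counts are hypothetical. Exact hashing is an assumption made here for simplicity; the abstract does not say how duplicates were identified, and near-duplicates (resized or re-encoded copies) would need a different approach.

```python
import hashlib


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive
    and false-negative counts (0.0 when a ratio is undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def drop_exact_duplicates(images):
    """Keep the first occurrence of each distinct image content.

    `images` maps a file name to the raw bytes of that file.
    """
    seen = set()
    unique = {}
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[name] = data
    return unique


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")

# Hypothetical dataset containing one verbatim copy.
images = {"ear_01.jpg": b"\x01\x02", "ear_01_copy.jpg": b"\x01\x02", "ear_02.jpg": b"\x03"}
print(list(drop_exact_duplicates(images)))  # ['ear_01.jpg', 'ear_02.jpg']
```

Deduplicating before the split, rather than after, is the point of the second function: if a copy of a training image ends up in the test set, the model is scored on an image it has already seen, which is the kind of leakage the results describe.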
PMID:40026191 | DOI:10.1111/vde.13332