Deep learning
Development of an Interpretable Deep Learning Model for Pathological Tumor Response Assessment After Neoadjuvant Therapy
Biol Proced Online. 2024 Apr 17;26(1):10. doi: 10.1186/s12575-024-00234-5.
ABSTRACT
BACKGROUND: Neoadjuvant therapy followed by surgery has become the standard of care for locally advanced esophageal squamous cell carcinoma (ESCC), and accurate pathological response assessment is critical for evaluating therapeutic efficacy. However, the assessment can be laborious, and inconsistency between observers may occur. Hence, we aimed to develop an interpretable deep-learning model for efficient pathological response assessment following neoadjuvant therapy in ESCC.
METHODS: This retrospective study analyzed 337 ESCC resection specimens from 2020-2021 at the Pudong-Branch (Cohort 1) and 114 from 2021-2022 at the Puxi-Branch (External Cohort 2) of Fudan University Shanghai Cancer Center. Whole slide images (WSIs) from these two cohorts were generated using different scanning machines to test the ability of the model in handling color variations. Four pathologists independently assessed the pathological response. The senior pathologists annotated tumor beds and residual tumor percentages on WSIs to determine consensus labels. Furthermore, 1850 image patches were randomly extracted from Cohort 1 WSIs and binarily classified for tumor viability. A deep-learning model employing knowledge distillation was developed to automatically classify positive patches for each WSI and estimate the viable residual tumor percentages. Spatial heatmaps were output for model explanations and visualizations.
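As a rough illustration of the percentage-estimation step, the Python sketch below (with hypothetical names; the paper's actual pipeline and threshold are assumptions) aggregates per-patch viability predictions over the annotated tumor bed:

```python
import numpy as np

def residual_tumor_percentage(patch_probs, tumor_bed_mask, threshold=0.5):
    """Estimate a WSI's viable residual tumor percentage (illustrative).

    patch_probs:    2D array of per-patch tumor-viability probabilities,
                    a hypothetical output of the distilled patch classifier.
    tumor_bed_mask: boolean 2D array marking patches inside the
                    pathologist-annotated tumor bed.
    """
    viable = (patch_probs >= threshold) & tumor_bed_mask
    # Percentage of tumor-bed patches classified as viable tumor.
    return 100.0 * viable.sum() / max(tumor_bed_mask.sum(), 1)
```

The same per-patch predictions can be rendered as a spatial heatmap for the model-explanation step.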
RESULTS: The approach achieved high concordance with the pathologist consensus, with an R^2 of 0.8437, an RAcc_0.1 of 0.7586, and an RAcc_0.3 of 0.9885, which was comparable to two senior pathologists (R^2 of 0.9202/0.9619, RAcc_0.1 of 0.8506/0.9425, RAcc_0.3 of 1.000/1.000) and surpassed two junior pathologists (R^2 of 0.5592/0.5474, RAcc_0.1 of 0.5287/0.5287, RAcc_0.3 of 0.9080/0.9310). Visualizations enabled the localization of residual viable tumor to augment microscopic assessment.
CONCLUSION: This work illustrates deep learning's potential for assisting pathological response assessment. Spatial heatmaps and patch examples provide intuitive explanations of model predictions, engendering clinical trust and adoption (Code and data will be available at https://github.com/WinnieLaugh/ESCC_Percentage once the paper has been conditionally accepted). Integrating interpretable computational pathology could help enhance the efficiency and consistency of tumor response assessment and empower precise oncology treatment decisions.
PMID:38632527 | DOI:10.1186/s12575-024-00234-5
Intracranial aneurysm detection: an object detection perspective
Int J Comput Assist Radiol Surg. 2024 Apr 17. doi: 10.1007/s11548-024-03132-z. Online ahead of print.
ABSTRACT
PURPOSE: Intracranial aneurysm detection from 3D Time-Of-Flight Magnetic Resonance Angiography images is a problem of increasing clinical importance. Recently, a series of methods has shown promising performance by using segmentation neural networks. However, these methods may be less relevant in clinical settings, where diagnostic decisions rely on detecting objects rather than segmenting them.
METHODS: We introduce a 3D single-stage object detection method tailored for small objects such as aneurysms. Our anchor-free method incorporates fast data annotation, adapted data sampling and generation to address the class imbalance problem, and spherical representations for improved detection.
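The spherical representation admits a simple overlap criterion for matching predictions to ground truth. Below is a hedged sketch of sphere-to-sphere IoU (how detections are matched in the paper is an assumption; only the closed-form sphere intersection is standard):

```python
import math

def sphere_iou(c1, r1, c2, r2):
    """IoU of two spheres given centers (x, y, z) and radii."""
    d = math.dist(c1, c2)
    v1 = 4.0 / 3.0 * math.pi * r1 ** 3
    v2 = 4.0 / 3.0 * math.pi * r2 ** 3
    if d >= r1 + r2:              # disjoint spheres
        inter = 0.0
    elif d <= abs(r1 - r2):       # one sphere contains the other
        inter = min(v1, v2)
    else:                         # lens-shaped overlap (closed form)
        inter = (math.pi * (r1 + r2 - d) ** 2
                 * (d * d + 2 * d * (r1 + r2) - 3 * (r1 - r2) ** 2)) / (12 * d)
    return inter / (v1 + v2 - inter)
```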
RESULTS: A comprehensive evaluation was conducted, comparing our method with the state-of-the-art SCPM-Net, nnDetection and nnUNet baselines, using two datasets comprising 402 subjects. The evaluation used adapted object detection metrics. Our method exhibited comparable or superior performance, with an average precision of 78.96%, sensitivity of 86.78%, and 0.53 false positives per case.
CONCLUSION: Our method significantly reduces the detection complexity compared to existing methods and highlights the advantages of object detection over segmentation-based approaches for aneurysm detection. It also holds potential for application to other small object detection problems.
PMID:38632166 | DOI:10.1007/s11548-024-03132-z
Tooth numbering and classification on bitewing radiographs: an artificial intelligence pilot study
Oral Surg Oral Med Oral Pathol Oral Radiol. 2024 Feb 20:S2212-4403(24)00070-1. doi: 10.1016/j.oooo.2024.02.012. Online ahead of print.
ABSTRACT
OBJECTIVE: The aim of this study was to assess the efficacy of a deep learning methodology for the automated identification and enumeration of permanent teeth in bitewing radiographs. The experimental procedures and techniques employed in this study are described in the following section.
STUDY DESIGN: A total of 1248 bitewing radiography images were annotated using the CranioCatch labeling program, developed in Eskişehir, Turkey. The dataset was partitioned into 3 subsets: training (n = 1000, 80% of the total), validation (n = 124, 10% of the total), and test (n = 124, 10% of the total) sets. The images were subjected to a 3 × 3 CLAHE (contrast-limited adaptive histogram equalization) operation to enhance the clarity of the labeled regions.
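Assuming the 3 × 3 operation refers to CLAHE with a 3 × 3 tile grid, a minimal OpenCV sketch would be (file name and clip limit are illustrative):

```python
import cv2

# Hypothetical preprocessing: contrast-limited adaptive histogram
# equalization (CLAHE) over a 3x3 tile grid on a grayscale bitewing.
img = cv2.imread("bitewing.png", cv2.IMREAD_GRAYSCALE)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(3, 3))
enhanced = clahe.apply(img)
```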
RESULTS: The F1 score, sensitivity, and precision of the artificial intelligence model built on the YOLOv5 architecture were 0.9913, 0.9954, and 0.9873, respectively, on the test dataset.
CONCLUSION: Numerical identification of teeth with deep learning-based artificial intelligence algorithms applied to bitewing radiographs demonstrated notable efficacy. Clinical decision support system software augmented by artificial intelligence has the potential to enhance the efficiency and effectiveness of dental practitioners.
PMID:38632035 | DOI:10.1016/j.oooo.2024.02.012
Multi-center Dose Prediction Using an Attention-aware Deep Learning Algorithm Based on Transformers for Cervical Cancer Radiotherapy
Clin Oncol (R Coll Radiol). 2024 Mar 26:S0936-6555(24)00119-5. doi: 10.1016/j.clon.2024.03.022. Online ahead of print.
ABSTRACT
AIMS: Accurate dose delivery is crucial for cervical cancer volumetric modulated arc therapy (VMAT). We aimed to develop a robust deep-learning (DL) algorithm for fast and accurate dose prediction of cervical cancer VMAT across multicenter datasets and to explore the feasibility of extending the algorithm to endometrial cancer VMAT with different prescriptions.
MATERIALS AND METHODS: We proposed the AtTranNet algorithm for three-dimensional dose prediction. A total of 367 cervical cancer patients were enrolled in this study. Three hundred twenty-two patients from 3 centers were randomly divided into training (70%), validation (10%), and testing (20%) sets. Forty-five patients from another center were selected for external testing. Moreover, 70 endometrial cancer patients with different prescriptions were further selected to test the model. Prediction precision was evaluated by dosimetric difference, dose map, and dose-volume histogram metrics.
RESULTS: The prediction results were all clinically acceptable. The mean absolute error within the body in internal testing was 0.66 ± 0.63%. The maximum |δD| for the planning target volume was observed in D98 (1.24 ± 2.73 Gy). The maximum |δD| for organs at risk was observed in the Dmean of the bladder (4.79 ± 3.14 Gy). The maximum |δV| was observed in V40 of the pelvic bones (4.77 ± 4.48%).
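For reference, the reported dose-volume histogram metrics can be computed from a 3D dose grid and a structure mask as in the generic sketch below (not the paper's evaluation code):

```python
import numpy as np

def dvh_metrics(dose, mask):
    """D98, Dmean, and V40 for one structure (illustrative).

    dose: 3D array of doses in Gy; mask: boolean array for the structure.
    """
    roi = dose[mask > 0]
    return {
        "D98": np.percentile(roi, 2),         # dose received by 98% of the volume
        "Dmean": roi.mean(),                  # mean structure dose
        "V40": 100.0 * (roi >= 40.0).mean(),  # % volume receiving >= 40 Gy
    }
```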
CONCLUSION: AtTranNet showed feasibility and reasonable accuracy in dose prediction for cervical cancer across multiple centers. The model can also be generalized to endometrial cancer with different prescriptions without any transfer learning.
PMID:38631974 | DOI:10.1016/j.clon.2024.03.022
High-resolution 3T to 7T ADC map synthesis with a hybrid CNN-transformer model
Med Phys. 2024 Apr 17. doi: 10.1002/mp.17079. Online ahead of print.
ABSTRACT
BACKGROUND: 7 Tesla (7T) apparent diffusion coefficient (ADC) maps derived from diffusion-weighted imaging (DWI) demonstrate improved image quality and spatial resolution over 3 Tesla (3T) ADC maps. However, 7T magnetic resonance imaging (MRI) currently suffers from limited clinical availability, higher cost, and increased susceptibility to artifacts.
PURPOSE: To address these issues, we propose a hybrid CNN-transformer model to synthesize high-resolution 7T ADC maps from multimodal 3T MRI.
METHODS: The Vision CNN-Transformer (VCT), composed of both Vision Transformer (ViT) blocks and convolutional layers, is proposed to produce high-resolution synthetic 7T ADC maps from 3T ADC maps and 3T T1-weighted (T1w) MRI. ViT blocks enable global image context while convolutional layers efficiently capture fine detail. The VCT model was validated on the publicly available Human Connectome Project Young Adult dataset, comprising 3T T1w, 3T DWI, and 7T DWI brain scans. The Diffusion Imaging in Python library was used to compute ADC maps from the DWI scans. A total of 171 patient cases were randomly divided into 130 training cases, 20 validation cases, and 21 test cases. The synthetic ADC maps were evaluated by comparing their similarity to the ground-truth volumes with the following metrics: peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and mean squared error (MSE).
RESULTS: The results are as follows: PSNR: 27.0 ± 0.9 dB, SSIM: 0.945 ± 0.010, and MSE: 2.0E-3 ± 0.4E-3. Both qualitative and quantitative results demonstrate that VCT performs favorably against other state-of-the-art methods. We introduced various efficiency improvements, including the implementation of flash attention and training on 176×208 resolution images. These enhancements reduced the parameter count and training time per epoch by 50% in comparison to ResViT; specifically, the training time per epoch was shortened from 7.67 min to 3.86 min.
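A minimal scikit-image sketch of the reported similarity metrics (normalization and data-range handling here are assumptions, not the paper's exact computation):

```python
from skimage.metrics import (mean_squared_error,
                             peak_signal_noise_ratio,
                             structural_similarity)

def evaluate_synthesis(pred, gt):
    """PSNR, SSIM, and MSE between synthetic and ground-truth ADC maps."""
    data_range = float(gt.max() - gt.min())
    return {
        "PSNR": peak_signal_noise_ratio(gt, pred, data_range=data_range),
        "SSIM": structural_similarity(gt, pred, data_range=data_range),
        "MSE": mean_squared_error(gt, pred),
    }
```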
CONCLUSION: We propose a novel method to predict high-resolution 7T ADC maps from low-resolution 3T ADC maps and T1w MRI. Our predicted images demonstrate better spatial resolution and contrast compared to 3T MRI and prediction results made by ResViT and pix2pix. These high-quality synthetic 7T MR images could be beneficial for disease diagnosis and intervention, producing higher resolution and conformal contours, and as an intermediate step in generating synthetic CT for radiation therapy, especially when 7T MRI scanners are unavailable.
PMID:38630982 | DOI:10.1002/mp.17079
Prediction of remaining surgery duration in laparoscopic videos based on visual saliency and the transformer network
Int J Med Robot. 2024 Apr;20(2):e2632. doi: 10.1002/rcs.2632.
ABSTRACT
BACKGROUND: Real-time prediction of the remaining surgery duration (RSD) is important for optimal scheduling of resources in the operating room.
METHODS: We focus on the intraoperative prediction of RSD from laparoscopic video. We present an extensive evaluation of seven common deep learning models, a proposed Transformer-based model (TransLocal), and four baseline approaches. The proposed pipeline includes a CNN-LSTM for feature extraction from salient regions within short video segments and a Transformer with local attention mechanisms.
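The local attention idea can be sketched as a boolean mask restricting each time step to a trailing window before a standard attention layer; the window size and masking scheme below are assumptions, not TransLocal's published configuration:

```python
import torch

def local_attention_mask(seq_len, window, device=None):
    """True entries are masked out; step t sees steps t-window+1 .. t."""
    idx = torch.arange(seq_len, device=device)
    rel = idx[None, :] - idx[:, None]        # rel[t, s] = s - t
    allowed = (rel <= 0) & (rel > -window)   # causal, within trailing window
    return ~allowed

# Usage with PyTorch's built-in attention:
# attn = torch.nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
# out, _ = attn(x, x, x, attn_mask=local_attention_mask(x.shape[1], window=16))
```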
RESULTS: Using the Cholec80 dataset, TransLocal yielded the best performance, with a mean absolute error (MAE) of 7.1 min. For long and short surgeries, the MAE was 10.6 and 4.4 min, respectively. Thirty minutes before the end of surgery, the MAE was 6.2 min overall, and 7.2 and 5.5 min for long and short surgeries, respectively.
CONCLUSIONS: The proposed technique achieves state-of-the-art results. In the future, we aim to incorporate intraoperative indicators and pre-operative data.
PMID:38630888 | DOI:10.1002/rcs.2632
Evaluation of AlphaFold2 Structures for Hit Identification across Multiple Scenarios
J Chem Inf Model. 2024 Apr 17. doi: 10.1021/acs.jcim.3c01976. Online ahead of print.
ABSTRACT
The introduction of AlphaFold2 (AF2) has sparked significant enthusiasm and extensive discussion within the scientific community, particularly among drug discovery researchers. Although previous studies have addressed the performance of AF2 structures in virtual screening (VS), a more comprehensive investigation is still necessary given the paramount importance of structural accuracy in drug design. In this study, we evaluate the performance of AF2 structures in VS across three common drug discovery scenarios: targets with holo, apo, and AF2 structures; targets with only apo and AF2 structures; and targets exclusively with AF2 structures. We utilized both the traditional physics-based Glide and the deep-learning-based scoring function RTMScore to rank the compounds in the DUD-E, DEKOIS 2.0, and DECOY data sets. The results demonstrate that, overall, the performance of VS on AF2 structures is comparable to that on apo structures but notably inferior to that on holo structures across diverse scenarios. Moreover, when a target has only an AF2 structure, selecting the holo structure of a different subtype within the same protein family yields results comparable to the AF2 structure on the AF2-structure data set, and significantly better results than the AF2 structure on its own data set. This indicates that utilizing AF2 structures for docking-based VS may not yield the most satisfactory outcomes, even when only AF2 structures are available. Finally, we rule out the possibility that the variations in VS performance between the binding pockets of AF2 and holo structures arise from differences in their biological assembly composition.
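Ranking quality in such screens is commonly summarized by the enrichment factor; the generic sketch below is our illustration, not code from the study:

```python
import numpy as np

def enrichment_factor(scores, labels, fraction=0.01):
    """EF at a screened fraction, for scores where lower = better.

    labels: 1 for actives, 0 for decoys.
    """
    scores, labels = np.asarray(scores), np.asarray(labels)
    order = np.argsort(scores)                  # best-scored compounds first
    n_top = max(int(len(scores) * fraction), 1)
    hit_rate_top = labels[order][:n_top].mean()
    return hit_rate_top / labels.mean()
```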
PMID:38630855 | DOI:10.1021/acs.jcim.3c01976
Deep learning assists detection of esophageal cancer and precursor lesions in a prospective, randomized controlled study
Sci Transl Med. 2024 Apr 17;16(743):eadk5395. doi: 10.1126/scitranslmed.adk5395. Epub 2024 Apr 17.
ABSTRACT
Endoscopy is the primary modality for detecting asymptomatic esophageal squamous cell carcinoma (ESCC) and precancerous lesions. Improving the detection rate remains challenging. We developed a system based on deep convolutional neural networks (CNNs) for detecting esophageal cancer and precancerous lesions [high-risk esophageal lesions (HrELs)] and validated its efficacy in improving the HrEL detection rate in clinical practice (trial registration ChiCTR2100044126 at www.chictr.org.cn). Between April 2021 and March 2022, 3117 patients ≥50 years old were consecutively recruited from Taizhou Hospital, Zhejiang Province, and randomly assigned 1:1 to an experimental group (CNN-assisted endoscopy) or a control group (unassisted endoscopy) based on block randomization. The primary endpoint was the HrEL detection rate. In the intention-to-treat population, the HrEL detection rate was significantly higher in the experimental group [28 of 1556 (1.8%)] than in the control group [14 of 1561 (0.9%), P = 0.029], twice that of the control group. Similar findings were observed between the experimental and control groups [28 of 1524 (1.9%) versus 13 of 1534 (0.9%); P = 0.021]. The system's sensitivity, specificity, and accuracy for detecting HrELs were 89.7%, 98.5%, and 98.2%, respectively. No adverse events occurred. The proposed system thus safely improved the HrEL detection rate during endoscopy. Deep learning assistance may enhance early diagnosis and treatment of esophageal cancer and may become a useful tool for esophageal cancer screening.
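The intention-to-treat comparison can be reproduced approximately with a two-proportion chi-square test (the trial's exact statistical procedure is an assumption here):

```python
from scipy.stats import chi2_contingency

# HrEL detections vs. non-detections: CNN-assisted and unassisted arms.
table = [[28, 1556 - 28],
         [14, 1561 - 14]]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # p should land near the reported 0.029
```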
PMID:38630847 | DOI:10.1126/scitranslmed.adk5395
Short-term forecasting approach of single well production based on multi-intelligent agent hybrid model
PLoS One. 2024 Apr 17;19(4):e0301349. doi: 10.1371/journal.pone.0301349. eCollection 2024.
ABSTRACT
Short-term prediction of single-well production can provide direct data support for timely optimization and adjustment of oil well production parameters and for assessing oil well production conditions. Given the coupled effect of complex factors on the daily output of a single well, a short-term prediction method based on a multi-agent hybrid model is proposed, and a short-term prediction workflow for single-well output is constructed. First, the CEEMDAN method is used to decompose and reconstruct the original data set, and the sliding-window method is used to compose data sets from the obtained components. The decomposed components are described as feature vectors based on fuzzy entropy and autocorrelation coefficient values and are divided into two groups by a clustering algorithm for prediction with two sub-models. An optimized online sequential extreme learning machine and a deep learning model based on an encoder-decoder structure with self-attention are developed as sub-models to predict the grouped data, and the final predicted production is the sum of the sub-model predictions. The validity of this method for short-term prediction of single-well daily oil production is verified. Statistics of data deviation and statistical tests are introduced as the basis for comparative evaluation, and comparative models are used as references to evaluate the prediction performance of the multi-agent hybrid model. Results indicate that the proposed hybrid model performed better, with MAE values of 0.0935, 0.0694, and 0.0593 in three cases, respectively. By comparison, the method considerably improved the prediction-deviation statistics for the selected oil well data in different periods, and statistical tests show that the multi-agent hybrid model is superior to the comparative models. Therefore, the short-term prediction method based on a multi-agent hybrid model can effectively support the optimization of oilfield production parameters and the assessment of oil well production conditions.
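A hedged sketch of the preprocessing stage, using the PyEMD package for CEEMDAN and a simple sliding window (window length and library choice are assumptions):

```python
import numpy as np
from PyEMD import CEEMDAN  # pip install EMD-signal

def decompose_and_window(series, window=7):
    """CEEMDAN decomposition followed by windowed supervised samples."""
    imfs = CEEMDAN()(np.asarray(series, dtype=float))  # (n_components, n_days)
    datasets = []
    for comp in imfs:
        X = np.stack([comp[i:i + window] for i in range(len(comp) - window)])
        y = comp[window:]            # next-day value for each window
        datasets.append((X, y))
    return datasets
```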
PMID:38630729 | DOI:10.1371/journal.pone.0301349
A deep learning-based dynamic deformable adaptive framework for locating the root region of the dynamic flames
PLoS One. 2024 Apr 17;19(4):e0301839. doi: 10.1371/journal.pone.0301839. eCollection 2024.
ABSTRACT
Traditional optical flame detectors (OFDs) are susceptible to environmental interference, which inevitably causes detection errors and miscalculations in complex environments. Conventional deep learning-based models can mitigate the interference of complex environments through flame image feature extraction, which significantly improves the precision of flame recognition. However, these models focus on identifying the general profile of a static flame but fail to effectively locate the source of a dynamic flame. Therefore, this paper proposes a novel dynamic flame detection method named the Dynamic Deformable Adaptive Framework (DDAF) for locating the flame root region dynamically. Specifically, to address limitations in the flame feature extraction of existing detection models, Deformable Convolution Network v2 (DCNv2) is introduced for more flexible adaptation to the deformations and scale variations of target objects. A Context Augmentation Module (CAM) conveys flame features into a Dynamic Head (DH) for feature extraction from different aspects. Subsequently, Layer-Adaptive Magnitude-based Pruning (LAMP), in which the connection with the smallest LAMP score is pruned sequentially, is employed to further increase detection speed. More importantly, both coarse- and fine-grained localization techniques are designed in the Inductive Modeling (IM) module to accurately delineate the flame root region for effective fire control. Additionally, Temporal Consistency-based Detection (TCD) improves the robustness of detection by leveraging the temporal information in consecutive frames of a video sequence. Compared with a classical deep learning method, experimental results on a custom flame dataset demonstrate that the AP0.5 value is improved by 4.4%, while parameters and FLOPs are reduced by 25.3% and 25.9%, respectively. The framework extends to a variety of flame detection scenarios, including industrial safety and combustion process control.
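LAMP scores each connection relative to the surviving weights of its layer and prunes the smallest scores first; a minimal NumPy sketch of the published LAMP score for one layer (framework integration omitted):

```python
import numpy as np

def lamp_scores(weights):
    """score(i) = w_i^2 / sum of w_j^2 over all j with w_j^2 >= w_i^2."""
    flat = weights.ravel() ** 2
    order = np.argsort(flat)                     # ascending magnitude
    suffix = np.cumsum(flat[order][::-1])[::-1]  # sums over equal-or-larger weights
    scores = np.empty_like(flat)
    scores[order] = flat[order] / suffix
    return scores.reshape(weights.shape)         # prune the smallest scores first
```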
PMID:38630706 | DOI:10.1371/journal.pone.0301839
Leveraging transfer learning with deep learning for crime prediction
PLoS One. 2024 Apr 17;19(4):e0296486. doi: 10.1371/journal.pone.0296486. eCollection 2024.
ABSTRACT
Crime remains a crucial concern for ensuring a safe and secure environment for the public. Numerous efforts have been made to predict crime, emphasizing the importance of deep learning approaches for precise predictions. However, acquiring sufficient crime data and resources for training state-of-the-art deep learning-based crime prediction systems poses a challenge. To address this issue, this study adopts the transfer learning paradigm. It fine-tunes state-of-the-art statistical and deep learning methods, including Simple Moving Averages (SMA), Weighted Moving Averages (WMA), Exponential Moving Averages (EMA), Long Short-Term Memory (LSTM), Bi-directional Long Short-Term Memory (BiLSTM), and a hybrid of Convolutional Neural Networks and Long Short-Term Memory (CNN-LSTM) for crime prediction. This study proposes a BiLSTM-based transfer learning architecture owing to its high accuracy in predicting weekly and monthly crime trends. The transfer learning paradigm leverages the fine-tuned BiLSTM model to transfer crime knowledge from one neighbourhood to another. The proposed method is evaluated on Chicago, New York, and Lahore crime datasets. Experimental results demonstrate the superiority of transfer learning with BiLSTM, achieving low error values and reduced execution time. These predictions can significantly enhance the efficiency of law enforcement agencies in controlling and preventing crime.
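A hedged PyTorch sketch of the BiLSTM transfer setup: pretrain on a source neighbourhood's series, then freeze the recurrent layers and fine-tune only the output head on the target (layer sizes and the freezing scheme are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class CrimeBiLSTM(nn.Module):
    def __init__(self, n_features=1, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):             # x: (batch, weeks, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # next period's crime count

model = CrimeBiLSTM()
# ... pretrain on the source neighbourhood's series ...
for p in model.lstm.parameters():     # freeze recurrent layers
    p.requires_grad = False
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)  # tune head only
```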
PMID:38630687 | DOI:10.1371/journal.pone.0296486
Large language models approach expert-level clinical knowledge and reasoning in ophthalmology: A head-to-head cross-sectional study
PLOS Digit Health. 2024 Apr 17;3(4):e0000341. doi: 10.1371/journal.pdig.0000341. eCollection 2024 Apr.
ABSTRACT
Large language models (LLMs) underlie remarkable recent advances in natural language processing, and they are beginning to be applied in clinical contexts. We aimed to evaluate the clinical potential of state-of-the-art LLMs in ophthalmology using a more robust benchmark than raw examination scores. We trialled GPT-3.5 and GPT-4 on 347 ophthalmology questions before GPT-3.5, GPT-4, PaLM 2, LLaMA, expert ophthalmologists, and doctors in training were trialled on a mock examination of 87 questions. Performance was analysed with respect to question subject and type (first-order recall and higher-order reasoning). Masked ophthalmologists graded the accuracy, relevance, and overall preference of GPT-3.5 and GPT-4 responses to the same questions. The performance of GPT-4 (69%) was superior to GPT-3.5 (48%), LLaMA (32%), and PaLM 2 (56%). GPT-4 compared favourably with expert ophthalmologists (median 76%, range 64-90%), ophthalmology trainees (median 59%, range 57-63%), and unspecialised junior doctors (median 43%, range 41-44%). Low agreement between LLMs and doctors reflected idiosyncratic differences in knowledge and reasoning, with overall consistency across subjects and types (p>0.05). All ophthalmologists preferred GPT-4 responses over GPT-3.5 and rated the accuracy and relevance of GPT-4 higher (p<0.05). LLMs are approaching expert-level knowledge and reasoning skills in ophthalmology. In view of their comparable or superior performance to trainee-grade ophthalmologists and unspecialised junior doctors, state-of-the-art LLMs such as GPT-4 may provide useful medical advice and assistance where access to expert ophthalmologists is limited. Clinical benchmarks provide useful assays of LLM capabilities in healthcare before clinical trials can be designed and conducted.
PMID:38630683 | DOI:10.1371/journal.pdig.0000341
MMM and MMMSynth: Clustering of heterogeneous tabular data, and synthetic data generation
PLoS One. 2024 Apr 17;19(4):e0302271. doi: 10.1371/journal.pone.0302271. eCollection 2024.
ABSTRACT
We provide new algorithms for two tasks relating to heterogeneous tabular datasets: clustering and synthetic data generation. Tabular datasets typically consist of heterogeneous data types (numerical, ordinal, categorical) in columns but may also have hidden cluster structure in their rows: for example, they may be drawn from heterogeneous (geographical, socioeconomic, methodological) sources, such that the outcome variable they describe (such as the presence of a disease) may depend not only on the other variables but on the cluster context. Moreover, sharing of biomedical data is often hindered by patient confidentiality laws, and there is current interest in algorithms to generate synthetic tabular data from real data, for example via deep learning. We demonstrate a novel EM-based clustering algorithm, MMM ("Madras Mixture Model"), that outperforms standard algorithms in determining clusters in synthetic heterogeneous data and recovers structure in real data. Building on this, we demonstrate a synthetic tabular data generation algorithm, MMMsynth, that pre-clusters the input data and generates cluster-wise synthetic data assuming cluster-specific data distributions for the input columns. We benchmark this algorithm by testing the performance of standard ML algorithms when they are trained on synthetic data and tested on real published datasets. Our synthetic data generation algorithm outperforms other tabular-data generators in the literature and approaches the performance of training purely on real data.
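The cluster-then-sample idea can be sketched with scikit-learn, using GaussianMixture as a stand-in for MMM (MMM handles mixed numerical/ordinal/categorical columns; this sketch assumes purely numerical data):

```python
from sklearn.mixture import GaussianMixture

def clusterwise_synth(X, n_clusters=3, n_samples=1000, seed=0):
    """Fit cluster-specific distributions, then sample synthetic rows."""
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(X)
    synth, cluster_ids = gmm.sample(n_samples)  # rows drawn cluster-wise
    return synth
```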
PMID:38630664 | DOI:10.1371/journal.pone.0302271
Mass-Produced Skin-Inspired Piezoresistive Sensing Array with Interlocking Interface for Object Recognition
ACS Nano. 2024 Apr 17. doi: 10.1021/acsnano.4c00112. Online ahead of print.
ABSTRACT
E-skins, capable of responding to mechanical stimuli, hold significant potential in the field of robot haptics. However, it is a challenge to obtain e-skins with both high sensitivity and mechanical stability. Here, we present a bioinspired piezoresistive sensor with hierarchical structures based on polyaniline/polystyrene core-shell nanoparticles polymerized on air-laid paper. The combination of laser-etched reusable templates and sensitive materials that can be rapidly synthesized enables large-scale production. Benefiting from the substantially enlarged deformation of the hierarchical structure, the developed piezoresistive electronics exhibit a decent sensitivity of 21.67 kPa⁻¹ and a subtle detection limit of 3.4 Pa. Moreover, an isolation layer is introduced to enhance the interface stability of the e-skin, with a fracture limit of 66.34 N/m. Furthermore, the e-skin can be seamlessly integrated onto gloves without any detachment issues. With the assistance of deep learning, it achieves a 98% accuracy rate in object recognition. We anticipate that this strategy will render e-skin with more robust interfaces and heightened sensing capabilities, offering a favorable pathway for large-scale production.
PMID:38630641 | DOI:10.1021/acsnano.4c00112
Prediction model of measurement errors in current transformers based on deep learning
Rev Sci Instrum. 2024 Apr 1;95(4):044704. doi: 10.1063/5.0190206.
ABSTRACT
The long-term monitoring stability of electronic current transformers is crucial for accurately obtaining the current signal of the power grid. However, it is difficult to distinguish between fluctuations of non-stationary random signals on the primary side of the grid and the gradual error drift of the transformers themselves. To address the difficulty of measurement error evaluation, a current transformer error prediction model, CNN-MHA-BiLSTM, is proposed, combining convolutional neural networks (CNNs), multi-head attention (MHA), and a bidirectional long short-term memory (BiLSTM) network, with the golden jackal optimization (GJO) algorithm used to obtain optimal parameter values. The model can be used to assess transformer operation and can be widely applied to help determine operational stability and detect early faults. First, a CNN is used to mine the vertical detail features of the error data at a given moment, improving the speed of error prediction. A cascaded network with BiLSTM at its core is then constructed to extract the horizontal historical features of the error data. The GJO algorithm adjusts the BiLSTM parameters, optimizing the hidden layer nodes, training frequency, and learning rate, and the MHA mechanism is integrated so that the model attends to characteristic changes in the data, improving prediction accuracy. Finally, the method is applied to operational data from transformers in substations, and data from four time periods are selected to verify the model's effectiveness on the current transformer dataset. Single-step and multi-step case analyses indicate that the proposed model has significant advantages in accuracy and stability of error prediction.
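A hedged PyTorch sketch of the CNN + BiLSTM + multi-head attention arrangement (layer sizes and ordering are assumptions; in the paper these hyperparameters are tuned by GJO):

```python
import torch
import torch.nn as nn

class CNNMHABiLSTM(nn.Module):
    def __init__(self, n_features=1, hidden=32, heads=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, 16, kernel_size=3, padding=1), nn.ReLU())
        self.bilstm = nn.LSTM(16, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                                # x: (batch, time, features)
        z = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # vertical detail features
        z, _ = self.bilstm(z)                            # horizontal history
        z, _ = self.attn(z, z, z)                        # re-weight salient steps
        return self.head(z[:, -1])                       # next-step error estimate
```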
PMID:38629931 | DOI:10.1063/5.0190206
Detection and Recognition of the Invasive Species, Hylurgus ligniperda, in Traps, Based on a Cascaded Convolutional Neural Network
Pest Manag Sci. 2024 Apr 17. doi: 10.1002/ps.8126. Online ahead of print.
ABSTRACT
BACKGROUND: Hylurgus ligniperda, an invasive species originating from Eurasia, is now a major forestry quarantine pest worldwide. In recent years, it has caused significant damage in China. While traps are effective for monitoring and controlling pests, manual inspections are labor-intensive and require expertise in insect classification. To address this, we applied a two-stage cascaded convolutional neural network, YOLOX-MobileNetV2 (YOLOX-Mnet), to identify H. ligniperda and other pests captured in traps. This method streamlines the detection of target and non-target insects from trap images, offering a more efficient alternative to manual inspection.
RESULTS: Two convolutional neural network models were employed in cascaded stages to detect both target and non-target insects from images captured in the same forest. In the first stage, You Only Look Once X (YOLOX) served as the detection model, separating insects from non-insects in the collected images, with non-insect targets subsequently filtered out. In the second stage, MobileNetV2, a classification network, classified the captured insects. This approach effectively reduced false positives from non-insect objects, enabled the inclusion of additional classification terms for multi-class insect classification models, and utilized sample control strategies to enhance classification performance.
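A minimal sketch of the second cascade stage with torchvision's MobileNetV2 (the first-stage YOLOX detector is elided; class names and crop size are illustrative):

```python
import torch
from torchvision.models import mobilenet_v2

classes = ["Hylurgus_ligniperda", "other_insect"]  # illustrative labels
model = mobilenet_v2(num_classes=len(classes))
model.eval()

def classify_crops(crops):
    """crops: (n, 3, 224, 224) tensor of insect crops from the detector."""
    with torch.no_grad():
        logits = model(crops)
    return [classes[int(i)] for i in logits.argmax(dim=1)]
```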
CONCLUSION: The cascaded convolutional neural network model accurately identified H. ligniperda, and the mean F1-score across all insect classes in the traps was 0.98. Compared with traditional insect classification, this method offers a substantial improvement in the identification and early warning of forest pests and provides technical support for their early prevention and control.
PMID:38629795 | DOI:10.1002/ps.8126
Fully automated explainable abdominal CT contrast media phase classification using organ segmentation and machine learning
Med Phys. 2024 Apr 17. doi: 10.1002/mp.17076. Online ahead of print.
ABSTRACT
BACKGROUND: Contrast-enhanced computed tomography (CECT) provides considerably more information than non-enhanced CT images, especially for the differentiation of malignancies such as liver carcinomas. Contrast media injection phase information is usually missing in public datasets and is not standardized in the clinic, even within the same region and language. This is a barrier to the effective use of available CECT images in clinical research.
PURPOSE: The aim of this study was to detect the contrast media injection phase from CT images by means of organ segmentation and machine learning algorithms.
METHODS: A total of 2509 CT images, split into four subsets of non-contrast (class #0), arterial (class #1), venous (class #2), and delayed (class #3) phases after contrast media injection, were collected from two CT scanners. Masks for seven organs, including the liver, spleen, heart, kidneys, lungs, urinary bladder, and aorta, along with body contour masks, were generated by pre-trained deep learning algorithms. Subsequently, five first-order statistical features, comprising the average, standard deviation, and the 10th, 50th, and 90th percentiles extracted from the above-mentioned masks, were fed to machine learning models after feature selection and reduction to classify the CT images into one of the four classes. A 10-fold data split strategy was followed. The performance of our methodology was evaluated in terms of classification accuracy metrics.
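A generic NumPy sketch of the per-organ feature extraction (mask and image handling are assumptions):

```python
import numpy as np

def first_order_features(ct, organ_mask):
    """The five intensity features per organ mask used as ML inputs.

    ct: 3D array of HU values; organ_mask: boolean segmentation mask.
    """
    vals = ct[organ_mask > 0]
    return {
        "mean": vals.mean(),
        "std": vals.std(),
        "p10": np.percentile(vals, 10),
        "p50": np.percentile(vals, 50),
        "p90": np.percentile(vals, 90),
    }
```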
RESULTS: The best performance was achieved by Boruta feature selection and the random forest (RF) model, with an average area under the curve of more than 0.999 and an accuracy of 0.9936 averaged over the four classes and 10 folds. Boruta feature selection retained all predictor features. The lowest classification accuracy was observed for class #2 (0.9888), which is still an excellent result. In the 10-fold strategy, only 33 of 2509 cases (∼1.4%) were misclassified. Performance was consistent across all folds.
CONCLUSIONS: We developed a fast, accurate, reliable, and explainable methodology to classify contrast media phases, which may be useful for data curation and annotation in large online datasets or local datasets with non-standard or missing series descriptions. Our model, comprising a deep learning step and a machine learning step, may help exploit available datasets more effectively.
PMID:38629779 | DOI:10.1002/mp.17076
In Situ Root Dataset Expansion Strategy Based on an Improved CycleGAN Generator
Plant Phenomics. 2024 Feb 12;6:0148. doi: 10.34133/plantphenomics.0148. eCollection 2024.
ABSTRACT
The root system plays a vital role in plants' ability to absorb water and nutrients. In situ root research offers an intuitive approach to exploring root phenotypes and their dynamics. Deep-learning-based root segmentation methods have gained popularity, but they require large labeled datasets for training. This paper presents an expansion method for in situ root datasets using an improved CycleGAN generator. In addition, a spatial-coordinate-based target-background separation method is proposed, which solves the issue of background pixel variations caused by generator errors. Compared with traditional threshold segmentation methods, this approach demonstrates superior speed, accuracy, and stability. Moreover, through time-division soil image acquisition, diverse culture media can be substituted into in situ root images, thereby enhancing dataset versatility. After validating the performance of the Improved_UNet network on the augmented dataset, the optimal results show a 0.63% increase in mean intersection over union, 0.41% in F1, and 0.04% in accuracy. In terms of generalization performance, the optimal results show a 33.6% increase in mean intersection over union, 28.11% in F1, and 2.62% in accuracy. The experimental results confirm the feasibility and practicality of the proposed dataset augmentation strategy. In the future, we plan to combine normal mapping with rendering software to achieve more accurate shading simulations of in situ roots, and we aim to create a broader range of images encompassing various crop varieties and soil types.
PMID:38629084 | PMC:PMC11020132 | DOI:10.34133/plantphenomics.0148
Improved Transformer for Time Series Senescence Root Recognition
Plant Phenomics. 2024 Mar 28;6:0159. doi: 10.34133/plantphenomics.0159. eCollection 2024.
ABSTRACT
The root is an important organ for plants to obtain nutrients and water, and its phenotypic characteristics are closely related to its functions. Deep-learning-based high-throughput extraction of in situ root senescence features has not yet been reported. In light of this, this paper proposes a transformer-based technique for extracting in situ root senescence features of cotton. The investigation focuses on high-resolution in situ root images with various levels of senescence. Comparing the semantic segmentation of the root system by general convolutional neural networks and transformer neural networks, SegFormer-UN (large) achieves the best evaluation metrics, with mIoU, mRecall, mPrecision, and mF1 values of 81.52%, 86.87%, 90.98%, and 88.81%, respectively. The segmentation results indicate more accurate predictions at the connections of root systems in the segmented images. In contrast to two algorithms for cotton root senescence extraction based on deep learning and image processing, the in situ root senescence recognition algorithm using the SegFormer-UN model has a parameter count of 5.81 million and operates quickly, at approximately 4 min per image, while accurately identifying senescent roots. We propose that the SegFormer-UN model can rapidly and nondestructively identify senescent roots in in situ root images, providing important methodological support for efficient crop senescence research.
PMID:38629083 | PMC:PMC11018523 | DOI:10.34133/plantphenomics.0159
Fast and Efficient Root Phenotyping via Pose Estimation
Plant Phenomics. 2024 Apr 12;6:0175. doi: 10.34133/plantphenomics.0175. eCollection 2024.
ABSTRACT
Image segmentation is commonly used to estimate the location and shape of plants and their external structures. Segmentation masks are then used to localize landmarks of interest and compute other geometric features that correspond to the plant's phenotype. Despite their prevalence, segmentation-based approaches are laborious (requiring extensive annotation to train) and error-prone (derived geometric features are sensitive to instance mask integrity). Here, we present a segmentation-free approach that leverages deep learning-based landmark detection and grouping, also known as pose estimation. We use a tool originally developed for animal motion capture called SLEAP (Social LEAP Estimates Animal Poses) to automate the detection of distinct morphological landmarks on plant roots. Using a gel cylinder imaging system across multiple species, we show that our approach can reliably and efficiently recover root system topology with high accuracy, fewer annotated samples, and faster speed than segmentation-based approaches. To make use of this landmark-based representation for root phenotyping, we developed a Python library (sleap-roots) for trait extraction directly comparable to existing segmentation-based analysis software. We show that pose-derived root traits are highly accurate and can be used for common downstream tasks, including genotype classification and unsupervised trait mapping. Altogether, this work establishes the validity and advantages of pose estimation-based plant phenotyping. To facilitate adoption of this easy-to-use tool and to encourage further development, we make sleap-roots, all training data, models, and trait extraction code available at https://github.com/talmolab/sleap-roots and https://osf.io/k7j9g/.
PMID:38629082 | PMC:PMC11020144 | DOI:10.34133/plantphenomics.0175