Deep learning
Accelerated spine MRI with deep learning based image reconstruction: a prospective comparison with standard MRI
Acad Radiol. 2024 Nov 22:S1076-6332(24)00850-X. doi: 10.1016/j.acra.2024.11.004. Online ahead of print.
ABSTRACT
RATIONALE AND OBJECTIVES: To evaluate the performance of deep learning (DL) reconstructed MRI in terms of image acquisition time, overall image quality and diagnostic interchangeability compared to standard-of-care (SOC) MRI.
MATERIALS AND METHODS: This prospective study recruited participants with spinal discomfort between July 2023 and August 2023. All participants underwent two separate MRI examinations (standard and accelerated scanning). Signal-to-noise ratios (SNR), contrast-to-noise ratios (CNR), and similarity metrics were calculated for quantitative evaluation. Four radiologists performed subjective quality and lesion characteristic assessment. The Wilcoxon test was used to assess differences in SNR, CNR, and subjective image quality between DL and SOC. Various spinal lesions were also tested for interchangeability using the individual equivalence index. Interreader and intrareader agreement and concordance (κ, Kendall τ, and W statistics) were computed, and McNemar tests were performed for comprehensive evaluation.
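As a rough illustration of the quantitative comparison described above, the sketch below computes SNR and CNR from manually placed regions of interest; the ROI placement and the specific SNR/CNR definitions are assumptions, since the abstract does not state which formulas the authors used.

```python
import numpy as np

def snr(signal_roi: np.ndarray, noise_roi: np.ndarray) -> float:
    """SNR as mean signal intensity divided by the standard deviation of background noise."""
    return float(signal_roi.mean() / noise_roi.std())

def cnr(roi_a: np.ndarray, roi_b: np.ndarray, noise_roi: np.ndarray) -> float:
    """CNR as the absolute mean difference between two tissues divided by background noise."""
    return float(abs(roi_a.mean() - roi_b.mean()) / noise_roi.std())

# Hypothetical example: compare a DL-reconstructed slice with a standard-of-care slice.
rng = np.random.default_rng(0)
dl_slice = rng.normal(100, 5, (256, 256))    # stand-in for a DL-reconstructed image
soc_slice = rng.normal(100, 10, (256, 256))  # stand-in for a SOC image

cord_roi = (slice(100, 120), slice(100, 120))   # assumed tissue ROI
noise_roi = (slice(0, 20), slice(0, 20))        # assumed background ROI
print("SNR DL :", snr(dl_slice[cord_roi], dl_slice[noise_roi]))
print("SNR SOC:", snr(soc_slice[cord_roi], soc_slice[noise_roi]))
```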
RESULTS: 200 participants (107 male, mean age 46.56 ± 17.07 years) were included. Compared with SOC, DL reduced scan time by approximately 40%. The SNR and CNR of DL were significantly higher than those of SOC (P < 0.001). DL showed varying degrees of improvement (0-0.35) in each of the similarity metrics. All absolute individual equivalence indexes were less than 4%, indicating interchangeability between SOC and DL. Kappa and Kendall statistics showed good to near-perfect agreement, ranging from 0.72 to 0.98. There was no difference between SOC and DL in subjective scoring or frequency of lesion detection.
CONCLUSION: Compared to SOC, DL provided high-quality images for diagnosis and reduced examination time for patients. DL was found to be interchangeable with SOC in detecting various spinal abnormalities.
PMID:39580249 | DOI:10.1016/j.acra.2024.11.004
Enhancing decision confidence in AI using Monte Carlo dropout for Raman spectra classification
Anal Chim Acta. 2024 Dec 15;1332:343346. doi: 10.1016/j.aca.2024.343346. Epub 2024 Oct 16.
ABSTRACT
BACKGROUND: Machine learning algorithms for bacterial strain identification using Raman spectroscopy have been widely used in microbiology. During the training phase, existing datasets are augmented and used to optimize model architecture and hyperparameters. After training, it is presumed that the models have reached their peak performance and are used for inference without being further enhanced. Our methodology combines Monte Carlo Dropout (MCD) with convolutional neural networks (CNNs) by utilizing dropout during the inference phase, which enables measurement of model uncertainty, a critical but often ignored aspect of deep learning models.
RESULTS: We categorize unseen input data into two subsets based on the uncertainty of their prediction by employing MCD and defining the threshold using the Gaussian Mixture Model (GMM). The final prediction is obtained on the subset of testing data that exhibits lower model uncertainty, thereby enhancing the reliability of the results. To validate our method, we applied it to two Raman spectra datasets. As a result, we have observed an increase in accuracy of 9 % for Dataset 1 (from 83.10 % to 92.10 %) and 12.82 % for Dataset 2 (from 83.86 % to 96.68 %). These improvements were observed within specific subsets of the data: 826 out of 1206 spectra in Dataset 1 and 1700 out of 3000 spectra in Dataset 2. This demonstrates the effectiveness of our approach in improving prediction accuracy by focusing on data with lower uncertainty.
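A minimal sketch of the inference-time procedure described above, assuming a PyTorch classifier with dropout layers; the uncertainty measure (predictive entropy) and the two-component GMM threshold are illustrative choices, not necessarily the exact ones used by the authors.

```python
import torch
import numpy as np
from sklearn.mixture import GaussianMixture

def mc_dropout_predict(model, x, n_passes=50):
    """Repeated stochastic forward passes with dropout kept active at inference."""
    model.eval()
    for m in model.modules():                      # re-enable dropout layers only
        if isinstance(m, torch.nn.Dropout):
            m.train()
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_passes)])
    mean_probs = probs.mean(0)                     # (batch, n_classes)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)  # uncertainty
    return mean_probs, entropy

def low_uncertainty_mask(entropy: np.ndarray) -> np.ndarray:
    """Fit a 2-component GMM on uncertainties and keep the low-uncertainty subset."""
    gmm = GaussianMixture(n_components=2, random_state=0).fit(entropy.reshape(-1, 1))
    low_comp = int(np.argmin(gmm.means_.ravel()))
    return gmm.predict(entropy.reshape(-1, 1)) == low_comp  # boolean mask over spectra
```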
SIGNIFICANCE: Unlike routine prediction based on mere probabilities, we believe this uncertainty-guided prediction is more effective at ensuring a high prediction rate than prediction on the entire dataset. By guiding the decision-making of a model toward higher-confidence subsets, our methodology can enhance classification accuracy in critical areas like disease diagnosis and safety monitoring. This targeted approach advances microbial identification and produces more trustworthy predictions.
PMID:39580162 | DOI:10.1016/j.aca.2024.343346
Rapid and accurate bacteria identification through deep-learning-based two-dimensional Raman spectroscopy
Anal Chim Acta. 2024 Dec 15;1332:343376. doi: 10.1016/j.aca.2024.343376. Epub 2024 Oct 29.
ABSTRACT
Surface-enhanced Raman spectroscopy (SERS) offers a distinctive vibrational fingerprint of molecules and has led to widespread applications in medical diagnosis, biochemistry, and virology. With the rapid development of artificial intelligence (AI) technology, AI-enabled Raman spectroscopic techniques, a promising avenue for biosensing applications, have significantly boosted bacteria identification. By converting spectra into images, the dataset is enriched with more detailed information, allowing AI to identify bacterial isolates with enhanced precision. However, previous studies usually suffer from a trade-off between high-resolution spectrograms for high-accuracy identification and short training time for data processing. Here, we present an efficient bacteria identification strategy that combines deep learning models with a spectrogram encoding algorithm based on wavelet packet transform and Gramian angular field techniques. In contrast to the direct analysis of raw Raman spectra, our approach uses the wavelet packet transform to compress the spectra to 1/15 of their original size, while maintaining state-of-the-art accuracy by amplifying subtle differences via the Gramian angular field. The results demonstrate that our approach achieves 99.64 % and 90.55 % identification accuracy for two types and thirty types of bacterial isolates, respectively, with a 90 % reduction in training time compared to conventional methods. To verify the model's stability, Gaussian noise was superimposed on the testing dataset, demonstrating good generalization ability and robust performance. This algorithm has the potential for integration into on-site testing protocols and is readily updatable with new bacterial isolates. This study provides profound insights and contributes to the current understanding of spectroscopy, paving the way for accurate and rapid bacteria identification in diverse applications such as environmental monitoring, food safety, microbiology, and public health.
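A hedged sketch of the two encoding steps named above: wavelet packet compression of a spectrum followed by a Gramian angular (summation) field image. The wavelet family, decomposition level, and GAF variant are assumptions, since the abstract does not specify them; a level-4 decomposition gives roughly the 1/15-1/16 compression mentioned.

```python
import numpy as np
import pywt

def wpt_compress(spectrum: np.ndarray, wavelet: str = "db4", level: int = 4) -> np.ndarray:
    """Keep only the approximation node of a wavelet packet decomposition,
    compressing the spectrum to roughly 1/2**level of its original length."""
    wp = pywt.WaveletPacket(data=spectrum, wavelet=wavelet, mode="symmetric", maxlevel=level)
    return wp["a" * level].data

def gramian_angular_field(x: np.ndarray) -> np.ndarray:
    """Encode a 1D signal as a 2D Gramian angular summation field image."""
    x = (2 * (x - x.min()) / (x.max() - x.min() + 1e-12)) - 1   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1, 1))
    return np.cos(phi[:, None] + phi[None, :])

# Hypothetical Raman spectrum -> compressed spectrogram image for a CNN classifier.
spectrum = np.random.rand(1024)
image = gramian_angular_field(wpt_compress(spectrum))
print(image.shape)  # roughly (1024 / 2**4, 1024 / 2**4), up to wavelet boundary effects
```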
PMID:39580159 | DOI:10.1016/j.aca.2024.343376
Contextualizing predictive minds
Neurosci Biobehav Rev. 2024 Nov 21:105948. doi: 10.1016/j.neubiorev.2024.105948. Online ahead of print.
ABSTRACT
The structure of human memory seems to be optimized for efficient prediction, planning, and behavior. We propose that these capacities rely on a tripartite structure of memory that includes concepts, events, and contexts: three layers that constitute the mental world model. We suggest that the mechanism that critically increases adaptivity and flexibility is the tendency to contextualize. This tendency promotes local, context-encoding abstractions, which focus event- and concept-based planning and inference processes on the task and situation at hand. As a result, cognitive contextualization offers a solution to the frame problem: the need to select relevant features of the environment from the rich stream of sensorimotor signals. We draw evidence for our proposal from developmental psychology and neuroscience. Adopting a computational stance, we present evidence from cognitive modeling research which suggests that context sensitivity is a feature that is critical for maximizing the efficiency of cognitive processes. Finally, we turn to recent deep-learning architectures which independently demonstrate how context-sensitive memory can emerge in a self-organized learning system constrained with cognitively-inspired inductive biases.
PMID:39580009 | DOI:10.1016/j.neubiorev.2024.105948
Implementing deep learning on edge devices for snoring detection and reduction
Comput Biol Med. 2024 Nov 22;184:109458. doi: 10.1016/j.compbiomed.2024.109458. Online ahead of print.
ABSTRACT
This study introduces MinSnore, a novel deep learning model tailored for real-time snoring detection and reduction, specifically designed for deployment on low-configuration edge devices. By integrating MobileViTV3 blocks into the Dynamic MobileNetV3 backbone model architecture, MinSnore leverages both Convolutional Neural Networks (CNNs) and transformers to deliver enhanced feature representations with minimal computational overhead. The model was pre-trained on a diverse dataset of 46,349 audio files using the Self-Supervised Learning with Barlow Twins (SSL-BT) method, followed by fine-tuning on 17,355 segmented clips extracted from this dataset. MinSnore represents a significant breakthrough in snoring detection, achieving an accuracy of 96.37 %, precision of 96.31 %, recall of 94.12 %, and an F1-score of 95.02 %. When deployed on a single-board computer like a Raspberry Pi, the system demonstrated a reduction in snoring duration during real-world experiments. These results underscore the importance of this work in addressing sleep-related health issues through an efficient, low-cost, and highly accurate snoring mitigation solution.
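The SSL-BT pre-training step mentioned above relies on the Barlow Twins objective, which pushes the cross-correlation of embeddings from two augmented views toward the identity matrix. A compact sketch of that loss follows; the embedding size and the lambda weighting are illustrative values, not the authors' settings.

```python
import torch

def barlow_twins_loss(z1: torch.Tensor, z2: torch.Tensor, lambd: float = 5e-3) -> torch.Tensor:
    """Barlow Twins: decorrelate embedding dimensions while matching the two views."""
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)   # normalize each embedding dimension
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n                           # (d, d) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag

# Hypothetical usage with embeddings of two augmentations of the same snoring clips.
z_a, z_b = torch.randn(32, 128), torch.randn(32, 128)
print(barlow_twins_loss(z_a, z_b))
```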
PMID:39579667 | DOI:10.1016/j.compbiomed.2024.109458
Spatial resolution enhancement using deep learning improves chest disease diagnosis based on thick slice CT
NPJ Digit Med. 2024 Nov 23;7(1):335. doi: 10.1038/s41746-024-01338-8.
ABSTRACT
CT is crucial for diagnosing chest diseases, with image quality affected by spatial resolution. Thick-slice CT remains prevalent in practice due to cost considerations, yet its coarse spatial resolution may hinder accurate diagnoses. Our multicenter study develops a deep learning synthesis model with a Convolutional-Transformer hybrid encoder-decoder architecture for generating thin-slice CT from thick-slice CT at a single center (1576 participants) and assesses the synthetic CT at three cross-regional centers (1228 participants). The qualitative image quality of synthetic and real thin-slice CT is comparable (p = 0.16). Four radiologists' accuracy in diagnosing community-acquired pneumonia using synthetic thin-slice CT surpasses thick-slice CT (p < 0.05) and matches real thin-slice CT (p > 0.99). For lung nodule detection, sensitivity with synthetic thin-slice CT outperforms thick-slice CT (p < 0.001) and is comparable to real thin-slice CT (p > 0.05). These findings indicate the potential of our model to generate high-quality synthetic thin-slice CT as a practical alternative when real thin-slice CT is preferred but unavailable.
PMID:39580609 | DOI:10.1038/s41746-024-01338-8
Improved facial emotion recognition model based on a novel deep convolutional structure
Sci Rep. 2024 Nov 23;14(1):29050. doi: 10.1038/s41598-024-79167-8.
ABSTRACT
Facial Emotion Recognition (FER) is a very challenging task due to the varying nature of facial expressions, occlusions, illumination, pose variations, cultural and gender differences, and many other aspects that cause a drastic degradation in the quality of facial images. In this paper, an anti-aliased deep convolutional network (AA-DCN) model is developed and proposed to explore how anti-aliasing can improve the fidelity of facial emotion recognition. The AA-DCN model detects eight distinct emotions from image data. Furthermore, their features have been extracted using the proposed model and numerous classical deep learning algorithms. The proposed AA-DCN model has been applied to three different datasets to evaluate its performance: the Extended Cohn-Kanade (CK+) database, achieving an ultimate accuracy of 99.26% in 5 min 25 s; the Japanese Female Facial Expression (JAFFE) database, obtaining 98% accuracy in 8 min 13 s; and the Real-world Affective Faces (RAF) dataset, one of the most challenging FER datasets, reaching 82% in a low training time (12 min 2 s). The experimental results demonstrate that the anti-aliased DCN model significantly increases emotion recognition accuracy while mitigating the aliasing artifacts caused by the down-sampling layers.
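The abstract does not describe the AA-DCN layers in detail; the sketch below shows the general anti-aliased downsampling idea (a fixed low-pass blur applied before strided subsampling, in the spirit of BlurPool) as an assumed stand-in for the model's down-sampling layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Anti-aliased downsampling: blur with a fixed binomial kernel, then subsample
    with stride 2, replacing a plain strided pooling or convolution step."""
    def __init__(self, channels: int):
        super().__init__()
        k = torch.tensor([1., 2., 1.])
        k = k[:, None] * k[None, :]
        k = k / k.sum()
        self.register_buffer("kernel", k.expand(channels, 1, 3, 3).clone())
        self.channels = channels

    def forward(self, x):
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")
        return F.conv2d(x, self.kernel, stride=2, groups=self.channels)

# Hypothetical block: conv -> ReLU -> dense max pooling (stride 1) -> anti-aliased stride-2 step.
block = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                      nn.MaxPool2d(2, stride=1), BlurPool2d(32))
print(block(torch.randn(1, 3, 64, 64)).shape)  # spatial size roughly halved
```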
PMID:39580589 | DOI:10.1038/s41598-024-79167-8
Medical language model specialized in extracting cardiac knowledge
Sci Rep. 2024 Nov 23;14(1):29059. doi: 10.1038/s41598-024-80165-z.
ABSTRACT
The advent of the Transformer has significantly altered the course of research in Natural Language Processing (NLP) within the domain of deep learning, making Transformer-based studies the mainstream of subsequent NLP research. There has also been considerable advancement in domain-specific NLP research, including the development of specialized language models for medicine. These medical-specific language models were trained on medical data and demonstrated high performance. While these studies have treated the medical field as a single domain, in reality, medicine is divided into multiple departments, each requiring a high level of expertise and treated as a unique domain. Recognizing this, our research focuses on constructing a model specialized for cardiology within the medical sector. Our study encompasses the creation of open-source datasets, training, and model evaluation in this nuanced domain.
PMID:39580531 | DOI:10.1038/s41598-024-80165-z
A 3D dental model dataset with pre/post-orthodontic treatment for automatic tooth alignment
Sci Data. 2024 Nov 23;11(1):1277. doi: 10.1038/s41597-024-04138-7.
ABSTRACT
Traditional orthodontic treatment relies on subjective estimations of orthodontists and iterative communication with technicians to achieve desired tooth alignments. This process is time-consuming, complex, and highly dependent on the orthodontist's experience. With the development of artificial intelligence, there's a growing interest in leveraging deep learning methods to achieve tooth alignment automatically. However, the absence of publicly available datasets containing pre/post-orthodontic 3D dental models has impeded the advancement of intelligent orthodontic solutions. To address this limitation, this paper proposes the first public 3D orthodontic dental dataset, comprising 1,060 pairs of pre/post-treatment dental models sourced from 435 patients. The proposed dataset encompasses 3D dental models with diverse malocclusion, e.g., tooth crowding, deep overbite, and deep overjet; and comprehensive professional annotations, including tooth segmentation labels, tooth position information, and crown landmarks. We also present technical validations for tooth alignment and orthodontic effect evaluation. The proposed dataset is expected to contribute to improving the efficiency and quality of target tooth position design in clinical orthodontic treatment utilizing deep learning methods.
PMID:39580508 | DOI:10.1038/s41597-024-04138-7
An ultrasonography of thyroid nodules dataset with pathological diagnosis annotation for deep learning
Sci Data. 2024 Nov 23;11(1):1272. doi: 10.1038/s41597-024-04156-5.
ABSTRACT
Ultrasonography (US) of thyroid nodules is often time consuming and may be inconsistent between observers, with a low positivity rate for malignancy in biopsies. Even after determining the ultrasound Thyroid Imaging Reporting and Data System (TI-RADS) stage, fine needle aspiration biopsy (FNAB) is still required to obtain a definitive diagnosis. Although various deep learning methods have been developed in the medical field, they tend to be trained using TI-RADS reports as image labels. Here, we present a large US dataset with pathological diagnosis annotation for each case, designed for developing deep learning algorithms to directly infer histological status from thyroid ultrasound images. The dataset was collected from two retrospective cohorts and consists of 8508 US images from 842 cases. Additionally, we describe three deep learning models used as validation examples on this dataset.
PMID:39580501 | DOI:10.1038/s41597-024-04156-5
Enhancing advanced cervical cell categorization with cluster-based intelligent systems by a novel integrated CNN approach with skip mechanisms and GAN-based augmentation
Sci Rep. 2024 Nov 23;14(1):29040. doi: 10.1038/s41598-024-80260-1.
ABSTRACT
Cervical cancer is one of the biggest challenges in global health, creating a critical need for early detection technologies that could improve patient prognosis and inform treatment decisions. Early detection increases the chances of successful treatment and survival, as prompt diagnosis enables interventions that can dramatically reduce the rate of deaths attributed to this disease. Here, a customized Convolutional Neural Network (CNN) model is proposed for cervical cancerous cell detection. It includes three convolutional layers with increasing filter sizes and max-pooling layers, followed by dropout and dense layers for improved feature extraction and robust learning. Inspired by ResNet models, the design further incorporates skip connections into the CNN. By enabling direct feature transmission from earlier to later layers, skip connections enhance gradient flow and help preserve important spatial information. By boosting feature propagation, this integration increases the model's ability to recognize minute patterns in cervical cell images, hence increasing classification accuracy. In our methodology, the SIPaKMeD dataset has been employed, which contains 4049 cervical cell images arranged into five categories. To address class imbalance, Generative Adversarial Networks (GANs) have been applied for data augmentation; that is, synthetic images have been created that improve the diversity of the dataset and further enhance the robustness of the model. The present model is highly accurate in classifying five cervical cell types: koilocytes, superficial-intermediate, parabasal, dyskeratotic, and metaplastic, thus significantly enhancing early detection and diagnosis of cervical cancer. The model performs well, with a validation accuracy of 99.11% and a training accuracy of 99.82%. It is a reliable model for the diagnosis of cervical cancerous cells and advances computer-assisted cervical cancer detection systems.
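A hedged sketch of the skip-connection idea described above, written in PyTorch for brevity; the channel counts, the 1x1 projection on the shortcut, and the five-class head are assumptions, since the abstract gives only a high-level description of the architecture.

```python
import torch
import torch.nn as nn

class SkipConvBlock(nn.Module):
    """Conv block whose input is added back to its output (ResNet-style shortcut),
    so early spatial features propagate directly to later layers."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch))
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return torch.relu(self.body(x) + self.shortcut(x))

# Hypothetical stack mirroring "three conv layers with increasing filter sizes".
net = nn.Sequential(SkipConvBlock(3, 32), nn.MaxPool2d(2),
                    SkipConvBlock(32, 64), nn.MaxPool2d(2),
                    SkipConvBlock(64, 128), nn.AdaptiveAvgPool2d(1),
                    nn.Flatten(), nn.Dropout(0.5), nn.Linear(128, 5))
print(net(torch.randn(2, 3, 64, 64)).shape)  # (2, 5) -> five cervical cell classes
```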
PMID:39580498 | DOI:10.1038/s41598-024-80260-1
Deep learning based heat transfer simulation of the casting process
Sci Rep. 2024 Nov 23;14(1):29068. doi: 10.1038/s41598-024-80515-x.
ABSTRACT
To avoid the need for constitutive models, the computational intensity, and the time-consuming nature inherent in numerical simulations, a pioneering approach utilizing deep learning techniques has been adopted to swiftly predict temperature fields during the solidification phase of casting processes. This methodology involves the development of rapid prediction models based on modified U-Net network architectures, augmented by the integration of Inception and CBAM (Convolutional Block Attention Module) modules. The training set was constructed from 200 diverse geometric models, each containing three kinds of components (casting, mold, and chill), where the temperature fields at a specific time t_i were the input data, while those at the subsequent time point t_i+1 served as the corresponding labels. The geometric models were generated by eroding 2D arbitrary shapes with an erosion method, and their associated temperature fields were then obtained via FDM-based numerical simulation. The trained deep learning models exhibit proficiency in promptly forecasting temperature fields during the solidification process for arbitrarily shaped castings at different times. The average accuracy of the predicted outcomes reaches 94.5% with the absolute temperature error threshold set at 7 °C, and the prediction takes just one second per time step. Notably, these models are adept at handling multiple components with multiple materials within a geometric model, such as the casting, chill, and mold involved in the intricate casting process.
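The CBAM module named above applies channel attention followed by spatial attention to refine feature maps. A compact sketch is shown below; the reduction ratio and spatial kernel size are the usual defaults, assumed here rather than taken from the paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

print(CBAM(64)(torch.randn(1, 64, 32, 32)).shape)  # attention-refined feature map
```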
PMID:39580492 | DOI:10.1038/s41598-024-80515-x
Integration of the bulk transcriptome and single-cell transcriptome reveals efferocytosis features in lung adenocarcinoma prognosis and immunotherapy by combining deep learning
Cancer Cell Int. 2024 Nov 23;24(1):388. doi: 10.1186/s12935-024-03571-3.
ABSTRACT
BACKGROUND: Efferocytosis (ER) refers to the phagocytic clearance of cells undergoing programmed death, and studies have shown that it is closely related to tumor immune escape.
METHODS: This study was based on a comprehensive analysis of the TCGA, GEO, and CTRP databases. ER-related genes were collected from previous literature, univariate Cox regression was performed, and consensus clustering was used to categorize lung adenocarcinoma (LUAD) patients into two subgroups. Lasso regression and multivariate Cox regression analyses were used to construct ER-related prognostic features, and multiple immune infiltration algorithms were used to assess the correlation between the efferocytosis-related gene risk score (ERGRS) and the tumor microenvironment (TME). The key gene HAVCR1 was identified with the aid of deep learning. Finally, pan-cancer analysis of the key gene was performed, and in vitro experiments were conducted to verify the promoting effect of HAVCR1 on LUAD progression.
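A sketch of the risk-score construction step, using lifelines' penalized Cox regression as a stand-in for the authors' Lasso plus multivariate Cox pipeline; the gene columns, penalty strength, and median-split rule are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
# Hypothetical expression matrix of efferocytosis-related genes plus survival data.
df = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"gene_{i}" for i in range(5)])
df["time"] = rng.exponential(36, 200)          # overall survival in months
df["event"] = rng.integers(0, 2, 200)          # 1 = death observed

# L1-penalized (Lasso-style) Cox model selects genes and yields per-gene coefficients.
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(df, duration_col="time", event_col="event")

# Risk score = partial hazard; split at the median into high- and low-risk groups.
risk = cph.predict_partial_hazard(df)
groups = np.where(risk > risk.median(), "high", "low")
print(pd.Series(groups).value_counts())
```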
RESULTS: A total of 33 ER-related genes associated with the prognosis of LUAD were identified, and the ERGRS prognostic signature was successfully constructed to predict overall survival (OS) and treatment response in LUAD patients. The high-risk group was highly enriched in several oncogenic pathways, while the low-ERGRS group was highly enriched in several immune-related pathways. In addition, the high-ERGRS group had higher TMB, TNB, and TIDE scores and lower immune scores. The low-risk group had a better immunotherapeutic response and a lower likelihood of immune escape. Drug sensitivity analysis revealed that BRD-K92856060, monensin, and hexaminolevulinate may be potential therapeutic agents for the high-risk group. ERGRS was validated in several cohorts. In addition, HAVCR1 was identified as a key gene, and its knockdown in vitro significantly reduced the proliferation, migration, and invasion of lung adenocarcinoma cells.
CONCLUSION: Our study developed a novel prognostic signature of efferocytosis-related genes. This prognostic signature accurately predicted survival prognosis as well as treatment outcome in LUAD patients and explored the role of HAVCR1 in lung adenocarcinoma progression.
PMID:39580462 | DOI:10.1186/s12935-024-03571-3
Supervised multiple kernel learning approaches for multi-omics data integration
BioData Min. 2024 Nov 23;17(1):53. doi: 10.1186/s13040-024-00406-9.
ABSTRACT
BACKGROUND: Advances in high-throughput technologies have led to an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is a current challenge in biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach for considering the diverse nature of multi-omics inputs, despite being an underused tool in genomic data mining.
RESULTS: We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can outperform more complex, state-of-the-art, supervised multi-omics integrative approaches.
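A minimal illustration of one simple kernel-fusion strategy (an unweighted average of per-omics RBF kernels fed to a precomputed-kernel SVM); the authors explore several fusion strategies, so this is only a baseline sketch with made-up data.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical multi-omics blocks describing the same 100 samples.
omics = {"expression": rng.normal(size=(100, 500)),
         "methylation": rng.normal(size=(100, 300)),
         "cnv": rng.normal(size=(100, 200))}
y = rng.integers(0, 2, 100)

# One kernel per omics layer, fused here by simple averaging (the "meta-kernel").
kernels = [rbf_kernel(X, gamma=1.0 / X.shape[1]) for X in omics.values()]
K = np.mean(kernels, axis=0)

train, test = np.arange(80), np.arange(80, 100)
clf = SVC(kernel="precomputed").fit(K[np.ix_(train, train)], y[train])
print("test accuracy:", clf.score(K[np.ix_(test, train)], y[test]))
```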
CONCLUSION: Multiple kernel learning offers a natural framework for predictive models in multi-omics data. It proved to be a fast and reliable solution that can compete with and outperform more complex architectures. Our results offer a direction for bio-data mining research, biomarker discovery, and further development of methods for heterogeneous data integration.
PMID:39580456 | DOI:10.1186/s13040-024-00406-9
AI-powered detection and quantification of post-harvest physiological deterioration (PPD) in cassava using YOLO foundation models and K-means clustering
Plant Methods. 2024 Nov 23;20(1):178. doi: 10.1186/s13007-024-01309-w.
ABSTRACT
BACKGROUND: Post-harvest physiological deterioration (PPD) poses a significant challenge to the cassava industry, leading to substantial economic losses. This study aims to address this issue by developing a comprehensive framework in collaboration with cassava breeders.
RESULTS: Advanced deep learning (DL) techniques, such as the Segment Anything Model (SAM) and YOLO foundation models (YOLOv7, YOLOv8, YOLOv9, and YOLO-NAS), were used to accurately categorize PPD severity from RGB images captured by cameras or cell phones. YOLOv8 achieved the highest overall mean Average Precision (mAP) of 80.4%, demonstrating superior performance in detecting and classifying the different PPD levels. Although YOLO-NAS showed some instability during training, it demonstrated stronger performance in detecting the PPD_0 class, with a mAP of 91.3%. YOLOv7 exhibited the lowest performance across all classes, with an overall mAP of 75.5%. Despite challenges with similar color intensities in the image data, the combination of SAM, image processing techniques such as RGB color filtering, and machine learning (ML) algorithms was effective in removing yellow and gray color sections, significantly reducing the Mean Absolute Error (MAE) in PPD estimation from 20.01 to 15.50. Moreover, Artificial Intelligence (AI)-based algorithms allow for efficient analysis of large datasets, enabling rapid screening of cassava roots for PPD symptoms. This approach is much faster and more streamlined than the labor-intensive and time-consuming manual visual scoring methods.
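The K-means step in the title clusters pixel colors so that discolored regions can be separated and quantified; a hedged sketch of that idea follows. The cluster count and the rule for flagging "deteriorated" clusters are assumptions, not the authors' calibrated thresholds.

```python
import numpy as np
from sklearn.cluster import KMeans

def ppd_fraction(rgb_image: np.ndarray, n_clusters: int = 4) -> float:
    """Cluster pixel colors and report the fraction of pixels assigned to
    dark/brownish clusters as a crude proxy for PPD-affected root area."""
    pixels = rgb_image.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(pixels)
    # Heuristic: cluster centers with low brightness and low blue channel are
    # treated as deteriorated tissue (illustrative rule only).
    centers = km.cluster_centers_
    deteriorated = (centers.mean(axis=1) < 120) & (centers[:, 2] < 100)
    return float(np.isin(km.labels_, np.where(deteriorated)[0]).mean())

# Hypothetical cassava-root cross-section image (H x W x 3, uint8).
image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
print("estimated PPD fraction:", ppd_fraction(image))
```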
CONCLUSION: These results highlight the significant advancements in PPD detection and quantification in cassava samples using cutting-edge AI techniques. The integration of YOLO foundation models, alongside SAM and image processing methods, has demonstrated promising precision even in scenarios where experts struggle to differentiate closely related classes. This AI-powered model not only effectively streamlines the PPD assessment in the pre-breeding pipeline but also enhances the overall effectiveness of cassava breeding programs, facilitating the selection of PPD-resistant varieties through controlled screening. By improving the precision of PPD assessments, this research contributes to the broader goal of enhancing cassava productivity, quality, and resilience, ultimately supporting global food security efforts.
PMID:39580444 | DOI:10.1186/s13007-024-01309-w
Screening for severe coronary stenosis in patients with apparently normal electrocardiograms based on deep learning
BMC Med Inform Decis Mak. 2024 Nov 22;24(1):355. doi: 10.1186/s12911-024-02764-0.
ABSTRACT
BACKGROUND: Patients with severe coronary artery stenosis may present with apparently normal electrocardiograms (ECGs), making it difficult to detect adverse health conditions during routine screenings or physical examinations. Consequently, these patients might miss the optimal window for treatment.
METHODS: We aimed to develop an effective model to distinguish severe coronary stenosis from no or mild coronary stenosis in patients with apparently normal ECGs. A total of 392 patients, including 138 with severe stenosis, were selected for the study. Deep learning (DL) models were trained from scratch and using pre-trained parameters via transfer learning. These models were evaluated based on ECG data alone and in combination with clinical information, including age, sex, hypertension, diabetes, dyslipidemia and smoking status.
RESULTS: We found that DL models trained from scratch using ECG data alone achieved a specificity of 74.6% but exhibited low sensitivity (54.5%), comparable to the performance of logistic regression using clinical data. Adding clinical information to the ECG DL model trained from scratch improved sensitivity (90.9%) but reduced specificity (42.3%). The best performance was achieved by combining clinical information with the ECG transfer learning model, resulting in an area under the receiver operating characteristic curve (AUC) of 0.847, with 84.8% sensitivity and 70.4% specificity.
CONCLUSIONS: The findings demonstrate the effectiveness of DL models in identifying severe coronary stenosis in patients with apparently normal ECGs and validate an efficient approach utilizing existing ECG models. By employing transfer learning techniques, we can extract "deep features" that summarize the inherent information of ECGs with relatively low computational expense.
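A sketch of the "deep feature" idea described in the conclusion: a pre-trained ECG network is frozen and used as a fixed feature extractor, and its embedding is concatenated with clinical variables for a lightweight classifier. The backbone, input length, and data here are placeholders, since the authors' pre-trained model is not specified in the abstract.

```python
import torch
import torch.nn as nn
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder for a pre-trained 12-lead ECG encoder (frozen for feature extraction).
backbone = nn.Sequential(nn.Conv1d(12, 32, 7, stride=2), nn.ReLU(),
                         nn.Conv1d(32, 64, 7, stride=2), nn.ReLU(),
                         nn.AdaptiveAvgPool1d(1), nn.Flatten())
for p in backbone.parameters():
    p.requires_grad = False

def deep_features(ecg_batch: torch.Tensor) -> np.ndarray:
    backbone.eval()
    with torch.no_grad():
        return backbone(ecg_batch).numpy()

# Hypothetical data: 392 ECGs (12 leads x 5000 samples) and 6 clinical variables.
ecg = torch.randn(392, 12, 5000)
clinical = np.random.rand(392, 6)   # age, sex, hypertension, diabetes, dyslipidemia, smoking
y = np.random.randint(0, 2, 392)    # 1 = severe coronary stenosis

X = np.hstack([deep_features(ecg), clinical])
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```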
PMID:39578851 | DOI:10.1186/s12911-024-02764-0
Multimodal machine learning for language and speech markers identification in mental health
BMC Med Inform Decis Mak. 2024 Nov 22;24(1):354. doi: 10.1186/s12911-024-02772-0.
ABSTRACT
BACKGROUND: There are numerous papers focusing on diagnosing mental health disorders using unimodal and multimodal approaches. However, our literature review shows that the majority of these studies either use unimodal approaches to diagnose a variety of mental disorders or employ multimodal approaches to diagnose a single mental disorder. In this research, we combine these approaches: we first identify and compile an extensive list of mental health disorder markers for a wide range of mental illnesses, drawn from both unimodal and multimodal methods, and then use this list to determine whether the multimodal approach can outperform the unimodal approaches.
METHODS: For this study, we used the well-known and robust multimodal DAIC-WOZ dataset derived from clinical interviews, focusing on the text and audio modalities. First, we constructed two unimodal models to analyze text and audio data, respectively, using feature extraction based on the extensive list of mental disorder markers we identified and compiled from related and earlier studies. For our unimodal text model, we also propose an initial pragmatic binary label creation process. Then, we employed an early fusion strategy to combine our text and audio features before model processing. The fused feature set was given as input to various baseline machine and deep learning algorithms, including Support Vector Machines, Logistic Regression, Random Forests, and fully connected neural network classifiers (dense layers). Ultimately, the performance of our models was evaluated using accuracy, AUC-ROC score, and two F1 metrics: one for the prediction of positive cases and one for the prediction of negative cases.
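An outline of the early-fusion step described above: per-interview text and audio feature vectors are concatenated before being passed to a baseline classifier. The feature names, dimensions, and sample counts are placeholders, not the DAIC-WOZ values.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 180                                   # hypothetical number of interviews
text_feats = rng.normal(size=(n, 40))     # e.g., counts of linguistic markers
audio_feats = rng.normal(size=(n, 25))    # e.g., prosodic / acoustic markers
y = rng.integers(0, 2, n)                 # marker present / absent

# Early fusion: concatenate modality features before model training.
X = np.hstack([text_feats, audio_feats])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = SVC(probability=True).fit(X_tr, y_tr)
print("AUC-ROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```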
RESULTS: Overall, the unimodal text models achieved an accuracy ranging from 78% to 87% and an AUC-ROC score between 85% and 93%, while the unimodal audio models attained an accuracy of 64% to 72% and AUC-ROC scores of 53% to 75%. The experimental results indicated that our multimodal models achieved comparable accuracy (ranging from 80% to 87%) and AUC-ROC scores (between 84% and 93%) to those of the unimodal text models. However, the majority of the multimodal models managed to outperform the unimodal models in F1 scores, particularly in the F1 score of the positive class (F1 of 1s), which reflects how well the models perform in identifying the presence of a marker.
CONCLUSIONS: In conclusion, by refining the binary label creation process and by improving the feature engineering process of the unimodal acoustic model, we argue that the multimodal model can outperform both unimodal approaches. This study underscores the importance of multimodal integration in the field of mental health diagnostics and sets the stage for future research to explore more sophisticated fusion techniques and deeper learning models.
PMID:39578814 | DOI:10.1186/s12911-024-02772-0
miRStart 2.0: enhancing miRNA regulatory insights through deep learning-based TSS identification
Nucleic Acids Res. 2024 Nov 23:gkae1086. doi: 10.1093/nar/gkae1086. Online ahead of print.
ABSTRACT
MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression by binding to the 3'-untranslated regions of target mRNAs, influencing various biological processes at the post-transcriptional level. Identifying miRNA transcription start sites (TSSs) and transcription factors' (TFs) regulatory roles is crucial for elucidating miRNA function and transcriptional regulation. miRStart 2.0 integrates over 4500 high-throughput datasets across five data types, utilizing a multi-modal approach to annotate 28 828 putative TSSs for 1745 human and 1181 mouse miRNAs, supported by sequencing-based signals. Over 6 million tissue-specific TF-miRNA interactions, integrated from ChIP-seq data, are supplemented by DNase hypersensitivity and UCSC conservation data, with network visualizations. Our deep learning-based model outperforms existing tools in miRNA TSS prediction, achieving the most overlaps with both cell-specific and non-cell-specific validated TSSs. The user-friendly web interface and visualization tools make miRStart 2.0 easily accessible to researchers, enabling efficient identification of miRNA upstream regulatory elements in relation to their TSSs. This updated database provides systems-level insights into gene regulation and disease mechanisms, offering a valuable resource for translational research, facilitating the discovery of novel therapeutic targets and precision medicine strategies. miRStart 2.0 is now accessible at https://awi.cuhk.edu.cn/∼miRStart2.
PMID:39578697 | DOI:10.1093/nar/gkae1086
Human essential gene identification based on feature fusion and feature screening
IET Syst Biol. 2024 Nov 22. doi: 10.1049/syb2.12105. Online ahead of print.
ABSTRACT
Essential genes are necessary to sustain the life of a species under adequate nutritional conditions. These genes have attracted significant attention for their potential as drug targets, especially in developing broad-spectrum antibacterial drugs. However, studying essential genes remains challenging due to their variability under specific environmental conditions. In this study, the authors aim to develop a powerful prediction model for identifying essential genes in humans. The authors first obtained essential gene data from human cancer cell lines and characterised gene sequences using 7 feature encoding methods such as Kmer, the Composition of K-spaced Nucleic Acid Pairs, and Z-curve. Subsequently, feature fusion and feature optimisation strategies were employed to select the most impactful features. Finally, machine learning algorithms were applied to construct the prediction models and evaluate their performance. The best single-feature-based model achieved an area under the Receiver Operating Characteristic curve (AUC) of 0.830. After fusing and filtering these features, the classical machine learning models achieved a highest AUC of 0.823, while the deep learning model reached 0.860. The results show that, compared to using individual features, feature fusion and feature optimisation strategies significantly improved model performance. Moreover, the study provides an advantageous method for essential gene identification compared with existing methods.
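A small sketch of one of the sequence-encoding schemes named above (Kmer frequency counting); the other encoders (CKSNAP, Z-curve) follow the same pattern of mapping a sequence to a fixed-length numeric vector. The k value and example sequence are illustrative.

```python
from itertools import product
import numpy as np

def kmer_features(seq: str, k: int = 3) -> np.ndarray:
    """Frequency of every possible k-mer in a DNA sequence (fixed-length vector of 4**k)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:                 # skip k-mers containing ambiguous bases
            counts[index[kmer]] += 1
    total = counts.sum()
    return counts / total if total else counts

vec = kmer_features("ATGCGTACGTTAGCATGCAAGT")
print(vec.shape, vec.sum())               # (64,) with frequencies summing to 1
```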
PMID:39578676 | DOI:10.1049/syb2.12105
Crop classification in the middle reaches of the Hei River based on model transfer
Sci Rep. 2024 Nov 22;14(1):28964. doi: 10.1038/s41598-024-80327-z.
ABSTRACT
Crop classification using remote sensing technology is highly important for monitoring agricultural resources and managing water usage, especially in water-scarce regions such as the Hei River basin. Crop classification requires a substantial number of labeled samples, but collecting labeled samples demands significant resources, and sample data may not be available for some years. To classify crops in sample-free years in the middle reaches of the Hei River, we generated multisource spectral data (MSSD) based on a spectral library and sample data. We pre-trained a model using labeled samples and then fine-tuned the model with MSSD to complete the crop classification for the years without samples. We conducted experiments using three CNN-based deep learning models and a machine learning model (random forest, RF). The experimental results indicate that in the model transfer experiments, using a fine-tuned model yields accurate classification results, with overall accuracy exceeding 90%. When the amount of labeled sample data is limited, fine-tuning the model based on MSSD can enhance the accuracy of crop classification. Overall, fine-tuning models based on MSSD can significantly enhance the accuracy of model transfer and reduce the reliance of deep learning models on large-scale sample datasets. This crop classification method for the middle reaches of the Hei River can provide data support for local resource utilization and policy formulation.
PMID:39578651 | DOI:10.1038/s41598-024-80327-z