Deep learning
Automatic classification of spinal osteosarcoma and giant cell tumor of bone using optimized DenseNet
J Bone Oncol. 2024 May 11;46:100606. doi: 10.1016/j.jbo.2024.100606. eCollection 2024 Jun.
ABSTRACT
OBJECTIVE: This study explores an optimized deep-learning model for automatically classifying spinal osteosarcoma and giant cell tumor of bone, aiming to provide a reliable method for distinguishing between these challenging diagnoses in medical imaging.
METHODS: This research employs an optimized DenseNet model with a self-attention mechanism to enhance feature extraction and reduce misclassification when differentiating spinal osteosarcoma and giant cell tumors. The model uses multi-scale feature map extraction to improve classification accuracy. The paper also examines the practical use of Gradient-weighted Class Activation Mapping (Grad-CAM) for medical image classification, focusing on its application in diagnosing spinal osteosarcoma and giant cell tumors. The results demonstrate that Grad-CAM visualization improved the performance of the deep learning model, yielding an overall accuracy of 85.61%. Images of these conditions are visualized with Grad-CAM, and the corresponding class activation maps indicate the tumor regions on which the model focuses during prediction.
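The Grad-CAM step described here is standard: class-specific gradients are pooled over the feature maps of the last convolutional block and used to weight those maps. As a hedged illustration only (the authors' optimized DenseNet with self-attention is not public), the sketch below applies Grad-CAM to torchvision's stock densenet121 with an assumed two-class head.

```python
# Minimal Grad-CAM sketch for a DenseNet classifier (illustrative only; torchvision's
# densenet121 stands in for the paper's optimized model, and the two-class head,
# input size, and target layer are assumptions).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.densenet121(weights=None)
model.classifier = torch.nn.Linear(model.classifier.in_features, 2)  # osteosarcoma vs. giant cell tumor
model.eval()

activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["feat"] = output          # feature maps of the hooked block

def bwd_hook(module, grad_input, grad_output):
    gradients["feat"] = grad_output[0]    # gradients flowing back into those maps

layer = model.features.denseblock4        # last dense block's output feature maps
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)           # placeholder image tensor
score = model(x)[0].max()                 # score of the predicted class
score.backward()

weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)    # global-average-pooled gradients
cam = F.relu((weights * activations["feat"]).sum(dim=1))      # weighted sum over channels
cam = F.interpolate(cam.unsqueeze(1), size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)       # normalized heatmap in [0, 1]
```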
RESULTS: The model achieves an overall accuracy of 80% or higher, with sensitivity and specificity both exceeding 80%. The average area under the curve (AUC) for spinal osteosarcoma and giant cell tumors is 0.814 and 0.882, respectively. The model provides meaningful support to orthopedic physicians in developing treatment and care plans.
CONCLUSION: The DenseNet-based automatic classification model accurately distinguishes spinal osteosarcoma from giant cell tumors. This study contributes to medical image analysis, providing a valuable tool for clinicians in accurate diagnostic classification. Future efforts will focus on expanding the dataset and refining the algorithm to enhance the model's applicability in diverse clinical settings.
PMID:38778836 | PMC:PMC11109027 | DOI:10.1016/j.jbo.2024.100606
Computer-vision-based artificial intelligence for detection and recognition of instruments and organs during radical laparoscopic gastrectomy for gastric cancer: a multicenter study
Zhonghua Wei Chang Wai Ke Za Zhi. 2024 May 25;27(5):464-470. doi: 10.3760/cma.j.cn441530-20240125-00041.
ABSTRACT
Objective: To investigate the feasibility and accuracy of computer vision-based artificial intelligence technology in detecting and recognizing instruments and organs during radical laparoscopic gastrectomy for gastric cancer. Methods: Eight complete laparoscopic distal radical gastrectomy videos were collected from four large tertiary hospitals in China (First Medical Center of Chinese PLA General Hospital [three cases], Liaoning Cancer Hospital [two cases], Liyang Branch of Jiangsu Province People's Hospital [two cases], and Fudan University Shanghai Cancer Center [one case]). PR software was used to extract frames every 5-10 seconds and convert them into image frames. To ensure quality, obvious duplicates and blurred image frames were removed manually. After conversion and deduplication, 3369 frame images remained at a resolution of 1920×1080 pixels. LabelMe was used for instance segmentation of the images into the following 23 categories: veins, arteries, sutures, needle holders, ultrasonic knives, suction devices, bleeding, colon, forceps, gallbladder, small gauze, Hem-o-lok, Hem-o-lok appliers, electrocautery hooks, small intestine, hepatogastric ligaments, liver, omentum, pancreas, spleen, surgical staplers, stomach, and trocars. The frame images were randomly allocated to training and validation sets in a 9:1 ratio. The YOLOv8 deep learning framework was used for model training and validation. Precision, recall, average precision (AP), and mean average precision (mAP) were used to evaluate detection and recognition accuracy. Results: The training set contained 3032 frame images comprising 30,895 instance segmentation counts across the 23 categories. The validation set contained 337 frame images comprising 3407 instance segmentation counts. The YOLOv8m model was used for training. The training loss curve decreased smoothly and gradually as the number of iterations increased. In the training set, the AP values of all 23 categories were above 0.90, with a mAP of 0.99, whereas in the validation set, the mAP across the 23 categories was 0.82. For individual categories, the AP values for ultrasonic knives, needle holders, forceps, gallbladders, small pieces of gauze, and surgical staplers were 0.96, 0.94, 0.91, 0.91, 0.91, and 0.91, respectively. The trained model was successfully applied to inference on a 5-minute video segment of laparoscopic gastroenterostomy suturing. Conclusion: The primary finding of this multicenter study is that computer vision can detect organs and instruments in various scenarios of radical laparoscopic gastrectomy for gastric cancer efficiently, accurately, and in real time.
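For readers unfamiliar with the framework, the sketch below shows what YOLOv8m instance-segmentation training and video inference of the kind described above typically look like with the ultralytics package; the dataset YAML, image size, and other hyperparameters are assumptions, not the authors' configuration.

```python
# Hedged sketch of YOLOv8m instance-segmentation training and inference.
# "surgery.yaml" is a hypothetical dataset file listing the 23 categories
# (veins, arteries, sutures, needle holders, ...) and the 9:1 train/val split.
from ultralytics import YOLO

model = YOLO("yolov8m-seg.pt")                 # medium segmentation variant, COCO-pretrained

model.train(data="surgery.yaml", epochs=100, imgsz=1088, batch=8)  # assumed hyperparameters

metrics = model.val()                          # reports precision, recall, AP, and mAP per class

# Frame-by-frame inference on a hypothetical video clip of gastroenterostomy suturing.
results = model("gastrectomy_clip.mp4", stream=True)
for r in results:
    print(r.boxes.cls, r.masks is not None)    # predicted class ids and whether masks were produced
```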
PMID:38778686 | DOI:10.3760/cma.j.cn441530-20240125-00041
Enhancing clinical utility: deep learning-based embryo scoring model for non-invasive aneuploidy prediction
Reprod Biol Endocrinol. 2024 May 22;22(1):58. doi: 10.1186/s12958-024-01230-w.
ABSTRACT
BACKGROUND: The best method for selecting embryos by ploidy is preimplantation genetic testing for aneuploidies (PGT-A). However, it requires considerable labour, money, and experience, so more accessible, non-invasive techniques are still needed. Analyses driven by artificial intelligence have recently been proposed to automate and objectify image assessments.
METHODS: In the present retrospective study, a total of 3448 biopsied blastocysts from 979 time-lapse (TL)-PGT cycles were analyzed. The intelligent data analysis (iDA) score (iDAScore), a deep learning algorithm used in TL incubators, assigned each blastocyst a score between 1.0 and 9.9.
RESULTS: Significant differences in iDAScore were observed among blastocysts of different ploidy. Additionally, multivariate logistic regression analysis showed that higher scores were significantly associated with euploidy (p < 0.001). The area under the curve (AUC) of iDAScore alone for predicting euploid embryos was 0.612, but rose to 0.688 when clinical and embryonic characteristics were added.
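As an illustration of the kind of analysis reported here (iDAScore alone versus iDAScore plus clinical and embryonic covariates), the hedged sketch below fits two logistic regression models on synthetic data and compares their AUCs; the covariates shown are assumptions, not the study's variables.

```python
# Synthetic stand-in for the study data: one row per biopsied blastocyst.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "idascore": rng.uniform(1.0, 9.9, n),
    "maternal_age": rng.normal(35, 4, n),      # assumed covariate
    "day_of_biopsy": rng.choice([5, 6], n),    # assumed covariate
})
logit = -2 + 0.3 * df["idascore"] - 0.1 * (df["maternal_age"] - 35)
df["euploid"] = rng.random(n) < 1 / (1 + np.exp(-logit))   # synthetic euploidy labels

for name, cols in [("iDAScore only", ["idascore"]),
                   ("iDAScore + clinical", ["idascore", "maternal_age", "day_of_biopsy"])]:
    X_tr, X_te, y_tr, y_te = train_test_split(df[cols], df["euploid"], test_size=0.3,
                                              random_state=0, stratify=df["euploid"])
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"{name}: AUC = {roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]):.3f}")
```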
CONCLUSIONS: This study provides additional evidence to strengthen the clinical applicability of iDAScore. It may offer a non-invasive and inexpensive alternative for patients who have no blastocyst available for biopsy or who are economically disadvantaged. However, accurate determination of embryo ploidy still depends on next-generation sequencing (NGS) analysis.
PMID:38778410 | DOI:10.1186/s12958-024-01230-w
Testing the generalizability and effectiveness of deep learning models among clinics: sperm detection as a pilot study
Reprod Biol Endocrinol. 2024 May 22;22(1):59. doi: 10.1186/s12958-024-01232-8.
ABSTRACT
BACKGROUND: Deep learning has been increasingly investigated for assisting clinical in vitro fertilization (IVF). The first technical step in many tasks is to visually detect and locate sperm, oocytes, and embryos in images. For clinical deployment of such deep learning models, different clinics use different image acquisition hardware and different sample preprocessing protocols, raising the concern over whether the reported accuracy of a deep learning model by one clinic could be reproduced in another clinic. Here we aim to investigate the effect of each imaging factor on the generalizability of object detection models, using sperm analysis as a pilot example.
METHODS: Ablation studies were performed using state-of-the-art models for detecting human sperm to quantitatively assess how model precision (false-positive detection) and recall (missed detection) were affected by imaging magnification, imaging mode, and sample preprocessing protocols. The results led to the hypothesis that the richness of image acquisition conditions in a training dataset deterministically affects model generalizability. The hypothesis was tested by first enriching the training dataset with a wide range of imaging conditions, then validated through internal blind tests on new samples and external multi-center clinical validations.
RESULTS: Ablation experiments revealed that removing subsets of data from the training dataset significantly reduced model precision. Removing raw sample images from the training dataset caused the largest drop in model precision, whereas removing 20x images caused the largest drop in model recall. By incorporating different imaging and sample preprocessing conditions into a rich training dataset, the model achieved an intraclass correlation coefficient (ICC) of 0.97 (95% CI: 0.94-0.99) for precision and an ICC of 0.97 (95% CI: 0.93-0.99) for recall. Multi-center clinical validation showed no significant differences in model precision or recall across different clinics and applications.
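The ICC values quoted above compare model-derived and manual measurements across samples. The sketch below shows one common way to compute such an ICC with the pingouin package on a hypothetical long-format table; the data and the choice of ICC form are assumptions.

```python
# Hedged ICC sketch: each sample is "rated" twice, once by the model and once manually.
import pandas as pd
import pingouin as pg

df = pd.DataFrame({
    "sample": [1, 1, 2, 2, 3, 3, 4, 4],
    "rater":  ["model", "manual"] * 4,
    "precision": [0.95, 0.93, 0.90, 0.91, 0.97, 0.96, 0.88, 0.90],   # hypothetical values
})

icc = pg.intraclass_corr(data=df, targets="sample", raters="rater", ratings="precision")
print(icc[["Type", "ICC", "CI95%"]])   # e.g. report ICC2 (two-way random, absolute agreement)
```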
CONCLUSIONS: The results validated the hypothesis that the richness of data in the training dataset is a key factor impacting model generalizability. These findings highlight the importance of diversity in a training dataset for model evaluation and suggest that future deep learning models in andrology and reproductive medicine should incorporate comprehensive feature sets for enhanced generalizability across clinics.
PMID:38778327 | DOI:10.1186/s12958-024-01232-8
The innovation of AI-based software in oral diseases: clinical-histopathological correlation diagnostic accuracy primary study
BMC Oral Health. 2024 May 22;24(1):598. doi: 10.1186/s12903-024-04347-x.
ABSTRACT
BACKGROUND: Machine learning (ML), a branch of artificial intelligence (AI), could help clinicians and oral pathologists tackle diagnostic problems involving potentially malignant lesions, oral cancer, periodontal diseases, salivary gland disease, oral infections, immune-mediated disease, and others. AI can detect micro-features beyond the human eye and provide solutions in critical diagnostic cases.
OBJECTIVE: The objective of this study was to develop software, fed with all the necessary data, to act as an AI-based program for diagnosing oral diseases. Our research question was: can we develop computer-aided software for accurate diagnosis of oral diseases based on clinical and histopathological data inputs?
METHOD: The study sample included clinical images, patient symptoms, radiographic images, histopathological images, and texts for the oral diseases of interest (premalignant lesions, oral cancer, salivary gland neoplasms, immune-mediated oral mucosal lesions, and oral reactive lesions). In total, 28 oral diseases were enrolled, retrieved from the archives of the oral and maxillofacial pathology department. The dataset comprised 11,200 texts and 3000 images (2800 images were used as training data for the program, 100 images as test data, and 100 cases for calculating accuracy, sensitivity, and specificity).
RESULTS: The correct diagnosis rates for group 1 (software users), group 2 (microscope users), and group 3 (hybrid) were 87%, 90.6%, and 95%, respectively. Inter-observer reliability was assessed by calculating Cronbach's alpha and the intraclass correlation coefficient; the values for groups 1, 2, and 3 were 0.934, 0.712, and 0.703, respectively. All groups showed acceptable reliability, especially the Diagnosis Oral Diseases Software (DODS) group, which showed a higher reliability value than the other groups. However, the accuracy, sensitivity, and specificity of the software were lower than those of oral pathologists (master's degree holders).
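For reference, the sketch below shows the standard Cronbach's alpha calculation used for reliability figures like these, applied to a hypothetical ratings matrix (rows are cases, columns are observers in one group); it is not the study's data or code.

```python
# Minimal Cronbach's alpha sketch on a hypothetical ratings matrix.
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """ratings: shape (n_cases, n_observers)."""
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1)        # variance of each observer's ratings
    total_var = ratings.sum(axis=1).var(ddof=1)    # variance of the per-case totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy example: 5 cases scored by 3 observers on the same scale.
ratings = np.array([[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2], [4, 4, 5]], dtype=float)
print(f"Cronbach's alpha = {cronbach_alpha(ratings):.3f}")
```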
CONCLUSION: The correct diagnosis rate of DODS was comparable to that of oral pathologists using standard microscopic examination. The DODS program could be utilized as a diagnostic guidance tool with high reliability and accuracy.
PMID:38778322 | DOI:10.1186/s12903-024-04347-x
Automated tear film break-up time measurement for dry eye diagnosis using deep learning
Sci Rep. 2024 May 22;14(1):11723. doi: 10.1038/s41598-024-62636-5.
ABSTRACT
In the realm of ophthalmology, precise measurement of tear film break-up time (TBUT) plays a crucial role in diagnosing dry eye disease (DED). This study introduces an automated approach utilizing artificial intelligence (AI) to mitigate subjectivity and enhance the reliability of TBUT measurement. We employed a dataset of 47 slit lamp videos for development, while a test dataset of 20 slit lamp videos was used to evaluate the proposed approach. The multistep approach for TBUT estimation uses a Dual-Task Siamese Network to classify video frames into tear film breakup or non-breakup categories. A postprocessing step then applies a Gaussian filter to smooth the instantaneous breakup/non-breakup predictions, and a threshold on the smoothed predictions identifies the onset of tear film breakup. On the evaluation dataset, the proposed method achieves precise breakup/non-breakup classification of video frames with an area under the curve of 0.870. At the video level, we observed a strong Pearson correlation coefficient (r) of 0.81 between TBUT assessments obtained with our approach and the ground truth. These findings underscore the potential of AI-based approaches for quantifying TBUT, presenting a promising avenue for advancing diagnostic methodologies in ophthalmology.
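The post-processing described above (Gaussian smoothing of per-frame breakup predictions followed by thresholding) can be sketched as follows; the sigma, threshold, frame rate, and synthetic probabilities are assumptions, and the clip is assumed to start at the last blink.

```python
# Hedged sketch of the smoothing-and-thresholding step for TBUT estimation.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def estimate_tbut(frame_probs, fps=30.0, sigma=5.0, threshold=0.5):
    """frame_probs: per-frame breakup probability from the classifier, in [0, 1]."""
    smoothed = gaussian_filter1d(frame_probs, sigma=sigma)   # suppress frame-level jitter
    above = np.flatnonzero(smoothed >= threshold)            # frames flagged as breakup
    if above.size == 0:
        return None                                          # no breakup detected in the clip
    return above[0] / fps                                    # seconds from clip start to first breakup

# Toy usage: synthetic probabilities rising after ~3 s at 30 fps.
probs = np.concatenate([np.random.uniform(0.0, 0.3, 90), np.random.uniform(0.6, 1.0, 60)])
print(f"Estimated TBUT: {estimate_tbut(probs):.2f} s")
```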
PMID:38778145 | DOI:10.1038/s41598-024-62636-5
A deep learning model for brain segmentation across pediatric and adult populations
Sci Rep. 2024 May 22;14(1):11735. doi: 10.1038/s41598-024-61798-6.
ABSTRACT
Automated quantification of brain tissues on MR images has greatly contributed to the diagnosis and follow-up of neurological pathologies across various life stages. However, existing solutions are specifically designed for certain age ranges, limiting their applicability in monitoring brain development from infancy to late adulthood. This retrospective study aims to develop and validate a brain segmentation model across pediatric and adult populations. First, we trained a deep learning model to segment tissues and brain structures using T1-weighted MR images from 390 patients (age range: 2-81 years) across four different datasets. Subsequently, the model was validated on a cohort of 280 patients from six distinct test datasets (age range: 4-90 years). In the initial experiment, the proposed deep learning-based pipeline, icobrain-dl, demonstrated segmentation accuracy comparable to both pediatric and adult-specific models across diverse age groups. Subsequently, we evaluated intra- and inter-scanner variability in measurements of various tissues and structures in both pediatric and adult populations computed by icobrain-dl. Results demonstrated significantly higher reproducibility compared to similar brain quantification tools, including childmetrix, FastSurfer, and the medical device icobrain v5.9 (p-value < 0.01). Finally, we explored the potential clinical applications of icobrain-dl measurements in diagnosing pediatric patients with Cerebral Visual Impairment and adult patients with Alzheimer's Disease.
PMID:38778071 | DOI:10.1038/s41598-024-61798-6
A bi-directional segmentation method for prostate ultrasound images under semantic constraints
Sci Rep. 2024 May 22;14(1):11701. doi: 10.1038/s41598-024-61238-5.
ABSTRACT
Due to the lack of sufficient labeled data for the prostate and the extensive, complex semantic information in ultrasound images, accurately and quickly segmenting the prostate in transrectal ultrasound (TRUS) images remains a challenging task. In this context, this paper proposes a solution for TRUS image segmentation using an end-to-end bidirectional semantic constraint method, the BiSeC model. The experimental results show that, compared with classic and popular deep learning methods, this method achieves better segmentation performance, with a Dice Similarity Coefficient (DSC) of 96.74% and an Intersection over Union (IoU) of 93.71%. Our model achieves a good balance between actual boundaries and noise areas, reducing costs while ensuring the accuracy and speed of segmentation.
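For clarity, the sketch below shows how the two reported metrics, DSC and IoU, are computed on binary masks; it is generic metric code, not the BiSeC model.

```python
# Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) on binary masks.
import numpy as np

def dice_and_iou(pred, target, eps=1e-8):
    """pred, target: binary masks of identical shape (1 = prostate, 0 = background)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    iou = (intersection + eps) / (union + eps)
    return dice, iou

# Toy example on a 4x4 mask: Dice = 2*4/(8+4) ≈ 0.667, IoU = 4/8 = 0.5.
pred = np.array([[0, 1, 1, 0]] * 4)
gt   = np.array([[0, 1, 0, 0]] * 4)
print(dice_and_iou(pred, gt))
```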
PMID:38778034 | DOI:10.1038/s41598-024-61238-5
BranchLabelNet: Anatomical Human Airway Labeling Approach using a Dividing-and-Grouping Multi-Label Classification
Med Biol Eng Comput. 2024 May 23. doi: 10.1007/s11517-024-03119-7. Online ahead of print.
ABSTRACT
Anatomical airway labeling is crucial for precisely identifying airways displaying symptoms such as constriction, increased wall thickness, and modified branching patterns, facilitating the diagnosis and treatment of pulmonary ailments. This study introduces an innovative airway labeling methodology, BranchLabelNet, which accounts for the fractal nature of airways and inherent hierarchical branch nomenclature. In developing this methodology, branch-related parameters, including position vectors, generation levels, branch lengths, areas, perimeters, and more, are extracted from a dataset of 1000 chest computed tomography (CT) images. To effectively manage this intricate branch data, we employ an n-ary tree structure that captures the complicated relationships within the airway tree. Subsequently, we employ a divide-and-group deep learning approach for multi-label classification, streamlining the anatomical airway branch labeling process. Additionally, we address the challenge of class imbalance in the dataset by incorporating the Tomek Links algorithm to maintain model reliability and accuracy. Our proposed airway labeling method provides robust branch designations and achieves an impressive average classification accuracy of 95.94% across fivefold cross-validation. This approach is adaptable for addressing similar complexities in general multi-label classification problems within biomedical systems.
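The Tomek Links step mentioned above removes borderline majority-class samples to counter class imbalance. A hedged sketch with imbalanced-learn on placeholder branch features is given below; the feature matrix and labels are not from the study's pipeline.

```python
# Hedged Tomek Links cleaning sketch on hypothetical branch-feature vectors.
import numpy as np
from imblearn.under_sampling import TomekLinks

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                             # e.g. position, generation level, length, area, ...
y = rng.choice([0, 1], size=500, p=[0.85, 0.15])          # imbalanced placeholder labels

tl = TomekLinks()                                         # removes majority samples that form Tomek links
X_res, y_res = tl.fit_resample(X, y)
print(X.shape, "->", X_res.shape)                         # slightly smaller set with cleaner class boundaries
```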
PMID:38777935 | DOI:10.1007/s11517-024-03119-7
Standalone deep learning versus experts for diagnosis lung cancer on chest computed tomography: a systematic review
Eur Radiol. 2024 May 22. doi: 10.1007/s00330-024-10804-6. Online ahead of print.
ABSTRACT
PURPOSE: To compare the diagnostic performance of standalone deep learning (DL) algorithms and human experts in lung cancer detection on chest computed tomography (CT) scans.
MATERIALS AND METHODS: We searched PubMed, Embase, and Web of Science from inception until November 2023. We focused on adult lung cancer patients and compared the efficacy of DL algorithms and expert radiologists in disease diagnosis on CT scans. Quality assessment was performed using QUADAS-2, QUADAS-C, and CLAIM. Bivariate random-effects and subgroup analyses were performed for tasks (malignancy classification vs invasiveness classification), imaging modalities (CT vs low-dose CT [LDCT] vs high-resolution CT), study region, software used, and publication year.
RESULTS: We included 20 studies on various aspects of lung cancer diagnosis on CT scans. Quantitatively, DL algorithms exhibited superior sensitivity (82%) and specificity (75%) compared to human experts (sensitivity 81%, specificity 69%). However, the difference in specificity was statistically significant, whereas the difference in sensitivity was not statistically significant. The DL algorithms' performance varied across different imaging modalities and tasks, demonstrating the need for tailored optimization of DL algorithms. Notably, DL algorithms matched experts in sensitivity on standard CT, surpassing them in specificity, but showed higher sensitivity with lower specificity on LDCT scans.
CONCLUSION: DL algorithms demonstrated improved accuracy over human readers in malignancy and invasiveness classification on CT scans. However, their performance varies by imaging modality, underlining the importance of continued research to fully assess DL algorithms' diagnostic effectiveness in lung cancer.
CLINICAL RELEVANCE STATEMENT: DL algorithms have the potential to refine lung cancer diagnosis on CT, matching human experts in sensitivity and surpassing them in specificity. These findings call for further DL optimization across imaging modalities, aiming to advance clinical diagnostics and patient outcomes.
KEY POINTS: Lung cancer diagnosis by CT is challenging and can be improved with AI integration. DL shows higher accuracy in lung cancer detection on CT than human experts. Enhanced DL accuracy could lead to improved lung cancer diagnosis and outcomes.
PMID:38777902 | DOI:10.1007/s00330-024-10804-6
A Self-supervised Learning-Based Fine-Grained Classification Model for Distinguishing Malignant From Benign Subcentimeter Solid Pulmonary Nodules
Acad Radiol. 2024 May 21:S1076-6332(24)00287-3. doi: 10.1016/j.acra.2024.05.002. Online ahead of print.
ABSTRACT
RATIONALE AND OBJECTIVES: Diagnosing subcentimeter solid pulmonary nodules (SSPNs) remains challenging in clinical practice. Deep learning may perform better than conventional methods in differentiating benign and malignant pulmonary nodules. This study aimed to develop and validate a model for differentiating malignant and benign SSPNs using CT images.
MATERIALS AND METHODS: This retrospective study included consecutive patients with SSPNs detected between January 2015 and October 2021 as an internal dataset. Malignancy was confirmed pathologically; benignity was confirmed pathologically or via follow-up evaluations. The SSPNs were segmented manually. A self-supervision pre-training-based fine-grained network was developed for predicting SSPN malignancy. The pre-trained model was established using data from the National Lung Screening Trial, Lung Nodule Analysis 2016, and a database of 5478 pulmonary nodules from the previous study, with subsequent fine-tuning using the internal dataset. The model's efficacy was investigated using an external cohort from another center, and its accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were determined.
RESULTS: Overall, 1276 patients (mean age, 56 ± 10 years; 497 males) with 1389 SSPNs (mean diameter, 7.5 ± 2.0 mm; 625 benign) were enrolled. The internal dataset was specifically enriched for malignancy. The model's performance in the internal testing set (316 SSPNs) was: AUC, 0.964 (95% confidence interval (95%CI): 0.942-0.986); accuracy, 0.934; sensitivity, 0.965; and specificity, 0.908. The model's performance in the external test set (202 SSPNs) was: AUC, 0.945 (95% CI: 0.910-0.979); accuracy, 0.911; sensitivity, 0.977; and specificity, 0.860.
CONCLUSION: This deep learning model was robust and exhibited good performance in predicting the malignancy of SSPNs, which could help optimize patient management.
PMID:38777719 | DOI:10.1016/j.acra.2024.05.002
Development and validation of a reliable method for automated measurements of psoas muscle volume in CT scans using deep learning-based segmentation: a cross-sectional study
BMJ Open. 2024 May 22;14(5):e079417. doi: 10.1136/bmjopen-2023-079417.
ABSTRACT
OBJECTIVES: We aimed to develop an automated method for measuring the volume of the psoas muscle using CT to aid sarcopenia research efficiently.
METHODS: We used a data set comprising the CT scans of 520 participants who underwent health check-ups at a health promotion centre. We developed a psoas muscle segmentation model using deep learning in a three-step process based on the nnU-Net method. The automated segmentation method was evaluated for accuracy, reliability, and time required for the measurement.
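Once a segmentation mask is produced, the volume measurement itself is straightforward: count foreground voxels and multiply by the voxel volume from the image header. A minimal sketch with a hypothetical NIfTI mask file is shown below; it is not the study's pipeline.

```python
# Turning a predicted psoas segmentation mask into a volume measurement.
import nibabel as nib
import numpy as np

img = nib.load("psoas_mask.nii.gz")                        # hypothetical model output on the CT grid
mask = np.asarray(img.dataobj) > 0                         # assumed binary mask (1 = psoas)

voxel_volume_mm3 = np.prod(img.header.get_zooms()[:3])     # voxel spacing in mm along x, y, z
volume_ml = mask.sum() * voxel_volume_mm3 / 1000.0         # 1 mL = 1000 mm^3
print(f"Psoas muscle volume: {volume_ml:.1f} mL")
```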
RESULTS: The Dice similarity coefficient was used to compare the manual segmentation with automated segmentation; an average Dice score of 0.927 ± 0.019 was obtained, with no critical outliers. Our automated segmentation system had an average measurement time of 2 min 20 s ± 20 s, which was 48 times shorter than that of the manual measurement method (111 min 6 s ± 25 min 25 s).
CONCLUSION: We have successfully developed an automated segmentation method to measure the psoas muscle volume that ensures consistent and unbiased estimates across a wide range of CT images.
PMID:38777592 | DOI:10.1136/bmjopen-2023-079417
Artificial Intelligence for Breast Cancer Risk Assessment
Radiol Clin North Am. 2024 Jul;62(4):619-625. doi: 10.1016/j.rcl.2024.02.004. Epub 2024 Mar 21.
ABSTRACT
Breast cancer risk prediction models based on common clinical risk factors are used to identify women eligible for high-risk screening and prevention. Unfortunately, these models have only modest discriminatory accuracy, with disparities in performance in underrepresented race and ethnicity groups. Artificial intelligence (AI) and deep learning are rapidly advancing breast cancer risk prediction through the development of mammography-based AI breast cancer risk models. Early studies suggest mammography-based AI risk models may perform better than traditional risk factor-based models, with more equitable performance.
PMID:38777538 | DOI:10.1016/j.rcl.2024.02.004
Characterization and quantification of in-vitro equine bone resorption in 3D using μCT and deep learning-aided feature segmentation
Bone. 2024 May 20:117131. doi: 10.1016/j.bone.2024.117131. Online ahead of print.
ABSTRACT
High cyclic strains induce the formation of microcracks in bone, triggering targeted bone remodeling, which entails osteoclastic resorption. Racehorse bone is an ideal model for studying the effects of high-intensity loading, as it is subject to focal formation of microcracks and subsequent bone resorption. The volume of resorption in vitro is considered a direct indicator of osteoclast activity, but indirect 2D measurements are used more often. Our objective was to develop an accurate, high-throughput method to quantify equine osteoclast resorption volume in μCT 3D images. Here, equine osteoclasts were cultured on equine bone slices and imaged with μCT pre- and post-culture. Individual resorption events were then isolated and analyzed in 3D. Modal volume, maximum depth, and aspect ratio of resorption events were calculated. A convolutional neural network (CNN, U-Net-like) was subsequently trained to identify resorption events on post-culture μCT images alone, without the need for pre-culture imaging, using archival bone slices with known resorption areas and paired CTX-I biomarker levels in culture media. 3D resorption volume measurements strongly correlated with both the CTX-I levels (p < 0.001) and area measurements (p < 0.001). Our 3D analysis shows that the shapes of resorption events form a continuous spectrum, rather than the previously reported pit and trench categories. With more extensive resorption, shapes of increasing complexity appear, although simpler resorption cavity morphologies (small, rounded) remain the most common, in accord with the left-hand limit paradigm. Finally, we show that 2D measurements of in vitro osteoclastic resorption are a robust and reliable proxy.
PMID:38777311 | DOI:10.1016/j.bone.2024.117131
Promoting the Shift From Pixel-Level Correlations to Object Semantics Learning by Rethinking Computer Vision Benchmark Data Sets
Neural Comput. 2024 May 20:1-17. doi: 10.1162/neco_a_01677. Online ahead of print.
ABSTRACT
In computer vision research, convolutional neural networks (CNNs) have demonstrated remarkable capabilities at extracting patterns from raw pixel data, achieving state-of-the-art recognition accuracy. However, they differ significantly from human visual perception: they prioritize pixel-level correlations and statistical patterns, often overlooking object semantics. To explore this difference, we propose an approach that isolates core visual features crucial for human perception and object recognition: color, texture, and shape. In experiments on three benchmarks (Fruits 360, CIFAR-10, and Fashion MNIST), each visual feature is individually input into a neural network. The results reveal data set-dependent variations in classification accuracy, highlighting that deep learning models tend to learn pixel-level correlations instead of fundamental visual features. To validate this observation, we used various combinations of concatenated visual features as input for a neural network on the CIFAR-10 data set. CNNs excel at learning statistical patterns in images, achieving exceptional performance when training and test data share similar distributions. To substantiate this point, we trained a CNN on the CIFAR-10 data set and evaluated its performance on the "dog" class from CIFAR-10 and on an equivalent number of examples from the Stanford Dogs data set. The CNN's poor performance on Stanford Dogs images underlines the disparity between deep learning and human visual perception, highlighting the need for models that learn object semantics. Specialized benchmark data sets with controlled variations hold promise for aligning learned representations with human cognition in computer vision research.
PMID:38776966 | DOI:10.1162/neco_a_01677
Prediction model of early recurrence of multimodal hepatocellular carcinoma with tensor fusion
Phys Med Biol. 2024 May 22. doi: 10.1088/1361-6560/ad4f45. Online ahead of print.
ABSTRACT
Clinical decision-making in oncology involves multimodal data, encompassing histopathological, radiological, and clinical factors. Several computer-aided multimodal decision-making systems have emerged in recent years to predict the recurrence of hepatocellular carcinoma (HCC) after hepatectomy, but they tend to employ simplistic feature-level concatenation, resulting in redundancy and hampering overall performance. More notably, these models often lack effective integration with clinical relevance. In particular, they still face major challenges in integrating data from diverse scales and dimensions and in incorporating the liver background, which are clinically significant but previously overlooked aspects.

In addressing these challenges, we provide new insights in two areas. First, we introduce the tensor fusion method into the model, which demonstrates unique advantages in handling the fusion of multi-scale and multi-dimensional data, thus potentially enhancing the model's performance. Second, to the best of our knowledge, this is the first work to take the impact of the liver background into account: we incorporate the liver background into the feature extraction process using a deep learning segmentation-based algorithm. This inclusion brings the model closer to real-world clinical scenarios, as the liver background may contain vital information related to postoperative recurrence.
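Tensor fusion typically combines per-modality embeddings through an outer product of 1-augmented feature vectors, so the fused representation contains unimodal, bimodal, and trimodal interaction terms. The sketch below illustrates that idea in PyTorch with placeholder feature sizes; it is not the paper's architecture.

```python
# Hedged tensor-fusion sketch: outer product of 1-augmented modality embeddings.
import torch

def tensor_fusion(feats):
    """feats: list of (batch, d_i) modality embeddings, e.g. [radiology, pathology, clinical]."""
    batch = feats[0].shape[0]
    fused = torch.ones(batch, 1, device=feats[0].device)
    for f in feats:
        f = torch.cat([torch.ones(batch, 1, device=f.device), f], dim=1)  # append constant 1
        fused = torch.einsum("bi,bj->bij", fused, f).reshape(batch, -1)   # outer product, flattened
    return fused

# Toy usage: MRI radiomics (16-d), histopathology (32-d), clinical factors (8-d) embeddings.
r, p, c = torch.randn(4, 16), torch.randn(4, 32), torch.randn(4, 8)
z = tensor_fusion([r, p, c])
print(z.shape)   # (4, 17 * 33 * 9) = (4, 5049)
```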
 
We collected radiomics (MRI) and histopathological images of 176 cases diagnosed by experienced clinicians from two independent centers. Our proposed network was trained with 5-fold cross-validation on the dataset of 176 cases and was subsequently validated on an independent external dataset of 40 cases. The proposed network exhibited excellent performance in predicting postoperative early recurrence of HCC, with an AUC of 0.883. These results represent significant progress in addressing the challenges of multimodal data fusion and offer potential value for more accurate predictions of clinical outcomes.
PMID:38776945 | DOI:10.1088/1361-6560/ad4f45
A systematic evaluation of Euclidean alignment with deep learning for EEG decoding
J Neural Eng. 2024 May 22. doi: 10.1088/1741-2552/ad4f18. Online ahead of print.
ABSTRACT
Electroencephalography (EEG) signals are frequently used for various Brain-Computer Interface (BCI) tasks. While Deep Learning (DL) techniques have shown promising results, they are hindered by the substantial data requirements. By leveraging data from multiple subjects, transfer learning enables more effective training of DL models. A technique that is gaining popularity is Euclidean Alignment (EA) due to its ease of use, low computational complexity, and compatibility with Deep Learning models. However, few studies evaluate its impact on the training performance of shared and individual DL models. In this work, we systematically evaluate the effect of EA combined with DL for decoding BCI signals. We used EA to train shared models with data from multiple subjects and evaluated its transferability to new subjects. Our experimental results show that it improves decoding in the target subject by 4.33% and decreases convergence time by more than 70%. We also trained individual models for each subject to use as a majority-voting ensemble classifier. In this scenario, using EA improved the 3-model ensemble accuracy by 3.7%. However, when compared to the shared model with EA, the ensemble accuracy was 3.62% lower.
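Euclidean Alignment itself is simple: each subject's trials are re-referenced by the inverse square root of that subject's mean spatial covariance, so that aligned covariances average to the identity and data from different subjects become more comparable. A minimal sketch, with assumed shapes and no regularization, is given below.

```python
# Minimal Euclidean Alignment (EA) sketch for one subject's EEG trials.
import numpy as np
from scipy.linalg import fractional_matrix_power

def euclidean_alignment(trials):
    """trials: array of shape (n_trials, n_channels, n_samples) for one subject."""
    covs = np.einsum("ncs,nds->ncd", trials, trials) / trials.shape[2]  # per-trial spatial covariance
    r_mean = covs.mean(axis=0)                                          # subject reference matrix R
    r_inv_sqrt = fractional_matrix_power(r_mean, -0.5).real             # R^{-1/2}
    return np.einsum("cd,nds->ncs", r_inv_sqrt, trials)                 # aligned trials R^{-1/2} X_i

# Toy usage: 40 trials, 22 channels, 500 samples.
X = np.random.randn(40, 22, 500)
X_aligned = euclidean_alignment(X)
print(X_aligned.shape)   # (40, 22, 500); mean covariance of aligned trials ≈ identity
```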
PMID:38776898 | DOI:10.1088/1741-2552/ad4f18
MONAI Label: A framework for AI-assisted interactive labeling of 3D medical images
Med Image Anal. 2024 May 15;95:103207. doi: 10.1016/j.media.2024.103207. Online ahead of print.
ABSTRACT
The lack of annotated datasets is a major bottleneck for training new task-specific supervised machine learning models, considering that manual annotation is extremely expensive and time-consuming. To address this problem, we present MONAI Label, a free and open-source framework that facilitates the development of applications based on artificial intelligence (AI) models that aim to reduce the time required to annotate radiology datasets. Through MONAI Label, researchers can develop AI annotation applications focusing on their domain of expertise. It allows researchers to readily deploy their apps as services, which can be made available to clinicians via their preferred user interface. Currently, MONAI Label readily supports locally installed (3D Slicer) and web-based (OHIF) frontends and offers two active learning strategies to facilitate and speed up the training of segmentation algorithms. MONAI Label allows researchers to make incremental improvements to their AI-based annotation applications by making them available to other researchers and clinicians alike. Additionally, MONAI Label provides sample AI-based interactive and non-interactive labeling applications that can be used directly off the shelf, plug-and-play, on any given dataset. Significantly reduced annotation times using the interactive model were observed on two public datasets.
PMID:38776843 | DOI:10.1016/j.media.2024.103207
A comprehensive survey on deep active learning in medical image analysis
Med Image Anal. 2024 May 21;95:103201. doi: 10.1016/j.media.2024.103201. Online ahead of print.
ABSTRACT
Deep learning has achieved widespread success in medical image analysis, leading to an increasing demand for large-scale expert-annotated medical image datasets. Yet, the high cost of annotating medical images severely hampers the development of deep learning in this field. To reduce annotation costs, active learning (AL) aims to select the most informative samples for annotation and to train high-performance models with as few labeled samples as possible. In this survey, we review the core methods of active learning, including the evaluation of informativeness and the sampling strategy. For the first time, we provide a detailed summary of the integration of active learning with other label-efficient techniques, such as semi-supervised and self-supervised learning. We also summarize active learning works that are specifically tailored to medical image analysis. Additionally, we conduct a thorough experimental comparison of the performance of different AL methods in medical image analysis. Finally, we offer our perspectives on future trends and challenges of active learning and its applications in medical image analysis. An accompanying paper list and code for the comparative analysis are available at https://github.com/LightersWang/Awesome-Active-Learning-for-Medical-Image-Analysis.
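As a concrete example of the informativeness evaluation and sampling strategy the survey covers, the sketch below implements entropy-based uncertainty sampling on placeholder softmax outputs; the model and unlabeled pool are hypothetical.

```python
# Entropy-based uncertainty sampling: rank unlabeled samples by predictive entropy
# and send the top-k to annotators.
import numpy as np

def select_most_uncertain(probs, k):
    """probs: (n_unlabeled, n_classes) softmax outputs; returns indices of the k most uncertain."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)   # high entropy = uncertain prediction
    return np.argsort(entropy)[::-1][:k]

# Toy usage: 1000 unlabeled samples, 4 classes, pick 16 for the next annotation round.
p = np.random.dirichlet(np.ones(4), size=1000)
query_indices = select_most_uncertain(p, k=16)
print(query_indices[:5])
```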
PMID:38776841 | DOI:10.1016/j.media.2024.103201
The dynamic-static dual-branch deep neural network for urban speeding hotspot identification using street view image data
Accid Anal Prev. 2024 May 21;203:107636. doi: 10.1016/j.aap.2024.107636. Online ahead of print.
ABSTRACT
Visual information from the road environment can influence drivers' perception and judgment, often resulting in speeding incidents. Identifying speeding hotspots in cities can prevent potential speeding incidents, thereby improving traffic safety. We propose the Dual-Branch Contextual Dynamic-Static Feature Fusion Network, based on static panoramic images and dynamically changing sequence data, to capture global features of the macro scene of an area and dynamically changing information in the micro view for more accurate identification of urban speeding hotspot areas. For the static branch, we propose the Multi-scale Contextual Feature Aggregation Network to learn global spatial contextual association information. In the dynamic branch, we construct the Multi-view Dynamic Feature Fusion Network to capture the dynamically changing features of a scene from a continuous sequence of street view images. Additionally, we design the Dynamic-Static Feature Correlation Fusion Structure to correlate and fuse dynamic and static features. The experimental results show that the model performs well, with an overall recognition accuracy of 99.4%. Ablation experiments show that recognition after the fusion of dynamic and static features is better than that of the static or dynamic branch alone. The proposed model also outperforms other deep learning models. In addition, we combine image processing methods and different Class Activation Mapping (CAM) methods to extract speeding-frequency visual features from the model's perception results. The results show that more accurate speeding-frequency features can be obtained by using LayerCAM for static global scenes and GradCAM-Plus for dynamic local sequences. In the static global scene, the speeding-frequency features are mainly concentrated on the buildings and green layout on both sides of the road, whereas in the dynamic scene, the speeding-frequency features shift with scene changes and are mainly concentrated on the dynamically changing transition areas of greenery, roads, and surrounding buildings. The code and model used for identifying urban speeding hotspots in this study are available at: https://github.com/gwt-ZJU/DCDSFF-Net.
PMID:38776837 | DOI:10.1016/j.aap.2024.107636