Deep learning

Analysis of anterior segment in primary angle closure suspect with deep learning models

Mon, 2024-09-09 06:00

BMC Med Inform Decis Mak. 2024 Sep 9;24(1):251. doi: 10.1186/s12911-024-02658-1.

ABSTRACT

OBJECTIVE: To analyze the anatomical characteristics of anterior chamber configuration in primary angle closure suspect (PACS) patients, and to establish an artificial intelligence (AI)-aided diagnostic system for PACS screening.

METHODS: A total of 1668 scans of 839 patients were included in this cross-sectional study. The subjects were divided into two groups: a PACS group and a normal group. Using anterior segment optical coherence tomography scans, the anatomical diversity between the two groups was compared, and anterior segment structural features of PACS were extracted. An AI-aided diagnostic system was then constructed based on different algorithms: classification and regression tree (CART), random forest (RF), logistic regression (LR), VGG-16, and AlexNet. The diagnostic efficiencies of the algorithms were then evaluated and compared with those of junior physicians and experienced ophthalmologists.

RESULTS: RF [sensitivity (Se) = 0.84; specificity (Sp) = 0.92; positive predictive value (PPV) = 0.82; negative predictive value (NPV) = 0.95; area under the curve (AUC) = 0.90] and CART (Se = 0.76, Sp = 0.93, PPV = 0.85, NPV = 0.92, AUC = 0.90) outperformed LR (Se = 0.68, Sp = 0.91, PPV = 0.79, NPV = 0.90, AUC = 0.86). Among the convolutional neural networks (CNNs), AlexNet (Se = 0.83, Sp = 0.95, PPV = 0.92, NPV = 0.87, AUC = 0.85) performed better than VGG-16 (Se = 0.84, Sp = 0.90, PPV = 0.85, NPV = 0.90, AUC = 0.79). The two CNN algorithms outperformed 5 junior physicians, and their mean diagnostic indicators were similar to those of experienced ophthalmologists.
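
The tree-based and regression models above are standard scikit-learn fare; below is a minimal sketch of how such a comparison could be run, with synthetic stand-ins for the anterior segment features. This is illustrative only, not the authors' code or data.

```python
# Illustrative sketch: comparing RF, CART, and LR screening models with the
# metrics reported above (Se, Sp, PPV, NPV, AUC) on synthetic features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1668, n_features=8, random_state=0)  # stand-in for AS-OCT features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "CART": DecisionTreeClassifier(max_depth=4, random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: Se={tp/(tp+fn):.2f} Sp={tn/(tn+fp):.2f} "
          f"PPV={tp/(tp+fp):.2f} NPV={tn/(tn+fn):.2f} AUC={auc:.2f}")
```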

CONCLUSION: PACS patients have distinct anatomical characteristics compared with healthy controls. AI models for PACS screening are reliable and powerful, performing on par with experienced ophthalmologists.

PMID:39251987 | DOI:10.1186/s12911-024-02658-1

Categories: Literature Watch

Clinical performance of deep learning-enhanced ultrafast whole-body scintigraphy in patients with suspected malignancy

Mon, 2024-09-09 06:00

BMC Med Imaging. 2024 Sep 9;24(1):236. doi: 10.1186/s12880-024-01422-1.

ABSTRACT

BACKGROUND: To evaluate the clinical performance of two deep learning methods, one utilizing real clinical pairs and the other utilizing simulated datasets, in enhancing image quality for two-dimensional (2D) fast whole-body scintigraphy (WBS).

METHODS: A total of 83 patients with suspected bone metastasis were retrospectively enrolled. All patients underwent single-photon emission computed tomography (SPECT) WBS at speeds of 20 cm/min (1x), 40 cm/min (2x), and 60 cm/min (3x). Two deep learning models were developed to generate high-quality images from real and simulated fast scans, designated 2x-real and 3x-real (images from real fast data) and 2x-simu and 3x-simu (images from simulated fast data), respectively. A 5-point Likert scale was used to evaluate the image quality of each acquisition. Accuracy, sensitivity, specificity, and the area under the curve (AUC) were used to evaluate diagnostic efficacy. Learned perceptual image patch similarity (LPIPS) and the Fréchet inception distance (FID) were used to assess image quality. Additionally, the count-level consistency of WBS was compared between the two models.

RESULTS: Subjective assessments revealed that the 1x images had the highest general image quality (Likert score: 4.40 ± 0.45). The 2x-real, 2x-simu and 3x-real, 3x-simu images demonstrated significantly better quality than the 2x and 3x images (Likert scores: 3.46 ± 0.47, 3.79 ± 0.55 vs. 2.92 ± 0.41, P < 0.0001; 2.69 ± 0.40, 2.61 ± 0.41 vs. 1.36 ± 0.51, P < 0.0001), respectively. Notably, the quality of the 2x-real images was inferior to that of the 2x-simu images (Likert scores: 3.46 ± 0.47 vs. 3.79 ± 0.55, P = 0.001). The diagnostic efficacy for the 2x-real and 2x-simu images was indistinguishable from that of the 1x images (accuracy: 81.2%, 80.7% vs. 84.3%; sensitivity: 77.27%, 77.27% vs. 87.18%; specificity: 87.18%, 84.63% vs. 87.18%; all P > 0.05), whereas the diagnostic efficacy for the 3x-real and 3x-simu images was better than that for the 3x images (accuracy: 65.1%, 66.35% vs. 59.0%; sensitivity: 63.64%, 63.64% vs. 64.71%; specificity: 66.67%, 69.23% vs. 55.1%; all P < 0.05). Objectively, both the real and simulated models achieved significantly enhanced image quality over the accelerated scans in the 2x and 3x groups (FID: 0.15 ± 0.18, 0.18 ± 0.18 vs. 0.47 ± 0.34; 0.19 ± 0.23, 0.20 ± 0.22 vs. 0.98 ± 0.59; LPIPS: 0.17 ± 0.05, 0.16 ± 0.04 vs. 0.19 ± 0.05; 0.18 ± 0.05, 0.19 ± 0.05 vs. 0.23 ± 0.04; all P < 0.05). The count-level consistency with the 1x images was excellent for all four sets of model-generated images (P < 0.0001).
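
For readers unfamiliar with the two objective metrics, here is a hedged sketch of how LPIPS and FID are commonly computed with the `lpips` and `torchmetrics` packages; the random tensors stand in for paired reference and model-enhanced scintigrams.

```python
# Demo of the two image-quality metrics used above (LPIPS and FID).
import torch
import lpips
from torchmetrics.image.fid import FrechetInceptionDistance

ref = torch.rand(8, 3, 256, 256) * 2 - 1        # "1x" reference images, in [-1, 1]
enhanced = torch.rand(8, 3, 256, 256) * 2 - 1   # model-enhanced fast-scan images

lpips_fn = lpips.LPIPS(net="alex")              # lower = perceptually closer
print("LPIPS:", lpips_fn(ref, enhanced).mean().item())

fid = FrechetInceptionDistance(feature=64)      # small feature dim keeps the demo fast
to_uint8 = lambda x: ((x + 1) * 127.5).clamp(0, 255).to(torch.uint8)
fid.update(to_uint8(ref), real=True)
fid.update(to_uint8(enhanced), real=False)
print("FID:", fid.compute().item())             # lower = closer to reference distribution
```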

CONCLUSIONS: Ultrafast 2x speed (real and simulated) images achieved comparable diagnostic value to that of standardly acquired images, but the simulation algorithm does not necessarily reflect real data.

PMID:39251959 | DOI:10.1186/s12880-024-01422-1

Categories: Literature Watch

Combining propensity score methods with variational autoencoders for generating synthetic data in presence of latent sub-groups

Mon, 2024-09-09 06:00

BMC Med Res Methodol. 2024 Sep 9;24(1):198. doi: 10.1186/s12874-024-02327-x.

ABSTRACT

In settings requiring synthetic data generation based on a clinical cohort, e.g., due to data protection regulations, heterogeneity across individuals might be a nuisance that we need to control or faithfully preserve. The sources of such heterogeneity might be known, e.g., as indicated by sub-group labels, or might be unknown and thus reflected only in properties of distributions, such as bimodality or skewness. We investigate how such heterogeneity can be preserved and controlled when obtaining synthetic data from variational autoencoders (VAEs), i.e., a generative deep learning technique that utilizes a low-dimensional latent representation. To faithfully reproduce unknown heterogeneity reflected in marginal distributions, we propose to combine VAEs with pre-transformations. For dealing with known heterogeneity due to sub-groups, we complement VAEs with models for group membership, specifically from propensity score regression. The evaluation is performed with a realistic simulation design that features sub-groups and challenging marginal distributions. The proposed approach faithfully recovers the latter, compared to synthetic data approaches that focus purely on marginal distributions. Propensity scores add complementary information, e.g., when visualized in the latent space, and enable sampling of synthetic data with or without sub-group specific characteristics. We also illustrate the proposed approach with real data from an international stroke trial that exhibits considerable distribution differences between study sites, in addition to bimodality. These results indicate that describing heterogeneity by statistical approaches, such as propensity score regression, might be more generally useful for complementing generative deep learning for obtaining synthetic data that faithfully reflects structure from clinical cohorts.
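
A conceptual sketch of the two ingredients, under invented shapes and hyperparameters: a pre-transformation (here a quantile transform) normalizes skewed marginals before a small VAE is fit, and a propensity score model for sub-group membership lets synthetic samples be drawn with or without sub-group characteristics. This is not the authors' implementation.

```python
import numpy as np
import torch
from torch import nn
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
group = rng.integers(0, 2, 1000)                      # known sub-group label
X = rng.gamma(2.0, 2.0, (1000, 5)) + group[:, None]   # skewed, group-shifted data

# Pre-transformation: map challenging marginals toward Gaussian before the VAE.
pre = QuantileTransformer(output_distribution="normal", random_state=0)
Z = torch.tensor(pre.fit_transform(X), dtype=torch.float32)

class VAE(nn.Module):
    def __init__(self, d=5, k=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d, 16), nn.ReLU(), nn.Linear(16, 2 * k))
        self.dec = nn.Sequential(nn.Linear(k, 16), nn.ReLU(), nn.Linear(16, d))
    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        return self.dec(z), mu, logvar

vae = VAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-2)
for _ in range(200):
    recon, mu, logvar = vae(Z)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    loss = ((recon - Z) ** 2).sum(-1).mean() + kl
    opt.zero_grad(); loss.backward(); opt.step()

# Propensity scores complement the VAE: draw synthetic rows, then keep those
# whose predicted sub-group membership matches the requested group.
ps = LogisticRegression().fit(X, group)
synth = pre.inverse_transform(vae.dec(torch.randn(5000, 2)).detach().numpy())
keep = ps.predict_proba(synth)[:, 1] > 0.5            # synthetic "group 1" subset
print("synthetic group-1 rows:", keep.sum())
```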

PMID:39251921 | DOI:10.1186/s12874-024-02327-x

Categories: Literature Watch

Semi-supervised meta-learning elucidates understudied molecular interactions

Mon, 2024-09-09 06:00

Commun Biol. 2024 Sep 9;7(1):1104. doi: 10.1038/s42003-024-06797-z.

ABSTRACT

Many biological problems are understudied due to experimental limitations and human biases. Although deep learning is promising in accelerating scientific discovery, its power is compromised when applied to problems with scarcely labeled data and data distribution shifts. We develop a deep learning framework, Meta Model Agnostic Pseudo Label Learning (MMAPLE), to address these challenges by effectively exploring out-of-distribution (OOD) unlabeled data when conventional transfer learning fails. The uniqueness of MMAPLE lies in integrating the concepts of meta-learning, transfer learning, and semi-supervised learning into a unified framework. The power of MMAPLE is demonstrated in three applications in an OOD setting where chemicals or proteins in unseen data are dramatically different from those in training data: predicting drug-target interactions, hidden human metabolite-enzyme interactions, and understudied interspecies microbiome metabolite-human receptor interactions. MMAPLE achieves an 11% to 242% improvement in precision-recall on multiple OOD benchmarks over various base models. Using MMAPLE, we reveal novel interspecies metabolite-protein interactions that are validated by activity assays and fill in missing links in microbiome-human interactions. MMAPLE is a general framework to explore previously unrecognized biological domains beyond the reach of present experimental and computational techniques.
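
The pseudo-labeling core of such a framework can be illustrated in a few lines; the sketch below is a greatly simplified self-training loop (confident teacher predictions on shifted unlabeled data become student training labels) and omits the meta-learning wrapper that distinguishes MMAPLE.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab = rng.normal(0, 1, (200, 10)); y_lab = (X_lab[:, 0] > 0).astype(int)
X_ood = rng.normal(1.5, 2, (1000, 10))               # distribution-shifted, unlabeled

teacher = LogisticRegression().fit(X_lab, y_lab)
conf = teacher.predict_proba(X_ood).max(axis=1)
mask = conf > 0.9                                    # keep only confident pseudo-labels
pseudo_y = teacher.predict(X_ood[mask])

student = LogisticRegression().fit(                  # student sees labeled + pseudo-labeled data
    np.vstack([X_lab, X_ood[mask]]),
    np.concatenate([y_lab, pseudo_y]),
)
print(f"pseudo-labeled {mask.sum()} OOD samples")
```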

PMID:39251833 | DOI:10.1038/s42003-024-06797-z

Categories: Literature Watch

Fed-CL: an atrial fibrillation prediction system using ECG signals employing a federated learning mechanism

Mon, 2024-09-09 06:00

Sci Rep. 2024 Sep 9;14(1):21038. doi: 10.1038/s41598-024-71366-7.

ABSTRACT

Deep learning has shown great promise in predicting atrial fibrillation (AFib) using ECG signals and other vital signs. However, a major hurdle lies in the privacy concerns surrounding these datasets, which often contain sensitive patient information. Balancing accurate AFib prediction with robust user privacy remains a critical challenge to address. We suggest federated learning (FL), a privacy-preserving machine learning technique, to address this privacy barrier. Our approach makes use of FL by presenting Fed-CL, an advanced method that combines Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs) to accurately predict AFib. In addition, the article explores the importance of analysing mean heart rate variability to differentiate between healthy and abnormal heart rhythms. This combined approach within the proposed system aims to equip healthcare professionals with timely alerts and valuable insights. Ultimately, the goal is to facilitate early detection of AFib risk and enable preventive care for susceptible individuals.
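
The federated mechanism can be sketched as a FedAvg loop: each client trains locally on private ECG windows and only model weights are averaged centrally, so raw patient data never leaves the client. The tiny CNN+LSTM below is a placeholder, not the actual Fed-CL architecture.

```python
import copy
import torch
from torch import nn

class TinyAFibNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(1, 8, kernel_size=5, padding=2)
        self.lstm = nn.LSTM(8, 16, batch_first=True)
        self.head = nn.Linear(16, 2)
    def forward(self, x):                       # x: (batch, 1, time)
        h = torch.relu(self.conv(x)).transpose(1, 2)
        _, (hn, _) = self.lstm(h)
        return self.head(hn[-1])

def local_update(model, X, y, epochs=3):
    model = copy.deepcopy(model)                # private training on the client
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        loss = nn.functional.cross_entropy(model(X), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return model.state_dict()

global_model = TinyAFibNet()
clients = [(torch.randn(32, 1, 128), torch.randint(0, 2, (32,))) for _ in range(4)]
for round_ in range(5):
    states = [local_update(global_model, X, y) for X, y in clients]
    avg = {k: torch.stack([s[k] for s in states]).mean(0) for k in states[0]}
    global_model.load_state_dict(avg)           # FedAvg: average client weights
```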

PMID:39251753 | DOI:10.1038/s41598-024-71366-7

Categories: Literature Watch

Commonalities and variations in emotion representation across modalities and brain regions

Mon, 2024-09-09 06:00

Sci Rep. 2024 Sep 9;14(1):20992. doi: 10.1038/s41598-024-71690-y.

ABSTRACT

Humans express emotions through various modalities such as facial expressions and natural language. However, the relationships between emotions expressed through different modalities and their correlations with neural activities remain uncertain. Here, we aimed to resolve some of these uncertainties by investigating the similarity of emotion representations across modalities and brain regions. First, we represented various emotion categories as multi-dimensional vectors derived from visual (face), linguistic, and visio-linguistic data, and used representational similarity analysis to compare these modalities. Second, we examined the linear transferability of emotion representation from other modalities to the visual modality. Third, we compared the representational structure derived in the first step with those from brain activities across 360 regions. Our findings revealed that emotion representations share commonalities across modalities with modality-type-dependent variations, and that they can be linearly mapped from other modalities to the visual modality. Additionally, uni-modal emotion representations showed relatively higher similarity with specific brain regions, while the multi-modal emotion representation was most similar to representations across the entire brain. These findings suggest that emotional experiences are represented differently across various brain regions with varying degrees of similarity to different modality types, and that they may be multi-modally conveyable in visual and linguistic domains.
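
The representational similarity analysis step admits a compact sketch: build a representational dissimilarity matrix (RDM) per modality and correlate the RDMs. The random vectors below stand in for the paper's face-, language-, and brain-derived emotion representations.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_emotions = 25
visual = rng.normal(size=(n_emotions, 300))      # e.g., face-derived vectors
linguistic = rng.normal(size=(n_emotions, 300))  # e.g., word-embedding vectors

rdm_vis = pdist(visual, metric="correlation")    # condensed RDM (upper triangle)
rdm_lng = pdist(linguistic, metric="correlation")
rho, p = spearmanr(rdm_vis, rdm_lng)             # second-order similarity of modalities
print(f"RSA similarity between modalities: rho={rho:.3f} (p={p:.3g})")
```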

PMID:39251743 | DOI:10.1038/s41598-024-71690-y

Categories: Literature Watch

Training robust T1-weighted magnetic resonance imaging liver segmentation models using ensembles of datasets with different contrast protocols and liver disease etiologies

Mon, 2024-09-09 06:00

Sci Rep. 2024 Sep 9;14(1):20988. doi: 10.1038/s41598-024-71674-y.

ABSTRACT

Image segmentation of the liver is an important step in treatment planning for liver cancer. However, manual segmentation at a large scale is not practical, leading to increasing reliance on deep learning models to automatically segment the liver. This manuscript develops a generalizable deep learning model to segment the liver on T1-weighted MR images. In particular, three distinct deep learning architectures (nnUNet, PocketNet, Swin UNETR) were considered using data gathered from six geographically different institutions. A total of 819 T1-weighted MR images were gathered from both public and internal sources. Our experiments compared each architecture's testing performance when trained both intra-institutionally and inter-institutionally. Models trained using nnUNet and its PocketNet variant achieved mean Dice-Sørensen similarity coefficients > 0.9 on both intra- and inter-institutional test set data. The performance of these models suggests that nnUNet and PocketNet liver segmentation models trained on a large and diverse collection of T1-weighted MR images would on average achieve good intra-institutional segmentation performance.
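
The headline metric here is the Dice-Sørensen similarity coefficient; a minimal volumetric implementation for binary masks, purely for illustration:

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """DSC = 2|A∩B| / (|A| + |B|) for binary masks of equal shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    return 2.0 * np.logical_and(pred, truth).sum() / (pred.sum() + truth.sum() + eps)

truth = np.zeros((64, 64, 64), dtype=bool); truth[16:48, 16:48, 16:48] = True
pred = np.zeros_like(truth); pred[18:48, 16:48, 16:48] = True  # slightly shifted prediction
print(f"DSC = {dice(pred, truth):.3f}")   # values > 0.9 indicate strong overlap
```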

PMID:39251664 | DOI:10.1038/s41598-024-71674-y

Categories: Literature Watch

Bridging auditory perception and natural language processing with semantically informed deep neural networks

Mon, 2024-09-09 06:00

Sci Rep. 2024 Sep 9;14(1):20994. doi: 10.1038/s41598-024-71693-9.

ABSTRACT

Sound recognition is effortless for humans but poses a significant challenge for artificial hearing systems. Deep neural networks (DNNs), especially convolutional neural networks (CNNs), have recently surpassed traditional machine learning in sound classification. However, current DNNs map sounds to labels using binary categorical variables, neglecting the semantic relations between labels. Cognitive neuroscience research suggests that human listeners exploit such semantic information besides acoustic cues. Hence, our hypothesis is that incorporating semantic information improves DNNs' sound recognition performance, emulating human behaviour. In our approach, sound recognition is framed as a regression problem, with CNNs trained to map spectrograms to continuous semantic representations from NLP models (Word2Vec, BERT, and the CLAP text encoder). Two DNN types were trained: semDNN, with continuous embeddings, and catDNN, with categorical labels, both on a dataset extracted from a collection of 388,211 sounds enriched with semantic descriptions. Evaluations across four external datasets confirmed the superiority of semantic labeling from semDNN compared to catDNN, preserving higher-level relations. Importantly, an analysis of human similarity ratings for natural sounds showed that semDNN approximated human listener behaviour better than catDNN, other DNNs, and NLP models. Our work contributes to understanding the role of semantics in sound recognition, bridging the gap between artificial systems and human auditory perception.
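
The core idea of semDNN, regression to a semantic space rather than softmax over labels, can be sketched as follows. The encoder, embeddings, and spectrograms are toy placeholders (the 300-d vectors mimic Word2Vec dimensions), and recognition becomes nearest-neighbour search in embedding space.

```python
import torch
from torch import nn

label_emb = torch.randn(50, 300)                     # stand-in semantic vectors, one per class
label_emb = label_emb / label_emb.norm(dim=1, keepdim=True)

net = nn.Sequential(                                 # toy spectrogram encoder
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 300),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

spec = torch.randn(32, 1, 128, 128)                  # batch of log-mel spectrograms
target = label_emb[torch.randint(0, 50, (32,))]      # semantic target per sound
for _ in range(10):
    out = net(spec)
    loss = 1 - nn.functional.cosine_similarity(out, target).mean()  # cosine regression loss
    opt.zero_grad(); loss.backward(); opt.step()

pred_class = (net(spec) @ label_emb.T).argmax(dim=1)  # nearest semantic vector wins
```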

PMID:39251659 | DOI:10.1038/s41598-024-71693-9

Categories: Literature Watch

Enhancing early Parkinson's disease detection through multimodal deep learning and explainable AI: insights from the PPMI database

Mon, 2024-09-09 06:00

Sci Rep. 2024 Sep 9;14(1):20941. doi: 10.1038/s41598-024-70165-4.

ABSTRACT

Parkinson's disease (PD) is the second most common neurodegenerative disease, affecting nearly 8.5 million people, a number that is steadily increasing. In this research, multimodal deep learning is investigated for prodromal-stage detection of PD, combining different 3D architectures with the novel Excitation Network (EN) and supported by Explainable Artificial Intelligence (XAI) techniques. Utilizing data from the Parkinson's Progression Markers Initiative, this study introduces a joint co-learning approach for multimodal fusion, enabling end-to-end training of deep neural networks and facilitating the learning of complementary information from both imaging and clinical modalities. DenseNet with EN outperformed other models, showing a substantial increase in accuracy when supplemented with clinical data. XAI methods, such as Integrated Gradients for ResNet and DenseNet, and attention heatmaps for the Vision Transformer (ViT), revealed that DenseNet focused on brain regions believed to be critical to prodromal pathophysiology, including the right temporal and left pre-frontal areas. Similarly, ViT highlighted the lateral ventricles associated with cognitive decline, indicating their potential in the prodromal stage. These findings underscore the potential of these regions as early-stage PD biomarkers and showcase the proposed framework's efficacy in predicting subtypes of PD and aiding in early diagnosis, paving the way for innovative diagnostic tools and precision medicine.
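
The Integrated Gradients step can be reproduced with Captum on any differentiable classifier; the 3D CNN and input shape below are illustrative stand-ins for the paper's DenseNet/ResNet backbones.

```python
import torch
from torch import nn
from captum.attr import IntegratedGradients

model = nn.Sequential(                        # stand-in for a 3D imaging backbone
    nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 2),
)
model.eval()

volume = torch.randn(1, 1, 64, 64, 64, requires_grad=True)  # one MRI volume
ig = IntegratedGradients(model)
attr = ig.attribute(volume, target=1)         # voxel attributions for the "prodromal" class
print("most influential voxel (flat index):", attr.abs().argmax().item())
```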

PMID:39251639 | DOI:10.1038/s41598-024-70165-4

Categories: Literature Watch

EpiScan: accurate high-throughput mapping of antibody-specific epitopes using sequence information

Mon, 2024-09-09 06:00

NPJ Syst Biol Appl. 2024 Sep 9;10(1):101. doi: 10.1038/s41540-024-00432-7.

ABSTRACT

The identification of antibody-specific epitopes on virus proteins is crucial for vaccine development and drug design. Nonetheless, traditional wet-lab approaches for the identification of epitopes are both costly and labor-intensive, underscoring the need for efficient and cost-effective computational tools. Here, EpiScan, an attention-based deep learning framework for predicting antibody-specific epitopes, is presented. EpiScan adopts a multi-input and single-output strategy by designing independent blocks for different parts of antibodies, including the variable heavy chain (VH), variable light chain (VL), complementarity-determining regions (CDRs), and framework regions (FRs). The block predictions are weighted and integrated for the prediction of potential epitopes. Using multiple experimental data samples, we show that EpiScan, which uses only antibody sequence information, can accurately map epitopes on specific antigen structures. The antibody-specific epitopes on the receptor binding domain (RBD) of SARS coronavirus 2 (SARS-CoV-2) were located by EpiScan, and a potentially valuable vaccine epitope was identified. EpiScan can expedite the epitope mapping process for high-throughput antibody sequencing data, supporting vaccine design and drug development. Availability: For the convenience of related wet-experimental researchers, the source code and web server of EpiScan are publicly available at https://github.com/gzBiomedical/EpiScan.
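
A conceptual sketch of the multi-input, single-output design described above, with invented layer sizes (the real model is attention-based and far more elaborate): independent blocks score VH, VL, CDRs, and FRs, and learned weights fuse the block predictions.

```python
import torch
from torch import nn

class WeightedBlockFusion(nn.Module):
    def __init__(self, d_in=128, n_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, 1))
            for _ in range(n_blocks)
        )
        self.w = nn.Parameter(torch.ones(n_blocks))   # learned block weights
    def forward(self, vh, vl, cdr, fr):
        scores = torch.cat([b(x) for b, x in zip(self.blocks, (vh, vl, cdr, fr))], dim=1)
        weights = torch.softmax(self.w, dim=0)
        return torch.sigmoid(scores @ weights)        # fused epitope probability

model = WeightedBlockFusion()
vh, vl, cdr, fr = (torch.randn(8, 128) for _ in range(4))  # pooled per-region features
print(model(vh, vl, cdr, fr).shape)                   # torch.Size([8])
```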

PMID:39251627 | DOI:10.1038/s41540-024-00432-7

Categories: Literature Watch

Sub-sampling graph neural networks for genomic prediction of quantitative phenotypes

Mon, 2024-09-09 06:00

G3 (Bethesda). 2024 Sep 9:jkae216. doi: 10.1093/g3journal/jkae216. Online ahead of print.

ABSTRACT

In genomics, the use of deep learning (DL) is rapidly growing, and DL has successfully demonstrated its ability to uncover complex relationships in large biological and biomedical data sets. With the development of high-throughput sequencing techniques, genomic markers can now be allocated to large sections of a genome. By analysing allele sharing between individuals, one may calculate realized genomic relationships from single nucleotide polymorphism (SNP) data rather than relying on known pedigree relationships under a polygenic model. The traditional approaches in genome-wide prediction (GWP) of quantitative phenotypes utilise genomic relationships in fixed global covariance modelling, possibly with some non-linear kernel mapping (for example, Gaussian processes). On the other hand, the DL approaches proposed so far for GWP fail to take into account the non-Euclidean graph structure of relationships between individuals over several generations. In this paper, we propose one global graph convolutional network (GCN) and one local sub-sampling architecture (GCN-RS) that are specifically designed to perform regression analysis based on genomic relationship information. A GCN is tailored to non-Euclidean spaces and consists of several layers of graph convolutions. The GCN-RS architecture is designed to further improve the GCN's performance by sub-sampling the graph to reduce the dimensionality of the input data. Through these graph convolutional layers, the GCN maps input genomic markers to their quantitative phenotype values. The graphs are constructed using an iterative nearest neighbour approach. Comparisons show that GCN-RS outperforms the popular Genomic Best Linear Unbiased Predictor (GBLUP) method on one simulated and three real data sets from wheat, mice, and pigs, with a predictive improvement of 4.4% to 49.4% in terms of test mean squared error (MSE). This indicates that GCN-RS is a promising tool for genomic predictions in plants and animals. Furthermore, GCN-RS is computationally efficient, making it a viable option for large-scale applications.
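
A plain-torch toy version of the idea, assuming made-up dimensions: build a nearest-neighbour graph from a genomic relationship matrix and regress phenotypes through normalized-adjacency graph convolutions. This is not the paper's GCN-RS architecture.

```python
import torch
from torch import nn

n, p, k = 500, 1000, 10
snps = torch.randint(0, 3, (n, p)).float()                 # 0/1/2 allele dosages
y = snps[:, :20].sum(1, keepdim=True) + torch.randn(n, 1)  # toy polygenic phenotype

G = (snps - snps.mean(0)) @ (snps - snps.mean(0)).T / p    # genomic relationship matrix
idx = G.topk(k, dim=1).indices                             # k nearest relatives per individual
A = torch.zeros(n, n).scatter_(1, idx, 1.0)
A = ((A + A.T + torch.eye(n)) > 0).float()                 # symmetrize, add self-loops
d = A.sum(1)
A_hat = A / (d.sqrt().unsqueeze(1) * d.sqrt().unsqueeze(0))  # D^-1/2 A D^-1/2

W1, W2 = nn.Linear(p, 64), nn.Linear(64, 1)
opt = torch.optim.Adam(list(W1.parameters()) + list(W2.parameters()), lr=1e-3)
for _ in range(100):
    h = torch.relu(A_hat @ W1(snps))                       # graph convolution layer 1
    pred = A_hat @ W2(h)                                   # graph convolution layer 2
    loss = ((pred - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print("train MSE:", loss.item())
```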

PMID:39250757 | DOI:10.1093/g3journal/jkae216

Categories: Literature Watch

Precise ablation zone segmentation on CT images after liver cancer ablation using semi-automatic CNN-based segmentation

Mon, 2024-09-09 06:00

Med Phys. 2024 Sep 9. doi: 10.1002/mp.17373. Online ahead of print.

ABSTRACT

BACKGROUND: Ablation zone segmentation in contrast-enhanced computed tomography (CECT) images enables the quantitative assessment of treatment success in the ablation of liver lesions. However, fully automatic liver ablation zone segmentation in CT images still remains challenging, such as low accuracy and time-consuming manual refinement of the incorrect regions.

PURPOSE: Therefore, in this study, we developed a semi-automatic technique to address the remaining drawbacks and improve the accuracy of the liver ablation zone segmentation in the CT images.

METHODS: Our approach uses a combination of a CNN-based automatic segmentation method and an interactive CNN-based segmentation method. First, automatic segmentation is applied for coarse ablation zone segmentation in the whole CT image. Human experts then visually validate the segmentation results. If there are errors in the coarse segmentation, local corrections can be performed on each slice via an interactive CNN-based segmentation method. The models were trained and the proposed method was evaluated using two internal datasets of post-interventional CECT images (n1 = 22, n2 = 145; 62 patients in total) and then further tested using an external benchmark dataset (n3 = 12; 10 patients).

RESULTS: To evaluate the accuracy of the proposed approach, we used the Dice similarity coefficient (DSC), average symmetric surface distance (ASSD), Hausdorff distance (HD), and volume difference (VD). The quantitative evaluation results show that the proposed approach obtained mean DSC, ASSD, HD, and VD scores of 94.0%, 0.4 mm, 8.4 mm, and 0.02, respectively, on the internal dataset, and 87.8%, 0.9 mm, 9.5 mm, and -0.03, respectively, on the benchmark dataset. We also compared the performance of the proposed approach to that of five well-known segmentation methods; the proposed semi-automatic method achieved state-of-the-art ablation segmentation accuracy, and on average, 2 min are required to correct a segmentation. Furthermore, we found that the accuracy of the proposed method on the benchmark dataset is comparable to that of manual segmentation by human experts (p = 0.55, t-test).
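
The two-stage workflow lends itself to a simple sketch: a coarse automatic mask is produced first, and slice-wise local corrections are merged into it. Here the interactive CNN is faked by a hard-coded patch; everything below is hypothetical scaffolding, not the released tool.

```python
import numpy as np

def correct_slice(mask_slice: np.ndarray, patch: np.ndarray, y: int, x: int) -> np.ndarray:
    """Overwrite a local region of one slice's mask with an expert-guided patch."""
    out = mask_slice.copy()
    h, w = patch.shape
    out[y:y + h, x:x + w] = patch
    return out

coarse = np.zeros((16, 256, 256), dtype=np.uint8)     # coarse 3D ablation-zone mask
coarse[4:12, 100:160, 100:160] = 1                    # from the automatic CNN stage

ref = int(coarse.sum())                               # volume before correction
patch = np.ones((20, 20), dtype=np.uint8)             # stand-in for interactive CNN output
coarse[8] = correct_slice(coarse[8], patch, 90, 120)  # expert fixes slice 8 locally

vd = (int(coarse.sum()) - ref) / ref                  # volume difference after editing
print(f"corrected volume: {int(coarse.sum())} voxels, VD = {vd:+.4f}")
```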

CONCLUSIONS: The proposed semi-automatic CNN-based segmentation method can be used to effectively segment the ablation zones, increasing the value of CECT for an assessment of treatment success. For reproducibility, the trained models, source code, and demonstration tool are publicly available at https://github.com/lqanh11/Interactive_AblationZone_Segmentation.

PMID:39250658 | DOI:10.1002/mp.17373

Categories: Literature Watch

Innovation in public health surveillance for social distancing during the COVID-19 pandemic: A deep learning and object detection based novel approach

Mon, 2024-09-09 06:00

PLoS One. 2024 Sep 9;19(9):e0308460. doi: 10.1371/journal.pone.0308460. eCollection 2024.

ABSTRACT

The Corona Virus Disease (COVID-19) has had a huge impact on all of humanity, and people's disregard for COVID-19 regulations has sped up the disease's spread. Our study uses a state-of-the-art object detection model, YOLOv4 (You Only Look Once, version 4), on real-time 25 fps, 1920 × 1080 video streamed live by a camera-mounted Unmanned Aerial Vehicle (UAV) quad-copter to observe proper maintenance of social distance within a 35 m range. The model demonstrated remarkable efficacy in identifying and quantifying instances of social distancing, with an accuracy of 82% and low latency, working efficiently with real-time streaming at 25-30 ms. Our model is based on CSPDarkNet-53, which was trained on the MS COCO dataset for image classification, and includes additional layers to capture feature maps from different phases. Additionally, the model's neck is made up of PANet, which is used to aggregate the parameters from various CSPDarkNet-53 layers. The CSPDarkNet-53's 53 convolutional layers are followed by 53 more layers in the model head, for a total of 106 fully convolutional layers in the design. This architecture is further integrated with YOLOv3, resulting in the YOLOv4 model used by our detection model. The method was then used to evaluate drone footage, differentiate humans, and count social distance violations in real time. Our findings show that the model was reliable and successful at detecting social distance violations in real time with an average accuracy of 82%.
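
The distance-checking step that follows person detection is straightforward to sketch: estimate ground positions from bounding-box bottom-centers and count pairs closer than a threshold. The pixel-to-meter calibration constant below is invented.

```python
import itertools
import math

PIXELS_PER_METER = 50.0          # hypothetical calibration for the drone altitude
MIN_DISTANCE_M = 2.0             # social-distancing threshold

def ground_point(box):           # box = (x1, y1, x2, y2) in pixels
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, y2)   # bottom-center approximates foot position

def count_violations(boxes):
    pts = [ground_point(b) for b in boxes]
    violations = 0
    for (ax, ay), (bx, by) in itertools.combinations(pts, 2):
        if math.hypot(ax - bx, ay - by) / PIXELS_PER_METER < MIN_DISTANCE_M:
            violations += 1
    return violations

detections = [(100, 200, 160, 400), (180, 210, 240, 410), (600, 220, 660, 430)]
print("violating pairs:", count_violations(detections))
```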

PMID:39250511 | DOI:10.1371/journal.pone.0308460

Categories: Literature Watch

Backward induction-based deep image search

Mon, 2024-09-09 06:00

PLoS One. 2024 Sep 9;19(9):e0310098. doi: 10.1371/journal.pone.0310098. eCollection 2024.

ABSTRACT

Conditional image retrieval (CIR), which involves retrieving images by a query image along with user-specified conditions, is essential in computer vision research for efficient image search and automated image analysis. The existing approaches, such as composed image retrieval (CoIR) methods, have been actively studied. However, these methods face challenges as they require either a triplet dataset or richly annotated image-text pairs, which are expensive to obtain. In this work, we demonstrate that CIR at the image-level concept can be achieved using an inverse mapping approach that explores the model's inductive knowledge. Our proposed CIR method, called Backward Search, updates the query embedding to conform to the condition. Specifically, the embedding of the query image is updated by predicting the probability of the label and minimizing the difference from the condition label. This enables CIR with image-level concepts while preserving the context of the query. In this paper, we introduce the Backward Search method that enables single and multi-conditional image retrieval. Moreover, we efficiently reduce the computation time by distilling the knowledge. We conduct experiments using the WikiArt, aPY, and CUB benchmark datasets. The proposed method achieves an average mAP@10 of 0.541 on the datasets, demonstrating a marked improvement compared to the CoIR methods in our comparative experiments. Furthermore, by employing knowledge distillation with the Backward Search model as the teacher, the student model achieves a significant reduction in computation time, up to 160 times faster with only a slight decrease in performance. The implementation of our method is available at the following URL: https://github.com/dhlee-work/BackwardSearch.
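
A toy rendering of the Backward Search idea, with a random classifier and gallery standing in for trained components: gradient steps move the query embedding toward the condition label while a penalty keeps it near the original query, and retrieval is nearest-neighbour search on the result.

```python
import torch
from torch import nn

d, n_classes, n_gallery = 64, 10, 500
clf = nn.Linear(d, n_classes)                     # frozen label predictor on embeddings
gallery = torch.randn(n_gallery, d)               # embeddings of the image collection

query = torch.randn(d)                            # embedding of the query image
z = query.clone().requires_grad_(True)
condition = 3                                     # user-specified target label
opt = torch.optim.SGD([z], lr=0.5)
for _ in range(50):
    loss = nn.functional.cross_entropy(clf(z.unsqueeze(0)), torch.tensor([condition])) \
         + 0.1 * (z - query).pow(2).sum()         # stay close to the original query
    opt.zero_grad(); loss.backward(); opt.step()

scores = nn.functional.cosine_similarity(gallery, z.detach().unsqueeze(0))
print("top-10 retrieved indices:", scores.topk(10).indices.tolist())
```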

PMID:39250472 | DOI:10.1371/journal.pone.0310098

Categories: Literature Watch

Smartphone region-wise image indoor localization using deep learning for indoor tourist attraction

Mon, 2024-09-09 06:00

PLoS One. 2024 Sep 9;19(9):e0307569. doi: 10.1371/journal.pone.0307569. eCollection 2024.

ABSTRACT

Smart indoor tourist attractions, such as smart museums and aquariums, require a significant investment in indoor localization devices. The use of Global Positioning Systems on smartphones is unsuitable for scenarios where dense materials such as concrete and metal weaken GPS signals, which is most often the case in indoor tourist attractions. With the help of deep learning, indoor localization can be done region by region using smartphone images. This approach requires no investment in infrastructure and reduces the cost and time needed to turn museums and aquariums into smart museums or smart aquariums. In this paper, we propose using deep learning algorithms to classify locations based on smartphone camera images for indoor tourist attractions. We evaluated our proposal in a real-world scenario in Brazil, extensively collecting images from ten different smartphones to classify biome-themed fish tanks in the Pantanal Biopark and creating a new dataset of 3654 images. We tested seven state-of-the-art neural networks, three of them transformer-based. On average, we achieved a precision of about 90% and a recall and f-score of about 89%. The results show that the proposal is suitable for most indoor tourist attractions.

PMID:39250439 | DOI:10.1371/journal.pone.0307569

Categories: Literature Watch

RingGesture: A Ring-Based Mid-Air Gesture Typing System Powered by a Deep-Learning Word Prediction Framework

Mon, 2024-09-09 06:00

IEEE Trans Vis Comput Graph. 2024 Sep 9;PP. doi: 10.1109/TVCG.2024.3456179. Online ahead of print.

ABSTRACT

Text entry is a critical capability for any modern computing experience, and lightweight augmented reality (AR) glasses are no exception. Designed for all-day wearability, lightweight AR glasses cannot accommodate the multiple cameras needed for extensive field-of-view hand tracking. This constraint underscores the need for an additional input device. We propose a system to address this gap: RingGesture, a ring-based mid-air gesture typing technique that uses electrodes to mark the start and end of gesture trajectories and inertial measurement unit (IMU) sensors for hand tracking. This method offers an intuitive experience similar to raycast-based mid-air gesture typing found in VR headsets, allowing for a seamless translation of hand movements into cursor navigation. To enhance both accuracy and input speed, we propose a novel deep-learning word prediction framework, Score Fusion, comprising three key components: a) a word-gesture decoding model, b) a spatial spelling correction model, and c) a lightweight contextual language model. The framework fuses the scores from the three models to predict the most likely words with higher precision. We conduct comparative and longitudinal studies to demonstrate two key findings: first, the overall effectiveness of RingGesture, which achieves an average text entry speed of 27.3 words per minute (WPM) and a peak performance of 47.9 WPM; second, the superior performance of the Score Fusion framework, which offers a 28.2% improvement in uncorrected Character Error Rate over a conventional word prediction framework, Naive Correction, leading to a 55.2% improvement in text entry speed for RingGesture. Additionally, RingGesture received a System Usability Score of 83, signifying excellent usability.
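
The fusion step can be illustrated with fabricated numbers: per-candidate scores from the three models are combined in weighted log space to rank words. The weights and probabilities below are invented for demonstration.

```python
import math

candidates = {
    #  word:  (gesture score, spelling score, LM score) as probabilities
    "hello": (0.60, 0.70, 0.20),
    "hella": (0.65, 0.10, 0.01),
    "jello": (0.40, 0.55, 0.05),
}
weights = (1.0, 0.5, 0.8)   # hypothetical per-model fusion weights

def fused_score(probs):
    return sum(w * math.log(p) for w, p in zip(weights, probs))

ranked = sorted(candidates, key=lambda w: fused_score(candidates[w]), reverse=True)
print("prediction order:", ranked)   # fusion rescues 'hello' despite a weaker gesture score
```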

PMID:39250409 | DOI:10.1109/TVCG.2024.3456179

Categories: Literature Watch

ParamsDrag: Interactive Parameter Space Exploration via Image-Space Dragging

Mon, 2024-09-09 06:00

IEEE Trans Vis Comput Graph. 2024 Sep 9;PP. doi: 10.1109/TVCG.2024.3456338. Online ahead of print.

ABSTRACT

Numerical simulation serves as a cornerstone in scientific modeling, yet the process of fine-tuning simulation parameters poses significant challenges. Conventionally, parameter adjustment relies on extensive numerical simulations, data analysis, and expert insights, resulting in substantial computational costs and low efficiency. The emergence of deep learning in recent years has provided promising avenues for more efficient exploration of parameter spaces. However, existing approaches often lack intuitive methods for precise parameter adjustment and optimization. To tackle these challenges, we introduce ParamsDrag, a model that facilitates parameter space exploration through direct interaction with visualizations. Inspired by DragGAN, our ParamsDrag model operates in three steps. First, the generative component of ParamsDrag generates visualizations based on the input simulation parameters. Second, by directly dragging structure-related features in the visualizations, users can intuitively understand the controlling effect of different parameters. Third, with the understanding from the earlier step, users can steer ParamsDrag to produce dynamic visual outcomes. Through experiments conducted on real-world simulations and comparisons with state-of-the-art deep learning-based approaches, we demonstrate the efficacy of our solution.

PMID:39250408 | DOI:10.1109/TVCG.2024.3456338

Categories: Literature Watch

Interactive Design-of-Experiments: Optimizing a Cooling System

Mon, 2024-09-09 06:00

IEEE Trans Vis Comput Graph. 2024 Sep 9;PP. doi: 10.1109/TVCG.2024.3456356. Online ahead of print.

ABSTRACT

The optimization of cooling systems is important in many cases, for example for cabin and battery cooling in electric cars. Such an optimization is governed by multiple, conflicting objectives and it is performed across a multi-dimensional parameter space. The extent of the parameter space, the complexity of the non-linear model of the system, as well as the time needed per simulation run and factors that are not modeled in the simulation necessitate an iterative, semi-automatic approach. We present an interactive visual optimization approach, where the user works with a p-h diagram to steer an iterative, guided optimization process. A deep learning (DL) model provides estimates for parameters, given a target characterization of the system, while numerical simulation is used to compute system characteristics for an ensemble of parameter sets. Since the DL model only serves as an approximation of the inverse of the cooling system and since target characteristics can be chosen according to different, competing objectives, an iterative optimization process is realized, developing multiple sets of intermediate solutions, which are visually related to each other. The standard p-h diagram, integrated interactively in this approach, is complemented by a dual, also interactive visual representation of additional expressive measures representing the system characteristics. We show how the known four-points semantic of the p-h diagram meaningfully transfers to the dual data representation. When evaluating this approach in the automotive domain, we found that our solution helped with the overall comprehension of the cooling system and that it led to faster convergence during optimization.

PMID:39250379 | DOI:10.1109/TVCG.2024.3456356

Categories: Literature Watch

SurroFlow: A Flow-Based Surrogate Model for Parameter Space Exploration and Uncertainty Quantification

Mon, 2024-09-09 06:00

IEEE Trans Vis Comput Graph. 2024 Sep 9;PP. doi: 10.1109/TVCG.2024.3456372. Online ahead of print.

ABSTRACT

Existing deep learning-based surrogate models facilitate efficient data generation, but fall short in uncertainty quantification, efficient parameter space exploration, and reverse prediction. In our work, we introduce SurroFlow, a novel normalizing flow-based surrogate model, to learn the invertible transformation between simulation parameters and simulation outputs. The model not only allows accurate predictions of simulation outcomes for a given simulation parameter but also supports uncertainty quantification in the data generation process. Additionally, it enables efficient simulation parameter recommendation and exploration. We integrate SurroFlow and a genetic algorithm as the backend of a visual interface to support effective user-guided ensemble simulation exploration and visualization. Our framework significantly reduces the computational costs while enhancing the reliability and exploration capabilities of scientific surrogate models.
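
The invertibility at the heart of a normalizing-flow surrogate can be shown with a single affine coupling layer: the same weights run forward (parameters to code) and inverse (code back to parameters). Dimensions are toy, and the real model is conditional and multi-layer.

```python
import torch
from torch import nn

class AffineCoupling(nn.Module):
    """Split x into halves; transform one half conditioned on the other."""
    def __init__(self, d=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d // 2, 32), nn.ReLU(), nn.Linear(32, d))
    def forward(self, x):
        xa, xb = x.chunk(2, dim=-1)
        s, t = self.net(xa).chunk(2, dim=-1)
        return torch.cat([xa, xb * s.exp() + t], dim=-1)   # invertible by construction
    def inverse(self, y):
        ya, yb = y.chunk(2, dim=-1)
        s, t = self.net(ya).chunk(2, dim=-1)
        return torch.cat([ya, (yb - t) * (-s).exp()], dim=-1)

flow = AffineCoupling()
x = torch.randn(4, 8)                     # stand-in simulation parameters
z = flow(x)                               # forward: parameters -> code
x_rec = flow.inverse(z)                   # reverse prediction: code -> parameters
print("max reconstruction error:", (x - x_rec).abs().max().item())
```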

PMID:39250378 | DOI:10.1109/TVCG.2024.3456372

Categories: Literature Watch

Vision-Centric BEV Perception: A Survey

Mon, 2024-09-09 06:00

IEEE Trans Pattern Anal Mach Intell. 2024 Sep 9;PP. doi: 10.1109/TPAMI.2024.3449912. Online ahead of print.

ABSTRACT

Object detection, a fundamental and challenging problem in computer vision, has experienced rapid development due to the effectiveness of deep learning. The objects currently detected are mostly rigid solid substances with apparent and distinct visual characteristics. In this paper, we take on a scarcely explored task named Gaseous Object Detection (GOD), which explores whether object detection techniques can be extended from solid substances to gaseous substances. Gases, however, exhibit significantly different visual characteristics: 1) saliency deficiency, 2) arbitrary and ever-changing shapes, and 3) lack of distinct boundaries. To facilitate the study of this challenging task, we construct a GOD-Video dataset comprising 600 videos (141,017 frames) that cover various attributes with multiple types of gases. A comprehensive benchmark is established based on this dataset, allowing for a rigorous evaluation of frame-level and video-level detectors. Derived from the Gaussian dispersion model, the physics-inspired Voxel Shift Field (VSF) is designed to model geometric irregularities and ever-changing shapes in potential 3D space. By integrating VSF into Faster RCNN, the VSF RCNN serves as a simple but strong baseline for gaseous object detection. Our work aims to attract further research into this valuable albeit challenging area.

PMID:39250358 | DOI:10.1109/TPAMI.2024.3449912

Categories: Literature Watch
