Deep learning
Contrastive Learning vs. Self-Learning vs. Deformable Data Augmentation in Semantic Segmentation of Medical Images
J Imaging Inform Med. 2024 Jun 10. doi: 10.1007/s10278-024-01159-x. Online ahead of print.
ABSTRACT
To develop a robust segmentation model, encoding the underlying features/structures of the input data is essential to discriminate the target structure from the background. To enrich the extracted feature maps, contrastive learning and self-learning techniques are employed, particularly when the size of the training dataset is limited. In this work, we set out to investigate the impact of contrastive learning and self-learning on the performance of deep learning-based semantic segmentation. To this end, three different datasets were employed for brain tumor and hippocampus delineation from MR images (BraTS and Decathlon datasets, respectively) and kidney segmentation from CT images (Decathlon dataset). Since data augmentation techniques are also aimed at enhancing the performance of deep learning methods, a deformable data augmentation technique was proposed and compared with the contrastive learning and self-learning frameworks. The segmentation accuracy for the three datasets was assessed with and without applying data augmentation, contrastive learning, and self-learning to investigate the impact of each technique individually. The self-learning and deformable data augmentation techniques exhibited comparable performance, with Dice indices of 0.913 ± 0.030 and 0.920 ± 0.022 for kidney segmentation, 0.890 ± 0.035 and 0.898 ± 0.027 for hippocampus segmentation, and 0.891 ± 0.045 and 0.897 ± 0.040 for lesion segmentation, respectively. These two approaches significantly outperformed contrastive learning and the original model, which yielded Dice indices of 0.871 ± 0.039 and 0.868 ± 0.042 for kidney segmentation, 0.872 ± 0.045 and 0.865 ± 0.048 for hippocampus segmentation, and 0.870 ± 0.049 and 0.860 ± 0.058 for lesion segmentation, respectively. The combination of self-learning with deformable data augmentation led to a robust segmentation model with no outliers in the outcomes. This work demonstrates the beneficial impact of self-learning and deformable data augmentation on organ and lesion segmentation, where no additional training datasets are needed.
PMID:38858260 | DOI:10.1007/s10278-024-01159-x
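The deformable augmentation compared above is not specified in the abstract; a minimal sketch of one standard choice, Simard-style elastic deformation of a 2D image, follows (SciPy-based; the displacement scale alpha and smoothing sigma are illustrative values, not the paper's):

import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha=30.0, sigma=4.0, rng=None):
    """Warp a 2D image with a smooth random displacement field.

    alpha scales the displacement magnitude; sigma smooths the random
    field so the deformation is spatially coherent. Illustrative defaults.
    """
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    y, x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.vstack([(y + dy).ravel(), (x + dx).ravel()])
    return map_coordinates(image, coords, order=1, mode="reflect").reshape(h, w)

The same displacement field would be applied to the segmentation mask (with nearest-neighbour interpolation) so that image and label stay aligned.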
Deep Learning Reconstructed New-Generation 0.55 T MRI of the Knee-A Prospective Comparison With Conventional 3 T MRI
Invest Radiol. 2024 Jun 11. doi: 10.1097/RLI.0000000000001093. Online ahead of print.
ABSTRACT
OBJECTIVES: The aim of this study was to compare deep learning reconstructed (DLR) 0.55 T magnetic resonance imaging (MRI) quality, identification, and grading of structural anomalies and reader confidence levels with conventional 3 T knee MRI in patients with knee pain following trauma.
MATERIALS AND METHODS: This prospective study of 26 symptomatic patients (5 women) included 52 paired DLR 0.55 T and conventional 3 T MRI examinations obtained in one setting. A novel, commercially available DLR algorithm was employed for 0.55 T image reconstruction. Four board-certified radiologists reviewed all images independently, graded image quality, and noted structural anomalies along with their reporting confidence levels for the presence or absence, as well as the grading, of bone, cartilage, meniscus, ligament, and tendon lesions. Image quality and reader confidence levels were compared (P < 0.05, significant), and MRI findings were correlated between 0.55 T and 3 T MRI using the Cohen kappa (κ).
RESULTS: In the readers' consensus, good image quality was found for both DLR 0.55 T MRI and 3 T MRI (3.8 vs 4.1 of 5 points, P = 0.06). There was near-perfect agreement between 0.55 T DLR and 3 T MRI regarding the identification of structural anomalies for all readers (each κ ≥ 0.80). Substantial to near-perfect agreement between 0.55 T and 3 T MRI was found for the grading of cartilage (κ = 0.65-0.86) and meniscus lesions (κ = 0.71-1.0). High confidence levels were found for all readers for both DLR 0.55 T and 3 T MRI, with three readers showing higher confidence levels for reporting cartilage lesions on 3 T MRI.
CONCLUSIONS: New-generation 0.55 T DLR MRI provides good image quality, comparable to that of conventional 3 T MRI, and allows for reliable identification of internal derangement of the knee with high reader confidence.
PMID:38857414 | DOI:10.1097/RLI.0000000000001093
SSL-CPCD: Self-supervised learning with composite pretext-class discrimination for improved generalisability in endoscopic image analysis
IEEE Trans Med Imaging. 2024 Jun 10;PP. doi: 10.1109/TMI.2024.3411933. Online ahead of print.
ABSTRACT
Data-driven methods have shown tremendous progress in medical image analysis. In this context, deep learning-based supervised methods are widely popular. However, they require a large amount of training data and face issues in generalisability to unseen datasets that hinder clinical translation. Endoscopic imaging data are characterised by large inter- and intra-patient variability, making it more challenging for these models to learn representative features for downstream tasks. Thus, despite the publicly available datasets and the datasets that can be generated within hospitals, most supervised models still underperform. While self-supervised learning has addressed this problem to some extent in natural scene data, a considerable performance gap remains in the medical image domain. In this paper, we propose to explore patch-level instance-group discrimination and penalisation of inter-class variation using an additive angular margin within the cosine similarity metric. Our novel approach enables models to learn to cluster similar representations, thereby improving their ability to provide better separation between different classes. Our results demonstrate significant improvement on all metrics over the state-of-the-art (SOTA) methods on test sets from both the same and diverse datasets. We evaluated our approach for classification, detection, and segmentation. SSL-CPCD attains a notable top-1 accuracy of 79.77% in ulcerative colitis classification, an 88.62% mean average precision (mAP) for detection, and an 82.32% Dice similarity coefficient for segmentation tasks. These represent improvements of over 4%, 2%, and 3%, respectively, compared to the baseline architectures. We demonstrate that our method generalises better than all SOTA methods to unseen datasets, with over 7% improvement.
PMID:38857149 | DOI:10.1109/TMI.2024.3411933
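The additive angular margin within the cosine similarity is the ArcFace-style penalty; a minimal sketch of such margin-modified logits follows (NumPy; the scale s and margin m are illustrative hyperparameters, and this is not the authors' released code):

import numpy as np

def angular_margin_logits(embeddings, class_centres, labels, s=30.0, m=0.5):
    """ArcFace-style logits: cos(theta + m) for the target class,
    cos(theta) otherwise; s and m are illustrative hyperparameters."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_centres / np.linalg.norm(class_centres, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)        # cosine similarity to each class
    theta = np.arccos(cos)
    target = np.zeros_like(cos, dtype=bool)
    target[np.arange(len(labels)), labels] = True
    logits = np.where(target, np.cos(theta + m), cos)
    return s * logits                         # feed into softmax cross-entropy

Requiring the target class to satisfy cos(theta + m) rather than cos(theta) forces tighter intra-class clusters and larger inter-class angular separation, which is the effect the abstract describes.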
An Attention-Based Hemispheric Relation Inference Network for Perinatal Brain Age Prediction
IEEE J Biomed Health Inform. 2024 Jun 10;PP. doi: 10.1109/JBHI.2024.3411620. Online ahead of print.
ABSTRACT
Brain anatomical age is an effective feature for assessing the status of the brain, such as atypical development and aging. Although some deep learning models have been developed for estimating infant brain age, their performance was unsatisfactory because few of them considered the developmental characteristics of brain anatomy during the perinatal period, the most rapid and complex developmental stage across the lifespan. The present study proposed an attention-based hemispheric relation inference network (HRINet) that takes advantage of the nature of brain structural lateralization during early development. This model captures the inter-hemispheric relationship using a graph attention mechanism and transmits lateralization information as features to describe the interactive development between the bilateral hemispheres. The HRINet was used to estimate the brain age of 531 preterm and full-term neonates from the Developing Human Connectome Project (dHCP) database based on two metrics (mean curvature and sulcal depth) characterizing the folding morphology of the cortex. Our results showed that the HRINet outperformed other benchmark models in fitting perinatal brain age, with a mean absolute error of 0.53 and a determination coefficient of 0.89. We also verified the generalizability of the HRINet on an additional independent dataset collected from the Gansu Provincial Maternity and Child-care Hospital. Furthermore, by applying the best-performing model to an independent dataset consisting of 47 scans of preterm infants at term-equivalent age, we showed that the predicted age was significantly lower than the chronological age, suggesting delayed development of premature brains. Our results demonstrate the effectiveness and generalizability of the HRINet in estimating infant brain age, providing promising clinical applications for assessing neonatal brain maturity.
PMID:38857141 | DOI:10.1109/JBHI.2024.3411620
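The inter-hemispheric attention described above builds on the standard graph-attention formulation; a minimal single-head sketch is given below (NumPy; this is the generic GAT update, not the HRINet architecture itself):

import numpy as np

def gat_layer(H, A, W, a, negative_slope=0.2):
    """One graph-attention layer (Velickovic et al. formulation).

    H: node features (N, F); A: adjacency with self-loops (N, N);
    W: projection (F, F_out); a: attention vector (2 * F_out,).
    """
    Z = H @ W
    N = Z.shape[0]
    e = np.array([[a @ np.concatenate([Z[i], Z[j]]) for j in range(N)]
                  for i in range(N)])
    e = np.where(e > 0, e, negative_slope * e)      # LeakyReLU
    e = np.where(A > 0, e, -np.inf)                 # attend only along edges
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)      # softmax over neighbours
    return att @ Z                                  # weighted aggregation

In the hemispheric setting, corresponding regions of the two hemispheres would form the graph's nodes, and the learned attention weights carry lateralization information between them.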
A Generalisable Heartbeat Classifier Leveraging Self-Supervised Learning for ECG Analysis During Magnetic Resonance Imaging
IEEE J Biomed Health Inform. 2024 Jun 10;PP. doi: 10.1109/JBHI.2024.3411792. Online ahead of print.
ABSTRACT
Electrocardiogram (ECG) signals are acquired during magnetic resonance imaging (MRI) to monitor patients and synchronize image acquisition with the heart motion. ECG signals are highly distorted during MRI due to the complex electromagnetic environment. Automated ECG analysis is therefore complicated in this context, and there is no reference technique in MRI for classifying pathological heartbeats. Imaging arrhythmic patients is hence difficult in MRI. Deep learning-based heartbeat classifiers have been suggested but require large databases, whereas existing annotated sets of ECGs in MRI are very small. We proposed a Siamese network to leverage a large database of unannotated ECGs acquired outside MRI. This was used to develop an efficient representation of ECG signals, which was further used to develop a heartbeat classifier. We extensively tested several data augmentations and self-supervised learning (SSL) techniques and assessed the generalization of the obtained classifier to ECG signals acquired in MRI. These augmentations included random noise and a model simulating MRI-specific artefacts. SSL pretraining improved the generalizability of heartbeat classifiers in MRI (F1 = 0.75) compared with deep learning not relying on SSL (F1 = 0.46) and a classical machine learning approach (F1 = 0.40). These promising results indicate that SSL techniques can learn efficient ECG signal representations and are useful for developing deep learning models even when only scarce annotated medical data are available.
PMID:38857140 | DOI:10.1109/JBHI.2024.3411792
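A minimal sketch of the pair construction used in this style of Siamese/SSL pretraining follows (NumPy; the noise level and temporal jitter are placeholder augmentations, not the paper's MRI-artefact model):

import numpy as np

def augment_ecg(beat, rng, noise_std=0.05, max_shift=10):
    """Return a randomly perturbed view of a 1D heartbeat segment
    (additive random noise plus temporal jitter; placeholder parameters)."""
    x = beat + rng.normal(0.0, noise_std, beat.shape)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(x, shift)

def positive_pair(beat, rng):
    """Two augmented views of the same beat form a positive pair for the
    Siamese network; views of different beats serve as negatives."""
    return augment_ecg(beat, rng), augment_ecg(beat, rng)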
Time-frequency-space EEG decoding model based on dense graph convolutional network for stroke
IEEE J Biomed Health Inform. 2024 Jun 10;PP. doi: 10.1109/JBHI.2024.3411646. Online ahead of print.
ABSTRACT
Stroke, a sudden cerebrovascular ailment resulting from brain tissue damage, has prompted the use of motor imagery (MI)-based brain-computer interface (BCI) systems in stroke rehabilitation. However, analyzing EEG signals from stroke patients is challenging because of their low signal-to-noise ratio and high variability. Therefore, we propose a novel approach that combines the modified S-transform (MST) and a dense graph convolutional network (DenseGCN) to enhance MI-BCI performance across the time, frequency, and space domains. MST is a time-frequency analysis method that efficiently concentrates energy in EEG signals, while DenseGCN is a deep learning model that uses the EEG feature maps from each layer as inputs for subsequent layers, facilitating feature reuse and hyperparameter optimization. Our approach outperforms conventional networks, achieving a peak classification accuracy of 90.22% and an average information transfer rate (ITR) of 68.52 bits per minute. Moreover, we conduct an in-depth analysis of the event-related desynchronization/event-related synchronization (ERD/ERS) phenomenon in the deep-level EEG features of stroke patients. Our experimental results confirm the feasibility and efficacy of the proposed approach for MI-BCI rehabilitation systems.
PMID:38857138 | DOI:10.1109/JBHI.2024.3411646
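ITR figures such as the 68.52 bits per minute above are conventionally computed with the Wolpaw formula; a sketch follows, where the class count and trial duration passed in are assumptions (the abstract does not state them):

import math

def itr_bits_per_min(accuracy, n_classes, trial_seconds):
    """Wolpaw ITR: bits per trial, scaled to bits per minute."""
    p, n = accuracy, n_classes
    bits = math.log2(n)
    if 0.0 < p < 1.0:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * 60.0 / trial_seconds

print(itr_bits_per_min(0.9022, 2, 1.0))  # hypothetical N and trial length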
NDDepth: Normal-Distance Assisted Monocular Depth Estimation and Completion
IEEE Trans Pattern Anal Mach Intell. 2024 Jun 10;PP. doi: 10.1109/TPAMI.2024.3411571. Online ahead of print.
ABSTRACT
Over the past few years, monocular depth estimation and completion have received increasing attention from the computer vision community because of their widespread applications. In this paper, we introduce novel physics (geometry)-driven deep learning frameworks for these two tasks, assuming that 3D scenes are composed of piecewise planes. Instead of directly estimating the depth map or completing the sparse depth map, we propose to estimate surface normal and plane-to-origin distance maps, or to complete the sparse surface normal and distance maps, as intermediate outputs. To this end, we develop a normal-distance head that outputs pixel-level surface normals and distances. After that, the surface normal and distance maps are regularized by a plane-aware consistency constraint and then transformed into depth maps. Furthermore, we integrate an additional depth head to strengthen the robustness of the proposed frameworks. Extensive experiments on the NYU-Depth-v2, KITTI, and SUN RGB-D datasets demonstrate that our method outperforms prior state-of-the-art monocular depth estimation and completion methods.
PMID:38857129 | DOI:10.1109/TPAMI.2024.3411571
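Under the piecewise-planar assumption, depth follows in closed form from the predicted normal n and plane-to-origin distance rho: for a pixel ray r = K^-1 [u, v, 1]^T, the plane equation n . (z r) = rho gives z = rho / (n . r). A sketch, assuming pinhole intrinsics and camera-facing normals:

import numpy as np

def depth_from_normal_distance(normals, distances, K):
    """normals: (H, W, 3); distances: (H, W); K: 3x3 pinhole intrinsics.
    Assumes normals are oriented so that n . ray > 0."""
    h, w = distances.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    rays = pix @ np.linalg.inv(K).T               # back-projected rays K^-1 p
    denom = np.einsum("hwc,hwc->hw", normals, rays)
    return distances / np.clip(denom, 1e-6, None)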
Prediction of Inter-residue Multiple Distances and Exploration of Protein Multiple Conformations by Deep Learning
IEEE/ACM Trans Comput Biol Bioinform. 2024 Jun 10;PP. doi: 10.1109/TCBB.2024.3411825. Online ahead of print.
ABSTRACT
AlphaFold2 has achieved a major breakthrough in end-to-end prediction of static protein structures. However, protein conformational change is considered a key factor in protein biological function. Predicting multiple inter-residue distances is therefore of great significance for exploring protein multiple conformations. In this study, we proposed an inter-residue multiple-distance prediction method, DeepMDisPre, based on an improved network that integrates triangle updates, axial attention, and ResNet to predict multiple distances of residue pairs. We built a dataset containing proteins with a single structure and proteins with multiple conformations to train the network. We tested DeepMDisPre on 114 proteins with multiple conformations. The results show that the inter-residue distance distributions predicted by DeepMDisPre are more likely to have multiple peaks for flexible residue pairs than for rigid ones. For two proteins with multiple conformations, we modeled the multiple conformations relatively accurately using the predicted inter-residue multiple distances. In addition, we tested the performance of DeepMDisPre on 279 proteins with a single structure. Experimental results demonstrate that the average contact accuracy of DeepMDisPre is higher than that of the comparative method. In terms of static protein modeling, the average TM-score of the 3D models built by DeepMDisPre is also improved compared with the comparative method. The executable program is freely available at https://github.com/iobio-zjut/DeepMDisPre.
PMID:38857126 | DOI:10.1109/TCBB.2024.3411825
Machine-to-Machine Transfer Function in Deep Learning-Based Quantitative Ultrasound
IEEE Trans Ultrason Ferroelectr Freq Control. 2024 Jun;71(6):687-697. doi: 10.1109/TUFFC.2024.3384815.
ABSTRACT
A transfer function approach was recently demonstrated to mitigate data mismatches at the acquisition level for a single ultrasound scanner in deep learning (DL)-based quantitative ultrasound (QUS). As a natural progression, we further investigate the transfer function approach and introduce a machine-to-machine (M2M) transfer function, which can mitigate data mismatches at the machine level. This ability opens the door to unprecedented opportunities for reducing DL model development costs, enabling the combination of data from multiple sources or scanners, and facilitating the transfer of DL models between machines. We tested the proposed method using a SonixOne machine and a Verasonics machine with an L9-4 array and an L11-5 array. We conducted two types of acquisitions to obtain calibration data, stable and free-hand, using two different calibration phantoms. Without the proposed method, the mean classification accuracy when applying a model trained on data from one system to data acquired from another system was 50%, and the mean area under the receiver operating characteristic (ROC) curve (AUC) was 0.405. With the proposed method, the mean accuracy increased to 99%, and the AUC rose to 0.999. Additionally, the choice of calibration phantom led to statistically significant changes in the performance of the proposed method. Moreover, a robust implementation inspired by Wiener filtering provided an effective means of transferring the domain from one machine to another, and it can succeed using just a single calibration view. Lastly, the proposed method proved effective when a different transducer was used in the test machine.
PMID:38857123 | DOI:10.1109/TUFFC.2024.3384815
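A minimal sketch of a Wiener-inspired spectral transfer function between two machines' calibration data follows (NumPy; the paper's exact estimator is not given in the abstract, and eps is an illustrative regularizer):

import numpy as np

def m2m_transfer_function(spec_target, spec_source, eps=1e-3):
    """Wiener-style estimate of H(f) mapping source-machine calibration
    spectra onto the target machine: H = S_t conj(S_s) / (|S_s|^2 + eps).
    eps regularizes low-power frequencies (illustrative value)."""
    return spec_target * np.conj(spec_source) / (np.abs(spec_source) ** 2 + eps)

def transfer(spectrum, H):
    """Apply H to map a new source-machine spectrum into the target domain."""
    return H * spectrum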
Ultralow-Power Single-Sensor-Based E-Nose System Powered by Duty Cycling and Deep Learning for Real-Time Gas Identification
ACS Sens. 2024 Jun 10. doi: 10.1021/acssensors.4c00471. Online ahead of print.
ABSTRACT
This study presents a novel, ultralow-power single-sensor-based electronic nose (e-nose) system for real-time gas identification, distinguishing itself from conventional sensor-array-based e-nose systems, whose power consumption and cost increase with the number of sensors. Our system employs a single metal oxide semiconductor (MOS) sensor built on a suspended 1D nanoheater and driven by duty cycling, i.e., repeated pulsed power inputs. The sensor's ultrafast thermal response, enabled by its small size, effectively decouples the effects of temperature and surface charge exchange on the MOS nanomaterial's conductivity. This provides distinct sensing signals that alternate between responses coupled with and decoupled from the thermally enhanced conductivity, all within a single time domain during duty cycling. The magnitude and ratio of these dual responses vary with gas type and concentration, enabling early-stage identification of five gas types within 30 s via a convolutional neural network (classification accuracy = 93.9%, concentration regression error = 19.8%). Additionally, the duty-cycling mode reduces power consumption by up to 90%, lowering it to 160 μW to heat the sensor to 250 °C. Manufactured using only wafer-level batch microfabrication processes, this innovative e-nose system promises the facile implementation of battery-driven, long-term, and cost-effective IoT monitoring systems.
PMID:38857120 | DOI:10.1021/acssensors.4c00471
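The quoted 90% saving is the direct consequence of time-averaging pulsed heating, P_avg = duty_cycle x P_on; a toy calculation follows (the continuous-drive power is inferred from the stated saving, not given in the abstract):

p_avg_uW = 160.0                             # average power under duty cycling
saving = 0.90                                # "up to 90%" reduction
p_continuous_uW = p_avg_uW / (1.0 - saving)  # implied continuous-drive power
print(f"Implied continuous-drive power: {p_continuous_uW:.0f} uW")  # ~1600 uW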
Data set terminology of deep learning in medicine: a historical review and recommendation
Jpn J Radiol. 2024 Jun 10. doi: 10.1007/s11604-024-01608-1. Online ahead of print.
ABSTRACT
Medicine and deep learning-based artificial intelligence (AI) engineering represent two distinct fields each with decades of published history. The current rapid convergence of deep learning and medicine has led to significant advancements, yet it has also introduced ambiguity regarding data set terms common to both fields, potentially leading to miscommunication and methodological discrepancies. This narrative review aims to give historical context for these terms, accentuate the importance of clarity when these terms are used in medical deep learning contexts, and offer solutions to mitigate misunderstandings by readers from either field. Through an examination of historical documents, including articles, writing guidelines, and textbooks, this review traces the divergent evolution of terms for data sets and their impact. Initially, the discordant interpretations of the word 'validation' in medical and AI contexts are explored. We then show that in the medical field as well, terms traditionally used in the deep learning domain are becoming more common, with the data for creating models referred to as the 'training set', the data for tuning of parameters referred to as the 'validation (or tuning) set', and the data for the evaluation of models as the 'test set'. Additionally, the test sets used for model evaluation are classified into internal (random splitting, cross-validation, and leave-one-out) sets and external (temporal and geographic) sets. This review then identifies often misunderstood terms and proposes pragmatic solutions to mitigate terminological confusion in the field of deep learning in medicine. We support the accurate and standardized description of these data sets and the explicit definition of data set splitting terminologies in each publication. These are crucial methods for demonstrating the robustness and generalizability of deep learning applications in medicine. This review aspires to enhance the precision of communication, thereby fostering more effective and transparent research methodologies in this interdisciplinary field.
PMID:38856878 | DOI:10.1007/s11604-024-01608-1
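In the recommended terminology, a conventional split looks like the following (scikit-learn; the 60/20/20 proportions and synthetic data are illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data

# Hold out the test set first: it is used once, for final model evaluation.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
# Split the remainder into a training set (fits model weights) and a
# validation (tuning) set (selects hyperparameters): 60/20/20 overall.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)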
Continuous monitoring of left ventricular function in postoperative intensive care patients using artificial intelligence and transesophageal echocardiography
Intensive Care Med Exp. 2024 Jun 10;12(1):54. doi: 10.1186/s40635-024-00640-9.
ABSTRACT
BACKGROUND: Continuous monitoring of mitral annular plane systolic excursion (MAPSE) using transesophageal echocardiography (TEE) may improve the evaluation of left ventricular (LV) function in postoperative intensive care patients. We aimed to assess the utility of continuous monitoring of LV function using TEE and artificial intelligence (autoMAPSE) in postoperative intensive care patients.
METHODS: In this prospective observational study, we monitored 50 postoperative intensive care patients for 120 min immediately after cardiac surgery. We recorded a set of two-chamber and four-chamber TEE images every five minutes. We defined monitoring feasibility as how often the same wall from the same patient could be reassessed, and categorized monitoring feasibility as excellent if the same LV wall could be reassessed in ≥ 90% of the total recordings. To compare autoMAPSE with manual measurements, we rapidly recorded three sets of repeated images to assess precision (least significant change), bias, and limits of agreement (LOA). To assess the ability to identify changes (trending ability), we compared changes in autoMAPSE with the changes in manual measurements in images obtained during the initiation of cardiopulmonary bypass as well as before and after surgery.
RESULTS: Monitoring feasibility was excellent in most patients (88%). Compared with manual measurements, autoMAPSE was more precise (least significant change 2.2 vs 3.1 mm, P < 0.001), had low bias (0.4 mm), and acceptable agreement (LOA −2.7 to 3.5 mm). AutoMAPSE had excellent trending ability, as its measurements changed in the same direction as manual measurements (concordance rate 96%).
CONCLUSION: Continuous monitoring of LV function was feasible using autoMAPSE. Compared with manual measurements, autoMAPSE had excellent trending ability, low bias, acceptable agreement, and was more precise.
PMID:38856861 | DOI:10.1186/s40635-024-00640-9
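The precision and agreement statistics above follow standard repeated-measures conventions; a minimal sketch (NumPy), taking the least significant change as 1.96 x sqrt(2) times the within-subject SD, one common definition that may differ in detail from the study's computation:

import numpy as np

def bland_altman(auto, manual):
    """Bias and 95% limits of agreement for paired measurements."""
    d = np.asarray(auto) - np.asarray(manual)
    bias = d.mean()
    half_width = 1.96 * d.std(ddof=1)
    return bias, (bias - half_width, bias + half_width)

def least_significant_change(repeats):
    """repeats: (n_subjects, n_repeats) of rapid repeated measurements.
    LSC = 1.96 * sqrt(2) * within-subject SD (one common definition)."""
    within_sd = np.sqrt(np.mean(np.var(repeats, axis=1, ddof=1)))
    return 1.96 * np.sqrt(2) * within_sd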
Machine learning-based classification of structured light modes under turbulence and eavesdropping effects
Appl Opt. 2024 Jun 1;63(16):4405-4413. doi: 10.1364/AO.520510.
ABSTRACT
This paper considers the classification of multiplexed structured light modes, aiming to bolster communication reliability and data transfer rates, particularly in challenging scenarios marked by turbulence and potential eavesdropping. An experimental free-space optic (FSO) system is established to transmit 16 modes [8-ary Laguerre-Gaussian (LG) and 8-ary superposition LG (Mux-LG) mode patterns] over a 3-m FSO channel, accounting for interception threats and turbulence effects. To the best of the authors' knowledge, this paper is the first to consider both factors concurrently. We propose four machine/deep learning algorithms (artificial neural network, support vector machine, 1D convolutional neural network, and 2D convolutional neural network) for classification purposes. By fusing the outputs of these methods, we achieve promising classification results exceeding 92%, 81%, and 69% in cases of weak, moderate, and strong turbulence, respectively. Structured light modes exhibit significant potential for a variety of real-world applications where reliable and high-capacity data transmission is crucial.
PMID:38856620 | DOI:10.1364/AO.520510
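The fusion rule is not detailed in the abstract; the simplest consistent choice is soft voting over the four models' class probabilities, sketched below (NumPy):

import numpy as np

def fuse_soft_voting(prob_list):
    """Average per-class probabilities from several models (each array
    shaped (n_samples, n_classes)) and pick the highest-scoring class."""
    avg = np.mean(np.stack(prob_list, axis=0), axis=0)
    return avg.argmax(axis=1)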
Estimation of modified Zernike coefficients from turbulence-degraded multispectral imagery using deep learning
Appl Opt. 2024 Jun 1;63(16):E28-E34. doi: 10.1364/AO.521072.
ABSTRACT
We investigate how wavelength diversity affects the performance of a deep-learning model that predicts the modified Zernike coefficients of turbulence-induced wavefront error from multispectral images. The ability to perform accurate predictions of the coefficients from images collected in turbulent conditions has potential applications in image restoration. The source images for this work were a point object and extended objects taken from a character-based dataset, and a wavelength-dependent simulation was developed that applies the effects of isoplanatic atmospheric turbulence to the images. The simulation utilizes a phase screen resampling technique to emulate the simultaneous collection of each band of a multispectral image through the same turbulence realization. Simulated image data were generated for the point and extended objects at various turbulence levels, and a deep neural network architecture based on AlexNet was used to predict the modified Zernike coefficients. Mean squared error results demonstrate a significant improvement in predicting modified Zernike coefficients for both the point object and extended objects as the number of spectral bands is increased. However, the improvement with the number of bands was limited when using extended objects with additive noise.
PMID:38856589 | DOI:10.1364/AO.521072
Res-U2Net: untrained deep learning for phase retrieval and image reconstruction
J Opt Soc Am A Opt Image Sci Vis. 2024 May 1;41(5):766-773. doi: 10.1364/JOSAA.511074.
ABSTRACT
Conventional deep learning-based image reconstruction methods require a large amount of training data, which can be hard to obtain in practice. Untrained deep learning methods overcome this limitation by training a network to invert a physical model of the image formation process. Here we present a novel, to our knowledge, untrained Res-U2Net model for phase retrieval. We use the extracted phase information to determine changes in an object's surface and generate a mesh representation of its 3D structure. We compare the performance of Res-U2Net phase retrieval against UNet and U2Net using images from the GDXRAY dataset.
PMID:38856563 | DOI:10.1364/JOSAA.511074
Unsupervised speckle denoising in digital holographic interferometry based on 4-f optical simulation integrated cycle-consistent generative adversarial network
Appl Opt. 2024 May 1;63(13):3557-3569. doi: 10.1364/AO.521701.
ABSTRACT
The speckle noise generated during digital holographic interferometry (DHI) is unavoidable and difficult to eliminate, thus reducing its accuracy. We propose a self-supervised deep-learning speckle denoising method using a cycle-consistent generative adversarial network to mitigate the effect of speckle noise. The proposed method integrates a 4-f optical speckle noise simulation module with a parameter generator. In addition, it uses an unpaired dataset for training to overcome the difficulty in obtaining noise-free images and paired data from experiments. The proposed method was tested on both simulated and experimental data, with results showing a 6.9% performance improvement compared with a conventional method and a 2.6% performance improvement compared with unsupervised deep learning in terms of the peak signal-to-noise ratio. Thus, the proposed method exhibits superior denoising performance and potential for DHI, being particularly suitable for processing large datasets.
PMID:38856541 | DOI:10.1364/AO.521701
Coded aperture compressive temporal imaging via unsupervised lightweight local-global networks with geometric characteristics
Appl Opt. 2024 May 20;63(15):4109-4117. doi: 10.1364/AO.510414.
ABSTRACT
Coded aperture compressive temporal imaging (CACTI) utilizes compressive sensing (CS) theory to compress three-dimensional (3D) signals into 2D measurements in a single snapshot, from which high-dimensional (HD) visual signals are recovered. To address the low reconstruction quality and slow runtime often encountered in reconstruction, deep learning has become the mainstream approach for signal reconstruction and has shown superior performance. Currently, however, the most impressive networks are typically supervised networks with large model sizes that require vast training sets, which can be difficult or expensive to obtain. This limits their application in real optical imaging systems. In this paper, we propose a lightweight reconstruction network that recovers HD signals solely from noisy compressed measurements, and we design a convolutional block that extracts and fuses local and global features, stacking multiple such blocks to form a lightweight architecture. In addition, we derive unsupervised loss functions based on the geometric characteristics of the signal to guarantee the network's generalization capability and to approximate the reconstruction process of real optical systems. Experimental results show that our proposed network significantly reduces the model size and achieves high performance in recovering dynamic scenes, with the unsupervised video reconstruction network approaching its supervised counterpart in reconstruction performance.
PMID:38856504 | DOI:10.1364/AO.510414
Imaging through thick scattering media based on envelope-informed learning with a simulated training dataset
Appl Opt. 2024 May 20;63(15):4049-4056. doi: 10.1364/AO.521140.
ABSTRACT
Computational imaging faces significant challenges in dealing with multiple scattering through thick complex media. While deep learning has addressed some ill-posed problems in scattering imaging, its practical application is limited by the acquisition of the training dataset. In this study, the Gaussian-distributed envelope of the speckle image is employed to simulate the point spread function (PSF), and the training dataset is obtained by convolving handwritten digits with the PSF. This approach reduces the time and experimental conditions required to construct the training dataset and enables a neural network trained on this dataset to reconstruct objects obscured by an unknown scattering medium in real experiments. The quality of the reconstructed objects is negatively correlated with the thickness of the scattering medium. Our proposed method provides a new way, to the best of our knowledge, to apply deep learning to scattering imaging by reducing the time needed to construct the training dataset.
PMID:38856497 | DOI:10.1364/AO.521140
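A minimal sketch of the described dataset construction, with a crude moment-based Gaussian fit standing in for whatever envelope estimator the authors used (SciPy/NumPy):

import numpy as np
from scipy.signal import fftconvolve

def gaussian_envelope_psf(speckle):
    """Fit a Gaussian to the speckle image's intensity envelope
    (moment-based stand-in; assumes non-negative intensities)."""
    h, w = speckle.shape
    y, x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    p = speckle / speckle.sum()
    cy, cx = (p * y).sum(), (p * x).sum()
    sy = np.sqrt((p * (y - cy) ** 2).sum())
    sx = np.sqrt((p * (x - cx) ** 2).sum())
    psf = np.exp(-0.5 * (((y - cy) / sy) ** 2 + ((x - cx) / sx) ** 2))
    return psf / psf.sum()

def simulate_training_image(digit, psf):
    """Blur a source object (e.g., a handwritten digit) with the PSF."""
    return fftconvolve(digit, psf, mode="same")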
Adaptive noise-resilient deep learning for image reconstruction in multimode fiber scattering
Appl Opt. 2024 Apr 20;63(12):3003-3014. doi: 10.1364/AO.519285.
ABSTRACT
This research offers a comprehensive exploration of three pivotal aspects of fiber optics and piezoelectric materials. The study examines the influence of voltage variation on piezoelectric displacement, the effects of bending multimode fiber (MMF) on data transmission, and the performance of an autoencoder in MMF image reconstruction with and without additional noise. To assess the impact of voltage variation on piezoelectric displacement, experiments were conducted by applying varying voltages to a piezoelectric material and measuring its radial displacement. The results revealed a notable increase in displacement with higher voltage, with implications for fiber stability and overall performance. The investigation into the effects of bending MMF on data transmission showed that bending causes the fiber to become leaky and radiate power radially, potentially degrading data transmission. This insight emphasizes the need for further research to optimize data transmission in practical fiber systems. Furthermore, the performance of an autoencoder model was evaluated using a dataset of MMF images in diverse scenarios. The autoencoder exhibited impressive accuracy in reconstructing MMF images with high fidelity. These results underscore the significance of ongoing research in these domains, propelling advancements in fiber optic technology.
PMID:38856444 | DOI:10.1364/AO.519285
Identifying the twist factor of twisted partially coherent optical beams
J Opt Soc Am A Opt Image Sci Vis. 2024 Jun 1;41(6):1221-1228. doi: 10.1364/JOSAA.522975.
ABSTRACT
Twisted partially coherent light, characterized by its unique twist factor, offers novel control over the statistical properties of random light. However, the recognition of the twist factor remains a challenge due to the low coherence and the stochastic nature of the optical beam. This paper introduces a method for the recognition of twisted partially coherent beams by utilizing a circular aperture at the source plane. This aperture produces a characteristic hollow intensity structure due to the twist phase. A deep learning model is then trained to identify the twist factor of these beams based on this signature. The model, while simple in structure, effectively eliminates the need for complex optimization layers, streamlining the recognition process. This approach offers a promising solution for enhancing the detection of twisted light and paves the way for future research in this field.
PMID:38856440 | DOI:10.1364/JOSAA.522975