Literature Watch
Large-scale Cross-modality Search via Collective Matrix Factorization Hashing.
IEEE Trans Image Process. 2016 Sep 8;
Authors: Ding G, Guo Y, Zhou J, Gao Y
Abstract
By transforming data into binary representation, i.e., Hashing, we can perform high-speed search with low storage cost, and thus Hashing has attracted increasing research interest in recent years. How to generate Hashcodes for multimodal data (e.g., images with textual tags, documents with photos, etc.) for large-scale cross-modality search (e.g., searching semantically related images in a database for a document query) has recently become an important research issue because of the fast growth of multimodal data on the Web. To address this issue, a novel framework for multimodal Hashing is proposed, termed Collective Matrix Factorization Hashing (CMFH). The key idea of CMFH is to learn unified Hashcodes for different modalities of one multimodal instance in a shared latent semantic space in which different modalities can be effectively connected. Therefore, accurate cross-modality search is supported. Based on the general framework, we extend it to the unsupervised scenario, where it tries to preserve the Euclidean structure, and to the supervised scenario, where it fully exploits the label information of data. The corresponding theoretical analysis and the optimization algorithms are given. We conducted comprehensive experiments on three benchmark datasets for cross-modality search. The experimental results demonstrate that CMFH can significantly outperform several state-of-the-art cross-modality Hashing methods, which validates the effectiveness of the proposed CMFH.
PMID: 27623584 [PubMed - as supplied by publisher]
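The search step that binary codes make cheap can be illustrated with a minimal Python sketch. It is not the CMFH learning algorithm: it assumes hash codes have already been learned and only shows how a query code from one modality is matched against database codes by Hamming distance. The 8-bit codes, the item names, and the helper names `hamming` and `search` are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def search(query_code: int, database: dict, k: int = 2) -> list:
    """Return the ids of the k items closest to query_code in Hamming distance.

    Ties are broken alphabetically by item id so the result is deterministic.
    """
    ranked = sorted(
        database.items(),
        key=lambda item: (hamming(query_code, item[1]), item[0]),
    )
    return [item_id for item_id, _ in ranked[:k]]


# Hypothetical 8-bit codes for images in a database.
database = {
    "img_cat": 0b10110010,
    "img_dog": 0b10110111,
    "img_car": 0b01001101,
}

# Hypothetical code produced for a text query in the same latent space.
text_query = 0b10110011

print(search(text_query, database, k=2))  # ['img_cat', 'img_dog']
```

Because each comparison is one XOR plus a bit count, a linear scan stays fast even for large databases, which is the storage and speed argument the abstract makes.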
The pharmacogenomics of drug resistance to protein kinase inhibitors.
Drug Resist Updat. 2016 Sep;28:28-42
Authors: Gillis NK, McLeod HL
Abstract
Dysregulation of growth factor cell signaling is a major driver of most human cancers. This has led to development of numerous drugs targeting protein kinases, with demonstrated efficacy in the treatment of a wide spectrum of cancers. Despite their high initial response rates and survival benefits, the majority of patients eventually develop resistance to these targeted therapies. This review article discusses examples of established mechanisms of drug resistance to anticancer therapies, including drug target mutations or gene amplifications, emergence of alternate signaling pathways, and pharmacokinetic variation. This reveals a role for pharmacogenomic analysis to identify and monitor for resistance, with possible therapeutic strategies to combat chemoresistance.
PMID: 27620953 [PubMed - in process]
A Systems Biology Analysis Unfolds the Molecular Pathways and Networks of Two Proteobacteria in Spaceflight and Simulated Microgravity Conditions.
Astrobiology. 2016 Sep;16(9):677-689
Authors: Roy R, Shilpa PP, Bagh S
Abstract
Bacteria are important organisms for space missions due to their increased pathogenesis in microgravity that poses risks to the health of astronauts and for projected synthetic biology applications at the space station. We understand little about the effect, at the molecular systems level, of microgravity on bacteria, despite their significant incidence. In this study, we proposed a systems biology pipeline and performed an analysis on published gene expression data sets from multiple seminal studies on Pseudomonas aeruginosa and Salmonella enterica serovar Typhimurium under spaceflight and simulated microgravity conditions. By applying gene set enrichment analysis on the global gene expression data, we directly identified a large number of new, statistically significant cellular and metabolic pathways involved in response to microgravity. Alteration of metabolic pathways in microgravity has rarely been reported before, whereas in this analysis metabolic pathways are prevalent. Several of those pathways were found to be common across studies and species, indicating a common cellular response in microgravity. We clustered genes based on their expression patterns using consensus non-negative matrix factorization. The genes from different mathematically stable clusters showed protein-protein association networks with distinct biological functions, suggesting the plausible functional or regulatory network motifs in response to microgravity. The newly identified pathways and networks showed connection with increased survival of pathogens within macrophages, virulence, and antibiotic resistance in microgravity. Our work establishes a systems biology pipeline and provides an integrated insight into the effect of microgravity at the molecular systems level.
KEY WORDS: Systems biology-Microgravity-Pathways and networks-Bacteria. Astrobiology 16, 677-689.
PMID: 27623197 [PubMed - as supplied by publisher]
Origin of a folded repeat protein from an intrinsically disordered ancestor.
Elife. 2016 Sep 13;5
Authors: Zhu H, Sepulveda E, Hartmann MD, Kogenaru M, Ursinus A, Sulz E, Albrecht R, Coles M, Martin J, Lupas AN
Abstract
Repetitive proteins are thought to have arisen through the amplification of subdomain-sized peptides. Many of these originated in a non-repetitive context as cofactors of RNA-based replication and catalysis, and required the RNA to assume their active conformation. In search of the origins of one of the most widespread repeat protein families, the tetratricopeptide repeat (TPR), we identified several potential homologs of its repeated helical hairpin in non-repetitive proteins, including the putatively ancient ribosomal protein S20 (RPS20), which only becomes structured in the context of the ribosome. We evaluated the ability of the RPS20 hairpin to form a TPR fold by amplification and obtained structures identical to natural TPRs for variants with 2-5 point mutations per repeat. The mutations were neutral in the parent organism, suggesting that they could have been sampled in the course of evolution. TPRs could thus have plausibly arisen by amplification from an ancestral helical hairpin.
PMID: 27623012 [PubMed - as supplied by publisher]
A computational approach to map nucleosome positions and alternative chromatin states with base pair resolution.
Elife. 2016 Sep 13;5
Authors: Zhou X, Blocker AW, Airoldi EM, O'Shea EK
Abstract
Understanding chromatin function requires knowing the precise location of nucleosomes. MNase-seq methods have been widely applied to characterize nucleosome organization in vivo, but generally lack the accuracy to determine the precise nucleosome positions. Here we develop a computational approach leveraging digestion variability to determine nucleosome positions at base-pair resolution from MNase-seq data. We generate a variability template as a simple error model for how MNase digestion affects mapping of individual nucleosomes. Applied to both yeast and human cells, this analysis reveals that alternatively positioned nucleosomes are prevalent and create significant heterogeneity in a cell population. We show that the periodic occurrences of dinucleotide sequences relative to nucleosome dyads can be directly determined from genome-wide nucleosome positions from MNase-seq. Alternatively positioned nucleosomes near transcription start sites likely represent different states of promoter nucleosomes during transcription initiation. Our method can be applied to map nucleosome positions in diverse organisms at base-pair resolution.
PMID: 27623011 [PubMed - as supplied by publisher]
Plant-Derived Terpenes: A Feedstock for Specialty Biofuels.
Trends Biotechnol. 2016 Sep 9;
Authors: Mewalal R, Rai DK, Kainer D, Chen F, Külheim C, Peter GF, Tuskan GA
Abstract
Research toward renewable and sustainable energy has identified specific terpenes capable of supplementing or replacing current petroleum-derived fuels. Despite being naturally produced and stored by many plants, there are few examples of commercial recovery of terpenes from plants because of low yields. Plant terpene biosynthesis is regulated at multiple levels, leading to wide variability in terpene content and chemistry. Advances in the plant molecular toolkit, including annotated genomes, high-throughput omics profiling, and genome editing, have begun to elucidate plant terpene metabolism, and such information is useful for bioengineering metabolic pathways for specific terpenes. We review here the status of terpenes as a specialty biofuel and discuss the potential of plants as a viable agronomic solution for future terpene-derived biofuels.
PMID: 27622303 [PubMed - as supplied by publisher]
Metabolic flux control in glycosylation.
Curr Opin Struct Biol. 2016 Sep 9;40:97-103
Authors: McDonald AG, Hayes JM, Davey GP
Abstract
Glycosylation is a common post-translational protein modification, in which glycans are built onto proteins through the sequential addition of monosaccharide units, in reactions catalysed by glycosyltransferases. Glycosylation influences the physicochemical and biological properties of proteins, with subsequent effects on subcellular and extracellular protein trafficking, cell-cell recognition, and ligand-receptor interactions. Glycan structures can be complex, as is the regulation of their biosynthesis, and it is only recently that the systems biology of metabolic flux control and glycosyltransferase networks has become a study in its own right. We review various models of glycosylation that have been proposed to date, based on current knowledge of Golgi structure and function, and consider how metabolic flux through glycosyltransferase networks regulates glycosylation events in the cell.
PMID: 27620650 [PubMed - as supplied by publisher]
Formal Derivation of Qualitative Dynamical Models from Biochemical Networks.
Biosystems. 2016 Sep 9;
Authors: Abou-Jaoudé W, Thieffry D, Feret J
Abstract
As technological advances allow a better identification of cellular networks, large-scale molecular data are swiftly produced, allowing the construction of large and detailed molecular interaction maps. One approach to unravel the dynamical properties of such complex systems consists in deriving coarse-grained dynamical models from these maps, which would make the salient properties emerge. We present here a method to automatically derive such models, relying on the abstract interpretation framework to formally relate model behaviour at different levels of description. We illustrate our approach on two relevant case studies: the formation of a complex involving a protein adaptor, and a race between two competing biochemical reactions. States and traces of reaction networks are first abstracted by sampling the number of instances of chemical species within a finite set of intervals. We show that the qualitative models induced by this abstraction are too coarse to reproduce properties of interest. We then refine our approach by taking into account additional constraints, the mass invariants and the limiting resources for interval crossing, and by introducing information on the reaction kinetics. The resulting qualitative models are able to capture sophisticated properties of interest, such as a sequestration effect, which arise in the case studies and, more generally, participate in shaping the dynamics of cell signaling and regulatory networks. Our methodology offers new trade-offs between complexity and accuracy, and clarifies the implicit assumptions made in the process of qualitative modelling of biological networks.
PMID: 27619217 [PubMed - as supplied by publisher]
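The first abstraction step described above, sampling the number of instances of each species within a finite set of intervals, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the interval bounds, the species names, and the choice to merge consecutive identical abstract states in a trace are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right

# Hypothetical interval bounds giving [0, 0], [1, 9], [10, 99], [100, inf).
BOUNDS = [1, 10, 100]


def abstract_count(n: int, bounds=BOUNDS) -> int:
    """Map an exact instance count to the index of the interval containing it."""
    if n < 0:
        raise ValueError("instance counts cannot be negative")
    return bisect_right(bounds, n)


def abstract_state(state: dict, bounds=BOUNDS) -> dict:
    """Abstract every species count in a concrete state."""
    return {species: abstract_count(n, bounds) for species, n in state.items()}


def abstract_trace(trace: list, bounds=BOUNDS) -> list:
    """Abstract each state, merging consecutive duplicates (an assumption here)."""
    result = []
    for state in trace:
        abstracted = abstract_state(state, bounds)
        if not result or result[-1] != abstracted:
            result.append(abstracted)
    return result


# Hypothetical concrete trace of a reaction converting B into A.
trace = [
    {"A": 0, "B": 50},
    {"A": 1, "B": 49},
    {"A": 2, "B": 48},
    {"A": 12, "B": 38},
]

print(abstract_trace(trace))
# [{'A': 0, 'B': 2}, {'A': 1, 'B': 2}, {'A': 2, 'B': 2}]
```

The example also hints at why such an abstraction can be too coarse: B stays in the same interval throughout, so the fact that A grows only at the expense of B is lost unless extra constraints such as mass invariants are added.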
Retrieving relevant time-course experiments: a study on Arabidopsis microarrays.
IET Syst Biol. 2016 Jun;10(3):87-93
Authors: Şener DD, Oğul H
Abstract
Understanding time-course regulation of genes in response to a stimulus is a major concern in current systems biology. The problem is usually approached by computational methods to model the gene behaviour or its networked interactions with the others by a set of latent parameters. The model parameters can be estimated through a meta-analysis of available data obtained from other relevant experiments. The key question here is how to find the relevant experiments which are potentially useful in analysing current data. In this study, the authors address this problem in the context of time-course gene expression experiments from an information retrieval perspective. To this end, they introduce a computational framework that takes a time-course experiment as a query and reports a list of relevant experiments retrieved from a given repository. These retrieved experiments can then be used to associate the environmental factors of query experiment with the findings previously reported. The model is tested using a set of time-course Arabidopsis microarrays. The experimental results show that relevant experiments can be successfully retrieved based on content similarity.
PMID: 27187987 [PubMed - indexed for MEDLINE]
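The retrieval setting in this abstract, where a time-course experiment is the query and a ranked list of experiments is the result, can be sketched as follows. The abstract does not say which similarity measure is used, so Pearson correlation between fixed-length expression profiles is an assumption made purely for illustration, and the experiment names and values are hypothetical.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def retrieve(query: list, repository: dict, top_k: int = 2) -> list:
    """Rank repository experiments by similarity to the query, best first."""
    scored = [(pearson(query, profile), name) for name, profile in repository.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical mean expression profiles over four time points.
repository = {
    "cold_stress": [0.1, 0.8, 1.5, 2.1],
    "heat_stress": [2.0, 1.4, 0.7, 0.2],
    "drought": [0.0, 0.5, 1.1, 1.0],
}
query = [0.2, 0.9, 1.4, 2.2]

print(retrieve(query, repository, top_k=2))  # ['cold_stress', 'drought']
```

A real system would have to handle experiments with different genes, time points, and sampling intervals, which this sketch sidesteps by assuming aligned profiles.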
Learning a hierarchical representation of the yeast transcriptomic machinery using an autoencoder model.
BMC Bioinformatics. 2016;17 Suppl 1:9
Authors: Chen L, Cai C, Chen V, Lu X
Abstract
BACKGROUND: A living cell has a complex, hierarchically organized signaling system that encodes and assimilates diverse environmental and intracellular signals, and it further transmits signals that control cellular responses, including a tightly controlled transcriptional program. An important and yet challenging task in systems biology is to reconstruct the cellular signaling system in a data-driven manner. In this study, we investigate the utility of deep hierarchical neural networks in learning and representing the hierarchical organization of yeast transcriptomic machinery.
RESULTS: We have designed a sparse autoencoder model consisting of a layer of observed variables and four layers of hidden variables. We applied the model to over a thousand yeast microarrays to learn the encoding system of yeast transcriptomic machinery. After model selection, we evaluated whether the trained models captured biologically sensible information. We show that the latent variables in the first hidden layer correctly captured the signals of yeast transcription factors (TFs), obtaining a close to one-to-one mapping between latent variables and TFs. We further show that genes regulated by latent variables at higher hidden layers are often involved in a common biological process, and the hierarchical relationships between latent variables conform to existing knowledge. Finally, we show that information captured by the latent variables provides more abstract and concise representations of each microarray, enabling the identification of better separated clusters in comparison to gene-based representation.
CONCLUSIONS: Contemporary deep hierarchical latent variable models, such as the autoencoder, can be used to partially recover the organization of transcriptomic machinery.
PMID: 26818848 [PubMed - indexed for MEDLINE]
Weakly supervised learning of biomedical information extraction from curated data.
BMC Bioinformatics. 2016;17 Suppl 1:1
Authors: Jain S, Tumkur KR, Kuo TT, Bhargava S, Lin G, Hsu CN
Abstract
BACKGROUND: Numerous publicly available biomedical databases derive data by curating the literature. The curated data can be useful as training examples for information extraction, but curated data usually lack the exact mentions and their locations in the text required for supervised machine learning. This paper describes a general approach to information extraction using curated data as training examples. The idea is to formulate the problem as cost-sensitive learning from noisy labels, where the cost is estimated by a committee of weak classifiers that consider both curated data and the text.
RESULTS: We test the idea on two information extraction tasks of Genome-Wide Association Studies (GWAS). The first task is to extract target phenotypes (diseases or traits) of a study and the second is to extract ethnic backgrounds of study subjects for different stages (initial or replication). Experimental results show that our approach can achieve a Precision-at-2 (P@2) of 87% for disease/trait extraction and an F1-score of 0.83 for stage-ethnicity extraction, both outperforming their cost-insensitive baseline counterparts.
CONCLUSIONS: The results show that curated biomedical databases can potentially be reused as training examples to train information extractors without expert annotation or refinement, opening an unprecedented opportunity of using "big data" in biomedical text mining.
PMID: 26817711 [PubMed - indexed for MEDLINE]
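For readers unfamiliar with the two metrics reported in this abstract, the sketch below computes precision-at-k and the F1-score in Python. It uses the standard textbook definitions; the exact P@2 variant used in the paper may differ, and the example predictions and counts are hypothetical rather than the study's data.

```python
def precision_at_k(ranked: list, gold: set, k: int = 2) -> float:
    """Share of the top-k ranked predictions that appear in the gold set."""
    top_k = ranked[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in gold) / k


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ranked phenotype candidates for one study and its curated answer.
ranked = ["asthma", "height", "body mass index"]
gold = {"asthma"}
print(precision_at_k(ranked, gold, k=2))  # 0.5

# Hypothetical counts: 83 correct extractions, 17 spurious, 17 missed.
print(round(f1_score(tp=83, fp=17, fn=17), 2))  # 0.83
```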
Cell line name recognition in support of the identification of synthetic lethality in cancer from text.
Bioinformatics. 2016 Jan 15;32(2):276-82
Authors: Kaewphan S, Van Landeghem S, Ohta T, Van de Peer Y, Ginter F, Pyysalo S
Abstract
MOTIVATION: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating for instance the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain. In support of this task, we introduce two text collections manually annotated for cell line names: the broad-coverage corpus Gellus and CLL, a focused target domain corpus.
RESULTS: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24 302 102 unannotated articles, resulting in the identification of 5 181 342 cell line mentions, normalized to 11 755 unique cell line database identifiers.
AVAILABILITY AND IMPLEMENTATION: The manually annotated datasets, the cell line dictionary, derived corpora, NERsuite models and the results of the large-scale run on unannotated texts are available under open licenses at http://turkunlp.github.io/Cell-line-recognition/.
CONTACT: sukaew@utu.fi.
PMID: 26428294 [PubMed - indexed for MEDLINE]
("orphan disease" OR "rare disease" OR "orphan diseases" OR "rare diseases"); +11 new citations
11 new PubMed citations were retrieved for this search.
These PubMed results were generated on 2016/09/13.
PubMed comprises more than 24 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites.
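The saved searches listed in this digest are ordinary PubMed query strings, so they can be re-run programmatically. The Python sketch below only builds a request URL for the NCBI E-utilities ESearch endpoint and does not contact the network. The helper name `esearch_url` and the `retmax` default are illustrative choices, and a production client should also follow NCBI's usage guidelines (for example rate limits and API keys), which are not covered here.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for a PubMed query string (no request is sent)."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # the query exactly as typed in PubMed
        "retmax": retmax,    # maximum number of PMIDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return ESEARCH + "?" + urlencode(params)


rare_disease_query = (
    '("orphan disease" OR "rare disease" OR "orphan diseases" OR "rare diseases")'
)
print(esearch_url(rare_disease_query))
```

`urlencode` takes care of percent-encoding the quotation marks, parentheses, and spaces in the query, which is the part that is easy to get wrong when assembling the URL by hand.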
Pharmacogenomics[Title/Abstract] AND ("2005/01/01"[PDAT] : "3000"[PDAT]); +12 new citations
12 new PubMed citations were retrieved for this search.
These PubMed results were generated on 2016/09/13.
"Cystic Fibrosis"; +6 new citations
6 new PubMed citations were retrieved for this search.
These PubMed results were generated on 2016/09/13.
FoodWiki: a Mobile App Examines Side Effects of Food Additives Via Semantic Web.
J Med Syst. 2016 Feb;40(2):41
Authors: Çelik Ertuğrul D
Abstract
This article discusses a research project on a mobile safe food consumption system (FoodWiki) that applies its own inference rules to its own knowledge base. Currently, the developed rules examine side effects associated with an initial set of health risks: heart disease, diabetes, allergy, and asthma. Food producers add thousands of compounds to processed food for numerous purposes: to add color, stabilize, texturize, preserve, sweeten, thicken, add flavor, soften, emulsify, and so forth. According to the World Health Organization, these commonly used ingredients or compounds in manufactured foods may have many side effects that cause health risks such as heart disease, hypertension, cholesterol, asthma, diabetes, allergies, and Alzheimer's disease. Safety in food consumption, especially by patients in these risk groups, has become crucial, given that such health problems are ranked among the top ten health risks around the world. Personal e-health knowledge base systems are needed to help patients take control of their safe food consumption. Systems with an advanced semantic knowledge base can recommend appropriate foods to individuals before consumption. The proposed FoodWiki system uses a concept-based search mechanism that operates on thousands of food compounds to provide more relevant information.
PMID: 26590979 [PubMed - indexed for MEDLINE]
D-VASim - An Interactive Virtual Laboratory Environment for the Simulation and Analysis of Genetic Circuits.
Bioinformatics. 2016 Sep 11;
Authors: Baig H, Madsen J
Abstract
Simulation and behavioral analysis of genetic circuits is a standard approach to functional verification prior to their physical implementation. Many software tools have been developed to perform in silico analysis for this purpose, but none of them allow users to interact with the model during runtime. The runtime interaction gives the user a feeling of being in the lab performing a real-world experiment. In this work, we present a user-friendly software tool named D-VASim (Dynamic Virtual Analyzer and Simulator), which provides a virtual laboratory environment to simulate and analyze the behavior of genetic logic circuit models represented in SBML (Systems Biology Markup Language). Hence, SBML models developed in other software environments can be analyzed and simulated in D-VASim. D-VASim offers deterministic as well as stochastic simulation, and differs from other software tools by being able to extract and validate the Boolean logic from the SBML model. D-VASim is also capable of analyzing the threshold value and propagation delay of a genetic circuit model.
AVAILABILITY: D-VASim is available for Windows and Mac OS and can be downloaded from bda.compute.dtu.dk/downloads/
CONTACT: haba@dtu.dk, jama@dtu.dk.
PMID: 27616709 [PubMed - as supplied by publisher]
Biomarkers, Early Diagnosis, and Clinical Predictors of Bronchopulmonary Dysplasia.
Clin Perinatol. 2015 Dec;42(4):739-54
Authors: Lal CV, Ambalavanan N
Abstract
The pathogenesis of bronchopulmonary dysplasia (BPD) is multifactorial, and the clinical phenotype of BPD is extremely variable. Several clinical and laboratory biomarkers have been proposed for the early identification of infants at higher risk of BPD and for determination of prognosis of infants with a diagnosis of BPD. The authors review available literature on prediction tools and biomarkers of BPD, using clinical variables and biomarkers based on imaging, lung function measures, and measurements of various analytes in different body fluids that have been determined to be associated with BPD either in a targeted manner or by unbiased omic profiling.
PMID: 26593076 [PubMed - indexed for MEDLINE]
Applicability of gene expression and systems biology to develop pharmacogenetic predictors; antipsychotic-induced extrapyramidal symptoms as an example.
Pharmacogenomics. 2015 Nov;16(17):1975-88
Authors: Mas S, Gassó P, Lafuente A
Abstract
Pharmacogenetics has been driven by a candidate gene approach. The disadvantage of this approach is that it is limited by our current understanding of the mechanisms by which drugs act. Gene expression could help to elucidate the molecular signatures of antipsychotic treatments by searching for dysregulated molecular pathways and the relationships between gene products, especially protein-protein interactions. To embrace the complexity of drug response, machine learning methods could help to identify gene-gene interactions and develop pharmacogenetic predictors of drug response. The present review summarizes the applicability of the topics presented here (gene expression, network analysis and gene-gene interactions) in pharmacogenetics. To this end, we present an example of identifying genetic predictors of extrapyramidal symptoms induced by antipsychotics.
PMID: 26556470 [PubMed - indexed for MEDLINE]
Tracking the Dynamic Relationship between Cellular Systems and Extracellular Subproteomes in Pseudomonas aeruginosa Biofilms.
J Proteome Res. 2015 Nov 6;14(11):4524-37
Authors: Park AJ, Murphy K, Surette MD, Bandoro C, Krieger JR, Taylor P, Khursigara CM
Abstract
The transition of the opportunistic pathogen Pseudomonas aeruginosa from free-living bacteria into surface-associated biofilm communities represents a viable target for the prevention and treatment of chronic infectious disease. We have established a proteomics platform that identified 2443 and 1142 high-confidence proteins in P. aeruginosa whole cells and outer-membrane vesicles (OMVs), respectively, at three time points during biofilm development (ProteomeXchange identifier PXD002605). The analysis of cellular systems, specifically the phenazine biosynthetic pathway, demonstrates that whole-cell protein abundance correlates with end product (i.e., pyocyanin) concentrations in biofilm but not in planktonic cultures. Furthermore, increased cellular protein abundance in this pathway results in quantifiable pyocyanin in early biofilm OMVs and OMVs from both growth modes isolated at later time points. Overall, our data indicate that the OMVs being released from the surface of the biofilm whole cells have unique proteomes in comparison to their planktonic counterparts. The relative abundance of OMV proteins from various subcellular sources showed considerable differences between the two growth modes over time, supporting the existence and preferential activation of multiple OMV biogenesis mechanisms under different conditions. The consistent detection of cytoplasmic proteins in all of the OMV subproteomes challenges the notion that OMVs are composed of outer membrane and periplasmic proteins alone. Direct comparisons of outer-membrane protein abundance levels between OMVs and whole cells show ratios that vary greatly from 1:1 and support previous studies that advocate the specific inclusion, or "packaging", of proteins into OMVs. The quantitative analysis of packaged protein groups suggests biogenesis mechanisms that involve untethered, rather than absent, peptidoglycan-binding proteins. Collectively, individual protein and biological system analyses of biofilm OMVs show that drug-binding cytoplasmic proteins and porins are potentially shuttled from the whole cell into the OMVs and may contribute to the antibiotic resistance of P. aeruginosa whole cells within biofilms.
PMID: 26378716 [PubMed - indexed for MEDLINE]
