Drug-induced Adverse Events

Changing ideas in forestry: A comparison of concepts in Swedish and American forestry journals during the early twentieth and twenty-first centuries.
Ambio. 2016 Feb;45 Suppl 2:74-86
Authors: Mårald E, Langston N, Sténs A, Moen J
Abstract
By combining digital humanities text-mining tools and a qualitative approach, we examine changing concepts in forestry journals in Sweden and the United States (US) in the early twentieth and early twenty-first centuries. Our first hypothesis is that foresters at the beginning of the twentieth century were more concerned with production and less concerned with ecology than foresters at the beginning of the twenty-first century. Our second hypothesis is that US foresters in the early twentieth century were less concerned with local site conditions than Swedish foresters. We find that early foresters in both countries had broader (and often ecologically focused) concerns than hypothesized. Ecological concerns in the forestry literature have increased, but in the Nordic countries, production concerns have increased as well. In both regions and both time periods, timber management is closely connected to concerns about governance and state power, but the forms that governance takes have changed.
PMID: 26744044 [PubMed - indexed for MEDLINE]
Symptom clusters in women with breast cancer: an analysis of data from social media and a research study.
Qual Life Res. 2016 Mar;25(3):547-57
Authors: Marshall SA, Yang CC, Ping Q, Zhao M, Avis NE, Ip EH
Abstract
PURPOSE: User-generated content on social media sites, such as health-related online forums, offers researchers a tantalizing amount of information, but concerns regarding scientific application of such data remain. This paper compares and contrasts symptom cluster patterns derived from messages on a breast cancer forum with those from a symptom checklist completed by breast cancer survivors participating in a research study.
METHODS: Over 50,000 messages generated by 12,991 users of the breast cancer forum on MedHelp.org were transformed into a standard form and examined for the co-occurrence of 25 symptoms. The k-medoid clustering method was used to determine appropriate placement of symptoms within clusters. Findings were compared with a similar analysis of a symptom checklist administered to 653 breast cancer survivors participating in a research study.
RESULTS: The following clusters were identified using forum data: menopausal/psychological, pain/fatigue, gastrointestinal, and miscellaneous. Study data generated the clusters: menopausal, pain, fatigue/sleep/gastrointestinal, psychological, and increased weight/appetite. Although the clusters are somewhat different, many symptoms that clustered together in the social media analysis remained together in the analysis of the study participants. Density of connections between symptoms, as reflected by rates of co-occurrence and similarity, was higher in the study data.
CONCLUSIONS: The copious amount of data generated by social media outlets can augment findings from traditional data sources. When different sources of information are combined, areas of overlap and discrepancy can be detected, perhaps giving researchers a more accurate picture of reality. However, data derived from social media must be used carefully and with understanding of its limitations.
PMID: 26476836 [PubMed - indexed for MEDLINE]
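The clustering step in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the Jaccard-based distance, the toy messages, and the function names `cooccurrence_distance` and `k_medoids` are all assumptions made for illustration, and the medoid update is a simple alternating scheme rather than a full PAM swap search.

```python
from itertools import combinations

def cooccurrence_distance(messages, symptoms):
    """Distance between two symptoms = 1 - Jaccard overlap of the messages mentioning them.

    The Jaccard choice is an assumption; the abstract only says co-occurrence was examined.
    """
    present = {s: {i for i, m in enumerate(messages) if s in m} for s in symptoms}
    dist = {a: {b: 0.0 for b in symptoms} for a in symptoms}
    for a, b in combinations(symptoms, 2):
        union = present[a] | present[b]
        jaccard = len(present[a] & present[b]) / len(union) if union else 0.0
        dist[a][b] = dist[b][a] = 1.0 - jaccard
    return dist

def k_medoids(items, dist, k, max_iter=100):
    """Alternate between assigning items to the nearest medoid and re-picking medoids."""
    medoids = list(items[:k])  # deterministic start; a real analysis would try several
    clusters = {}
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for item in items:
            nearest = min(medoids, key=lambda m: dist[item][m])
            clusters[nearest].append(item)
        # New medoid = member with the smallest total distance to its cluster mates.
        new_medoids = [
            min(members or [m], key=lambda c: sum(dist[c][o] for o in members))
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters

# Toy messages, each already reduced to the set of symptoms it mentions (hypothetical data).
messages = [
    {"hot flashes", "anxiety"},
    {"hot flashes", "anxiety"},
    {"pain", "fatigue"},
    {"pain", "fatigue"},
    {"nausea"},
]
symptoms = ["hot flashes", "pain", "anxiety", "fatigue"]
dist = cooccurrence_distance(messages, symptoms)
clusters = k_medoids(symptoms, dist, k=2)
grouped = sorted(sorted(members) for members in clusters.values())
```

Because medoids are always actual symptoms, each cluster can be labelled by its medoid, which is one practical reason to prefer k-medoids over k-means for this kind of categorical co-occurrence data.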
Text analysis tools for identification of emerging topics and research gaps in conservation science.
Conserv Biol. 2015 Dec;29(6):1606-14
Authors: Westgate MJ, Barton PS, Pierson JC, Lindenmayer DB
Abstract
Keeping track of conceptual and methodological developments is a critical skill for research scientists, but this task is increasingly difficult due to the high rate of academic publication. As a crisis discipline, conservation science is particularly in need of tools that facilitate rapid yet insightful synthesis. We show how a common text-mining method (latent Dirichlet allocation, or topic modeling) and statistical tests familiar to ecologists (cluster analysis, regression, and network analysis) can be used to investigate trends and identify potential research gaps in the scientific literature. We tested these methods on the literature on ecological surrogates and indicators. Analysis of topic popularity within this corpus showed a strong emphasis on monitoring and management of fragmented ecosystems, while analysis of research gaps suggested a greater role for genetic surrogates and indicators. Our results show that automated text analysis methods need to be used with care, but can provide information that is complementary to that given by systematic reviews and meta-analyses, increasing scientists' capacity for research synthesis.
PMID: 26271213 [PubMed - indexed for MEDLINE]
Automated Neuroanatomical Relation Extraction: A Linguistically Motivated Approach with a PVT Connectivity Graph Case Study.
Front Neuroinform. 2016;10:39
Authors: Gökdeniz E, Özgür A, Canbeyli R
Abstract
Identifying the relations among different regions of the brain is vital for a better understanding of how the brain functions. While a large number of studies have investigated the neuroanatomical and neurochemical connections among brain structures, their specific findings are found in publications scattered over a large number of years and different types of publications. Text mining techniques have provided the means to extract specific types of information from a large number of publications with the aim of presenting a larger, if not necessarily an exhaustive picture. By using natural language processing techniques, the present paper aims to identify connectivity relations among brain regions in general and relations relevant to the paraventricular nucleus of the thalamus (PVT) in particular. We introduce a linguistically motivated approach based on patterns defined over the constituency and dependency parse trees of sentences. Besides the presence of a relation between a pair of brain regions, the proposed method also identifies the directionality of the relation, which enables the creation and analysis of a directional brain region connectivity graph. The approach is evaluated over the manually annotated data sets of the WhiteText Project. In addition, as a case study, the method is applied to extract and analyze the connectivity graph of PVT, which is an important brain region that is considered to influence many functions ranging from arousal, motivation, and drug-seeking behavior to attention. The results of the PVT connectivity graph show that PVT may be a new target of research in mood assessment.
PMID: 27708573 [PubMed - in process]
Development and validation of algorithms for the detection of statin myopathy signals from electronic medical records.
Clin Pharmacol Ther. 2016 Oct 5;:
Authors: Chan SL, Tham MY, Tan SH, Loke C, Foo B, Fan Y, Ang PS, Brunham LR, Sung C
Abstract
The aim of this study was to develop and validate sensitive algorithms to detect hospitalized statin-induced myopathy (SIM) cases from electronic medical records (EMRs). We developed 4 algorithms on a training set of 31,211 patient records from a large tertiary hospital. We determined the performance of these algorithms against manually curated records. The best algorithm used a combination of elevated creatine kinase (>4x upper limit of normal), discharge summary, diagnosis, and absence of statin in discharge medications. This algorithm achieved a positive predictive value of 52-71% and a sensitivity of 72-78% on two validation sets of >30,000 records each. Using this algorithm, the incidence of SIM was estimated at 0.18%. This algorithm captured three times more rhabdomyolysis cases than spontaneous reports (95% vs. 30% of manually curated gold standard cases). Our results show the potential power of utilizing data and text mining of EMRs to enhance pharmacovigilance activities.
PMID: 27706800 [PubMed - as supplied by publisher]
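A rule of the kind described in the abstract above might look like the following sketch. The abstract names the four ingredients but not how they are combined, so the AND/OR logic here, the keyword lists, the `Record` fields, and the toy records are all assumptions. The helper at the end computes positive predictive value and sensitivity against a manually curated label.

```python
from dataclasses import dataclass, field

CK_ULN_MULTIPLE = 4.0  # from the abstract: creatine kinase > 4x upper limit of normal
MYOPATHY_TERMS = ("myopathy", "rhabdomyolysis", "myositis")  # assumed keyword list
STATINS = ("simvastatin", "atorvastatin", "rosuvastatin", "pravastatin", "lovastatin")

@dataclass
class Record:
    peak_ck: float                 # peak creatine kinase during admission
    ck_uln: float                  # laboratory upper limit of normal, same units
    discharge_summary: str         # free text
    diagnoses: list = field(default_factory=list)
    discharge_meds: list = field(default_factory=list)

def mentions(terms, text):
    """True if any term occurs in the text, ignoring case."""
    text = text.lower()
    return any(term in text for term in terms)

def flag_sim(rec):
    """Assumed combination: high CK AND myopathy noted AND no statin at discharge."""
    elevated_ck = rec.peak_ck > CK_ULN_MULTIPLE * rec.ck_uln
    myopathy_noted = mentions(MYOPATHY_TERMS, rec.discharge_summary) or any(
        mentions(MYOPATHY_TERMS, d) for d in rec.diagnoses
    )
    statin_absent = not any(mentions(STATINS, m) for m in rec.discharge_meds)
    return elevated_ck and myopathy_noted and statin_absent

def ppv_and_sensitivity(flags, truth):
    """PPV = TP / (TP + FP); sensitivity = TP / (TP + FN)."""
    tp = sum(f and t for f, t in zip(flags, truth))
    fp = sum(f and not t for f, t in zip(flags, truth))
    fn = sum(t and not f for f, t in zip(flags, truth))
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity

# Hypothetical records with hypothetical gold-standard labels.
records = [
    Record(1200, 200, "Suspected myopathy, lipid therapy withdrawn", [], ["aspirin"]),
    Record(1200, 200, "Myopathy noted", ["myositis"], ["Simvastatin 20 mg"]),
    Record(300, 200, "Chest pain, ruled out MI", [], ["aspirin"]),
]
truth = [True, True, False]
flags = [flag_sim(r) for r in records]
ppv, sensitivity = ppv_and_sensitivity(flags, truth)
```

The second record shows how the "absence of statin" criterion can cost sensitivity: a true case whose statin was continued is missed.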
PubMedPortable: A Framework for Supporting the Development of Text Mining Applications.
PLoS One. 2016;11(10):e0163794
Authors: Döring K, Grüning BA, Telukunta KK, Thomas P, Günther S
Abstract
Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools. Therefore, an appropriate data environment is needed as a basis to combine different software solutions and to develop customised text mining applications. PubMedPortable builds a relational database and a full text index on PubMed citations. It can be applied either to the complete PubMed data set or an arbitrary subset of downloaded PubMed XML files. The software provides the infrastructure to combine stand-alone applications by exporting different data formats, e.g. BioC. The presented workflows show how to use PubMedPortable to retrieve, store, and analyse a disease-specific data set. The provided use cases are well documented in the PubMedPortable wiki. The open-source software library is small, easy to use, and scalable to the user's system requirements. It is freely available for Linux on the web at https://github.com/KerstenDoering/PubMedPortable and for other operating systems as a virtual container. The approach was tested extensively and applied successfully in several projects.
PMID: 27706202 [PubMed - in process]
The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track.
Database (Oxford). 2016;2016:
Authors: Madan S, Hodapp S, Senger P, Ansari S, Szostak J, Hoeng J, Peitsch M, Fluck J
Abstract
Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboard, that incorporates text mining techniques to support the biocurator in the generation of BEL networks. The underlying UIMA-based text mining pipeline (BELIEF Pipeline) uses several named entity recognition processes and relationship extraction methods to detect concepts and BEL relationships in literature. The BELIEF Dashboard allows easy curation of the automatically generated BEL statements and their context annotations. Resulting BEL statements and their context annotations can be syntactically and semantically verified to ensure consistency in the BEL network. In summary, the workflow supports experts in different stages of systems biology network building. Based on the BioCreative V BEL track evaluation, we show that the BELIEF Pipeline automatically extracts relationships with an F-score of 36.4% and fully correct statements can be obtained with an F-score of 30.8%. Participation in the BioCreative V Interactive task (IAT) track with BELIEF yielded a system usability scale (SUS) score of 67. Considering the complexity of the task for new users (learning BEL, working with a completely new interface, and performing complex curation), a score so close to the overall SUS average highlights the usability of BELIEF. Database URL: BELIEF is available at http://www.scaiview.com/belief/.
PMID: 27694210 [PubMed - in process]
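For readers unfamiliar with the F-score reported above, the following sketch shows how it is computed when extracted statements are compared with a gold standard as exact-match sets. The statement strings are illustrative placeholders, not data from the evaluation, and exact string matching is an assumed simplification of the track's scoring.

```python
def precision_recall_f(predicted, gold):
    """Exact-match scoring: F = 2PR / (P + R), the harmonic mean of precision and recall."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Placeholder statements standing in for extracted and curated BEL statements.
predicted = {"stmt_a", "stmt_b", "stmt_c", "stmt_d"}
gold = {"stmt_a", "stmt_b", "stmt_e"}
precision, recall, f_score = precision_recall_f(predicted, gold)
```

With two of four predictions correct and two of three gold statements found, precision is 0.5, recall is 2/3, and the F-score is 4/7.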
Data Mining of Web-Based Documents on Social Networking Sites That Included Suicide-Related Words Among Korean Adolescents.
J Adolesc Health. 2016 Sep 29;:
Authors: Song J, Song TM, Seo DC, Jin JH
Abstract
PURPOSE: To investigate online search activity of suicide-related words in South Korean adolescents through data mining of social media Web sites as the suicide rate in South Korea is one of the highest in the world.
METHODS: Out of more than 2.35 billion posts for 2 years from January 1, 2011 to December 31, 2012 on 163 social media Web sites in South Korea, 99,693 suicide-related documents were retrieved by Crawler and analyzed using text mining and opinion mining. These data were further combined with monthly employment rate, monthly rental prices index, monthly youth suicide rate, and monthly number of reported bully victims to fit multilevel models as well as structural equation models.
RESULTS: The link from grade pressure to suicide risk showed the largest standardized path coefficient (beta = .357, p < .001) in structural models and a significant random effect (p < .01) in multilevel models. Depression was a partial mediator between suicide risk and grade pressure, low body image, victims of bullying, and concerns about disease. The largest total effect was observed in the grade pressure to depression to suicide risk. The multilevel models indicate about 27% of the variance in the daily suicide-related word search activity is explained by month-to-month variations. A lower employment rate, a higher rental prices index, and more bullying were associated with an increased suicide-related word search activity.
CONCLUSIONS: Academic pressure appears to be the biggest contributor to Korean adolescents' suicide risk. A real-time system for monitoring and responding to suicide-related word search activity needs to be developed.
PMID: 27693129 [PubMed - as supplied by publisher]
Sentiment prediction by text mining medical documents using optimized swarm search-based feature selection.
Comput Med Imaging Graph. 2016 Aug 5;:
Authors: Zeng D, Peng J, Fong S, Qiu Y, Wong R, Mon YJ
Abstract
Sentiment prediction has emerged as an important machine learning topic for gaining insights from unstructured text, and it has recently gained popularity in the health-care industry. Text mining has long been a fundamental data analytic technique for sentiment prediction. A popular pre-processing step in text mining is transforming text strings into word vectors, which form a high-dimensional sparse matrix. This sparse matrix poses computational challenges to the induction of an accurate sentiment prediction model. Feature selection is a popular dimensionality reduction technique that finds a subset of the original features in the sparse matrix in order to enhance the accuracy of the prediction model. In this paper, a new feature selection method called Optimized Swarm Search-based Feature Selection (OSS-FS) is applied. OSS-FS is a swarm-type search function that selects an ideal subset of features for enhanced classification accuracy. The swarm search in OSS-FS is optimized by a simple feature evaluation technique called Clustering-by-Coefficient-of-Variation (CCV). The proposed scheme is applied and verified via a case scenario in which 279 medical articles related to 'meaningful use functionalities on health care quality, safety, and efficiency' were taken from a systematic review of the health IT literature from January 2010 to August 2013. Multiple sentiment classes (positive, mixed-positive, neutral, and negative) have to be recognized from the document contents by computer using text mining. The results show the superiority of OSS-FS over traditional feature selection methods. The proposed sentiment prediction model will be useful for estimating the sentiments of readers of medical literature. Authors may gauge the potential sentiments of their articles before they are published.
PMID: 27693005 [PubMed - as supplied by publisher]
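The pre-processing step mentioned in the abstract above (text strings to sparse word vectors) and the statistic that CCV is named after can be sketched as follows. The abstract does not explain how CCV clusters features or how the swarm search uses them, so this sketch stops at computing a coefficient of variation per term; the toy documents and function names are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

def to_sparse_vectors(documents):
    """One Counter per document: only non-zero term counts are stored."""
    return [Counter(doc.lower().split()) for doc in documents]

def coefficient_of_variation(vectors):
    """Per-term CV = population standard deviation / mean of the counts across documents."""
    vocabulary = sorted({term for vec in vectors for term in vec})
    scores = {}
    for term in vocabulary:
        column = [vec[term] for vec in vectors]  # a Counter returns 0 for absent terms
        scores[term] = pstdev(column) / mean(column)
    return scores

# Toy corpus (hypothetical).
documents = ["good outcome good care", "poor outcome", "good care"]
vectors = to_sparse_vectors(documents)
cv = coefficient_of_variation(vectors)
most_variable = max(cv, key=cv.get)
```

Terms whose counts vary strongly between documents get a high coefficient of variation, which is the kind of signal a feature evaluation step could use to rank or group candidate features.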
SparkText: Biomedical Text Mining on Big Data Framework.
PLoS One. 2016;11(9):e0162721
Authors: Ye Z, Tafti AP, He KY, Wang K, He MM
Abstract
BACKGROUND: Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment.
RESULTS: In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes.
CONCLUSIONS: This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.
PMID: 27685652 [PubMed - as supplied by publisher]
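As a single-machine illustration of one of the classifiers named in the abstract above, here is a minimal multinomial Naïve Bayes sketch in plain Python. It does not use Spark or Cassandra and is not the SparkText code; the toy training texts, labels, and function names are assumptions.

```python
import math
from collections import Counter, defaultdict

def train_nb(texts, labels):
    """Collect per-class document counts and per-class word counts."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return class_docs, word_counts, vocabulary

def predict_nb(model, text):
    """Return the class with the highest log posterior, using add-one smoothing."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are ignored
                smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
                score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training set (hypothetical snippets, not PubMed data).
texts = [
    "brca1 mutation in breast tumor",
    "mammography screening breast",
    "psa level prostate biopsy",
    "androgen therapy prostate",
    "egfr mutation lung adenocarcinoma",
    "smoking lung nodule",
]
labels = ["breast", "breast", "prostate", "prostate", "lung", "lung"]
model = train_nb(texts, labels)
prediction = predict_nb(model, "prostate psa screening")
```

Training is just counting, which is why this family of models distributes well: counts from separate partitions of a corpus can be summed.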
Life priorities in the HIV-positive Asians: a text-mining analysis in young vs. old generation.
AIDS Care. 2016 Aug 12;:1-4
Authors: Chen WT, Barbour R
Abstract
HIV/AIDS is one of the most urgent and challenging public health issues, especially since it is now considered a chronic disease. In this project, we used text mining techniques to extract meaningful words and word patterns from 45 transcribed in-depth interviews of people living with HIV/AIDS (PLWHA) conducted in Taipei, Beijing, Shanghai, and San Francisco from 2006 to 2013. Text mining analysis can predict whether an emerging field will become a long-lasting source of academic interest or whether it is simply a passing source of interest that will soon disappear. The data were analyzed by age group (45 and older vs. 44 and younger). The highest ranking fragments in order of frequency were: "care", "daughter", "disease", "family", "HIV", "hospital", "husband", "medicines", "money", "people", "son", "tell/disclosure", "thought", "want", and "years". Participants in the 44-year-old and younger group were focused mainly on disease disclosure, their families, and their financial condition. In older PLWHA, social support was one of the main concerns. In this study, we learned that different age groups perceive the disease differently. Therefore, when designing an intervention, researchers should consider tailoring it to a specific population to help PLWHA achieve a better quality of life. Promoting self-management can be an effective strategy for every encounter with HIV-positive individuals.
PMID: 27684610 [PubMed - as supplied by publisher]
The Feasibility of Using Large-Scale Text Mining to Detect Adverse Childhood Experiences in a VA-Treated Population.
J Trauma Stress. 2015 Dec;28(6):505-14
Authors: Hammond KW, Ben-Ari AY, Laundry RJ, Boyko EJ, Samore MH
Abstract
Free text in electronic health records resists large-scale analysis. Text records facts of interest not found in encoded data, and text mining enables their retrieval and quantification. The U.S. Department of Veterans Affairs (VA) clinical data repository affords an opportunity to apply text-mining methodology to study clinical questions in large populations. To assess the feasibility of text mining, investigation of the relationship between exposure to adverse childhood experiences (ACEs) and recorded diagnoses was conducted among all VA-treated Gulf war veterans, utilizing all progress notes recorded from 2000-2011. Text processing extracted ACE exposures recorded among 44.7 million clinical notes belonging to 243,973 veterans. The relationship of ACE exposure to adult illnesses was analyzed using logistic regression. Bias considerations were assessed. ACE score was strongly associated with suicide attempts and serious mental disorders (ORs = 1.84 to 1.97), and less so with behaviorally mediated and somatic conditions (ORs = 1.02 to 1.36) per unit. Bias adjustments did not remove persistent associations between ACE score and most illnesses. Text mining to detect ACE exposure in a large population was feasible. Analysis of the relationship between ACE score and adult health conditions yielded patterns of association consistent with prior research.
PMID: 26579624 [PubMed - indexed for MEDLINE]
Automatic semantic classification of scientific literature according to the hallmarks of cancer.
Bioinformatics. 2016 Feb 1;32(3):432-40
Authors: Baker S, Silins I, Guo Y, Ali I, Högberg J, Stenius U, Korhonen A
Abstract
MOTIVATION: The hallmarks of cancer have become highly influential in cancer research. They reduce the complexity of cancer into 10 principles (e.g. resisting cell death and sustaining proliferative signaling) that explain the biological capabilities acquired during the development of human tumors. Since new research depends crucially on existing knowledge, technology for semantic classification of scientific literature according to the hallmarks of cancer could greatly support literature review, knowledge discovery and applications in cancer research.
RESULTS: We present the first step toward the development of such technology. We introduce a corpus of 1499 PubMed abstracts annotated according to the scientific evidence they provide for the 10 currently known hallmarks of cancer. We use this corpus to train a system that classifies PubMed literature according to the hallmarks. The system uses supervised machine learning and rich features largely based on biomedical text mining. We report good performance in both intrinsic and extrinsic evaluations, demonstrating both the accuracy of the methodology and its potential in supporting practical cancer research. We discuss how this approach could be developed and applied further in the future.
AVAILABILITY AND IMPLEMENTATION: The corpus of hallmark-annotated PubMed abstracts and the software for classification are available at: http://www.cl.cam.ac.uk/~sb895/HoC.html.
CONTACT: simon.baker@cl.cam.ac.uk.
PMID: 26454282 [PubMed - indexed for MEDLINE]
Expansion of medical vocabularies using distributional semantics on Japanese patient blogs.
J Biomed Semantics. 2016;7(1):58
Authors: Ahltorp M, Skeppstedt M, Kitajima S, Henriksson A, Rzepka R, Araki K
Abstract
BACKGROUND: Research on medical vocabulary expansion from large corpora has primarily been conducted using text written in English or similar languages, due to a limited availability of large biomedical corpora in most languages. Medical vocabularies are, however, essential also for text mining from corpora written in other languages than English and belonging to a variety of medical genres. The aim of this study was therefore to evaluate medical vocabulary expansion using a corpus very different from those previously used, in terms of grammar and orthographics, as well as in terms of text genre. This was carried out by applying a method based on distributional semantics to the task of extracting medical vocabulary terms from a large corpus of Japanese patient blogs.
METHODS: Distributional properties of terms were modelled with random indexing, followed by agglomerative hierarchical clustering of 3 × 100 seed terms from existing vocabularies, belonging to three semantic categories: Medical Finding, Pharmaceutical Drug and Body Part. By automatically extracting unknown terms close to the centroids of the created clusters, candidates for new terms to include in the vocabulary were suggested. The method was evaluated for its ability to retrieve the remaining n terms in existing medical vocabularies.
RESULTS: Removing case particles and using a context window size of 1+1 was a successful strategy for Medical Finding and Pharmaceutical Drug, while retaining case particles and using a window size of 8+8 was better for Body Part. For a 10n long candidate list, the use of different cluster sizes affected the result for Pharmaceutical Drug, while the effect was only marginal for the other two categories. For a list of top n candidates for Body Part, however, clusters with a size of up to two terms were slightly more useful than larger clusters. For Pharmaceutical Drug, the best settings resulted in a recall of 25 % for a candidate list of top n terms and a recall of 68 % for top 10n. For a candidate list of top 10n candidates, the second best results were obtained for Medical Finding: a recall of 58 %, compared to 46 % for Body Part. Only taking the top n candidates into account, however, resulted in a recall of 23 % for Body Part, compared to 16 % for Medical Finding.
CONCLUSIONS: Different settings for corpus pre-processing, window sizes and cluster sizes were suitable for different semantic categories and for different lengths of candidate lists, showing the need to adapt parameters, not only to the language and text genre used, but also to the semantic category for which the vocabulary is to be expanded. The results show, however, that the investigated choices for pre-processing and parameter settings were successful, and that a Japanese blog corpus, which in many ways differs from those used in previous studies, can be a useful resource for medical vocabulary expansion.
PMID: 27671202 [PubMed - as supplied by publisher]
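The evaluation measure used above (recall for a candidate list of the top n or top 10n terms, where n is the number of held-out vocabulary terms) can be sketched as follows. The function name `recall_at` and the placeholder terms are assumptions; the ranking itself would come from the distance to cluster centroids, which is not reproduced here.

```python
def recall_at(ranked_candidates, held_out, multiple=1):
    """Share of the n held-out terms found among the top (multiple * n) candidates."""
    n = len(held_out)
    if n == 0:
        raise ValueError("held_out must contain at least one term")
    top = ranked_candidates[: multiple * n]
    return len(set(top) & held_out) / n

# Placeholder data: two held-out vocabulary terms and a ranked list of 21 candidates.
held_out = {"term_a", "term_b"}
ranked = ["term_a", "noise_1", "noise_2", "term_b"] + [f"noise_{i}" for i in range(3, 20)]

recall_top_n = recall_at(ranked, held_out)                  # looks at the top 2 candidates
recall_top_10n = recall_at(ranked, held_out, multiple=10)   # looks at the top 20 candidates
```

Reporting both list lengths separates two use cases: a short list that a curator can accept almost as-is, and a longer list intended for manual review.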
Using Text Analytics of AJPE Article Titles to Reveal Trends In Pharmacy Education Over the Past Two Decades.
Am J Pharm Educ. 2016 Aug 25;80(6):104
Authors: Pedrami F, Asenso P, Devi S
Abstract
Objective. To identify trends in pharmacy education during the last two decades using text mining. Methods. Articles published in the American Journal of Pharmaceutical Education (AJPE) in the past two decades were compiled in a database. Custom text analytics software was written using the Visual Basic programming language in the Visual Basic for Applications (VBA) editor of Excel 2007. The frequency of words appearing in article titles was calculated using the custom VBA software. Data were analyzed to identify the emerging trends in pharmacy education. Results. Three educational trends emerged: active learning, interprofessional education, and cultural competency. Conclusion. The text analytics program successfully identified trends in article topics and may be a useful compass to predict the future course of pharmacy education.
PMID: 27667841 [PubMed - in process]
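The title word-frequency count described above was written in VBA; an equivalent idea in Python is sketched below. The titles, the stop-word list, and the function name are hypothetical and only illustrate the counting step.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to", "for", "on"}  # assumed list

def title_word_frequencies(titles):
    """Count lower-cased alphabetic words across titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts

# Hypothetical article titles.
titles = [
    "Active Learning in a Pharmacotherapy Course",
    "Interprofessional Education and Active Learning",
    "Cultural Competency in Pharmacy Education",
]
frequencies = title_word_frequencies(titles)
top_words = frequencies.most_common(3)
```

Running the same count separately for each publication year and comparing the results is one simple way to see which terms are rising over time.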
The environment ontology in 2016: bridging domains with increased scope, semantic density, and interoperation.
J Biomed Semantics. 2016;7(1):57
Authors: Buttigieg PL, Pafilis E, Lewis SE, Schildhauer MP, Walls RL, Mungall CJ
Abstract
BACKGROUND: The Environment Ontology (ENVO; http://www.environmentontology.org/ ), first described in 2013, is a resource and research target for the semantically controlled description of environmental entities. The ontology's initial aim was the representation of the biomes, environmental features, and environmental materials pertinent to genomic and microbiome-related investigations. However, the need for environmental semantics is common to a multitude of fields, and ENVO's use has steadily grown since its initial description. We have thus expanded, enhanced, and generalised the ontology to support its increasingly diverse applications.
METHODS: We have updated our development suite to promote expressivity, consistency, and speed: we now develop ENVO in the Web Ontology Language (OWL) and employ templating methods to accelerate class creation. We have also taken steps to better align ENVO with the Open Biological and Biomedical Ontologies (OBO) Foundry principles and interoperate with existing OBO ontologies. Further, we applied text-mining approaches to extract habitat information from the Encyclopedia of Life and automatically create experimental habitat classes within ENVO.
RESULTS: Relative to its state in 2013, ENVO's content, scope, and implementation have been enhanced and much of its existing content revised for improved semantic representation. ENVO now offers representations of habitats, environmental processes, anthropogenic environments, and entities relevant to environmental health initiatives and the global Sustainable Development Agenda for 2030. Several branches of ENVO have been used to incubate and seed new ontologies in previously unrepresented domains such as food and agronomy. The current release version of the ontology, in OWL format, is available at http://purl.obolibrary.org/obo/envo.owl .
CONCLUSIONS: ENVO has been shaped into an ontology which bridges multiple domains including biomedicine, natural and anthropogenic ecology, 'omics, and socioeconomic development. Through continued interactions with our users and partners, particularly those performing data archiving and synthesis, we anticipate that ENVO's growth will accelerate in 2017. As always, we invite further contributions and collaboration to advance the semantic representation of the environment, ranging from geographic features and environmental materials, across habitats and ecosystems, to everyday objects in household settings.
PMID: 27664130 [PubMed - as supplied by publisher]
Identifying the Uncertainty in Physician Practice Location through Spatial Analytics and Text Mining.
Int J Environ Res Public Health. 2016;13(9)
Authors: Shi X, Xue B, Xierali IM
Abstract
In response to the widespread concern about the adequacy, distribution, and disparity of access to a health care workforce, the correct identification of physicians' practice locations is critical to access public health services. In prior literature, little effort has been made to detect and resolve the uncertainty about whether the address provided by a physician in the survey is a practice address or a home address. This paper introduces how to identify the uncertainty in a physician's practice location through spatial analytics, text mining, and visual examination. While land use and zoning codes, embedded within the parcel datasets, help to differentiate residential areas from other types, spatial analytics may have certain limitations in matching and comparing physician and parcel datasets with different uncertainty issues, which may lead to unforeseen results. Handling and matching the string components between physicians' addresses and the addresses of the parcels could identify the spatial uncertainty and instability to derive a more reasonable relationship between different datasets. Visual analytics and examination further help to clarify the undetectable patterns. This research will have a broader impact on federal and state initiatives and policies to address both insufficiency and maldistribution of a health care workforce to improve the accessibility to public health services.
PMID: 27657100 [PubMed - as supplied by publisher]
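The string-component matching described in the Shi, Xue, and Xierali abstract above can be illustrated with a minimal Python sketch. It is not the authors' method: the abbreviation map, the Jaccard token overlap, the 0.8 threshold, the land-use labels, and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation map (an assumption); real address data needs a fuller list.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road",
                 "blvd": "boulevard", "ste": "suite"}


def normalize_address(address: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, and expand abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return [ABBREVIATIONS.get(token, token) for token in tokens]


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the normalized token sets of two addresses."""
    tokens_a, tokens_b = set(normalize_address(a)), set(normalize_address(b))
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def classify_address(physician_address, parcels, threshold=0.8):
    """Label an address 'home', 'practice', or 'uncertain'.

    parcels is a list of (parcel_address, land_use) pairs; the land-use
    values used here are illustrative, not taken from a real parcel dataset.
    """
    best_score, best_use = 0.0, None
    for parcel_address, land_use in parcels:
        score = token_overlap(physician_address, parcel_address)
        if score > best_score:
            best_score, best_use = score, land_use
    if best_use is None or best_score < threshold:
        return "uncertain"
    return "home" if best_use == "residential" else "practice"


parcels = [("123 Main Street", "residential"),
           ("500 Oak Ave Ste 200", "commercial")]
print(classify_address("123 Main St.", parcels))               # home
print(classify_address("500 Oak Avenue, Suite 200", parcels))  # practice
print(classify_address("77 Elm Rd", parcels))                  # uncertain
```

Addresses that match no parcel above the threshold are deliberately reported as uncertain rather than forced into a class, which mirrors the abstract's emphasis on surfacing uncertainty instead of hiding it.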
Extracting kinetic information from literature with KineticRE.
J Integr Bioinform. 2015;12(4):282
Authors: Freitas AA, Costa H, Rocha M, Rocha I
Abstract
To better understand the dynamic behavior of metabolic networks in a wide variety of conditions, the field of Systems Biology has increased its interest in the use of kinetic models. Currently available databases do not contain enough data on this topic. Given that a significant part of the information relevant to the development of such models is still scattered across the literature, it becomes essential to develop specific and powerful text mining tools to collect these data. In this context, the main objective of this work is the development of a text mining tool to extract, from scientific literature, kinetic parameters, their respective values, and their relations with enzymes and metabolites. The proposed approach integrates the development of a novel plug-in for the text mining framework @Note2. Finally, the pipeline was validated with a case study on Kluyveromyces lactis, spanning the analysis and results of 20 full-text documents.
PMID: 26673933 [PubMed - indexed for MEDLINE]
RetroMine, or how to provide in-depth retrospective studies from Medline in a glance: the hepcidin use-case.
J Integr Bioinform. 2015;12(3):275
Authors: Ameline de Cadeville B, Loréal O, Moussouni-Marzolf F
Abstract
The rapid expansion of biomedical literature has prompted the development of advanced text mining tools to rapidly extract relevant events from the continuously increasing amount of knowledge published periodically in PubMed. However, bioinvestigators are still reluctant to use these tools for two reasons: i) a query often extracts a large volume of events, and this volume is hard to manage, and ii) background events dominate search results and overshadow more pertinent published information, especially for domain experts. In this paper, we propose an approach that incorporates the temporal dimension of published events into the process of information extraction to improve data selection and prioritize more pertinent periodically published knowledge for scientists. Instead of providing the total knowledge associated with a PubMed query, which is usually a mix of trivial background information and non-background information, we propose a method that incorporates time and selects non-background, highly relevant biological entities and events published over time for bioinvestigators. Before background events are excluded from the total knowledge extracted, their amount is also quantified. This work is illustrated by a case study on Hepcidin gene publications over a decade, a duration long enough to generate alternative views on the overall data extracted.
PMID: 26673791 [PubMed - indexed for MEDLINE]
A corpus for plant-chemical relationships in the biomedical domain.
BMC Bioinformatics. 2016;17(1):386
Authors: Choi W, Kim B, Cho H, Lee D, Lee H
Abstract
BACKGROUND: Plants are natural products that humans consume in various ways, including as food and medicine. They have a long empirical history of treating diseases with relatively few side effects. Based on these strengths, many studies have been performed to verify the effectiveness of plants in treating diseases. It is crucial to understand the chemicals contained in plants because these chemicals can regulate the activities of proteins that are key factors in causing diseases. With the accumulation of a large volume of biomedical literature in various databases such as PubMed, it is possible to automatically extract relationships between plants and chemicals on a large scale by applying a text mining approach. A cornerstone of this task is a corpus of relationships between plants and chemicals.
RESULTS: In this study, we first constructed a corpus for plant and chemical entities and for the relationships between them. The corpus contains 267 plant entities, 475 chemical entities, and 1,007 plant-chemical relationships (550 and 457 positive and negative relationships, respectively), which are drawn from 377 sentences in 245 PubMed abstracts. Inter-annotator agreement scores for the corpus among three annotators were measured. The simple percent agreement scores for entities and trigger words for the relationships were 99.6 and 94.8 %, respectively, and the overall kappa score for the classification of positive and negative relationships was 79.8 %. We also developed a rule-based model to automatically extract such plant-chemical relationships. When we evaluated the rule-based model using the corpus and randomly selected biomedical articles, overall F-scores of 68.0 and 61.8 % were achieved, respectively.
CONCLUSION: We expect that the corpus for plant-chemical relationships will be a useful resource for enhancing plant research. The corpus is available at http://combio.gist.ac.kr/plantchemicalcorpus .
PMID: 27650402 [PubMed - as supplied by publisher]
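The evaluation measures named in the Choi et al. abstract above (F-score and kappa) can be sketched in Python as follows. This is a minimal illustration, not the authors' evaluation code: the abstract does not give precision, recall, or the kappa variant used for its three annotators, so the two-annotator Cohen's kappa, the toy labels, and the counts below are assumptions. Only the relationship totals (550 positive, 457 negative, 1,007 overall) come from the abstract.

```python
def f_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1: harmonic mean of precision and recall, from raw counts."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: the paper's kappa over three annotators may be computed
    differently (for example, Fleiss' kappa or averaged pairwise scores).
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Chance agreement: product of each annotator's marginal label rates.
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in categories)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# The relationship counts reported in the abstract are internally consistent.
assert 550 + 457 == 1007

# Hypothetical counts and labels, for illustration only.
print(f_score(true_pos=8, false_pos=2, false_neg=2))
print(cohen_kappa(["pos", "pos", "neg", "neg"],
                  ["pos", "neg", "neg", "neg"]))
```

Kappa corrects simple percent agreement for agreement expected by chance, which is why the abstract's kappa (79.8 %) can sit well below its percent agreement scores.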